Abdullah Al Mamun
Distributed Systems · Go · Reliability

Backpressure Strategies in High-Traffic APIs

Without backpressure, a traffic spike doesn't just slow your system down — it takes it down. Here are the patterns that actually work.

Abdullah Al Mamun

September 18, 2023 · 7 min read

What Backpressure Actually Means

Backpressure is the mechanism by which a system signals to its upstream that it's overwhelmed. Without it, a traffic spike causes unbounded queue growth, memory exhaustion, and eventually a crash.

The goal isn't to handle infinite traffic — it's to degrade gracefully and recover quickly.

Pattern 1: Token Bucket Rate Limiting

The token bucket algorithm allows bursts while enforcing an average rate. Tokens are added at a fixed rate, up to a maximum bucket size that caps the burst. Each request consumes a token; if none is available, the request is rejected.

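// Create the limiter once at startup and share it across requests;
// building it inside the handler would give every request a fresh, full bucket.
var limiter = rate.NewLimiter(rate.Limit(1000), 100) // golang.org/x/time/rate: 1000 req/s, burst of 100

func handle(w http.ResponseWriter, r *http.Request) {
  if !limiter.Allow() { // no token available: reject instead of queueing
    http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
    return
  }
}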

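This is the right choice for external-facing APIs where you want to protect against individual clients overwhelming the system. For that, keep one limiter per client (keyed by API key or similar) rather than a single global one, so a noisy client exhausts only its own budget.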
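To make the per-client idea concrete, here is a minimal sketch of a token bucket kept per client key, written against the standard library only so the refill logic is visible. The `ClientLimiter` type, its fields, and the key strings are hypothetical names for illustration; a production service would more likely wrap a maintained limiter such as `golang.org/x/time/rate`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket holds the token state for a single client.
type bucket struct {
	tokens float64   // tokens currently available
	last   time.Time // when tokens were last refilled
}

// ClientLimiter keeps one token bucket per client key.
// It is an illustrative type, not part of any library.
type ClientLimiter struct {
	mu      sync.Mutex
	rate    float64 // tokens added per second
	burst   float64 // bucket capacity
	buckets map[string]*bucket
}

func NewClientLimiter(ratePerSec, burst float64) *ClientLimiter {
	return &ClientLimiter{rate: ratePerSec, burst: burst, buckets: map[string]*bucket{}}
}

// Allow reports whether the client identified by key may proceed,
// consuming one token if so.
func (l *ClientLimiter) Allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now()
	b, ok := l.buckets[key]
	if !ok {
		// New clients start with a full bucket.
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[key] = b
	}

	// Refill for the time elapsed since the last call, capped at the burst size.
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now

	if b.tokens < 1 {
		return false // bucket empty: reject
	}
	b.tokens--
	return true
}

func main() {
	l := NewClientLimiter(1, 2) // 1 req/s per client, burst of 2
	fmt.Println(l.Allow("publisher-a"), l.Allow("publisher-a"), l.Allow("publisher-a"))
	fmt.Println(l.Allow("publisher-b")) // separate bucket, unaffected by publisher-a
}
```

One caveat with this sketch: the map grows with every new key, so a real deployment would need to evict idle buckets.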

Pattern 2: Semaphore-Based Concurrency Limiting

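Rate limiting controls requests per second. Concurrency limiting controls simultaneous in-flight requests. For CPU-bound or DB-bound operations, concurrency limiting is often more effective, because the load a request imposes depends on how long it holds a core or a connection, not on how often requests arrive.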

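// Create sem once and share it across handlers; a per-request channel limits nothing.
var sem = make(chan struct{}, 100) // max 100 concurrent requests

// In the handler:
sem <- struct{}{}        // acquire a slot; blocks while 100 requests are in flight
defer func() { <-sem }() // release the slot when the handler returns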
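A plain blocking send waits indefinitely for a slot, which can turn the semaphore itself into an unbounded queue of parked goroutines. The sketch below shows one way to bound that wait, assuming the caller has a `context.Context` (in an HTTP handler, `r.Context()`). The `Limiter` type and its method names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Limiter caps the number of operations in flight.
// It is an illustrative helper, not a library type.
type Limiter struct {
	slots chan struct{}
}

func NewLimiter(max int) *Limiter {
	return &Limiter{slots: make(chan struct{}, max)}
}

// Acquire waits for a free slot until ctx is done.
// It returns false if the wait was abandoned.
func (l *Limiter) Acquire(ctx context.Context) bool {
	select {
	case l.slots <- struct{}{}:
		return true
	case <-ctx.Done():
		return false
	}
}

// TryAcquire takes a slot only if one is free right now; it never blocks.
func (l *Limiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot taken by a successful Acquire or TryAcquire.
func (l *Limiter) Release() { <-l.slots }

func main() {
	lim := NewLimiter(1)
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	fmt.Println(lim.Acquire(ctx)) // true: the only slot is free
	fmt.Println(lim.Acquire(ctx)) // false: slot is held, wait ends at the deadline
	lim.Release()
}
```

In a handler, a failed `Acquire` would typically translate into a 503 or 429 response. `TryAcquire` is the stricter variant for cases where waiting at all is undesirable.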

Pattern 3: Load Shedding

When the system is overloaded, it's better to reject some requests immediately than to queue them all and serve them slowly. A slow response is often worse than a fast rejection — it ties up client connections and downstream resources.

We implemented load shedding based on queue depth: if the worker queue exceeds a threshold, new requests get a 503 immediately.
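As a rough illustration of shedding on queue depth, the sketch below uses a buffered channel as the worker queue and treats its capacity as the threshold. The `Queue` and `Job` types, the 202 response for accepted work, and the `Retry-After` value are assumptions for the example, not details of the system described here.

```go
package main

import (
	"fmt"
	"net/http"
)

// Job is a placeholder for whatever unit of work the workers process.
type Job struct{ ID int }

// Queue is a bounded work queue that sheds load instead of growing.
// It is an illustrative type, not a library API.
type Queue struct {
	jobs chan Job
}

func NewQueue(maxDepth int) *Queue {
	return &Queue{jobs: make(chan Job, maxDepth)}
}

// TryEnqueue adds the job if there is room and reports whether it was
// accepted. It never blocks.
func (q *Queue) TryEnqueue(j Job) bool {
	select {
	case q.jobs <- j:
		return true
	default:
		return false // queue is at its depth limit: shed this job
	}
}

// Depth returns the number of jobs currently waiting.
func (q *Queue) Depth() int { return len(q.jobs) }

// Handler accepts work, or answers 503 immediately when the queue is full.
func (q *Queue) Handler(w http.ResponseWriter, r *http.Request) {
	if !q.TryEnqueue(Job{}) {
		w.Header().Set("Retry-After", "1") // assumed hint; tune or drop as needed
		http.Error(w, "overloaded, try again later", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	q := NewQueue(2) // no workers are draining in this demo
	for i := 1; i <= 3; i++ {
		fmt.Println(i, q.TryEnqueue(Job{ID: i})) // third job is shed
	}
}
```

The non-blocking `select` is what keeps the rejection fast: a caller is never left waiting for queue space.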

Pattern 4: Circuit Breaker

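For downstream dependencies, a circuit breaker prevents cascading failures. If a downstream service starts failing, the circuit opens and requests fail fast instead of waiting for timeouts. After a cool-down period, the breaker lets a trial request through and closes again only if that request succeeds.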
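The open, fail-fast, and trial behavior can be sketched in a few dozen lines. This is a deliberately simplified breaker that counts consecutive failures. The `Breaker` type, `ErrOpen`, `errDownstream`, and the thresholds are hypothetical, and unlike a production breaker it may let several concurrent trial calls through once the cool-down has elapsed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned when the breaker rejects a call without trying it.
var ErrOpen = errors.New("circuit open")

// errDownstream stands in for a failure from the protected dependency.
var errDownstream = errors.New("downstream failed")

// Breaker is a minimal circuit breaker based on consecutive failures.
// It is an illustrative type, not a library API.
type Breaker struct {
	mu          sync.Mutex
	failures    int           // consecutive failures seen so far
	maxFailures int           // failures that trip the breaker
	cooldown    time.Duration // how long to stay open before a trial call
	openedAt    time.Time
	open        bool
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is open and still cooling down.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.open && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrOpen // fail fast without touching the dependency
	}
	b.mu.Unlock()

	err := fn() // closed, or cool-down elapsed: let the call through

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.open, b.openedAt = true, time.Now() // trip, or re-open after a failed trial
		}
		return err
	}
	b.failures, b.open = 0, false // success closes the breaker
	return nil
}

func main() {
	b := NewBreaker(2, time.Minute)
	failing := func() error { return errDownstream }
	for i := 0; i < 3; i++ {
		fmt.Println(b.Call(failing)) // two real failures, then "circuit open"
	}
}
```

Counting consecutive failures keeps the example short. Breakers in practice often trip on an error rate over a sliding window instead.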

The Combination That Works

In our ad serving system, we use all four in layers:

1. Token bucket at the edge (per-publisher rate limit)
2. Concurrency semaphore at the auction evaluator (protects Redis)
3. Load shedding at the impression logger (protects Kafka)
4. Circuit breaker on the DSP bid request (protects against slow DSPs)

Each layer handles a different failure mode. Together, they make the system resilient to traffic spikes, slow dependencies, and cascading failures.