Retries Need Backoff, Jitter, and a Clear Budget

Retries are one of the easiest resilience features to get wrong because they feel harmless at small scale.

A single service retrying a failed request is not a problem. Ten thousand instances retrying on the same schedule often is.

That is where retry logic turns from a recovery mechanism into a load amplifier.

The Naive Pattern

This is common and dangerous:

for (let attempt = 0; attempt < 3; attempt += 1) {
  try {
    return await callPaymentApi();
  } catch (error) {
    // Fixed 1-second delay: every caller retries on the same schedule.
    await sleep(1000);
  }
}

All callers fail together. All callers sleep for the same amount of time. All callers wake up together and hammer the dependency again.
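
To make the lockstep effect concrete, here is a minimal sketch that tallies when retries arrive if every client fails at the same instant and sleeps for a fixed delay. The function name `fixedDelaySchedule` is hypothetical, and the model assumes requests take zero time, which real traffic does not.

```typescript
// Hypothetical illustration: count how many retries land on each timestamp
// when every client fails at t = 0 and waits a fixed delay between retries.
function fixedDelaySchedule(
  clients: number,
  retries: number,
  delayMs: number,
): Map<number, number> {
  const hitsPerTimestamp = new Map<number, number>();
  for (let client = 0; client < clients; client += 1) {
    for (let retry = 1; retry <= retries; retry += 1) {
      // With a fixed delay, the retry time does not depend on the client.
      const timestamp = retry * delayMs;
      hitsPerTimestamp.set(timestamp, (hitsPerTimestamp.get(timestamp) ?? 0) + 1);
    }
  }
  return hitsPerTimestamp;
}

const schedule = fixedDelaySchedule(10000, 2, 1000);
// schedule has two entries: 10000 requests at t = 1000 and 10000 at t = 2000.
```

Under these assumptions, 20,000 retries arrive in two bursts instead of being spread over the interval.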

What Better Retry Logic Includes

Good retry behavior usually combines:

exponential backoff
jitter
a maximum retry budget

For example:

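function backoffMs(attempt: number): number {
  // Cap the exponential term first; capping after adding jitter would
  // collapse every late attempt onto exactly 5000 ms.
  const base = Math.min(250 * 2 ** attempt, 5000);
  // Add up to 30% random jitter so clients do not wake in lockstep.
  const jitter = Math.random() * 0.3 * base;
  return base + jitter;
}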

The point is not mathematical elegance. The point is to stop thousands of clients from retrying in lockstep.
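
The third ingredient, the retry budget, can be sketched as a wrapper that stops on whichever limit is reached first: a number of attempts or a total elapsed time. This is one possible reading of "budget", not a definitive implementation. The names `retryWithBudget`, `RetryBudget`, `maxAttempts`, `maxElapsedMs`, and `jitteredDelayMs` are hypothetical, and the delay function caps the exponential term before adding jitter so that delays stay spread out at the ceiling.

```typescript
// Hypothetical helpers for illustration; none of these names come from a library.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff capped at 5000 ms, plus up to 30% random jitter.
function jitteredDelayMs(attempt: number): number {
  const base = Math.min(250 * 2 ** attempt, 5000);
  return base + Math.random() * 0.3 * base;
}

interface RetryBudget {
  maxAttempts: number; // total tries, including the first call
  maxElapsedMs: number; // wall-clock ceiling across all tries and sleeps
}

async function retryWithBudget<T>(
  operation: () => Promise<T>,
  budget: RetryBudget,
  delayFor: (attempt: number) => number = jitteredDelayMs,
): Promise<T> {
  if (budget.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  const startedAt = Date.now();
  let lastError: unknown;
  for (let attempt = 0; attempt < budget.maxAttempts; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
    }
    const delay = delayFor(attempt);
    const isLastAttempt = attempt === budget.maxAttempts - 1;
    const wouldExceedBudget =
      Date.now() - startedAt + delay > budget.maxElapsedMs;
    // Give up rather than sleep when no further attempt is allowed.
    if (isLastAttempt || wouldExceedBudget) break;
    await sleep(delay);
  }
  // Surface the final failure instead of silently returning undefined.
  throw lastError;
}
```

A call might look like `retryWithBudget(() => callPaymentApi(), { maxAttempts: 4, maxElapsedMs: 10000 })`. Unlike the naive loop, the wrapper rethrows the last error once the budget is spent, so callers can tell exhaustion from success.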

The Trade-Off

Retries only make sense for errors that are likely to be transient. They are usually wrong for:

validation errors
permanent authorization errors
requests that are unsafe to repeat without idempotency

That is why retry policy and idempotency policy belong together.
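
One way to keep the two policies together is a single predicate that checks both whether the failure looks transient and whether the request is safe to repeat. The sketch below assumes failures surface as HTTP status codes. The `FailedRequest` shape, the `hasIdempotencyKey` flag, and the choice of which statuses count as transient are illustrative assumptions, not a standard.

```typescript
// Assumption: these statuses are treated as transient. This is a policy
// choice; some teams exclude 500, for example.
const TRANSIENT_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

// Methods that HTTP semantics (RFC 9110) define as idempotent.
const IDEMPOTENT_METHODS = new Set([
  "GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE",
]);

interface FailedRequest {
  method: string;
  status: number;
  // Hypothetical flag: the caller sent a key the server uses to deduplicate.
  hasIdempotencyKey: boolean;
}

function shouldRetry(request: FailedRequest): boolean {
  // Validation (400, 422) and authorization (401, 403) errors fall through
  // here: repeating the same request will not change the outcome.
  if (!TRANSIENT_STATUSES.has(request.status)) return false;
  // Only repeat requests that are idempotent by method or by explicit key.
  return (
    IDEMPOTENT_METHODS.has(request.method.toUpperCase()) ||
    request.hasIdempotencyKey
  );
}
```

Under this sketch, a `POST` that fails with 503 is retried only when it carries an idempotency key, while a `GET` that fails with 503 is always eligible.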

