Retries are one of the easiest resilience features to get wrong because they feel harmless at small scale.
A single service retrying a failed request is not a problem. Ten thousand instances retrying on the same schedule often is.
That is where retry logic turns from a recovery mechanism into a load amplifier.
The Naive Pattern
This is common and dangerous:
for (let attempt = 0; attempt < 3; attempt += 1) {
  try {
    return await callPaymentApi();
  } catch (error) {
    // Fixed delay: every caller that failed together waits exactly one second.
    await sleep(1000);
  }
}
All callers fail together. All callers sleep for the same amount of time. All callers wake up together and hammer the dependency again.
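To make the effect concrete, here is a small sketch that counts how many retries land in the same 100 ms window. The function name, the window size, and the client count are all assumptions chosen for illustration, and the "spread" case uses an idealized even distribution over one second rather than real randomness.

```typescript
// Hypothetical helper: returns the largest number of retries that fall
// into the same time window, given each client's delay in milliseconds.
function peakRetriesPerWindow(delaysMs: number[], windowMs = 100): number {
  const counts = new Map<number, number>();
  let peak = 0;
  for (const delay of delaysMs) {
    const bucket = Math.floor(delay / windowMs);
    const count = (counts.get(bucket) ?? 0) + 1;
    counts.set(bucket, count);
    peak = Math.max(peak, count);
  }
  return peak;
}

const clientCount = 10000;

// Every client sleeps for exactly 1000 ms, as in the loop above.
const fixedDelays = Array.from({ length: clientCount }, () => 1000);

// Idealized alternative: delays spread evenly between 1000 and 1999 ms.
const spreadDelays = Array.from({ length: clientCount }, (_, i) => 1000 + (i % 1000));

console.log(peakRetriesPerWindow(fixedDelays));  // all clients share one window
console.log(peakRetriesPerWindow(spreadDelays)); // load is split across ten windows
```

With a fixed delay the dependency sees the whole fleet in a single window. Spreading the same retries over one second cuts the peak by a factor of ten in this toy model.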
What Better Retry Logic Includes
Good retry behavior usually combines:
- exponential backoff
- jitter
- a maximum retry budget
For example:
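function backoffMs(attempt: number): number {
  // Cap the exponential term before adding jitter. Capping the sum instead
  // would pin every late attempt to exactly 5000 ms and bring back lockstep.
  const base = Math.min(250 * 2 ** attempt, 5000);
  const jitter = Math.random() * 0.3 * base; // up to 30% extra delay
  return base + jitter;
}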
The point is not mathematical elegance. The point is to stop thousands of clients from retrying in lockstep.
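One way to combine all three ingredients is a small wrapper like the sketch below. It is an illustration, not a reference implementation: `retryWithBackoff`, `RetryOptions`, and `backoffWithJitterMs` are hypothetical names, the retry budget is modeled simply as a cap on total attempts, and the backoff is a variant of the calculation above that caps the exponential term before adding jitter. The `sleep` and `random` options exist only so the behavior can be tested without waiting.

```typescript
// Hypothetical option names; none of this comes from a specific library.
type RetryOptions = {
  maxAttempts: number;                    // retry budget: total tries, including the first
  sleep?: (ms: number) => Promise<void>;  // injectable so tests do not have to wait
  random?: () => number;                  // injectable jitter source, returns [0, 1)
};

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

function backoffWithJitterMs(attempt: number, random: () => number): number {
  // Cap first, then add up to 30% jitter, so clients stay spread out at the ceiling.
  const base = Math.min(250 * 2 ** attempt, 5000);
  return base + random() * 0.3 * base;
}

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  if (options.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  const sleep = options.sleep ?? defaultSleep;
  const random = options.random ?? Math.random;
  let lastError: unknown;

  for (let attempt = 0; attempt < options.maxAttempts; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Skip the sleep after the final attempt; there is nothing left to wait for.
      if (attempt < options.maxAttempts - 1) {
        await sleep(backoffWithJitterMs(attempt, random));
      }
    }
  }
  // Budget exhausted: surface the last failure instead of swallowing it.
  throw lastError;
}
```

A call site might look like `await retryWithBackoff(() => callPaymentApi(), { maxAttempts: 3 })`. Unlike the naive loop, this version rethrows the final error once the budget is spent, so the caller can tell a failure from a success.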
The Trade-Off
Retries only make sense for errors that are likely to be transient. They are usually wrong for:
- validation errors
- permanent authorization errors
- requests that are unsafe to repeat without idempotency
That is why retry policy and idempotency policy belong together.
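A retry decision that honors both policies could be sketched as follows. The `OutgoingRequest` and `Failure` shapes are hypothetical, as is the choice of which status codes count as transient; a real service would tune that list. The set of idempotent methods follows the HTTP specification, and an `idempotencyKey` is assumed to mean the server deduplicates repeated requests.

```typescript
// Hypothetical shapes for illustration only.
type OutgoingRequest = { method: string; idempotencyKey?: string };
type Failure = { status?: number }; // no status: the request got no HTTP response

// Methods that HTTP defines as idempotent.
const IDEMPOTENT_METHODS = new Set(["GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"]);

// Assumed list of statuses that usually signal a temporary condition.
const TRANSIENT_STATUSES = new Set([408, 429, 502, 503, 504]);

function isTransient(failure: Failure): boolean {
  // Timeouts and connection resets carry no status and are treated as transient.
  return failure.status === undefined || TRANSIENT_STATUSES.has(failure.status);
}

function isSafeToRepeat(request: OutgoingRequest): boolean {
  return (
    IDEMPOTENT_METHODS.has(request.method.toUpperCase()) ||
    request.idempotencyKey !== undefined
  );
}

// Retry only when the error is likely temporary AND repeating the request is safe.
function shouldRetry(request: OutgoingRequest, failure: Failure): boolean {
  return isTransient(failure) && isSafeToRepeat(request);
}
```

Under these assumptions a `POST` without an idempotency key is never retried, even on a 503, while validation failures (such as 422) and authorization failures (such as 403) are rejected regardless of method.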
Further Reading