Node.js OOMKilled: How to Find the Real Memory Problem
What actually causes Node.js services to get OOMKilled, how to inspect the heap, and when raising the memory limit helps versus hiding the leak.
When a Node.js service gets OOMKilled in Kubernetes, the root cause is not always "the pod needs more memory" and it is not always "V8 leaked".
You have at least three moving parts:
- the V8 heap
- memory outside the V8 heap, such as native buffers
- the container memory limit enforced by the runtime
If you do not separate those, you can spend days fixing the wrong problem.
Start With the Right Mental Model
process.memoryUsage() reports several memory buckets:
```js
const usage = process.memoryUsage();

console.log({
  rss: usage.rss,
  heapTotal: usage.heapTotal,
  heapUsed: usage.heapUsed,
  external: usage.external,
  arrayBuffers: usage.arrayBuffers,
});
```
The important distinction is:
- `heapUsed` is JavaScript objects managed by V8
- `external` and `arrayBuffers` often represent memory that still counts against the container limit
- `rss` is the process resident set and is usually the number operations teams care about during incidents
You can have a healthy-looking heap and still get killed because native memory or buffers keep growing.
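A minimal sketch of that divergence: `Buffer` data lives outside the V8 heap, so allocating it moves `rss` and `arrayBuffers` while `heapUsed` barely changes.

```js
// Allocate native-backed memory and compare the buckets before and after.
const before = process.memoryUsage();

const buffers = [];
for (let i = 0; i < 100; i++) {
  buffers.push(Buffer.alloc(1024 * 1024)); // 1 MiB each, stored outside the V8 heap
}

const after = process.memoryUsage();
console.log({
  heapUsedDeltaMiB: Math.round((after.heapUsed - before.heapUsed) / 1024 / 1024),
  arrayBuffersDeltaMiB: Math.round((after.arrayBuffers - before.arrayBuffers) / 1024 / 1024),
});
```

Run this and `arrayBuffers` jumps by roughly 100 MiB while the heap delta stays small. A dashboard that only tracks `heapUsed` would call this process healthy right up until the kernel kills it.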
--max-old-space-size Is a Tool, Not a Diagnosis
This flag raises the size of V8's old generation heap:
```bash
node --max-old-space-size=4096 server.js
```
That can be the right fix when:
- the service legitimately needs more live heap
- garbage collection is healthy
- the container limit has enough headroom
It is the wrong fix when:
- the memory leak is in userland objects that should be collectible
- the memory growth is in buffers or native addons
- the pod is already too close to the container limit
Raising the heap limit on a leaky process just delays the crash.
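Before reasoning about the flag at all, it helps to confirm what limit the process actually got, since entrypoint scripts and `NODE_OPTIONS` can silently override each other. A sketch using `v8.getHeapStatistics()` (note that `heap_size_limit` covers the whole heap, not just old space):

```js
// Print the effective V8 heap limit at startup instead of assuming
// the flag was applied.
import { getHeapStatistics } from "node:v8";

const heapLimitMiB = getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`effective V8 heap limit: ~${heapLimitMiB.toFixed(0)} MiB`);
```

Logging this once at boot turns "did the flag take effect?" from a guess into a grep.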
Common Causes of Memory Growth
In production Node services, I usually check these first:
- Long-lived maps or caches without eviction
- Event listeners attached repeatedly and never removed
- Queues that accept work faster than workers can drain it
- Large JSON payloads buffered fully in memory
- Streams that were replaced with `await response.json()`
- Native addons or image/video processing libraries holding memory outside the heap
None of these are exotic. They are normal engineering mistakes under load.
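The listener variant is worth a concrete sketch, because it usually hides in per-request code that looks harmless. Here a new handler is attached to a long-lived emitter on every request and never removed, so each closure (and everything it captures) stays reachable forever:

```js
// Sketch of the listener-leak pattern against a process-lifetime emitter.
import { EventEmitter } from "node:events";

const bus = new EventEmitter();
bus.setMaxListeners(0); // silences the max-listeners warning, which is how leaks survive review

function handleRequest(id) {
  // Leak: one listener per request, no matching removeListener anywhere.
  bus.on("config-changed", () => console.log(`request ${id} saw config change`));
}

for (let i = 0; i < 1000; i++) handleRequest(i);
console.log(bus.listenerCount("config-changed")); // grows with traffic, never shrinks
```

The default max-listeners warning exists precisely to flag this; raising or disabling it without a removal path just converts a warning into a leak.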
Heap Snapshots Are Worth the Friction
If the suspect is heap growth, take a snapshot and inspect retained size:
```js
import { writeHeapSnapshot } from "node:v8";

const filename = writeHeapSnapshot();
console.log(`Heap snapshot written to ${filename}`);
```
Then open the snapshot in Chrome DevTools and look for:
- large retaining paths
- unexpectedly large arrays or maps
- duplicated objects that should have short lifetimes
- closures retaining request-specific state
The question is not "which object is big?" It is "why is this object still reachable?"
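In production you rarely want snapshots on a timer; you want them on demand, against the exact process that is misbehaving. One sketch is wiring the snapshot to a signal so an operator can trigger it with `kill -USR2 <pid>`:

```js
// Write a heap snapshot when the process receives SIGUSR2.
// Caveat: writeHeapSnapshot pauses the process and can take seconds
// (and significant disk space) on large heaps.
import { writeHeapSnapshot } from "node:v8";

process.on("SIGUSR2", () => {
  const file = writeHeapSnapshot();
  console.log(`heap snapshot written to ${file}`);
});
```

Two snapshots taken minutes apart, compared in DevTools, answer the growth question far better than a single snapshot ever can.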
Leaks Often Hide in Convenience Code
This pattern is more dangerous than it looks:
```ts
const pending = new Map<string, RequestContext>();

export function trackRequest(id: string, ctx: RequestContext) {
  pending.set(id, ctx);
}
```
Without a clear delete path, that map becomes an accidental in-memory database.
The fix is usually not clever. It is lifecycle discipline:
- remove entries when work completes
- bound caches
- stream large payloads
- prefer backpressure over buffering everything
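For the "bound caches" point, a minimal sketch of a size-bounded cache with least-recently-used eviction, built on `Map`'s insertion-order iteration. It is not production-grade (no TTL, no memory accounting), but it shows the lifecycle discipline an unbounded map lacks:

```js
// Minimal LRU-style cache: every insert has a matching eviction path.
class BoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    // Re-insert to mark the entry as most recently used.
    const value = this.entries.get(key);
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry: first key in insertion order.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}
```

With `maxEntries: 2`, setting `a`, `b`, then `c` evicts `a`; memory is now bounded by configuration instead of by traffic.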
Containers Change the Failure Mode
Inside Kubernetes, the process is competing with the container limit, not just with V8 defaults.
That means:
- watch `rss`, not just the heap
- leave headroom for native allocations
- avoid setting `--max-old-space-size` close to the container limit
A process with a 4 GB heap in a 4 GB container is not "efficient". It is fragile.
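One way to catch that fragility early is a startup check comparing the V8 heap limit against the cgroup memory limit. A sketch, assuming cgroup v2 (`/sys/fs/cgroup/memory.max`; cgroup v1 uses `memory.limit_in_bytes` at a different path) and a 75% headroom threshold picked for illustration:

```js
// Warn at boot when the configured heap limit leaves too little
// headroom under the container limit for native allocations.
import { readFileSync } from "node:fs";
import { getHeapStatistics } from "node:v8";

function containerLimitBytes() {
  try {
    const raw = readFileSync("/sys/fs/cgroup/memory.max", "utf8").trim();
    return raw === "max" ? Infinity : Number(raw);
  } catch {
    return Infinity; // not running under cgroup v2
  }
}

const heapLimit = getHeapStatistics().heap_size_limit;
const containerLimit = containerLimitBytes();

if (heapLimit > containerLimit * 0.75) {
  console.warn("V8 heap limit exceeds 75% of the container limit; buffers and native memory may trigger an OOM kill");
}
```

A warning in the boot log is cheap; discovering the same mismatch from a `OOMKilled` event at 3 a.m. is not.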
A Practical Incident Loop
When a Node service starts getting OOMKilled:
- graph `rss`, `heapUsed`, and request volume together
- check whether growth resets after traffic drops
- inspect buffer-heavy code paths and large payload handling
- capture a heap snapshot if heap growth looks suspicious
- only then decide whether heap tuning is justified
That order matters. Tuning before understanding usually creates a slower incident, not a better system.
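The graphing step does not require an APM product; even a periodic log line with the relevant buckets is enough to line memory up against request volume in your log aggregator. A sketch (the 15-second interval is an arbitrary choice):

```js
// Periodically emit the memory buckets worth graphing during an incident.
const intervalMs = 15_000; // assumption: 15s granularity is enough

const timer = setInterval(() => {
  const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
  console.log(JSON.stringify({
    ts: new Date().toISOString(),
    rssMiB: Math.round(rss / 1024 / 1024),
    heapUsedMiB: Math.round(heapUsed / 1024 / 1024),
    externalMiB: Math.round(external / 1024 / 1024),
    arrayBuffersMiB: Math.round(arrayBuffers / 1024 / 1024),
  }));
}, intervalMs);

timer.unref(); // never keep the process alive just for sampling
```

If `rss` climbs while `heapUsed` stays flat, you are looking at buffers or native memory, and heap tuning is already ruled out before anyone touches a flag.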
Further Reading
Node.js Still Freezes When You Put CPU Work on the Main Thread
Node is excellent at non-blocking I/O, but synchronous CPU-heavy operations still block the event loop. That distinction is where many production incidents start.
Node.js Scheduling Makes More Sense Once You Separate Microtasks from Phases
The order between `process.nextTick`, promises, timers, and `setImmediate` is easier to reason about when you understand where each queue sits.