
Node.js OOMKilled: How to Find the Real Memory Problem

What actually causes Node.js services to get OOMKilled, how to inspect the heap, and when raising the memory limit helps versus hiding the leak.

Published: March 12, 2024
Reading time: 15 min read

When a Node.js service gets OOMKilled in Kubernetes, the root cause is not always "the pod needs more memory" and it is not always "V8 leaked".

You have at least three moving parts:

  • the V8 heap
  • memory outside the V8 heap, such as native buffers
  • the container memory limit enforced by the runtime

If you do not separate those, you can spend days fixing the wrong problem.

Start With the Right Mental Model

process.memoryUsage() reports several memory buckets:

const usage = process.memoryUsage();

console.log({
  rss: usage.rss,
  heapTotal: usage.heapTotal,
  heapUsed: usage.heapUsed,
  external: usage.external,
  arrayBuffers: usage.arrayBuffers,
});

The important distinction is:

  • heapUsed is JavaScript objects managed by V8
  • external and arrayBuffers often represent memory that still counts against the container limit
  • rss is the process resident set and is usually the number operations teams care about during incidents

You can have a healthy-looking heap and still get killed because native memory or buffers keep growing.
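That split is easy to demonstrate. The sketch below allocates native-backed buffers and compares the memory buckets before and after; a Buffer's storage lives outside the V8 heap, so the allocation moves `external` and `rss` while barely touching `heapUsed`.

```javascript
const before = process.memoryUsage();

// Allocate ~100 MiB of Buffer-backed memory. Each Buffer's backing store is
// outside the V8 heap; only the small wrapper objects are heap-allocated.
const buffers = [];
for (let i = 0; i < 100; i++) {
  buffers.push(Buffer.alloc(1024 * 1024)); // 1 MiB, zero-filled
}

const after = process.memoryUsage();

console.log({
  heapUsedDeltaMB: Math.round((after.heapUsed - before.heapUsed) / 1024 / 1024),
  externalDeltaMB: Math.round((after.external - before.external) / 1024 / 1024),
});
```

The `external` delta lands near 100 MB while the heap delta stays close to zero, which is exactly the "healthy heap, growing rss" shape described above.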

--max-old-space-size Is a Tool, Not a Diagnosis

This flag raises the maximum size, in megabytes, of V8's old-generation heap:

node --max-old-space-size=4096 server.js

That can be the right fix when:

  • the service legitimately needs more live heap
  • garbage collection is healthy
  • the container limit has enough headroom

It is the wrong fix when:

  • the memory leak is in userland objects that should be collectible
  • the memory growth is in buffers or native addons
  • the pod is already too close to the container limit

Raising the heap limit on a leaky process just delays the crash.
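Before reasoning about whether the flag is needed, it is worth confirming which limit the process is actually running under. `v8.getHeapStatistics()` exposes the effective ceiling, including any `--max-old-space-size` override:

```javascript
import v8 from "node:v8";

// heap_size_limit is the ceiling V8 will grow to before failing allocations;
// it reflects defaults plus any command-line override.
const { heap_size_limit, used_heap_size } = v8.getHeapStatistics();

console.log({
  heapLimitMB: Math.round(heap_size_limit / 1024 / 1024),
  heapUsedMB: Math.round(used_heap_size / 1024 / 1024),
});
```

Logging this once at startup turns "what heap limit are we actually running with?" into a non-question during incidents.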

Common Causes of Memory Growth

In production Node services, I usually check these first:

  1. Long-lived maps or caches without eviction
  2. Event listeners attached repeatedly and never removed
  3. Queues that accept work faster than workers can drain it
  4. Large JSON payloads buffered fully in memory
  5. Streaming code paths replaced with await response.json(), which buffers the entire body in memory
  6. Native addons or image/video processing libraries holding memory outside the heap

None of these are exotic. They are normal engineering mistakes under load.
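Cause 2 deserves a concrete illustration, because it usually hides inside per-request code that looks harmless. A sketch (the "config-reload" event name is made up for the example):

```javascript
import { EventEmitter } from "node:events";

const bus = new EventEmitter();

function handleRequest() {
  // Bug: a fresh closure is attached on every request and never removed,
  // so every closure (and anything it captures) stays reachable forever.
  bus.on("config-reload", () => {
    /* re-read config */
  });
}

for (let i = 0; i < 5; i++) handleRequest();

console.log(bus.listenerCount("config-reload")); // grows with every request
```

Node prints a MaxListenersExceededWarning once a single event passes 10 listeners, which is often the first visible symptom of this leak.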

Heap Snapshots Are Worth the Friction

If the suspect is heap growth, take a snapshot and inspect retained size:

import { writeHeapSnapshot } from "node:v8";

const filename = writeHeapSnapshot();
console.log(`Heap snapshot written to ${filename}`);

Then open the snapshot in Chrome DevTools and look for:

  • large retaining paths
  • unexpectedly large arrays or maps
  • duplicated objects that should have short lifetimes
  • closures retaining request-specific state

The question is not "which object is big?" It is "why is this object still reachable?"
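Snapshots can also be triggered without redeploying debug code. A common pattern (assuming a POSIX platform; SIGUSR1 is reserved by Node.js for the debugger, so SIGUSR2 is the usual choice) is to wire a signal handler:

```javascript
import { writeHeapSnapshot } from "node:v8";

// `kill -USR2 <pid>` now writes a .heapsnapshot file into the working
// directory, ready to be copied out and opened in Chrome DevTools.
process.on("SIGUSR2", () => {
  const file = writeHeapSnapshot();
  console.log(`Heap snapshot written to ${file}`);
});
```

Be aware that writing a snapshot pauses the process and temporarily needs memory on the order of the heap itself, so on a pod already near its limit this can itself trigger the OOM kill.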

Leaks Often Hide in Convenience Code

This pattern is more dangerous than it looks:

const pending = new Map<string, RequestContext>();

export function trackRequest(id: string, ctx: RequestContext) {
  pending.set(id, ctx);
}

Without a clear delete path, that map becomes an accidental in-memory database.

The fix is usually not clever. It is lifecycle discipline:

  • remove entries when work completes
  • bound caches
  • stream large payloads
  • prefer backpressure over buffering everything
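Applied to the `pending` map above, that discipline looks like the sketch below (plain JavaScript; `MAX_PENDING` and the oldest-entry eviction are illustrative choices, not a recommendation for every workload):

```javascript
const MAX_PENDING = 10_000; // hard bound as a safety net, illustrative value
const pending = new Map();

function trackRequest(id, ctx) {
  if (pending.size >= MAX_PENDING) {
    // Maps iterate in insertion order, so the first key is the oldest entry.
    const oldest = pending.keys().next().value;
    pending.delete(oldest);
  }
  pending.set(id, ctx);
}

function completeRequest(id) {
  pending.delete(id); // the delete path the leaky version was missing
}

trackRequest("req-1", { user: "a" });
trackRequest("req-2", { user: "b" });
completeRequest("req-1");

console.log(pending.size); // → 1
```

The size bound never fires in healthy operation; it exists so that a missed `completeRequest` call degrades into evictions instead of an OOM kill.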

Containers Change the Failure Mode

Inside Kubernetes, the process is competing with the container limit, not just with V8 defaults.

That means:

  • watch rss, not just heap
  • leave headroom for native allocations
  • avoid setting --max-old-space-size close to the container limit

A process with a 4 GB heap in a 4 GB container is not "efficient". It is fragile.
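One way to keep that headroom explicit is to derive the heap flag from the container limit instead of picking the two numbers independently. The 75% ratio below is a rule of thumb for illustration, not a V8 or Kubernetes default:

```javascript
// Derive a --max-old-space-size value that leaves room under the container
// limit for buffers, native addons, thread stacks, and the rest of rss.
function maxOldSpaceForLimit(containerLimitMB, headroomRatio = 0.75) {
  return Math.floor(containerLimitMB * headroomRatio);
}

// A 4096 MB pod gets a 3072 MB heap ceiling, leaving ~1 GB of headroom.
console.log(maxOldSpaceForLimit(4096)); // → 3072
```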

A Practical Incident Loop

When a Node service starts getting OOMKilled:

  1. graph rss, heapUsed, and request volume together
  2. check whether growth resets after traffic drops
  3. inspect buffer-heavy code paths and large payload handling
  4. capture a heap snapshot if heap growth looks suspicious
  5. only then decide whether heap tuning is justified

That order matters. Tuning before understanding usually creates a slower incident, not a better system.
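Step 1 does not require a metrics stack to get started. A minimal sampler, with console output standing in for whatever metrics client the service actually uses:

```javascript
function sampleMemory() {
  const { rss, heapUsed, external } = process.memoryUsage();
  return {
    rssMB: Math.round(rss / 1024 / 1024),
    heapUsedMB: Math.round(heapUsed / 1024 / 1024),
    externalMB: Math.round(external / 1024 / 1024),
  };
}

// Sample every 10 seconds; unref() keeps the timer from holding the
// process open during shutdown.
const timer = setInterval(() => console.log(sampleMemory()), 10_000);
timer.unref();
```

If rss climbs while heapUsed stays flat, suspect buffers and native memory; if both climb together, the heap snapshot workflow above is the next step.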
