
Node.js OOMKilled: How to Find the Real Memory Problem

What actually causes Node.js services to get OOMKilled, how to inspect the heap, and when raising the memory limit helps versus hiding the leak.

Published: March 12, 2024
Reading time: 15 min read

When a Node.js service gets OOMKilled in Kubernetes, the root cause is not always "the pod needs more memory" and it is not always "V8 leaked".

You have at least three moving parts:

  • the V8 heap
  • memory outside the V8 heap, such as native buffers
  • the container memory limit enforced by the runtime

If you do not separate those, you can spend days fixing the wrong problem.

Start With the Right Mental Model

process.memoryUsage() reports several memory buckets:

const usage = process.memoryUsage();

console.log({
  rss: usage.rss,
  heapTotal: usage.heapTotal,
  heapUsed: usage.heapUsed,
  external: usage.external,
  arrayBuffers: usage.arrayBuffers,
});

The important distinction is:

  • heapUsed is JavaScript objects managed by V8
  • external and arrayBuffers often represent memory that still counts against the container limit
  • rss is the process resident set and is usually the number operations teams care about during incidents

You can have a healthy-looking heap and still get killed because native memory or buffers keep growing.
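That split is easy to demonstrate. The sketch below allocates native-backed buffers and compares the memory buckets before and after; a Buffer's storage lives outside the V8 heap, so the allocation moves `external` and `rss` while barely touching `heapUsed`.

```javascript
const before = process.memoryUsage();

// Allocate ~100 MiB of Buffer-backed memory. Each Buffer's backing store is
// outside the V8 heap; only the small wrapper objects are heap-allocated.
const buffers = [];
for (let i = 0; i < 100; i++) {
  buffers.push(Buffer.alloc(1024 * 1024)); // 1 MiB, zero-filled
}

const after = process.memoryUsage();

console.log({
  heapUsedDeltaMB: Math.round((after.heapUsed - before.heapUsed) / 1024 / 1024),
  externalDeltaMB: Math.round((after.external - before.external) / 1024 / 1024),
});
```

The `external` delta lands near 100 MB while the heap delta stays close to zero, which is exactly the "healthy heap, growing rss" shape described above.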

--max-old-space-size Is a Tool, Not a Diagnosis

This flag raises the maximum size, in megabytes, of V8's old-generation heap:

node --max-old-space-size=4096 server.js

That can be the right fix when:

  • the service legitimately needs more live heap
  • garbage collection is healthy
  • the container limit has enough headroom

It is the wrong fix when:

  • the memory leak is in userland objects that should be collectible
  • the memory growth is in buffers or native addons
  • the pod is already too close to the container limit

Raising the heap limit on a leaky process just delays the crash.
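Before reasoning about whether the flag is needed, it is worth confirming which limit the process is actually running under. `v8.getHeapStatistics()` exposes the effective ceiling, including any `--max-old-space-size` override:

```javascript
import v8 from "node:v8";

// heap_size_limit is the ceiling V8 will grow to before failing allocations;
// it reflects defaults plus any command-line override.
const { heap_size_limit, used_heap_size } = v8.getHeapStatistics();

console.log({
  heapLimitMB: Math.round(heap_size_limit / 1024 / 1024),
  heapUsedMB: Math.round(used_heap_size / 1024 / 1024),
});
```

Logging this once at startup turns "what heap limit are we actually running with?" into a non-question during incidents.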

Common Causes of Memory Growth

In production Node services, I usually check these first:

  1. Long-lived maps or caches without eviction
  2. Event listeners attached repeatedly and never removed
  3. Queues that accept work faster than workers can drain it
  4. Large JSON payloads buffered fully in memory
  5. Streaming code paths replaced with await response.json(), which buffers the entire body in memory
  6. Native addons or image/video processing libraries holding memory outside the heap

None of these are exotic. They are normal engineering mistakes under load.
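Cause 2 deserves a concrete illustration, because it usually hides inside per-request code that looks harmless. A sketch (the "config-reload" event name is made up for the example):

```javascript
import { EventEmitter } from "node:events";

const bus = new EventEmitter();

function handleRequest() {
  // Bug: a fresh closure is attached on every request and never removed,
  // so every closure (and anything it captures) stays reachable forever.
  bus.on("config-reload", () => {
    /* re-read config */
  });
}

for (let i = 0; i < 5; i++) handleRequest();

console.log(bus.listenerCount("config-reload")); // grows with every request
```

Node prints a MaxListenersExceededWarning once a single event passes 10 listeners, which is often the first visible symptom of this leak.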

Heap Snapshots Are Worth the Friction

If the suspect is heap growth, take a snapshot and inspect retained size:

import { writeHeapSnapshot } from "node:v8";

const filename = writeHeapSnapshot();
console.log(`Heap snapshot written to ${filename}`);

Then open the snapshot in Chrome DevTools and look for:

  • large retaining paths
  • unexpectedly large arrays or maps
  • duplicated objects that should have short lifetimes
  • closures retaining request-specific state

The question is not "which object is big?" It is "why is this object still reachable?"
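Snapshots can also be triggered without redeploying debug code. A common pattern (assuming a POSIX platform; SIGUSR1 is reserved by Node.js for the debugger, so SIGUSR2 is the usual choice) is to wire a signal handler:

```javascript
import { writeHeapSnapshot } from "node:v8";

// `kill -USR2 <pid>` now writes a .heapsnapshot file into the working
// directory, ready to be copied out and opened in Chrome DevTools.
process.on("SIGUSR2", () => {
  const file = writeHeapSnapshot();
  console.log(`Heap snapshot written to ${file}`);
});
```

Be aware that writing a snapshot pauses the process and temporarily needs memory on the order of the heap itself, so on a pod already near its limit this can itself trigger the OOM kill.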

Leaks Often Hide in Convenience Code

This pattern is more dangerous than it looks:

const pending = new Map<string, RequestContext>();

export function trackRequest(id: string, ctx: RequestContext) {
  pending.set(id, ctx);
}

Without a clear delete path, that map becomes an accidental in-memory database.

The fix is usually not clever. It is lifecycle discipline:

  • remove entries when work completes
  • bound caches
  • stream large payloads
  • prefer backpressure over buffering everything
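Applied to the `pending` map above, that discipline looks like the sketch below (plain JavaScript; `MAX_PENDING` and the oldest-entry eviction are illustrative choices, not a recommendation for every workload):

```javascript
const MAX_PENDING = 10_000; // hard bound as a safety net, illustrative value
const pending = new Map();

function trackRequest(id, ctx) {
  if (pending.size >= MAX_PENDING) {
    // Maps iterate in insertion order, so the first key is the oldest entry.
    const oldest = pending.keys().next().value;
    pending.delete(oldest);
  }
  pending.set(id, ctx);
}

function completeRequest(id) {
  pending.delete(id); // the delete path the leaky version was missing
}

trackRequest("req-1", { user: "a" });
trackRequest("req-2", { user: "b" });
completeRequest("req-1");

console.log(pending.size); // → 1
```

The size bound never fires in healthy operation; it exists so that a missed `completeRequest` call degrades into evictions instead of an OOM kill.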

Containers Change the Failure Mode

Inside Kubernetes, the process is competing with the container limit, not just with V8 defaults.

That means:

  • watch rss, not just heap
  • leave headroom for native allocations
  • avoid setting --max-old-space-size close to the container limit

A process with a 4 GB heap in a 4 GB container is not "efficient". It is fragile.
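One way to keep that headroom explicit is to derive the heap flag from the container limit instead of picking the two numbers independently. The 75% ratio below is a rule of thumb for illustration, not a V8 or Kubernetes default:

```javascript
// Derive a --max-old-space-size value that leaves room under the container
// limit for buffers, native addons, thread stacks, and the rest of rss.
function maxOldSpaceForLimit(containerLimitMB, headroomRatio = 0.75) {
  return Math.floor(containerLimitMB * headroomRatio);
}

// A 4096 MB pod gets a 3072 MB heap ceiling, leaving ~1 GB of headroom.
console.log(maxOldSpaceForLimit(4096)); // → 3072
```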

A Practical Incident Loop

When a Node service starts getting OOMKilled:

  1. graph rss, heapUsed, and request volume together
  2. check whether growth resets after traffic drops
  3. inspect buffer-heavy code paths and large payload handling
  4. capture a heap snapshot if heap growth looks suspicious
  5. only then decide whether heap tuning is justified

That order matters. Tuning before understanding usually creates a slower incident, not a better system.
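Step 1 does not require a metrics stack to get started. A minimal sampler, with console output standing in for whatever metrics client the service actually uses:

```javascript
function sampleMemory() {
  const { rss, heapUsed, external } = process.memoryUsage();
  return {
    rssMB: Math.round(rss / 1024 / 1024),
    heapUsedMB: Math.round(heapUsed / 1024 / 1024),
    externalMB: Math.round(external / 1024 / 1024),
  };
}

// Sample every 10 seconds; unref() keeps the timer from holding the
// process open during shutdown.
const timer = setInterval(() => console.log(sampleMemory()), 10_000);
timer.unref();
```

If rss climbs while heapUsed stays flat, suspect buffers and native memory; if both climb together, the heap snapshot workflow above is the next step.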
