almessadi.

Scaling WebSockets Means Scaling State and Fanout

Persistent connections are not the hard part by themselves. The real problems are fanout, backpressure, connection distribution, and the runtime model you choose.

Published: May 10, 2024
Reading time: 3 min read

Engineers sometimes frame WebSocket scaling as "can runtime X hold N connections?"

That is too narrow.

The harder question is usually:

"Can this system manage long-lived connection state, selective fanout, and slow clients without collapsing?"

The Real Costs

A WebSocket system has to deal with:

  • persistent connection state
  • authentication and reconnect handling
  • fanout to many subscribers
  • backpressure from slow consumers
  • cross-node coordination if you scale horizontally

Connection count matters. Fanout shape matters more.

Ten thousand mostly idle connections are a different problem from ten thousand subscribers receiving frequent broadcasts.
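
To put rough numbers on that difference, here is a back-of-the-envelope sketch. The function names, broadcast rates, and the 200-byte payload are made-up assumptions for illustration, not measurements from a real system.

```python
def deliveries_per_second(subscribers: int, broadcasts_per_second: float) -> float:
    """Outbound socket writes per second for one topic: one write per subscriber per broadcast."""
    return subscribers * broadcasts_per_second


def outbound_bytes_per_second(
    subscribers: int, broadcasts_per_second: float, payload_bytes: int
) -> float:
    """Approximate payload bandwidth, ignoring framing and TLS overhead."""
    return deliveries_per_second(subscribers, broadcasts_per_second) * payload_bytes


# Two hypothetical workloads with the same connection count.
idle = deliveries_per_second(subscribers=10_000, broadcasts_per_second=1 / 60)
busy = deliveries_per_second(subscribers=10_000, broadcasts_per_second=20)

print(f"mostly idle:    {idle:.0f} writes/s")
print(f"busy broadcast: {busy:.0f} writes/s")
print(f"busy bandwidth: {outbound_bytes_per_second(10_000, 20, 200) / 1e6:.0f} MB/s")
```

Under these assumptions the connection count is identical, yet the busy case performs roughly a thousand times more socket writes per second. That ratio, not the socket count, is what the server has to be sized for.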

Why Node Often Struggles

Node can absolutely power WebSocket systems.

The pressure starts when the application does too much work per message on the same event loop that is responsible for keeping sockets healthy. Broadcast loops, serialization, and per-client bookkeeping all run on that one loop, so while they run, every other socket waits. Once that work piles up, latency rises for all connections, not only the ones being written to.
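
One common source of avoidable per-message work is serializing the same payload once per client. The sketch below is written in Python rather than Node to stay runtime-neutral, and `FakeSocket` is a hypothetical stand-in for a real connection object. The idea carries over to any single-threaded loop: encode once, then reuse the frame.

```python
import json


class FakeSocket:
    """Hypothetical stand-in for a WebSocket; it only records what would be sent."""

    def __init__(self):
        self.sent = []

    def send(self, frame: str) -> None:
        self.sent.append(frame)


def broadcast_naive(sockets, event) -> int:
    """Encodes the same event once per client. Returns the number of encodes."""
    encodes = 0
    for sock in sockets:
        frame = json.dumps(event)  # repeated work on the hot path
        encodes += 1
        sock.send(frame)
    return encodes


def broadcast_encode_once(sockets, event) -> int:
    """Encodes once, then writes the same frame to every client."""
    frame = json.dumps(event)
    for sock in sockets:
        sock.send(frame)
    return 1


sockets = [FakeSocket() for _ in range(1_000)]
event = {"type": "price", "symbol": "ABC", "value": 101.5}

print(broadcast_naive(sockets, event), "encodes for the naive loop")
print(broadcast_encode_once(sockets, event), "encode when the frame is reused")
```

Both functions deliver identical frames. The second still loops over every socket, so it does not remove fanout cost, but it takes serialization out of the per-client path.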

That is not an indictment of Node. It is a reminder that runtime model and workload shape must match.

Why Elixir/Phoenix Gets Mentioned So Often

Elixir and the BEAM ecosystem are popular in this space because they are designed around large numbers of lightweight processes and message passing. That can make connection-oriented systems simpler to reason about operationally.

The real advantage is not "Elixir can do a million sockets" as a slogan. It is that supervision, isolation, and concurrency are part of the model rather than something bolted on around it.
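
As a loose analogy only, the sketch below uses Python's `asyncio` to show the shape of "one lightweight unit per connection, each with its own mailbox". It is not how the BEAM works: asyncio tasks are cooperatively scheduled on one thread and have no supervision trees. The `connection` and `main` names and the `"crash"` message are assumptions made for the example.

```python
import asyncio


async def connection(name, mailbox, log):
    """One task per connection. It only reads its own mailbox."""
    while True:
        msg = await mailbox.get()
        if msg is None:  # hypothetical shutdown signal
            return
        if msg == "crash":
            raise RuntimeError(f"{name} failed")
        log.append((name, msg))


async def main():
    log = []
    mailboxes = {name: asyncio.Queue() for name in ("a", "b")}
    tasks = [
        asyncio.create_task(connection(name, box, log))
        for name, box in mailboxes.items()
    ]

    mailboxes["a"].put_nowait("crash")  # connection "a" fails...
    mailboxes["b"].put_nowait("hello")  # ...while "b" keeps handling messages
    mailboxes["b"].put_nowait(None)

    # return_exceptions=True keeps one failure from cancelling the others.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return log, results


log, results = asyncio.run(main())
print(log)
print([type(r).__name__ for r in results])
```

Here the failure of one connection is contained and the other carries on. In this sketch that containment has to be arranged by hand, which is the contrast with a runtime where isolation and restart policies are part of the model.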

The Design Rule That Matters

If you need large-scale realtime delivery, focus on:

  • keeping per-connection state small
  • partitioning rooms or topics cleanly
  • applying backpressure to slow consumers
  • measuring fanout cost per message
  • choosing a runtime that fits the workload
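
Several of these points can be sketched together. The following is a minimal in-memory illustration, assuming a hypothetical `Hub` and `Client` and a drop-oldest policy for slow consumers. Real systems may prefer to disconnect slow clients or coalesce updates instead, and the right policy depends on the data being sent.

```python
from collections import defaultdict, deque


class Client:
    """Hypothetical connection wrapper with a small, bounded outbound queue."""

    def __init__(self, name: str, max_pending: int = 3):
        self.name = name
        self.pending = deque(maxlen=max_pending)  # fixed upper bound on per-connection state
        self.dropped = 0

    def enqueue(self, frame: str) -> None:
        # Backpressure policy (an assumption): when the queue is full,
        # drop the oldest frame rather than let memory grow without bound.
        if len(self.pending) == self.pending.maxlen:
            self.dropped += 1
        self.pending.append(frame)  # deque(maxlen=...) evicts the oldest item


class Hub:
    """Partitions subscribers by topic so a publish only touches one partition."""

    def __init__(self):
        self.topics = defaultdict(set)

    def subscribe(self, topic: str, client: Client) -> None:
        self.topics[topic].add(client)

    def publish(self, topic: str, frame: str) -> int:
        subscribers = self.topics.get(topic, ())
        for client in subscribers:
            client.enqueue(frame)
        return len(subscribers)  # fanout size, useful as a metric


hub = Hub()
fast, slow = Client("fast"), Client("slow")
hub.subscribe("room:1", fast)
hub.subscribe("room:1", slow)
hub.subscribe("room:2", fast)

for i in range(5):
    hub.publish("room:1", f"tick {i}")
    fast.pending.clear()  # simulate a client that drains its queue promptly

print("slow pending:", list(slow.pending), "dropped:", slow.dropped)
print("fast dropped:", fast.dropped)
```

The slow client ends up holding only the most recent frames and a drop counter, while the fast client is unaffected. The return value of `publish` gives a cheap place to start measuring fanout per message.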

Sometimes Node is good enough. Sometimes Go or Elixir is a better fit.

The serious engineering question is not which language wins internet arguments. It is which runtime makes the failure modes manageable for your traffic pattern.

Further Reading