May 11, 2026  ·  14 min  ·  Govind Mehta

WebRTC at Scale: Architecting SFUs and MCUs for Multi-User Apps


The Mesh Problem

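Peer-to-peer (Mesh) WebRTC is great for 2 people. But as you add more users, every peer has to send its video to, and receive video from, each of the other n-1 participants. Per-peer bandwidth and CPU therefore grow linearly with the size of the call, while the total number of streams grows quadratically (n * (n-1)). A 10-person call in Mesh mode would require every user to upload 9 video streams and download 9 more, which overwhelms most mobile devices. To scale, we need a Media Server.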
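To make the arithmetic concrete, here is a small sketch that counts streams in a full-mesh call. The function name `mesh_load` and the 1,500 kbps per-stream bitrate are illustrative assumptions, not measurements from any real deployment.

```python
def mesh_load(n: int, stream_kbps: int = 1500) -> dict[str, int]:
    """Stream counts for a full-mesh call with n participants.

    stream_kbps is an assumed bitrate for one video stream.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    per_peer = n - 1  # every peer talks to every other peer
    return {
        "uploads_per_peer": per_peer,
        "downloads_per_peer": per_peer,
        "upload_kbps_per_peer": per_peer * stream_kbps,
        "total_streams": n * per_peer,         # one-way streams in the whole call
        "peer_connections": n * per_peer // 2, # each connection is shared by two peers
    }


if __name__ == "__main__":
    for size in (2, 4, 10):
        print(size, mesh_load(size))
```

Under that assumed bitrate, a 10-person mesh asks each participant for 13.5 Mbps of sustained upload, which is more than many mobile and home connections can offer.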

SFU: The Modern Gold Standard

A Selective Forwarding Unit (SFU) is the architecture of choice for apps like Zoom, Teams, and Google Meet. Instead of peers connecting to each other, they each send one stream to the server. The server then forwards that stream to everyone else without decoding or re-encoding it. Each peer's upload stays constant at one stream, regardless of the number of participants. Download still grows with the call, since each peer receives up to n-1 streams, but downlink capacity is usually far more plentiful than uplink.
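The forwarding idea can be sketched with an in-memory stand-in. `ToySfu` is a hypothetical class for illustration only: it moves byte strings between Python lists and does none of the RTP, ICE, or congestion-control work a real SFU performs.

```python
class ToySfu:
    """In-memory stand-in for an SFU: one packet in, n-1 copies out."""

    def __init__(self) -> None:
        # peer id -> list of (sender id, packet) pairs delivered to that peer
        self.inboxes: dict[str, list[tuple[str, bytes]]] = {}

    def join(self, peer_id: str) -> None:
        self.inboxes[peer_id] = []

    def publish(self, sender_id: str, packet: bytes) -> int:
        """Forward a packet unchanged to every other peer; return the copy count."""
        if sender_id not in self.inboxes:
            raise KeyError(f"unknown peer: {sender_id}")
        forwarded = 0
        for peer_id, inbox in self.inboxes.items():
            if peer_id != sender_id:
                inbox.append((sender_id, packet))  # no decode, no re-encode
                forwarded += 1
        return forwarded


if __name__ == "__main__":
    sfu = ToySfu()
    for peer in ("alice", "bob", "carol"):
        sfu.join(peer)
    copies = sfu.publish("alice", b"frame-1")
    print(copies, sfu.inboxes["bob"])
```

The sender uploads once no matter how many peers have joined; the fan-out cost moves to the server, which only copies packets.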

MCU: The Legacy Workhorse

A Multipoint Control Unit (MCU) goes a step further. It receives all streams, decodes them, mixes them into a single composite video frame, and sends that one stream back to every user. This is very cheap for the client device, which only has to decode one stream, but very expensive for the server: it must decode every incoming stream and encode the mixed output in real time, potentially once per participant.
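As a rough illustration of what mixing involves, the sketch below computes where each participant's tile would sit in a composite frame and how many decode and encode jobs the server takes on. Both function names, the 1280x720 canvas, and the near-square grid are assumptions made for this example; real MCUs use their own layouts and may share one encode across participants.

```python
import math


def grid_layout(n: int, width: int = 1280, height: int = 720) -> list[tuple[int, int, int, int]]:
    """Return (x, y, tile_width, tile_height) for n tiles on a near-square grid."""
    if n < 1:
        raise ValueError("n must be at least 1")
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    tile_w, tile_h = width // cols, height // rows
    return [((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h) for i in range(n)]


def mcu_codec_work(n: int, per_participant_layout: bool) -> dict[str, int]:
    """Count real-time codec jobs: every input is decoded, the mix is encoded."""
    return {"decodes": n, "encodes": n if per_participant_layout else 1}


if __name__ == "__main__":
    print(grid_layout(4))
    print(mcu_codec_work(10, per_participant_layout=True))
```

Compare this with the SFU, which runs zero decodes and zero encodes: the MCU's codec work scales with the call size and runs continuously for the life of the call.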

Scaling to the Thousands: Simulcast and SVC

How do you handle a user on a high-speed fiber connection and a user on a shaky 3G connection in the same call? We use Simulcast. The client encodes and sends several versions of its video at once, typically three (Low, Medium, High resolution). The SFU then selects which version to forward to each recipient based on that recipient's estimated available bandwidth. For even more efficiency, Scalable Video Coding (SVC) encodes a single stream as a base layer plus enhancement layers, so the server can drop layers for a constrained recipient without re-encoding.
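The selection step can be sketched as a small pure function. The layer names, resolutions, bitrates, and the 80% headroom factor below are all illustrative assumptions; production SFUs base this decision on live bandwidth estimates and also react to packet loss and layer availability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    rid: str           # label for the simulcast encoding
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # assumed target bitrate


# Assumed three-layer ladder, for illustration only.
LAYERS = (
    Layer("low", 180, 150),
    Layer("medium", 360, 500),
    Layer("high", 720, 1500),
)


def select_layer(available_kbps: float, layers=LAYERS, headroom: float = 0.8) -> Layer:
    """Pick the highest-bitrate layer that fits within a fraction of the estimate.

    Falls back to the lowest layer when nothing fits, so the recipient
    still gets video rather than nothing.
    """
    budget = available_kbps * headroom
    ordered = sorted(layers, key=lambda layer: layer.bitrate_kbps)
    chosen = ordered[0]
    for layer in ordered:
        if layer.bitrate_kbps <= budget:
            chosen = layer
    return chosen


if __name__ == "__main__":
    for estimate in (5000, 700, 100):
        print(estimate, select_layer(estimate).rid)
```

The same sender is forwarded at different qualities to different recipients, and the server never touches the encoded video itself. With SVC the decision looks similar, except the server drops enhancement layers from one stream instead of switching between separate streams.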

Production Tooling


Frequently Asked Questions

When should I stop using P2P and move to an SFU?

Generally, once your app needs to support more than 3-4 people in a single call, or if you need features like server-side recording or broadcasting to RTMP, you should move to an SFU architecture.

Does an SFU break end-to-end encryption?

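Technically, yes. Standard WebRTC encryption (DTLS-SRTP) protects each hop separately, so the server terminates the encryption in order to forward packets and can, in principle, see the media. However, SFrame (standardized by the IETF as RFC 9605) allows end-to-end encryption even through an SFU by encrypting the media payload separately, with keys the server never holds, while leaving the headers the server needs for routing readable.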

Can I use WebRTC for one-to-many live streaming?

Yes. While traditional streaming (HLS/DASH) typically has 10-30 seconds of latency, WebRTC-based streaming can deliver sub-second latency. This is critical for interactive auctions, betting, and live sports commentary.

Scaling is not about making one server bigger; it's about making the network smarter.