From Bench to Browser: Streaming 1 kHz Sensor Data Without Dropping a Sample
How to push high-rate sensor frames from a C++ binary through a Python FastAPI bridge to a browser, without back-pressure killing you. Lessons from the radar pipeline, generalised.
After shipping the CFAR radar pipeline last month, the question I kept getting from other students was the same one: how do you actually get data out of a fast C++ process and onto a screen, in real time, without the browser eventually freezing? This post is the answer I wish I had when I started.
The naive version of this is two lines of code. The version that survives a 1 kHz sample rate for hours without a memory leak takes a bit more thinking. I'll walk through the version that ships.
The shape of the problem
You have a producer that emits frames much faster than a browser can render them. The browser can comfortably composite at 60 Hz; your sensor is producing 1,000 frames per second; your network can carry them but your tab cannot draw them. Somewhere in the middle, something has to give.
There are three places back-pressure can build: the C++ process's stdout buffer, the Python bridge's async queue, and the WebSocket send buffer. If any of them grows without bound you have a leak. If any of them drops the wrong frame you have a glitch. The job is to drop the right frames in the right place.
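As a back-of-envelope check on "grows without bound", treat the 60 Hz render rate as the drain rate of whichever buffer sits in front of the tab (a simplifying assumption). Any buffer whose input rate exceeds its output rate grows linearly with time:

```latex
\text{backlog}(t) = (\lambda_{\text{in}} - \lambda_{\text{out}})\,t
  = (1000 - 60)\,\text{frames/s} \times t
  = 940\,t\ \text{frames}
```

After one minute that is 56,400 frames queued behind the screen. A bounded buffer caps the backlog at its depth no matter how large t gets.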
Layer 1: the C++ side
The producer writes newline-delimited JSON to stdout. The single most important line in the entire program is `std::cout.flush()` after every frame. When stdout is a pipe rather than a terminal, the C++ runtime fully buffers it, so finished frames sit in a userspace buffer inside the producer instead of reaching the pipe. The Python reader waits for a newline that is still sitting in that buffer, and you spend an hour adding logging to a system that is working perfectly except for one missing flush.
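```cpp
#include <iostream>

// Inside the producer's main(). `pipeline` and `serialise_json` belong to the radar code.
for (;;) {
    auto frame = pipeline.process_one();
    serialise_json(std::cout, frame);  // one JSON object per frame
    std::cout << '\n';                 // the newline marks the end of the frame
    std::cout.flush();                 // <- non-negotiable
}
```

I deliberately do not queue frames in C++. The producer is the wrong place to buffer: the buffer is invisible to the consumer, so the consumer cannot make an informed drop decision.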
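On the receiving end, the newline is the only frame boundary, and a raw read from a pipe can return half a frame or several frames at once. A minimal sketch of reassembly, assuming UTF-8 NDJSON with one object per line; `NdjsonSplitter` is a hypothetical helper, not part of any library:

```python
import json


class NdjsonSplitter:
    """Reassembles newline-delimited JSON frames from arbitrary byte chunks."""

    def __init__(self) -> None:
        self._pending = b""  # bytes received after the last newline

    def feed(self, chunk: bytes) -> list[dict]:
        data = self._pending + chunk
        *complete, self._pending = data.split(b"\n")
        # Everything before the last newline is a whole frame; skip blank lines.
        return [json.loads(line) for line in complete if line.strip()]


splitter = NdjsonSplitter()
print(splitter.feed(b'{"seq": 1}\n{"seq"'))  # [{'seq': 1}]
print(splitter.feed(b': 2}\n'))              # [{'seq': 2}]
```

Nothing comes out of `feed` until a newline arrives, which is exactly why an unflushed producer looks dead from the Python side.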
Layer 2: the asyncio bridge
The FastAPI app spawns the C++ binary as a subprocess and reads its stdout line by line. Each connected WebSocket client gets its own bounded `asyncio.Queue` with `maxsize=3`. Three is not magic; it is enough to absorb a single GC pause on the client without dropping, and small enough that a wedged client never accumulates seconds of stale data.
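```python
import asyncio


class Hub:
    def __init__(self):
        self._subs: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue(maxsize=3)  # one bounded queue per client
        self._subs.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        # Call on disconnect, or the set keeps dead queues alive forever.
        self._subs.discard(q)

    async def broadcast(self, frame: dict):
        for q in list(self._subs):
            if q.full():
                q.get_nowait()  # drop the oldest frame to make room
            q.put_nowait(frame)  # cannot raise: a slot was just freed
```

Drop the oldest, not the newest. A stale frame is always less useful than the current one for live visualisation; if you drop the newest you guarantee the screen lags by the queue depth forever.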
Layer 3: the WebSocket
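Browsers expose a `bufferedAmount` property on a WebSocket. I don't rely on it: it only counts bytes the browser itself has queued to send, so it says nothing about the server-to-client direction. Instead the server caps its send rate at 30 Hz per client by comparing a monotonic clock against the time of the last send and skipping any frame that arrives too early. The producer still runs at full speed; the per-client coroutine simply pulls the latest frame from the queue and writes it.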
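```python
import asyncio
import json
import time

from fastapi import WebSocket


async def client_loop(ws: WebSocket, q: asyncio.Queue):
    last_sent = 0.0
    while True:
        frame = await q.get()
        while not q.empty():  # coalesce: jump to the newest queued frame
            frame = q.get_nowait()
        now = time.monotonic()
        if now - last_sent < 1 / 30:  # per-client cap of 30 Hz
            continue  # too soon after the last send; drop this frame
        await ws.send_text(json.dumps(frame, separators=(",", ":")))
        last_sent = now
```

Frame coalescing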
If two frames arrive between sends, you have two reasonable strategies. The first is to send only the latest (this is what I do for raw sensor frames). The second is to merge them, useful when frames are deltas rather than snapshots.
Snapshots (full radar profiles, full pose matrices): keep the newest.
Deltas (events, detections, telemetry counters): merge into a single batched frame.
Mixed payloads: split the channel — one WebSocket per semantic stream, each tuned independently.
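For deltas, "merge" depends on what the deltas mean. A minimal sketch, assuming a hypothetical delta shape in which each frame carries a list of `events` and a dict of `counters` increments. Events are concatenated in arrival order and counters are summed, so one batched frame carries the same information as the frames it replaces.

```python
from collections import Counter


def merge_deltas(frames: list[dict]) -> dict:
    """Coalesce delta frames into one batched frame (hypothetical schema)."""
    events: list = []
    counters: Counter = Counter()
    for f in frames:
        events.extend(f.get("events", []))      # keep every event, in order
        counters.update(f.get("counters", {}))  # increments add up
    return {"events": events, "counters": dict(counters)}


batch = merge_deltas([
    {"events": ["det:a"], "counters": {"hits": 1}},
    {"events": ["det:b"], "counters": {"hits": 2, "misses": 1}},
])
print(batch)  # {'events': ['det:a', 'det:b'], 'counters': {'hits': 3, 'misses': 1}}
```

Snapshots need no helper: coalescing them is just keeping `frames[-1]`.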
Binary vs JSON
I stayed on JSON for this pipeline. Each frame is ~8 KB; at 30 Hz that is 240 KB/s per client, well below anything a browser will struggle with. The debuggability win (open DevTools, see the frame, paste it into a notebook) is worth the 3x size penalty over a Float32Array binary frame. Switch to binary when bandwidth, not engineering time, is the bottleneck.
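To see the size gap on your own data, encode the same samples both ways. A sketch, assuming a frame is a flat list of floats (a hypothetical 1,024-bin profile here) and that the browser decodes binary messages with `new Float32Array(event.data)` after setting `binaryType = "arraybuffer"`. Typed arrays use the platform's byte order, which is little-endian on essentially every browser platform, so the packing below is explicitly little-endian.

```python
import json
import math
import struct


def encode_json(samples: list[float]) -> bytes:
    return json.dumps(samples, separators=(",", ":")).encode()


def encode_f32(samples: list[float]) -> bytes:
    # 4 bytes per sample, little-endian float32: the layout Float32Array reads.
    return struct.pack(f"<{len(samples)}f", *samples)


profile = [math.sin(i / 10) * 40.0 for i in range(1024)]  # hypothetical frame
j, b = encode_json(profile), encode_f32(profile)
print(len(j), len(b), round(len(j) / len(b), 1))
```

The ratio depends heavily on how many digits each value serialises to, so measure with real frames before deciding the switch is worth it.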
What it looks like under load
I ran a soak test: producer at 50,000 fps, three browser clients, one of them deliberately throttled to 5 fps render. Over 12 hours the server's RSS grew by 4 MB (Python interpreter warmup). The throttled client received an even 5 fps of recent frames; the other two received a steady 30 fps; no queue ever exceeded depth 2. No leaks, no glitches, no tab crashes.
Memory growth: +4 MB across 12 hours
Dropped frames at fast client: 0
Dropped frames at slow client: 1,944,121 (expected, and the right ones)
p99 server→client latency: 12 ms over localhost
Mistakes I made first
I started with an unbounded queue. The slow tab grew to 1.2 GB in 40 minutes. I then switched to dropping the newest, and watched the visualisation sit a constant queue's worth of frames behind live data, permanently. The third version (bounded queue, drop oldest) has been boring ever since, which is what you want from infrastructure.
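The three versions differ only in what happens once a slow consumer falls behind. A toy simulation makes that concrete; it is a sketch, not the soak test, assuming a producer that emits numbered frames ten times faster than the consumer reads and a queue depth of 3. `simulate` and the policy names are illustrative.

```python
from collections import deque


def simulate(policy: str, ticks: int = 100, ratio: int = 10, depth: int = 3):
    """Return (max queue length, staleness in frames of the last frame read)."""
    q: deque[int] = deque()
    seq = 0
    max_len = 0
    staleness = 0
    for _ in range(ticks):
        for _ in range(ratio):  # producer: `ratio` frames per consumer read
            if policy != "unbounded" and len(q) == depth:
                if policy == "drop_newest":
                    seq += 1
                    continue  # discard the frame just produced
                q.popleft()  # drop_oldest: evict the stalest queued frame
            q.append(seq)
            seq += 1
        max_len = max(max_len, len(q))
        got = q.popleft()  # consumer reads one frame
        staleness = (seq - 1) - got  # distance from the newest frame produced
    return max_len, staleness


for policy in ("unbounded", "drop_newest", "drop_oldest"):
    print(policy, simulate(policy))
# unbounded (901, 900)
# drop_newest (3, 29)
# drop_oldest (3, 2)
```

Unbounded grows with run time in both memory and lag. Drop-newest is bounded in memory but stays a fixed distance behind live. Drop-oldest is bounded in both and never shows anything older than the queue depth.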
High-rate streaming is not a throughput problem; it is a drop-policy problem. The right place to drop a frame is the place that knows which frame is stale.