08 June 2025·11 min

From Bench to Browser: Streaming 1 kHz Sensor Data Without Dropping a Sample

How to push high-rate sensor frames from a C++ binary through a Python FastAPI bridge to a browser, without back-pressure killing you. Lessons from the radar pipeline, generalised.

After shipping the CFAR radar pipeline last month, the question I kept getting from other students was the same one: how do you actually get data out of a fast C++ process and onto a screen, in real time, without the browser eventually freezing? This post is the answer I wish I had when I started.

The naive version of this is two lines of code. The version that survives a 1 kHz sample rate for hours without a memory leak takes a bit more thinking. I'll walk through the version that ships.

The shape of the problem

You have a producer that emits frames much faster than a browser can render them. The browser can comfortably composite at 60 Hz; your sensor is producing 1,000 frames per second; your network can carry them but your tab cannot draw them. Somewhere in the middle, something has to give.

There are three places back-pressure can build: the C++ process's stdout buffer, the Python bridge's async queue, and the WebSocket send buffer. If any of them grows without bound you have a leak. If any of them drops the wrong frame you have a glitch. The job is to drop the right frames in the right place.

Layer 1: the C++ side

The producer writes newline-delimited JSON to stdout. The single most important line in the entire program is `std::cout.flush()` after every frame. Without it, output to a pipe is fully buffered inside the C++ runtime (typically a few kilobytes, depending on the implementation), and nothing reaches the pipe until that buffer fills. The Python reader sits blocked waiting for a newline that has been written but not delivered, and you spend an hour adding logging to a system that is working perfectly except for one missing flush.

for (;;) {
    auto frame = pipeline.process_one();
    serialise_json(std::cout, frame);
    std::cout << '\n';
    std::cout.flush();          // <- non-negotiable
}

I deliberately do not buffer in C++. Buffering at the producer is the wrong place to do it: the buffer is invisible to the consumer, which means the consumer cannot make an informed drop decision.

Layer 2: the asyncio bridge

FastAPI spawns the C++ binary as a subprocess and reads its stdout line by line. Each connected WebSocket client gets its own bounded asyncio.Queue with maxsize=3. Three is not magic; it is enough to absorb a single GC pause on the client without dropping, and small enough that a wedged client never accumulates seconds of stale data.

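import asyncio

class Hub:
    def __init__(self):
        self._subs: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue(maxsize=3)
        self._subs.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        # Call on disconnect, or dead clients' queues stay in the set forever.
        self._subs.discard(q)

    async def broadcast(self, frame: dict):
        # put_nowait cannot raise here: a full queue has its oldest frame evicted first.
        for q in list(self._subs):
            if q.full():
                _ = q.get_nowait()  # drop oldest
            q.put_nowait(frame)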

Drop the oldest, not the newest. A stale frame is always less useful than the current one for live visualisation; if you drop the newest you guarantee the screen lags by the queue depth forever.
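The reading half of the bridge is not shown above. Here is a minimal sketch of it, assuming the producer is any command that prints one JSON object per line. `pump`, `on_frame` and `FAKE_PRODUCER` are names invented for this example; in the real bridge `on_frame` would presumably hand each frame to `Hub.broadcast`.

```python
import asyncio
import json
import sys
from typing import Callable, List


async def pump(cmd: List[str], on_frame: Callable[[dict], None]) -> int:
    """Spawn cmd, parse each stdout line as JSON, hand it to on_frame.

    Hypothetical helper: returns the child's exit code once stdout closes.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        limit=1 << 20,  # per-line buffer limit; the default is 64 KiB
    )
    assert proc.stdout is not None
    async for raw in proc.stdout:  # yields one line (bytes) at a time
        raw = raw.strip()
        if raw:
            on_frame(json.loads(raw))
    return await proc.wait()


# Stand-in producer for trying the sketch without the C++ binary:
# a Python one-liner that prints five frames, flushing after each.
FAKE_PRODUCER = [
    sys.executable, "-c",
    "import json\n"
    "for i in range(5):\n"
    "    print(json.dumps({'seq': i}), flush=True)\n",
]
```

Running `asyncio.run(pump(FAKE_PRODUCER, print))` prints the five frames as they arrive. Remove `flush=True` from the stand-in and they all arrive together when the child exits, which is the missing-flush symptom from Layer 1 in miniature.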

Layer 3: the WebSocket

Browsers expose a `bufferedAmount` property on a WebSocket, but it only counts bytes the browser itself has queued for sending, so it says nothing about the server-to-client direction. I do not rely on it. Instead the server caps its send rate at 30 Hz per client: the per-client coroutine pulls frames from the queue as they arrive and skips any that land less than 1/30 s after the last send. The producer still runs at full speed.

import asyncio
import json
import time

from fastapi import WebSocket

async def client_loop(ws: WebSocket, q: asyncio.Queue):
    last_sent = 0.0
    while True:
        frame = await q.get()
        now = time.monotonic()
        if now - last_sent < 1/30:
            continue  # too soon after the last send: discard this frame
        await ws.send_text(json.dumps(frame, separators=(',', ':')))
        last_sent = now
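One variant worth knowing, sketched here rather than taken from the pipeline: instead of skipping frames by timestamp, drain the queue down to its newest entry just before each send. `get_latest` and `demo` are hypothetical names.

```python
import asyncio
from typing import Any


async def get_latest(q: "asyncio.Queue[Any]") -> Any:
    """Wait for at least one frame, then discard all but the newest queued one."""
    frame = await q.get()
    while True:
        try:
            frame = q.get_nowait()  # a newer frame is waiting: prefer it
        except asyncio.QueueEmpty:
            return frame


async def demo() -> tuple:
    """Queue three frames, then show that only the newest comes back."""
    q: "asyncio.Queue[int]" = asyncio.Queue(maxsize=3)
    for seq in (1, 2, 3):
        q.put_nowait(seq)
    newest = await get_latest(q)
    return newest, q.empty()
```

Paired with an `await asyncio.sleep(1/30)` after each send, this would keep the rate capped while making sure that whatever goes out is the most recent frame the queue held at that moment.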

Frame coalescing

If two frames arrive between sends, you have two reasonable strategies. The first is to send only the latest (this is what I do for raw sensor frames). The second is to merge them, useful when frames are deltas rather than snapshots.

Snapshots (full radar profiles, full pose matrices): keep the newest.

Deltas (events, detections, telemetry counters): merge into a single batched frame.

Mixed payloads: split the channel — one WebSocket per semantic stream, each tuned independently.
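A sketch of the first two strategies, assuming a made-up frame shape: snapshots are opaque dicts, and delta frames carry an `events` list plus a `counters` dict of increments. Neither schema comes from the radar pipeline.

```python
from collections import Counter
from typing import List


def coalesce_snapshots(frames: List[dict]) -> dict:
    """Snapshots supersede each other, so only the newest matters."""
    return frames[-1]


def coalesce_deltas(frames: List[dict]) -> dict:
    """Merge delta frames into one batch: concatenate events, sum counters."""
    events: list = []
    counters: Counter = Counter()
    for f in frames:
        events.extend(f.get("events", []))
        counters.update(f.get("counters", {}))  # adds counts, does not replace
    return {"events": events, "counters": dict(counters)}
```

The important property of the delta merge is that nothing is lost: the client sees fewer messages, but every event and every increment still arrives.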

Binary vs JSON

I stayed on JSON for this pipeline. Each frame is ~8 KB; at 30 Hz that is 240 KB/s, well under what any browser will struggle with. The debuggability win (open DevTools, see the frame, paste into a notebook) is worth the 3x size penalty over a Float32Array binary frame. Switch to binary when bandwidth, not engineering time, is the bottleneck.
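To check the trade-off for your own payload, compare the two encodings directly. This sketch assumes a frame is a flat list of floats, which is a simplification, and the sample count and values are invented.

```python
import json
import struct
from typing import List


def json_size(samples: List[float]) -> int:
    """Bytes on the wire for compact JSON text."""
    return len(json.dumps(samples, separators=(",", ":")).encode("utf-8"))


def float32_size(samples: List[float]) -> int:
    """Bytes for the same samples packed as little-endian float32."""
    return len(struct.pack(f"<{len(samples)}f", *samples))


def bandwidth_kb_per_s(frame_bytes: int, send_hz: float) -> float:
    """Steady-state bandwidth for one client, in KB/s (1 KB = 1000 bytes)."""
    return frame_bytes * send_hz / 1000.0


samples = [i * 0.001 for i in range(2048)]  # invented payload
ratio = json_size(samples) / float32_size(samples)
```

`bandwidth_kb_per_s(8000, 30)` reproduces the 240 KB/s figure above. The ratio depends heavily on how many digits each float prints with, so measure it on real frames before deciding.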

What it looks like under load

I ran a soak test: producer at 50,000 fps, three browser clients, one of them deliberately throttled to 5 fps render. Over 12 hours the server's RSS grew by 4 MB (Python interpreter warmup). The throttled client received an even 5 fps of recent frames; the other two received a steady 30 fps; no queue ever grew past its bound of 3. No leaks, no glitches, no tab crashes.

Memory growth: +4 MB across 12 hours

Dropped frames at fast client: 0

Dropped frames at slow client: 1,944,121 (expected, and the right ones)

p99 server→client latency: 12 ms over localhost
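Numbers like these need counters in the drop path. The soak-test harness itself is not shown in this post, so the following is only a sketch of one way to collect them, using a hypothetical `CountingQueue` wrapper around the same drop-oldest logic.

```python
import asyncio
from typing import Any


class CountingQueue:
    """Bounded drop-oldest queue that records drops and peak depth."""

    def __init__(self, maxsize: int = 3) -> None:
        self._q: "asyncio.Queue[Any]" = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0    # frames evicted because the queue was full
        self.max_depth = 0  # deepest the queue has been after an offer

    def offer(self, frame: Any) -> None:
        if self._q.full():
            self._q.get_nowait()  # drop oldest
            self.dropped += 1
        self._q.put_nowait(frame)
        self.max_depth = max(self.max_depth, self._q.qsize())

    async def get(self) -> Any:
        return await self._q.get()
```

Exposing `dropped` and `max_depth` per client on a debug endpoint would make it easy to tell a slow client (many drops, depth pinned at the bound) from a healthy one.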

Mistakes I made first

I started with an unbounded queue. The slow tab grew to 1.2 GB in 40 minutes. I then switched to dropping the newest, and watched the visualisation settle into a fixed lag it never recovered from. The third version (bounded queue, drop oldest) has been boring ever since, which is what you want from infrastructure.
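All three versions can be compared in a toy simulation, no network required. The model is an assumption: one frame is produced per tick, the consumer takes one frame every `period` ticks, and lag is the age in ticks of the frame it receives.

```python
from collections import deque
from typing import Tuple


def simulate(policy: str, ticks: int = 10_000, period: int = 200,
             cap: int = 3) -> Tuple[int, int]:
    """Return (final backlog, lag of the last delivered frame) for a policy.

    policy is "unbounded", "drop_newest" or "drop_oldest".
    """
    buf: deque = deque()
    last_lag = 0
    for t in range(ticks):
        # Producer: one frame per tick, tagged with its creation time.
        if policy == "unbounded" or len(buf) < cap:
            buf.append(t)
        elif policy == "drop_oldest":
            buf.popleft()
            buf.append(t)
        # drop_newest: the buffer is full, so the new frame is discarded.

        # Consumer: takes the oldest buffered frame every `period` ticks.
        if t % period == period - 1 and buf:
            last_lag = t - buf.popleft()
    return len(buf), last_lag
```

With the defaults, the unbounded run ends with a backlog of 9,950 frames and climbing, drop-newest settles at a lag of 599 ticks (roughly `cap * period`), and drop-oldest holds at a lag of 2.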

High-rate streaming is not a throughput problem; it is a drop-policy problem. The right place to drop a frame is the place that knows which frame is stale.
Tags: WebSockets, Python, Real-time, FastAPI
