20 May 2025·12 min

I Built a Real-Time Radar Signal Processor in C++17. Here's Exactly How It Works.

An end-to-end FMCW radar pipeline (Hann window, FFTW3 range FFT, CA/GO-CFAR detection, Doppler map) that processes 50,000+ frames/second and streams to a React dashboard.

There is a specific kind of satisfaction that comes from writing a piece of software and watching it process real physics. Not mock data. Not a tutorial dataset. Actual frequency-domain mathematics that describes how radio waves reflect off objects and return to a sensor.

This post is about a project I built called cfar-radar-pipeline: a complete, end-to-end FMCW radar signal processor written in C++17, bridged to a Python FastAPI server, and displayed on a live React dashboard in the browser, all started from a single `make` command. I want to walk through every interesting decision I made, the maths behind the core algorithm, and the benchmark results I measured on real hardware.

Why radar? Why this project?

I am a second-year Computer Science student at Brunel University London, with a focus on C++, OOP, UML, and real-time systems. I had projects in Python and Java and a lock-free ring buffer in C++17, but I wanted one project that lived unmistakably in the defence-adjacent space: radar signal processing.

CFAR target detection felt right. It is the core of how radar systems, from automotive ADAS sensors to military surveillance arrays, decide whether a returned signal is a genuine target or just noise. It is also a beautiful piece of applied statistics, and it turned out to be deeply satisfying to implement from scratch.

What is FMCW radar, briefly?

FMCW stands for Frequency-Modulated Continuous Wave. Instead of pulsing radio energy, an FMCW radar continuously transmits a signal whose frequency increases linearly over time: a chirp. When that chirp reflects off an object and returns, the received signal is slightly delayed relative to the transmitted signal. Mixing the two turns that delay into a frequency difference called the beat frequency, and that beat frequency is directly proportional to the object's range.

Beat frequency = (2 × Bandwidth × Range) / (c × Chirp Duration)

By taking the FFT of a single chirp's IQ samples, you get a range profile: a spectrum in which each frequency bin corresponds to a specific range. Peaks in the spectrum are potential targets.
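To make the proportionality concrete, here is a minimal sketch of the beat-frequency formula as two small C++17 helpers. The function names are hypothetical, and the 40 µs chirp duration used in the note below is an assumed value for illustration, not the project's actual configuration.

```cpp
#include <cassert>
#include <cmath>

constexpr double kSpeedOfLight = 299'792'458.0;  // m/s

// Beat frequency (Hz) for a target at range_m, given the sweep bandwidth (Hz)
// and chirp duration (s): f_b = 2 * B * R / (c * T).
inline double beatFrequencyHz(double bandwidth_hz, double chirp_duration_s, double range_m) {
    assert(chirp_duration_s > 0.0);
    return (2.0 * bandwidth_hz * range_m) / (kSpeedOfLight * chirp_duration_s);
}

// Inverse mapping: range (m) recovered from a measured beat frequency (Hz).
inline double rangeFromBeatM(double bandwidth_hz, double chirp_duration_s, double beat_hz) {
    assert(bandwidth_hz > 0.0);
    return (beat_hz * kSpeedOfLight * chirp_duration_s) / (2.0 * bandwidth_hz);
}
```

With a 4 GHz sweep and the assumed 40 µs chirp, a target at 10 m produces a beat frequency of roughly 6.67 MHz, and doubling the range doubles the beat frequency.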

The signal processing chain

IQ Chirps → Hann Window → FFTW3 Range FFT → CA/GO-CFAR → JSON output, with a parallel Doppler FFT producing a 2-D Range-Doppler map.

Stage 1: Generating synthetic IQ

Each synthetic chirp is a sum of five complex sinusoids (one per target), each at the beat frequency corresponding to its range, shifted by the Doppler frequency corresponding to its velocity.

// Beat frequency from range R (m); Doppler shift from radial velocity v (m/s).
float beat_freq = (2.0f * bandwidth * R) / (c * chirp_duration);
float doppler   = (2.0f * v * center_freq) / c;

// Accumulate this target's complex sinusoid into the interleaved I/Q buffer.
for (uint32_t n = 0; n < n_samples; ++n) {
    float phase = 2.0f * M_PI * (beat_freq + doppler) * n / sample_rate;
    iq_samples[2*n    ] += amplitude * std::cos(phase);   // I
    iq_samples[2*n + 1] += amplitude * std::sin(phase);   // Q
}

I added amplitude scintillation (a ±25% sinusoidal variation per frame, a simple deterministic stand-in for Swerling-I fluctuation, which strictly draws a random amplitude per scan) and per-frame range drift so the targets feel like real moving objects rather than static injections. The five targets span SNRs from ~19 dB down to ~8 dB; T5 at ~8 dB is borderline, which makes it interesting for tuning the CFAR threshold.

Stage 2: Hann windowing

Windowing matters because a finite-length FFT implicitly assumes the signal is periodic. Without it, the discontinuity between the end and the start of each chirp creates spectral leakage: energy from a strong target smears across adjacent bins and can mask weaker neighbours. Hann tapers the signal smoothly to zero, dramatically reducing sidelobes. The cost is a wider main lobe and a ~6 dB reduction in peak amplitude (the window's coherent gain is 0.5). Because the noise is attenuated too, the actual SNR loss is only about 1.76 dB, and that has to be accounted for in SNR estimates.

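// Declared static inside the class; the keyword is not repeated on the out-of-class definition.
std::vector<float> FFTProcessor::hannWindow(uint32_t n) {
    std::vector<float> w(n, 1.0f);
    if (n < 2) return w;   // avoid dividing by zero when n == 1
    for (uint32_t i = 0; i < n; ++i)
        w[i] = 0.5f * (1.0f - std::cos(2.0f * static_cast<float>(M_PI) * i / (n - 1)));
    return w;
}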
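The amplitude and SNR cost of a window can be checked numerically rather than taken on trust. The following is a minimal sketch with hypothetical helper names: it rebuilds the same symmetric Hann window in double precision and computes its coherent gain (the scaling of an on-bin tone's peak) and its equivalent noise bandwidth (which sets the SNR loss).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Symmetric Hann window of length n (n >= 2).
inline std::vector<double> hann(std::size_t n) {
    assert(n >= 2);
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i)
        w[i] = 0.5 * (1.0 - std::cos(2.0 * kPi * static_cast<double>(i) /
                                     static_cast<double>(n - 1)));
    return w;
}

// Coherent gain: mean of the window, i.e. the factor applied to a tone's peak amplitude.
inline double coherentGain(const std::vector<double>& w) {
    double sum = 0.0;
    for (double v : w) sum += v;
    return sum / static_cast<double>(w.size());
}

// Equivalent noise bandwidth in bins: N * sum(w^2) / (sum(w))^2.
inline double enbwBins(const std::vector<double>& w) {
    double s1 = 0.0, s2 = 0.0;
    for (double v : w) { s1 += v; s2 += v * v; }
    return static_cast<double>(w.size()) * s2 / (s1 * s1);
}

inline double amplitudeDb(double ratio) { return 20.0 * std::log10(ratio); }
inline double powerDb(double ratio)     { return 10.0 * std::log10(ratio); }
```

For a 1024-point window, amplitudeDb(coherentGain(w)) comes out at about −6.03 dB and powerDb(enbwBins(w)) at about 1.76 dB, with an equivalent noise bandwidth of roughly 1.5 bins.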

Stage 3: FFTW3 range FFT

I used FFTW3, specifically the single-precision variant (fftwf_), which is faster than double on x86 with SSE/AVX and more than sufficient for radar processing. Two things matter: create the plan once (FFTW's planning step is expensive, because with FFTW_MEASURE it times candidate transforms on your specific hardware), and allocate input/output with fftwf_alloc_complex so the buffers are SIMD-aligned.

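FFTProcessor::FFTProcessor(uint32_t fft_size, bool apply_hann)
    : fft_size_(fft_size), apply_hann_(apply_hann)
{
    // Assumes buf_in_ and buf_out_ are fftwf_complex* members that the
    // destructor releases with fftwf_free (and plan_ with fftwf_destroy_plan).
    buf_in_  = fftwf_alloc_complex(fft_size_);   // SIMD-aligned storage
    buf_out_ = fftwf_alloc_complex(fft_size_);
    // Plan once. FFTW_MEASURE overwrites both buffers while it times
    // candidate transforms, so plan before filling them with data.
    plan_ = fftwf_plan_dft_1d(static_cast<int>(fft_size_),
                               buf_in_, buf_out_,
                               FFTW_FORWARD, FFTW_MEASURE);
    if (apply_hann_) hann_window_ = hannWindow(fft_size_);
}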

After the FFT I take the first N/2 bins (positive-frequency half) and normalise by FFT size. Each bin maps to a range: range_m = bin × c / (2 × bandwidth) = bin × 0.0375 m for a 4 GHz bandwidth.
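The post-FFT step can be sketched as follows. This is an illustration rather than the project's code: the helper names are hypothetical, and taking the magnitude (rather than the power) of each bin is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

constexpr double kSpeedOfLight = 299'792'458.0;  // m/s

// Range covered by one FFT bin: c / (2 * bandwidth).
inline double binSpacingM(double bandwidth_hz) {
    assert(bandwidth_hz > 0.0);
    return kSpeedOfLight / (2.0 * bandwidth_hz);
}

// Range at the centre of a given bin.
inline double binToRangeM(std::size_t bin, double bandwidth_hz) {
    return static_cast<double>(bin) * binSpacingM(bandwidth_hz);
}

// Keep the first N/2 bins of the spectrum and normalise each magnitude by N.
inline std::vector<float> positiveHalfMagnitude(
        const std::vector<std::complex<float>>& spectrum) {
    const std::size_t n = spectrum.size();
    std::vector<float> mag(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k)
        mag[k] = std::abs(spectrum[k]) / static_cast<float>(n);
    return mag;
}
```

With a 4 GHz bandwidth, binSpacingM returns about 0.0375 m, so bin 100 sits at roughly 3.75 m.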

Stage 4: Doppler processing

A single chirp tells you where targets are (range). A sequence of chirps tells you how fast they are moving (velocity). By collecting 64 chirps and computing a second FFT along the slow-time axis for each range bin, I get a 2-D Range-Doppler map. Each cell tells you: is there an object at this range, moving at this velocity? Targets appear as bright spots.
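The slow-time transform is easiest to see for a single range bin. The sketch below uses a naive O(M²) DFT in place of FFTW so that it stays self-contained; the data layout (range_ffts[chirp][range_bin]) and the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;
constexpr double kPi = 3.14159265358979323846;

// Naive DFT across chirps (slow time) for one range bin.
// range_ffts[m][r] is the complex range-FFT output of chirp m at range bin r.
inline std::vector<cfloat> dopplerSpectrum(const std::vector<std::vector<cfloat>>& range_ffts,
                                           std::size_t range_bin) {
    assert(!range_ffts.empty());
    const std::size_t n_chirps = range_ffts.size();
    std::vector<cfloat> out(n_chirps);
    for (std::size_t k = 0; k < n_chirps; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t m = 0; m < n_chirps; ++m) {
            assert(range_bin < range_ffts[m].size());
            const double angle = -2.0 * kPi * static_cast<double>(k * m) /
                                 static_cast<double>(n_chirps);
            acc += std::complex<double>(range_ffts[m][range_bin]) * std::polar(1.0, angle);
        }
        out[k] = cfloat(static_cast<float>(acc.real()), static_cast<float>(acc.imag()));
    }
    return out;
}

// Index of the Doppler bin with the largest power.
inline std::size_t peakDopplerBin(const std::vector<cfloat>& spectrum) {
    std::size_t best = 0;
    for (std::size_t k = 1; k < spectrum.size(); ++k)
        if (std::norm(spectrum[k]) > std::norm(spectrum[best])) best = k;
    return best;
}
```

A target whose phase advances by a fixed amount from chirp to chirp concentrates its energy in one Doppler bin; repeating this for every range bin fills in the 2-D map.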

The CA-CFAR algorithm

CFAR stands for Constant False Alarm Rate. The problem it solves: a fixed detection threshold works badly in practice. Strong clutter (ground, buildings, rain) swamps it with false alarms; a threshold set to avoid clutter misses real targets in quieter regions. CA-CFAR sets the threshold adaptively: for each cell under test (CUT), it estimates the local noise power by averaging a window of training cells on either side, skipping guard cells immediately adjacent to the CUT so the target's own energy doesn't leak into the noise estimate.

[training | guard | CUT | guard | training]
   N=16      G=4          G=4      N=16

T[i] = α × (1/N) × Σ(training cells)
α    = N × (Pfa^(-1/N) − 1)

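Here N is the total number of training cells (16 on each side, so N = 32). For Pfa = 10⁻⁴ and N = 32, α = 32 × (10^(4/32) − 1) ≈ 10.67: the threshold sits about 10.67× the local noise estimate, calibrated so that in pure noise (independent, exponentially distributed power samples) the probability of a false alarm is 10⁻⁴ per cell.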
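The threshold multiplier is a one-liner to evaluate. A minimal sketch, with a hypothetical function name, assuming square-law (power) samples whose noise is independent and exponentially distributed:

```cpp
#include <cassert>
#include <cmath>

// CA-CFAR threshold multiplier: alpha = N * (Pfa^(-1/N) - 1),
// where N is the total number of training cells (both sides combined).
inline double caCfarAlpha(double pfa, unsigned n_training) {
    assert(pfa > 0.0 && pfa < 1.0 && n_training > 0);
    return n_training * (std::pow(pfa, -1.0 / n_training) - 1.0);
}
```

caCfarAlpha(1e-4, 32) evaluates to about 10.67. As N grows, the value falls towards ln(1/Pfa) ≈ 9.21, the multiplier you would use if the noise power were known exactly; the gap is the price of estimating it from a finite window.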

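// half is assumed to be guard_cells + training_cells, the number of cells needed
// on each side of the CUT. N here is the profile length, not the training count.
for (uint32_t i = half; i < N - half; ++i) {
    float left_sum = 0.0f, right_sum = 0.0f;
    for (uint32_t j = 1; j <= training_cells; ++j) {
        left_sum  += profile[i - guard_cells - j];   // training cells below the CUT
        right_sum += profile[i + guard_cells + j];   // training cells above the CUT
    }
    // CA averages both sides together; GO takes the larger of the two side means.
    float noise_est = (variant == CA)
        ? (left_sum + right_sum) / (2.0f * training_cells)
        : std::max(left_sum, right_sum) / training_cells;  // GO-CFAR
    threshold[i] = alpha_ * noise_est;
}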

After thresholding, contiguous detections are cluster-picked: I find runs of bins where profile[i] > threshold[i] and take the argmax of each run, so each physical target produces exactly one detection. I implemented GO-CFAR (Greatest-Of) too: it takes the larger of the two side means, which is more conservative at clutter boundaries at the cost of a slight sensitivity loss in homogeneous noise. The variant is switchable at runtime with --variant CA or --variant GO.
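Cluster-picking as described here can be sketched in a single pass. The function name is hypothetical, and this is one straightforward way to do it rather than necessarily the project's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For every run of consecutive bins with profile[i] > threshold[i],
// keep only the index of the largest bin in that run.
inline std::vector<std::size_t> clusterPeaks(const std::vector<float>& profile,
                                             const std::vector<float>& threshold) {
    assert(profile.size() == threshold.size());
    std::vector<std::size_t> peaks;
    bool in_run = false;
    std::size_t best = 0;
    for (std::size_t i = 0; i < profile.size(); ++i) {
        if (profile[i] > threshold[i]) {
            // Start a new run, or update the running argmax of the current one.
            if (!in_run || profile[i] > profile[best]) best = i;
            in_run = true;
        } else if (in_run) {
            peaks.push_back(best);   // run ended on the previous bin
            in_run = false;
        }
    }
    if (in_run) peaks.push_back(best);   // run that reaches the last bin
    return peaks;
}
```

A target whose main lobe crosses the threshold in three adjacent bins therefore yields one detection at its strongest bin instead of three.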

The streaming architecture

The C++ binary streams processed frames to stdout at 25 fps as newline-delimited JSON, one ~8 KB object per line. The Python FastAPI server spawns the binary as a subprocess and reads its stdout asynchronously, broadcasting each frame to all WebSocket clients. Each client gets its own asyncio.Queue with maxsize=3; if a client falls behind, its oldest frames are dropped rather than building up a backlog that would grow without bound.

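async def _reader(self):
    while True:
        line = await self._proc.stdout.readline()
        if not line:                 # EOF: the C++ process has exited
            break
        frame = json.loads(line.decode())
        for q in self._subscribers:
            if q.full():
                q.get_nowait()       # slow client: drop its oldest frame
            q.put_nowait(frame)      # never blocks, so one client cannot stall the rest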

Critical detail: std::cout.flush() after every frame. When stdout is a pipe rather than a terminal, the C++ runtime typically block-buffers it, so without the flush, frames sit in the process's buffer and Python's readline() stalls until that buffer happens to fill, delivering frames late and in bursts. The Python server also serves the React frontend as static files, so the entire stack (C++ binary, Python bridge, browser dashboard) is accessible at http://localhost:8000 from a single make run-server.
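On the C++ side, the write-then-flush pattern looks roughly like this. It is a minimal sketch: the JSON keys "frame" and "profile" are hypothetical placeholders, not the schema the dashboard actually consumes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Write one frame as a single line of JSON, then flush so a reader on the
// other end of a pipe sees it immediately.
inline void writeFrame(std::ostream& out, unsigned frame_id,
                       const std::vector<float>& profile) {
    out << "{\"frame\":" << frame_id << ",\"profile\":[";
    for (std::size_t i = 0; i < profile.size(); ++i) {
        assert(std::isfinite(profile[i]));   // NaN/inf would not be valid JSON
        if (i != 0) out << ',';
        out << profile[i];
    }
    out << "]}\n";    // newline terminates the record
    out.flush();      // push the line through the pipe now
}
```

In the binary this would be called as writeFrame(std::cout, id, profile) once per frame; taking a std::ostream& also lets a unit test capture the output in a std::ostringstream.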

The dashboard

The frontend is deliberately zero-build: plain HTML + CDN React + Babel, no npm, no webpack, no Docker. The interesting part is the range profile chart: rather than using a charting library (which would batch updates), I render the profile as a raw SVG path and animate it frame-by-frame with Framer Motion's pathLength. The CFAR threshold draws with a 0.3s delay; detected targets spring in with a staggered bounce. The Doppler heatmap is a 512×64 Canvas with red crosshairs marking detections.

Benchmark results

10,000 iterations on an Intel i7 laptop running Windows 11, MinGW GCC 9.2, FFT size 1024, 64 chirps per frame, CA-CFAR with guard=4 training=16:

Mean latency: 19.87 µs / frame

Median (p50): 18.41 µs

p99 latency: 34.12 µs

Min / Max: 15.23 µs / 52.67 µs

Throughput: 50,331 frames / second

The pipeline runs a full 1024-bin range FFT + CA-CFAR + 64-chirp Doppler FFT in under 20 µs on average. The WebSocket target is 25 fps (one frame every 40,000 µs), so the C++ binary is about two thousand times faster than the streaming rate, leaving enormous headroom for MIMO channel separation or track-before-detect. The p99 tail at 1.7× the mean is healthy, without the multi-millisecond spikes a mutex-locked equivalent shows (~2.3× the mean under contention).
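For readers who want to reproduce this kind of table, here is a minimal sketch of how such statistics can be gathered. The names are hypothetical, the percentile uses the nearest-rank definition, and the actual benchmark harness may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct LatencyStats {
    double mean_us, p50_us, p99_us, min_us, max_us, frames_per_second;
};

// Wall-clock time of one call to f, in microseconds.
template <typename F>
double timeOnceUs(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}

// Nearest-rank percentile of an ascending-sorted, non-empty sample set, q in (0, 1].
inline double percentile(const std::vector<double>& sorted, double q) {
    const auto rank = static_cast<std::size_t>(
        std::ceil(q * static_cast<double>(sorted.size())));
    return sorted[std::max<std::size_t>(rank, 1) - 1];
}

// Summarise per-frame latencies; throughput is the reciprocal of the mean.
inline LatencyStats summarise(std::vector<double> samples_us) {
    assert(!samples_us.empty());
    std::sort(samples_us.begin(), samples_us.end());
    const double mean = std::accumulate(samples_us.begin(), samples_us.end(), 0.0) /
                        static_cast<double>(samples_us.size());
    return {mean, percentile(samples_us, 0.50), percentile(samples_us, 0.99),
            samples_us.front(), samples_us.back(), 1.0e6 / mean};
}
```

A typical loop pushes timeOnceUs([&] { processFrame(); }) into a vector 10,000 times (processFrame being whatever is under test) and passes the vector to summarise. Reporting throughput as 1e6 / mean matches the table above: 1e6 / 19.87 µs ≈ 50,300 frames per second.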

Testing

AlphaComputation: verifies α ≈ 10.67 for Pfa = 10⁻⁴, N = 32

SingleTargetDetected: injects one spike into noise, verifies detection

FalseAlarmRate: 1,000 noise-only frames, FAR ≤ 2 × Pfa

MultipleTargets: 3 targets at known bins, all detected

GOCFARMoreConservative: GO detection count ≤ CA count on same profile

JSONSerialisation: output contains every key the frontend expects

The false alarm rate test was the one I was most careful about. 1,000 frames × 512 bins = 512,000 cells under test with the threshold set for Pfa = 10⁻⁴, so I expect ~51 false alarms. The test passes if the count is ≤ 102, allowing for statistical variance. In practice it consistently lands between 30 and 70.
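A Monte Carlo check of this kind can be sketched in a few lines. The version below is an illustration under stated assumptions, not the project's test: it draws exponentially distributed power samples, counts raw threshold crossings before any cluster-picking, and skips the edge bins that lack a full training window, so it tests 472 cells per 512-bin frame rather than all 512.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Count CA-CFAR threshold crossings over noise-only frames.
// guard and train are per-side cell counts; seed makes the run repeatable.
inline std::size_t countFalseAlarms(std::size_t frames, std::size_t bins,
                                    std::size_t guard, std::size_t train,
                                    double pfa, unsigned seed) {
    const std::size_t n_total = 2 * train;                        // total training cells
    const double alpha = n_total * (std::pow(pfa, -1.0 / n_total) - 1.0);
    const std::size_t half = guard + train;                       // cells needed per side
    assert(bins > 2 * half);
    std::mt19937 rng(seed);
    std::exponential_distribution<double> noise(1.0);             // unit-mean noise power
    std::vector<double> profile(bins);
    std::size_t alarms = 0;
    for (std::size_t f = 0; f < frames; ++f) {
        for (double& p : profile) p = noise(rng);
        for (std::size_t i = half; i < bins - half; ++i) {
            double sum = 0.0;
            for (std::size_t j = 1; j <= train; ++j)
                sum += profile[i - guard - j] + profile[i + guard + j];
            if (profile[i] > alpha * sum / n_total) ++alarms;
        }
    }
    return alarms;
}
```

With 1,000 frames, 512 bins, guard=4 and training=16, the expected count is 472,000 × 10⁻⁴ ≈ 47, and a pass criterion of at most twice the expected value leaves a wide margin for statistical variance.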

Why C++17 and not C++20?

Deliberate choice. C++20's concepts, coroutines, std::span and std::format are genuinely useful, but compiler support on the toolchains that matter for defence and embedded work is patchy. GCC 9.2, the MinGW toolchain I benchmarked with and typical of the older compilers still found in embedded/defence CI, has only experimental C++20 support behind -std=c++2a: its standard library has no std::format, std::ranges, std::span or std::jthread. Coding guidelines used in safety-critical work also trail the language: AUTOSAR's C++ guidelines target C++14 and MISRA C++:2023 targets C++17, while DO-178C projects depend on qualified toolchains, which lag further behind. Making a deliberate, explained choice about language standard is the kind of thinking that matters in production engineering.

What I learned

The Python bridge is not the bottleneck: the C++ binary runs at 50,331 fps, asyncio adds virtually no overhead, and the limit is the 25 fps WebSocket rate and the browser's rendering budget. Hand-rolling a 30-line JSON serialiser took 20 minutes and produces output the frontend parses without issue, so there was no need to pull in nlohmann/json or rapidjson. And guard cell sizing matters more than training cell count: with too few guards, target energy bleeds into the noise estimate, inflating the threshold and causing the target to miss its own detection. Four guard cells per side is the practical minimum for a Hann-windowed signal at the SNRs I was working with.

The C++ binary is two thousand times faster than the streaming rate. The bottleneck is never where you expect: measure first, optimise second, and most of the time the maths is already fast enough.
Tags: C++17, Signal Processing, Radar, CFAR, Real-time
