Efficient Frequency Detection with Goertzel

The receiver’s core job is frequency detection. Given a window of audio samples, determine which FSK tone is present. This post covers why NearWave uses the Goertzel algorithm instead of FFT, how it works, and what makes it efficient for FSK demodulation.

The Detection Problem

After the modulator (Part 2) plays a sequence of tones, the receiver captures audio and needs to answer one question per symbol period: which frequency was playing?

graph LR
    A[Audio Samples] --> B[Windowing]
    B --> C[Goertzel Filter]
    C --> D[Magnitude per Frequency]
    D --> E[Peak Selection]
    E --> F[Bit Decision]

For BFSK, this means choosing between two frequencies. For 4-MFSK, choosing among four. The answer maps directly to bits.

Why Not FFT

The FFT (Fast Fourier Transform) computes the magnitude of every frequency bin across the entire spectrum. For a sample window of $N$ samples, the FFT produces $N/2$ frequency bins in $O(N \log N)$ time.

For FSK demodulation, this is wasteful. NearWave needs the magnitude at 2 frequencies (BFSK) or 4 frequencies (4-MFSK) — not hundreds. Computing a full 1024-point FFT to extract 2 values discards 99.8% of the output.

The mismatch:

	FFT	Goertzel
Output	All frequency bins	Selected frequencies only
Complexity	$O(N \log N)$ for all bins	$O(N)$ per target frequency
For 2 targets, N=1024	~10,000 ops	~2,048 ops
For 4 targets, N=1024	~10,000 ops	~4,096 ops
Memory	Full spectrum buffer	Single accumulator per target

When the number of target frequencies is small relative to $N$, Goertzel is significantly faster.

How Goertzel Works

The Goertzel algorithm computes the magnitude of a single frequency bin using a second-order IIR (infinite impulse response) filter. It processes $N$ samples and produces one magnitude value.

Setup

For a target frequency $f_{\text{target}}$ at sample rate $f_s$ with window size $N$:

$$k = \frac{N \cdot f_{\text{target}}}{f_s}$$

$$\omega = \frac{2\pi k}{N}$$

$$\text{coeff} = 2 \cos(\omega)$$

The coefficient is computed once per target frequency and reused for every window.

Iteration

Process each sample $x[n]$ for $n = 0$ to $N-1$:

$$s[n] = x[n] + \text{coeff} \cdot s[n-1] - s[n-2]$$

Starting with $s[-1] = 0$ and $s[-2] = 0$.

Magnitude Extraction

After processing all $N$ samples, the magnitude squared at the target frequency is:

$$|X[k]|^2 = s[N-1]^2 + s[N-2]^2 - \text{coeff} \cdot s[N-1] \cdot s[N-2]$$

No complex arithmetic. No sine/cosine per sample. Just one multiply-add per sample in the loop, then three operations to extract the result.

Implementation

In Go, the core loop is minimal:

coeff := 2.0 * math.Cos(2.0*math.Pi*float64(k)/float64(N))
var s0, s1, s2 float64

for i := 0; i < N; i++ {
    s0 = samples[i] + coeff*s1 - s2
    s2 = s1
    s1 = s0
}

magnitude := s1*s1 + s2*s2 - coeff*s1*s2

This lives in /internal/dsp/ alongside windowing and bandpass filter utilities.

Windowing

Raw rectangular windows cause spectral leakage — energy from one frequency bleeding into adjacent bins. This is especially problematic when two FSK tones are close together.

NearWave applies a Hann window before Goertzel processing:

$$w[n] = 0.5 \left(1 - \cos\left(\frac{2\pi n}{N-1}\right)\right)$$

The windowed samples are:

$$x’[n] = x[n] \cdot w[n]$$

This smooths the edges of the sample window to zero, reducing leakage at the cost of slightly widening the main lobe. For FSK with adequate tone spacing, this tradeoff is favorable.

Bandpass Filtering

Before frequency detection, NearWave applies a bandpass filter to attenuate out-of-band noise. If the FSK tones are in the 2–4 kHz range, frequencies outside that band are suppressed.

This reduces the chance of noise-induced false detections. A speaking voice at 300 Hz or a fan hum at 120 Hz won’t contribute energy to the Goertzel computation if they’re filtered out beforehand.

The filter parameters are profile-dependent — the ultrasonic profile uses a bandpass centered above 18 kHz.

Decision Logic

For each symbol period, the receiver:

Extracts $N$ samples from the audio buffer
Applies the Hann window
Runs Goertzel for each candidate frequency (2 for BFSK, 4 for 4-MFSK)
Selects the frequency with the highest magnitude
Maps the selected frequency to bits via the tone map

Magnitudes: [f₀: 0.12, f₁: 0.87, f₂: 0.05, f₃: 0.03]
Winner: f₁ → bits "01"

A confidence threshold is optional — if no frequency exceeds a minimum energy level, the symbol is marked as uncertain. In practice, NearWave relies on FEC (Part 3) to handle occasional detection failures rather than implementing complex confidence scoring.

Performance in Practice

For a typical BFSK configuration at 44100 Hz sample rate with 20ms symbols:

Samples per window: $N = 44100 \times 0.020 = 882$
Goertzel operations per symbol: $2 \times 882 = 1764$ multiply-adds
FFT operations for same window: $882 \times \log_2(1024) \approx 8820$ (padded to power of 2)

Goertzel is ~5x less work for BFSK. For 4-MFSK it’s still ~2.5x less. The savings are modest in absolute terms on modern hardware, but they compound across thousands of symbols in a long transmission and matter on resource-constrained devices.

More importantly, Goertzel requires no FFT library dependency. The entire implementation is ~20 lines of Go using only the standard math package.

What Goertzel Doesn’t Do

Goertzel detects magnitude at a known frequency. It doesn’t:

Discover unknown frequencies (use FFT for that)
Handle frequency drift (the target must be precise)
Provide phase information (not needed for FSK magnitude detection)

These limitations are irrelevant for FSK demodulation where the frequency set is fixed and known to both sender and receiver.

The next post covers how modulation, protocol framing, Goertzel detection, and error correction come together in NearWave’s end-to-end pipeline.