Modulation: Turning Bits into Sound

NearWave encodes data as audio. The question is how. This post covers the modulation layer — the part of the system that maps binary data to sound frequencies and defines the relationship between symbol rate, tone count, and throughput.

The Core Problem

We have a byte stream — compressed, possibly encrypted — that needs to be transmitted as audio. The receiver will detect which frequencies are present over time and reconstruct the bytes. The modulation scheme determines:

How many distinct frequencies are used
How long each tone plays (symbol duration)
How many bits each tone represents

These three parameters define the throughput ceiling and the reliability floor of the entire system.

Binary FSK (BFSK)

The simplest approach. Two frequencies represent two states: 0 and 1.

Bit 0 → f₀ (e.g., 1000 Hz)
Bit 1 → f₁ (e.g., 1200 Hz)

One bit per symbol. If each symbol lasts 20ms, the raw throughput is 50 bits per second. BFSK is straightforward to implement, easy to detect, and resilient to noise because the receiver only needs to distinguish between two frequencies.

The downside is obvious: throughput is limited. Every bit requires one full symbol duration, so at 50 bits per second a 1 KiB payload (8,192 bits) takes roughly 164 seconds to send.
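As a rough illustration of the mapping above, here is a minimal sketch that turns bytes into a sequence of BFSK tone frequencies. The function names are hypothetical, the two frequencies are the example values from the text, and the MSB-first bit order is an assumption, not something NearWave is known to specify.

```python
# Example frequencies from the text; a real profile may use different values.
F0_HZ = 1000.0  # tone for bit 0
F1_HZ = 1200.0  # tone for bit 1


def bfsk_frequencies(data: bytes) -> list[float]:
    """Map each bit of `data` to a tone frequency.

    Bits are read most significant bit first (an assumption for this sketch).
    """
    freqs = []
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            freqs.append(F1_HZ if bit else F0_HZ)
    return freqs


def transmit_seconds(num_bytes: int, symbol_seconds: float = 0.020) -> float:
    """Airtime at one bit per symbol, ignoring guards and framing overhead."""
    return num_bytes * 8 * symbol_seconds
```

Calling `bfsk_frequencies(b"\xA0")` yields eight frequencies, one per bit, and `transmit_seconds` gives the airtime for a payload of a given size at the chosen symbol duration.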

Multi-Frequency Shift Keying (MFSK)

MFSK generalizes BFSK by using more frequencies. With $2^n$ frequencies, each symbol encodes $n$ bits.

4-FSK  (4 tones)  → 2 bits per symbol
8-FSK  (8 tones)  → 3 bits per symbol
16-FSK (16 tones) → 4 bits per symbol

For the same symbol duration, MFSK multiplies throughput by the number of bits per symbol. A 16-FSK scheme at 20ms per symbol gives 200 bits per second — 4x the throughput of BFSK with the same timing.
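To make the symbol mapping concrete, the sketch below splits a byte stream into symbol indices of n bits each. The function name, the MSB-first bit order, and the zero-padding of the final symbol are all assumptions made for illustration; the actual NearWave mapper may differ.

```python
def bytes_to_symbols(data: bytes, bits_per_symbol: int) -> list[int]:
    """Split a byte stream into symbol indices of `bits_per_symbol` bits each.

    Bits are consumed MSB first and the final symbol is zero-padded on the
    right; both choices are assumptions made for this sketch.
    """
    if bits_per_symbol < 1:
        raise ValueError("bits_per_symbol must be at least 1")
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad so the bit string divides evenly into symbols.
    remainder = len(bits) % bits_per_symbol
    if remainder:
        bits += "0" * (bits_per_symbol - remainder)
    return [
        int(bits[i:i + bits_per_symbol], 2)
        for i in range(0, len(bits), bits_per_symbol)
    ]
```

With 16-FSK, the byte `0xA7` becomes the two symbols 10 and 7, so one byte costs two symbol durations instead of the eight that BFSK needs. With 8-FSK, the 8 bits do not divide evenly by 3, which is why the sketch has to pad.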

Frequency Allocation

The tones must be spaced far enough apart for reliable detection. The minimum spacing depends on the symbol duration:

$$\Delta f \geq \frac{1}{T_{\text{symbol}}}$$

For a 20ms symbol, the minimum spacing is 50 Hz. In practice, NearWave uses wider spacing to tolerate hardware frequency response curves and ambient noise. The reliable profile uses ~100 Hz spacing. The fast profile tightens it to ~60 Hz.
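The spacing bound is easy to check numerically. The sketch below computes the minimum spacing for a symbol duration and how much headroom a chosen spacing leaves. The helper names are hypothetical, and it assumes both profiles use the 20ms symbol from the example, which the text does not state.

```python
def min_tone_spacing_hz(symbol_seconds: float) -> float:
    """Minimum tone spacing from the bound above: delta_f >= 1 / T_symbol."""
    return 1.0 / symbol_seconds


def spacing_margin(spacing_hz: float, symbol_seconds: float) -> float:
    """Ratio of chosen spacing to the minimum; below 1.0 violates the bound."""
    return spacing_hz / min_tone_spacing_hz(symbol_seconds)


SYMBOL_SECONDS = 0.020  # the 20 ms example; assumed for both profiles

for name, spacing_hz in (("reliable", 100.0), ("fast", 60.0)):
    margin = spacing_margin(spacing_hz, SYMBOL_SECONDS)
    print(f"{name}: {spacing_hz:.0f} Hz spacing is {margin:.1f}x the minimum")
```

Under that assumption the reliable profile has a 2.0x margin and the fast profile a 1.2x margin. By the same bound, 60 Hz spacing would fall below the 100 Hz minimum for a 10ms symbol, so spacing and symbol duration cannot be tightened independently.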

Tone Map

NearWave assigns tone frequencies at even intervals, starting from a base frequency:

Tone 0  → base_freq
Tone 1  → base_freq + spacing
Tone 2  → base_freq + 2 × spacing
...
Tone 15 → base_freq + 15 × spacing

The base frequency and spacing are profile-dependent. The ultrasonic profile shifts the entire map above 18 kHz.
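The tone map above reduces to a single list comprehension. This is a minimal sketch with a hypothetical function name; the 1000 Hz base frequency is a placeholder, since the post does not give the real per-profile values.

```python
def build_tone_map(
    base_freq_hz: float, spacing_hz: float, tone_count: int = 16
) -> list[float]:
    """Return tone frequencies so that index k maps to base_freq + k * spacing."""
    return [base_freq_hz + k * spacing_hz for k in range(tone_count)]


# Hypothetical base frequency; the spacing is the reliable profile's ~100 Hz.
tones = build_tone_map(base_freq_hz=1000.0, spacing_hz=100.0)
occupied_bandwidth_hz = tones[-1] - tones[0]
```

With these placeholder values, 16 tones at 100 Hz spacing span 1500 Hz between the lowest and highest tone. Moving the whole map to another band only requires changing `base_freq_hz`.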

Symbol Rate vs. Throughput

Throughput is the product of symbol rate and bits per symbol. Since the symbol rate is $1 / T_{\text{symbol}}$, this becomes:

$$\text{Throughput (bps)} = \frac{\text{bits per symbol}}{T_{\text{symbol}}}$$

Scheme	Tones	Bits/Symbol	Symbol Duration	Throughput
BFSK	2	1	20ms	50 bps
4-FSK	4	2	20ms	100 bps
8-FSK	8	3	20ms	150 bps
16-FSK	16	4	20ms	200 bps
16-FSK	16	4	10ms	400 bps
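The rows in the table follow directly from the formula. The sketch below recomputes them, deriving bits per symbol as log2 of the tone count; the function name is hypothetical.

```python
import math


def raw_throughput_bps(tone_count: int, symbol_seconds: float) -> float:
    """Raw throughput: log2(tone_count) bits per symbol over symbol duration."""
    bits_per_symbol = math.log2(tone_count)
    return bits_per_symbol / symbol_seconds


# (tone count, symbol duration in seconds) for each row of the table.
rows = [(2, 0.020), (4, 0.020), (8, 0.020), (16, 0.020), (16, 0.010)]
for tone_count, symbol_seconds in rows:
    bps = raw_throughput_bps(tone_count, symbol_seconds)
    print(f"{tone_count:>2} tones, {symbol_seconds * 1000:.0f} ms -> {bps:.0f} bps")
```

Note the asymmetry: doubling the tone count adds only one bit per symbol, while halving the symbol duration doubles throughput.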

Reducing symbol duration increases throughput but shrinks the detection window. At very short durations, the Goertzel filter (covered in Part 4) doesn’t accumulate enough samples to resolve frequencies cleanly. This pull between speed and detection accuracy is the fundamental tension in the system.

Guard Intervals

Between consecutive symbols, NearWave inserts a short silence — the guard interval. This prevents inter-symbol interference where the tail of one tone bleeds into the detection window of the next.

[tone₁][guard][tone₂][guard][tone₃]...

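The guard interval is typically 2–5ms. Shorter guards increase effective throughput but reduce tolerance for speaker/microphone transient response. The reliable profile uses 5ms guards. The fast profile drops to 2ms. Note that the throughput figures in the previous section are raw rates: they count only the symbol duration and exclude guard time.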
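The cost of the guard can be estimated by treating each symbol period as the tone duration plus the guard. The sketch below does that; the function name is hypothetical, and the worked numbers assume 16-FSK with 20ms symbols for both guard lengths, which the text does not confirm.

```python
def effective_throughput_bps(
    bits_per_symbol: int, symbol_seconds: float, guard_seconds: float
) -> float:
    """Throughput once each symbol also pays for its trailing guard interval."""
    return bits_per_symbol / (symbol_seconds + guard_seconds)


# Assumed setup: 16-FSK (4 bits per symbol) with 20 ms symbols.
with_5ms_guard = effective_throughput_bps(4, 0.020, 0.005)
with_2ms_guard = effective_throughput_bps(4, 0.020, 0.002)
```

Under those assumptions, a 5ms guard brings the raw 200 bps down to 160 bps, while a 2ms guard gives roughly 182 bps.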

Tradeoffs

Every parameter trades reliability for speed:

Parameter	Faster	More Reliable
Symbol duration	Shorter	Longer
Frequency spacing	Tighter	Wider
Guard interval	Shorter	Longer
Tone count	More tones	Fewer tones

More tones pack more bits per symbol, but require finer frequency resolution from the detector. Tighter spacing fits more tones in a given bandwidth, but increases the chance of cross-tone interference. Shorter symbols increase throughput, but reduce detection accuracy.

NearWave bundles these into profiles (reliable, fast, ultrasonic) so users don’t need to tune individual parameters. But under the hood, the profile is just a specific point in this tradeoff space.

What the Modulator Produces

The modulation stage covers the full path from bits to audio:

graph LR
    A[Bits] --> B[Symbol Mapper]
    B --> C[Frequency Selection<br/>BFSK: f₀ or f₁<br/>MFSK: f₀…f₁₅]
    C --> D[Sine Wave Generator]
    D --> E[Audio Signal + Guards]

The output is a sequence of audio samples — a PCM waveform. Each symbol is a pure sine wave at the assigned frequency, lasting for the symbol duration, followed by silence for the guard interval.

output[i] = amplitude × sin(2π × freq × i / sample_rate)

The waveform is written at a standard sample rate (44100 Hz or 48000 Hz) and played through the system’s audio output. On the receiving side, the demodulator processes the incoming audio to extract the frequency sequence and recover the bits.
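Putting the pieces together, here is a minimal sketch of a modulator that renders symbol indices as sine bursts separated by silent guards. It uses only the standard library; the function name and default parameter values are illustrative assumptions, not NearWave's actual implementation or profile settings.

```python
import math


def modulate(
    symbols: list[int],
    tone_map: list[float],
    symbol_seconds: float = 0.020,
    guard_seconds: float = 0.005,
    sample_rate: int = 48000,
    amplitude: float = 0.8,
) -> list[float]:
    """Render symbols as sine bursts, each followed by a silent guard interval.

    Returns float samples in [-amplitude, amplitude]. The defaults are
    illustrative values, not NearWave's actual profile settings.
    """
    tone_samples = round(symbol_seconds * sample_rate)
    guard_samples = round(guard_seconds * sample_rate)
    output: list[float] = []
    for symbol in symbols:
        freq = tone_map[symbol]
        # Phase restarts at zero for every symbol, as in the formula above.
        output.extend(
            amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(tone_samples)
        )
        # Guard interval: silence after the tone.
        output.extend([0.0] * guard_samples)
    return output
```

At 48000 Hz, a 20ms tone is 960 samples and a 5ms guard is 240 samples, so each symbol occupies 1200 samples with these defaults. The float samples would still need converting to the output device's sample format before playback.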

The next post covers what happens after modulation: how the protocol structures the data into frames with synchronization, headers, error correction, and integrity checks.