ADR-0024: FFmpeg Audio Engine — Externalizing the Real-Time Audio Path
Status: Accepted • Date: 2026-03-25
1. Context & Problem
The Recorder service (v0.4.0) captures audio via Python libraries (sounddevice + soundfile + soxr). The PortAudio callback runs in a C thread, but re-enters Python to enqueue data — exposing the audio path to GIL contention and Garbage Collector pauses.
On a Raspberry Pi 5 running concurrent ML workloads (BirdNET, BatDetect), these pauses cause xruns (capture buffer overruns, i.e. dropped samples) at high sample rates (384 kHz). This violates Data Capture Integrity, the paramount directive (AGENTS.md §1).
2. Decision
We chose: Replace the Python audio pipeline with an FFmpeg subprocess managed by the Recorder service.
Reasoning:
- GIL-Free Audio Path: FFmpeg is pure C. No Python code executes in the audio capture path. GIL contention, GC pauses, and queue.Queue lock contention become impossible.
- Process Isolation: FFmpeg runs as a separate OS process. If Python crashes, FFmpeg continues recording until SIGTERM. The Linux kernel can schedule FFmpeg on a dedicated CPU core, independent of Python and ML workloads.
- Battle-Tested: FFmpeg is used in billions of installations for audio/video capture. Its ALSA backend, resampler (soxr), and segment muxer are production-hardened.
- Dual Stream in One Command: FFmpeg's -map and -f segment produce both Raw and Processed streams simultaneously, replacing ~600 lines of Python with a single CLI invocation.
- Future-Ready: Adding the Triple Stream (Opus → Icecast, v1.1.0) is a single additional -map line.
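As a rough illustration of the dual-stream invocation described above, the Recorder could assemble the FFmpeg argument list in Python. This is a minimal sketch, not the project's actual command: the function name build_ffmpeg_cmd, the segment length, the codecs (pcm_s24le, pcm_s16le), the 48 kHz processed rate, and the seg_%05d.wav and segments.csv file names are all assumptions chosen for the example.

```python
from pathlib import Path


def build_ffmpeg_cmd(device: str, buffer_dir: Path, segment_s: int = 10) -> list[str]:
    """Build an FFmpeg argv that writes raw and processed segments in one process."""

    def segment_output(name: str, codec_args: list[str]) -> list[str]:
        # Each output gets its own -map, codec options and segment muxer settings.
        out_dir = buffer_dir / name
        return [
            "-map", "0:a",
            *codec_args,
            "-f", "segment",
            "-segment_time", str(segment_s),
            "-segment_list", str(out_dir / "segments.csv"),
            "-segment_list_type", "csv",
            str(out_dir / "seg_%05d.wav"),
        ]

    return [
        "ffmpeg", "-hide_banner", "-nostdin",
        "-f", "alsa", "-i", device,
        # Raw stream: native sample rate, assumed 24-bit PCM.
        *segment_output("raw", ["-c:a", "pcm_s24le"]),
        # Processed stream: assumed 16-bit PCM resampled to 48 kHz.
        *segment_output("processed", ["-c:a", "pcm_s16le", "-ar", "48000"]),
    ]
```

The resulting list can be handed to subprocess.Popen with stderr piped, so the Recorder only supervises the process and never touches audio samples. A third output (the Triple Stream) would be one more segment_output-style group appended to the list.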
2.1. Segment Completion Strategy
FFmpeg writes segments to .buffer/raw/ and .buffer/processed/. A Python SegmentPromoter thread polls FFmpeg's -segment_list CSV output and promotes completed segments to data/ via os.replace(). The rename is atomic as long as .buffer/ and data/ reside on the same filesystem.
This preserves the existing .buffer/ → data/ promotion pattern required by:
- Filesystem Governance (§2): Processor/Indexer polls data/ for complete files
- ADR-0009 (Zero-Trust): Consumers mount data/ as :ro — must never see partial files
- DB Schema: recordings.file_raw / recordings.file_processed reference data/ paths
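The promotion step can be sketched as a single polling pass. This is a minimal illustration under stated assumptions, not the actual SegmentPromoter: the function name promote_completed and the promoted set are hypothetical, and the sketch assumes the CSV segment list carries the segment file name in its first column and that an entry appears only once the segment is closed.

```python
import csv
import os
from pathlib import Path


def promote_completed(list_path: Path, buffer_dir: Path, data_dir: Path,
                      promoted: set[str]) -> list[Path]:
    """Move segments listed as complete from buffer_dir to data_dir."""
    moved: list[Path] = []
    if not list_path.exists():
        return moved  # FFmpeg has not finished a segment yet
    with list_path.open(newline="") as fh:
        for row in csv.reader(fh):
            if not row:
                continue
            name = os.path.basename(row[0])
            src = buffer_dir / name
            # Skip entries already handled, and truncated or stale entries
            # whose file is not present in the buffer directory.
            if name in promoted or not src.exists():
                continue
            dst = data_dir / name
            os.replace(src, dst)  # atomic rename on the same filesystem
            promoted.add(name)
            moved.append(dst)
    return moved
```

A promoter thread would call this in a loop with a short sleep, once per stream directory. Because consumers only ever see files that arrive through the rename, a segment still being written stays invisible to them.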
2.2. Mock Source for CI
SILVASONIC_RECORDER_MOCK_SOURCE=true switches FFmpeg from -f alsa -i hw:X,0 to -f lavfi -i "sine=frequency=440:sample_rate=48000" — enabling hardware-independent testing without any Python mock classes.
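The switch between the ALSA device and the lavfi generator can be sketched as a small helper that returns only the input portion of the FFmpeg command. The helper name input_args and its signature are assumptions for this example; the environment variable and the two argument sets are taken from the text above.

```python
import os
from collections.abc import Mapping


def input_args(device: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return FFmpeg input arguments for real hardware or the mock sine source."""
    mock = env.get("SILVASONIC_RECORDER_MOCK_SOURCE", "").strip().lower() == "true"
    if mock:
        # Built-in signal generator: no sound card required in CI.
        return ["-f", "lavfi", "-i", "sine=frequency=440:sample_rate=48000"]
    # Real capture from the given ALSA device (for example "hw:X,0").
    return ["-f", "alsa", "-i", device]
```

Since only the input arguments change, the output side of the command (mapping, segmenting, promotion) runs unmodified in CI and on hardware.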
3. Options Considered
- Keep Python (sounddevice + soundfile): Rejected. Architecturally unsuitable for a 24/7 hardware appliance at 384 kHz with concurrent ML workloads.
- Custom Rust CLI (silvasonic-capture): Rejected. Functionally equivalent to FFmpeg but requires maintaining a separate Rust codebase. FFmpeg delivers 99% of the robustness benefit without the development overhead.
- FFmpeg writing directly to data/: Rejected. FFmpeg's segment muxer does not guarantee atomic file completion, so the Processor/Indexer would see partially written WAVs.
4. Consequences
- Positive:
  - Data Capture Integrity guaranteed at hardware level: no Python in the audio path.
  - 3 Python dependencies removed (sounddevice, soundfile, soxr).
  - ~600 lines of complex Python replaced by ~250 lines of subprocess management.
  - Testable without hardware via FFmpeg's built-in signal generator (lavfi).
  - Trivial extension to Triple Stream (Opus → Icecast) in v1.1.0.
- Negative:
  - FFmpeg becomes a system-level dependency (installed in Containerfile).
  - Observability shifts from Python properties to FFmpeg stderr parsing.
  - SegmentPromoter introduces a small promotion latency (~0.5s after segment close).
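To make the stderr-parsing consequence concrete, the following sketch counts stderr lines that match a configurable pattern. The function name count_xruns and the default pattern are assumptions: the exact wording FFmpeg uses for ALSA xrun warnings should be verified against real logs before relying on it.

```python
import re
from collections.abc import Iterable

# Assumed marker for xrun warnings; confirm against actual FFmpeg output.
XRUN_PATTERN = re.compile(r"xrun", re.IGNORECASE)


def count_xruns(stderr_lines: Iterable[str],
                pattern: re.Pattern[str] = XRUN_PATTERN) -> int:
    """Count stderr lines that match the xrun pattern."""
    return sum(1 for line in stderr_lines if pattern.search(line))
```

In practice the Recorder would feed this from the FFmpeg process's piped stderr, read line by line in a background thread, and expose the running count as a health metric.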
5. References
- ADR-0011: Dual Stream Architecture
- ADR-0020: OOM Protection, Recorder = oom_score_adj=-999
- Filesystem Governance: .buffer/ → data/ promotion pattern
- Processor Service Spec: Indexer polls data/ for new WAVs