ADR-0011: Audio Recording Strategy (Raw vs Processed)
Status: Accepted (amended 2026-03-30) • Date: 2026-01-31
NOTE: The processor (v0.5.0) and uploader (v0.6.0) are implemented. References to birdnet, batdetect, or weather refer to planned services.
1. Context & Problem
The system supports various hardware microphones with different native capabilities (e.g., Dodotronic Ultramic at 384 kHz, standard USB mics at 48 kHz). Previously, we used terminology like "High Res" and "Low Res" or hardcoded 384 kHz/48 kHz assumptions. This is brittle and does not scale to different hardware configurations.
We need a standardized way to handle audio streams to ensure downstream services (Analysis, Visualization, Upload) know exactly what to expect, regardless of the input hardware.
2. Decision
We chose: A Dual Stream Architecture with standardized naming.
Reasoning:
- Logical Dual Stream Architecture: The Recorder service MUST always produce the Raw stream. The Processed stream is produced when the device's microphone profile sets processed_enabled: true (the default). Profiles for microphones whose native sample rate matches the target (48 kHz) set processed_enabled: false; no resampling is needed, so data/raw IS the analysis-ready stream.
    - Raw: The native, bit-perfect capture from the hardware. Sample rate is variable (hardware-dependent). Always present.
    - Processed: A standardized 48 kHz stream derived from the raw input. Present only when processed_enabled: true.
- Naming Convention:
    - Streams and artifacts MUST be named raw and processed.
    - We explicitly abandon names like high, low, high_res, low_res or hardware-specific sample rates (384k) in naming conventions (variables, directories, database columns).
    - Filename Format: All files MUST adhere to the v0.6.0 collision-proof format: YYYY-MM-DDTHH-MM-SSZ_{duration}s_{run_id}_{seq:08d}.wav. Pre-v0.6.0 legacy fallback formats are strictly forbidden.
- Local Storage Format:
    - Format: WAV (linear PCM).
    - Motivation: Minimal CPU overhead for writing; instant availability for local seeking/reading without decoding latency.
    - Structure (see Filesystem Governance for full directory layout):
        - data/raw/YYYY-MM-DDTHH-MM-SSZ_{duration}s_{run_id}_{seq}.wav
        - data/processed/YYYY-MM-DDTHH-MM-SSZ_{duration}s_{run_id}_{seq}.wav
- Cloud Storage Format:
    - Format: FLAC (Free Lossless Audio Codec).
    - Motivation: Bandwidth efficiency. Uploading uncompressed WAVs is wasteful.
    - Policy: The Uploader service converts raw artifacts to FLAC on-the-fly (or uses a buffer) before/during upload.
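The naming convention and local storage layout above can be sketched in a few lines of Python. This is a minimal illustration, not the Recorder's actual code: the function names, the integer duration, and the sample run_id value are assumptions made for the example.

```python
from datetime import datetime, timezone
from pathlib import Path


def segment_filename(start: datetime, duration_s: int, run_id: str, seq: int) -> str:
    """Build a segment name in the v0.6.0 format (helper name is hypothetical)."""
    # Normalize to UTC so the trailing "Z" is truthful.
    stamp = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"{stamp}_{duration_s}s_{run_id}_{seq:08d}.wav"


def segment_paths(workspace: Path, name: str, processed_enabled: bool) -> dict[str, Path]:
    """Return the target path per stream; raw is always present."""
    paths = {"raw": workspace / "data" / "raw" / name}
    if processed_enabled:
        paths["processed"] = workspace / "data" / "processed" / name
    return paths


# "a1b2c3" is a made-up run_id; the real run_id format is not specified here.
name = segment_filename(
    datetime(2026, 1, 31, 12, 0, 0, tzinfo=timezone.utc), 10, "a1b2c3", 42
)
print(name)  # 2026-01-31T12-00-00Z_10s_a1b2c3_00000042.wav
```

Because the timestamp leads the name and has a fixed width, sorting filenames as plain text also sorts them chronologically.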
3. Options Considered
- Single Stream (Processed only):
    - Rejected because: Losing the raw, bit-perfect recording is unacceptable for scientific purposes. Hardware-native sample rates carry valuable high-frequency data (e.g., bat echolocation).
- Hardware-specific naming (384k, 48k):
    - Rejected because: Brittle. Replacing a microphone with a different native sample rate would require changes across the entire codebase and database schema.
4. Consequences
- Positive:
    - Downstream Compatibility: Services like BirdNET consume the preferred audio via COALESCE(file_processed, file_raw), always receiving 48 kHz data without internal resampling.
    - Hardware Independence: Replacing a 384 kHz mic with a 96 kHz mic requires no code changes in downstream consumers, as processed remains 48 kHz, and raw is just handled as "the archival file".
    - Database Schema: The recordings table uses file_raw (NOT NULL) and file_processed (NULLABLE) columns. Raw-only devices insert file_processed = NULL and filesize_processed = 0.
    - Filesystem: The workspace directory structure uses data/raw and optionally data/processed within each microphone folder (see Filesystem Governance).
    - Cross-Service Contract: The Controller stores devices.workspace_name during enrollment, which the Processor Indexer uses to resolve filesystem paths to the device's stable identity (devices.name).
- Negative:
    - Requires up to double the storage for local recordings when both streams are active (raw + processed).
    - CPU overhead for real-time resampling to produce the processed stream (handled by FFmpeg, see ADR-0024).
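The COALESCE contract from the list above can be tried out with a small in-memory sketch. It uses Python's sqlite3 module purely as a stand-in for the project's database, and the table is reduced to the columns named in this ADR; the id column, the DEFAULT clause, the file names, and the file size are placeholders.

```python
import sqlite3

# In-memory stand-in; the real schema has more columns than shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE recordings (
        id INTEGER PRIMARY KEY,
        file_raw TEXT NOT NULL,
        file_processed TEXT,                       -- NULL on raw-only devices
        filesize_processed INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.executemany(
    "INSERT INTO recordings (file_raw, file_processed, filesize_processed) "
    "VALUES (?, ?, ?)",
    [
        # Dual-stream device: both files exist (placeholder names and size).
        ("data/raw/a.wav", "data/processed/a.wav", 960044),
        # Raw-only device: no processed file, size recorded as 0.
        ("data/raw/b.wav", None, 0),
    ],
)

# Consumers pick the processed file when present, else fall back to raw.
rows = conn.execute(
    "SELECT COALESCE(file_processed, file_raw) FROM recordings ORDER BY id"
).fetchall()
analysis_files = [row[0] for row in rows]
print(analysis_files)  # ['data/processed/a.wav', 'data/raw/b.wav']
```

The consumer never needs to know which kind of device produced a row; the fallback is resolved entirely in the query.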
5. Future: Live Opus Stream (v1.1.0)
Status: Planned
In v1.1.0, the Recorder will produce a third output stream, extending the Dual Stream Architecture to a Triple Stream Architecture:
| Stream | Format | Destination | Purpose |
|---|---|---|---|
| Raw | WAV (PCM) | NVMe (local) | Archival, scientific analysis |
| Processed | WAV (PCM) | NVMe (local) | Consumption by BirdNET and BatDetect |
| Live | Opus | Icecast (network) | Real-time monitoring via Web-UI |
The Live stream is best-effort — if Icecast is unavailable, the Recorder continues writing Raw and Processed without interruption. Data Capture Integrity applies: the live stream must never compromise the recording pipeline.
Each Recorder pushes its Opus stream to a dedicated mount point on the Icecast server (e.g. /mic-ultramic.opus). The Web-Interface allows the user to select which microphone to listen to by switching the mount point URL.
6. Retention Policy (v0.5.0, The Janitor)
Status: Implemented (since v0.5.0) • Service: processor (Tier 1, Critical)
To prevent storage exhaustion on the edge device (typical: 256 GB NVMe), the processor service implements a centralized background cleanup task, colloquially called "The Janitor".
Design Decision
We have decided to enforce Data Capture Integrity via an escalating retention policy based on local disk utilization. As storage fills up, the policy progressively sacrifices first local analysis completeness, and eventually remote backup guarantees, to ensure the Recorder never faces a "Disk Full" scenario and never stops recording.
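The escalation idea can be illustrated as a lookup from disk utilization to a mode. The mode names Housekeeping, Defensive, and Panic come from this ADR; everything else in the sketch is an assumption, including the "idle" state, the class and function names, and above all the threshold numbers, which are placeholders and not the values documented for the Processor.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Disk-utilization thresholds in percent (hypothetical structure)."""

    housekeeping: float
    defensive: float
    panic: float


def used_percent(path: str) -> float:
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def retention_mode(percent: float, t: Thresholds) -> str:
    """Return the most severe mode whose threshold has been reached."""
    if percent >= t.panic:
        return "panic"
    if percent >= t.defensive:
        return "defensive"
    if percent >= t.housekeeping:
        return "housekeeping"
    return "idle"  # hypothetical name for "nothing to clean up"


# Placeholder numbers for illustration only.
example = Thresholds(housekeeping=70.0, defensive=80.0, panic=90.0)
print(retention_mode(85.0, example))  # defensive
```

Checking the most severe threshold first keeps the modes mutually exclusive without needing upper bounds for each band.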
Batch Size Limit
Deletions are limited to janitor_batch_size (default: 50) files per cleanup cycle to prevent I/O storms and excessive database load. At 10-second segments, this corresponds to ~8 minutes of audio per batch.
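A rough sketch of the batch cap and the arithmetic behind the "~8 minutes" figure follows. The function names are hypothetical, and the oldest-first ordering is an assumption; this ADR does not state which files a cycle picks first.

```python
from pathlib import Path


def next_batch(candidates: list[Path], janitor_batch_size: int = 50) -> list[Path]:
    """Pick at most janitor_batch_size files for one cleanup cycle."""
    # Assumption: oldest first. Timestamp-leading names sort chronologically.
    return sorted(candidates, key=lambda p: p.name)[:janitor_batch_size]


def batch_audio_minutes(batch_size: int, segment_seconds: int) -> float:
    """Minutes of audio removed by one full batch."""
    return batch_size * segment_seconds / 60


# 50 files x 10 s = 500 s of audio per cycle.
print(round(batch_audio_minutes(50, 10), 2))  # 8.33
```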
Uploader-Fallback (Pre-v0.6.0)
When no Uploader is configured (no active storage_remotes rows in the database), the uploaded condition in Housekeeping and Defensive modes is skipped. This prevents the Janitor from remaining idle until the Panic threshold is reached. The fallback is logged at WARNING level with the key janitor.uploader_fallback_active.
If Uploaders are configured, the uploaded condition is strictly interpreted as meaning the file has been successfully uploaded to ALL currently active remotes.
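The two rules above can be condensed into one predicate. This is a sketch under assumptions: the function name, the logger name, and the representation of remotes as sets of identifiers are illustrative, while the log key is the one named in this section.

```python
import logging

log = logging.getLogger("janitor")  # logger name is an assumption


def uploaded_condition_met(active_remotes: set[str], uploaded_to: set[str]) -> bool:
    """Evaluate the 'uploaded' condition for a single file."""
    if not active_remotes:
        # No Uploader configured: skip the condition so cleanup is not blocked.
        log.warning("janitor.uploader_fallback_active")
        return True
    # Strict reading: every currently active remote must hold the file.
    return active_remotes <= uploaded_to


print(uploaded_condition_met({"s3", "nas"}, {"s3"}))  # False
```

Using a subset test means uploads to remotes that have since been deactivated neither help nor hurt; only the currently active set matters.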
The exact implementation details, thresholds, and deletion rules are maintained authoritatively in the Processor Service Documentation.
7. Implementation: FFmpeg Audio Engine (v0.4.0)
Status: Implemented (since v0.4.0) • See: ADR-0024 for the full architectural decision.
The Dual Stream output (Raw + Processed) is produced by a single FFmpeg subprocess managed by the Recorder service. FFmpeg handles ALSA capture, resampling, segmentation, and WAV encoding in native C code — the Python GIL never touches the audio path.
The Recorder's Python process manages FFmpeg's lifecycle and atomically promotes completed segments from .buffer/ to data/ via filesystem polling and os.replace().
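One polling pass of the promotion step might look like the sketch below. It is not the Recorder's implementation: the function name is hypothetical, and the rule "the newest file in the buffer is still being written" is an assumption used here to decide which segments are complete.

```python
import os
from pathlib import Path


def promote_completed(buffer_dir: Path, data_dir: Path) -> list[Path]:
    """Move finished segments from buffer_dir to data_dir; return the new paths."""
    segments = sorted(buffer_dir.glob("*.wav"))
    # Assumption: the newest file is still open in FFmpeg, so leave it alone.
    finished = segments[:-1]
    data_dir.mkdir(parents=True, exist_ok=True)
    promoted = []
    for src in finished:
        dst = data_dir / src.name
        # os.replace is atomic when both paths are on the same filesystem.
        os.replace(src, dst)
        promoted.append(dst)
    return promoted
```

A caller would invoke this in a loop with a short sleep between passes. Keeping .buffer/ and data/ on the same filesystem matters: across filesystems the rename can fail or stop being atomic, and readers of data/ could then observe partial files.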