Testing Guide¶

Status: Normative (Mandatory) · Scope: All Python packages and services

1. Test Markers¶

Every test function MUST have exactly one marker. Tests without a marker will be rejected in code review.

Marker	Included in `just ci`	Target duration (guideline)
`unit`	✅	< 1s/test
`integration`	✅	< 30s/test
`system`	✅	< 60s/test
`smoke`	✅	< 30s total
`e2e`	✅	< 60s/test
`system_hw_auto`	❌ Never	< 60s/test
`system_hw_manual`	❌ Never	< 60s/test

[!IMPORTANT] system_hw_auto and system_hw_manual tests are never included in CI or just ci. They require real USB microphone hardware. Run via just test-hw / just test-hw-manual.

2. Directory Structure¶

Test location MUST match the marker. Service-specific tests live inside the service package. Only cross-cutting tests (multi-service, stack-level) belong in root tests/.

Location	Markers
`packages/<pkg>/tests/unit/`	`@pytest.mark.unit`
`packages/<pkg>/tests/integration/`	`@pytest.mark.integration`
`services/<svc>/tests/unit/`	`@pytest.mark.unit`
`services/<svc>/tests/integration/`	`@pytest.mark.integration`
`tests/smoke/`	`@pytest.mark.smoke`
`tests/integration/`	`@pytest.mark.integration` (multi-service)
`tests/system/`	`.system`, `.system_hw_auto`, `.system_hw_manual`
`tests/e2e/`	`@pytest.mark.e2e`

[!IMPORTANT] Mixing markers in a single directory is FORBIDDEN. Exception: tests/system/ contains .system, .system_hw_auto and .system_hw_manual because they share Podman socket, DB, and hardware-config fixtures via a common conftest.py.

3. Running Tests¶

Individual Suites¶

just test-unit       # Unit tests only (no external deps)
just test-int        # Integration tests (Testcontainers)
just test-system     # System lifecycle tests (Podman + built images, no HW)
just test-hw         # Automated hardware tests (requires real USB microphone)
just test-hw-manual  # Interactive hardware tests (requires manual unplug/replug)
just test-hw-all     # All hardware tests (automated + manual)
just test-smoke      # Smoke tests (built images via Testcontainers)
just test-e2e        # End-to-end browser tests (Playwright)
just test            # Quick dev: Unit + Integration
just test-all        # All tests except hardware (Unit+Int+System+Smoke+E2E)
just test-cov-all    # Combined coverage map (Unit+Int+System+Smoke+E2E)

Quality Gates¶

just c               # Fast dev check (< 10s):
                     #   Lock + Ruff + Mypy + Unit Tests
just v               # Verify for push (~ 35s):
                     #   Fast Check + DB-Integration Tests
just ci              # Full CI pipeline (> 4m):
                     #   Lock → Audit → Lint → Type → Unit → Int
                     #   → Containerfile → Build → System → Smoke → E2E

When to Run What¶

Situation	Command	What it covers
During development	`just test`	Unit + Integration (quick feedback)
Before every commit	`just c`	Lint, types, unit tests (no containers)
Before push / PR	`just v`	Code Quality + Integration Tests
Thorough test run	`just test-all`	All test suites except hardware
Verify Full CI	`just ci`	Full 12-stage pipeline incl. build
Before release	`just ci`	All automated gates (see Release Checklist)
Release test audit	`just test-cov-all`	Combined coverage map for Changed-Path Audit
With USB mic connected	`just test-hw-all`	Real hardware detection + spawning

4. Writing Tests¶

Unit Tests¶

Use unittest.mock or pytest-mock for all external dependencies (DB, Redis, Podman, filesystem).
No network calls, no containers, no filesystem side-effects.
Each test should run in < 1 second.

Integration Tests¶

Use testcontainers for disposable PostgreSQL and Redis instances.
Do NOT rely on the Compose stack — integration tests must be self-contained.
Use polyfactory for generating Pydantic model instances as test data.

System Tests (`@pytest.mark.system`)¶

Test the full Controller lifecycle pipeline with real Podman but mocked hardware.
Each test gets its own isolated Podman network (silvasonic-test-{run_id}) via the system_network fixture.
Use testcontainers for DB + Redis (Controller tests), or podman_run() with the isolated network (Processor tests).
Mock /proc/asound/cards and sysfs to simulate device detection without hardware.
Skip gracefully when Podman socket is absent or images aren't built.
Tests cover: seeding, device scanning, profile matching, reconciliation, container start/stop, crash recovery.
Fully isolated from production — no shared network with just start.

Hardware System Tests (`@pytest.mark.system_hw_auto` / `.system_hw_manual`)¶

Test device detection pipeline with real USB microphone hardware.
Each test session gets its own isolated Podman network (silvasonic-hw-test-{id}) via the hw_redis fixture.
Require a USB-Audio device connected (e.g., UltraMic 384K).
Skip automatically when no USB-Audio device is detected.
Fully isolated via container network — but hardware-locked. Cannot run while just start is active (ALSA device requires exclusive access).
Never included in CI pipelines — run manually via just test-hw or just test-hw-manual.

Smoke Tests¶

Use testcontainers to start built container images in isolation (no just start needed).
Require images to be built first (just build or pipeline Stage 9).
Only test service health endpoints and basic connectivity (heartbeats in Redis).
Must be idempotent — running them multiple times produces the same result.
Do NOT test deep lifecycle behavior — that belongs in system tests.

E2E Tests¶

Use Playwright for browser automation.
Test user-facing flows through the Web-Interface.
Screenshots on failure for debugging.
Planned for v0.9.0+ when the Web-Interface has sufficient coverage.

5. "Vertrauensanker" End-to-End Tests¶

To guarantee that the entire cross-container data pipeline remains unbroken from ingestion to deep-learning inference, the CI suite includes two explicit "Vertrauensanker" (trust anchor) System Tests. These tests bypass unit-level mocking to confirm that real audio translates to real SQL insertions:

test_birdnet_full_pipeline.py (Marker: .system): Runs automatically in the CI pipeline. It boots the database and redis containers, then starts the recorder, processor, and birdnet containers sequentially. It feeds a fixed fixture WAV into the recorder using a mock FFmpeg loop (SILVASONIC_RECORDER_MOCK_SOURCE), enforcing deterministic ingestion without hardware. It verifies the complete data flow: chunk creation, database indexation, BirdNET ML inference, label thresholding, and clip generation.
test_hw_birdnet_full_pipeline.py (Marker: .system_hw_manual): An optional physical analog. Instead of mocking the file stream, it uses ffplay to output sound out of the workstation's physical speakers, capturing the waveform back through a connected USB microphone. This validates the host ALSA stack, hardware constraints, and the real-world acoustic transfer paths.

6. Test Quality & Anti-Patterns¶

Status: Normative (Mandatory) This section defines the qualitative boundaries for tests. It is especially critical for AI-generated code.

5.1 Anti-Patterns (What Tests Must NOT Do)¶

The following patterns are FORBIDDEN and will lead to test rejection: - Existence/Import Tests: Tests that only verify imports or whether a function exists without asserting observable behavior. - Trivial Equality: Tests that only assert constants or default values (unless the value itself is an explicit domain contract). - Call-Chain Mirroring: Tests that identically replicate internal ORM/framework logic or helper structures instead of testing visible behavior. Bad: assert mock_session.execute.call_args == select(Model).where(...) — mirrors the ORM query instead of testing the returned domain result. - Mock-Heavy Verification: Tests whose primary logic consists of setting up mocks rather than verifying domain logic. If mocking is substantially larger than the assertion, the test design is flawed. - Fragile Async Control: Async or loop tests that rely on brittle call_count checks, artificial CancelledError injections, or timing tricks when more robust alternatives exist.

5.2 Delete vs. Refactor Rule¶

Particularly for AI-generated tests, distinguish carefully between fixing and deleting: - DELETE if a test provides no clear business or architectural value. - DELETE if a test artificially inflates line coverage but would not catch a real regression. - REFACTOR if a test covers a valuable domain contract but is written in a brittle way.

5.3 Layer-Specific Quality Rules¶

Unit Tests (@pytest.mark.unit):
Must test the behavior of small units, not the implementation details.
Zero I/O, zero database access, zero framework internals.
Minimize mocking: use fakes or data structures where possible.
Integration Tests (@pytest.mark.integration):
Must use real PostgreSQL/Redis testcontainers. Mocking the database in integration tests is FORBIDDEN.
Must verify actual contracts between components and external dependencies.
System Tests (@pytest.mark.system):
Must focus on full lifecycle effects and state transitions.
Must assert end results, not internal call sequences.

5.4 Guidelines for AI Agents¶

Check Before Adding: Before an agent adds a new test, it must verify whether an existing higher-level test already covers the same failure space.
Prioritize Simplicity: AI-generated tests must favor simplicity and readability over exhaustive completeness.
Avoid Coverage-Driven Bloat: Do not generate tests solely to increase test coverage.
Document Intent: New tests must clearly state (via naming or docstrings) the specific behavior or regression they are safeguarding.

6. `ci` Pipeline Stages¶

The just ci command runs these stages in order:

Stage Name	Critical	Description
Lock-File Check	No	`uv lock --check`
Dep Audit	No	`pip-audit` (skipped by default in dev)
Containerfile Lint	No	Hadolint + `podman-compose config` validation
Ruff Lint	Yes	Linting + formatting
Mypy	Yes	Static type checking
Unit Tests	Yes	`@pytest.mark.unit` (parallel, coverage)
Integration Tests	Yes	`@pytest.mark.integration` (testcontainers)
Clear	Always	Clean workspace
Build Images	Always	`just build`
System Tests	Yes	`@pytest.mark.system` (real Podman, needs images)
Smoke Tests	Yes	`@pytest.mark.smoke` (testcontainers)
E2E Tests	Yes	`@pytest.mark.e2e` (Playwright)

7. Test Infrastructure¶

Tool	Purpose
`pytest`	Test runner
`pytest-xdist`	Parallel test execution (`-n` workers) — see §10
`testcontainers`	Disposable PostgreSQL + Redis for integration tests
`polyfactory`	Pydantic model factories for test data generation
`playwright`	Browser automation for E2E tests
`pytest-timeout`	Global timeout per test (default: 120s)
`pytest-asyncio`	Async test support (auto mode)

8. Naming Conventions¶

Element	Convention	Example
Test file	`test_<module>.py`	`test_controller.py`
Test class	`Test<Feature>`	`TestDeviceEvaluation`
Test function	`test_<behavior>`	`test_missing_profile_stays_pending`

Test names should describe the expected behavior, not the implementation detail.

9. Parallel Execution & Isolation¶

Every test level is fully isolated. Almost all combinations can run in parallel — with each other and with just start (production stack). Exception: system_hw_auto and system_hw_manual cannot run in parallel with just start due to the exclusive kernel-level lock on the ALSA audio device.

Isolation by Level¶

Level	Container Infra	Network	ALSA Device
`unit`	None	None	Mocked
`integration`	`testcontainers`	Random (auto)	Mocked
`smoke`	`testcontainers`	`smoke_network` (random)	Mocked
`system`	Podman CLI	`silvasonic-test-{run_id}` (per test)	Mocked
`system_hw_auto`	Podman CLI	`silvasonic-hw-test-{session_id}` (per session)	Locked (Exclusive)
`system_hw_manual`	Podman CLI	`silvasonic-hw-test-{session_id}` (per session)	Locked (Exclusive)
`just start`	Compose	`silvasonic-net` (fixed)	Locked (Exclusive)

Key Fixtures & Mechanisms¶

system_network fixture — creates per-test Podman network, prevents DNS alias collisions between parallel system tests.
hw_redis fixture — creates per-session Podman network for hardware tests.
TEST_RUN_ID (UUID per session) / run_id (UUID per test) — prevent container name collisions.
Owner label io.silvasonic.owner=controller-test-<ID> prevents just stop from removing test containers.
podman_run(network=...) and make_test_spec(..., network=...) require the network parameter (compile-time guarantee).

10. Parallel Workers & DB Cleanup¶

Worker counts and their environment variable overrides are defined in scripts/test.py — the single source of truth. Do not duplicate the defaults here.

Override example: SILVASONIC_INTEGRATION_WORKERS=8 just test-int

[!WARNING] System test hard ceiling: The rootless Podman socket is a shared bottleneck. Too many workers cause testcontainers to hit 60s read timeouts on the Podman API. See scripts/test.py for current limits.

Integration Test DB Cleanup¶

With pytest-xdist, each worker gets its own session-scoped postgres_container (via testcontainers). An autouse _clean_db_tables fixture calls a centralized clean_database helper from silvasonic-test-utils.

This helper dynamically queries the database for all application tables and truncates them using RESTART IDENTITY CASCADE. This automatically respects foreign key relationships and entirely removes the need to manually maintain cleanup lists when adding new tables to the schema.