Testing Guide

Status: Normative (Mandatory) · Scope: All Python packages and services

1. Test Markers

Every test function MUST have exactly one marker (AGENTS.md §6). Tests without a marker will be rejected in code review.

Marker	Description	External Deps	Typical Duration	In `check-all`
`unit`	Fast, isolated tests without external dependencies	None (mocks only)	< 1s per test	✅ Stage 6
`integration`	Tests with external services (DB, Redis)	Testcontainers / Compose	< 30s per test	✅ Stage 7
`system`	Full-stack lifecycle tests with real Podman	Podman socket + built images	< 60s per test	✅ Stage 10
`system_hw`	Hardware-dependent system tests	Podman + real USB microphone	< 60s per test	❌ Never
`smoke`	Health checks against built containers	Built images (testcontainers)	< 30s total	✅ Stage 11
`e2e`	Browser tests via Playwright	Full stack + Playwright	< 60s per test	✅ Stage 12

[!IMPORTANT] `system_hw` tests are never included in CI or `just check-all`. They require real USB microphone hardware and must be run manually via `just test-hw`.
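
The "exactly one marker" rule can also be checked mechanically. The following is a minimal sketch, not the project's actual tooling: `find_marker_violations` and `fake_item` are hypothetical names, and the function only assumes that each collected item exposes `nodeid` and `iter_markers()`, as pytest items do. Other markers (for example `parametrize`) are ignored.

```python
from collections.abc import Iterable
from types import SimpleNamespace

# The six suite markers from the marker table above.
SUITE_MARKERS = frozenset({"unit", "integration", "system", "system_hw", "smoke", "e2e"})


def find_marker_violations(items: Iterable) -> list[str]:
    """Return one message per test that does not carry exactly one suite marker."""
    violations = []
    for item in items:
        found = sorted({m.name for m in item.iter_markers()} & SUITE_MARKERS)
        if len(found) != 1:
            violations.append(f"{item.nodeid}: expected exactly one suite marker, found {found}")
    return violations


def fake_item(nodeid: str, *marker_names: str) -> SimpleNamespace:
    """Hypothetical stand-in for a collected pytest item, for demonstration only."""
    markers = [SimpleNamespace(name=name) for name in marker_names]
    return SimpleNamespace(nodeid=nodeid, iter_markers=lambda: iter(markers))
```

A `pytest_collection_modifyitems` hook in a `conftest.py` could call such a function on the collected items and abort the run when the returned list is non-empty.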

2. Directory Structure

Test location MUST match the marker. Service-specific tests live inside the service package. Only cross-cutting tests (multi-service interactions, stack-level health) belong in the root `tests/` directory.

packages/<pkg>/tests/
    unit/           # @pytest.mark.unit
    integration/    # @pytest.mark.integration

services/<svc>/tests/
    unit/           # @pytest.mark.unit
    integration/    # @pytest.mark.integration

tests/                # Cross-cutting tests only
    smoke/          # @pytest.mark.smoke — stack health checks
    integration/    # @pytest.mark.integration — multi-service
    system/         # @pytest.mark.system — full-stack lifecycle (Podman)
                    # @pytest.mark.system_hw — hardware system tests
    e2e/            # @pytest.mark.e2e — browser tests (Playwright, v0.9.0+)

[!IMPORTANT] A test file in a `tests/unit/` directory MUST contain only `@pytest.mark.unit` tests. Mixing markers in a single directory is FORBIDDEN. Exception: `tests/system/` contains both `@pytest.mark.system` and `@pytest.mark.system_hw` tests because they share Podman socket, DB, and hardware-config fixtures via a common `conftest.py`.
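
The directory-to-marker mapping above can be expressed as a small lookup. This is a sketch under the assumption that the suite directory is always the path component directly after the last `tests` component; `allowed_markers_for` is a hypothetical helper, not an existing project function.

```python
from pathlib import PurePosixPath

# Suite directory under tests/ -> markers allowed there (taken from the layout above).
ALLOWED_MARKERS = {
    "unit": {"unit"},
    "integration": {"integration"},
    "smoke": {"smoke"},
    "system": {"system", "system_hw"},  # the documented exception
    "e2e": {"e2e"},
}


def allowed_markers_for(test_file: str) -> set[str]:
    """Return the markers permitted for a test file, based on its directory."""
    parts = PurePosixPath(test_file).parts
    if "tests" not in parts:
        raise ValueError(f"{test_file} is not inside a tests/ directory")
    # Locate the last "tests" component; the suite directory follows it.
    index = len(parts) - 1 - parts[::-1].index("tests")
    suite = parts[index + 1] if index + 1 < len(parts) else ""
    if suite not in ALLOWED_MARKERS:
        raise ValueError(f"{test_file}: unknown suite directory {suite!r}")
    return ALLOWED_MARKERS[suite]
```

For example, `allowed_markers_for("services/controller/tests/unit/test_controller.py")` yields only `unit`, while a file directly under `tests/` is rejected.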

3. Running Tests

Individual Suites

just test-unit       # Unit tests only (no external deps)
just test-int        # Integration tests (Testcontainers)
just test-system     # System lifecycle tests (Podman + built images, no HW)
just test-hw         # Hardware system tests (requires real USB microphone)
just test-smoke      # Smoke tests (built images via Testcontainers)
just test-e2e        # End-to-end browser tests (Playwright)
just test            # Quick dev: Unit + Integration
just test-all        # All tests except hardware (Unit+Int+System+Smoke+E2E)
just test-cov-all    # Combined coverage map (Unit+Int+System+Smoke+E2E)

Quality Gates

just check           # Fast dev check (4 stages):
                     #   Lock + Ruff + Mypy + Unit Tests
just check-all       # Full CI pipeline (12 stages):
                     #   Lock → Audit → Containerfile → Lint → Type → Unit → Int
                     #   → Clear → Build → System → Smoke → E2E

When to Run What

Situation	Command	What it covers
During development	`just test`	Unit + Integration (quick feedback)
Before every commit	`just check`	Lock check, lint, types, unit tests (no containers)
Thorough test run	`just test-all`	All test suites except hardware
Before push / PR	`just check-all`	Full 12-stage pipeline incl. build
Before release	`just check-all`	All automated gates (see Release Checklist)
Release test audit	`just test-cov-all`	Combined coverage map for Changed-Path Audit
With USB mic connected	`just test-hw`	Real hardware detection + spawning

4. Writing Tests

Unit Tests

Use unittest.mock or pytest-mock for all external dependencies (DB, Redis, Podman, filesystem).
No network calls, no containers, no filesystem side-effects.
Each test should run in < 1 second.
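
As an illustration of these rules, the sketch below replaces a Redis client with a `unittest.mock.Mock`. The function `is_service_alive` and the `heartbeat:<service>` key format are hypothetical and exist only for this example; running it requires `pytest` to be installed.

```python
from unittest.mock import Mock

import pytest


def is_service_alive(redis_client, service: str) -> bool:
    """Hypothetical unit under test: a service is alive if its heartbeat key exists."""
    return redis_client.get(f"heartbeat:{service}") is not None


@pytest.mark.unit
def test_service_without_heartbeat_is_not_alive() -> None:
    redis_client = Mock()
    redis_client.get.return_value = None  # simulate a missing key

    assert is_service_alive(redis_client, "controller") is False


@pytest.mark.unit
def test_service_with_heartbeat_is_alive() -> None:
    redis_client = Mock()
    redis_client.get.return_value = b"2024-01-01T00:00:00Z"

    assert is_service_alive(redis_client, "controller") is True
```

Both tests assert on the returned value rather than on how the mock was called, which keeps the mock setup smaller than the behavior being verified.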

Integration Tests

Use testcontainers for disposable PostgreSQL and Redis instances.
Do NOT rely on the Compose stack — integration tests must be self-contained.
Use polyfactory for generating Pydantic model instances as test data.

System Tests (`@pytest.mark.system`)

Test the full Controller lifecycle pipeline with real Podman but mocked hardware.
Each test gets its own isolated Podman network (`silvasonic-test-{run_id}`) via the `system_network` fixture.
Use `testcontainers` for DB + Redis (Controller tests), or `podman_run()` with the isolated network (Processor tests).
Mock `/proc/asound/cards` and sysfs to simulate device detection without hardware.
Skip gracefully when the Podman socket is absent or images aren't built.
Tests cover: seeding, device scanning, profile matching, reconciliation, container start/stop, crash recovery.
Fully isolated from production: no shared network with `just start`.
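
One way to simulate device detection without hardware is to let the scanner take the path of the cards file as a parameter, so a test can point it at a fake file. The sketch below assumes this design; `list_usb_audio_cards` is a hypothetical scanner, not the Controller's actual code, and the sample devices are made up in the format of `/proc/asound/cards`.

```python
import tempfile
from pathlib import Path

# Sample content in the format of /proc/asound/cards; the devices are made up.
FAKE_CARDS = """\
 0 [PCH            ]: HDA-Intel - HDA Intel PCH
                      HDA Intel PCH at 0xf7f10000 irq 33
 1 [r4             ]: USB-Audio - UltraMic384K 16bit r4
                      DODOTRONIC UltraMic384K 16bit r4 at usb-0000:00:14.0-2, full speed
"""


def list_usb_audio_cards(cards_file: Path) -> list[str]:
    """Hypothetical scanner: return the names of USB-Audio cards listed in cards_file."""
    names = []
    for line in cards_file.read_text().splitlines():
        # Header lines look like " 1 [id ]: USB-Audio - <name>".
        _, sep, rest = line.partition("]: ")
        if sep and rest.startswith("USB-Audio - "):
            names.append(rest.removeprefix("USB-Audio - ").strip())
    return names


def scan_fake_cards(content: str) -> list[str]:
    """Write content to a temporary file and run the scanner against it."""
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp) / "cards"
        fake.write_text(content)
        return list_usb_audio_cards(fake)
```

Inside a pytest test, the built-in `tmp_path` fixture could replace the manual temporary directory.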

Hardware System Tests (`@pytest.mark.system_hw`)

Test device detection pipeline with real USB microphone hardware.
Each test session gets its own isolated Podman network (`silvasonic-hw-test-{id}`) via the `hw_redis` fixture.
Require a USB-Audio device connected (e.g., UltraMic 384K).
Skip automatically when no USB-Audio device is detected.
Fully isolated from production: can run while `just start` is active.
Never included in CI pipelines; run manually via `just test-hw`.

Smoke Tests

Use `testcontainers` to start built container images in isolation (no `just start` needed).
Require images to be built first (`just build` or pipeline Stage 9).
Only test service health endpoints and basic connectivity (heartbeats in Redis).
Must be idempotent — running them multiple times produces the same result.
Do NOT test deep lifecycle behavior — that belongs in system tests.

E2E Tests

Use Playwright for browser automation.
Test user-facing flows through the Web-Interface.
Capture screenshots on failure for debugging.
Planned for v0.9.0+ when the Web-Interface has sufficient coverage.

5. Test Quality & Anti-Patterns

Status: Normative (Mandatory)

This section defines the qualitative boundaries for tests. It is especially critical for AI-generated code.

5.1 Anti-Patterns (What Tests Must NOT Do)

The following patterns are FORBIDDEN and will lead to test rejection:
- Existence/Import Tests: Tests that only verify imports or whether a function exists without asserting observable behavior.
- Trivial Equality: Tests that only assert constants or default values (unless the value itself is an explicit domain contract).
- Call-Chain Mirroring: Tests that identically replicate internal ORM/framework logic or helper structures instead of testing visible behavior.
- Mock-Heavy Verification: Tests whose primary logic consists of setting up mocks rather than verifying domain logic. If mocking is substantially larger than the assertion, the test design is flawed.
- Fragile Async Control: Async or loop tests that rely on brittle `call_count` checks, artificial `CancelledError` injections, or timing tricks when more robust alternatives exist.
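
For the fragile async control anti-pattern, one more robust alternative is to drive a loop through its observable output and stop it with an explicit signal. The sketch below assumes a loop that accepts an `asyncio.Event` for shutdown; `poll_until_stopped` and `first_ticks` are hypothetical names used only for illustration.

```python
import asyncio


async def poll_until_stopped(queue: asyncio.Queue, stop: asyncio.Event) -> None:
    """Hypothetical loop under test: emit an increasing tick until asked to stop."""
    tick = 0
    while not stop.is_set():
        tick += 1
        await queue.put(tick)
        await asyncio.sleep(0)  # yield control to other tasks


async def first_ticks(count: int) -> list[int]:
    """Collect the loop's first outputs, then stop it explicitly."""
    queue: asyncio.Queue = asyncio.Queue()
    stop = asyncio.Event()
    task = asyncio.create_task(poll_until_stopped(queue, stop))
    try:
        # Wait for observable results; the timeout is only a safety net.
        ticks = [await asyncio.wait_for(queue.get(), timeout=1.0) for _ in range(count)]
    finally:
        stop.set()
        await asyncio.wait_for(task, timeout=1.0)
    return ticks
```

A test can then assert on the returned values, with no `call_count` checks and no injected `CancelledError`.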

5.2 Delete vs. Refactor Rule

Particularly for AI-generated tests, distinguish carefully between fixing and deleting:
- DELETE if a test provides no clear business or architectural value.
- DELETE if a test artificially inflates line coverage but would not catch a real regression.
- REFACTOR if a test covers a valuable domain contract but is written in a brittle way.

5.3 Layer-Specific Quality Rules

Unit Tests (`@pytest.mark.unit`):
- Must test the behavior of small units, not the implementation details.
- Zero I/O, zero database access, zero framework internals.
- Minimize mocking: use fakes or data structures where possible.

Integration Tests (`@pytest.mark.integration`):
- Must use real PostgreSQL/Redis testcontainers. Mocking the database in integration tests is FORBIDDEN.
- Must verify actual contracts between components and external dependencies.

System Tests (`@pytest.mark.system`):
- Must focus on full lifecycle effects and state transitions.
- Must assert end results, not internal call sequences.

5.4 Guidelines for AI Agents

Prioritize Simplicity: AI-generated tests must favor simplicity and readability over exhaustive completeness.
Avoid Coverage-Driven Bloat: Do not generate tests solely to increase test coverage.
Check Before Adding: Before an agent adds a new test, it must verify whether an existing higher-level test already covers the same failure space.
Document Intent: New tests must clearly state (via naming or docstrings) the specific behavior or regression they are safeguarding.

6. `check-all` Pipeline Stages

The `just check-all` command runs 12 stages in order:

Stage	Name	Critical	Description
1	Lock-File Check	No	`uv lock --check`
2	Dep Audit	No	`pip-audit` (skipped by default in dev)
3	Containerfile Lint	No	Hadolint + `podman-compose config` validation
4	Ruff Lint	Yes	Linting + formatting
5	Mypy	Yes	Static type checking
6	Unit Tests	Yes	`@pytest.mark.unit` (parallel, coverage)
7	Integration Tests	Yes	`@pytest.mark.integration` (testcontainers)
8	Clear	Always	Clean workspace
9	Build Images	Always	`just build`
10	System Tests	Yes	`@pytest.mark.system` (real Podman, needs images)
11	Smoke Tests	Yes	`@pytest.mark.smoke` (testcontainers)
12	E2E Tests	Yes	`@pytest.mark.e2e` (Playwright)

[!NOTE] `system_hw` tests are intentionally excluded from this pipeline. Run `just test-hw` separately when hardware is available.

7. Test Infrastructure

Tool	Purpose
`pytest`	Test runner
`pytest-xdist`	Parallel test execution (`-n` workers) — see §10
`testcontainers`	Disposable PostgreSQL + Redis for integration tests
`polyfactory`	Pydantic model factories for test data generation
`playwright`	Browser automation for E2E tests
`pytest-timeout`	Global timeout per test (default: 120s)
`pytest-asyncio`	Async test support (auto mode)

8. Naming Conventions

Element	Convention	Example
Test file	`test_<module>.py`	`test_controller.py`
Test class	`Test<Feature>`	`TestDeviceEvaluation`
Test function	`test_<behavior>`	`test_missing_profile_stays_pending`

Test names should describe the expected behavior, not the implementation detail.

9. Parallel Execution & Isolation

Every test level is fully isolated. All combinations can run in parallel, both with each other and with `just start` (production stack).

Isolation by Level

Level	Container Infra	Network	Ports	Parallel-safe?	Safe vs. `just start`?
`unit`	None	None	None	✅	✅
`integration`	`testcontainers`	Random (auto)	Random	✅	✅
`smoke`	`testcontainers`	`smoke_network` (random)	Random	✅	✅
`system`	Podman CLI	`silvasonic-test-{run_id}` (per test)	Random	✅	✅
`system_hw`	Podman CLI	`silvasonic-hw-test-{session_id}` (per session)	Random	✅	✅
`just start`	Compose	`silvasonic-net`	Fixed	—	✅

All Combinations: ✅ Safe

Suite A	Suite B	Why it's safe
`just test-unit`	anything	Pure in-process, no containers, no Podman
`just test-int`	anything	`testcontainers`: ephemeral containers, random ports, own networks
`just test-smoke`	anything	`testcontainers`: own `smoke_network`, distinct aliases (`test-database`, `test-redis`)
`just test-system`	anything	Per-test `silvasonic-test-{run_id}` network via `system_network` fixture
`just test-hw`	anything	Per-session `silvasonic-hw-test-{id}` network via `hw_redis` fixture
`just test-system`	`just start`	✅ No shared network — test and prod are fully separated
`just test-hw`	`just start`	✅ No shared network — test and prod are fully separated
`just stop`	any test	`stop.py` filters `owner=controller` (exact match); tests use `owner=controller-test-*`
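
The last row depends on an exact label comparison. A minimal sketch of such a filter follows; it is not the actual `stop.py`. The container dictionaries and the function name `containers_to_stop` are assumptions made for illustration, and only the `io.silvasonic.owner` label and its values come from this guide.

```python
OWNER_LABEL = "io.silvasonic.owner"


def containers_to_stop(containers: list[dict], owner: str = "controller") -> list[str]:
    """Return names of containers whose owner label equals `owner` exactly."""
    return [
        c["name"]
        for c in containers
        if c.get("labels", {}).get(OWNER_LABEL) == owner  # exact match, not a prefix test
    ]


# Hypothetical container listing: one production container, one test container.
sample = [
    {"name": "prod-worker", "labels": {OWNER_LABEL: "controller"}},
    {"name": "test-worker", "labels": {OWNER_LABEL: "controller-test-ab12"}},
    {"name": "unlabelled"},
]
selected = containers_to_stop(sample)
```

With a prefix comparison such as `startswith("controller")`, the test container would be selected as well, which is the failure mode the exact match avoids.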

Isolation Mechanisms

Mechanism	Scope	Protects against
`system_network` fixture	Per test function	DNS alias collisions between parallel system tests
`hw_redis` fixture	Per session	DNS alias collisions between HW tests and prod
`TEST_RUN_ID` (UUID per session)	Per session	Container name collisions between parallel runs
`run_id` (UUID per test)	Per test function	Container name collisions within a single session
`io.silvasonic.owner=controller-test-<ID>`	Per session	`just stop` accidentally removing test containers
`testcontainers.Network()`	Per session	Integration/smoke tests vs. everything else
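`podman_run(network=...)` (required, no default)	Per call	Guarantee that no caller forgets the network: omitting it fails type checking (Mypy) and raises `TypeError` at call time
`make_test_spec(..., network=...)` (required kw)	Per call	Guarantee that no caller forgets the network: omitting it fails type checking (Mypy) and raises `TypeError` at call time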
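
The effect of a required network argument can be shown with a keyword-only parameter that has no default. This is a sketch only: `podman_run_args` and `network_name` are hypothetical helpers rather than the project's `podman_run()`, and the 8-character hexadecimal run ID is an assumed format.

```python
import uuid


def network_name(run_id: str) -> str:
    """Build a per-test network name following the silvasonic-test-{run_id} pattern."""
    return f"silvasonic-test-{run_id}"


def podman_run_args(image: str, *, network: str) -> list[str]:
    """Hypothetical helper: build a `podman run` command line.

    `network` is keyword-only and has no default, so a call that omits it
    raises TypeError and is also flagged by a static type checker.
    """
    return ["podman", "run", "--detach", "--network", network, image]


run_id = uuid.uuid4().hex[:8]  # assumed ID format, for illustration only
args = podman_run_args("example-image", network=network_name(run_id))
```

Calling `podman_run_args("example-image")` without `network=` fails immediately instead of silently attaching the container to a default network.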

[!NOTE] The production stack guard (`_abort_if_prod_running()`) was removed. It is no longer needed because all system and hardware tests create their own isolated Podman networks and never share `silvasonic-net` with the production Compose stack.

10. Parallel Workers & Configuration

All `just test-*` commands delegate to `scripts/test.py`, the single source of truth for pytest invocations, worker counts, and coverage arguments.

Worker Defaults

Worker counts and their environment variable overrides are defined as constants at the top of `scripts/test.py` (`UNIT_WORKERS`, `INTEGRATION_WORKERS`, `SYSTEM_WORKERS`). The defaults are deliberately not repeated here; refer to that file for current values.

Override via environment variables: `SILVASONIC_UNIT_WORKERS`, `SILVASONIC_INTEGRATION_WORKERS`, `SILVASONIC_SYSTEM_WORKERS`.

# Temporary override for a single run
SILVASONIC_INTEGRATION_WORKERS=8 just test-int

# Or export for the session
export SILVASONIC_UNIT_WORKERS=12
just check
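
How such an override might be resolved is sketched below. This is not the code in `scripts/test.py`: `resolve_workers` is a hypothetical helper, and the default passed by the caller is a placeholder rather than a project value.

```python
import os
from collections.abc import Mapping


def resolve_workers(env_var: str, default: int, env: Mapping[str, str] = os.environ) -> int:
    """Return the worker count from env_var, falling back to default when unset."""
    raw = env.get(env_var)
    if raw is None:
        return default
    workers = int(raw)  # raises ValueError on non-numeric input
    if workers < 1:
        raise ValueError(f"{env_var} must be >= 1, got {workers}")
    return workers
```

Passing the environment as a parameter keeps the function testable without modifying `os.environ`.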

[!WARNING] System test hard ceiling: ~6–7 workers. The rootless Podman socket is a shared bottleneck. At 8 workers, testcontainers hits 60s read timeouts on the Podman API, causing most tests to SKIP or ERROR.

Integration Test DB Cleanup

With `pytest-xdist`, each worker gets its own session-scoped `postgres_container` (via `testcontainers`). Tests on the same worker share that DB and run sequentially. An autouse `_clean_db_tables` fixture (in each `conftest.py`) deletes all application rows between tests in FK-safe order, ensuring no cross-test contamination.

Affected `conftest.py` files:
- `services/processor/tests/integration/conftest.py`
- `services/controller/tests/integration/conftest.py`
- `tests/integration/conftest.py`

[!IMPORTANT] When adding new tables to the database schema, you MUST add them to the `_CLEANUP_TABLES` tuple in each `conftest.py` above, respecting FK order (children before parents).
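
Why the order matters can be seen in a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for the PostgreSQL test container, and the table names `recordings` and `devices` are hypothetical; only the `_CLEANUP_TABLES` name and the children-before-parents rule come from this guide.

```python
import sqlite3

# Hypothetical tables: "recordings" references "devices", so it is listed first.
_CLEANUP_TABLES = ("recordings", "devices")


def clean_tables(conn: sqlite3.Connection) -> None:
    """Delete all rows, children before parents, so no FK constraint is violated."""
    for table in _CLEANUP_TABLES:
        conn.execute(f"DELETE FROM {table}")  # names come from the fixed tuple above
    conn.commit()


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database with one parent row and one child row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE recordings ("
        "id INTEGER PRIMARY KEY, "
        "device_id INTEGER NOT NULL REFERENCES devices(id))"
    )
    conn.execute("INSERT INTO devices (id) VALUES (1)")
    conn.execute("INSERT INTO recordings (id, device_id) VALUES (1, 1)")
    conn.commit()
    return conn
```

Deleting from `devices` first in this setup raises `sqlite3.IntegrityError`, which mirrors the failure a wrongly ordered `_CLEANUP_TABLES` tuple would cause.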