Testing Guide
Status: Normative (Mandatory) · Scope: All Python packages and services
1. Test Markers
Every test function MUST have exactly one marker (AGENTS.md §6). Tests without a marker will be rejected in code review.
| Marker | Description | External Deps | Typical Duration | In `check-all` |
|---|---|---|---|---|
| `unit` | Fast, isolated tests without external dependencies | None (mocks only) | < 1s per test | ✅ Stage 6 |
| `integration` | Tests with external services (DB, Redis) | Testcontainers / Compose | < 30s per test | ✅ Stage 7 |
| `system` | Full-stack lifecycle tests with real Podman | Podman socket + built images | < 60s per test | ✅ Stage 10 |
| `system_hw` | Hardware-dependent system tests | Podman + real USB microphone | < 60s per test | ❌ Never |
| `smoke` | Health checks against built containers | Built images (testcontainers) | < 30s total | ✅ Stage 11 |
| `e2e` | Browser tests via Playwright | Full stack + Playwright | < 60s per test | ✅ Stage 12 |
[!IMPORTANT]
`system_hw` tests are never included in CI or `just check-all`. They require real USB microphone hardware and must be run manually via `just test-hw`.
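The "exactly one marker" rule can be checked mechanically. The following is a minimal sketch, not the project's actual tooling: `find_marker_violations` and `markers_of` are hypothetical helpers, the rule is interpreted here as "exactly one of the six suite markers from the table", and only function-level `@pytest.mark.<name>` decorators are inspected (module- or class-level `pytestmark` is ignored).

```python
import ast

# Suite markers taken from the table above.
ALLOWED_MARKERS = {"unit", "integration", "system", "system_hw", "smoke", "e2e"}


def markers_of(func: ast.AST) -> list:
    """Return the names of all @pytest.mark.<name> decorators on a function node."""
    found = []
    for dec in func.decorator_list:
        # Unwrap calls such as @pytest.mark.unit(...) down to the attribute.
        target = dec.func if isinstance(dec, ast.Call) else dec
        if (
            isinstance(target, ast.Attribute)
            and isinstance(target.value, ast.Attribute)
            and target.value.attr == "mark"
            and isinstance(target.value.value, ast.Name)
            and target.value.value.id == "pytest"
        ):
            found.append(target.attr)
    return found


def find_marker_violations(source: str) -> dict:
    """Map each offending test function name to the suite markers it carries."""
    violations = {}
    for node in ast.walk(ast.parse(source)):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name.startswith("test_"):
            suite_markers = [m for m in markers_of(node) if m in ALLOWED_MARKERS]
            if len(suite_markers) != 1:
                violations[node.name] = suite_markers
    return violations


SAMPLE = '''
import pytest

@pytest.mark.unit
def test_ok(): ...

def test_unmarked(): ...

@pytest.mark.unit
@pytest.mark.integration
def test_two_markers(): ...
'''

print(find_marker_violations(SAMPLE))
```

Other decorators such as `@pytest.mark.parametrize` are deliberately not counted, on the assumption that the rule concerns suite markers only.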
2. Directory Structure
Test location MUST match the marker. Service-specific tests live inside the service package. Only cross-cutting tests (multi-service interactions, stack-level health) belong in the root tests/ directory.
packages/<pkg>/tests/
    unit/            # @pytest.mark.unit
    integration/     # @pytest.mark.integration
services/<svc>/tests/
    unit/            # @pytest.mark.unit
    integration/     # @pytest.mark.integration
tests/               # Cross-cutting tests only
    smoke/           # @pytest.mark.smoke: stack health checks
    integration/     # @pytest.mark.integration: multi-service
    system/          # @pytest.mark.system: full-stack lifecycle (Podman)
                     # @pytest.mark.system_hw: hardware system tests
    e2e/             # @pytest.mark.e2e: browser tests (Playwright, v0.9.0+)
[!IMPORTANT] A test file in `tests/unit/` MUST contain only `@pytest.mark.unit` tests. Mixing markers in a single directory is FORBIDDEN. Exception: `tests/system/` contains both `@pytest.mark.system` and `@pytest.mark.system_hw` tests because they share Podman socket, DB, and hardware-config fixtures via a common `conftest.py`.
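The location rule amounts to a lookup from directory name to permitted markers. A minimal sketch, assuming a hypothetical helper `allowed_markers` that is not part of the repository; for brevity it accepts any suite directory under any `tests/` folder and does not enforce that package and service trees hold only `unit/` and `integration/`.

```python
from pathlib import PurePosixPath

# Directory name -> markers allowed inside it, per the layout above.
ALLOWED_BY_DIR = {
    "unit": {"unit"},
    "integration": {"integration"},
    "smoke": {"smoke"},
    "system": {"system", "system_hw"},  # the documented exception
    "e2e": {"e2e"},
}


def allowed_markers(test_file: str) -> set:
    """Return the markers permitted for a test file, based on its directory."""
    parts = PurePosixPath(test_file).parts
    # Find the suite directory that directly follows a "tests" component.
    for parent, child in zip(parts, parts[1:]):
        if parent == "tests" and child in ALLOWED_BY_DIR:
            return ALLOWED_BY_DIR[child]
    raise ValueError(f"{test_file} is not inside a known tests/<suite>/ directory")


print(allowed_markers("services/controller/tests/unit/test_controller.py"))
print(sorted(allowed_markers("tests/system/test_lifecycle.py")))
```

The file names in the usage lines are illustrative only.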
3. Running Tests
Individual Suites
just test-unit # Unit tests only (no external deps)
just test-int # Integration tests (Testcontainers)
just test-system # System lifecycle tests (Podman + built images, no HW)
just test-hw # Hardware system tests (requires real USB microphone)
just test-smoke # Smoke tests (built images via Testcontainers)
just test-e2e # End-to-end browser tests (Playwright)
just test # Quick dev: Unit + Integration
just test-all # All tests except hardware (Unit+Int+System+Smoke+E2E)
just test-cov-all # Combined coverage map (Unit+Int+System+Smoke+E2E)
Quality Gates
just check # Fast dev check (4 stages):
# Lock + Ruff + Mypy + Unit Tests
just check-all # Full CI pipeline (12 stages):
# Lock → Audit → Containerfile → Lint → Type → Unit → Int
# → Clear → Build → System → Smoke → E2E
When to Run What
| Situation | Command | What it covers |
|---|---|---|
| During development | `just test` | Unit + Integration (quick feedback) |
| Before every commit | `just check` | Lint, types, unit tests (no containers) |
| Thorough test run | `just test-all` | All test suites except hardware |
| Before push / PR | `just check-all` | Full 12-stage pipeline incl. build |
| Before release | `just check-all` | All automated gates (see Release Checklist) |
| Release test audit | `just test-cov-all` | Combined coverage map for Changed-Path Audit |
| With USB mic connected | `just test-hw` | Real hardware detection + spawning |
4. Writing Tests
Unit Tests
- Use `unittest.mock` or `pytest-mock` for all external dependencies (DB, Redis, Podman, filesystem).
- No network calls, no containers, no filesystem side-effects.
- Each test should run in < 1 second.
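As an illustration of these rules, here is a small sketch of a unit test that replaces Redis with a `unittest.mock.Mock`. The function `publish_heartbeat`, the key format, and the 30-second TTL are invented for the example and are not taken from the codebase. In the real suite the test would also carry `@pytest.mark.unit`; the decorator is omitted here so the snippet runs with the standard library alone.

```python
from unittest.mock import Mock


def publish_heartbeat(redis_client, service: str) -> str:
    """Hypothetical unit under test: write an expiring heartbeat key, return the key."""
    key = f"heartbeat:{service}"
    redis_client.set(key, "alive", ex=30)
    return key


def test_publish_heartbeat_writes_expiring_key():
    """The heartbeat must be stored under the service's key and must expire."""
    fake_redis = Mock()  # stands in for a Redis client; no network involved

    key = publish_heartbeat(fake_redis, "controller")

    assert key == "heartbeat:controller"
    fake_redis.set.assert_called_once_with("heartbeat:controller", "alive", ex=30)


test_publish_heartbeat_writes_expiring_key()
```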
Integration Tests
- Use `testcontainers` for disposable PostgreSQL and Redis instances.
- Do NOT rely on the Compose stack: integration tests must be self-contained.
- Use `polyfactory` for generating Pydantic model instances as test data.
System Tests (@pytest.mark.system)
- Test the full Controller lifecycle pipeline with real Podman but mocked hardware.
- Each test gets its own isolated Podman network (`silvasonic-test-{run_id}`) via the `system_network` fixture.
- Use `testcontainers` for DB + Redis (Controller tests), or `podman_run()` with the isolated network (Processor tests).
- Mock `/proc/asound/cards` and sysfs to simulate device detection without hardware.
- Skip gracefully when Podman socket is absent or images aren't built.
- Tests cover: seeding, device scanning, profile matching, reconciliation, container start/stop, crash recovery.
- Fully isolated from production: no shared network with `just start`.
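To make the "mocked hardware" idea concrete, the sketch below feeds a fake `/proc/asound/cards` file to a scanner. `list_usb_audio_cards` is a hypothetical stand-in for the real detection code, and the card listing is made-up sample data in the usual ALSA layout; the project's actual fixtures may inject the fake path differently.

```python
import re
import tempfile
from pathlib import Path

# Matches card header lines such as " 1 [r4   ]: USB-Audio - UltraMic384K 16bit r4".
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.+?)\s*\]:\s+(\S+)\s+-\s+(.+)$")


def list_usb_audio_cards(cards_path: Path) -> list:
    """Hypothetical scanner: return (index, name) for every USB-Audio card listed."""
    cards = []
    for line in cards_path.read_text().splitlines():
        match = CARD_LINE.match(line)
        if match and match.group(3) == "USB-Audio":
            cards.append((int(match.group(1)), match.group(4).strip()))
    return cards


# Made-up sample content; indented continuation lines are ignored by the regex.
FAKE_CARDS = """\
 0 [PCH            ]: HDA-Intel - HDA Intel PCH
                      HDA Intel PCH at 0xf7f10000 irq 33
 1 [r4             ]: USB-Audio - UltraMic384K 16bit r4
                      Dodotronic UltraMic384K 16bit r4 at usb-0000:00:14.0-2, full speed
"""

with tempfile.TemporaryDirectory() as tmp:
    fake_path = Path(tmp) / "cards"
    fake_path.write_text(FAKE_CARDS)
    detected = list_usb_audio_cards(fake_path)

print(detected)
```

Passing the path as a parameter is one way to keep the scanner testable without patching the real `/proc` tree.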
Hardware System Tests (@pytest.mark.system_hw)
- Test the device detection pipeline with real USB microphone hardware.
- Each test session gets its own isolated Podman network (`silvasonic-hw-test-{id}`) via the `hw_redis` fixture.
- Require a USB-Audio device connected (e.g., UltraMic 384K).
- Skip automatically when no USB-Audio device is detected.
- Fully isolated from production: can run while `just start` is active.
- Never included in CI pipelines: run manually via `just test-hw`.
Smoke Tests
- Use `testcontainers` to start built container images in isolation (no `just start` needed).
- Require images to be built first (`just build` or pipeline Stage 9).
- Only test service health endpoints and basic connectivity (heartbeats in Redis).
- Must be idempotent: running them multiple times produces the same result.
- Do NOT test deep lifecycle behavior: that belongs in `system` tests.
E2E Tests
- Use Playwright for browser automation.
- Test user-facing flows through the Web-Interface.
- Capture screenshots on failure for debugging.
- Planned for v0.9.0+ when the Web-Interface has sufficient coverage.
5. Test Quality & Anti-Patterns
Status: Normative (Mandatory)
This section defines the qualitative boundaries for tests. It is especially critical for AI-generated code.
5.1 Anti-Patterns (What Tests Must NOT Do)
The following patterns are FORBIDDEN and will lead to test rejection:
- Existence/Import Tests: Tests that only verify imports or whether a function exists without asserting observable behavior.
- Trivial Equality: Tests that only assert constants or default values (unless the value itself is an explicit domain contract).
- Call-Chain Mirroring: Tests that identically replicate internal ORM/framework logic or helper structures instead of testing visible behavior.
- Mock-Heavy Verification: Tests whose primary logic consists of setting up mocks rather than verifying domain logic. If mocking is substantially larger than the assertion, the test design is flawed.
- Fragile Async Control: Async or loop tests that rely on brittle call_count checks, artificial CancelledError injections, or timing tricks when more robust alternatives exist.
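For the last anti-pattern, one more robust alternative is an explicit stop signal combined with an assertion on the end result. The sketch below assumes a hypothetical `poll_loop` worker; the names and the shutdown mechanism are illustrative, not the project's actual loop design.

```python
import asyncio


async def poll_loop(fetch, handle, stop: asyncio.Event) -> None:
    """Hypothetical worker loop: fetch and handle items until `stop` is set."""
    while not stop.is_set():
        item = await fetch()
        if item is not None:
            handle(item)
        await asyncio.sleep(0)  # yield control to the event loop


async def run_until_drained(items: list) -> list:
    """Drive the loop with a finite fake source and return what was handled."""
    stop = asyncio.Event()
    pending = list(items)
    handled = []

    async def fake_fetch():
        if not pending:
            stop.set()  # explicit, deterministic shutdown signal
            return None
        return pending.pop(0)

    # The timeout is only a safety net against hangs, not part of the assertion.
    await asyncio.wait_for(poll_loop(fake_fetch, handled.append, stop), timeout=5)
    return handled


result = asyncio.run(run_until_drained(["a.wav", "b.wav"]))
assert result == ["a.wav", "b.wav"]
```

The test never inspects `call_count` or injects `CancelledError`; it only checks which items were handled once the loop exits on its own.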
5.2 Delete vs. Refactor Rule
Particularly for AI-generated tests, distinguish carefully between fixing and deleting:
- DELETE if a test provides no clear business or architectural value.
- DELETE if a test artificially inflates line coverage but would not catch a real regression.
- REFACTOR if a test covers a valuable domain contract but is written in a brittle way.
5.3 Layer-Specific Quality Rules
- Unit Tests (`@pytest.mark.unit`):
  - Must test the behavior of small units, not the implementation details.
  - Zero I/O, zero database access, zero framework internals.
  - Minimize mocking: use fakes or data structures where possible.
- Integration Tests (`@pytest.mark.integration`):
  - Must use real PostgreSQL/Redis testcontainers. Mocking the database in integration tests is FORBIDDEN.
  - Must verify actual contracts between components and external dependencies.
- System Tests (`@pytest.mark.system`):
  - Must focus on full lifecycle effects and state transitions.
  - Must assert end results, not internal call sequences.
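"Use fakes or data structures where possible" might look like the following sketch. `InMemoryProfileStore`, `evaluate_device`, and the `"ready"`/`"pending"` states are hypothetical names chosen for illustration; only the test name is borrowed from the naming example in §8.

```python
class InMemoryProfileStore:
    """Hand-written fake standing in for a database-backed profile store."""

    def __init__(self, profiles: dict):
        self._profiles = dict(profiles)

    def find_profile(self, device_name: str):
        return self._profiles.get(device_name)


def evaluate_device(device_name: str, store) -> str:
    """Hypothetical unit under test: a device without a profile stays pending."""
    return "ready" if store.find_profile(device_name) else "pending"


def test_missing_profile_stays_pending():
    """A device with no matching profile must not be marked ready."""
    store = InMemoryProfileStore({"UltraMic 384K": "ultramic_384k"})

    assert evaluate_device("Unknown Mic", store) == "pending"
    assert evaluate_device("UltraMic 384K", store) == "ready"


test_missing_profile_stays_pending()
```

The fake needs no mock setup, so the test body consists almost entirely of the behavior being asserted.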
5.4 Guidelines for AI Agents
- Prioritize Simplicity: AI-generated tests must favor simplicity and readability over exhaustive completeness.
- Avoid Coverage-Driven Bloat: Do not generate tests solely to increase test coverage.
- Check Before Adding: Before an agent adds a new test, it must verify whether an existing higher-level test already covers the same failure space.
- Document Intent: New tests must clearly state (via naming or docstrings) the specific behavior or regression they are safeguarding.
6. check-all Pipeline Stages
The just check-all command runs 12 stages in order:
| Stage | Name | Critical | Description |
|---|---|---|---|
| 1 | Lock-File Check | No | uv lock --check |
| 2 | Dep Audit | No | pip-audit (skipped by default in dev) |
| 3 | Containerfile Lint | No | Hadolint + podman-compose config validation |
| 4 | Ruff Lint | Yes | Linting + formatting |
| 5 | Mypy | Yes | Static type checking |
| 6 | Unit Tests | Yes | @pytest.mark.unit (parallel, coverage) |
| 7 | Integration Tests | Yes | @pytest.mark.integration (testcontainers) |
| 8 | Clear | Always | Clean workspace |
| 9 | Build Images | Always | just build |
| 10 | System Tests | Yes | @pytest.mark.system (real Podman, needs images) |
| 11 | Smoke Tests | Yes | @pytest.mark.smoke (testcontainers) |
| 12 | E2E Tests | Yes | @pytest.mark.e2e (Playwright) |
[!NOTE]
`system_hw` tests are intentionally excluded from this pipeline. Run `just test-hw` separately when hardware is available.
7. Test Infrastructure
| Tool | Purpose |
|---|---|
| `pytest` | Test runner |
| `pytest-xdist` | Parallel test execution (`-n` workers); see §10 |
| `testcontainers` | Disposable PostgreSQL + Redis for integration tests |
| `polyfactory` | Pydantic model factories for test data generation |
| `playwright` | Browser automation for E2E tests |
| `pytest-timeout` | Global timeout per test (default: 120s) |
| `pytest-asyncio` | Async test support (auto mode) |
8. Naming Conventions
| Element | Convention | Example |
|---|---|---|
| Test file | `test_<module>.py` | `test_controller.py` |
| Test class | `Test<Feature>` | `TestDeviceEvaluation` |
| Test function | `test_<behavior>` | `test_missing_profile_stays_pending` |
Test names should describe the expected behavior, not the implementation detail.
9. Parallel Execution & Isolation
Every test level is fully isolated. All combinations can run in parallel — with each other and with just start (production stack).
Isolation by Level
| Level | Container Infra | Network | Ports | Parallel-safe? | Safe vs. `just start`? |
|---|---|---|---|---|---|
| `unit` | None | None | None | ✅ | ✅ |
| `integration` | `testcontainers` | Random (auto) | Random | ✅ | ✅ |
| `smoke` | `testcontainers` | `smoke_network` (random) | Random | ✅ | ✅ |
| `system` | Podman CLI | `silvasonic-test-{run_id}` (per test) | Random | ✅ | ✅ |
| `system_hw` | Podman CLI | `silvasonic-hw-test-{session_id}` (per session) | Random | ✅ | ✅ |
| `just start` | Compose | `silvasonic-net` | Fixed | n/a | ✅ |
All Combinations: ✅ Safe
| Suite A | Suite B | Why it's safe |
|---|---|---|
| `just test-unit` | anything | Pure in-process, no containers, no Podman |
| `just test-int` | anything | `testcontainers`: ephemeral containers, random ports, own networks |
| `just test-smoke` | anything | `testcontainers`: own `smoke_network`, distinct aliases (`test-database`, `test-redis`) |
| `just test-system` | anything | Per-test `silvasonic-test-{run_id}` network via `system_network` fixture |
| `just test-hw` | anything | Per-session `silvasonic-hw-test-{id}` network via `hw_redis` fixture |
| `just test-system` | `just start` | ✅ No shared network: test and prod are fully separated |
| `just test-hw` | `just start` | ✅ No shared network: test and prod are fully separated |
| `just stop` | any test | `stop.py` filters `owner=controller` (exact match); tests use `owner=controller-test-*` |
Isolation Mechanisms
| Mechanism | Scope | Protects against |
|---|---|---|
| `system_network` fixture | Per test function | DNS alias collisions between parallel system tests |
| `hw_redis` fixture | Per session | DNS alias collisions between HW tests and prod |
| `TEST_RUN_ID` (UUID per session) | Per session | Container name collisions between parallel runs |
| `run_id` (UUID per test) | Per test function | Container name collisions within a single session |
| `io.silvasonic.owner=controller-test-<ID>` | Per session | `just stop` accidentally removing test containers |
| `testcontainers.Network()` | Per session | Integration/smoke tests vs. everything else |
| `podman_run(network=...)` (required, no default) | Per call | Callers forgetting the network (omitting it raises `TypeError` at call time) |
| `make_test_spec(..., network=...)` (required kw) | Per call | Callers forgetting the network (omitting it raises `TypeError` at call time) |
[!NOTE] The production stack guard (`_abort_if_prod_running()`) was removed. It is no longer needed because all system and hardware tests create their own isolated Podman networks and never share `silvasonic-net` with the production Compose stack.
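The naming and labelling mechanisms in the table can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the 8-character hex IDs are an arbitrary choice, and the exact-match check merely mimics the `owner=controller` filter described for `stop.py`.

```python
import uuid

PROD_OWNER = "controller"  # owner value that `just stop` is described as matching exactly


def new_id() -> str:
    """Return a short random hex ID (the length is an arbitrary choice here)."""
    return uuid.uuid4().hex[:8]


def build_resource_names(session_id: str, run_id: str) -> dict:
    """Derive per-test network and per-session owner label names."""
    return {
        "network": f"silvasonic-test-{run_id}",
        "owner": f"controller-test-{session_id}",
    }


def stop_would_remove(owner_value: str) -> bool:
    """Exact-match filter: only the production owner value is selected."""
    return owner_value == PROD_OWNER


session_id = new_id()
first = build_resource_names(session_id, new_id())
second = build_resource_names(session_id, new_id())

print(first["network"], second["network"])
print(stop_would_remove(first["owner"]))
```

Because the filter compares the whole value, a prefix such as `controller-test-…` never matches the production owner.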
10. Parallel Workers & Configuration
All just test-* commands delegate to scripts/test.py — the single source of truth for pytest invocations, worker counts, and coverage arguments.
Worker Defaults
Worker counts are defined as constants at the top of scripts/test.py (UNIT_WORKERS, INTEGRATION_WORKERS, SYSTEM_WORKERS). Do not duplicate the defaults here; refer to that file for current values.
Override via environment variables: SILVASONIC_UNIT_WORKERS, SILVASONIC_INTEGRATION_WORKERS, SILVASONIC_SYSTEM_WORKERS.
# Temporary override for a single run
SILVASONIC_INTEGRATION_WORKERS=8 just test-int
# Or export for the session
export SILVASONIC_UNIT_WORKERS=12
just check
[!WARNING] System test hard ceiling: ~6–7 workers. The rootless Podman socket is a shared bottleneck. At 8 workers, `testcontainers` hits 60s read timeouts on the Podman API, causing most tests to SKIP or ERROR.
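An environment-variable override of this kind is typically resolved along the lines of the sketch below. `resolve_workers` is a hypothetical helper, not the code in `scripts/test.py`, and the default of 4 is a placeholder for illustration only; the authoritative values remain in that script.

```python
import os

# Placeholder default for illustration only; real values live in scripts/test.py.
EXAMPLE_DEFAULT_WORKERS = 4


def resolve_workers(env_var: str, default: int, environ=os.environ) -> int:
    """Return the worker count from `env_var` if set and valid, else `default`."""
    raw = environ.get(env_var)
    if raw is None or raw.strip() == "":
        return default
    workers = int(raw)  # raises ValueError on non-numeric input
    if workers < 1:
        raise ValueError(f"{env_var} must be >= 1, got {workers}")
    return workers


# An explicit mapping is passed here so the example does not depend on the shell.
print(resolve_workers("SILVASONIC_UNIT_WORKERS", EXAMPLE_DEFAULT_WORKERS,
                      {"SILVASONIC_UNIT_WORKERS": "12"}))
print(resolve_workers("SILVASONIC_UNIT_WORKERS", EXAMPLE_DEFAULT_WORKERS, {}))
```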
Integration Test DB Cleanup
With pytest-xdist, each worker gets its own session-scoped postgres_container (via testcontainers). Tests on the same worker share that DB and run sequentially. An autouse _clean_db_tables fixture (in each conftest.py) deletes all application rows between tests in FK-safe order, ensuring no cross-test contamination.
Affected conftest.py files:
- services/processor/tests/integration/conftest.py
- services/controller/tests/integration/conftest.py
- tests/integration/conftest.py
[!IMPORTANT] When adding new tables to the database schema, you MUST add them to the `_CLEANUP_TABLES` tuple in each `conftest.py` above, respecting FK order (children before parents).
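Why the order matters can be shown with a small, self-contained sketch. It uses an in-memory SQLite database from the standard library as a stand-in for the PostgreSQL testcontainer, and the `devices`/`recordings` tables are hypothetical; only the name `_CLEANUP_TABLES` and the children-before-parents rule come from the text above.

```python
import sqlite3

# Hypothetical schema: `recordings` rows reference `devices` rows.
# Children come before parents so deletes never violate a foreign key.
_CLEANUP_TABLES = ("recordings", "devices")


def clean_db_tables(conn: sqlite3.Connection) -> None:
    """Delete all rows from every application table in FK-safe order."""
    for table in _CLEANUP_TABLES:
        conn.execute(f"DELETE FROM {table}")  # table names are trusted constants
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE recordings ("
    "id INTEGER PRIMARY KEY, "
    "device_id INTEGER NOT NULL REFERENCES devices(id))"
)
conn.execute("INSERT INTO devices (id, name) VALUES (1, 'mic')")
conn.execute("INSERT INTO recordings (id, device_id) VALUES (1, 1)")
conn.commit()

clean_db_tables(conn)
remaining = [
    conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in _CLEANUP_TABLES
]
print(remaining)
```

Reversing the tuple would make the first `DELETE` fail with an integrity error as long as child rows still reference the parent table.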
See Also
- `scripts/test.py`: Single source of truth for test commands and worker counts
- AGENTS.md §6: Testing rules (markers, directory structure)
- AGENTS.md §5: Approved test libraries
- Release Checklist: Quality gates per release type