Quality bar: testing and performance
evolvingQuality bar: testing and performance
TL;DR. The quality bar for phux has two halves. Tests run in three layers (unit tests beside code, property tests for codec and state-machine invariants, snapshot tests for wire bytes and rendered frames). Performance is measured, not guessed: a small fixed set of throughput, fanout, and reattach numbers, plus a release profile tuned for shipped-binary speed. Most of this describes the intended bar; benches and mutation testing arrive with the code that needs them.
Test strategy
Tests are organized in three layers.
-
Unit tests colocated with the code they cover.
-
Property tests (
proptest) for invariants that should hold across arbitrary inputs:- Protocol codec roundtrip: encode, decode, and the result equals the input. This is the codec’s primary safety net.
- State-machine invariants, for example that after any sequence of commands the layout tree stays well-formed.
- Replay equivalence: for any PTY byte stream, writing those bytes into a
fresh
Terminalon the client reproduces the same visible grid the server’sTerminalsaw, up to the documented downsampling rewrites. The snapshot-on-attach synthesis algorithm is checked the same way: synthesize, replay into a freshTerminal, and compare the resultingRenderStatesnapshots. Seestate-sync.mdfor the synchronization model this verifies andresearch/2026-05-25-libghostty-renderstate.md§7 for the synthesis algorithm.
-
Snapshot tests (
insta) for outputs that should change only on purpose:- Wire bytes of representative messages, so an accidental format change is loud rather than silent.
- Rendered TUI frames, via a cell-grid to ASCII-art helper.
Mutation testing with cargo-mutants is planned for once the codebase is
substantial enough to warrant it; the intended bar is a mutation score above
90% on the protocol and core crates. It is not running today.
Performance
phux does not optimize speculatively. A fixed set of numbers is measured so regressions are visible:
- Single-pane throughput under a
yesflood, with tmux as the reference point. - Multi-pane fanout: one server, N clients, M panes.
- Reattach latency for sessions with large scrollback.
Benchmarks live in benches/ per crate using criterion, added as the code
under measurement lands. The release profile uses fat LTO and a single
codegen unit, since the speed of the shipped binary is a goal in its own
right.