0002 — Diff-based wire protocol, not VT byte replay
TL;DR. Historical decision to ship structured cell-level diffs from server to clients instead of VT byte replay. The cost model (parsing as the dominant cost; one parse vs three) turned out to be wrong at modern libghostty speeds, and protocol-design cost was perpetual. Superseded in full by ADR-0013; preserved as historical context.
Status update (2026-05-25): SUPERSEDED by ADR-0013. The cost model that motivated this decision was wrong (parse cost is invisible at modern libghostty speeds; protocol-design cost is perpetual). The decision below is preserved as historical context. Do not implement against it.
Status: Superseded by ADR-0013
Date: 2026-05-24
Superseded: 2026-05-25
Context
A multiplexer must transport pane state from a server to one or more clients. Two approaches exist:
- VT byte replay (tmux): the server emits raw VT bytes for the client to re-parse.
- Structured diff (Mosh, RDP, this proposal): the server holds the canonical grid; the wire carries cell-level diffs; the client renders directly.
Decision
phux uses a structured cell-level diff protocol. See SPEC.md §8.
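For readers unfamiliar with the idea, a cell-level diff can be sketched roughly as follows. This is an illustration only and not the wire format in SPEC.md §8: the names `Cell`, `CellChange`, `diff_grids`, `apply_diff`, and `blank_grid` are hypothetical, and the sketch assumes both grids have the same dimensions and that a cell is fully described by a character plus indexed foreground and background colours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    # Hypothetical cell model: one character plus indexed colours.
    char: str = " "
    fg: int = 7
    bg: int = 0


@dataclass(frozen=True)
class CellChange:
    # One entry of a diff: "the cell at (row, col) is now `cell`".
    row: int
    col: int
    cell: Cell


def blank_grid(rows, cols):
    """Build a rows x cols grid of default cells."""
    return [[Cell() for _ in range(cols)] for _ in range(rows)]


def diff_grids(old, new):
    """Return the cells of `new` that differ from `old` (same dimensions assumed)."""
    changes = []
    for r, (old_row, new_row) in enumerate(zip(old, new)):
        for c, (a, b) in enumerate(zip(old_row, new_row)):
            if a != b:
                changes.append(CellChange(r, c, b))
    return changes


def apply_diff(grid, changes):
    """Apply a diff in place, as a client would on receipt."""
    for change in changes:
        grid[change.row][change.col] = change.cell
```

In this model the server would keep the canonical grid, call something like `diff_grids` against the last state a client acknowledged, and ship only the resulting changes; the client never sees VT bytes.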
Rationale
- Avoids round-tripping through a parse state machine. tmux: parse VT → grid → synthesize VT → parse VT again at the client. phux: parse VT → grid → diff → render. One parse instead of three.
- Frontend-agnostic. A native GUI client (libghostty surface) renders diffs straight to a GPU surface. A TUI client converts to VT at the edge. Same protocol; both clients are possible.
- Server is authoritative. No client-side parser inconsistencies. SGR ambiguity, true-color vs 256-color downsampling, hyperlink interpretation — all resolved server-side before the cell hits the wire.
- Frame pacing is natural. Diffs can be coalesced, dropped, or replaced by a snapshot under backpressure. A byte stream cannot be paced; a structured frame stream can.
- Faithful key events. Once we accept that the protocol carries semantics, the natural extension is to do the same for input. KeyEvent structures preserve modifier-rich events through the multiplexer in a way raw bytes cannot. This is independently important; the diff decision opened the door.
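The frame-pacing argument above can be made concrete with a small sketch. It assumes a hypothetical per-client buffer, `PendingFrame`, in which the latest write to a cell replaces any earlier unsent write, and which falls back to a full snapshot once the backlog exceeds a chosen fraction of the grid. The class name, the `snapshot_threshold` parameter, and the `(row, col, cell)` tuple shape are all assumptions made for illustration; none of them comes from SPEC.md.

```python
class PendingFrame:
    """Per-client buffer of unsent changes; the latest write per cell wins."""

    def __init__(self, rows, cols, snapshot_threshold=0.5):
        # Backlog size (in cells) above which a snapshot replaces the diff.
        self.limit = int(rows * cols * snapshot_threshold)
        self.changes = {}  # (row, col) -> cell value

    def push(self, diff):
        """Coalesce a diff, given as (row, col, cell) tuples, into the backlog."""
        for row, col, cell in diff:
            self.changes[(row, col)] = cell

    def flush(self, grid):
        """Return ('snapshot', grid copy) or ('diff', changes), then reset."""
        if len(self.changes) > self.limit:
            frame = ("snapshot", [list(row) for row in grid])
        else:
            ordered = sorted(self.changes.items())
            frame = ("diff", [(r, c, cell) for (r, c), cell in ordered])
        self.changes.clear()
        return frame
```

A slow client in this model never accumulates more than one grid's worth of pending state, which is the property a raw byte stream cannot offer without re-parsing.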
Tradeoffs
- TUI client carries a small composer. Cell diffs → VT for the outer terminal is real work. But it is well-understood work (see tmux’s tty.c), and it is done once at the edge instead of once per emit at the server.
- Protocol design surface is larger. A byte stream has no design; a diff protocol has a wire format that must be specified, tested, and versioned. We accept the cost because we get architectural payoffs that compound over the system’s lifetime.
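To give a sense of the composer's scope, here is a minimal sketch of turning cell changes into VT output for the outer terminal. It is not tmux's tty.c logic and not phux code. The function name `compose_vt` and the `(row, col, char, fg, bg)` tuple shape are hypothetical, and the sketch assumes single-width characters, 0-based coordinates, and 256-colour indices. It uses only two standard sequences: CUP (`ESC [ row ; col H`, 1-based) and SGR with `38;5;n` and `48;5;n` for indexed colours.

```python
ESC = "\x1b"


def compose_vt(changes):
    """Render (row, col, char, fg, bg) changes as a VT string.

    Assumes single-width characters; wide characters and the pending-wrap
    state at the last column are deliberately not handled.
    """
    out = []
    cursor = None  # where the terminal cursor sits after the last write
    style = None   # (fg, bg) currently selected in the terminal
    for row, col, char, fg, bg in sorted(changes):
        if cursor != (row, col):
            # CUP is 1-based; skipped when the cell follows the previous one.
            out.append(f"{ESC}[{row + 1};{col + 1}H")
        if style != (fg, bg):
            out.append(f"{ESC}[38;5;{fg};48;5;{bg}m")
            style = (fg, bg)
        out.append(char)
        cursor = (row, col + 1)
    if out:
        out.append(f"{ESC}[0m")  # reset attributes at the end of the frame
    return "".join(out)
```

Even this toy version shows where the work lies: tracking cursor position and current attributes so that runs of adjacent cells cost one move and one SGR instead of one per cell.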
Prior art
- Mosh’s State Synchronization Protocol does this for SSH.
- RDP and SPICE do this for full GUI sessions.
- libghostty-vt’s RenderState is designed to enable diff consumers.
Alternatives considered
- Raw VT byte forwarding (tmux’s model). Simplest possible protocol. Loses everything described above. Structurally caps the ceiling of what a multiplexer on top can become.
- Hybrid: VT bytes for legacy clients, diffs for new ones. Doubles protocol surface, hides decisions, encourages “just hack it in as VT”. Rejected.