Test automation & scripting architecture (plan)
Status: largely implemented (2026-05). The functional E2E harness
(pytest) is built and runs headless in CI as tools/roosttest/;
the tooling is reorganized into three layers (see
tools/README.md); and tab.dump, wait, and
the palette.* ops have shipped. The plan below is kept as the design
rationale; the "Gap" entries it lists are mostly closed now. The Lua
scripting layer (§6) remains the open piece. The north star is
canonical in vision.md; §0 here
is the testing-lens recap.
Audience: Claude (primary) + the maintainer. Targets: this Mac, Macs in
general, the Pop!_OS (COSMIC/Wayland) box, and CI (Linux + macOS runners).
This doc plans two intertwined things the maintainer asked to grow together:
- A Lua scripting layer in roostctl that can set up and mutate application state (projects, tabs, focus, …) in multi-step actions, surfaced to users through the Cmd/Alt+Shift+T launcher and reused wholesale by tests.
- Functional, automated tests that exercise the real app on both UIs, in CI, giving confidence that basic flows work on every change.
The thesis: these are the same substrate. A control protocol rich enough to script the app for a power-user launcher is exactly what an automated test driver needs. Build the substrate once; let the launcher and the test suite both stand on it.
0. North star
Every way to drive Roost — mouse/clicks, hotkeys, the CLI, and Lua scripts — converges on one core: the workspace operation set (open/close/focus tab, create/rename/delete/reorder project, set-state, notify, dump, … plus a few view ops like screenshot / open-palette). Each surface is a thin adapter onto that core; the UI is a reaction to the core's events, never its own source of truth.
```
roostctl (CLI) ─┐
Lua scripts ────┤──▶ IPC handler ──┐
                                   ├─▶ workspace op set ──emit──▶ events ──▶ UI re-renders
mouse / clicks ─┐                  │      (THE CORE)
hotkeys ────────┤──▶ UI dispatch ──┘
```
- CLI + Lua are out-of-process → reach the core over the IPC socket (the handler is their adapter; Lua sits on top of the same op set).
- Clicks + hotkeys are in-process → call the same op set directly (their adapter is the UI command / keybind handler).
- A hotkey (Cmd+Shift+T), a roostctl call, and a Lua script all invoke the same command, e.g. "run action" or "open tab".
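The convergence can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Roost's Swift or Rust code: the Workspace class, the op names tab.open / tab.close, the event dicts, and the cmd+t binding are all hypothetical. What it shows is the shape the north star asks for: both adapters call one dispatch entry point, each op mutates state in exactly one place and emits an event, and the view only re-renders from those events.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Workspace:
    """Toy core: owns state, exposes one op set, emits events (hypothetical)."""

    tabs: Dict[int, str] = field(default_factory=dict)  # tab_id -> title
    next_id: int = 1
    subscribers: List[Callable[[dict], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self.subscribers.append(callback)

    def _emit(self, event: dict) -> None:
        for callback in self.subscribers:
            callback(event)

    def dispatch(self, op: str, params: Dict[str, Any]) -> Dict[str, Any]:
        """The one entry point every surface reaches, directly or via IPC."""
        if op == "tab.open":  # hypothetical op name
            tab_id = self.next_id
            self.next_id += 1
            self.tabs[tab_id] = params.get("title", "shell")
            self._emit({"event": "tab.opened", "tab_id": tab_id})
            return {"tab_id": tab_id}
        if op == "tab.close":  # hypothetical op name
            del self.tabs[params["tab_id"]]
            self._emit({"event": "tab.closed", "tab_id": params["tab_id"]})
            return {}
        raise ValueError(f"unknown op: {op}")


class SidebarView:
    """A projection of core state: it never mutates, it re-renders on events."""

    def __init__(self, workspace: Workspace) -> None:
        self.workspace = workspace
        self.rendered: List[str] = []
        workspace.subscribe(lambda _event: self.render())

    def render(self) -> None:
        self.rendered = [self.workspace.tabs[t] for t in sorted(self.workspace.tabs)]


def ipc_adapter(workspace: Workspace, request: Dict[str, Any]) -> Dict[str, Any]:
    # Out-of-process surfaces (roostctl, Lua) arrive as {"op": ..., "params": ...}.
    return workspace.dispatch(request["op"], request.get("params", {}))


# Hypothetical binding table: a chord names an op; it carries no logic of its own.
HOTKEYS = {"cmd+t": ("tab.open", {"title": "shell"})}


def hotkey_adapter(workspace: Workspace, chord: str) -> Dict[str, Any]:
    # In-process surfaces call the same dispatch directly, with no IPC hop.
    op, params = HOTKEYS[chord]
    return workspace.dispatch(op, params)
```

Because the view subscribes rather than being written to, a tab opened over IPC and a tab opened by hotkey are indistinguishable to it, which is the property the E2E suite relies on.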
One contract, two implementations. There is no shared codebase
core — Swift and Rust can't share one. There is one shared contract —
the IPC op set in roost-ipc — implemented by Swift Workspace +
AppKit and Rust Workspace + GTK. "Same interface" means same op
contract + behavioral parity, which the cross-platform E2E suite (below)
exists to enforce. Per platform: identical command surface,
platform-specific guts (forkpty vs portable-pty, Core Graphics vs
Cairo).
Two seams (both firmed up in the IPC refactor on #106):
- surfaces → core (commands in): CLI/Lua via IPC, UI/hotkeys direct. The convergence goal — partially there; every UI/hotkey action should route through the op set, not divergent local logic.
- core → UI (view reach-back: screenshot/dump/activate): GTK's one UiRequest channel, Mac's one UiBridge seam.
Why this is the north star: it buys the three things we optimize for at once —
- Testability — tests drive the same op set users do and assert on its events/state; no test-only backdoors that drift from reality.
- Programmability — the op set is the public surface; Lua actions and the launcher are first-class clients of it, same as the CLI.
- Clean architecture — one place owns each mutation; the UI is a pure projection of core state; adding a capability means adding an op + thin adapters, not bespoke logic per surface.
Every decision below (and in P2+) is measured against this: does it route through the one op set, keep the UI reactive, and stay at parity across both implementations?
1. Goals & non-goals
Goals
- Drive the running app deterministically from outside the process and assert on the result — terminal content (text), workspace state (tabs/projects/agent-state/notifications), and rendering (pixels).
- Run a functional E2E suite headless in CI on both UIs.
- A Lua scripting surface in roostctl for multi-step state setup / mutation, shared by the launcher and tests.
- Zero-to-few runtime dependencies; cross-platform (macOS + Linux); legible and extensible by an agent.
- Kill sleep-based flakiness: wait on conditions, not wall-clock.
Non-goals (for now)
- Pixel-perfect golden-image diffing of the whole window. We assert content via text and rendering via targeted color/cell checks.
- Testing the OS-level input encoders (key/mouse → bytes) in CI. That stays a local smoke (see Tier 2), because uinput/CGEvent injection needs privileges / Accessibility and a real compositor.
- Replacing the unit/integration tests. They stay the fast first line.
2. Current state
| Layer | What exists | Gap |
|---|---|---|
| Unit / integration | cargo test --workspace (Rust: IPC, OSC, vt, target picker, persistence) + swift test (190 tests: Workspace state machine, IPC dispatch, persistence) | No coverage of the live app (PTY, rendering, IPC end-to-end). |
| IPC surface | roost-ipc: tab/project CRUD, set-state, notify, focus, send, resize, reorder, screenshot, claude-hook, identify; now also tab.dump (content), palette.* (UI-action), and roostctl wait. | Copy/paste + live events.subscribe still unimplemented. |
| Event stream | UIs consume an in-process event bus. events.subscribe over the wire is stubbed not-implemented on both UIs (mac/Sources/Roost/IPCHandlerImpl.swift, crates/roost-linux/src/ipc.rs). | External clients can't wait on events yet (the pytest harness condition-waits via polling instead). |
| Render state | roost-vt RenderState.walk(\|cell\| …) yields Cell { text: String /*grapheme*/, fg, bg } + cursor; mirrored 1:1 in mac/Sources/Roost/RenderState.swift. Both UIs walk it to draw. | Now exposed over IPC as text via tab.dump (viewport only; scrollback is a follow-up). |
| Tooling | Three layers (see tools/README.md): tools/roosttest/ (pytest, IPC, in CI), tools/screenshot/ (bash + roostctl + pngtool, visual), tools/input/linux/ (uinput/clipboard, real input). | Real-input (Layer 3) + visual (Layer 2) are local-only; a Mac CGEvent injector is still to come. |
This unified design is now realized: the pytest harness (tools/roosttest/)
is Tier 1, and the screenshot + input harnesses are reorganized by layer.
The Lua scripting substrate below is the remaining piece.
3. Principles
- Robustness lives in the driver + app affordances, not the test language. Flake-resistance comes from (a) waiting on the event stream, (b) reading content as text, (c) reproducible rendering, and (d) driving via IPC instead of OS input. These are shared no matter what language the test cases are written in. (This reframes the language question — see §7.)
- Drive through the control protocol. The IPC socket is the seam. Driving via IPC (not synthetic keystrokes) is deterministic, headless, and — critically — needs no macOS Accessibility (TCC) grant and no Wayland pointer mapping, which is what makes "Mac E2E in CI" tractable.
- Determinism by construction. A test mode pins window geometry, font, and animations so screenshots and reflow are reproducible across machines and DPI.
- One substrate, two consumers. The Lua/IPC verbs power both the launcher and the tests. Tests can invoke a launcher action and assert its effect — the feature tests itself.
4. Testing tiers
| Tier | What | Where it runs | Speed |
|---|---|---|---|
| 0: unit/integration | cargo test, swift test. Pure logic: state machine, IPC dispatch, OSC, persistence, key-encoder tables. | CI (exists) + local | seconds |
| 1: functional E2E | Launch the real app; drive via IPC/Lua; assert via tab dump (text) + tab list (state) + targeted screenshot color checks. Covers: open project/tab, run a command and read its output, state→color, notification + badge, focus switch, session restore, cascade-close, launcher actions. | CI on both UIs (Linux xvfb, macOS GUI session) + local | seconds–low minutes |
| 2: real-input smoke | OS-level key/pointer injection (tools/input/linux uinput; a Mac CGEvent/AppleScript equivalent) exercising the encoder + gesture path, verified by screenshot. | Local only (Pop!_OS, Mac) | minutes, manual-ish |
Tier 1 is the new center of gravity and the CI confidence-builder. Tier 2 stays local because injecting real input needs privileges/Accessibility and a live compositor — not worth the CI fragility when Tier 1 already covers behavior.
5. App / CLI / IPC refactors (the affordances)
These are the enabling changes. All are additive to the wire protocol.
5.1 tab dump — terminal content as text (highest leverage)
New IPC op + roostctl tab dump --tab N [--scrollback] [--json].
```jsonc
// request: {"op":"tab.dump","params":{"tab_id":61,"scrollback":false}}
// response:
{
  "cols": 120, "rows": 30,
  "cursor": {"row": 1, "col": 14, "visible": true},
  "rows_text": ["/private/tmp $ echo hi", "hi", "/private/tmp $", ""]
  // optional --json adds per-cell fg/bg for color assertions
}
```
Implementation: walk the existing RenderState (Cell.text per cell,
concatenated per row, trailing blanks trimmed) on each UI's main thread —
the same walk both renderers already do. This is the determinism unlock:
tests assert exact text (assert dump.contains("hi")) instead of OCR or
pixel-matching. Low risk; both UIs already have the walk.
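A minimal sketch of the row-assembly step, assuming the viewport arrives as a list of rows whose cells hold grapheme strings, with " " for a blank cell and "" for the spacer cell a wide grapheme occupies (an assumption about how RenderState reports wide characters). The function names are hypothetical; the response keys follow the tab.dump example above.

```python
from typing import Dict, List, Tuple


def rows_text(grid: List[List[str]]) -> List[str]:
    """Concatenate each row's cell text and trim trailing blanks."""
    # "" (wide-grapheme spacer) contributes nothing; " " is a real blank cell.
    return ["".join(row).rstrip(" ") for row in grid]


def dump_response(grid: List[List[str]], cursor: Tuple[int, int], visible: bool = True) -> Dict:
    """Build a tab.dump-shaped payload from a viewport grid."""
    return {
        "cols": len(grid[0]) if grid else 0,
        "rows": len(grid),
        "cursor": {"row": cursor[0], "col": cursor[1], "visible": visible},
        "rows_text": rows_text(grid),
    }
```

A test then asserts on exact text, e.g. `assert "hi" in dump_response(grid, (1, 2))["rows_text"]`, which is the Python counterpart of dump.contains("hi").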
5.2 events.subscribe over the wire + roostctl wait / events
Implement the currently-stubbed events.subscribe op on both UIs:
bridge the in-process event bus to the IPC connection (the GTK side
already has the in-process events.subscribe(); the Mac side has the
@MainActor event stream App.swift consumes — both need a wire fan-out).
Then:
- roostctl events --follow: stream events as JSON lines (debugging + driver consumption).
- roostctl wait --tab N --state idle --timeout 5 (and --tab-count, --notification, --project-count): block until satisfied or exit non-zero on timeout.
Fallback if wire-events slip: wait can poll tab list/identify
on an interval initially; swap to event-driven once subscribe lands. The
interface (roostctl wait …) is stable either way, so tests don't
churn. Either way, no test ever calls sleep.
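The polling fallback is easy to sketch. This Python helper is an assumption about how a client-side wait could be written, not roostctl's implementation: it re-checks a predicate until it holds or a budget runs out, and scales that budget by ROOST_TEST_TIMEOUT_SCALE the way §5.6 describes. The only sleep is the poll interval inside the helper; test code calls wait_until and never sleeps itself. The clock and sleep parameters are injectable so the helper can be exercised without real time passing.

```python
import os
import time
from typing import Callable, Mapping, Optional


class WaitTimeout(AssertionError):
    """Raised when a condition is not met within its (scaled) budget."""


def scaled(timeout: float, env: Optional[Mapping[str, str]] = None) -> float:
    """Multiply a budget by ROOST_TEST_TIMEOUT_SCALE (default 1)."""
    env = os.environ if env is None else env
    return timeout * float(env.get("ROOST_TEST_TIMEOUT_SCALE", "1"))


def wait_until(
    predicate: Callable[[], bool],
    timeout: float,
    interval: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    env: Optional[Mapping[str, str]] = None,
) -> None:
    """Poll `predicate` until it is true; raise WaitTimeout after the scaled timeout."""
    budget = scaled(timeout, env)
    deadline = clock() + budget
    while True:
        if predicate():
            return
        if clock() >= deadline:
            raise WaitTimeout(f"condition not met within {budget:.2f}s")
        sleep(interval)  # poll interval, not a fixed wall-clock wait for the result
```

When events.subscribe lands, only the body of wait_until would change (block on the stream instead of polling); callers keep the same signature, mirroring the "stable interface" point above.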
5.3 IPC UI-action ops
Expose the actions currently reachable only by keyboard/mouse so tests (and Lua actions) can trigger them without synthetic input:
- ui.open_launcher, ui.open_palette, ui.dismiss_overlay
- tab.copy / tab.paste (drive the clipboard path deterministically)
- (later) ui.select_palette_item, query overlay state for assertions
Each maps to the same handler the keybind already calls. This is what lets Mac E2E avoid TCC entirely.
5.4 Test mode (ROOST_TEST_MODE=1)
Status: partially landed. The IPC scaffolding PR closes the IPC-side scaffolding;
rendering-reproducibility knobs are still open. Originally planned as
ROOST_TEST=1 / --test-mode, the gate now ships as
ROOST_TEST_MODE=1 so it's unambiguous which subsystem it
belongs to.
The implemented surface (the IPC scaffolding PR):
- tab.feed_pty_bytes: inject bytes into a live tab's PTY-output drain. Indistinguishable from real shell output to the OSC scanner + libghostty (same mpsc::UnboundedSender<TabOutput> on GTK; same TerminalView.appendBytes(_:) on Mac; no shadow drain). Gated.
- tab.capture_pty_input: read (and optionally drain) the bytes the UI has queued onto a tab's PTY-input channel: keystrokes, paste, synthesised OSC replies. A single tap point inside TabSession::send_input (GTK) / the onKey closure (TabSession.start()/attach() on Mac) catches everything. Gated.
- tab.dump_resolved: walk the viewport through the same resolve_cell_colors call the production paint path runs (with theme.bold_color). Ungated: no shadow state, just a richer read of the existing render output.
Both gated ops surface not-enabled when the env var is absent (a
deterministic error rather than silent acceptance). The flag is read
once at UI boot and stashed on App / RoostBackend.shared so per-op
dispatch is a cheap bool check; a tester can't toggle the gate
mid-session.
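A minimal sketch of the gate, assuming a dict-based handler registry and an {"ok": false, "error": "not-enabled"} reply shape (both hypothetical; the text only says gated ops surface not-enabled). The two properties it demonstrates are the ones stated above: the flag is read once when the backend is constructed, and a gated op fails deterministically instead of being silently accepted.

```python
from typing import Any, Callable, Dict, Mapping

# Gated per §5.4; tab.dump_resolved is deliberately absent because it is ungated.
GATED_OPS = frozenset({"tab.feed_pty_bytes", "tab.capture_pty_input"})


class Backend:
    """Hypothetical dispatcher showing the read-once test-mode gate."""

    def __init__(self, env: Mapping[str, str], handlers: Dict[str, Callable[..., Any]]) -> None:
        # Read once at boot and stashed; later changes to env have no effect.
        self.test_mode = env.get("ROOST_TEST_MODE") == "1"
        self.handlers = handlers

    def dispatch(self, op: str, params: Dict[str, Any]) -> Dict[str, Any]:
        if op in GATED_OPS and not self.test_mode:
            # A deterministic error instead of silently accepting the request.
            return {"ok": False, "error": "not-enabled"}
        return {"ok": True, "result": self.handlers[op](**params)}
```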
Open: rendering reproducibility (fixed geometry / font / animations off / never steal focus). The renderer + window-creation paths are the remaining work for full visual-determinism — independent from the IPC scaffolding above. Same flag, same gating shape; new knobs join later without breaking the IPC scaffolding PR's surface.
Vision compliance check (docs/development/vision.md: "No test-only
backdoors that drift from reality"):
- Feed/capture ride the same channels production uses. Bytes go
through the OSC scanner + libghostty exactly the way real PTY
output does; captured bytes are mirrored from the same
send_input/onKey path that hands bytes to the supervisor. Nothing is observed in a shadow surface that the real renderer doesn't also see.
- tab.dump_resolved is ungated for the same reason: it's a richer read of an existing surface, not a new one.
5.5 Wire/versioning notes
All ops are additive; bump nothing that breaks existing clients. tab
dump's --json cell schema is the one place to design for forward
compat (optional fields). Document each new op in
docs/reference/ipc.md.
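Designing the --json cell schema for forward compatibility mostly means a tolerant reader. A sketch, assuming cells serialize as objects with text plus optional fg/bg (mirroring the RenderState Cell { text, fg, bg } in §2) and hex color strings; the wire schema itself isn't pinned down here, so treat the field names as hypothetical. Unknown keys are ignored and missing optional keys default to None, so a newer UI can add fields without breaking an older client.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Cell:
    text: str
    fg: Optional[str] = None  # optional: only present when colors were requested
    bg: Optional[str] = None

    @classmethod
    def from_json(cls, obj: Dict[str, Any]) -> "Cell":
        # Pick out known fields only; extra keys from a newer UI are ignored.
        return cls(text=obj.get("text", ""), fg=obj.get("fg"), bg=obj.get("bg"))
```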
5.6 Hermetic runs, harness flags & the skip policy (2026-05)
The pytest harness is parameterized + configured by these knobs (full
operational detail in tools/roosttest/README.md):
Make targets. make e2e / e2e-gtk / e2e-mac are the quick local
runs (reuse a running UI if present). make e2e-gtk-ci / e2e-mac-ci
reproduce CI exactly — they set ROOST_TEST_MODE=1 and --roost-fresh, so
a local run exercises the same set CI does (no test-mode-gated tests
silently skipped). e2e-mac-ci is destructive (force-quits a running
Roost.app); the targets are labeled accordingly.
Environment / flags.
| Knob | Set by | Effect |
|---|---|---|
| ROOST_TEST_MODE=1 | CI jobs; make *-ci | Unlocks the gated test-only IPC ops (§5.4). Read once at UI boot. |
| --roost-fresh / ROOST_TEST_FRESH=1 | make *-ci; CI | Harness owns a fresh, hermetic instance: force-quit any running UI (lock-safe on Mac), launch with isolated state, always quit at teardown. Also flips precondition-skips to hard failures (below). |
| ROOST_STATE_DIR | harness (per-run mkdtemp) | Prod env, both UIs. Redirects only state.json's dir; socket/lock/log stay on the default path so the harness still finds the UI. Gives each run a throwaway workspace that never touches the dev's real saved tabs. (See paths.md.) |
| ROOST_DEFAULTS_SUITE | harness (Mac) | Prod env, Mac only. Redirects the macOS app's UserDefaults to a throwaway suite (sidebar visibility/width): the UserDefaults analog of ROOST_STATE_DIR, which can't reach it. |
| ROOST_TEST_TIMEOUT_SCALE=3 | CI (slower runners) | Scales every wait_* budget. Local default 1. |
| ROOST_CONFIG | harness | Points the UI at fixtures/launcher.conf (seed launcher commands). |
ROOST_TEST_RESET_STATE (a former gate that deleted ~/Library/.../state.json
on Mac) was retired — the throwaway ROOST_STATE_DIR subsumes it
without a destructive delete.
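A sketch of how a harness might assemble a hermetic environment for the UI it launches, assuming the knobs in the table above. The function name, the darwin/linux platform strings (as reported by Python's sys.platform), and the roost-test- suite-name prefix are hypothetical choices.

```python
import tempfile
import uuid
from typing import Dict, Mapping


def hermetic_env(base: Mapping[str, str], platform: str, test_mode: bool = True) -> Dict[str, str]:
    """Environment for a harness-launched UI that never touches the dev's real state."""
    env = dict(base)
    if test_mode:
        env["ROOST_TEST_MODE"] = "1"  # unlock the gated test-only IPC ops
    # Per-run throwaway dir: only state.json moves; socket/lock/log keep default paths.
    env["ROOST_STATE_DIR"] = tempfile.mkdtemp(prefix="roost-state-")
    if platform == "darwin":
        # Mac only: redirect UserDefaults (sidebar prefs) to a throwaway suite.
        env["ROOST_DEFAULTS_SUITE"] = f"roost-test-{uuid.uuid4().hex[:8]}"
    return env
```

A launcher fixture would pass hermetic_env(os.environ, sys.platform) as the env of the UI process; each call yields a fresh state dir, so parallel or repeated runs don't share a workspace.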
The skip policy (the trustworthiness rule). A skip must mean only
"this environment genuinely can't exercise this" — never "the setup didn't
work" or "we didn't turn the mode on." Enforced by three helpers in
tools/roosttest/util.py:
- precondition(ok, reason): a setup precondition (seed config present, OSC 7 cwd tracking working) is a hard failure in fresh mode (the harness guarantees the environment, so a failure is a real regression) and a graceful skip otherwise (an ad-hoc dev UI may lack the capability).
- skip_on_ci(reason, alt_coverage=…): for the rare test that genuinely can't run remotely (e.g. a quit→relaunch lifecycle under bare xvfb); it must cite where the regression class is otherwise covered (e.g. the sidebar-persistence relaunch e2e is alt-covered by a Rust unit test + a Swift UserDefaults test).
- cwd_reaches(...): the scaled, shared cwd-poll (it replaced per-file copies that ignored ROOST_TEST_TIMEOUT_SCALE).
Every run prints a SKIPS: N summary listing each skipped test + reason
(conftest.py::pytest_terminal_summary), so a run that quietly skipped
half the suite can never read as "all green" — the failure mode that
motivated this rule. Capability skips that shouldn't happen on the
platform that owns a feature (e.g. a shell-integration test needing zsh)
are a CI-runner-provisioning gap, tracked separately, not silently normal.
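The fresh-versus-ad-hoc split in precondition, plus the SKIPS summary, can be sketched without pytest. In the real harness these presumably map onto pytest.fail, pytest.skip and the pytest_terminal_summary hook in conftest.py; here two local exception classes stand in so the sketch runs on its own, and passing fresh as an explicit argument (rather than the two-argument precondition(ok, reason) above) is an assumption.

```python
from typing import List, Tuple


class PreconditionFailed(AssertionError):
    """Fresh mode: the harness guaranteed the environment, so this is a regression."""


class PreconditionSkipped(Exception):
    """Ad-hoc dev UI: the capability may genuinely be missing."""


def precondition(ok: bool, reason: str, fresh: bool) -> None:
    if ok:
        return
    if fresh:
        raise PreconditionFailed(f"precondition failed in fresh mode: {reason}")
    raise PreconditionSkipped(reason)


def skips_summary(skips: List[Tuple[str, str]]) -> str:
    """Render a 'SKIPS: N' block listing each skipped test and its reason."""
    lines = [f"SKIPS: {len(skips)}"]
    lines += [f"  {test_id}: {reason}" for test_id, reason in skips]
    return "\n".join(lines)
```

Printing the summary even when N is 0 keeps the line's presence predictable, so a reader (or a CI grep) can always find it.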
6. Lua scripting layer
6.1 Engine & placement
Embed Lua in roostctl (Rust) via mlua (Lua 5.4). Dependency
justification per CLAUDE.md: no pure-Rust Lua is production-grade
(hematita/piccolo are immature); mlua is the mature, widely-used
binding. Constraint named, wrapper kept small — the engine only exposes a
curated roost table that forwards to the existing IPC client.
The UIs do not embed Lua. The Cmd/Alt+Shift+T launcher runs an action
by shelling out to roostctl run <action.lua>, which scripts the running
UI back over IPC. One Lua host (the CLI), identical code path for launcher
actions and tests.
```
launcher (Mac/GTK UI) ──exec──▶ roostctl run action.lua ──IPC──▶ UI workspace
test runner ──────────────────▶ roostctl run test.lua   ──IPC──▶ UI workspace
```
6.2 API surface (sketch)
```lua
-- queries
roost.identify(); roost.projects(); roost.tabs()
local d = roost.dump(tab)            -- {cols,rows,cursor,rows_text=…}
-- mutations
local p = roost.create_project{name="review", cwd="/repo"}
local t = roost.open_tab{project=p.id, cwd="/repo", title="build", cmd="…"}
roost.set_state(t, "running"); roost.focus(t); roost.notify{tab=t, title="…"}
roost.send(t, "echo hi\n"); roost.close_tab(t)
-- synchronization (no sleeps)
roost.wait{tab=t, state="idle", timeout=5}
roost.wait_for(function() return #roost.tabs() == 2 end, 5)
-- rendering (Rust-backed, in-process screenshot)
roost.screenshot{out="/tmp/x.png", scale=2}
roost.pixel(x, y); roost.find_color("#f0a040")   -- locate a UI element
-- assertions
expect(cond, "msg"); expect_eq(a, b); expect_contains(d.rows_text, "hi")
```
The same primitives express a launcher action ("spin up my review
layout: project + 3 tabs running these commands") and a test
("open a tab, send echo hi, wait for prompt, assert dump contains hi").
6.3 Launcher integration (the product feature)
Actions are named Lua scripts discovered from config (e.g.
~/.config/roost/actions/*.lua and/or a repo-local .roost/actions/).
The launcher lists them; selecting one runs roostctl run against the
current UI. Built-ins ship in-tree. Config format + discovery to be
specified in the launcher PR.
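Discovery is still an open decision (see Open decisions), so this is only a sketch of one option: scan both directories for *.lua, keep the file stem as the action name, and let later directories (here, repo-local) override earlier ones on a name clash. That precedence rule and the function names are hypothetical. The argv matches the roostctl run <action.lua> invocation from §6.1.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def discover_actions(dirs: Iterable[Path]) -> Dict[str, Path]:
    """Map action name -> script path; later directories override earlier ones."""
    actions: Dict[str, Path] = {}
    for directory in dirs:
        if not directory.is_dir():
            continue  # a missing config dir simply contributes no actions
        for script in sorted(directory.glob("*.lua")):
            actions[script.stem] = script
    return actions


def run_argv(script: Path) -> List[str]:
    # The launcher shells out; the UI itself never embeds Lua.
    return ["roostctl", "run", str(script)]
```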
6.4 Trust / safety
Lua actions run arbitrary local code (they can spawn shells via
tab send). That's acceptable for local, user-authored scripts — same
trust level as a shell rc. We do not execute actions from untrusted
sources, and the IPC socket stays user-only (0600, already the case). No
network in the exposed roost table.
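The user-only property of the socket is cheap for a client to double-check before connecting. A sketch, assuming a POSIX host (it uses os.getuid) and taking the socket path as a parameter since the path isn't specified here; it treats "user-only" as owned by the current user with no group or other permission bits, which 0600 satisfies.

```python
import os
import stat


def is_user_only(path: str) -> bool:
    """True if `path` is owned by us and grants nothing to group or other."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)
    # 0600 passes; 0640 or 0666 fail because of group/other bits.
    return info.st_uid == os.getuid() and (mode & 0o077) == 0
```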
7. Test-language decision (decided 2026-05-26)
Decision: pytest drives the tests; Lua is a scoped user-scripting surface, not the test mechanism (see vision.md DL-12). The analysis that led there is kept below; the key insight that made it low-stakes is that E2E robustness lives in the affordances, not the runner.
What actually drives E2E robustness (flake resistance, good failures):
| Robustness factor | Comes from | Language-dependent? |
|---|---|---|
| No sleeps / wait-on-condition | roostctl wait + event stream (§5.2) | No |
| Deterministic content assertions | tab dump (§5.1) | No |
| Reproducible rendering | test mode (§5.4) | No |
| No TCC/uinput flake | drive via IPC (§5.3) | No |
| Clear failure output (expected vs actual) | the runner | Yes |
| Fixtures / setup-teardown / parametrize | the runner | Yes |
| Reporting (JUnit/HTML), retries, timeouts, parallel | the runner | Yes |
| Maintenance burden of the runner itself | the runner | Yes |
So: the flake floor is identical for Lua or Python — it's set by the shared affordances. The language only changes ergonomics and reporting, plus how much harness code we own.
| Option | Pros | Cons |
|---|---|---|
| Lua runner (in roostctl) | One language; zero runtime deps (just the binary), ideal for CI + an agent; dogfoods the launcher; same helpers as actions | We hand-roll the runner (discovery, fixtures, JUnit XML, timeouts): new code we own; thinner ecosystem |
| pytest | Mature fixtures/parametrize/reporting/retries; assertion introspection; reuses #103's Python | Python runtime on every box + CI (cheap, but real); a second language; separate from the app |
| Hybrid (pytest runner over the shared roostctl/Lua/IPC layer) | pytest ergonomics and the Lua launcher; can E2E-test launcher actions; clean role split — Lua = what the app does, Python = how we assert | Two languages to keep coherent; most moving parts |
The decision: pytest as the test runner; Lua scoped to user
scripting. Tests are pytest over the IPC op set (plus roostctl /
shell for the simplest cases) — its fixtures, parametrization over the
2-UI matrix, and reporting cut the harness code we'd otherwise own, and
the flake-killing affordances (roostctl wait, tab dump) live in the
app, so the runner choice doesn't move the robustness floor. Lua is
deliberately not the test runner. It is a user-facing scripting
surface — the Cmd+Shift+T launcher and complex user-authored multi-step
actions — added where it earns programmability and not over-invested as
test infrastructure. Both stay thin adapters onto the same op set: a
pytest step and a Lua action invoke identical ops, so neither can drift
from what users actually drive.
Concretely: pytest fixtures launch/quit each UI and yield a thin Python
Roost client (wraps the socket + roostctl); tests assert with plain
assert; where a test needs to exercise the launcher path, it runs the
Lua action and asserts the resulting state via the op set.
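A sketch of that thin client, assuming JSON requests in the {"op": …, "params": …} shape from §5.1, newline framing, and an "error" key on failed replies. The framing, the error shape, and the class names are assumptions, not the documented wire format (docs/reference/ipc.md is the authority). The transport is injected so the same client can wrap the socket or a fake in unit tests.

```python
import json
from typing import Any, Callable, Dict

# A transport sends one encoded request and returns one encoded response.
Transport = Callable[[bytes], bytes]


class RoostError(RuntimeError):
    """An op returned an error instead of a result."""


class Roost:
    """Thin client over the IPC op set; tests assert on what it returns."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def call(self, op: str, **params: Any) -> Dict[str, Any]:
        request = json.dumps({"op": op, "params": params}).encode() + b"\n"
        response = json.loads(self.transport(request))
        if "error" in response:
            raise RoostError(f"{op}: {response['error']}")
        return response

    def dump(self, tab_id: int, scrollback: bool = False) -> Dict[str, Any]:
        return self.call("tab.dump", tab_id=tab_id, scrollback=scrollback)
```

A pytest fixture would yield Roost(transport) bound to the UI under test, and a test body stays plain: `assert "hi" in roost.dump(tab_id)["rows_text"]`.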
DECISION (2026-05-26): ☑ pytest runner for tests + scoped Lua for user scripting. Supersedes the earlier "hybrid (pytest + heavy Lua)" lean now that programmability + clean-architecture are explicit north-star goals; Lua's role narrows to user-facing.
8. CI design — Linux + Mac E2E
The maintainer chose both platforms now. Feasible because Tier-1 drives via IPC + in-process screenshot (no TCC, no compositor capture).
Linux (GTK):
- Runner: ubuntu-latest. Deps: libgtk-4-dev libadwaita-1-dev, the
ghostty prebuild (reuse the existing gtk CI cache), Python (hybrid) or
nothing (Lua-only).
- Display: xvfb-run -a with GDK_BACKEND=x11 (the Cairo/Pango
GtkDrawingArea renders fine under Xvfb; in-process screenshot
doesn't need a compositor). Headless Wayland (weston --backend=headless
/ sway --headless) is a fallback if an X11-only quirk appears.
- Run: build roost + roostctl, launch under ROOST_TEST_MODE=1, run the
Tier-1 suite via tools/screenshot/launch.sh gtk → runner.
macOS:
- Runner: macos-latest (GUI session present; AppKit windows work).
- Build + bundle (or run the unbundled swift run Roost — TBD which is
lighter for tests; the IPC socket comes up either way). Launch under
ROOST_TEST_MODE=1; the in-process renderer works unfocused, so no
screencapture entitlement and no Accessibility grant (we never inject
OS input in Tier 1).
- Risk was app launch/quit hygiene and runner image quirks. Resolved and
promoted (2026-05): e2e-mac is now required (in ci-success). It
proved stable on main (26P/1S, ~8s); the only failures ever seen were a
real clean-install crash it correctly caught. The one cascade mode — a
crashed instance leaving a held single-instance flock that wedges the next
launch — is handled by pre-launch hygiene in the harness
(tools/roosttest/ui.py _mac_cleanup(): kill any leftover, then unlink
the stale socket + lock) plus a one-shot launch retry; timeouts scale via
ROOST_TEST_TIMEOUT_SCALE=3 on the slower runner. No pytest reruns —
parity with e2e-gtk, so a genuine intermittent bug isn't masked. Both
e2e jobs now run --roost-fresh with a throwaway ROOST_STATE_DIR for
hermetic state (§5.6); this replaced the old ROOST_TEST_RESET_STATE
state.json delete.
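The pre-launch hygiene and one-shot retry described above reduce to a small pattern. This sketch is not _mac_cleanup() itself: prelaunch_cleanup and launch_with_retry are hypothetical names, and process killing and app launching are passed in as callables because the real paths and mechanisms aren't given here. It needs Python 3.8+ for Path.unlink(missing_ok=True).

```python
from pathlib import Path
from typing import Callable


def prelaunch_cleanup(socket_path: Path, lock_path: Path, kill_leftover: Callable[[], None]) -> None:
    """Clear what a crashed instance leaves behind before the next launch."""
    kill_leftover()  # e.g. terminate any stray UI process (injected here)
    # A dead instance can leave a stale socket and lock file that wedge the next launch.
    socket_path.unlink(missing_ok=True)
    lock_path.unlink(missing_ok=True)


def launch_with_retry(launch: Callable[[], bool], cleanup: Callable[[], None]) -> bool:
    """Clean, launch; on failure clean again and retry exactly once."""
    cleanup()
    if launch():
        return True
    cleanup()
    return launch()
```

Retrying exactly once (rather than looping, or rerunning tests) matches the stance above: a launch hiccup gets one recovery attempt, while a genuinely intermittent bug still surfaces as a failure.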
Both:
- Path-filtered like the rest of ci.yml (run only when relevant code
changes). Cache cargo + ghostty. Emit JUnit XML (hybrid: pytest
--junitxml; Lua: runner emits it) for GitHub test annotations. Upload
screenshots + manifest.md as artifacts on failure.
- Keep Tier 0 as the fast gate; Tier 1 runs after build.
9. Determinism strategy
- Content: assert via tab dump text, not pixels. Normalize the shell (set PS1, clear) or run a fixed cmd in the tab so output is stable (avoids the 👻-prompt variability seen in manual testing).
- Rendering: Tier-1 pixel checks are targeted ("the needs_input pill is amber #f0a040" via find_color), not whole-window diffs. Test mode fixes geometry+font so even those are stable.
- Timing: only roostctl wait / wait_for. No sleep in any test.
- Isolation: each test creates its own project and cascade-closes it (the smoke already does this); a fixture guarantees cleanup even on failure. Implemented (2026-05): a harness-launched UI also gets a throwaway ROOST_STATE_DIR (+ ROOST_DEFAULTS_SUITE on Mac), so a run never touches the dev's real state.json/prefs; see §5.6.
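A targeted color check like the amber-pill example can be sketched as a tolerant pixel search. This assumes the screenshot has already been decoded into rows of (r, g, b) tuples (PNG decoding is omitted) and uses a per-channel tolerance of 8, an arbitrary value chosen to absorb antialiasing; the names echo the roost.pixel / find_color sketch in §6.2, but the signature is hypothetical.

```python
from typing import Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def parse_hex(color: str) -> RGB:
    """'#f0a040' -> (240, 160, 64)."""
    value = color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #rrggbb, got {color!r}")
    return (int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16))


def find_color(pixels: Sequence[Sequence[RGB]], color: str, tolerance: int = 8) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the first pixel within `tolerance` per channel, else None."""
    target = parse_hex(color)
    for y, row in enumerate(pixels):
        for x, px in enumerate(row):
            if all(abs(a - b) <= tolerance for a, b in zip(px, target)):
                return (x, y)
    return None
```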
10. Relationship to #103 / #104
- #104 tools/screenshot/ (bash smoke): keep. Its scenario shape (create project → states → notify → focus → hook → cascade) becomes the first Tier-1 cases. The bash version remains a zero-setup smoke until the runner supersedes it.
- #103 tools/input/linux/ (uinput/PNG/clipboard): keep as the Tier-2 real-input layer. Its pngtool logic informs roost.pixel/find_color; its uinput injector is the Linux half of the Tier-2 smoke. A Mac CGEvent equivalent is the other half (later).
- Land both now and resolve the one-line CLAUDE.md Troubleshooting conflict (both add adjacent bullets; they coexist). The unified harness lands separately, on top of §5.
11. Risks & mitigations
| Risk | Mitigation |
|---|---|
| macOS app won't run cleanly in CI | Resolved (2026-05): promoted to required after proving stable. Drives via IPC (no TCC); the harness clears a stale lock/socket before launch and retries the open once; timeouts scale via ROOST_TEST_TIMEOUT_SCALE. |
| events.subscribe wire work is bigger than hoped | Ship roostctl wait polling-backed first; swap to events later behind the same interface. |
| tab dump differs subtly Mac vs GTK | Golden the dump format in a cross-UI test (same cmd, assert identical rows_text); both walk the same RenderState shape. |
| Lua (mlua) C-dep friction in CI | It builds vendored Lua; cache the cargo artifacts; it only lands in roostctl, not the UIs. |
| Two harness entry points confuse future work | This doc + a single tools/README.md map (Tier 0/1/2) once the unified harness lands. |
| Screenshot flake across DPI/machines | Test mode pins geometry+font; prefer text assertions; targeted color checks only. |
12. Phased rollout
Each phase is an independently reviewable PR (or a small stack), gated on green CI, merged manually per branch policy.
- P0: coordination. Land #103 + #104 (resolve the CLAUDE.md conflict). Done when: both merged, ci-success green.
- P1: content + waiting (the backbone). tab dump (IPC + both UIs + roostctl); events.subscribe over the wire (both UIs) + roostctl wait/events; unit tests for dump + a Rust/Swift test for the wire event fan-out. Done when: roostctl tab dump and roostctl wait work against both UIs locally; no sleep needed to observe a state change.
- P2: test mode + Tier-1 harness skeleton. ROOST_TEST_MODE (fixed geometry/font/no-anim); the runner (per §7 decision) + 3–4 ported smoke cases; runs locally on both UIs. Done when: make e2e (or roostctl test) is green locally on Mac + GTK.
- P3: CI. ✅ Linux xvfb E2E job (required) + macOS E2E job. JUnit + artifact upload. Done: Tier-1 runs on PRs touching relevant paths, and the macOS job was promoted to required (ci-success) once stable; releases gate on the same ci-success via release.yml's ci-gate.
- P4: Lua engine. mlua in roostctl; the roost API table; roostctl run <script.lua>; convert the Tier-1 helpers to use it (or the Lua smoke). Done when: a Lua action script can set up a multi-tab layout end-to-end.
- P5: launcher actions. Wire Cmd/Alt+Shift+T to discover + run Lua actions via roostctl run; ship a couple of built-in actions; docs. Done when: selecting a launcher action mutates the live workspace.
- P6: Tier-2 real-input smoke + consolidation. Fold #103 into the Tier-2 layer; add the Mac CGEvent injector; write the tools/README.md tier map; decide #104's fate. Done when: a real-keystroke smoke passes locally on Pop!_OS and Mac.
P1 is the linchpin — everything downstream leans on tab dump + wait.
13. CLAUDE.md updates (written for the agent)
When this lands, CLAUDE.md Troubleshooting/Testing should tell an agent,
prescriptively:
- To verify a change on the live app: tools/screenshot/launch.sh <mac|gtk>, then drive with roostctl (tab dump to read content, wait to synchronize, screenshot to see). Never sleep.
- To run the functional suite: the one command (roostctl test … or pytest tools/roosttest -m e2e --target <mac|gtk>), and how to read the JUnit/artifacts.
- To add a test: where cases live, the fixture that gives a clean workspace, and the assertion helpers (dump, tab list, find_color).
- To add a launcher action: where actions live and the roost Lua API.
- Tier map: 0 = cargo/swift test; 1 = functional E2E (CI, both UIs); 2 = local real-input (tools/input/linux + Mac equivalent).
The guiding rule for these docs: an agent should be able to go from "I changed X" to "here's the exact command that proves X still works" without guessing.
Open decisions¶
- ~~Test runner language~~ — DECIDED (§7 / DL-12): pytest runner; Lua scoped to user scripting. Unblocks P2.
- ~~macOS CI launch~~: DECIDED (2026-05): bundle via bundle.sh + launch through open; the macOS E2E job is now required. The harness clears a stale lock/socket before launch and retries once; timeouts scale via ROOST_TEST_TIMEOUT_SCALE. Unblocked P3.
- Launcher action discovery: global (~/.config/roost/actions/), repo-local (.roost/actions/), or both; built-ins in-tree. Blocks P5.
- ~~Temp-workspace isolation~~: DECIDED (2026-05): both. Tests keep per-test project create/cascade-close hygiene, and a harness-launched UI runs against a throwaway ROOST_STATE_DIR (+ ROOST_DEFAULTS_SUITE on Mac), so a run never touches the dev's real workspace. See §5.6.