Run queued
Bug report normalized and waiting for a local workspace.
A local Codex repair loop that reproduces a defect, writes the regression test, patches the source, verifies the suite, and presents a reviewable diff.
demo-repo
fix/roi-zero-equity
not-started
Reproduced
Patched
Verified
Ready to Review
Select a seeded failure and stage the repair prompt.
Selected target
ROI calculator returns Infinity when equity is zero.
Triage, test, patch, run, and review are separated into visible handoffs.
The repair agent will add a failing test before patching source.
Final Vitest output and diff will replace the preview after the API returns.
Regression first, full suite after the patch.
Before patch
1 failed
After patch
12 passed
$ pnpm test
RUN v2.1.9 .workspaces/preview
waiting for repair run...
Reviewable unified diff from the isolated workspace.
Judge-ready explanation of the verified patch.
Start a repair to generate a Codex-authored PR title, summary, changed files, test command, and risk level.
Run the repair loop to unlock the merge-ready state.
Judge-facing proof that the agent reproduced, patched, verified, and packaged the run.
Report
Reproduce
Patch
Verify
PR
Bug report
When a user enters equity = 0, the ROI calculator returns Infinity. Expected behavior: return 0 and keep the report stable.
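The expected behavior in this report can be sketched as a guard on the divisor. This is only an illustration: it assumes ROI is computed as net profit divided by equity, and the function name `calculateRoi` and its parameters are hypothetical, not taken from demo-repo.

```typescript
// Hypothetical ROI helper; the name and signature are assumptions for illustration.
function calculateRoi(netProfit: number, equity: number): number {
  // Dividing by zero equity yields Infinity (or NaN when netProfit is also 0),
  // so return 0 to keep the report stable, as the bug report expects.
  if (equity === 0) {
    return 0;
  }
  return netProfit / equity;
}
```

A regression test for this case would assert that a call such as `calculateRoi(500, 0)` returns 0, fail against the unpatched source, and pass once the guard is in place.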
Runtime: 25s
Changed files: 2
Retry count: 0
Local signals that make the coding-agent run measurable for Loops House and judges.
100%
3/3 golden runs passed
25s
end-to-end evidence runtime
3/3
seeded bugs have golden evidence
25 lines
low risk
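One way these signals could be recorded and aggregated is sketched below. The `RepairRun` shape, its field names, and the `passRate` helper are assumptions made for illustration; the page does not show how the cockpit stores or computes its metrics.

```typescript
// Hypothetical record for one repair run; all field names are assumptions.
type RiskLevel = "low" | "medium" | "high";

interface RepairRun {
  passed: boolean;        // did the full suite pass after the patch
  runtimeSeconds: number; // end-to-end runtime of the run
  diffLines: number;      // size of the unified diff
  changedFiles: number;
  retryCount: number;
  risk: RiskLevel;
}

// Percentage of runs that passed, rounded to a whole number; 0 for an empty list.
function passRate(runs: RepairRun[]): number {
  if (runs.length === 0) {
    return 0;
  }
  const passedCount = runs.filter((run) => run.passed).length;
  return Math.round((passedCount / runs.length) * 100);
}
```

Under this sketch, three runs that all pass give a pass rate of 100, which matches the "3/3 golden runs passed" card.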
Judging signal map
Agentic Coding
Codex performs the test-first repair loop inside an isolated workspace.
Building Evals
Runs expose pass/fail, runtime, diff size, changed files, retry count, and risk.
UX
Timeline, evidence, diff, tests, and PR summary are visible in one local cockpit.
Selected run evaluation