Anyone can wrap an LLM around "write me some Verilog." The hard, defensible part is the closed loop: real simulation, reliable failure parsing, and repair that converges.
Five LLM agents coordinate; one deterministic tool runner does the real work. The orchestrator enforces a max-iteration budget so every run terminates.
Turns natural-language or structured specs into a formal intent: interfaces, protocol, timing, and the corner cases that matter.
Writes synthesizable Verilog with a synthesizability critic pass, biased toward area or timing for your target and application.
Generates a self-checking testbench with directed + constrained-random cases, SVA assertions, and a reference model.
Reads failing logs, localizes the fault, decides whether the design or the testbench is wrong, and patches the right one.
Not an LLM. Wraps iverilog/vvp, Verilator and yosys with timeouts and a security scan, and parses logs into structured results. Loop reliability lives here.
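A minimal sketch of the timeout half of that wrapper, assuming a plain subprocess call is enough for illustration. The names `ToolResult` and `run_tool` are hypothetical, and the security scan and sandboxing are not shown.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolResult:
    """Structured outcome of one tool invocation (hypothetical shape)."""
    returncode: Optional[int]  # None when the run hit the timeout
    stdout: str
    stderr: str
    timed_out: bool


def _as_text(data) -> str:
    """Normalize captured output, which may be None, bytes, or str."""
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def run_tool(cmd: List[str], timeout_s: float) -> ToolResult:
    """Run an external tool with a hard wall-clock limit."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child process before raising.
        return ToolResult(None, _as_text(exc.stdout), _as_text(exc.stderr), True)
    return ToolResult(proc.returncode, proc.stdout, proc.stderr, False)


if __name__ == "__main__":
    # Stand-in command; a real runner would invoke the simulator here.
    result = run_tool([sys.executable, "-c", "print('PASS: smoke')"], timeout_s=10)
    print(result)
```

With Icarus Verilog installed, the same call could take something like `["iverilog", "-o", "sim.vvp", "dut.v", "tb.v"]` followed by `["vvp", "sim.vvp"]`; the file names here are placeholders.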
Drives the loop, enforces the iteration budget, decides convergence, and streams every stage to the UI over SSE.
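The budget logic can be sketched as a bounded loop. This is an illustration, not the product's orchestrator: `simulate` and `repair` are hypothetical callables standing in for the tool runner and the debug agent, and SSE streaming is left out.

```python
from typing import Callable, NamedTuple


class LoopOutcome(NamedTuple):
    converged: bool
    iterations: int  # number of simulations actually run


def run_loop(
    simulate: Callable[[str], bool],
    repair: Callable[[str], str],
    design: str,
    max_iterations: int,
) -> LoopOutcome:
    """Simulate, repair on failure, and stop after max_iterations simulations."""
    for i in range(1, max_iterations + 1):
        if simulate(design):
            return LoopOutcome(True, i)
        if i < max_iterations:
            # Only repair when there is budget left to re-simulate the patch.
            design = repair(design)
    return LoopOutcome(False, max_iterations)


if __name__ == "__main__":
    # Toy stand-ins: the "design" passes once it reaches revision v2.
    passes = lambda d: d == "v2"
    bump = lambda d: "v" + str(int(d[1:]) + 1)
    print(run_loop(passes, bump, "v0", max_iterations=5))
```

Because the loop is a `for` over a fixed range, termination does not depend on the agents behaving well.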
Every generated and golden testbench follows one convention: print PASS:/FAIL: per check and end with a machine-readable summary line. A deterministic parser scores on that, so there's no human in the waveform-watching loop.
That single convention is what turns "generate and hope" into "generate, measure, and repair."
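A sketch of what such a parser might look like. The PASS:/FAIL: prefixes come from the convention above; the exact summary format (`SUMMARY: passed=N failed=M`) is an assumption made for this example, as are the names `Score` and `score_log`.

```python
import re
from dataclasses import dataclass

# Assumed summary format; the real convention may differ.
SUMMARY_RE = re.compile(r"^SUMMARY: passed=(\d+) failed=(\d+)$")


@dataclass
class Score:
    passed: int
    failed: int
    summary_ok: bool  # summary line present and consistent with the counts

    @property
    def ok(self) -> bool:
        """A run passes only with no failures, some checks, and a valid summary."""
        return self.summary_ok and self.failed == 0 and self.passed > 0


def score_log(log: str) -> Score:
    """Count PASS:/FAIL: lines and cross-check them against the summary line."""
    passed = failed = 0
    summary = None
    for raw in log.splitlines():
        line = raw.strip()
        if line.startswith("PASS:"):
            passed += 1
        elif line.startswith("FAIL:"):
            failed += 1
        else:
            match = SUMMARY_RE.match(line)
            if match:
                summary = (int(match.group(1)), int(match.group(2)))
    return Score(passed, failed, summary == (passed, failed))


if __name__ == "__main__":
    sample = "\n".join([
        "PASS: reset clears count",
        "FAIL: overflow wraps",
        "PASS: enable gates count",
        "SUMMARY: passed=2 failed=1",
    ])
    print(score_log(sample))
```

Requiring the summary line to match the counts means a simulation that crashes or times out partway through is scored as a failure rather than a quiet pass.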
A benchmark suite of standard blocks with known-good references measures convergence rate and time saved versus a baseline, and every run becomes a proprietary spec → RTL → bug → fix trace.
Convergence and bug-find rates on a fixed benchmark: the numbers that prove the loop works.
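One way the convergence number could be computed from per-block results, as a sketch. The `RunRecord` shape and the block names are made up for illustration; no real benchmark data is shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunRecord:
    block: str        # benchmark block name (illustrative)
    converged: bool   # did the loop reach a passing design within budget?
    iterations: int   # simulations used


def convergence_rate(runs: List[RunRecord]) -> float:
    """Fraction of benchmark runs that converged."""
    if not runs:
        raise ValueError("no runs to score")
    return sum(r.converged for r in runs) / len(runs)


def mean_iterations_to_converge(runs: List[RunRecord]) -> Optional[float]:
    """Average iteration count over converged runs, or None if none converged."""
    done = [r.iterations for r in runs if r.converged]
    return sum(done) / len(done) if done else None


if __name__ == "__main__":
    runs = [
        RunRecord("counter", True, 1),
        RunRecord("fifo", True, 3),
        RunRecord("uart_tx", True, 2),
        RunRecord("arbiter", False, 5),
    ]
    print(convergence_rate(runs), mean_iterations_to_converge(runs))
```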
Each iteration logged from day one becomes training and evaluation data competitors don't have.
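A sketch of per-iteration logging as JSON Lines, assuming one record per stage. The field names (`run_id`, `iteration`, `stage`, `payload`) are hypothetical, not the product's schema.

```python
import io
import json
from typing import TextIO


def log_iteration(
    sink: TextIO, run_id: str, iteration: int, stage: str, payload: dict
) -> None:
    """Append one iteration record to the sink as a single JSON line."""
    record = {
        "run_id": run_id,
        "iteration": iteration,
        "stage": stage,
        "payload": payload,
    }
    sink.write(json.dumps(record, sort_keys=True) + "\n")


if __name__ == "__main__":
    # An in-memory sink for the demo; a real trace would go to a file or store.
    sink = io.StringIO()
    log_iteration(sink, "run-001", 1, "simulate", {"passed": 2, "failed": 1})
    log_iteration(sink, "run-001", 1, "repair", {"target": "design"})
    print(sink.getvalue(), end="")
```

One record per line keeps the trace append-only and easy to replay later for evaluation.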
Max-iteration budget guarantees termination; the tool runner sandboxes and times out every simulation.
Pick a block, give it a spec, and watch each stage stream live.