OpenClaw Benchmark Explorer

Reading the chart: cell colors scale green→red according to the metric you selected. "Composite" = pass_rate × TPS; it rewards models that are both accurate and fast, so use it when picking a model for interactive agents.
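
To make the composite metric concrete, here is a minimal sketch that computes pass_rate × TPS and ranks a few models by it. The ModelResult class, the model names, and all numbers are made up for illustration, and pass_rate is assumed to be a fraction between 0 and 1; the explorer's actual data layout may differ.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    # Hypothetical record; field names are assumptions, not the explorer's schema.
    name: str
    pass_rate: float  # assumed to be a fraction in [0, 1]
    tps: float        # throughput as reported by the bench


def composite(result: ModelResult) -> float:
    """Composite score as described above: pass_rate multiplied by TPS."""
    return result.pass_rate * result.tps


# Illustrative values only, not real benchmark results.
results = [
    ModelResult("accurate-but-slow", pass_rate=0.98, tps=15.0),
    ModelResult("balanced", pass_rate=0.90, tps=40.0),
    ModelResult("fast-but-sloppy", pass_rate=0.50, tps=60.0),
]

# Highest composite first.
ranked = sorted(results, key=composite, reverse=True)

for r in ranked:
    print(f"{r.name:20s} composite={composite(r):.1f}")
```

With these made-up numbers the balanced model ranks first: neither the highest pass rate nor the highest throughput wins alone, because the product penalizes a weak value in either factor.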

Auto-refresh: this tab polls data/running.json every 3 seconds while a bench is active. The monitor daemon (openclaw-bench-monitor.service) tails /tmp/bench-*-run.log and derives model/task status.
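
The polling behavior can be approximated outside the browser with a small loop like the one below. This is a sketch under stated assumptions, not the tab's actual code: it reads running.json from a local path rather than over HTTP, and the "active" key used in the usage note is hypothetical, since the real schema of running.json is not described here. The caller supplies is_active so the loop does not depend on any particular field.

```python
import json
import time
from pathlib import Path


def read_status(path):
    """Return the parsed JSON, or None if the file is missing or half-written."""
    try:
        return json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def poll(path, is_active, on_update, interval=3.0, sleep=time.sleep):
    """Re-read `path` every `interval` seconds while is_active(status) is true.

    is_active and on_update are caller-supplied callbacks (hypothetical names);
    `sleep` is injectable so the loop can be exercised without real delays.
    """
    while True:
        status = read_status(path)
        if status is None or not is_active(status):
            break  # no readable status, or the bench is no longer active
        on_update(status)
        sleep(interval)
```

For example, assuming the file carried a boolean "active" key, a call such as poll("data/running.json", lambda s: s.get("active", False), print) would print each snapshot every 3 seconds and return once the bench finishes.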