Reading the chart: cell colors scale green→red with whichever metric you chose.
"Composite" = pass_rate × TPS, rewards models that are both accurate AND fast — use it to pick for interactive agents.
Auto-refresh: this tab polls
data/running.json every 3 seconds while a bench is active.
The monitor daemon (openclaw-bench-monitor.service) tails /tmp/bench-*-run.log and derives model/task status.