LLM-as-Judge vs Rule-Based Evals for a Coding Agent: Which Should You Use?
Rule-based checks are your non-negotiable floor; LLM-as-judge is the ceiling you add when what you ship is code quality, not just correctness. Here is how to decide, with cost, latency, and the SWE-bench gap that makes the case.