
Benchmark

Chess

Strategic planning benchmarks that evaluate search depth, memory, and move-by-move compositional reasoning.


Suite coverage

Elo ladder evaluation against classic engines, plus tactical puzzle sets and positional endgame studies.
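As a rough illustration of how a rating ladder can be scored, the sketch below applies the standard Elo expected-score formula and a per-game update. The K-factor of 20, the starting rating, and the function names are assumptions made for this example; the suite's actual rating procedure is not described here.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score (between 0 and 1) against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_rating(rating: float, opponent_rating: float, score: float, k: float = 20.0) -> float:
    """Return the rating after one game.

    score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k (the K-factor) is an assumed value, not one taken from the benchmark.
    """
    return rating + k * (score - expected_score(rating, opponent_rating))


def run_ladder(start_rating: float, games, k: float = 20.0) -> float:
    """Apply the update over a sequence of (opponent_rating, score) pairs."""
    rating = start_rating
    for opponent_rating, score in games:
        rating = update_rating(rating, opponent_rating, score, k)
    return rating


if __name__ == "__main__":
    # Hypothetical ladder: three reference engines with assumed ratings.
    games = [(1400.0, 1.0), (1600.0, 0.5), (1800.0, 0.0)]
    print(round(run_ladder(1500.0, games), 1))
```

A sequential update like this is sensitive to game order; a ladder with many games per rung might instead fit a single performance rating over all results.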

Metrics monitored

  • Blunder rate and best-move agreement.
  • Long-term plan consistency across 30+ move horizons.
  • Adaptation to novel openings and mid-game disruptions.
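The first metric above can be made concrete with a short sketch. It assumes each move has already been annotated by a reference engine with its top choice and with centipawn evaluations from the mover's perspective. The `MoveRecord` type, its field names, and the 200-centipawn blunder threshold are hypothetical choices for illustration, not definitions taken from the benchmark.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MoveRecord:
    played: str          # move the model played, e.g. "e2e4"
    engine_best: str     # reference engine's top choice in the same position
    eval_best_cp: int    # evaluation after the engine's best move (centipawns, mover's view)
    eval_played_cp: int  # evaluation after the played move (same perspective)


def best_move_agreement(records: Sequence[MoveRecord]) -> float:
    """Fraction of moves that match the reference engine's top choice."""
    if not records:
        return 0.0
    return sum(r.played == r.engine_best for r in records) / len(records)


def blunder_rate(records: Sequence[MoveRecord], threshold_cp: int = 200) -> float:
    """Fraction of moves that lose at least threshold_cp versus the best move.

    The 200 cp default is an assumed cutoff; other thresholds are equally plausible.
    """
    if not records:
        return 0.0
    blunders = sum(
        (r.eval_best_cp - r.eval_played_cp) >= threshold_cp for r in records
    )
    return blunders / len(records)


if __name__ == "__main__":
    # Made-up sample data, only to exercise the functions.
    sample = [
        MoveRecord("e2e4", "e2e4", 30, 30),
        MoveRecord("g1f3", "d2d4", 40, 25),
        MoveRecord("d1h5", "b1c3", 20, -300),
        MoveRecord("a2a3", "a2a3", 10, 10),
    ]
    print(best_move_agreement(sample), blunder_rate(sample))
```

Keeping the two metrics separate is deliberate: a move can differ from the engine's first choice while losing almost nothing, so low agreement does not by itself imply a high blunder rate.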

Current focus

Improving endgame conversion rates while keeping move explanations accessible to human analysts.