Overall rankings for text generation models, powered by community votes and the Elo rating system.
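As a rough illustration of how an Elo-style rating responds to a single community vote, here is a minimal sketch of the textbook Elo update. The function names, the K-factor of 32 and the 400-point scale are the conventional defaults and are assumptions here; the leaderboard's actual parameters and fitting procedure are not stated on this page.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one vote.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k is an assumed K-factor, not a value taken from the leaderboard.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b
```

For two models with equal ratings, a single win moves the winner up by half the K-factor and the loser down by the same amount, so the total rating is conserved.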
Uniform sampling, excluding ties
Each cell shows how often the row model beats the column model
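One way such a head-to-head table could be computed from raw votes is sketched below. The record format (`model_a`, `model_b`, `winner`) and the function name are hypothetical, and "excluding ties" is read here as dropping tied votes before computing each rate.

```python
from collections import defaultdict


def win_rate_matrix(battles):
    """Map (row_model, col_model) to the fraction of decisive votes the row model won.

    battles: iterable of (model_a, model_b, winner), where winner is
    "a", "b", or "tie" (a hypothetical record format).
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for model_a, model_b, winner in battles:
        if winner == "tie":
            continue  # tied votes are excluded from the rate
        if winner == "a":
            won, lost = model_a, model_b
        else:
            won, lost = model_b, model_a
        wins[(won, lost)] += 1
        # A decisive vote counts toward both cells of the pair.
        totals[(won, lost)] += 1
        totals[(lost, won)] += 1
    return {pair: wins[pair] / count for pair, count in totals.items()}
```

With this definition the two cells for any pair of models sum to 1, since every decisive vote is a win for exactly one side.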
Approximate 95% confidence intervals. Narrower bars indicate a more certain ranking.
Total number of head-to-head challenges between each model pair