english_mmlu_pro

GPT-oss-120B · English MMLU-Pro · English (US) · 2100 samples

Every row in the list is one question from the benchmark. The check or cross icon shows whether the model's answer matched the target; click a row to read the full prompt, expected answer, and what the model actually produced.

exact_match
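
As a rough illustration of what an exact_match score means for the rows above, the sketch below compares each model answer to its target and averages the results. It is not the scorer used to produce this page: the function names, the whitespace trimming, and the sample rows are all assumptions made for the example.

```python
def exact_match(prediction: str, target: str) -> bool:
    """Return True when prediction equals target after trimming whitespace.

    Trimming is an assumption; a real scorer may normalize differently.
    """
    return prediction.strip() == target.strip()


def exact_match_accuracy(rows: list[tuple[str, str]]) -> float:
    """Fraction of (prediction, target) pairs that match exactly."""
    if not rows:
        return 0.0
    return sum(exact_match(p, t) for p, t in rows) / len(rows)


# Hypothetical rows: (model answer, expected answer).
rows = [("B", "B"), ("C", "A"), (" D ", "D")]

# One check or cross per row, mirroring the icons in the list.
marks = ["✓" if exact_match(p, t) else "✗" for p, t in rows]
score = exact_match_accuracy(rows)
```

With these made-up rows, two of the three answers match, so the per-row marks are a check, a cross, and a check, and the aggregate score is two thirds.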