ifeval

GPT-5 Nano · ifeval · English (US) · 0 samples

Every row in the list is one question from the benchmark. The check or cross icon shows whether the model's answer matched the target; click a row to read the full prompt, expected answer, and what the model actually produced.

Metric                    Value
inst_level_loose_acc      39.3
inst_level_strict_acc     38.5
prompt_level_loose_acc    26.6
prompt_level_strict_acc   25.7
sample_len                541.0
Average                   32.5
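
The page does not state how the Average figure is derived. A minimal sketch, assuming it is the unweighted mean of the four accuracy metrics with sample_len excluded, reproduces the reported 32.5. The dictionary and variable names are illustrative only and are not taken from the evaluation tooling.

```python
# Scores copied from the metrics reported on this page.
scores = {
    "inst_level_loose_acc": 39.3,
    "inst_level_strict_acc": 38.5,
    "prompt_level_loose_acc": 26.6,
    "prompt_level_strict_acc": 25.7,
    "sample_len": 541.0,
}

# Assumption: only the "*_acc" metrics enter the average;
# sample_len is a length, not an accuracy, so it is left out.
accuracy_keys = [name for name in scores if name.endswith("_acc")]
average = sum(scores[name] for name in accuracy_keys) / len(accuracy_keys)

print(round(average, 1))
```

Including sample_len in the mean would give a value above 100, so the reported figure is only consistent with an average over the accuracy metrics.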

No per-question samples have been synced for this task yet.