urdu_factcheckbench

GPT-oss-120B · urdu claim · Urdu (Pakistan) · 0 samples

Every row in the list is one question from the benchmark. The check or cross icon shows whether the model's answer matched the target; click a row to read the full prompt, expected answer, and what the model actually produced.

f1_macro: 44.6

sample_len: 387.0

Average: 44.6
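
The page does not say how f1_macro is computed. Assuming it follows the usual definition (the unweighted mean of per-class F1 scores, reported here on a 0-100 scale), a minimal sketch looks like the following. The function name, the label strings, and the sample data are hypothetical and are not taken from the benchmark.

```python
from collections import Counter


def f1_macro(targets, predictions):
    """Unweighted mean of per-class F1, scaled to 0-100 (assumed convention)."""
    # Every label seen in either list counts as a class.
    labels = sorted(set(targets) | set(predictions))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for target, predicted in zip(targets, predictions):
        if target == predicted:
            tp[target] += 1
        else:
            fp[predicted] += 1  # predicted class gains a false positive
            fn[target] += 1     # true class gains a false negative

    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no true positives scores 0 rather than raising.
        scores.append(2 * tp[label] / denom if denom else 0.0)

    return 100.0 * sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
targets = ["true", "true", "false", "false"]
predictions = ["true", "false", "false", "false"]
score = f1_macro(targets, predictions)
```

Because every class carries equal weight, a model that does well only on the majority label can still score low on this metric.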

No per-question samples have been synced for this task yet.