Copyright 2026 MeetKai Inc.

GPT-5 Nano

8 tasks

Each row below is a single benchmark task this model was evaluated on. The Score column averages the quality metrics the task reports (accuracy, F1, exact match, etc.); sample_len is listed under Metrics but does not enter the score. Click a row to browse the individual questions and the model's responses.

Average: 61.4

| Score | Language | Task | Metrics |
|---|---|---|---|
| 89.9 | Urdu (Pakistan) | urdu_uquad (urdu qa) | llm_judge_score: 89.9, sample_len: 139.0 |
| 79.5 | Urdu (Pakistan) | urdu_bingcheck (urdu claim) | f1_macro: 79.5, sample_len: 102.0 |
| 78.9 | Urdu (Pakistan) | urdu_freshqa (urdu qa) | llm_judge_score: 78.9, sample_len: 323.0 |
| 65.5 | Urdu (Pakistan) | urdu_factcheckbench (urdu claim) | f1_macro: 65.5, sample_len: 387.0 |
| 61.5 | Urdu (Pakistan) | urdu_facttool_qa (urdu claim) | f1_macro: 61.5, sample_len: 160.0 |
| 55.9 | Urdu (Pakistan) | urdu_fake_news (urdu classification) | f1_macro: 55.9, sample_len: 300.0 |
| 34.2 | Urdu (Pakistan) | urdu_emotion_class (urdu classification) | f1_macro: 34.2, sample_len: 200.0 |
| 25.5 | Urdu (Pakistan) | urdu_simpleqa (urdu qa) | llm_judge_score: 25.5, sample_len: 200.0 |
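
The scoring arithmetic can be checked with a short sketch. It assumes, based only on the values listed on this page, that sample_len is excluded from each task's score and that the headline Average is the unweighted mean of the eight task scores rounded to one decimal. The names `TASKS`, `NON_SCORE_KEYS`, `task_score`, and `overall_average` are hypothetical and not part of any ModelChorus API.

```python
from statistics import mean

# Metric values copied from the rows on this page.
TASKS = {
    "urdu_uquad": {"llm_judge_score": 89.9, "sample_len": 139.0},
    "urdu_bingcheck": {"f1_macro": 79.5, "sample_len": 102.0},
    "urdu_freshqa": {"llm_judge_score": 78.9, "sample_len": 323.0},
    "urdu_factcheckbench": {"f1_macro": 65.5, "sample_len": 387.0},
    "urdu_facttool_qa": {"f1_macro": 61.5, "sample_len": 160.0},
    "urdu_fake_news": {"f1_macro": 55.9, "sample_len": 300.0},
    "urdu_emotion_class": {"f1_macro": 34.2, "sample_len": 200.0},
    "urdu_simpleqa": {"llm_judge_score": 25.5, "sample_len": 200.0},
}

# Assumption: sample_len is descriptive and is not averaged into the score.
NON_SCORE_KEYS = {"sample_len"}


def task_score(metrics: dict[str, float]) -> float:
    """Average the quality metrics of one task, skipping non-score keys."""
    quality = [value for key, value in metrics.items() if key not in NON_SCORE_KEYS]
    return mean(quality)


def overall_average(tasks: dict[str, dict[str, float]]) -> float:
    """Unweighted mean of the per-task scores, rounded to one decimal."""
    return round(mean(task_score(m) for m in tasks.values()), 1)


if __name__ == "__main__":
    print(overall_average(TASKS))  # 61.4
```

Under these assumptions the eight scores sum to 490.9, and 490.9 / 8 = 61.3625, which rounds to the 61.4 shown as the Average. Each task here reports a single quality metric, so the per-task score equals that metric.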