ModelChorus

Copyright 2026 MeetKai Inc.


Rnj 1 Instruct

8 tasks

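Each row below is a single benchmark task this model was evaluated on. The Score column averages every metric the task reports (accuracy, F1, exact-match, etc.). In the rows listed here, each task reports one quality metric (llm_judge_score or f1_macro), so the Score equals that metric; sample_len is shown alongside it and is evidently not averaged into the Score.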

Average: 37.9
Score  Language         Task                                       Metrics
70.5   Urdu (Pakistan)  urdu_uquad (urdu qa)                       llm_judge_score: 70.5, sample_len: 139.0
56.1   Urdu (Pakistan)  urdu_factcheckbench (urdu claim)           f1_macro: 56.1, sample_len: 387.0
55.0   Urdu (Pakistan)  urdu_fake_news (urdu classification)       f1_macro: 55.0, sample_len: 300.0
48.0   Urdu (Pakistan)  urdu_bingcheck (urdu claim)                f1_macro: 48.0, sample_len: 102.0
47.0   Urdu (Pakistan)  urdu_facttool_qa (urdu claim)              f1_macro: 47.0, sample_len: 160.0
22.4   Urdu (Pakistan)  urdu_emotion_class (urdu classification)   f1_macro: 22.4, sample_len: 200.0
3.7    Urdu (Pakistan)  urdu_freshqa (urdu qa)                     llm_judge_score: 3.7, sample_len: 323.0
0.5    Urdu (Pakistan)  urdu_simpleqa (urdu qa)                    llm_judge_score: 0.5, sample_len: 200.0
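The reported Average of 37.9 is consistent with an unweighted mean of the eight task scores listed above. The sketch below checks that arithmetic. The `scores` dictionary and the `overall_average` helper are illustrative names, not part of the site, and the assumption that the page uses a plain unweighted mean rounded to one decimal is inferred from the numbers rather than documented.

```python
# Per-task scores copied from the listing above.
scores = {
    "urdu_uquad": 70.5,
    "urdu_factcheckbench": 56.1,
    "urdu_fake_news": 55.0,
    "urdu_bingcheck": 48.0,
    "urdu_facttool_qa": 47.0,
    "urdu_emotion_class": 22.4,
    "urdu_freshqa": 3.7,
    "urdu_simpleqa": 0.5,
}


def overall_average(task_scores: dict[str, float]) -> float:
    """Unweighted mean of the task scores, rounded to one decimal.

    Assumption: every task counts equally, regardless of sample_len.
    """
    return round(sum(task_scores.values()) / len(task_scores), 1)


if __name__ == "__main__":
    print(overall_average(scores))  # 37.9
```

Under this reading, the two near-zero QA tasks (urdu_freshqa and urdu_simpleqa) pull the average down as much as any other task, since sample_len plays no role in the weighting.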