ModelChorus

Copyright 2026 MeetKai Inc.


Rnj 1 Instruct

67 tasks

Each row below is one benchmark task this model was evaluated on. The Score column is the unweighted mean of the quality metrics the task reports (accuracy, F1, exact match, etc.). sample_len, the number of evaluated examples, is listed alongside them but is not part of the average.
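The per-task Score can be sketched as follows, assuming it is a plain arithmetic mean of the reported metrics that skips sample_len. The function name `task_score` is hypothetical and not part of any ModelChorus API.

```python
from typing import Mapping


def task_score(metrics: Mapping[str, float]) -> float:
    """Unweighted mean of a task's reported metrics.

    sample_len is skipped because it counts evaluated examples
    rather than measuring quality.
    """
    quality = [value for name, value in metrics.items() if name != "sample_len"]
    if not quality:
        raise ValueError("task reports no quality metrics")
    return sum(quality) / len(quality)


# The ifeval row reports four accuracy variants plus its sample count.
ifeval = {
    "inst_level_loose_acc": 76.3,
    "inst_level_strict_acc": 72.3,
    "prompt_level_loose_acc": 66.7,
    "prompt_level_strict_acc": 63.0,
    "sample_len": 541.0,
}
print(f"{task_score(ifeval):.1f}")  # 69.6
```

Recomputing from the displayed, already-rounded metrics can be off by 0.1: for arabic_tydiqa, (39.1 + 63.4) / 2 = 51.25, while the page shows 51.2. This presumably means the page averages the unrounded metric values.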

Average score (unweighted mean of the 67 task scores): 38.1
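The overall average can be reproduced from the per-task Score values on this page, assuming every task counts once regardless of its sample_len. The helper name `overall_average` is hypothetical.

```python
from statistics import fmean
from typing import Sequence

# Per-task Score values copied from the 67 rows on this page, in page order.
SCORES = [
    84.4, 82.4, 79.6, 77.7, 77.0, 76.0, 72.2, 70.5, 70.4, 70.2,
    69.6, 66.8, 61.8, 58.9, 56.1, 55.1, 55.0, 53.9, 51.3, 51.2,
    51.2, 49.0, 48.9, 48.1, 48.0, 47.0, 44.8, 44.7, 44.3, 44.2,
    39.2, 37.5, 37.2, 34.1, 34.1, 32.6, 32.2, 28.9, 28.9, 27.3,
    26.6, 26.2, 26.0, 24.7, 24.7, 24.2, 23.1, 22.7, 22.4, 21.9,
    21.0, 20.3, 19.2, 17.6, 16.7, 16.7, 16.5, 9.6, 9.5, 6.0,
    4.0, 3.7, 2.4, 1.5, 1.3, 0.5, 0.4,
]


def overall_average(task_scores: Sequence[float]) -> float:
    """Unweighted mean: each task counts once, whatever its sample_len."""
    return fmean(task_scores)


print(len(SCORES), round(overall_average(SCORES), 1))  # 67 38.1
```

Because the mean is unweighted, a 204-example SIB-200 task moves the average as much as the 14,316-example arabic_mmlu task. Weighting by sample_len would produce a different figure.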
| Score | Language | Task | Type | Metrics | sample_len |
|---|---|---|---|---|---|
| 84.4 | English (US) | english_mgsm | math | exact_match: 84.4 | 250 |
| 82.4 | English (US) | english_gsm8k | math | exact_match: 82.4 | 1319 |
| 79.6 | Portuguese (Portugal) | portuguese_hatebr | classification | f1_macro: 79.6 | 1400 |
| 77.7 | English (US) | english_belebele | mcq | f1_macro: 77.7 | 900 |
| 77.0 | Arabic (Saudi Arabia) | arabic_sib200 | classification | f1_macro: 77.0 | 204 |
| 76.0 | French (France) | french_sib200 | classification | f1_macro: 76.0 | 204 |
| 72.2 | Ukrainian (Ukraine) | ukrainian_sib200 | classification | f1_macro: 72.2 | 204 |
| 70.5 | Urdu (Pakistan) | urdu_uquad | qa | llm_judge_score: 70.5 | 139 |
| 70.4 | French (France) | french_belebele | mcq | f1_macro: 70.4 | 900 |
| 70.2 | Spanish (Spain) | spanish_belebele | mcq | f1_macro: 70.2 | 900 |
| 69.6 | English (US) | ifeval | ifeval | inst_level_loose_acc: 76.3; inst_level_strict_acc: 72.3; prompt_level_loose_acc: 66.7; prompt_level_strict_acc: 63.0 | 541 |
| 66.8 | French (France) | french_mgsm | math | exact_match: 66.8 | 250 |
| 61.8 | Ukrainian (Ukraine) | ukrainian_belebele | mcq | f1_macro: 61.8 | 900 |
| 58.9 | French (France) | french_fquad | qa | exact_match: 47.0; f1: 70.7 | 400 |
| 56.1 | Urdu (Pakistan) | urdu_factcheckbench | claim | f1_macro: 56.1 | 387 |
| 55.1 | Spanish (Spain) | spanish_xquad_es | xquad es | exact_match: 46.1; f1: 64.1 | 1190 |
| 55.0 | Urdu (Pakistan) | urdu_fake_news | classification | f1_macro: 55.0 | 300 |
| 53.9 | Portuguese (Portugal) | portuguese_hate_speech | classification | f1_macro: 53.9 | 851 |
| 51.3 | Arabic (Saudi Arabia) | arabic_belebele | mcq | f1_macro: 51.3 | 900 |
| 51.2 | Arabic (Saudi Arabia) | arabic_tydiqa | qa | exact_match: 39.1; f1: 63.4 | 921 |
| 51.2 | Portuguese (Portugal) | portuguese_tweetsentbr | classification | f1_macro: 51.2 | 2010 |
| 49.0 | Spanish (Spain) | spanish_global_mmlu | mcq | f1_macro: 49.0 | 400 |
| 48.9 | Albanian (Albania) | albanian_sib200 | classification | f1_macro: 48.9 | 204 |
| 48.1 | Arabic (Saudi Arabia) | arabic_aratrust | mcq | f1_macro: 48.1 | 522 |
| 48.0 | Urdu (Pakistan) | urdu_bingcheck | claim | f1_macro: 48.0 | 102 |
| 47.0 | Urdu (Pakistan) | urdu_facttool_qa | claim | f1_macro: 47.0 | 160 |
| 44.8 | Swahili (Tanzania) | swahili_sib200 | classification | f1_macro: 44.8 | 204 |
| 44.7 | Ukrainian (Ukraine) | ukrainian_squad | qa | exact_match: 33.6; f1: 55.9 | 3812 |
| 44.3 | French (France) | french_mmmlu | mcq | f1_macro: 44.3 | 14042 |
| 44.2 | English (US) | english_mmlu_pro | mmlu pro | exact_match: 44.2 | 2100 |
| 39.2 | Ukrainian (Ukraine) | ukrainian_global_mmlu | mcq | f1_macro: 39.2 | 2850 |
| 37.5 | Albanian (Albania) | albanian_belebele | mcq | f1_macro: 37.5 | 900 |
| 37.2 | Yoruba (Nigeria) | yoruba_sib200 | classification | f1_macro: 37.2 | 204 |
| 34.1 | Ukrainian (Ukraine) | ukrainian_polywrite | open generation | open_quality_score: 34.1 | 154 |
| 34.1 | Arabic (Saudi Arabia) | arabic_mmlu | mcq | f1_macro: 34.1 | 14316 |
| 32.6 | Albanian (Albania) | albanian_global_mmlu | mcq | f1_macro: 32.6 | 400 |
| 32.2 | Hausa (Nigeria) | hausa_sib200 | classification | f1_macro: 32.2 | 204 |
| 28.9 | Yoruba (Nigeria) | yoruba_naijasenti | sentiment | f1_macro: 28.9 | 4515 |
| 28.9 | Yoruba (Nigeria) | yoruba_afrimmlu | mcq | f1_macro: 28.9 | 500 |
| 27.3 | Igbo (Nigeria) | igbo_sib200 | classification | f1_macro: 27.3 | 204 |
| 26.6 | Igbo (Nigeria) | igbo_afrimmlu | mcq | f1_macro: 26.6 | 500 |
| 26.2 | Igbo (Nigeria) | igbo_belebele | mcq | f1_macro: 26.2 | 900 |
| 26.0 | Albanian (Albania) | albanian_aya | open generation | llm_judge_score: 26.0 | 200 |
| 24.7 | Hausa (Nigeria) | hausa_afrimmlu | mcq | f1_macro: 24.7 | 500 |
| 24.7 | Yoruba (Nigeria) | yoruba_belebele | mcq | f1_macro: 24.7 | 900 |
| 24.2 | Hausa (Nigeria) | hausa_belebele | mcq | f1_macro: 24.2 | 900 |
| 23.1 | Portuguese (Portugal) | portuguese_oab_exams | mcq | exact_match: 23.1 | 2210 |
| 22.7 | Igbo (Nigeria) | igbo_naijasenti | sentiment | f1_macro: 22.7 | 3682 |
| 22.4 | Urdu (Pakistan) | urdu_emotion_class | classification | f1_macro: 22.4 | 200 |
| 21.9 | Ukrainian (Ukraine) | ukrainian_zno | mcq | f1_macro: 21.9 | 751 |
| 21.0 | Hausa (Nigeria) | hausa_naijasenti | sentiment | f1_macro: 21.0 | 5303 |
| 20.3 | Portuguese (Portugal) | portuguese_enem | mcq | exact_match: 20.3 | 1432 |
| 19.2 | Portuguese (Portugal) | portuguese_bluex | mcq | exact_match: 19.2 | 724 |
| 17.6 | Hausa (Nigeria) | hausa_afrixnli | nli | f1_macro: 17.6 | 600 |
| 16.7 | Igbo (Nigeria) | igbo_afrixnli | nli | f1_macro: 16.7 | 600 |
| 16.7 | Yoruba (Nigeria) | yoruba_afrixnli | nli | f1_macro: 16.7 | 600 |
| 16.5 | Swahili (Tanzania) | swahili_afrixnli | nli | f1_macro: 16.5 | 600 |
| 9.6 | Swahili (Tanzania) | swahili_afrimgsm | afrimgsm | exact_match: 9.6 | 250 |
| 9.5 | Albanian (Albania) | albanian_polywrite | open generation | open_quality_score: 9.5 | 155 |
| 6.0 | Yoruba (Nigeria) | yoruba_afrimgsm | afrimgsm | exact_match: 6.0 | 250 |
| 4.0 | Hausa (Nigeria) | hausa_afrimgsm | afrimgsm | exact_match: 4.0 | 250 |
| 3.7 | Urdu (Pakistan) | urdu_freshqa | qa | llm_judge_score: 3.7 | 323 |
| 2.4 | Igbo (Nigeria) | igbo_afrimgsm | afrimgsm | exact_match: 2.4 | 250 |
| 1.5 | Igbo (Nigeria) | igbo_afriqa | qa | exact_match: 0.5; f1: 2.5 | 409 |
| 1.3 | Yoruba (Nigeria) | yoruba_afriqa | qa | exact_match: 0.3; f1: 2.2 | 332 |
| 0.5 | Urdu (Pakistan) | urdu_simpleqa | qa | llm_judge_score: 0.5 | 200 |
| 0.4 | Hausa (Nigeria) | hausa_afriqa | qa | exact_match: 0.0; f1: 0.9 | 300 |
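Most rows on this page report f1_macro. Assuming it follows the standard definition (per-class F1, averaged with equal weight per class, as in scikit-learn's `f1_score` with `average="macro"`) and is shown on a 0 to 100 scale, a minimal sketch looks like this. The function `macro_f1` is an illustration, not the evaluator this page actually ran.

```python
from collections import Counter
from typing import Hashable, Sequence


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Macro-averaged F1 on a 0-1 scale (multiply by 100 to match the page)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Classes seen in either the gold labels or the predictions.
    labels = set(gold_counts) | set(pred_counts)
    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (#gold + #pred of that class).
    per_class = [2 * true_pos[c] / (gold_counts[c] + pred_counts[c]) for c in labels]
    return sum(per_class) / len(per_class)


# A balanced three-class NLI set and a model that always answers "entailment".
gold = ["entailment", "neutral", "contradiction"] * 2
always_entail = ["entailment"] * len(gold)
print(f"{100 * macro_f1(gold, always_entail):.1f}")  # 16.7
```

Because every class weighs equally, a constant predictor on a balanced three-class set scores 100 × (1/2) / 3 ≈ 16.7. If the AfriXNLI labels are balanced, the *_afrixnli scores of 16.5 to 17.6 sit close to that floor.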