ModelChorus

Copyright 2026 MeetKai Inc.


Functionary Swahili Large

27 tasks

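Each row below is a single benchmark task on which this model was evaluated. The Score column is the average of every quality metric the task reports (accuracy, F1, exact match, etc.). The sample_len value is listed alongside the metrics but, judging by the reported values, does not enter that average. Click a row to browse the individual questions and the model's responses.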
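As a rough illustration of how a task's Score could be derived from its metrics, the sketch below averages the reported metrics for one task. The function name `task_score` and the set `NON_METRIC_KEYS` are hypothetical, and excluding `sample_len` is an assumption inferred from the values on this page (for french_fquad, the mean of 57.0 and 77.9 is about 67.45, in line with the displayed 67.5). It is not the site's actual implementation.

```python
from statistics import mean

# Assumption: sample_len is bookkeeping rather than a quality metric,
# so it is left out of the average. The page does not state this explicitly.
NON_METRIC_KEYS = {"sample_len"}


def task_score(metrics: dict[str, float]) -> float:
    """Return the unweighted mean of a task's metrics, skipping NON_METRIC_KEYS."""
    values = [v for k, v in metrics.items() if k not in NON_METRIC_KEYS]
    if not values:
        raise ValueError("no metrics to average")
    return mean(values)


# Values copied from the french_fquad row on this page.
french_fquad = {"exact_match": 57.0, "f1": 77.9, "sample_len": 400.0}
print(f"{task_score(french_fquad):.2f}")
```

For tasks that report a single metric, such as `f1_macro`, the score under this assumption is simply that metric.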

Average
66.8
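The page does not say how the Average is computed. As a sanity check, the sketch below assumes it is an unweighted mean of the 27 task scores shown on this page. Because the displayed scores are already rounded to one decimal, the recomputed value can differ from the reported 66.8 in the last digit.

```python
from statistics import mean

# Displayed (rounded) task scores, in the order they appear on the page.
task_scores = [
    92.2, 91.1, 90.9, 89.3, 89.2, 87.8, 85.2, 83.2, 79.1,
    76.3, 73.2, 72.4, 72.0, 71.4, 70.7, 69.0, 67.5, 62.9,
    59.8, 58.6, 56.7, 51.8, 51.1, 38.0, 31.5, 22.0, 9.3,
]

# Assumption: every task carries equal weight, regardless of sample_len.
overall = mean(task_scores)
print(f"{len(task_scores)} tasks, mean score {overall:.2f}")
```

The result lands within rounding error of the reported figure, which is consistent with an unweighted mean but does not rule out other weightings.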
| Score | Language | Task | Metrics |
|---|---|---|---|
| 92.2 | Albanian (Albania) | albanian_belebele (albanian mcq) | f1_macro: 92.2, sample_len: 900.0 |
| 91.1 | Swahili (Tanzania) | swahili_sib200 (swahili classification) | f1_macro: 91.1, sample_len: 204.0 |
| 90.9 | French (France) | french_sib200 (french classification) | f1_macro: 90.9, sample_len: 204.0 |
| 89.3 | Albanian (Albania) | albanian_sib200 (albanian classification) | f1_macro: 89.3, sample_len: 204.0 |
| 89.2 | Ukrainian (Ukraine) | ukrainian_sib200 (ukrainian classification) | f1_macro: 89.2, sample_len: 204.0 |
| 87.8 | Arabic (Saudi Arabia) | arabic_sib200 (arabic classification) | f1_macro: 87.8, sample_len: 204.0 |
| 85.2 | Hausa (Nigeria) | hausa_sib200 (hausa classification) | f1_macro: 85.2, sample_len: 204.0 |
| 83.2 | Albanian (Albania) | albanian_global_mmlu (albanian mcq) | f1_macro: 83.2, sample_len: 400.0 |
| 79.1 | Urdu (Pakistan) | urdu_uquad (urdu qa) | llm_judge_score: 79.1, sample_len: 139.0 |
| 76.3 | Portuguese (Portugal) | portuguese_hatebr (portuguese classification) | f1_macro: 76.3, sample_len: 1400.0 |
| 73.2 | Portuguese (Portugal) | portuguese_tweetsentbr (portuguese classification) | f1_macro: 73.2, sample_len: 2010.0 |
| 72.4 | Urdu (Pakistan) | urdu_factcheckbench (urdu claim) | f1_macro: 72.4, sample_len: 387.0 |
| 72.0 | French (France) | french_mgsm (french math) | exact_match: 72.0, sample_len: 250.0 |
| 71.4 | Urdu (Pakistan) | urdu_fake_news (urdu classification) | f1_macro: 71.4, sample_len: 300.0 |
| 70.7 | Urdu (Pakistan) | urdu_bingcheck (urdu claim) | f1_macro: 70.7, sample_len: 102.0 |
| 69.0 | Urdu (Pakistan) | urdu_facttool_qa (urdu claim) | f1_macro: 69.0, sample_len: 160.0 |
| 67.5 | French (France) | french_fquad (french qa) | exact_match: 57.0, f1: 77.9, sample_len: 400.0 |
| 62.9 | Spanish (Spain) | spanish_xquad_es (spanish xquad es) | exact_match: 53.0, f1: 72.8, sample_len: 1190.0 |
| 59.8 | Portuguese (Portugal) | portuguese_hate_speech (portuguese classification) | f1_macro: 59.8, sample_len: 851.0 |
| 58.6 | Ukrainian (Ukraine) | ukrainian_squad (ukrainian qa) | exact_match: 47.0, f1: 70.3, sample_len: 3812.0 |
| 56.7 | Urdu (Pakistan) | urdu_freshqa (urdu qa) | llm_judge_score: 56.7, sample_len: 323.0 |
| 51.8 | Hausa (Nigeria) | hausa_afrixnli (hausa nli) | f1_macro: 51.8, sample_len: 600.0 |
| 51.1 | Arabic (Saudi Arabia) | arabic_tydiqa (arabic qa) | exact_match: 38.3, f1: 63.9, sample_len: 921.0 |
| 38.0 | Hausa (Nigeria) | hausa_afrimgsm (hausa afrimgsm) | exact_match: 38.0, sample_len: 250.0 |
| 31.5 | Urdu (Pakistan) | urdu_emotion_class (urdu classification) | f1_macro: 31.5, sample_len: 200.0 |
| 22.0 | Urdu (Pakistan) | urdu_simpleqa (urdu qa) | llm_judge_score: 22.0, sample_len: 200.0 |
| 9.3 | Hausa (Nigeria) | hausa_afriqa (hausa qa) | exact_match: 7.7, f1: 10.8, sample_len: 300.0 |