ModelChorus

Copyright 2026 MeetKai Inc.

Functionary Swahili Mini

21 tasks

Each row below is a single benchmark task this model was evaluated on. The Score column averages every metric the task reports (accuracy, F1, exact-match, etc.). Click a row to browse the individual questions and the model's responses.
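As a rough illustration of how a per-task Score could be derived from a row's metrics, here is a minimal sketch. It assumes that `sample_len` is a sample count rather than a quality metric and is therefore left out of the average; that assumption is inferred from the rows on this page (the Score always matches the quality metrics alone), not from any published scoring code. The helper name `task_score` is hypothetical.

```python
from statistics import mean


def task_score(metrics: dict[str, float], ignore: tuple[str, ...] = ("sample_len",)) -> float:
    """Average a task's reported metrics, skipping bookkeeping fields.

    Assumption: `sample_len` is not a quality metric, so it is excluded.
    """
    values = [value for name, value in metrics.items() if name not in ignore]
    return round(mean(values), 1)


# Metric values copied from the french_fquad and portuguese_hatebr rows.
fquad = task_score({"exact_match": 0.5, "f1": 0.5, "sample_len": 400.0})
hatebr = task_score({"f1_macro": 87.7, "sample_len": 1400.0})
print(fquad, hatebr)
```

For tasks that report a single quality metric, the Score is simply that metric; the averaging only matters for rows such as the QA tasks, which report both `exact_match` and `f1`.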

Average: 64.6
Score  Language               Task                                                Metrics
96.9   Ukrainian (Ukraine)    ukrainian_polywrite (ukrainian open generation)     open_quality_score: 96.9, sample_len: 154.0
87.7   Portuguese (Portugal)  portuguese_hatebr (portuguese classification)       f1_macro: 87.7, sample_len: 1400.0
87.3   French (France)        french_sib200 (french classification)               f1_macro: 87.3, sample_len: 204.0
86.8   Arabic (Saudi Arabia)  arabic_sib200 (arabic classification)               f1_macro: 86.8, sample_len: 204.0
86.1   Ukrainian (Ukraine)    ukrainian_sib200 (ukrainian classification)         f1_macro: 86.1, sample_len: 204.0
85.5   Swahili (Tanzania)     swahili_sib200 (swahili classification)             f1_macro: 85.5, sample_len: 204.0
84.6   Albanian (Albania)     albanian_sib200 (albanian classification)           f1_macro: 84.6, sample_len: 204.0
83.2   Urdu (Pakistan)        urdu_facttool_qa (urdu claim)                       f1_macro: 83.2, sample_len: 160.0
82.6   Hausa (Nigeria)        hausa_sib200 (hausa classification)                 f1_macro: 82.6, sample_len: 204.0
76.6   Urdu (Pakistan)        urdu_bingcheck (urdu claim)                         f1_macro: 76.6, sample_len: 102.0
74.6   Urdu (Pakistan)        urdu_factcheckbench (urdu claim)                    f1_macro: 74.6, sample_len: 387.0
73.6   French (France)        french_mgsm (french math)                           exact_match: 73.6, sample_len: 250.0
71.2   Urdu (Pakistan)        urdu_fake_news (urdu classification)                f1_macro: 71.2, sample_len: 300.0
70.5   Portuguese (Portugal)  portuguese_tweetsentbr (portuguese classification)  f1_macro: 70.5, sample_len: 2010.0
69.8   Hausa (Nigeria)        hausa_afrixnli (hausa nli)                          f1_macro: 69.8, sample_len: 600.0
69.2   Portuguese (Portugal)  portuguese_hate_speech (portuguese classification)  f1_macro: 69.2, sample_len: 851.0
38.1   English (US)           english_mmlu_pro (english mmlu pro)                 exact_match: 38.1, sample_len: 2100.0
32.6   Urdu (Pakistan)        urdu_emotion_class (urdu classification)            f1_macro: 32.6, sample_len: 200.0
0.5    French (France)        french_fquad (french qa)                            exact_match: 0.5, f1: 0.5, sample_len: 400.0
0.0    Arabic (Saudi Arabia)  arabic_tydiqa (arabic qa)                           exact_match: 0.0, f1: 0.0, sample_len: 921.0
0.0    Hausa (Nigeria)        hausa_afriqa (hausa qa)                             exact_match: 0.0, f1: 0.0, sample_len: 300.0
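
The reported average of 64.6 appears to be an unweighted mean of the 21 task scores listed above. The sketch below checks that reading; the equal weighting of tasks (regardless of `sample_len`) is an assumption that happens to match the page's figure, not a documented rule.

```python
# Task scores copied from the Score column, in page order.
scores = [
    96.9, 87.7, 87.3, 86.8, 86.1, 85.5, 84.6,
    83.2, 82.6, 76.6, 74.6, 73.6, 71.2, 70.5,
    69.8, 69.2, 38.1, 32.6, 0.5, 0.0, 0.0,
]

# Assumption: every task counts equally, whatever its sample count.
average = sum(scores) / len(scores)
print(len(scores), round(average, 1))
```

Under this reading, the three QA tasks scoring at or near zero pull the overall average down by a wide margin relative to the classification tasks.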