
Copyright 2026 MeetKai Inc.


Functionary Swahili Large

2 tasks

Each row below is a single benchmark task this model was evaluated on. The Score column averages the quality metrics the task reports (accuracy, F1, exact match, etc.); judging by the rows below, sample_len is listed for reference and is not part of that average. Click a row to browse the individual questions and the model's responses.

Average: 69.4

| Score | Language | Task | Metrics |
|-------|----------|------|---------|
| 87.8 | Arabic (Saudi Arabia) | arabic_sib200 (arabic classification) | f1_macro: 87.8, sample_len: 204.0 |
| 51.1 | Arabic (Saudi Arabia) | arabic_tydiqa (arabic qa) | exact_match: 38.3, f1: 63.9, sample_len: 921.0 |
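
The scoring rule described above can be sketched in a few lines of Python. This is an illustration, not the site's actual implementation: the names `NON_QUALITY_KEYS`, `tasks`, and `task_score` are hypothetical, and excluding `sample_len` from the average is an assumption inferred from the reported numbers (87.8 for arabic_sib200 only matches if `sample_len` is left out). The metric values are copied from the rows above.

```python
from statistics import mean

# Assumption: sample_len is descriptive metadata, not a quality metric,
# so it is left out of the per-task average.
NON_QUALITY_KEYS = {"sample_len"}

# Metric values as reported for each task on this page.
tasks = {
    "arabic_sib200": {"f1_macro": 87.8, "sample_len": 204.0},
    "arabic_tydiqa": {"exact_match": 38.3, "f1": 63.9, "sample_len": 921.0},
}


def task_score(metrics):
    """Average every reported metric except the excluded metadata keys."""
    quality = [v for k, v in metrics.items() if k not in NON_QUALITY_KEYS]
    return mean(quality)


# Per-task scores, then the unweighted mean across tasks.
scores = {name: task_score(m) for name, m in tasks.items()}
overall = mean(list(scores.values()))

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print(f"overall: {overall:.2f}")
```

Under these assumptions the per-task scores agree with the Score column (87.8 and 51.1). The unweighted mean of those two displayed values is 69.45, while the page shows 69.4; the small gap is probably due to rounding or to the page averaging unrounded metric values, though the page does not say which.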