
Copyright 2026 MeetKai Inc.


GPT-oss-120B

5 tasks

Each row below is a single benchmark task this model was evaluated on. The Score column averages every metric the task reports (accuracy, F1, exact-match, etc.). Click a row to browse the individual questions and the model's responses.

Average: 39.4
| Score | Language | Task | Task type | Metrics |
|-------|----------|------|-----------|---------|
| 87.2 | Arabic (Saudi Arabia) | arabic_sib200 | arabic classification | f1_macro: 87.2, sample_len: 204.0 |
| 40.9 | Arabic (Saudi Arabia) | arabic_tydiqa | arabic qa | exact_match: 30.0, f1: 51.7, sample_len: 921.0 |
| 37.0 | Arabic (Saudi Arabia) | arabic_belebele | arabic mcq | f1_macro: 37.0, sample_len: 900.0 |
| 17.1 | Arabic (Saudi Arabia) | arabic_aratrust | arabic mcq | f1_macro: 17.1, sample_len: 522.0 |
| 14.7 | Arabic (Saudi Arabia) | arabic_mmlu | arabic mcq | f1_macro: 14.7, sample_len: 14316.0 |
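
As a rough check on how the Score and Average figures relate to the listed metrics, the sketch below recomputes them from the values in the table above. It assumes that sample_len is a size field left out of the mean (the arabic_tydiqa score of 40.9 matches the mean of exact_match and f1 only) and that the overall Average is an unweighted mean of the per-task scores. Neither assumption is stated on the page, and the names `TASKS` and `task_score` are hypothetical.

```python
from statistics import mean

# Metric values copied from the table. sample_len is kept for reference only.
TASKS = {
    "arabic_sib200": {"f1_macro": 87.2, "sample_len": 204.0},
    "arabic_tydiqa": {"exact_match": 30.0, "f1": 51.7, "sample_len": 921.0},
    "arabic_belebele": {"f1_macro": 37.0, "sample_len": 900.0},
    "arabic_aratrust": {"f1_macro": 17.1, "sample_len": 522.0},
    "arabic_mmlu": {"f1_macro": 14.7, "sample_len": 14316.0},
}


def task_score(metrics):
    """Mean of the quality metrics, skipping sample_len (an assumption)."""
    quality = [value for name, value in metrics.items() if name != "sample_len"]
    return mean(quality)


# Per-task scores, then an unweighted mean across tasks (also an assumption).
scores = {name: task_score(metrics) for name, metrics in TASKS.items()}
overall = mean(list(scores.values()))

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print(f"overall: {overall:.2f}")
```

Under these assumptions the recomputed values agree with the published ones to within rounding of the displayed metrics. Small differences in the last digit are expected, since the page presumably averages unrounded metric values.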