ModelChorus

Copyright 2026 MeetKai Inc.

Benchmarks/GPT-5 Nano/Arabic (Saudi Arabia) tasks

GPT-5 Nano

5 tasks

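Each row below is a single benchmark task on which this model was evaluated. The Score column is the mean of the quality metrics the task reports (accuracy, F1, exact match, etc.). The sample_len value listed with the metrics appears to be excluded from that mean: arabic_tydiqa scores 41.3, which is the mean of its exact_match (35.9) and f1 (46.7) alone. Click a row to browse the individual questions and the model's responses.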

Average score: 65.0

Score  Language               Task             Type                   Metrics
85.4   Arabic (Saudi Arabia)  arabic_sib200    arabic classification  f1_macro: 85.4, sample_len: 204.0
75.3   Arabic (Saudi Arabia)  arabic_aratrust  arabic mcq             f1_macro: 75.3, sample_len: 522.0
64.9   Arabic (Saudi Arabia)  arabic_belebele  arabic mcq             f1_macro: 64.9, sample_len: 900.0
58.2   Arabic (Saudi Arabia)  arabic_mmlu      arabic mcq             f1_macro: 58.2, sample_len: 14316.0
41.3   Arabic (Saudi Arabia)  arabic_tydiqa    arabic qa              exact_match: 35.9, f1: 46.7, sample_len: 921.0
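
The scoring rule described above can be checked against the listed numbers with a short Python sketch. It rests on two assumptions inferred from the figures, not confirmed by the page: that sample_len is a size indicator left out of the per-task mean, and that the overall average is the unweighted mean of the five task scores. The names `TASKS`, `NON_SCORE_KEYS`, `task_score`, and `overall_average` are hypothetical, chosen for illustration.

```python
from statistics import mean

# Metric values copied from the rows of the table on this page.
TASKS = {
    "arabic_sib200": {"f1_macro": 85.4, "sample_len": 204.0},
    "arabic_aratrust": {"f1_macro": 75.3, "sample_len": 522.0},
    "arabic_belebele": {"f1_macro": 64.9, "sample_len": 900.0},
    "arabic_mmlu": {"f1_macro": 58.2, "sample_len": 14316.0},
    "arabic_tydiqa": {"exact_match": 35.9, "f1": 46.7, "sample_len": 921.0},
}

# Assumption: sample_len is not a quality metric and is skipped when scoring.
NON_SCORE_KEYS = {"sample_len"}


def task_score(metrics):
    """Mean of a task's quality metrics, rounded to one decimal place."""
    values = [v for k, v in metrics.items() if k not in NON_SCORE_KEYS]
    return round(mean(values), 1)


def overall_average(tasks):
    """Unweighted mean of the per-task scores (assumed aggregation rule)."""
    return round(mean(task_score(m) for m in tasks.values()), 1)


if __name__ == "__main__":
    for name, metrics in TASKS.items():
        print(f"{name}: {task_score(metrics)}")
    print(f"average: {overall_average(TASKS)}")
```

Under these assumptions the sketch reproduces every Score value and the 65.0 average shown on the page. Because the mean is unweighted, the 204-sample arabic_sib200 task counts as much as the 14316-sample arabic_mmlu task.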