
Functionary Swahili Large

2 tasks

Each row below is a single benchmark task this model was evaluated on. The Score column is the mean of the quality metrics the task reports (accuracy, F1, exact-match, etc.). Judging from the rows below, sample_len is listed alongside the metrics but does not enter the Score.

Average: 69.4

Score	Language	Task	Metrics
87.8	Arabic (Saudi Arabia)	arabic_sib200 (arabic, classification)	f1_macro: 87.8, sample_len: 204.0
51.1	Arabic (Saudi Arabia)	arabic_tydiqa (arabic, qa)	exact_match: 38.3, f1: 63.9, sample_len: 921.0
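
The Score values can be reproduced from the listed metrics with a short sketch. It assumes that sample_len is a descriptive field rather than a quality metric and is left out of the mean; the page does not state this, but it is the only reading that matches the 51.1 shown for arabic_tydiqa. The names TASK_METRICS, NON_QUALITY_KEYS, and task_score are hypothetical and chosen for illustration only.

```python
from statistics import mean

# Metrics per task, copied from the table above.
TASK_METRICS = {
    "arabic_sib200": {"f1_macro": 87.8, "sample_len": 204.0},
    "arabic_tydiqa": {"exact_match": 38.3, "f1": 63.9, "sample_len": 921.0},
}

# Assumption: sample_len describes the data and is excluded from the Score.
NON_QUALITY_KEYS = {"sample_len"}


def task_score(metrics):
    """Mean of a task's quality metrics, skipping the excluded keys."""
    quality = [value for key, value in metrics.items() if key not in NON_QUALITY_KEYS]
    return mean(quality)


# Per-task Score, then the unweighted mean across tasks.
scores = {name: task_score(metrics) for name, metrics in TASK_METRICS.items()}
overall = mean(scores.values())

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print(f"average: {overall:.2f}")
```

Under that assumption the per-task means come out at 87.8 and 51.1, and their unweighted mean is 69.45, which agrees with the displayed 69.4 up to rounding. The page presumably computes its average from unrounded metric values, so the last digit can differ.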