
Functionary Swahili Mini

5 tasks

Each row below is a single benchmark task this model was evaluated on. The Score column averages every metric the task reports (accuracy, F1, exact-match, etc.).

Average

67.6

Score	Language	Task	Metrics
83.2	Urdu (Pakistan)	urdu_facttool_qa urdu claim	f1_macro: 83.2, sample_len: 160.0
76.6	Urdu (Pakistan)	urdu_bingcheck urdu claim	f1_macro: 76.6, sample_len: 102.0
74.6	Urdu (Pakistan)	urdu_factcheckbench urdu claim	f1_macro: 74.6, sample_len: 387.0
71.2	Urdu (Pakistan)	urdu_fake_news urdu classification	f1_macro: 71.2, sample_len: 300.0
32.6	Urdu (Pakistan)	urdu_emotion_class urdu classification	f1_macro: 32.6, sample_len: 200.0
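
The reported average of 67.6 is consistent with an unweighted mean of the five task scores listed in the table. The following is a minimal sketch of that arithmetic, assuming the average is a plain mean that is not weighted by sample_len; the page does not state the formula, and the variable names `scores` and `average` are illustrative only.

```python
# Task scores copied from the Score column of the table.
# Assumption: the page-level average is an unweighted mean of these values.
scores = {
    "urdu_facttool_qa": 83.2,
    "urdu_bingcheck": 76.6,
    "urdu_factcheckbench": 74.6,
    "urdu_fake_news": 71.2,
    "urdu_emotion_class": 32.6,
}

# Unweighted mean across the five tasks.
average = sum(scores.values()) / len(scores)

# Rounded to one decimal place, matching the precision shown on the page.
print(f"{average:.1f}")
```

Under this assumption, the low urdu_emotion_class score pulls the mean well below the four fact-checking and fake-news scores, which all sit between 71.2 and 83.2.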