
Functionary Swahili Large

3 tasks

Each row below is a single benchmark task this model was evaluated on. The Score column averages the quality metrics the task reports (accuracy, F1, exact match, etc.). The sample_len value is listed with the metrics but does not enter the average: for example, the french_fquad score of 67.5 is the mean of exact_match 57.0 and f1 77.9.

Average: 76.8

Score	Language	Task	Metrics
90.9	French (France)	french_sib200 french classification	f1_macro: 90.9, sample_len: 204.0
72.0	French (France)	french_mgsm french math	exact_match: 72.0, sample_len: 250.0
67.5	French (France)	french_fquad french qa	exact_match: 57.0, f1: 77.9, sample_len: 400.0
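
The scores in the table can be reproduced with a short calculation. The sketch below is illustrative only and is not the leaderboard's own code: it assumes that sample_len is excluded from the per-task average (which matches the figures shown) and that the overall average is the plain mean of the per-task scores. The names NON_SCORE_KEYS, task_score, and tasks are hypothetical.

```python
from statistics import mean

# Assumption: sample_len is not a quality metric and is left out of the average.
NON_SCORE_KEYS = {"sample_len"}


def task_score(metrics):
    """Mean of the quality metrics reported for one task."""
    values = [v for k, v in metrics.items() if k not in NON_SCORE_KEYS]
    return mean(values)


# Metric values copied from the table above.
tasks = {
    "french_sib200": {"f1_macro": 90.9, "sample_len": 204.0},
    "french_mgsm": {"exact_match": 72.0, "sample_len": 250.0},
    "french_fquad": {"exact_match": 57.0, "f1": 77.9, "sample_len": 400.0},
}

# Per-task scores, then the unweighted mean across tasks.
scores = {name: task_score(m) for name, m in tasks.items()}
overall = mean(list(scores.values()))

for name, score in scores.items():
    print(name, score)
print("overall", overall)
```

Under these assumptions the per-task values agree with the Score column to one decimal place, and the overall mean agrees with the reported average of 76.8.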