BigCodeArena: Compare two AI models by sending them code and seeing their responses
FineWeb 2 - Community Leaderboard: View and contribute to language model leaderboards
🇨🇿 BenCzechMark: Submit models and view benchmark leaderboard with charts
Internal European Leaderboard: Explore and compare multilingual LLM benchmarks
Open LLM Leaderboard: Track, rank and evaluate open LLMs and chatbots