🤏 Smol-Data Collection Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated Mar 2 • 12
view article Article Compute and Competition in AI: Different FlOPs for Different Folks sasha • Feb 12 • 15
view article Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face +3 abidlabs, znation, nouamanetazi, sasha, qgallouedec • Jul 29, 2025 • 223
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated Jun 6, 2025 • 40