Papers: Models
Llemma: An Open Language Model For Mathematics
Paper
• 2310.10631
• Published
• 57
Mistral 7B
Paper
• 2310.06825
• Published
• 58
Qwen Technical Report
Paper
• 2309.16609
• Published
• 38
BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
Paper
• 2309.11568
• Published
• 11
Textbooks Are All You Need II: phi-1.5 technical report
Paper
• 2309.05463
• Published
• 89
XGen-7B Technical Report
Paper
• 2309.03450
• Published
• 8
Code Llama: Open Foundation Models for Code
Paper
• 2308.12950
• Published
• 29
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
• 2307.09288
• Published
• 250
LLaMA: Open and Efficient Foundation Language Models
Paper
• 2302.13971
• Published
• 21
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Paper
• 2211.05100
• Published
• 37
Scaling Instruction-Finetuned Language Models
Paper
• 2210.11416
• Published
• 7
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper
• 1910.01108
• Published
• 22
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper
• 1910.10683
• Published
• 16
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper
• 1907.11692
• Published
• 10
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper
• 1810.04805
• Published
• 26
Skywork: A More Open Bilingual Foundation Model
Paper
• 2310.19341
• Published
• 6
SkyMath: Technical Report
Paper
• 2310.16713
• Published
• 2
LaMDA: Language Models for Dialog Applications
Paper
• 2201.08239
• Published
• 5
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Paper
• 2310.06694
• Published
• 3
UT5: Pretraining Non autoregressive T5 with unrolled denoising
Paper
• 2311.08552
• Published
• 8
TinyLlama: An Open-Source Small Language Model
Paper
• 2401.02385
• Published
• 95
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
• 2401.02954
• Published
• 53
Mixtral of Experts
Paper
• 2401.04088
• Published
• 160
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper
• 2401.04081
• Published
• 74
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper
• 2312.00752
• Published
• 150
H2O-Danube-1.8B Technical Report
Paper
• 2401.16818
• Published
• 18
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper
• 2402.07827
• Published
• 48