DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search (arXiv:2408.08152)
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition (arXiv:2402.15220)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
  Note: similar to https://huggingface.co/papers/2402.18668
Simple linear attention language models balance the recall-throughput tradeoff (arXiv:2402.18668)
Linear Transformers are Versatile In-Context Learners (arXiv:2402.14180)
Scaling Laws for Fine-Grained Mixture of Experts (arXiv:2402.07871)
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models (arXiv:2402.07033)
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization (arXiv:2401.18079)
  Note: similar approach in https://arxiv.org/pdf/2402.02750.pdf
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback (arXiv:2402.01391)
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models (arXiv:2402.01739)
  Note: see also QMoE - https://arxiv.org/pdf/2310.16795.pdf
SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024)
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (arXiv:2402.03300)
Repeat After Me: Transformers are Better than State Space Models at Copying (arXiv:2402.01032)
LongAlign: A Recipe for Long Context Alignment of Large Language Models (arXiv:2401.18058)
Can Large Language Models Understand Context? (arXiv:2402.00858)
WARM: On the Benefits of Weight Averaged Reward Models (arXiv:2401.12187)
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text (arXiv:2401.12070)
Zero Bubble Pipeline Parallelism (arXiv:2401.10241)
Self-Rewarding Language Models (arXiv:2401.10020)
Specialized Language Models with Cheap Inference from Limited Domain Data (arXiv:2402.01093)
ReFT: Reasoning with Reinforced Fine-Tuning (arXiv:2401.08967)
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models (arXiv:2401.06951)
Tuning Language Models by Proxy (arXiv:2401.08565)
Extending LLMs' Context Window with 100 Samples (arXiv:2401.07004)
Secrets of RLHF in Large Language Models Part II: Reward Modeling (arXiv:2401.06080)
Efficient LLM inference solution on Intel GPU (arXiv:2401.05391)
The Impact of Reasoning Step Length on Large Language Models (arXiv:2401.04925)
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models (arXiv:2401.04658)
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM (arXiv:2401.02994)
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon (arXiv:2401.03462)
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (arXiv:2401.01335)
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325)
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU (arXiv:2312.12456)
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty (arXiv:2401.15077)
OLMo: Accelerating the Science of Language Models (arXiv:2402.00838)
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research (arXiv:2402.00159)
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks (arXiv:2402.04248)
LiPO: Listwise Preference Optimization through Learning-to-Rank (arXiv:2402.01878)
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (arXiv:2402.04291)
Direct Language Model Alignment from Online AI Feedback (arXiv:2402.04792)
Hydragen: High-Throughput LLM Inference with Shared Prefixes (arXiv:2402.05099)
Model Editing with Canonical Examples (arXiv:2402.06155)
SubGen: Token Generation in Sublinear Time and Memory (arXiv:2402.06082)
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning (arXiv:2402.06332)
ODIN: Disentangled Reward Mitigates Hacking in RLHF (arXiv:2402.07319)
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts (arXiv:2402.07625)
Suppressing Pink Elephants with Direct Principle Feedback (arXiv:2402.07896)
Buffer Overflow in Mixture of Experts (arXiv:2402.05526)
Speculative Streaming: Fast LLM Inference without Auxiliary Models (arXiv:2402.11131)
Linear Transformers with Learnable Kernel Functions are Better In-Context Models (arXiv:2402.10644)
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration (arXiv:2402.11550)
BitDelta: Your Fine-Tune May Only Be Worth One Bit (arXiv:2402.10193)
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764)
FuseChat: Knowledge Fusion of Chat Models (arXiv:2402.16107)
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs (arXiv:2402.15627)
Do Large Language Models Latently Perform Multi-Hop Reasoning? (arXiv:2402.16837)
Orca-Math: Unlocking the potential of SLMs in Grade School Math (arXiv:2402.14830)
GPTVQ: The Blessing of Dimensionality for LLM Quantization (arXiv:2402.15319)
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (arXiv:2402.14083)
TinyLLaVA: A Framework of Small-scale Large Multimodal Models (arXiv:2402.14289)
OneBit: Towards Extremely Low-bit Large Language Models (arXiv:2402.11295)
AtP*: An efficient and scalable method for localizing LLM behaviour to components (arXiv:2403.00745)
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM (arXiv:2403.07816)
MoAI: Mixture of All Intelligence for Large Language and Vision Models (arXiv:2403.07508)
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU (arXiv:2403.06504)
ReALM: Reference Resolution As Language Modeling (arXiv:2403.20329)
sDPO: Don't Use Your Data All at Once (arXiv:2403.19270)
Long-form factuality in large language models (arXiv:2403.18802)
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (arXiv:2403.13372)
Evolutionary Optimization of Model Merging Recipes (arXiv:2403.13187)
PERL: Parameter Efficient Reinforcement Learning from Human Feedback (arXiv:2403.10704)
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
Pre-training Small Base LMs with Fewer Tokens (arXiv:2404.08634)
Dataset Reset Policy Optimization for RLHF (arXiv:2404.08495)
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (arXiv:2405.11143)
Bootstrapping Language Models with DPO Implicit Rewards (arXiv:2406.09760)
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model (arXiv:2408.11039)
Learning to Reason under Off-Policy Guidance (arXiv:2504.14945)