• The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887)
• Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv:2404.02258)
• ReFT: Representation Finetuning for Language Models (arXiv:2404.03592)
• Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv:2404.03715)
• Better & Faster Large Language Models via Multi-token Prediction (arXiv:2404.19737)
• Chameleon: Mixed-Modal Early-Fusion Foundation Models (arXiv:2405.09818)
• MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series (arXiv:2405.19327)
• Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing (arXiv:2406.08464)
• Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems (arXiv:2407.01370)
• Searching for Best Practices in Retrieval-Augmented Generation (arXiv:2407.01219)
• DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models (arXiv:2309.03883)
• Lynx: An Open Source Hallucination Evaluation Model (arXiv:2407.08488)
• Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (arXiv:2408.03314)
• Writing in the Margins: Better Inference Pattern for Long Context Retrieval (arXiv:2408.14906)
• Human Feedback is not Gold Standard (arXiv:2309.16349)
• Differential Transformer (arXiv:2410.05258)
• Were RNNs All We Needed? (arXiv:2410.01201)