• The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887)
• Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv:2404.02258)
• ReFT: Representation Finetuning for Language Models (arXiv:2404.03592)
• Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences (arXiv:2404.03715)
• Better & Faster Large Language Models via Multi-token Prediction (arXiv:2404.19737)
• Chameleon: Mixed-Modal Early-Fusion Foundation Models (arXiv:2405.09818)
• MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series (arXiv:2405.19327)
• Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing (arXiv:2406.08464)
• Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems (arXiv:2407.01370)
• Searching for Best Practices in Retrieval-Augmented Generation (arXiv:2407.01219)
• DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models (arXiv:2309.03883)
• Lynx: An Open Source Hallucination Evaluation Model (arXiv:2407.08488)
• Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (arXiv:2408.03314)
• Writing in the Margins: Better Inference Pattern for Long Context Retrieval (arXiv:2408.14906)
• Human Feedback is not Gold Standard (arXiv:2309.16349)
• Differential Transformer (arXiv:2410.05258)
• Were RNNs All We Needed? (arXiv:2410.01201)