VoladorLuYu's Collections: Efficient LLM

• Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads (arXiv:2401.10774)
• APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding (arXiv:2401.06761)
• Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache (arXiv:2401.02669)
• MambaByte: Token-free Selective State Space Model (arXiv:2401.13660)
• EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty (arXiv:2401.15077)
• Small LLMs Are Weak Tool Learners: A Multi-LLM Agent (arXiv:2401.07324)
• Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling (arXiv:2402.10211)
• LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
• Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (arXiv:2402.13720)
• (title missing) (arXiv:2402.13144)
• LongAlign: A Recipe for Long Context Alignment of Large Language Models (arXiv:2401.18058)
• LongHeads: Multi-Head Attention is Secretly a Long Context Processor (arXiv:2402.10685)
• LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325)
• E^2-LLM: Efficient and Extreme Length Extension of Large Language Models (arXiv:2401.06951)
• InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory (arXiv:2402.04617)
• Speculative Streaming: Fast LLM Inference without Auxiliary Models (arXiv:2402.11131)
• Towards Optimal Learning of Language Models (arXiv:2402.17759)
• When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method (arXiv:2402.17193)
• GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507)
• DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models (arXiv:2403.00818)
• LongNet: Scaling Transformers to 1,000,000,000 Tokens (arXiv:2307.02486)
• Recurrent Drafter for Fast Speculative Decoding in Large Language Models (arXiv:2403.09919)
• DiJiang: Efficient Large Language Models through Compact Kernelization (arXiv:2403.19928)
• ReFT: Representation Finetuning for Language Models (arXiv:2404.03592)
• Rethinking Optimization and Architecture for Tiny Language Models (arXiv:2402.02791)
• MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (arXiv:2402.14905)
• Mixture-of-Depths: Dynamically allocating compute in transformer-based language models (arXiv:2404.02258)
• Pre-training Small Base LMs with Fewer Tokens (arXiv:2404.08634)
• Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies (arXiv:2404.08197)
• Rho-1: Not All Tokens Are What You Need (arXiv:2404.07965)
• Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
• Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models (arXiv:2307.14430)
• Compression Represents Intelligence Linearly (arXiv:2404.09937)
• Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (arXiv:2401.01335)
• MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv:2405.12130)
• SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization (arXiv:2405.11582)
• How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition (arXiv:2310.05492)
• The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions (arXiv:2404.13208)
• Unlocking Continual Learning Abilities in Language Models (arXiv:2406.17245)