
Piotr Nawrot

pnawrot


https://piotrnawrot.github.io

AI & ML interests

None yet

Recent Activity

upvoted a paper about 1 month ago

Self-Improving World Modelling with Latent Actions

liked a model about 2 months ago

g023/Qwen3-8B-DMS-8x-4bit-NF4

posted an update about 2 months ago

We’ve just released Qwen3-8B-DMS-8x, fine-tuned for 8x KV cache compression. It maintains dense-model accuracy on demanding tasks such as AIME24 and is perfect for inference-time scaling. The code on HF works out of the box. With DMS we fine-tune models end-to-end via distillation, which works much better than the “token importance” proxies used in typical eviction methods. It’s state of the art for KV eviction tailored to fast inference: it adds a negligible number of parameters and negligible computation to each KV head, and needs as few as 1K fine-tuning steps to reach 8x compression. It speeds up both the prefill and generation phases of Transformer LLMs, and can be combined with sparse attention methods such as DSA.
🎓 Paper - https://neurips.cc/virtual/2025/loc/san-diego/poster/119605
💾 Checkpoint - https://huggingface.co/nvidia/Qwen3-8B-DMS-8x
📢 Article - https://ed.ac.uk/news/shrinking-ai-memory-boosts-accuracy
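To give a rough sense of what 8x KV cache compression means for memory, here is a back-of-the-envelope sketch of the per-sequence KV cache size for a grouped-query-attention model. The config values (36 layers, 8 KV heads, head dimension 128, 2-byte values) are assumptions chosen to resemble an 8B Qwen3-class model, and `kv_cache_bytes` is a hypothetical helper. DMS decides adaptively which tokens to evict, so the sketch treats "8x" only as an average compression ratio. It does not model how DMS works.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_value=2, compression=1.0):
    """Approximate KV cache size in bytes for one sequence.

    The factor of 2 accounts for storing both keys and values.
    `compression` is treated as an average ratio of tokens kept (hypothetical).
    """
    if compression <= 0:
        raise ValueError("compression must be positive")
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return int(seq_len * per_token / compression)


# Assumed Qwen3-8B-like config; a 32K-token reasoning trace.
dense = kv_cache_bytes(32768, num_layers=36, num_kv_heads=8, head_dim=128)
dms_8x = kv_cache_bytes(32768, num_layers=36, num_kv_heads=8, head_dim=128,
                        compression=8)
print(f"dense: {dense / 2**30:.4f} GiB")    # dense: 4.5000 GiB
print(f"8x:    {dms_8x / 2**30:.4f} GiB")   # 8x:    0.5625 GiB
```

Under these assumptions, the memory freed per sequence can hold longer generations or more parallel samples. That is why cache compression pairs naturally with inference-time scaling.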



liked a model about 2 months ago

g023/Qwen3-8B-DMS-8x-4bit-NF4

Text Generation • 8B • Updated Jan 31 • 211 • 1

liked 2 models 2 months ago

pnawrot/nanoT5-base

Updated Apr 26, 2025 • 5 • 11

nvidia/Qwen3-8B-DMS-8x

Updated Jan 22 • 1.09k • 34

liked 4 models about 1 year ago

nvidia/Llama-2-7B-DMC-4x

Updated Dec 22, 2024 • 2

nvidia/Llama-2-7B-DMC-8x

Updated Dec 22, 2024 • 2

nvidia/Llama-2-13B-DMC-4x

Updated Dec 22, 2024 • 1

nvidia/Llama-2-13B-DMC-8x

Updated Dec 22, 2024 • 2

liked a model about 2 years ago

LazarusNLP/IndoNanoT5-base

0.2B • Updated Feb 12, 2024 • 45 • 2