Will there be a small model for speculative decoding?

#71
by Regrin - opened

Hello!
I'm very interested in inference speed. The easiest way to improve it without sacrificing quality is speculative decoding.
I have a slow computer and would like to try it, but the problem is that there's no small draft model for Gemma 4.
Could you perhaps train a tiny draft model for speculative decoding?
That would be really, really helpful.

Hello! I'm not sure about Google's plans, but we have trained a model for use in vLLM, which is available here: https://huggingface.co/RedHatAI/gemma-4-31B-it-speculator.eagle3. We intend to keep iterating over the next week. Feel free to try it out!
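For anyone who wants to try it, a minimal sketch of serving the model with the EAGLE-3 speculator in vLLM might look like the following. The base model ID (`google/gemma-4-31b-it`) is an assumption, and the exact flag name and speculative-config schema vary between vLLM releases, so check `vllm serve --help` for your installed version:

```shell
# Hypothetical sketch: launch an OpenAI-compatible vLLM server with the
# EAGLE-3 speculator as the draft model. The base model ID below is an
# assumption; the speculator ID comes from the link above.
vllm serve google/gemma-4-31b-it \
    --speculative-config '{
        "method": "eagle3",
        "model": "RedHatAI/gemma-4-31B-it-speculator.eagle3",
        "num_speculative_tokens": 3
    }'
```

`num_speculative_tokens` controls how many draft tokens are proposed per step; small values (2-4) are a common starting point, since overly long drafts get rejected more often and can erase the speedup.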

This is amazing if it works. Do you know whether it's supported with pipeline parallelism in vLLM?
