Libraries

How to use Crystalcareai/GemMoE-Base-Random with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Crystalcareai/GemMoE-Base-Random", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Crystalcareai/GemMoE-Base-Random", trust_remote_code=True, dtype="auto")
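
The snippets above load the model but stop short of generating text. A minimal sketch of the remaining steps follows, assuming the repository ships a tokenizer that `AutoTokenizer` can load; the helper name `generate_text` and the default of 64 new tokens are illustrative choices, not part of the original instructions. Because this is an unfinetuned base model, the output is likely to be of limited quality.

```python
def generate_text(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load the tokenizer and model, then return the decoded continuation."""
    # Imported lazily so that defining this helper does not pull in the
    # heavy dependencies until it is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code=True is needed because the modeling code lives in the repo.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, dtype="auto"
    )

    # Tokenize the prompt and move the tensors to wherever the model was loaded.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # The first sequence holds the prompt plus the generated continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate_text("Crystalcareai/GemMoE-Base-Random", "Once upon a time,"))
```

Older Transformers releases expect `torch_dtype="auto"` instead of `dtype="auto"`; adjust the keyword if loading fails on that argument.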

Local Apps

vLLM

How to use Crystalcareai/GemMoE-Base-Random with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Crystalcareai/GemMoE-Base-Random"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Crystalcareai/GemMoE-Base-Random",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
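
The same request can be issued from Python with only the standard library. This is a sketch that mirrors the curl call above; the helper names `build_completion_request` and `complete` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request, timeout=60.0):
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response body.
    return body["choices"][0]["text"]


# Example call against a locally running server:
# request = build_completion_request(
#     "http://localhost:8000", "Crystalcareai/GemMoE-Base-Random", "Once upon a time,"
# )
# print(complete(request))
```

Pointing `base_url` at `http://localhost:30000` would target a server listening on port 30000 instead.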

Use Docker

docker model run hf.co/Crystalcareai/GemMoE-Base-Random

SGLang

How to use Crystalcareai/GemMoE-Base-Random with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Crystalcareai/GemMoE-Base-Random" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Crystalcareai/GemMoE-Base-Random",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Crystalcareai/GemMoE-Base-Random" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Crystalcareai/GemMoE-Base-Random",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Crystalcareai/GemMoE-Base-Random with Docker Model Runner:
```
docker model run hf.co/Crystalcareai/GemMoE-Base-Random
```

GemMoE-Base-Random / howto.md

Crystalcareai

Update howto.md

b125dc5 verified about 2 years ago

GemMoE: Sharing Tools and Improved Base Models

I'm excited to share the tools I used to create GemMoE and release improved base models for the community to explore and build upon.

Updates to GemMoE-Beta-1

GemMoE-Beta-1 will continue to serve as the repository for the modeling files required to run the Mixture of Experts (MoE) models. However, I will be removing the PyTorch files from that repository.

New Models

I'm introducing two new models:

Crystalcareai/GemMoE-Base-Hidden
- This is a new MoE created using an improved method that I will explain below.
- It utilizes a hidden gate and shows strong potential.
- The model has not been altered and requires finetuning to reach its full potential.
- If you're looking to achieve great performance with relatively minimal training, this is an excellent starting point.
Crystalcareai/GemMoE-Base-Random
- This model was created using the same merge method as GemMoE-Base-Hidden, but with a random gate.
- The gate is initialized randomly during the merging process, so expert selection starts out random.
- With finetuning, the model learns to choose the appropriate experts naturally, potentially leading to better results compared to GemMoE-Base-Hidden.
- This method offers an intriguing mix of the clown-car and Mixtral-style approaches.
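
To make the idea of a random gate concrete, here is a toy sketch of generic Mixtral-style top-k routing with randomly initialized gate weights. It is not the GemMoE modeling code: the hidden size, the number of experts, `top_k=2`, and both function names are assumptions chosen for illustration.

```python
import math
import random


def make_random_gate(hidden_size, num_experts, seed=0):
    """Return a num_experts x hidden_size gate matrix with small random weights."""
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
        for _ in range(num_experts)
    ]


def route_token(hidden_state, gate, top_k=2):
    """Pick the top_k experts for one token; return (expert_index, weight) pairs."""
    # One logit per expert: dot product of the gate row with the token vector.
    logits = [sum(w * h for w, h in zip(row, hidden_state)) for row in gate]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]

    # Softmax over the selected logits only, so the mixing weights sum to 1.
    peak = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Illustrative sizes only: 4 experts, an 8-dimensional token vector.
gate = make_random_gate(hidden_size=8, num_experts=4)
token = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 0.1, 0.9]
print(route_token(token, gate))
```

With untrained random weights the chosen experts carry no meaning; finetuning is what would turn the gate into a useful router.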

The new merge method and modeling files also reduce VRAM usage, making the models easier to finetune.

Training Experiences and Challenges

I have successfully trained the models on a single A100 using QLoRA, although it required careful monitoring and posed some difficulties; there currently appears to be an issue with QLoRA and GemMoE. I observed better VRAM usage on 4 A6000 cards when finetuning with DoRA and DeepSpeed ZeRO-3, without any quantization.
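
For readers who have not set up DeepSpeed ZeRO stage 3 before, the sketch below builds a small configuration of that kind and serializes it to JSON. The specific values are assumptions, not the settings used for the runs described above; the `"auto"` entries rely on the Hugging Face Trainer integration filling them in from its own arguments.

```python
import json


def build_zero3_config():
    """Return a minimal DeepSpeed ZeRO stage 3 configuration as a dict."""
    return {
        # Stage 3 shards parameters, gradients, and optimizer state across GPUs.
        "zero_optimization": {
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather full 16-bit weights when saving so the checkpoint is usable.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
        # No quantization: train in bf16 if the Trainer arguments enable it.
        "bf16": {"enabled": "auto"},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "train_batch_size": "auto",
        "gradient_clipping": "auto",
    }


# Print the JSON that would be saved to a file and passed to the trainer.
print(json.dumps(build_zero3_config(), indent=2))
```

The printed JSON can be saved to a file (for example `ds_zero3.json`, a hypothetical name) and referenced from the training arguments.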

Creating Your Own Merges

You can create your own merges using my modified branch of mergekit:

git clone -b gemmoe https://github.com/Crystalcareai/mergekit.git

To create an exact replica of Crystalcareai/GemMoE-Base-Hidden, use the following command:

mergekit-moe examples/gemmoe.yml ./merged --cuda --lazy-unpickle --allow-crimes

Feel free to modify the examples/gemmoe.yml file to customize the merge according to your preferences.
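
If you script your merges, the command above can be assembled and run from Python. This is a sketch only: the helper names are hypothetical, the flags are exactly those shown above, and dropping `--cuda` for a CPU-only run is an assumption about how the fork behaves.

```python
import shlex
import subprocess


def build_merge_command(config_path="examples/gemmoe.yml", output_dir="./merged", use_cuda=True):
    """Return the mergekit-moe invocation as an argument list."""
    command = ["mergekit-moe", config_path, output_dir]
    if use_cuda:
        command.append("--cuda")
    command += ["--lazy-unpickle", "--allow-crimes"]
    return command


def run_merge(command):
    """Echo the command, then run it and raise if it exits with an error."""
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)


# Example call (requires the gemmoe branch of mergekit to be installed):
# run_merge(build_merge_command(output_dir="./my-merge"))
```

Keeping the command as a list avoids shell quoting problems when the config path or output directory contains spaces.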

Alternatively, you can use my modified lazymergekit available on Colab: Link to Colab Notebook

Let's Collaborate!

I'm thrilled to see what we can create together using these tools and improved base models. Let's push the boundaries of what's possible with GemMoE and explore new possibilities in the world of AI and machine learning.

Happy experimenting and building!