Instructions for using RajuKandasamy/tamillama_tiny_30m with libraries and local apps.

Libraries

How to use RajuKandasamy/tamillama_tiny_30m with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="RajuKandasamy/tamillama_tiny_30m")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RajuKandasamy/tamillama_tiny_30m")
model = AutoModelForCausalLM.from_pretrained("RajuKandasamy/tamillama_tiny_30m")
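The pipeline loaded above can be wrapped in a small helper to produce text. The following is a minimal sketch, not the author's own code: `generate_story` and `main` are hypothetical names introduced here, and the default of 256 new tokens simply mirrors the usage example later in this card. It relies on the text-generation pipeline returning a list of dicts with a `generated_text` key.

```python
from typing import Any, Callable

MODEL_ID = "RajuKandasamy/tamillama_tiny_30m"


def generate_story(pipe: Callable[..., Any], prompt: str, max_new_tokens: int = 256) -> str:
    """Run a text-generation pipeline on one prompt and return the generated text.

    `pipe` is expected to behave like a transformers text-generation pipeline:
    called with a single string, it returns a list with one dict per sequence.
    """
    outputs = pipe(prompt, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]


def main() -> None:
    # Imported here so the helper above can be reused without loading transformers.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID)
    # Prompt copied from the usage example in this model card.
    prompt = "சொற்கள்:\nவாக்குறுதி, எலி, பெரியது\nசுருக்கம்:"
    print(generate_story(pipe, prompt))
```

Calling `main()` downloads the model on first use and prints the prompt followed by the model's continuation.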

Local Apps

vLLM

How to use RajuKandasamy/tamillama_tiny_30m with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RajuKandasamy/tamillama_tiny_30m"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RajuKandasamy/tamillama_tiny_30m",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
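The same request can be sent from Python using only the standard library. This is a sketch under the assumption that the server started above is listening on `localhost:8000`; `build_completion_request` and `complete` are hypothetical helper names, and the response is read as an OpenAI-style completions body, where the generated text sits in `choices[0].text`.

```python
import json
import urllib.request

MODEL_ID = "RajuKandasamy/tamillama_tiny_30m"


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the generated text."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the continuation as a string. Passing `base_url="http://localhost:30000"` should target a server on another port, such as the SGLang one.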

SGLang

How to use RajuKandasamy/tamillama_tiny_30m with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RajuKandasamy/tamillama_tiny_30m" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RajuKandasamy/tamillama_tiny_30m",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RajuKandasamy/tamillama_tiny_30m" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RajuKandasamy/tamillama_tiny_30m",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use RajuKandasamy/tamillama_tiny_30m with Docker Model Runner:
```
docker model run hf.co/RajuKandasamy/tamillama_tiny_30m
```

Tamillama_Tiny: A 30M tiny llama model trained to tell stories in Tamil

TL;DR:

This is an experimental model inspired by the paper "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?" (https://arxiv.org/abs/2305.07759).

It extends the same concept to Tamil: a 30M-parameter LLaMA-architecture model that outputs coherent Tamil is presented here.

Additional experiments included in the model:

This is a multilingual model: it can output both English and Tamil stories.
The model also translates stories from English to Tamil and vice versa. To see the translation feature, set max_new_tokens to a value greater than 512.
The original stories from the TinyStories dataset were translated using IndicTrans.
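The token budget mentioned above can be captured in a small helper. This is only an illustrative sketch: `generation_kwargs` is a hypothetical name, 256 is the value used in this card's usage example, and 768 is an arbitrary assumption chosen only because it is greater than 512.

```python
STORY_ONLY_TOKENS = 256    # value used in this model card's usage example
TRANSLATION_TOKENS = 768   # assumption: any budget above 512 should do


def generation_kwargs(include_translation: bool = False) -> dict:
    """Return keyword arguments for model.generate().

    A budget above 512 new tokens is what the card says is needed
    for the translation to appear after the story.
    """
    max_new_tokens = TRANSLATION_TOKENS if include_translation else STORY_ONLY_TOKENS
    return {"max_new_tokens": max_new_tokens}
```

The result can be unpacked into a generate call, for example `model.generate(input_ids=input_ids, **generation_kwargs(include_translation=True))`.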

For now, this is a toy model for researchers, students, and LLM enthusiasts to explore its linguistic capabilities.

Weights Release, License and Usage

We release the weights in two formats: Hugging Face Transformers format, and GGML format for use with CTransformers or llama.cpp.

This model is not fit for any practical purpose other than research and experimentation.

Usage:

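from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("RajuKandasamy/tamillama_tiny_30m")
model = AutoModelForCausalLM.from_pretrained("RajuKandasamy/tamillama_tiny_30m")

# Prompt: a list of Tamil words, followed by the cue the model continues from
prompt = """சொற்கள்:
வாக்குறுதி, எலி, பெரியது
சுருக்கம்:"""
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate up to 256 new tokens after the prompt
generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=256
)
# Decode the full sequence (prompt plus continuation) back to text
print(tokenizer.decode(generation_output[0]))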
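The prompt in the example above follows a fixed layout: a "சொற்கள்:" ("words") header, a comma-separated list of words, and a "சுருக்கம்:" ("summary") cue. A minimal sketch of a helper that rebuilds that layout from a word list is shown below. `build_prompt` is a hypothetical name, and whether the model responds equally well to other word lists is an assumption rather than something this card states.

```python
from typing import Sequence

WORDS_HEADER = "சொற்கள்:"     # "words" header from the card's prompt
SUMMARY_CUE = "சுருக்கம்:"    # "summary" cue the model continues from


def build_prompt(words: Sequence[str]) -> str:
    """Format a word list in the same layout as the card's example prompt."""
    if not words:
        raise ValueError("at least one word is required")
    return f"{WORDS_HEADER}\n{', '.join(words)}\n{SUMMARY_CUE}"
```

For instance, `build_prompt(["வாக்குறுதி", "எலி", "பெரியது"])` reproduces the prompt used above and can be passed to the tokenizer in its place.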


Safetensors: model size 30.2M params, tensor type F32

Paper for RajuKandasamy/tamillama_tiny_30m

TinyStories: How Small Can Language Models Be and Still Speak Coherent English? arXiv:2305.07759, published May 12, 2023.