Hey! I'm Arthu1.
Hello! I've released a new from-scratch model series called North 1 (North Star 1, Wind Arc 1.5, and the admittedly lowkey dumb North Air 1).
If you'd like to, feel free to quantize my models!
If you don't know who I am, I'm the owner of Starlight Mini.
Also, the repos are arthu1/wind-arc-1.5 and north-star-1.
hey arthu1, wanted to let you know that you need to create a config file for the model so that llama.cpp can work with it. Also, I'm not sure your model can be quantized with llama.cpp at all, since it's a completely new from-scratch model: llama.cpp might not recognize the architecture and will refuse to quantize it. In that case you'd need to open a pull request adding support for your model type and hope it gets merged into the main branch.
arthu1/wind-arc-1.5: no architectures entry (malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)")
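That error means the converter could not find a parseable `config.json` with an `architectures` entry. As a rough sketch, assuming the model really is Llama-shaped, a minimal config could look like this. Every hyperparameter below is a placeholder, not the real Wind Arc 1.5 value, and would need to match the actual checkpoint:

```python
import json

# Hypothetical minimal config for a Llama-like model. All values are
# placeholders and must be replaced with the checkpoint's real settings.
minimal_config = {
    "architectures": ["LlamaForCausalLM"],  # the entry the error complains about
    "model_type": "llama",
    "vocab_size": 32000,
    "hidden_size": 896,
    "intermediate_size": 2432,
    "num_hidden_layers": 32,
    "num_attention_heads": 14,
    "num_key_value_heads": 2,
    "max_position_embeddings": 2048,
    "rms_norm_eps": 1e-5,
}

# Write the config next to the weights so converters can find it
with open("config.json", "w") as f:
    json.dump(minimal_config, f, indent=2)

# Sanity check: the file parses as JSON and has an architectures entry
cfg = json.load(open("config.json"))
print(cfg["architectures"][0])  # LlamaForCausalLM
```

With a valid `config.json` in the repo root, the "malformed JSON string" half of the error should go away even if architecture support still needs work.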
Subject: Proposal to support Hugging Face transformers format for North Star / Wind Arc models
Hi,
I've been following the North Star model family (North Air, North Star, and Wind Arc). These are impressive small-scale reasoning models!
To make these models more accessible to the community (enabling use with tools like vLLM, Ollama, bitsandbytes quantization, and the transformers library), it would be great to provide a version in the standard Hugging Face format.
Since the models use a SentencePiece tokenizer and a Transformer-based architecture (likely Llama-like), they can be easily wrapped into a LlamaForCausalLM structure.
Below is a suggested conversion script based on your current .pt checkpoint structure.
Conversion Script (Python)
```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM, LlamaTokenizer

def convert_to_hf(pt_path="windarc15.pt", tokenizer_path="tokenizer.model", save_dir="./windarc-1.5-hf"):
    print(f"Loading checkpoint from {pt_path}...")
    checkpoint = torch.load(pt_path, map_location="cpu", weights_only=False)
    cfg = checkpoint["cfg"]
    state_dict = checkpoint["model"]

    # 1. Map your custom config to LlamaConfig
    # Adjust the keys (e.g., 'dim', 'n_layers') based on your exact cfg dictionary
    config = LlamaConfig(
        vocab_size=32000,
        hidden_size=cfg.get("dim", 896),
        intermediate_size=cfg.get("hidden_dim", 2432),
        num_hidden_layers=cfg.get("n_layers", 32),
        num_attention_heads=cfg.get("n_heads", 14),
        num_key_value_heads=cfg.get("n_kv_heads", 2),  # if GQA is used
        max_position_embeddings=cfg.get("max_seq_len", 2048),
        rms_norm_eps=1e-5,
    )

    # 2. Initialize the HF model
    model = LlamaForCausalLM(config)

    # 3. Map state-dict keys: custom names -> HF Llama names
    hf_state_dict = {}
    mapping = {
        "tok_embeddings.weight": "model.embed_tokens.weight",
        "norm.weight": "model.norm.weight",
        "output.weight": "lm_head.weight",
    }
    for k, v in state_dict.items():
        if k in mapping:
            hf_state_dict[mapping[k]] = v
        elif k.startswith("layers."):
            new_k = k.replace("layers.", "model.layers.")
            # Map attention and feed-forward layers
            new_k = new_k.replace(".attention.wq.", ".self_attn.q_proj.")
            new_k = new_k.replace(".attention.wk.", ".self_attn.k_proj.")
            new_k = new_k.replace(".attention.wv.", ".self_attn.v_proj.")
            new_k = new_k.replace(".attention.wo.", ".self_attn.o_proj.")
            new_k = new_k.replace(".feed_forward.w1.", ".mlp.gate_proj.")
            new_k = new_k.replace(".feed_forward.w2.", ".mlp.down_proj.")
            new_k = new_k.replace(".feed_forward.w3.", ".mlp.up_proj.")
            new_k = new_k.replace(".attention_norm.", ".input_layernorm.")
            new_k = new_k.replace(".ffn_norm.", ".post_attention_layernorm.")
            hf_state_dict[new_k] = v
        else:
            hf_state_dict[k] = v

    # 4. Load weights into the HF model
    model.load_state_dict(hf_state_dict, strict=True)

    # 5. Save model and tokenizer
    print(f"Saving HF model to {save_dir}...")
    model.save_pretrained(save_dir)
    # Since it's a SentencePiece model, LlamaTokenizer handles it natively
    tokenizer = LlamaTokenizer(vocab_file=tokenizer_path)
    tokenizer.save_pretrained(save_dir)
    print("Success!")

if __name__ == "__main__":
    convert_to_hf()
```
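The key renames in step 3 can be sanity-checked in isolation with dummy key names, no checkpoint needed. This helper is just the same replacement table factored out (the example key names assume the `layers.N.attention.wq`-style naming in the script above):

```python
def rename_key(k: str) -> str:
    """Apply the same custom-key -> HF-Llama-key renames as step 3 above."""
    replacements = [
        ("layers.", "model.layers."),
        (".attention.wq.", ".self_attn.q_proj."),
        (".attention.wk.", ".self_attn.k_proj."),
        (".attention.wv.", ".self_attn.v_proj."),
        (".attention.wo.", ".self_attn.o_proj."),
        (".feed_forward.w1.", ".mlp.gate_proj."),
        (".feed_forward.w2.", ".mlp.down_proj."),
        (".feed_forward.w3.", ".mlp.up_proj."),
        (".attention_norm.", ".input_layernorm."),
        (".ffn_norm.", ".post_attention_layernorm."),
    ]
    for old, new in replacements:
        k = k.replace(old, new)
    return k

print(rename_key("layers.0.attention.wq.weight"))  # model.layers.0.self_attn.q_proj.weight
print(rename_key("layers.11.ffn_norm.weight"))     # model.layers.11.post_attention_layernorm.weight
```

If `load_state_dict(..., strict=True)` still complains about missing or unexpected keys, printing both key sets side by side usually pinpoints which rename is off.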
Why this is helpful:
- Easy inference: users can load the model with just two lines of code, e.g. `AutoModelForCausalLM.from_pretrained("arthu1/windarc-1.5-hf")` plus the matching tokenizer call.
- Ecosystem support: the model becomes instantly compatible with inference engines like vLLM, Text-Generation-WebUI, and local runners like LM Studio.
- Quantization: it becomes easy to create 4-bit (GGUF/EXL2) versions for edge devices.
Looking forward to seeing more updates on North Star!
Best regards,
aifeifei798
Hey! I'm actually on a Mac mini (no CUDA) and if you'd like to use the models on-demand, email me at Arthur.schannel.stop@gmail.com and I can host the server at some point!
Thanks, Arthur!
Hey! I'm Arthur. I'm thinking of making a CLI tool (Gem) where you can host north-star-architecture models on-device or on a Silicon (a custom machine; sessions last 4 hrs). Gems are custom spaces (similar to HF Spaces) where you can host models, so there's no need for llama.cpp.
If you want early access to Wind-Arc-1.6 or North-I, please email me.