My research focuses on deep reasoning with small language models, Transformer architecture innovation, and knowledge distillation for efficient alignment and transfer.
We have successfully replaced the KV-cache bottleneck inherent in Softmax Attention with Causal Monoid State Compression. By defining the causal history as a monoid recurrence, S_t = S_{t-1} ⊗ m_t (with m_t derived from the current token), the entire prefix is lossily compressed into a fixed-size state matrix per head.
The technical core of this architecture relies on the associativity of the monoid operator, (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c), which lets the same recurrence be computed either step by step or as a parallel prefix scan:
- **Training:** a parallel prefix scan, via Triton-accelerated JIT kernels, computes all prefix states simultaneously.
- **Inference:** true sequential updates; memory and time complexity per token are decoupled from sequence length.
- **Explicit causality:** we discard RoPE and attention masks. Causality is a first-class citizen, modeled explicitly through learned, content-dependent decay gates.
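To make the decoupling concrete, here is a minimal PyTorch sketch of the same idea. This is not the actual Spartacus kernel: the shapes, the gating, and the exact update rule are assumptions for illustration. The point it demonstrates is that, because the combine operator is associative, the same (decay, update) pairs can be folded sequentially at inference time or with a log-depth prefix scan at training time.

```python
import torch

def combine(a, b):
    """Associative monoid operator on (decay, update) pairs.
    Composing "apply a, then b" on a state x (x -> d*x + u) gives:
      decay  = a.decay * b.decay
      update = b.decay * a.update + b.update
    """
    da, ua = a
    db, ub = b
    return da * db, db * ua + ub

def sequential_scan(decay, update):
    """Inference-time recurrence: S_t = decay_t * S_{t-1} + update_t (O(1) state per step)."""
    state = torch.zeros_like(update[0])
    out = []
    for d_t, u_t in zip(decay, update):
        state = d_t * state + u_t
        out.append(state)
    return torch.stack(out)

def parallel_scan(decay, update):
    """Training-time inclusive prefix scan (Hillis-Steele), O(log T) steps.
    A pure-PyTorch stand-in for the Triton kernels described above."""
    d, u = decay, update
    T = decay.shape[0]
    stride = 1
    while stride < T:
        # pad the first `stride` positions with the identity element (decay=1, update=0)
        d_prev = torch.cat([torch.ones_like(d[:stride]), d[:-stride]], dim=0)
        u_prev = torch.cat([torch.zeros_like(u[:stride]), u[:-stride]], dim=0)
        d, u = combine((d_prev, u_prev), (d, u))
        stride *= 2
    return u  # u[t] now equals the prefix state S_t

# sanity check: both paths produce the same prefix states
T, dk, dv = 8, 4, 4
decay = torch.sigmoid(torch.randn(T, dk, dv))   # content-dependent decay gates in (0, 1)
update = torch.randn(T, dk, dv)                 # e.g. per-token outer products
assert torch.allclose(sequential_scan(decay, update), parallel_scan(decay, update), atol=1e-4)
```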
Current zero-shot benchmarks demonstrate that Spartacus-1B-Instruct (1.3B) is already outperforming established sub-quadratic models like Mamba-1.4B and RWKV-6-1.6B on ARC-Challenge (0.3063). Recent integration of structured Chain-of-Thought (CoT) data has further pushed reasoning accuracy to 75%.
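For context, zero-shot numbers like these are typically produced with EleutherAI's lm-evaluation-harness. A hedged sketch of such a run is below, assuming the v0.4 `simple_evaluate` Python API; the repository id is a placeholder, not a confirmed published checkpoint.

```python
# Sketch: reproduce a zero-shot ARC-Challenge score with lm-evaluation-harness
# (pip install lm-eval). "ethicalabs/Spartacus-1B-Instruct" is a placeholder repo id.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                              # Hugging Face transformers backend
    model_args="pretrained=ethicalabs/Spartacus-1B-Instruct",
    tasks=["arc_challenge"],
    num_fewshot=0,                                           # zero-shot, as reported above
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```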
The "Spartacus" era is about scaling intelligence, not the memory wall ♾️.
In 2017, my RNNs were babbling. Today, they are hallucinating beautifully.
10 years ago, getting an LSTM to output coherent English was a struggle. 10 years later, after a "cure" based on FineWeb-EDU and a custom synthetic mix for causal conversation, the results are fascinating.
We trained this on ~10B tokens on a single AMD GPU (ROCm). It is not a Transformer: Echo-DSRN (400M) is a novel recurrent architecture inspired by Hymba, RWKV, and xLSTM, designed to challenge the "Attention is All You Need" monopoly on the Edge.
The ambitious goal is to build a small instruct model with RAG and tool-usage capabilities (ethicalabs/Kurtis-EON1).
📊 The Benchmarks (Size: 400M)
For a model this size (trained on <10B tokens), the specialized performance is surprising:
- *SciQ*: 73.8% 🦄 (this rivals billion-parameter models in pure fact retrieval)
- *PIQA*: 62.3% (solid physical intuition for a sub-1B model)
The Reality Check:
HellaSwag (29.3%) and Winogrande (50.2%) show the limits of 400M parameters and 10B training tokens.
We are hitting the "Reasoning Wall", which confirms we need to scale up to (hopefully) unlock deeper common sense. As you can see in the visualization (to be released soon on HF), the FineWeb-EDU bias is strong. The model is convinced it is in a classroom ("In this course, we explore...").
The instruct model is not ready yet, and we are currently using curriculum learning to test model plasticity.
Source code and weights are not released yet. This is not a fork or a fine-tune: the base model is built in-house at https://www.ethicalabs.ai/, with novel components that do not exist in current open libraries.
🤝 Call for Collaboration: I am looking for Peer Reviewers interested in recurrent/hybrid architectures. If you want to explore what lies beyond Transformers, let’s connect!
🚀 Wanna train your own AI Model or Tokenizer from scratch?
Building models isn’t just for big labs anymore — with the right data, compute, and workflow, you can create **custom AI models** and **tokenizers** tailored to any domain. Whether it’s NLP, domain‑specific datasets, or experimental architectures, training from scratch gives you full control over vocabulary, embeddings, and performance.
✨ Why train your own?
- Full control over vocabulary & tokenization
- Domain‑specific optimization (medical, legal, technical, etc.)
- Better performance on niche datasets
- Freedom to experiment with architectures
⚡ The best part?
- Tokenizer training (TikToken / BPE) can be done in **just 3 lines of code**.
- Model training runs smoothly on **Google Colab notebooks** — no expensive hardware required.
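As a concrete illustration of the "few lines of code" claim, here is a minimal BPE tokenizer training sketch using the Hugging Face `tokenizers` library (an assumption on my part; the corpus path and vocabulary size are placeholders, not values from the post).

```python
# Minimal BPE tokenizer training with the Hugging Face `tokenizers` library.
# "corpus.txt" and the vocab size are placeholders; swap in your own data.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
trainer = trainers.BpeTrainer(vocab_size=32_000, special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("my_tokenizer.json")

print(tokenizer.encode("Training a tokenizer from scratch is cheap.").tokens)
```

The saved `my_tokenizer.json` can then be loaded for model training, for example by wrapping it with `transformers.PreTrainedTokenizerFast(tokenizer_file="my_tokenizer.json")` inside a Colab notebook.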