
Avey B1 Large (Experimental)

⚠️ Warning: This model is an experimental research artifact released to support exploration and evaluation of the Avey-B architecture. It is not intended for production use at this stage. A production-ready version, along with additional checkpoints, will be released in a future update.

⚠️ Compatibility Warning: This checkpoint was developed and tested using transformers v4. It is NOT guaranteed to work with transformers v5. Please pin your environment to a 4.x version.

Model Summary

Avey-B is a bidirectional sequence model built on the Avey architecture, departing from conventional Transformer-based designs. Instead of relying on quadratic self-attention, Avey-B decouples context width from global sequence length through a new Ranker–Processor architecture:

  • Ranker: The input sequence is partitioned into fixed-size splits. For each target split, the ranker retrieves the top-k most relevant splits based on similarity, constructing a focused contextualization window.
  • Neural Processor: The retrieved splits are contextualized using dynamic parameterization with decoupled static and dynamic components, stability-oriented normalization, and a neural compression module that reduces redundant global context while preserving salient information.

This architecture enables Avey-B to scale efficiently to long contexts well beyond its training window, while maintaining, and in many tasks exceeding, the bidirectional contextualization quality of BERT-style encoders.
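The ranker's retrieval step can be pictured with a minimal sketch. This is illustrative only: summarizing each split by mean-pooling and scoring relevance with cosine similarity are assumptions made for the example, not the published method.

```python
import torch
import torch.nn.functional as F

def rank_splits(hidden, split_size=64, top_k=4):
    """Illustrative top-k split retrieval (not the actual Avey-B code).

    hidden: (seq_len, d_model) token representations. For simplicity,
    seq_len is assumed to be a multiple of split_size.
    Returns, for each split, the indices of the top_k most similar splits.
    """
    seq_len, d_model = hidden.shape
    n_splits = seq_len // split_size
    # Summarize each split by mean-pooling its token representations.
    summaries = hidden[: n_splits * split_size].view(n_splits, split_size, d_model).mean(dim=1)
    # Cosine similarity between split summaries.
    normed = F.normalize(summaries, dim=-1)
    sim = normed @ normed.T
    # For each target split, keep the indices of the most similar splits;
    # these form its focused contextualization window.
    return sim.topk(k=min(top_k, n_splits), dim=-1).indices
```

Because the window size depends on `top_k` and `split_size` rather than on the full sequence length, this style of retrieval is what lets context width stay decoupled from global sequence length.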

Model Details

This checkpoint differs slightly from the configuration described in the associated research paper. It serves as a standalone release for users to experiment with the architecture.

  • Architecture: Avey-B
  • Dataset: FineWeb-edu (350BT split)
  • Training Volume: ~220 billion tokens
  • Context Window: Unlimited
  • Parameters: 391M

For a comprehensive description of the architectural innovations (including decoupled parameterization and stability-oriented normalization, among others) and detailed benchmark evaluations, please refer to the linked paper.

Tokenization & Input Formatting

Note on Tokenizer: Avey-B uses a BPE tokenizer (similar to GPT-2) rather than BERT's WordPiece. This means spaces are often treated as part of the token (e.g., " word" vs "word").

  • Fine-Tuning: For standard tasks like Sequence Classification or NER, you can pass raw text directly. The tokenizer handles spacing naturally, and the model will learn the correct patterns during training.
  • Manual Prompting: If you are manually constructing strings with special tokens (such as [MASK]), be aware that the tokenizer is whitespace-sensitive. Unlike with BERT, it is often more effective to omit the space before a special token (e.g., use "text[MASK]" rather than "text [MASK]").

In addition, the Avey-B tokenizer includes all of BERT's special tokens for compatibility. However, only the [MASK] token was used during pre-training; if other special tokens are needed for a downstream task, their embeddings must be learned during fine-tuning.
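Concretely, a fill-mask prompt is best built without a space before the mask, since a byte-level BPE tokenizer folds the leading space into the predicted token (e.g. " coffee" rather than "coffee"):

```python
# Building a fill-mask prompt for a BPE tokenizer.
mask_token = "[MASK]"

# Preferred: no space before the mask; the model predicts a token
# that already carries its own leading space (e.g. " coffee").
prompt = "Every morning, she drinks a cup of" + mask_token + " before work."

# Avoid: "... a cup of " + mask_token — the extra space would make the
# tokenizer encode the prompt differently and can degrade predictions.
```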

Usage

This model is compatible with HuggingFace transformers (v4). You can use it as a drop-in replacement for BERT-based models, provided you allow remote code execution with trust_remote_code=True.

1. Inference (Feature Extraction)

Get contextualized embeddings for downstream tasks:

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "avey-ai/avey-b1-large-exp"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

text = "Avey-B offers a new approach to bi-directional encoding."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Access the last hidden state
last_hidden_states = outputs.last_hidden_state
print(f"Output shape: {last_hidden_states.shape}")

2. Masked Language Modeling (Pipeline)

import torch
from transformers import pipeline
from pprint import pprint

pipe = pipeline(
    "fill-mask",
    model="avey-ai/avey-b1-large-exp",
    dtype=torch.bfloat16,
    trust_remote_code=True
)

input_text = "Every morning, she drinks a cup of[MASK] before going to work." 
results = pipe(input_text)
pprint(results)

3. Fine-Tuning

Since Avey-B is compatible with the AutoModel API, it can be fine-tuned using the standard HuggingFace Trainer class or accelerate, just like BERT.
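If you prefer to wire up a task head yourself, a minimal sketch looks like the following. Mean-pooling the last hidden state into a linear classifier is one common choice, not an official fine-tuning recipe; `AveyBClassifier` and its constructor arguments are hypothetical names for this example.

```python
import torch
import torch.nn as nn

class AveyBClassifier(nn.Module):
    """Illustrative classification head over a bidirectional encoder.

    `encoder` is assumed to return an object exposing `.last_hidden_state`
    of shape (batch, seq_len, hidden), as AutoModel does in the
    feature-extraction example above.
    """
    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, **inputs):
        hidden = self.encoder(**inputs).last_hidden_state
        pooled = hidden.mean(dim=1)  # mean-pool over the sequence
        return self.classifier(pooled)  # (batch, num_labels) logits
```

The resulting module can then be trained with a standard PyTorch loop or handed to the HuggingFace Trainer.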

Citation

If you use this model or architecture in your research, please cite the original paper:

@inproceedings{2026aveyb,
  title={Avey-B},
  author={Acharya, Devang and Hammoud, Mohammad},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}