
Avey B1 Large (Experimental)

⚠️ Warning: This model is an experimental research artifact released to support exploration and evaluation of the Avey-B architecture. It is not intended for production use at this stage. A production-ready version, along with additional checkpoints, will be released in a future update.

⚠️ Compatibility Warning: This checkpoint was developed and tested using transformers v4. It is NOT guaranteed to work with transformers v5. Please pin your environment to a 4.x version.

Model Summary

Avey-B is a bidirectional sequence model built on the Avey architecture, departing from conventional Transformer-based designs. Instead of relying on quadratic self-attention, Avey-B decouples context width from global sequence length through a new Ranker–Processor architecture:

  • Ranker: The input sequence is partitioned into fixed-size splits. For each target split, the ranker retrieves the top-k most relevant splits based on similarity, constructing a focused contextualization window.
  • Neural Processor: The retrieved splits are contextualized using dynamic parameterization with decoupled static and dynamic components, stability-oriented normalization, and a neural compression module that reduces redundant global context while preserving salient information.

This architecture enables Avey-B to scale efficiently to long contexts well beyond its training window, while maintaining, and in many tasks exceeding, the bidirectional contextualization quality of BERT-style encoders.
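The ranker's retrieval step can be pictured with a minimal sketch. This is illustrative only: summarizing each split by mean-pooling and scoring relevance with cosine similarity are assumptions made for the example, not the published method.

```python
import torch
import torch.nn.functional as F

def rank_splits(hidden, split_size=64, top_k=4):
    """Illustrative top-k split retrieval (not the actual Avey-B code).

    hidden: (seq_len, d_model) token representations. For simplicity,
    seq_len is assumed to be a multiple of split_size.
    Returns, for each split, the indices of the top_k most similar splits.
    """
    seq_len, d_model = hidden.shape
    n_splits = seq_len // split_size
    # Summarize each split by mean-pooling its token representations.
    summaries = hidden[: n_splits * split_size].view(n_splits, split_size, d_model).mean(dim=1)
    # Cosine similarity between split summaries.
    normed = F.normalize(summaries, dim=-1)
    sim = normed @ normed.T
    # For each target split, keep the indices of the most similar splits;
    # these form its focused contextualization window.
    return sim.topk(k=min(top_k, n_splits), dim=-1).indices
```

Because the window size depends on `top_k` and `split_size` rather than on the full sequence length, this style of retrieval is what lets context width stay decoupled from global sequence length.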

Model Details

This checkpoint differs slightly from the configuration described in the associated research paper. It serves as a standalone release for users to experiment with the architecture.

  • Architecture: Avey-B
  • Dataset: FineWeb-edu (350BT split)
  • Training Volume: ~220 billion tokens
  • Context Window: Unlimited
  • Parameters: 391M

For a comprehensive description of the architectural innovations (including decoupled parameterization and stability-oriented normalization, among others) and detailed benchmark evaluations, please refer to the linked paper.

Tokenization & Input Formatting

Note on Tokenizer: Avey-B uses a BPE tokenizer (similar to GPT-2) rather than BERT's WordPiece. This means spaces are often treated as part of the token (e.g., " word" vs "word").

  • Fine-Tuning: For standard tasks like Sequence Classification or NER, you can pass raw text directly. The tokenizer handles spacing naturally, and the model will learn the correct patterns during training.
  • Manual Prompting: If you are manually constructing strings with special tokens (such as [MASK]), be aware that the tokenizer is whitespace-sensitive. Unlike with BERT, it is often more effective to omit the space before a special token (e.g., use "text[MASK]" rather than "text [MASK]").

In addition, the Avey-B tokenizer includes all of BERT's special tokens for compatibility. However, only the [MASK] token was used during pre-training; if other special tokens are needed for a downstream task, their embeddings must be learned during fine-tuning.
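Concretely, a fill-mask prompt is best built without a space before the mask, since a byte-level BPE tokenizer folds the leading space into the predicted token (e.g. " coffee" rather than "coffee"):

```python
# Building a fill-mask prompt for a BPE tokenizer.
mask_token = "[MASK]"

# Preferred: no space before the mask; the model predicts a token
# that already carries its own leading space (e.g. " coffee").
prompt = "Every morning, she drinks a cup of" + mask_token + " before work."

# Avoid: "... a cup of " + mask_token — the extra space would make the
# tokenizer encode the prompt differently and can degrade predictions.
```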

Usage

This model is compatible with HuggingFace transformers (v4). You can use it as a drop-in replacement for BERT-based models, provided you allow remote code execution with trust_remote_code=True.

1. Inference (Feature Extraction)

Get contextualized embeddings for downstream tasks:

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "avey-ai/avey-b1-large-exp"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

text = "Avey-B offers a new approach to bi-directional encoding."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Access the last hidden state
last_hidden_states = outputs.last_hidden_state
print(f"Output shape: {last_hidden_states.shape}")

2. Masked Language Modeling (Pipeline)

import torch
from transformers import pipeline
from pprint import pprint

pipe = pipeline(
    "fill-mask",
    model="avey-ai/avey-b1-large-exp",
    dtype=torch.bfloat16,
    trust_remote_code=True
)

input_text = "Every morning, she drinks a cup of[MASK] before going to work." 
results = pipe(input_text)
pprint(results)

3. Fine-Tuning

Since Avey-B is compatible with the AutoModel API, it can be fine-tuned using the standard HuggingFace Trainer class or accelerate, just like BERT.
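If you prefer to wire up a task head yourself, a minimal sketch looks like the following. Mean-pooling the last hidden state into a linear classifier is one common choice, not an official fine-tuning recipe; `AveyBClassifier` and its constructor arguments are hypothetical names for this example.

```python
import torch
import torch.nn as nn

class AveyBClassifier(nn.Module):
    """Illustrative classification head over a bidirectional encoder.

    `encoder` is assumed to return an object exposing `.last_hidden_state`
    of shape (batch, seq_len, hidden), as AutoModel does in the
    feature-extraction example above.
    """
    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, **inputs):
        hidden = self.encoder(**inputs).last_hidden_state
        pooled = hidden.mean(dim=1)  # mean-pool over the sequence
        return self.classifier(pooled)  # (batch, num_labels) logits
```

The resulting module can then be trained with a standard PyTorch loop or handed to the HuggingFace Trainer.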

Citation

If you use this model or architecture in your research, please cite the original paper:

@inproceedings{2026aveyb,
  title={Avey-B},
  author={Acharya, Devang and Hammoud, Mohammad},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}