Model Card for Vintix

Vintix is a multi-task action model that learns through in-context reinforcement learning.
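In-context reinforcement learning means the policy improves by conditioning on a growing history of its own transitions rather than by updating its weights at inference time. The toy sketch below illustrates only that idea. The two-armed bandit, the hand-written "policy", and every function name are hypothetical, and this is not how Vintix tokenizes or processes its context. The context cap of 8192 mirrors the sequence length listed in this card, although here it counts transitions rather than tokens.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

# One context entry: (observation, action, reward). The toy observation is always 0.
Transition = Tuple[int, int, float]


def toy_in_context_policy(context: Deque[Transition], n_actions: int) -> int:
    """Hypothetical stand-in for the model: it reads only its context, no weights change.

    Tries every action once, then picks the action with the highest average
    reward observed in the context.
    """
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    for _, action, reward in context:
        totals[action] += reward
        counts[action] += 1
    for a in range(n_actions):
        if counts[a] == 0:
            return a  # explore actions not yet present in the context
    means = [totals[a] / counts[a] for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: means[a])


def run_in_context(
    arm_rewards: Sequence[float], steps: int, max_context: int = 8192
) -> Tuple[List[int], Deque[Transition]]:
    """Roll out the policy, appending each transition to a bounded context window."""
    context: Deque[Transition] = deque(maxlen=max_context)
    actions: List[int] = []
    for _ in range(steps):
        obs = 0
        action = toy_in_context_policy(context, len(arm_rewards))
        reward = arm_rewards[action]  # deterministic toy bandit
        context.append((obs, action, reward))  # oldest entries drop out once full
        actions.append(action)
    return actions, context
```

For example, `run_in_context([0.2, 0.8], steps=20)` tries each arm once and then keeps choosing arm 1; the improvement comes entirely from what is stored in the context.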

Model Details

Setting            Description
Parameters         332M
Model Size         Layers: 20, Heads: 16, Embedding Size: 1024
Sequence Length    8192
Training Data      MuJoCo, Meta-World, Bi-DexHands, Industrial Benchmark
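The sizes above can be sanity-checked with a little arithmetic. The sketch below stores them in a config object (the class and field names are hypothetical) and derives the per-head width and a rough weight count for the transformer blocks. The estimate assumes standard multi-head attention (four d x d projections) and a two-matrix MLP with a 4x hidden width; the card does not state the actual block design, so treat the result as an approximation only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneConfig:
    """Transformer sizes as listed in this model card (field names are hypothetical)."""

    n_layers: int = 20
    n_heads: int = 16
    d_model: int = 1024
    seq_len: int = 8192

    def __post_init__(self) -> None:
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")

    @property
    def head_dim(self) -> int:
        # Each attention head works on d_model / n_heads features.
        return self.d_model // self.n_heads

    def approx_block_params(self, mlp_ratio: int = 4) -> int:
        """Rough weight count of the transformer blocks (assumed standard design).

        Attention: Q, K, V and output projections, 4 * d^2 per layer.
        MLP: two matrices of size d x (mlp_ratio * d), 2 * mlp_ratio * d^2 per layer.
        Biases, norms, embeddings and input/output layers are ignored.
        """
        d = self.d_model
        per_layer = 4 * d * d + 2 * mlp_ratio * d * d
        return self.n_layers * per_layer
```

With the defaults, `head_dim` is 64 and `approx_block_params()` returns 251,658,240 (about 252M). The gap to the listed 332M would sit in parts this estimate ignores, for example embeddings or input and output layers, or in a different MLP width; the card does not break this down.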

Model Description

  • Developed by: dunnolab
  • License: Apache 2.0

Model Sources

Citation

@article{polubarov2025vintix,
  author={Andrey Polubarov and Nikita Lyubaykin and Alexander Derevyagin and Ilya Zisman and Denis Tarasov and Alexander Nikulin and Vladislav Kurenkov},
  title={Vintix: Action Model via In-Context Reinforcement Learning},
  journal={arXiv},
  volume={2501.19400},
  year={2025}
}

Evaluation results

All results below are self-reported.

  • Normalized Score IQM (95% CI) on MuJoCo: 0.990
  • Normalized Score IQM (95% CI) on Meta-World: 0.990
  • Normalized Score IQM (95% CI) on Bi-DexHands: 0.920
  • Normalized Score IQM (95% CI) on Industrial Benchmark: 0.990
  • Total reward on ant_v4: 6315.00 +/- 675.00
  • Expert-normalized total reward on ant_v4: 0.98 +/- 0.10
  • Total reward on halfcheetah_v4: 7226.50 +/- 241.50
  • Expert-normalized total reward on halfcheetah_v4: 0.93 +/- 0.03
  • Total reward on hopper_v4: 2794.60 +/- 612.62
  • Expert-normalized total reward on hopper_v4: 0.86 +/- 0.19
  • Total reward on humanoid_v4: 7376.26 +/- 0.00
  • Expert-normalized total reward on humanoid_v4: 0.97 +/- 0.00
  • Total reward on humanoidstandup_v4: 320567.82 +/- 58462.11
  • Expert-normalized total reward on humanoidstandup_v4: 1.02 +/- 0.21
  • Total reward on inverteddoublependulum_v4: 6105.75 +/- 4368.65
  • Expert-normalized total reward on inverteddoublependulum_v4: 0.65 +/- 0.47
  • Total reward on invertedpendulum_v4: 1000.00 +/- 0.00
  • Expert-normalized total reward on invertedpendulum_v4: 1.00 +/- 0.00
  • Total reward on pusher_v4: -37.82 +/- 8.72
  • Expert-normalized total reward on pusher_v4: 1.02 +/- 0.08
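The aggregate scores above are labelled as an interquartile mean (IQM) of normalized scores with a 95% confidence interval, and the per-task results include expert-normalized rewards. A minimal sketch of these quantities follows, under stated assumptions: the normalization (score - random) / (expert - random) is a common convention but is assumed here, not confirmed by this card, and the interval is a plain percentile bootstrap rather than the stratified bootstrap used by the rliable library. All function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def expert_normalize(score: float, random_score: float, expert_score: float) -> float:
    """Assumed normalization: 0 = random-policy level, 1 = expert level.

    Works for negative rewards too, as long as expert_score != random_score.
    """
    return (score - random_score) / (expert_score - random_score)


def iqm(values: Sequence[float]) -> float:
    """Interquartile mean: drop the lowest and highest 25% and average the rest.

    Trims int(0.25 * n) values from each end, like scipy.stats.trim_mean(x, 0.25).
    """
    xs = sorted(values)
    k = int(0.25 * len(xs))
    middle = xs[k:len(xs) - k]
    return sum(middle) / len(middle)


def bootstrap_ci(
    values: Sequence[float], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the IQM (simple resampling, not stratified)."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        iqm([values[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]
```

For example, `iqm([1, 2, 3, 4, 100, -100, 5, 6])` discards the two lowest and two highest values and returns 3.5, which is why the IQM is less sensitive to outlier runs than a plain mean.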