Restating your idea in implementable terms
You want an agent that treats perception and action as one coupled sequence and learns a library of reusable mechanisms by compressing recurring sub-segments of that sensorimotor stream. This sits naturally in cybernetics: every good regulator of a system must contain a model of that system (the Good Regulator theorem), so effective action presupposes an internal model. (Governance Foundation)
A useful way to make it concrete is to represent the stream as:
- tokens: [obs_t tokens …] → [action_t token(s)] → [obs_{t+1} …] → …
- mechanism: a latent choice that explains/predicts a span of the stream and can be reused elsewhere.
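The interleaving itself is trivial to make concrete; a minimal sketch (token IDs and vocabularies here are made up for illustration):

```python
def interleave(obs_tokens, action_tokens):
    """Merge per-step observation tokens and action tokens into one stream.

    obs_tokens:    list of lists, obs_tokens[t] = tokens for observation t
    action_tokens: list of lists, action_tokens[t] = token(s) for action t
    """
    stream = []
    for obs, act in zip(obs_tokens, action_tokens):
        stream.extend(obs)   # [obs_t tokens ...]
        stream.extend(act)   # [action_t token(s)]
    return stream

stream = interleave([[10, 11], [12, 13]], [[200], [201]])
# one coupled sequence: [10, 11, 200, 12, 13, 201]
```

A single sequence model trained on this stream sees perception and action jointly, which is the premise everything below builds on.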
Below are practical choices for each of your items (1)–(3), plus a recommended “starter architecture”.
1) Method of pattern representation: three “mechanism” definitions that work
A. Mechanisms as predictive tests (PSR-style)
Predictive State Representations (PSRs) define state as action-conditional predictions of future observations. That matches your “action+perception in one stream” premise more directly than POMDP latent states. (Incomplete Ideas)
- Representation: a mechanism is a small set of predictive features (“tests”) that become sufficient statistics for a particular regime/skill.
- When it fits: you care about principled sensorimotor modeling under partial observability and want “mechanism = predictive chunk”.
B. Mechanisms as temporally extended skills/options (hierarchical RL)
Here a mechanism is a closed-loop behavior that runs for multiple steps.
- DIAYN learns a diverse set of skills without external reward, via an information-theoretic diversity objective; it explicitly motivates skill reuse/composition. (arXiv)
- Option-Critic learns options (intra-option policies + terminations + policy-over-options) end-to-end. (arXiv)
C. Mechanisms as independent compute modules (modular networks)
A mechanism is a subnet that activates only when relevant.
- RIMs are built specifically around “nearly independent mechanisms” with sparse updates and sparse communication. (arXiv)
- Switch / MoE style routing makes “which expert handles this segment” explicit. (arXiv)
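A minimal NumPy sketch of top-1 (Switch-style) routing, with made-up sizes and random matrices standing in for learned weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, n_tokens = 8, 4, 6
W_router = rng.normal(size=(d, n_experts))            # learned in practice
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

x = rng.normal(size=(n_tokens, d))                    # token embeddings
logits = x @ W_router                                 # (n_tokens, n_experts)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)             # router softmax
choice = probs.argmax(axis=1)                         # top-1 expert per token

# Each token is processed only by its chosen expert, scaled by the gate,
# which makes "which expert handles this segment" an explicit variable.
y = np.stack([probs[i, choice[i]] * (x[i] @ experts[choice[i]])
              for i in range(n_tokens)])
```

The `choice` vector is exactly the per-token "mechanism ID" that the rest of your design can segment, penalize, or compose over.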
2) Inferring initially orthogonal/unrelated patterns from one stream
Orthogonality doesn’t “fall out” automatically; you need pressure toward separation. Three proven pressures:
A. Persistence + segmentation (regime discovery)
If mechanisms correspond to regimes over time, use a switching model with persistence:
- Sticky HDP-HMM encourages states to persist, avoiding pathological rapid switching; good for discovering repeated regimes without pre-specifying the number of regimes. (UC Irvine Information Department)
Practical note: if you want explicit duration modeling (“this mechanism typically lasts ~N steps”), HDP-HSMM variants exist, but sticky HDP-HMM is the common starting point. (arXiv)
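To see what "sticky" buys you, here is a sketch of the transition prior alone (a real sticky HDP-HMM also infers the number of states nonparametrically; `alpha` and `kappa` here are illustrative):

```python
import numpy as np

def sticky_transition_prior(n_states, alpha=1.0, kappa=10.0):
    """Expected transition matrix under a symmetric Dirichlet prior with
    concentration alpha, plus kappa extra mass on self-transitions."""
    counts = np.full((n_states, n_states), alpha)
    counts += kappa * np.eye(n_states)
    return counts / counts.sum(axis=1, keepdims=True)

P = sticky_transition_prior(4)
# self-transitions dominate: P[i, i] = (1 + 10) / (4 + 10) = 11/14
```

Larger `kappa` means longer expected regime durations, which is the pressure that keeps inferred mechanisms from switching every step.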
B. Competition (experts specialize)
You can force specialization by making multiple candidate modules compete to explain the same data: credit each segment to whichever module predicts it best, and train only the winner on it. This is a general template you can adapt even when you don’t care about “causal” per se: competition → specialization → quasi-orthogonal mechanisms.
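A toy sketch of that competition template, with per-expert mean predictors standing in for real modules:

```python
import numpy as np

def competitive_update(experts, segment, lr=0.5):
    """experts: (k, d) per-expert predictions; segment: (d,) target.
    The lowest-error expert wins and is the only one that updates,
    so experts drift toward disjoint (quasi-orthogonal) specialties."""
    errors = ((experts - segment) ** 2).sum(axis=1)
    winner = int(errors.argmin())
    experts[winner] += lr * (segment - experts[winner])
    return winner

experts = np.array([[0.0, 0.0], [10.0, 10.0]])
w = competitive_update(experts, np.array([1.0, 1.0]))
# expert 0 wins (w == 0) and moves to [0.5, 0.5]; expert 1 is untouched
```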
C. Diversity objectives (skills become distinct)
If your mechanisms are skills/options:
- DIAYN’s objective is explicitly designed to produce diverse, discriminable skills. (arXiv)
- DADS learns skills whose outcomes are easy to predict, which aligns strongly with your “compression for prediction/action” framing. (arXiv)
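DIAYN's intrinsic reward has a simple form, r = log q(z | s) − log p(z), where q is a learned skill discriminator and p(z) a (uniform) skill prior. A sketch with a made-up discriminator output:

```python
import numpy as np

def diayn_reward(disc_logits, z, n_skills):
    """disc_logits: discriminator logits over skills for the current state.
    z: index of the skill that actually generated the state."""
    log_q = disc_logits - np.log(np.exp(disc_logits).sum())  # log softmax
    log_p = -np.log(n_skills)                                # uniform prior
    return log_q[z] - log_p

r = diayn_reward(np.array([2.0, 0.0, 0.0]), z=0, n_skills=3)
# r > 0: the discriminator can tell skill 0 produced this state, so the
# skill is rewarded for visiting states that make it discriminable.
```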
3) Meta-learning / recombining mechanisms
To get real recombination (not just “pick one mechanism”), you need explicit compositional training.
A. Modular Meta-Learning (direct hit)
This is one of the clearest “library of modules + compose them for new tasks” frameworks.
- Modular Meta-Learning (Alet et al.) trains reusable modules and then recombines them to generalize compositionally. (arXiv)
B. Hierarchical composition
If mechanisms are skills/options:
- Option-Critic already provides a policy-over-options + learned terminations (a natural composer). (arXiv)
- DIAYN explicitly discusses composing pretrained skills hierarchically for downstream tasks. (arXiv)
C. Factorized generative models (Active Inference flavor)
If you want Friston-style unification, “mechanisms” can be factors of a generative model, and recombination is “reuse factors/priors across contexts”.
- High-level foundation: Free-energy principle review. (UCL Fil)
- Practical tutorial for discrete-state active inference (build POMDPs, run sims, fit data). (PMC)
- Usable code: pymdp (paper + repo). (JOSS)
- RL comparison/intuition: “Active inference: demystified and compared”. (arXiv)
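Independent of any particular library, the perception step in such a discrete generative model is a Bayesian belief update over hidden states; a minimal sketch (the A/B array conventions below are my assumptions for illustration, not pymdp's API):

```python
import numpy as np

def update_belief(belief, A, B, action, obs):
    """belief: prior over hidden states; A[o, s] = p(o | s);
    B[s', s, a] = p(s' | s, a). Returns the posterior over states."""
    predicted = B[:, :, action] @ belief      # predictive prior after acting
    posterior = A[obs, :] * predicted         # Bayes: likelihood * prior
    return posterior / posterior.sum()

A = np.array([[0.9, 0.1],
              [0.1, 0.9]])                    # observation likelihood
B = np.eye(2)[:, :, None]                     # one action, identity dynamics
post = update_belief(np.array([0.5, 0.5]), A, B, action=0, obs=0)
# post = [0.9, 0.1]: observing o=0 shifts belief toward state 0
```

In the active-inference framing, "mechanisms" would be reusable factors of A and B shared across contexts.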
How reservoir computing fits (and how to use it without derailing the rest)
Reservoir computing (ESNs / LSMs) is good as a fast temporal feature extractor. It typically does not solve “mechanism discovery” or “recombination” by itself; it’s best as a front-end whose states feed one of the mechanism-learning layers above.
- Practical tuning guide (spectral radius, washout, scaling, etc.). (RUG AI)
- Practical library: ReservoirPy (docs + repo). (ReservoirPy)
Recommended pattern:
- Reservoir encodes the stream → router/segmenter assigns mechanism IDs → per-mechanism predictors/controllers learn specialized behavior.
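A sketch of the front-end half of that pattern, using the standard ESN state update x_{t+1} = tanh(W x_t + W_in u_t) with illustrative sizes and scaling (ReservoirPy provides a tuned, production version of this):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 3, 50
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()   # spectral radius < 1

def reservoir_states(inputs):
    """Run the ESN update x_{t+1} = tanh(W x_t + W_in u_t) over a stream."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ u)
        states.append(x.copy())
    return np.array(states)

feats = reservoir_states(rng.normal(size=(20, n_in)))
# feats has shape (20, 50): fast temporal features that a downstream
# router/segmenter can map to mechanism IDs.
```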
A recommended “starter architecture” that matches your exact description
Step 0: adopt “one stream” modeling as the backbone
Use sequence modeling over trajectories (observation/action tokens). This is a well-trodden path in modern agent work:
- Decision Transformer (paper + official code). (arXiv)
- Gato shows the same idea at scale: one model emits different token types (text, actions, etc.) based on context. (arXiv)
- Robotics exemplars of autoregressive action generation: RT-1 and VIMA. (arXiv)
Step 1: add mechanisms with explicit independence pressure
Pick one of these, depending on what you mean by “mechanism”:
- Compute-modules approach (most “mechanism-y”): replace part of the backbone with RIMs or a Switch/MoE block; add a persistence penalty so routing doesn’t thrash. (arXiv)
- Regime segmentation approach (most “compress subsegments”): segment the stream with a sticky HDP-HMM and treat each persistent regime as a mechanism. (UC Irvine Information Department)
- Skill library approach (most “action-perception pattern”): learn skills with DIAYN or DADS and treat each skill as a mechanism; DADS is particularly aligned with “predictable mechanisms”. (arXiv)
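The persistence penalty mentioned for the compute-modules approach can be as simple as penalizing change between consecutive routing distributions; a sketch (the distributions are illustrative):

```python
import numpy as np

def persistence_penalty(route_probs):
    """route_probs: (T, n_experts) router distributions per timestep.
    Returns the mean total-variation distance between consecutive steps;
    adding this to the loss discourages routing from thrashing."""
    diffs = np.abs(np.diff(route_probs, axis=0)).sum(axis=1) / 2.0
    return diffs.mean()

stable = np.array([[0.9, 0.1]] * 5)
thrash = np.array([[0.9, 0.1], [0.1, 0.9]] * 3)[:5]
# stable routing costs 0.0; alternating routing costs 0.8 per step
```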
Step 2: recombination (meta-learning / composition)
- Train a modular composer so that new tasks correspond to new compositions of existing mechanisms, not new monoliths. Modular Meta-Learning is a good reference target. (arXiv)
Step 3: guardrails for known failure modes
- Trajectory stitching weakness (offline sequence models can fail to combine the best parts of different trajectories): consider the Elastic Decision Transformer (EDT) as a concrete mitigation path. (arXiv)
- Return/control token under-attention: RADT is a concrete example of an architectural fix for return conditioning. (arXiv)
- Expert collapse / underuse (MoE): Switch-style load balancing is a standard mitigation. (arXiv)
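For reference, the Switch-style balancing term is L = n_experts · Σᵢ fᵢ · Pᵢ, where fᵢ is the fraction of tokens routed to expert i and Pᵢ is the mean router probability for expert i; it is minimized (value 1.0) under uniform load. A sketch:

```python
import numpy as np

def load_balance_loss(probs):
    """probs: (n_tokens, n_experts) router softmax outputs."""
    n_experts = probs.shape[1]
    assignments = probs.argmax(axis=1)               # top-1 routing
    f = np.bincount(assignments, minlength=n_experts) / len(probs)
    P = probs.mean(axis=0)
    return n_experts * float(f @ P)

uniform = np.full((8, 4), 0.25)
skewed = np.array([[0.97, 0.01, 0.01, 0.01]] * 8)
# uniform routing achieves the minimum (1.0); collapsing all tokens onto
# one expert is penalized (3.88 here)
```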
Two “default builds” depending on your priorities
Build 1: simplest path to your concept (token stream first)
- Decision Transformer backbone (arXiv)
- RIMs modules or Switch MoE for mechanism separation (arXiv)
- Modular Meta-Learning composer (arXiv)
Use this when: you want the “single discrete stream” aesthetic to be primary.
Build 2: strongest “compression” interpretation (world-model first)
- Learn a compact world model (DreamerV3 is a strong modern reference). (arXiv)
- Learn predictable skills (DADS) as mechanisms. (arXiv)
- Compose skills/options for tasks (Option-Critic or Modular Meta-Learning). (arXiv)
Use this when: your priority is “mechanisms that make prediction + planning easy”.
Conceptual alignment with Active Inference (why your idea resembles it)
Active inference explicitly unifies perception, learning, and action under a single inference objective; “mechanisms” map naturally to factorized pieces of the generative model and policy priors. (UCL Fil)
If you want a practical entry point without committing to full probabilistic modeling immediately, use the “step-by-step tutorial” + pymdp to prototype discrete mechanisms and compare behaviors to RL baselines. (PMC)
If you want one sharp next decision
Decide which of these is the first-class object in your design:
- Skill mechanism (DIAYN/DADS → recombine skills) (arXiv)
- Regime mechanism (sticky switching modes over time) (UC Irvine Information Department)
- Compute mechanism (RIMs/MoE modules that specialize) (arXiv)
Everything else (token stream, reservoir front-end, meta-learning) becomes much easier to choose once that’s fixed.