Model Reson — fine-tuned LLaMA-7B for meta-cognition and recursive reasoning

Hi everyone,

I’ve uploaded the weights for a side-project I’ve been working on: Reson.
It’s a fine-tune of LLaMA-7B (LoRA, PEFT + bitsandbytes 4-bit). The dataset is ~11k instruction–response pairs that I wrote/curated, focused less on benchmarks and more on “how the model thinks”.
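The post doesn't include the actual training config, but for anyone wanting to reproduce a similar setup, a typical PEFT + bitsandbytes 4-bit (QLoRA-style) configuration looks roughly like this. All hyperparameters and the base checkpoint name are illustrative guesses, not Reson's actual values:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Illustrative 4-bit quantization config (not Reson's actual settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # bitsandbytes 4-bit loading
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",                  # any LLaMA-7B checkpoint works here
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA config; rank/alpha/target modules are common defaults
# for LLaMA-family models, not values from the Reson card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

From here the ~11k instruction–response pairs would be tokenized and fed to a standard `Trainer`/`SFTTrainer` loop; only the LoRA adapter weights are trained.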

The aim wasn’t to squeeze out another leaderboard model, but to see what happens if you push a model toward:

  • reflecting on its own process (meta-cognition),
  • recursive / loop reasoning,
  • cross-domain adaptability,
  • edge cases like deception/strategy (to simulate human-like flexibility).

Training ran locally; the final loss was around 0.33 (measured on a different batch).
Weights are here: Nexus-Walker/Reson · Hugging Face

What I’d like feedback on:

  1. How do you evaluate this kind of behavior? Standard metrics don’t capture it.
  2. How to keep the behavior stable without catastrophic forgetting as dataset grows?
  3. Any prior work I should read that tried something similar?

This is still experimental and definitely rough around the edges, but I think it shows interesting “proto-agent” behaviors worth exploring.
Curious to hear your thoughts.


I hope this helps clarify the issues.


Thanks a lot for the reply, it’s incredibly helpful. I’ve read through everything, and your action plan is very clear. I can confirm the dataset is indeed around 11k pairs, as you noted from the card. I’ll focus on creating an eval harness that can track PLAN, CRITIQUE, and REVISE, and then I’ll explore the approach of using multiple LoRAs for new skills. I’ll keep you posted on my progress, though it sounds like it will be a long process…
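A minimal sketch of what such a harness could check, assuming responses are expected to contain `PLAN:`, `CRITIQUE:`, and `REVISE:` section headers (the marker format is hypothetical, not something from the model card):

```python
import re

# Section names the harness looks for, in the expected order.
SECTIONS = ("PLAN", "CRITIQUE", "REVISE")

def score_structure(response: str) -> dict:
    """Check which sections a model response contains and whether
    they appear in the expected PLAN -> CRITIQUE -> REVISE order."""
    positions = {}
    for name in SECTIONS:
        # Match a line starting with e.g. "PLAN:" (leading whitespace ok).
        m = re.search(rf"^\s*{name}\s*:", response, flags=re.MULTILINE)
        if m:
            positions[name] = m.start()
    present = [n for n in SECTIONS if n in positions]
    offsets = [positions[n] for n in present]
    return {
        "present": present,                      # which sections were found
        "in_order": offsets == sorted(offsets),  # found sections in order?
        "complete": len(present) == len(SECTIONS),
    }

good = "PLAN: outline steps.\nCRITIQUE: step 2 is weak.\nREVISE: fixed plan."
print(score_structure(good))   # complete and in order
```

Running this over a held-out prompt set and reporting the fraction of responses that are `complete` and `in_order` gives a cheap, trackable proxy for the structured-reasoning behavior, even if it says nothing about the quality of each section's content.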
