If you are building agents, you already know the failure mode: long sessions get expensive, slow, and brittle because you keep paying to restate the past.
This persistence engine fixes that by moving memory out of the prompt and into a durable store, then retrieving only what is relevant per turn.
Primary capability
Token usage reduction that improves over time
You set a retrieval budget per turn. Instead of replaying transcripts, the agent retrieves a small, targeted memory slice. Token spend therefore trends down as history grows, and can reach up to 95 percent reduction on long-running workloads.
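As a rough illustration of the per-turn budget idea (all names here are illustrative, not the actual API): rather than concatenating the whole transcript, you greedily pack the top-ranked memory slices into a fixed token budget.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    score: float

class MemoryStore:
    """Toy in-memory stand-in for the engine's hybrid retrieval."""
    def __init__(self, items):
        self.items = items

    def retrieve(self, query):
        # best-first ranking (stand-in for lexical + semantic scoring)
        return sorted(self.items, key=lambda h: h.score, reverse=True)

def build_turn_context(store, query, token_budget):
    """Greedily pack the top-ranked memory slices into a fixed token budget."""
    context, spent = [], 0
    for hit in store.retrieve(query):
        cost = max(1, len(hit.text) // 4)  # rough 4-chars-per-token estimate
        if spent + cost > token_budget:
            break
        context.append(hit.text)
        spent += cost
    return context
```

The key property: the cost per turn is bounded by the budget, not by the length of the history.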
Other core capabilities
- Durable session event logs, restart safe
- Blob storage for large artifacts
- Retrieval over history, lexical plus optional semantic
- Multi-tenant support for separating projects or users
- Offline licensing using signed license files, no phone-home
What I want feedback on
- Real-world token reduction numbers in your workflow
- Recall quality, especially false positives and missed memories
- Durability under restarts, crashes, and messy state transitions
- Integration friction in actual agent loops
To join
Reply with your stack, your use case, and your target constraint: token spend, latency, or reliability.
License text is being finalized with Australian counsel. Access starts as soon as that is signed.
Stack: Python/Rust based LLM Security Orchestrator (Firewall).
Use Case: Stateful specialized agents where ‘forgetting’ security constraints is catastrophic.
Target Constraint: Latency & Control.
Critical Question: Does the engine expose the retrieval scores or allow for custom re-ranking logic?
We are implementing an ‘Outcome-Weighted Retrieval’ (penalizing memories that led to failures). If your engine is a black box that just returns text, it breaks our safety loop. If it allows score injection or re-ranking hooks, it’s a perfect fit.
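Concretely, the hook we have in mind looks something like this (a sketch in our terms, not the engine's API; the hit shape and field names are ours):

```python
from typing import Callable

# Assumed hit shape (ours, not the engine's): {"id", "score", "meta"}
Hit = dict

def retrieve_with_hook(raw_hits: list[Hit],
                       rerank: Callable[[list[Hit]], list[Hit]]) -> list[Hit]:
    """The engine returns scored hits; a caller-supplied hook reorders them."""
    return rerank(raw_hits)

def outcome_weighted(hits: list[Hit], penalty: float = 0.2) -> list[Hit]:
    """Downrank memories whose past use led to failures."""
    return sorted(
        hits,
        key=lambda h: h["score"] - penalty * h["meta"].get("failures", 0),
        reverse=True,
    )
```

If the retrieval endpoint returned scores plus metadata, this would be a thin client-side layer; without scores, it cannot be built at all.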
Saw the build log: the Go + Alpine stack with host networking looks incredibly clean. Zero bloat.
I have three specific architectural questions to see if this fits a high-security orchestration layer (we currently run a custom Python/ONNX stack):
- Decoupled Embeddings: The log shows it probing nomic-embed-text via Ollama. Does the API allow ingesting pre-computed vectors (BYO embeddings)?
  - Context: We use specialized multilingual models (intfloat/e5-large) for security classifiers. We need to pass you the vectors, not the raw text, to ensure our specific embedding alignment is preserved.
- Score Visibility & Reranking: Does the retrieval endpoint return the raw similarity scores/distances for the chunks, or just the text blobs?
  - Context: We implement a ‘Safety Penalty’ layer (SRF) where we mathematically degrade the score of a chunk if it previously led to a jailbreak. We need the raw score to apply this delta (R = S − D) before passing it to the agent.
- Metadata Mutability: Can we update the metadata of a stored blob without re-indexing the vector?
  - Context: When a ‘memory’ proves toxic, we need to tag it (e.g., failure_count++) instantly to trigger the penalty logic on the next retrieval.
If we can bring our own vectors and see/modify the scores, this could replace our entire vector backend.
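To make the loop concrete, a toy sketch of what we would run client side, assuming the engine hands back raw similarity S and lets us patch metadata (every name here is hypothetical):

```python
class SafetyPenaltyIndex:
    """Client-side SRF sketch: vectors are never touched; only metadata moves."""

    def __init__(self, delta_per_failure=0.15):
        self.delta = delta_per_failure
        self.meta = {}  # chunk_id -> mutable metadata, no re-embedding needed

    def tag_failure(self, chunk_id):
        m = self.meta.setdefault(chunk_id, {"failure_count": 0})
        m["failure_count"] += 1  # the failure_count++ tagging step

    def rerank(self, hits):
        """hits: [(chunk_id, S)] raw similarity from the engine; R = S - D."""
        def r(item):
            cid, s = item
            d = self.delta * self.meta.get(cid, {}).get("failure_count", 0)
            return s - d
        return sorted(hits, key=r, reverse=True)
```

If the engine exposes S and mutable metadata, everything above stays on our side and the vector index never has to be rebuilt.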
Thanks for the detailed questions. You are describing exactly the kind of safety critical retrieval loop we want to support.
- On score visibility and reranking: the public doc states that retrieval is hybrid and that results return event slices with source metadata. It does not yet specify whether raw lexical and vector scores are returned. If score visibility is a requirement, we can expose the per-hit lexical score, vector similarity, and combined rank score so you can apply SRF penalties client side.
- On decoupled embeddings and bring your own vectors: the current design in the doc uses an embedding worker that computes embeddings when enabled. We have not published an interface for ingesting precomputed vectors yet. If your workflow depends on BYO vectors, tell me your vector dimensions and distance metric and I will align the interface around that requirement.
- On metadata mutability without vector reindex: the storage model is append only JSONL with tombstones. The doc does not yet define a metadata patch event type, but the intent is that state can evolve without rewriting history. If we treat metadata changes as new events, retrieval can filter or downrank immediately without recomputing vectors as long as the underlying text is unchanged.
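As a rough sketch of the metadata-as-events idea (illustrative only, not the shipped format): each metadata change is appended as a new event, and the current view is obtained by folding patches for a blob, last write wins. The underlying text and its vector are never rewritten.

```python
import io
import json

def append_event(log, event):
    """Append-only: state changes are new events; history is never rewritten."""
    log.write(json.dumps(event) + "\n")

def current_metadata(log_text, blob_id):
    """Fold metadata_patch events for one blob; last write wins per key."""
    meta = {}
    for line in log_text.splitlines():
        ev = json.loads(line)
        if ev.get("type") == "metadata_patch" and ev.get("blob") == blob_id:
            meta.update(ev["patch"])
    return meta
```

Retrieval would read the folded metadata at query time to filter or downrank, so a failure_count bump takes effect on the very next retrieval without recomputing any vectors.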