OpenEnv · RL Environment
v4.2.1

HallucinationGuard‑Env

Train AI models to answer only from verified context — with a 9-component reward system that penalizes fabrication and rewards factual grounding, citation accuracy, and calibrated confidence.

1M+ Examples
38 Datasets
3 Task Tiers
9 Reward Components

How it works

Three primitives. Nine reward signals. One goal: no hallucinations.

01
🔄

reset()

Sample a question + context document from one of 38 curated datasets, stratified by difficulty tier.

02
📤

step(answer)

Submit your answer with confidence and a source quote. Receive a dense reward signal across all 9 components.

03
📊

grade()

Aggregate episode rewards into a task score. Track accuracy, hallucination rate, and skill rating over time.
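The three primitives above can be sketched as a toy in-memory loop. The method names (reset, step, grade) and the reward/is_hallucination fields mirror the docs; the internals here are illustrative stand-ins, not the real environment's logic.

```python
class ToyHallucinationEnv:
    """Toy stand-in for the reset/step/grade loop; not the real environment."""

    def __init__(self):
        self.rewards = []
        self.current = None

    def reset(self):
        # Real env: sample a question + context from 38 datasets by tier.
        # Here: a fixed toy pair.
        self.current = {
            "question": "What is the capital of France?",
            "context": "Paris is the capital of France.",
        }
        self.rewards = []
        return self.current

    def step(self, answer, confidence, source_quote):
        # Toy grounding check: the answer must appear verbatim in the context.
        grounded = answer in self.current["context"]
        reward = confidence if grounded else 0.1 * (1.0 - confidence)
        self.rewards.append(reward)
        return {"reward": reward, "is_hallucination": not grounded}

    def grade(self):
        # Aggregate episode rewards into a single task score.
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0
```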

9-Component Reward System

Every answer is graded on factual correctness, source grounding, citation accuracy, confidence calibration, semantic consistency, hallucination detection, ROUGE-L, BERTScore, and AlignScore. Each component is weighted and combined into a single scalar reward in [0, 1]. Confident wrong answers are penalized harder than uncertain ones.
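A minimal sketch of how nine weighted components could collapse into one scalar, with a calibration penalty so confident wrong answers score worse than uncertain ones. The component names come from the docs; the weights and the exact penalty rule are illustrative assumptions, not the environment's actual values.

```python
# Assumed weights (sum to 1.0); the real environment's weights may differ.
COMPONENT_WEIGHTS = {
    "factual_correctness":     0.20,
    "source_grounding":        0.15,
    "citation_accuracy":       0.10,
    "confidence_calibration":  0.15,
    "semantic_consistency":    0.10,
    "hallucination_detection": 0.10,
    "rouge_l":                 0.07,
    "bert_score":              0.07,
    "align_score":             0.06,
}

def combine_reward(components: dict, confidence: float, is_correct: bool) -> float:
    """Weighted sum of per-component scores in [0, 1], then a calibration
    penalty: the more confident a wrong answer, the harder it is punished."""
    base = sum(COMPONENT_WEIGHTS[k] * components.get(k, 0.0)
               for k in COMPONENT_WEIGHTS)
    if not is_correct:
        base *= (1.0 - confidence)  # confident + wrong -> near-zero reward
    return max(0.0, min(1.0, base))
```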

Curriculum Progression

Episodes advance from Beginner (single-hop factual QA with unambiguous ground truth) through Intermediate (multi-hop synthesis across multiple context sentences) to Advanced (adversarial prompts where well-calibrated refusals score best). The environment tracks a live skill rating and adjusts difficulty sampling accordingly.
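One way to picture skill-tracked difficulty sampling: a running skill rating shifts the tier distribution toward harder episodes as the agent improves. The update rule and tier weights below are assumptions for illustration, not the environment's actual curriculum logic.

```python
TIERS = ["beginner", "intermediate", "advanced"]

def tier_probs(skill: float) -> list:
    """Map a skill rating in [0, 1] to per-tier sampling probabilities."""
    weights = [
        max(0.05, 1.0 - skill),            # beginner: favored at low skill
        0.55 - abs(skill - 0.5),           # intermediate: peaks at mid skill
        max(0.05, skill),                  # advanced: favored at high skill
    ]
    total = sum(weights)
    return [w / total for w in weights]

def update_skill(skill: float, episode_reward: float, lr: float = 0.1) -> float:
    """Exponential moving average of episode rewards as the skill rating."""
    return (1.0 - lr) * skill + lr * episode_reward
```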

Task Tiers

Three progressively harder tasks drawn from 38 datasets with 1M+ examples.

🟢

Factual Grounding

Beginner · ~450K examples

Answer straightforward factual questions from a short context passage. Single-hop retrieval with unambiguous ground truth. The grader rewards precise citation and heavily penalizes adding information not found in the context.

SQuAD · BoolQ · OpenBookQA · ARC · TriviaQA · +8 more
🔵

Multi-Hop Synthesis

Intermediate · ~380K examples

Synthesize evidence from multiple context sentences to reach an answer. Requires connecting disparate facts without fabricating bridge claims. AlignScore and BERTScore are weighted more heavily at this tier.

HotpotQA · CoQA · NQ-Open · MS-MARCO · MuSiQue · +7 more
🔴

Adversarial Resistance

Advanced · ~210K examples

Resist adversarial prompts designed to elicit hallucinations. Many questions are deliberately unanswerable: refusals submitted with low confidence score better than fabricated, plausible-sounding answers.

HaluEval · TruthfulQA · FEVER · Climate-FEVER · WittyQA · +6 more
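At this tier, the best action for an unanswerable question is a refusal with low confidence. A hypothetical helper for building that /step payload is sketched below; the field names follow the quick-start example, while the refusal wording itself is an assumption.

```python
def refusal_action(session_id: str) -> dict:
    """Build a /step payload that declines to answer instead of fabricating."""
    return {
        "answer": "The context does not contain enough information to answer.",
        "confidence": 0.1,   # low confidence: calibrated uncertainty
        "source_quote": "",  # nothing to cite when refusing
        "session_id": session_id,
    }
```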

API Reference

RESTful JSON API. All endpoints accept and return application/json. No auth required.

Method  Endpoint             Description
POST    /reset               Start episode — returns question, context, difficulty, episode_id
POST    /step                Submit answer with confidence + source_quote, receive reward breakdown
GET     /state               Current episode metadata — accuracy, hallucination_rate, skill_rating
GET     /tasks               List all 3 tasks with action schema
POST    /grader              Score a completed episode (0.0 – 1.0) from rewards + infos
POST    /baseline            Run heuristic baseline across all 3 tasks
GET     /metadata            Environment name, version, license
GET     /schema              Full JSON schemas for action, observation, state
GET     /health              Health check — returns {"status":"healthy"}
POST    /mcp                 JSON-RPC 2.0 tool discovery for MCP clients
GET     /leaderboard         Ranked leaderboard by avg_reward
POST    /leaderboard/submit  Submit model results for ranking
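For the /mcp endpoint, a JSON-RPC 2.0 discovery request body might look like the sketch below. "tools/list" is the standard MCP tool-discovery method; whether this server expects exactly that method name is an assumption.

```python
import json

# Minimal JSON-RPC 2.0 tool-discovery request for the /mcp endpoint.
discovery = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
body = json.dumps(discovery)  # POST this as the request body
```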

Quick Start

Three commands to run your first episode.

bash
# Install and launch
pip install -e .
uvicorn server.app:app --port 7860

# Run heuristic baseline
python inference.py --heuristic --env-url http://localhost:7860
python
import requests

BASE = "http://localhost:7860"

# 1. Reset — get a question + context
obs = requests.post(f"{BASE}/reset", json={"difficulty": "beginner"}).json()
session_id = obs["session_id"]
print(obs["question"])

# 2. Step — submit your answer
result = requests.post(f"{BASE}/step", json={
    "answer": "Based on the context, ...",
    "confidence": 0.85,
    "source_quote": "verbatim text from context",
    "session_id": session_id,
}).json()
print(result["reward"])            # scalar in [0, 1]
print(result["is_hallucination"])  # bool

Interactive Playground

Reset an episode, read the context, craft your answer, and see the live reward breakdown.
