Epistemic uncertainty detection for LLMs
```shell
pip install kateryna
```
Gives your LLM permission to say "I don't know" instead of guessing.

- **+1 Grounded**: confident with evidence. RAG retrieval supports the response.
- **-1 Ungrounded**: confident without evidence. Hallucination danger zone.
Most uncertainty systems only distinguish "confident" from "not confident." Kateryna adds the critical third state: confident without evidence. When your RAG returns nothing but your LLM sounds certain, that's the danger zone. That's what we catch.
Kateryna validates LLM confidence against retrieval evidence. No RAG, no baseline. With RAG, we catch the lies.
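The ternary check can be sketched in a few lines of plain Python. This is an illustrative reimplementation of the idea only: the `classify` function, the hedge list, and the decision rules below are invented for the sketch and are not Kateryna's actual API.

```python
import re

# A few hedging markers that suggest the model is signalling uncertainty.
# (Toy list for illustration; a real detector would use richer signals.)
HEDGES = re.compile(
    r"\b(i don't know|i'm not sure|may|might|possibly|unclear|uncertain)\b",
    re.IGNORECASE,
)

def classify(response: str, evidence_chunks: list[str]) -> int:
    """Return +1 (grounded), 0 (uncertain), or -1 (ungrounded)."""
    hedged = bool(HEDGES.search(response))
    if hedged:
        return 0        # model admits uncertainty: fine either way
    if evidence_chunks:
        return 1        # confident, and retrieval backs it up
    return -1           # confident with zero evidence: danger zone

# The Blueridge example: confident answer, empty retrieval.
print(classify("The Blueridge Protocol is an innovative framework...", []))  # -1
```

The point is the decision table, not the keyword list: hedged language maps to 0, confidence plus evidence to +1, and confidence with an empty retrieval to -1.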
**Query:** "What is the Blueridge Protocol?"
**Retrieval:** knowledge base returns 0 relevant chunks
**LLM:** "The Blueridge Protocol is an innovative framework designed to enhance..."
**Verdict:** -1 UNGROUNDED. Confident without evidence.
We tested an LLM with 7 hallucination-prone queries. It confidently fabricated answers for 5 of them.
| Query | LLM Said | Reality |
|---|---|---|
| "What is the Blueridge Protocol?" | "An innovative framework designed to enhance..." | Doesn't exist |
| "Explain pandas.smart_merge()" | "An internal utility within pandas..." | Doesn't exist |
| "Summarize Smith et al. (2023)" | "The paper revealed significant insights..." | Fabricated citation |
With RAG context, Kateryna flagged all 5 as -1 UNGROUNDED. Confident language, zero evidence.
- **Ternary classification**: the core -1/0/+1 epistemic classification. Detect grounded, uncertain, and ungrounded responses.
- **Provider integrations**: built-in support for OpenAI, Anthropic, and Ollama. Drop-in integration with your existing stack.
- **Retrieval confidence**: calculate retrieval confidence based on chunk relevance and coverage. Works with any vector store.
- **Language analysis**: detect hedging language, uncertainty markers, and confidence patterns in LLM outputs.
- **Audit trail**: compliance-grade logging of every epistemic assessment. Tamper-proof storage for regulated industries.
- **Dashboards**: visualize hallucination rates over time. Track which queries trigger ungrounded responses.
- **Domain tuning**: fine-tuned detection for legal, medical, financial, and trade compliance domains.
- **Calibration**: custom threshold tuning for your specific accuracy/coverage tradeoffs. Per-client calibration service.
- **Contradiction detection**: detect when RAG chunks conflict with each other. Flag uncertainty when sources disagree.
- **Corpus scanning**: scan your corpus before production. Find gaps, contradictions, and ambiguity where LLMs will hallucinate.
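The retrieval-confidence feature can be illustrated with a toy scoring function. Everything here is an assumption made for the sketch: the `retrieval_confidence` name, the 0.75 relevance cutoff, and the relevance-times-coverage weighting are invented and do not describe Kateryna's implementation.

```python
def retrieval_confidence(scores: list[float], min_relevant: float = 0.75) -> float:
    """Combine chunk relevance and coverage into a 0..1 confidence.

    `scores` are similarity scores from any vector store; a chunk counts
    toward coverage only if it clears the `min_relevant` cutoff.
    """
    if not scores:
        return 0.0                                  # empty retrieval: no baseline at all
    relevant = [s for s in scores if s >= min_relevant]
    if not relevant:
        return 0.0                                  # hits exist, but none are usable
    relevance = sum(relevant) / len(relevant)       # how strong the usable hits are
    coverage = len(relevant) / len(scores)          # fraction of the result set that is usable
    return relevance * coverage
```

An empty result set scores 0.0, which is what pushes a confident response into the -1 ungrounded bucket; strong, consistent hits push the score toward 1.0.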
Coming Soon
For teams that need compliance, visibility, and domain-specific accuracy.
Be first to know when Pro launches. No spam.
Need enterprise features now? Get in touch
Built on ternary logic principles from the Setun computer (1958). Named after Kateryna Yushchenko, pioneer of address programming.