Layered, learning memory for AI agents in two lines of config. Self-host the OSS library, or skip the infrastructure and use the managed version: same engine, your call.
    # pip install extremis
    from extremis import Extremis

    mem = Extremis()
    mem.remember("User is building a WhatsApp AI")

    hits = mem.recall("what is the user building?")
    for r in hits:
        print(r.memory.content)
        print(r.reason)  # similarity 0.87 · score +2.0 · used 5× · 3d
Two ways to run it
pip install extremis. SQLite locally, Postgres / Pinecone / S3 Vectors in production. All the same primitives; you operate the infra.
Read the docs →
HostedClient(api_key=...) and you're done. Per-tenant Postgres, automatic consolidation, hallucination detection bundled.
Open the dashboard →
How it works
mem.remember(): append to fsync'd log + episodic store

    mem.remember(
        "user wants the SLA in writing",
        conversation_id="c1",
    )

mem.recall(): ranked by cosine × RL score × recency

    hits = mem.recall("SLA")
    # returns ranked results,
    # each with .reason and .verification

mem.reinforce(): asymmetric 1.5× weight on negative signals

    mem.report_outcome(
        [h.memory.id for h in hits[:2]],
        success=True,
    )
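To make the ranking and feedback steps concrete, here is a minimal, self-contained sketch of how a cosine × RL score × recency ranking with a 1.5× penalty on negative outcomes could work. It is not the Extremis implementation: the `Memory` class, the sigmoid squashing of the RL score, and the 30-day half-life are all assumptions made for illustration. Only the three-factor product and the 1.5× asymmetry come from the text above.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical stand-in for a stored memory (not the library's type)."""
    content: str
    embedding: list
    score: float = 0.0     # running RL score, moved by outcome feedback
    age_days: float = 0.0


NEGATIVE_WEIGHT = 1.5      # from the page: negative signals weigh 1.5x
HALF_LIFE_DAYS = 30.0      # assumption: recency halves every 30 days


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reinforce(mem, success):
    """Asymmetric update: a failure costs more than a success earns."""
    mem.score += 1.0 if success else -NEGATIVE_WEIGHT


def rank_value(query_vec, mem):
    """Product of similarity, squashed RL score, and recency decay."""
    similarity = cosine(query_vec, mem.embedding)
    rl_factor = 1.0 / (1.0 + math.exp(-mem.score))  # assumption: sigmoid to (0, 1)
    recency = 0.5 ** (mem.age_days / HALF_LIFE_DAYS)
    return similarity * rl_factor * recency


def recall(query_vec, memories, limit=5):
    """Return the top `limit` memories, highest rank value first."""
    ranked = sorted(memories, key=lambda m: rank_value(query_vec, m), reverse=True)
    return ranked[:limit]
```

With this update rule, one success followed by one failure leaves a memory at a net score of -0.5, so a memory has to be helpful more often than it is harmful to keep its rank.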
Every recall explains itself
No black box. Every result carries a one-line reason, the same string the library returns whether you self-host or use Cloud. You see exactly why a memory surfaced, in plain English.
example reason strings
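The one reason string shown on this page is the quick-start comment, `similarity 0.87 · score +2.0 · used 5× · 3d`. The sketch below shows how a string of that shape might be assembled from its four fields. The function name and parameters are hypothetical, and the library's real formatting may differ.

```python
def format_reason(similarity, score, uses, age_days):
    """Join the four recall signals into one human-readable line.

    Hypothetical helper: field names and formatting are inferred from
    the single example string on this page.
    """
    parts = [
        f"similarity {similarity:.2f}",  # cosine similarity to the query
        f"score {score:+.1f}",           # signed RL score
        f"used {uses}×",                 # how often the memory was recalled
        f"{age_days}d",                  # age in days
    ]
    return " · ".join(parts)


print(format_reason(0.87, 2.0, 5, 3))
# similarity 0.87 · score +2.0 · used 5× · 3d
```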
Vs. the alternatives
| Feature | Extremis | Mem0 | Letta | Raw RAG |
|---|---|---|---|---|
| Layered memory (identity/semantic/episodic/procedural) | โ | โ | โ | โ |
| RL-scored retrieval (1.5× asymmetric on negatives) | โ | โ | โ | โ |
| Per-recall reason strings | โ | โ | โ | โ |
| Knowledge graph built in | โ | โ | partial | โ |
| Hallucination detection bundled | โ | โ | โ | โ |
| Self-hostable (MIT) | โ | โ | โ | โ |
| Managed option available | โ | โ | โ | โ |
| MCP server (9 tools) | โ | โ | โ | โ |
Benchmarks
Hosted Extremis is the identical engine: same numbers, fully managed. Methodology: see the reproducible benchmark run. QA accuracy is downstream-model-dependent; stronger answerers raise it significantly.
| Metric | Value | Notes |
|---|---|---|
| Retrieval R@5 | 94.4% | top-5 includes the answer session |
| QA Accuracy | 38.8% | claude-haiku-4-5 as answerer |
| p50 recall latency | ~35ms | local model · MPS · varies in prod |
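For readers unfamiliar with the retrieval metric: R@5 is the fraction of questions whose top five retrieved sessions include the session that holds the answer. A minimal sketch of that calculation follows. The function name and the toy session IDs are illustrative only and are not taken from the benchmark harness.

```python
def recall_at_k(ranked_sessions, answer_sessions, k=5):
    """Fraction of queries whose top-k retrieved sessions contain the answer.

    ranked_sessions: one ranked list of session IDs per query.
    answer_sessions: the single correct session ID for each query.
    """
    if not answer_sessions:
        raise ValueError("need at least one query")
    hits = sum(
        1
        for ranked, answer in zip(ranked_sessions, answer_sessions)
        if answer in ranked[:k]
    )
    return hits / len(answer_sessions)


# Toy data: the first query hits, the second misses entirely,
# the third finds the answer only at rank 6, outside the top 5.
ranked = [["s1", "s2"], ["s3"], ["s9", "s8", "s7", "s6", "s5", "s4"]]
answers = ["s2", "s0", "s4"]
score = recall_at_k(ranked, answers)  # one hit out of three queries
```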
Hallucination detection
A two-tier verifier runs at write time: a fast NLI model first, then an LLM judge for grey-zone scores. Failing memories aren't silently dropped; they're tagged unverified and downranked at recall time. Every recall returns a verification trace you can inspect.
example: contradicted recall
    extremis.recall      124ms
      embedder.embed      10ms
      retrieve.hybrid     11ms   (semantic + BM25)
      verifier.nli        14ms   grounded 0.42
      verifier.judge      47ms   grounded 0.18

    why it failed: sources self-correct from 99.95% to 99.9%; extracted memory captured the pre-correction value.
    what to try: mem.remember_now(layer="semantic", confidence=0.95) to pin the corrected fact.
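The two-tier flow described above can be sketched as follows. This is an illustration under stated assumptions, not the library's verifier: the threshold values, the 0.5 downranking penalty, and the `verify` and `downrank` names are all hypothetical, and the NLI model and LLM judge are passed in as plain callables that return a groundedness score in [0, 1].

```python
from dataclasses import dataclass, field
from typing import List

NLI_PASS = 0.70            # assumption: at or above this, NLI alone accepts
NLI_FAIL = 0.20            # assumption: at or below this, NLI alone rejects
JUDGE_PASS = 0.50          # assumption: judge score needed to accept
UNVERIFIED_PENALTY = 0.5   # assumption: rank multiplier for unverified memories


@dataclass
class Verification:
    verified: bool
    trace: List[str] = field(default_factory=list)


def verify(claim, sources, nli, judge):
    """Run the fast NLI check; call the judge only for grey-zone scores."""
    trace = []
    nli_score = nli(claim, sources)
    trace.append(f"verifier.nli grounded {nli_score:.2f}")
    if nli_score >= NLI_PASS:
        return Verification(True, trace)
    if nli_score <= NLI_FAIL:
        return Verification(False, trace)
    # Grey zone: escalate to the slower LLM judge.
    judge_score = judge(claim, sources)
    trace.append(f"verifier.judge grounded {judge_score:.2f}")
    return Verification(judge_score >= JUDGE_PASS, trace)


def downrank(rank_value, verification):
    """Keep unverified memories retrievable, but lower their rank."""
    return rank_value if verification.verified else rank_value * UNVERIFIED_PENALTY


# Stub scorers that reproduce the scores in the example trace above.
result = verify(
    "uptime SLA is 99.95%",
    ["corrected: the SLA is 99.9%"],
    nli=lambda claim, sources: 0.42,
    judge=lambda claim, sources: 0.18,
)
```

With the stub scores, the NLI result of 0.42 falls in the assumed grey zone, the judge is consulted, and the memory ends up tagged unverified rather than dropped.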
Privacy
The library is open-source under MIT. Self-host on SQLite locally or any of five production backends. If Cloud isn't for you (or shuts down tomorrow), point HostedClient.base_url at your own deployment and nothing else changes. Cloud is a convenience, not a dependency.
Both run the same engine. Self-host gives you full control; Cloud gives you back the afternoon.