Pin mem0ai[nlp]==2.0.2 and fastembed for the new hybrid-search pipeline. Drop OSS graph memory (removed upstream in 2.0.0, PR #4805): remove Neo4j service, env vars, volumes, and driver deps; mark /graph/relationships deprecated. Rewrite Memory.search/get_all/chat/health call sites to use the v2 filters={} + top_k API (entity IDs at top level now raise ValueError). Tighten MCP remove_memory ownership check to O(1) verify_memory_ownership so it doesn't silently truncate at the new top_k=20 default. Downgrade base image to python:3.12-slim for spaCy. Adds scripts/migrate_qdrant_to_v3.py (scroll+upsert with per-user count parity check) and docs/MIGRATION_RUNBOOK.md covering snapshot, dump, collection rebuild, cutover, and rollback procedures.
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project overview
FastAPI backend that wraps the mem0ai SDK (pinned to mem0ai[nlp]==2.0.2 — the "V3 memory pipeline") to expose memory operations (add/search/update/delete, plus memory-aware chat) over a REST API, an OpenAI-compatible /v1/chat/completions endpoint, and an MCP server. Memory is stored in Qdrant (vectors + BM25 sparse), with a sister {collection}_entities Qdrant collection auto-created by mem0 for entity linking. Embeddings come from a local Ollama instance; the LLM is a custom OpenAI-compatible endpoint. The frontend is two standalone HTML files (index.html, graph.html) that call the API directly — no build step.
Common commands
# First-time setup (creates volumes, builds, brings up stack). Prompts to reset volumes if they already exist.
./setup.sh
# Day-to-day
docker compose up -d --build # rebuild + start
docker compose down # stop (keeps volumes)
docker compose down -v # stop + delete data
docker compose logs -f backend # tail backend logs (structlog JSON)
docker compose restart backend # pick up code changes (no --reload; see "Volumes" gotcha)
# Sanity check — assumes a host route to backend:8000 exists (see "Networking" gotcha).
curl http://localhost:8000/health
# Integration tests — hit the running stack, no mocks. See test_integration.py for the test list.
MEM0_API_KEY=<key-from-API_KEYS> python test_integration.py
MEM0_API_KEY=<key-from-API_KEYS> python test_integration.py -v
There are no unit tests and no separate lint/format/type-check setup: test_integration.py is the only test entry point, and it requires a fully running Docker stack. The script generates a fresh TEST_USER = f"test_user_{int(datetime.now().timestamp())}" on every run, so the supplied MEM0_API_KEY must map to that exact user in API_KEYS or the auth checks fail. Either set TEST_USER to a statically mapped user or add a mapping for the run.
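To make the key-to-user requirement concrete, here is a minimal sketch of building an API_KEYS value that maps a key to the timestamped test user. The helper name make_test_mapping and the key "test-key" are hypothetical and do not come from test_integration.py; only the TEST_USER format is taken from the text above.

```python
import json
from datetime import datetime


def make_test_mapping(api_key, now=None):
    """Return (test_user, API_KEYS JSON) for a single run.

    Hypothetical helper: the user name follows the TEST_USER format
    described above, and the JSON maps api_key to that user.
    """
    now = now or datetime.now()
    test_user = f"test_user_{int(now.timestamp())}"
    return test_user, json.dumps({api_key: test_user})


if __name__ == "__main__":
    user, api_keys_json = make_test_mapping("test-key")
    # The JSON string is what would be placed in the API_KEYS env var.
    print(user, api_keys_json)
```

Because the user name changes every run, a mapping generated this way is only valid if the backend is started with it and the test uses the same timestamped name; pinning TEST_USER to a static user avoids that coordination.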
Architecture
Request flow
- Client hits FastAPI (backend/main.py) with X-API-Key (or Authorization: Bearer for /v1/chat/completions).
- auth.py resolves the key → user_id via settings.api_key_mapping (parsed from the API_KEYS env JSON). Every protected endpoint then verifies that the caller's user_id matches the path/body user_id; there is no admin or cross-user access.
- The endpoint calls mem0_manager.mem0_manager (singleton in backend/mem0_manager.py), which delegates to the mem0ai SDK. The SDK in turn calls Qdrant, Ollama, Cohere (reranker), and the custom OpenAI endpoint.
- The @timed("operation_name") decorator from backend/monitoring.py wraps memory operations to log structured timings and feed the in-memory stats singleton that powers /stats and /stats/{user_id}.
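The timing step in the flow above can be illustrated with a minimal sketch of a decorator in the spirit of @timed. This is not the code in backend/monitoring.py: the Stats class, its fields, and the sync-only wrapper are assumptions made for this example, and the structured log call is left out.

```python
import functools
import time
from collections import defaultdict


class Stats:
    """Hypothetical in-memory aggregate; the real stats singleton may differ."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total_ms = defaultdict(float)

    def record(self, operation, elapsed_ms):
        self.calls[operation] += 1
        self.total_ms[operation] += elapsed_ms


stats = Stats()


def timed(operation_name):
    """Wrap a function so each call's duration is recorded under operation_name."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the timing even when the wrapped call raises.
                elapsed_ms = (time.perf_counter() - start) * 1000
                stats.record(operation_name, elapsed_ms)

        return wrapper

    return decorator


@timed("search_memories")
def search_memories(query):
    # Stand-in for a real memory operation.
    return [f"result for {query}"]
```

Recording in a finally block means failed operations still count toward the totals, which is one reasonable choice for a /stats view; an async code path would need an async variant of the wrapper.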
Three parallel API surfaces
All three live in the same FastAPI process and share auth + rate limiting:
- Native REST (/chat, /memories*, /graph/relationships/{user_id} (deprecated; returns an empty payload), /stats*, /models, /users): authenticates via X-API-Key.
- OpenAI-compatible (/v1/chat/completions, also /chat/completions): authenticates via Authorization: Bearer <key> or X-API-Key; supports streaming SSE. Implemented in main.py:openai_chat_completions and stream_openai_response.
- MCP mounted at /mcp (see backend/mcp_server.py): uses a Starlette MCPAuthMiddleware that stuffs the resolved user_id into a ContextVar, which the FastMCP tools (add_memory, search_memory, remove_memory, chat) read. The MCP session manager is started inside the main FastAPI lifespan in main.py; mounted-app lifespans don't run automatically, so don't move that startup logic.
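The ContextVar hand-off used by the MCP surface can be sketched with the standard library alone. The names current_user_id, handle_request, and search_memory_tool are hypothetical stand-ins; the real middleware and tools in backend/mcp_server.py are not reproduced here.

```python
import contextvars

# Hypothetical name; the real ContextVar in backend/mcp_server.py may differ.
current_user_id = contextvars.ContextVar("current_user_id", default=None)


def handle_request(api_key, key_to_user, tool):
    """Mimic the middleware: resolve the key, set the ContextVar, run the tool."""
    user_id = key_to_user.get(api_key)
    if user_id is None:
        raise PermissionError("unknown API key")
    token = current_user_id.set(user_id)
    try:
        return tool()
    finally:
        # Restore the previous value so state never leaks between requests.
        current_user_id.reset(token)


def search_memory_tool():
    """Stand-in for an MCP tool: reads the caller from context, not from arguments."""
    user_id = current_user_id.get()
    if user_id is None:
        raise PermissionError("no authenticated user in context")
    return f"searching memories for {user_id}"
```

The point of the pattern is that tools never accept a user_id argument from the client, so a caller cannot name another user; the identity only ever comes from the authenticated request.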
Storage layout
- Qdrant: collection name from QDRANT_COLLECTION_NAME (default mem0). Embedding dim must match the embedder; see the "Embedding model and dimensions are coupled" gotcha below. Collections created by mem0 v2 carry a bm25 sparse-vector slot for hybrid search (semantic + keyword + entity-boost). The slot is added automatically at collection creation; existing pre-v2 collections silently degrade to semantic-only with a logged warning, and they must be recreated to gain BM25.
- {collection}_entities: sister Qdrant collection lazy-created by mem0 v2 on first add(), same dimension as the main collection. Stores entity vectors used for ranking boost. No code touches it directly.
- No graph store: Neo4j and the OSS graph memory feature were removed in mem0ai 2.0.0 (PR #4805). The /graph/relationships/{user_id} endpoint is kept for client compatibility but returns deprecated: true with empty arrays.
- No SQL store: older docs mention PostgreSQL/pgvector; that's no longer used. Qdrant only.
Important conventions / gotchas
Networking: backend is not published to the host
docker-compose.yml defines the backend on the external npm_network (Nginx Proxy Manager) and only exposes port 8000 inside Docker. There is no ports: mapping. To hit it from the host you need either: (a) the NPM proxy in front of it, (b) docker compose exec backend curl ..., or (c) add a temporary ports: mapping. The npm_network must exist before bringing the stack up (docker network create npm_network if you don't run NPM).
Claude/OpenAI-compatible monkey-patch
mem0_manager.py patches mem0.llms.openai.OpenAILLM.generate_response at import time to clear store and top_p from the config. In mem0ai>=2.0.0, the store half is redundant (upstream made store opt-in) but kept as a harmless safety net. The top_p clearing is still load-bearing: Claude (reached via the custom OpenAI-compatible endpoint) rejects top_p whenever temperature is set, and OpenAILLM sends both unconditionally. If you upgrade mem0ai and chat starts 400-ing on the custom endpoint, this patch is the first place to look.
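The shape of such a patch can be sketched without importing mem0. FakeOpenAILLM below is a stand-in for mem0.llms.openai.OpenAILLM, and the assumption that store and top_p live on self.config (with None meaning "do not send") is made for illustration only; check the real class before relying on it.

```python
import functools
from types import SimpleNamespace


class FakeOpenAILLM:
    """Stand-in for the SDK class so the sketch runs without mem0 installed."""

    def __init__(self, config):
        self.config = config

    def generate_response(self, messages, **kwargs):
        # Echo the sampling params that would be sent, skipping unset (None) ones.
        params = {"temperature": self.config.temperature, "top_p": self.config.top_p}
        return {k: v for k, v in params.items() if v is not None}


def patch_generate_response(llm_cls):
    """Replace generate_response with a wrapper that clears store and top_p first."""
    original = llm_cls.generate_response

    @functools.wraps(original)
    def patched(self, *args, **kwargs):
        # Assumption: both fields live on self.config and None means "omit".
        for field in ("store", "top_p"):
            if hasattr(self.config, field):
                setattr(self.config, field, None)
        return original(self, *args, **kwargs)

    llm_cls.generate_response = patched


# Applied once at import time, mirroring the behaviour described above.
patch_generate_response(FakeOpenAILLM)

llm = FakeOpenAILLM(SimpleNamespace(temperature=0.1, top_p=0.9, store=False))
```

Wrapping the original method rather than copying its body keeps the patch small, but it still depends on the SDK's internal attribute names, which is why an SDK upgrade is the first thing to suspect when it stops working.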
Embedding model and dimensions are coupled
EMBEDDING_MODEL and EMBEDDING_DIMS in .env / docker-compose.yml must agree, and they must match the dim the Qdrant collection was created with. Defaults are qwen3-embedding:4b-q8_0 / 2560. Switching the model requires either matching dims or recreating the Qdrant collection (./setup.sh → option 2 wipes volumes).
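A startup guard for this coupling might look like the sketch below. The function names are hypothetical and nothing like this is claimed to exist in the repository; how the collection's vector size is read from Qdrant is deliberately left out, so collection_dims is passed in as a plain integer.

```python
def configured_dims(env):
    """Read EMBEDDING_DIMS from an env-style mapping; 2560 is the documented default."""
    return int(env.get("EMBEDDING_DIMS", "2560"))


def check_dims(env, collection_dims):
    """Raise early if the configured size differs from the collection's vector size."""
    expected = configured_dims(env)
    if expected != collection_dims:
        raise ValueError(
            f"EMBEDDING_DIMS={expected} but the collection was created with "
            f"{collection_dims}; match the dims or recreate the collection"
        )
    return expected
```

Failing at startup with a message that names both numbers is easier to diagnose than a vector-size error surfacing on the first add or search.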
Single-model architecture
Despite what README.md, TESTING.md, and MEM0.md say about intelligent routing across o4-mini / gemini-2.5-pro / claude-sonnet-4 / o3, the code uses one model — settings.default_model (claude-sonnet-4 by default). /models returns only that. Don't reintroduce routing without first checking with the user.
mem0 v2 API: filters dict + top_k
mem0 v2 rejects user_id/agent_id/run_id as top-level kwargs on Memory.search and Memory.get_all (raises ValueError) — they must live inside a filters={...} dict. The limit kwarg is renamed top_k (default reduced 100 → 20 — pass it explicitly when you need more). Memory.add and Memory.delete_all still accept these IDs as top-level kwargs. Use the _build_filters() helper at the top of mem0_manager.py to construct the dict. Search score is now a fused multi-signal value (semantic + BM25 + entity boost), not raw cosine — don't compare against thresholds calibrated for the old scoring.
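A minimal sketch of the filters pattern follows. build_filters and search_kwargs are illustrative helpers, not the project's _build_filters(), whose exact signature is not shown here; the keyword names query, filters, and top_k follow the description above.

```python
def build_filters(user_id=None, agent_id=None, run_id=None):
    """Collect whichever entity IDs are set into a v2-style filters dict."""
    candidates = {"user_id": user_id, "agent_id": agent_id, "run_id": run_id}
    filters = {key: value for key, value in candidates.items() if value is not None}
    if not filters:
        raise ValueError("at least one of user_id, agent_id, run_id is required")
    return filters


def search_kwargs(query, user_id, top_k=20):
    """Assemble keyword arguments for a v2-style Memory.search call."""
    return {"query": query, "filters": build_filters(user_id=user_id), "top_k": top_k}


# Intended use, assuming `memory` is an initialised mem0 Memory instance:
#   results = memory.search(**search_kwargs("coffee preferences", "alice", top_k=50))
```

Passing top_k explicitly at every call site makes the result size visible in the code, so a change in the SDK default cannot silently shrink a result set again.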
ADD-only memory algorithm
Memory.add in mem0 v2 only emits ADD events; the engine no longer issues UPDATE/DELETE events based on LLM judgment. Per-user memory count grows monotonically. Explicit Memory.update / Memory.delete still work and are how the project mutates memories.
Auth & rate limiting
- All endpoints except /health require a valid X-API-Key (or Bearer for the OpenAI-compatible routes). API_KEYS is a JSON object mapping keys → user IDs. Note this contradicts AUTH_SETUP.md, which lists /stats and /models as public; the code is authoritative.
- Rate limits via slowapi are set per endpoint in main.py decorators: chat 30/min, writes 60/min, reads 120/min, bulk user-delete 10/min. Keyed by API key (fallback to remote IP).
- Memory ownership is checked via mem0_manager.verify_memory_ownership (O(1) Memory.get(memory_id)); use this rather than fetching all user memories.
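The key lookup and the single-fetch ownership check described in this list can be sketched as follows. parse_api_keys, resolve_user, and owns_memory are hypothetical names, and the assumption that a fetched memory record is a dict exposing user_id is made for illustration; the real verify_memory_ownership may inspect a different shape.

```python
import json


def parse_api_keys(raw):
    """Parse the API_KEYS env value: a JSON object mapping API key to user ID."""
    mapping = json.loads(raw)
    if not isinstance(mapping, dict):
        raise ValueError("API_KEYS must be a JSON object")
    return {str(key): str(user) for key, user in mapping.items()}


def resolve_user(api_key, mapping):
    """Return the user ID for a key, or None when the key is unknown."""
    return mapping.get(api_key)


def owns_memory(get_memory, memory_id, user_id):
    """One lookup by ID, then compare owners; never lists the user's memories.

    get_memory is any callable returning a record dict or None.
    """
    record = get_memory(memory_id)
    return record is not None and record.get("user_id") == user_id
```

Checking ownership through a single fetch by ID keeps the answer independent of how many memories the user has, whereas a check built on a listing call is only correct while the memory happens to fall inside the returned page.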
Config field aliases
backend/config.py uses Pydantic AliasChoices so both OPENAI_API_KEY and OPENAI_COMPAT_API_KEY (and OPENAI_BASE_URL / OPENAI_COMPAT_BASE_URL) populate the same field. docker-compose.yml passes OPENAI_API_KEY; .env.example documents OPENAI_COMPAT_API_KEY. Both work.
Volumes mount the source in, but no hot reload
docker-compose.yml bind-mounts ./backend:/app and ./frontend:/app/frontend, and uvicorn runs with --workers 4 and no --reload. Code edits become live only on container restart (docker compose restart backend).
Logging
Use structlog.get_logger(__name__) with keyword arguments (e.g. logger.info("msg", user_id=x)). The last few commits explicitly fixed places that mixed stdlib logging (which silently drops kwargs) with structlog. Don't reintroduce logging.getLogger in this codebase.