diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..8e3f922 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,84 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project overview + +FastAPI backend that wraps the `mem0ai` SDK (pinned to `mem0ai[nlp]==2.0.2` — the "V3 memory pipeline") to expose memory operations (add/search/update/delete, plus memory-aware chat) over a REST API, an OpenAI-compatible `/v1/chat/completions` endpoint, and an MCP server. Memory is stored in Qdrant (vectors + BM25 sparse), with a sister `{collection}_entities` Qdrant collection auto-created by mem0 for entity linking. Embeddings come from a local Ollama instance; the LLM is a custom OpenAI-compatible endpoint. The frontend is two standalone HTML files (`index.html`, `graph.html`) that call the API directly — no build step. + +## Common commands + +```bash +# First-time setup (creates volumes, builds, brings up stack). Prompts to reset volumes if they already exist. +./setup.sh + +# Day-to-day +docker compose up -d --build # rebuild + start +docker compose down # stop (keeps volumes) +docker compose down -v # stop + delete data +docker compose logs -f backend # tail backend logs (structlog JSON) +docker compose restart backend # pick up code changes (no --reload; see "Volumes" gotcha) + +# Sanity check — assumes a host route to backend:8000 exists (see "Networking" gotcha). +curl http://localhost:8000/health + +# Integration tests — hit the running stack, no mocks. See test_integration.py for the test list. +MEM0_API_KEY= python test_integration.py +MEM0_API_KEY= python test_integration.py -v +``` + +There are no unit tests and no separate lint/format/type-check setup — `test_integration.py` is the only test entry point and it requires a fully-running Docker stack. 
The script generates a fresh `TEST_USER = f"test_user_{int(datetime.now().timestamp())}"` per run, so for tests to pass auth checks the supplied `MEM0_API_KEY` must map to that exact user in `API_KEYS` (either set `TEST_USER` to a statically mapped user, or add a mapping for the run). + +## Architecture + +### Request flow +1. Client hits FastAPI (`backend/main.py`) with `X-API-Key` (or `Authorization: Bearer` for `/v1/chat/completions`). +2. `auth.py` resolves the key → `user_id` via `settings.api_key_mapping` (parsed from the `API_KEYS` env JSON). Every protected endpoint then verifies the caller's `user_id` matches the path/body `user_id` — there is no admin or cross-user access. +3. The endpoint calls `mem0_manager.mem0_manager` (singleton in `backend/mem0_manager.py`), which delegates to the `mem0ai` SDK. The SDK in turn calls Qdrant, Ollama, Cohere (reranker), and the custom OpenAI endpoint. +4. The `@timed("operation_name")` decorator from `backend/monitoring.py` wraps memory operations to log structured timings and feed the in-memory `stats` singleton that powers `/stats` and `/stats/{user_id}`. + +### Three parallel API surfaces +All three live in the same FastAPI process and share auth + rate limiting: +- **Native REST** (`/chat`, `/memories*`, `/graph/relationships/{user_id}` *(deprecated — returns empty payload)*, `/stats*`, `/models`, `/users`) — authenticates via `X-API-Key`. +- **OpenAI-compatible** (`/v1/chat/completions`, also `/chat/completions`) — authenticates via `Authorization: Bearer ` or `X-API-Key`; supports streaming SSE. Implemented in `main.py:openai_chat_completions` and `stream_openai_response`. +- **MCP** mounted at `/mcp` (see `backend/mcp_server.py`) — uses a Starlette `MCPAuthMiddleware` that stuffs the resolved `user_id` into a `ContextVar`, which the FastMCP tools (`add_memory`, `search_memory`, `remove_memory`, `chat`) read. 
The MCP session manager is started inside the main FastAPI `lifespan` in `main.py` — mounted-app lifespans don't run automatically, so don't move that startup logic. + +### Storage layout +- **Qdrant** — collection name from `QDRANT_COLLECTION_NAME` (default `mem0`). Embedding dim must match the embedder; see "Embedding dimensions" gotcha below. Collections created by mem0 v2 carry a `bm25` sparse-vector slot for hybrid search (semantic + keyword + entity-boost). The slot is added automatically at collection creation; existing pre-v2 collections silently degrade to semantic-only with a logged warning — they must be recreated to gain BM25. +- **`{collection}_entities`** — sister Qdrant collection lazy-created by mem0 v2 on first `add()`, same dimension as the main collection. Stores entity vectors used for ranking boost. No code touches it directly. +- **No graph store** — Neo4j and the OSS graph memory feature were removed in `mem0ai` 2.0.0 (PR #4805). The `/graph/relationships/{user_id}` endpoint is kept for client compatibility but returns `deprecated: true` with empty arrays. +- **No SQL store** — older docs mention PostgreSQL/pgvector; that's no longer used. Qdrant only. + +## Important conventions / gotchas + +### Networking: backend is not published to the host +`docker-compose.yml` defines the backend on the **external** `npm_network` (Nginx Proxy Manager) and only `expose`s port 8000 inside Docker. There is no `ports:` mapping. To hit it from the host you need either: (a) the NPM proxy in front of it, (b) `docker compose exec backend curl ...`, or (c) add a temporary `ports:` mapping. The `npm_network` must exist before bringing the stack up (`docker network create npm_network` if you don't run NPM). + +### Claude/OpenAI-compatible monkey-patch +`mem0_manager.py` patches `mem0.llms.openai.OpenAILLM.generate_response` at import time to clear `store` and `top_p` from the config. 
In `mem0ai>=2.0.0`, the `store` half is redundant (upstream made `store` opt-in) but kept as a harmless safety net. The `top_p` clearing is still load-bearing: Claude (reached via the custom OpenAI-compatible endpoint) rejects `top_p` whenever `temperature` is set, and `OpenAILLM` sends both unconditionally. If you upgrade `mem0ai` and chat starts 400-ing on the custom endpoint, this patch is the first place to look. + +### Embedding model and dimensions are coupled +`EMBEDDING_MODEL` and `EMBEDDING_DIMS` in `.env` / `docker-compose.yml` must agree, and they must match the dim the Qdrant collection was created with. Defaults are `qwen3-embedding:4b-q8_0` / `2560`. Switching the model requires either matching dims or recreating the Qdrant collection (`./setup.sh` → option 2 wipes volumes). + +### Single-model architecture +Despite what `README.md`, `TESTING.md`, and `MEM0.md` say about intelligent routing across `o4-mini` / `gemini-2.5-pro` / `claude-sonnet-4` / `o3`, the code uses **one** model — `settings.default_model` (`claude-sonnet-4` by default). `/models` returns only that. Don't reintroduce routing without first checking with the user. + +### mem0 v2 API: filters dict + top_k +mem0 v2 rejects `user_id`/`agent_id`/`run_id` as top-level kwargs on `Memory.search` and `Memory.get_all` (raises `ValueError`) — they must live inside a `filters={...}` dict. The `limit` kwarg is renamed `top_k` (default reduced 100 → 20 — pass it explicitly when you need more). `Memory.add` and `Memory.delete_all` still accept these IDs as top-level kwargs. Use the `_build_filters()` helper at the top of `mem0_manager.py` to construct the dict. Search `score` is now a fused multi-signal value (semantic + BM25 + entity boost), not raw cosine — don't compare against thresholds calibrated for the old scoring. + +### ADD-only memory algorithm +`Memory.add` in mem0 v2 only emits `ADD` events; the engine no longer issues `UPDATE`/`DELETE` events based on LLM judgment. 
Per-user memory count grows monotonically. Explicit `Memory.update` / `Memory.delete` still work and are how the project mutates memories. + +### Auth & rate limiting +- All endpoints except `/health` require a valid `X-API-Key` (or Bearer for the OpenAI-compatible routes). `API_KEYS` is a JSON object mapping keys → user IDs. Note this contradicts `AUTH_SETUP.md`, which lists `/stats` and `/models` as public — the code is authoritative. +- Rate limits via `slowapi` are set per endpoint in `main.py` decorators: chat 30/min, writes 60/min, reads 120/min, bulk user-delete 10/min. Keyed by API key (fallback to remote IP). +- Memory ownership is checked via `mem0_manager.verify_memory_ownership` (O(1) `Memory.get(memory_id)`) — use this rather than fetching all user memories. + +### Config field aliases +`backend/config.py` uses Pydantic `AliasChoices` so both `OPENAI_API_KEY` and `OPENAI_COMPAT_API_KEY` (and `OPENAI_BASE_URL` / `OPENAI_COMPAT_BASE_URL`) populate the same field. `docker-compose.yml` passes `OPENAI_API_KEY`; `.env.example` documents `OPENAI_COMPAT_API_KEY`. Both work. + +### Volumes mount the source in, but no hot reload +`docker-compose.yml` bind-mounts `./backend:/app` and `./frontend:/app/frontend`, and `uvicorn` runs with `--workers 4` and no `--reload`. Code edits become live only on container restart (`docker compose restart backend`). + +### Logging +Use `structlog.get_logger(__name__)` with **keyword arguments** (e.g. `logger.info("msg", user_id=x)`). The last few commits explicitly fixed places that mixed stdlib `logging` (which silently drops kwargs) with structlog. Don't reintroduce `logging.getLogger` in this codebase. 
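As a sketch of the `filters` convention described in the "mem0 v2 API" gotcha above: the helper below mirrors the merge behaviour of `_build_filters()` as added to `backend/mem0_manager.py` in this change. The name `build_filters`, the `category` filter key, and the sample IDs are illustrative assumptions; the snippet only assembles keyword arguments and does not call mem0.

```python
from typing import Any, Dict, Optional


def build_filters(
    user_id: Optional[str],
    agent_id: Optional[str] = None,
    run_id: Optional[str] = None,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge caller-supplied filters with entity IDs; the IDs win on conflict."""
    # Copy `extra` so the caller's dict is never mutated.
    merged: Dict[str, Any] = dict(extra) if extra else {}
    for key, value in (("user_id", user_id), ("agent_id", agent_id), ("run_id", run_id)):
        if value is not None:
            merged[key] = value
    return merged


# Keyword arguments shaped for a v2-style search call (hypothetical values).
search_kwargs = {
    "query": "favourite editor",
    "filters": build_filters("alice", run_id="r1", extra={"category": "tools"}),
    "top_k": 50,  # passed explicitly because the v2 default is 20
}
print(search_kwargs["filters"])
```

Because the entity IDs are written after `extra` is copied, a caller-supplied `user_id` inside `extra` cannot override the authenticated one, which is the property the ownership checks rely on.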
diff --git a/backend/Dockerfile b/backend/Dockerfile index 7eb3825..3f6a22c 100644 --- a/backend/Dockerfile +++ b/backend/Dockerfile @@ -1,4 +1,4 @@ -FROM python:3.13-slim +FROM python:3.12-slim # Set working directory WORKDIR /app diff --git a/backend/config.py b/backend/config.py index f003824..c926b50 100644 --- a/backend/config.py +++ b/backend/config.py @@ -44,20 +44,6 @@ class Settings(BaseSettings): ), ) - # Neo4j Configuration - neo4j_uri: str = Field( - default="bolt://localhost:7687", - validation_alias=AliasChoices("NEO4J_URI", "neo4j_uri"), - ) - neo4j_username: str = Field( - default="neo4j", - validation_alias=AliasChoices("NEO4J_USERNAME", "neo4j_username"), - ) - neo4j_password: str = Field( - default="mem0_neo4j_password", - validation_alias=AliasChoices("NEO4J_PASSWORD", "neo4j_password"), - ) - # Application Configuration log_level: str = Field( default="INFO", validation_alias=AliasChoices("LOG_LEVEL", "log_level") diff --git a/backend/main.py b/backend/main.py index 1c99d28..d76de1c 100644 --- a/backend/main.py +++ b/backend/main.py @@ -115,8 +115,8 @@ async def lifespan(app: FastAPI): # Initialize FastAPI app app = FastAPI( title="Mem0 Interface POC", - description="Minimal but fully functional Mem0 interface with PostgreSQL and Neo4j integration", - version="1.0.0", + description="Minimal Mem0 interface backed by Qdrant (mem0ai v2 hybrid-search pipeline)", + version="2.0.0", lifespan=lifespan, ) @@ -535,7 +535,7 @@ async def search_memories( query=search_request.query, user_id=search_request.user_id, limit=search_request.limit, - threshold=search_request.threshold or 0.2, + threshold=search_request.threshold or 0.1, filters=search_request.filters, agent_id=search_request.agent_id, run_id=search_request.run_id, @@ -700,13 +700,15 @@ async def delete_user_memories( ) -# Graph relationships endpoint - pure Mem0 passthrough -@app.get("/graph/relationships/{user_id}") +# Graph relationships endpoint - DEPRECATED in mem0 v2 (OSS graph memory 
removed). +# Returns an empty payload with `deprecated: true` so the frontend can render a +# clear "Graph view unavailable" state. Kept for client compatibility. +@app.get("/graph/relationships/{user_id}", deprecated=True) @limiter.limit("60/minute") async def get_graph_relationships( request: Request, user_id: str, authenticated_user: str = Depends(get_current_user) ): - """Get graph relationships - pure Mem0 passthrough.""" + """Get graph relationships - DEPRECATED: mem0 v2 removed OSS graph memory.""" try: # Verify user can only access their own graph relationships if authenticated_user != user_id: diff --git a/backend/mcp_server.py b/backend/mcp_server.py index a7b7ae4..dc9557d 100644 --- a/backend/mcp_server.py +++ b/backend/mcp_server.py @@ -151,15 +151,8 @@ async def remove_memory( user_id = get_authenticated_user() logger.info(f"MCP remove_memory: user={user_id}, memory_id={memory_id}") - # Verify ownership: get user's memories and check if memory_id exists - user_memories = await mem0_manager.get_user_memories( - user_id=user_id, - limit=10000 # Get all to check ownership - ) - - memory_ids = {m.get("id") for m in user_memories if m.get("id")} - - if memory_id not in memory_ids: + # O(1) ownership check via Memory.get(); avoids get_all's v2 top_k=20 cap. 
+ if not await mem0_manager.verify_memory_ownership(memory_id, user_id): raise ValueError(f"Memory '{memory_id}' not found or access denied") result = await mem0_manager.delete_memory(memory_id=memory_id) diff --git a/backend/mem0_manager.py b/backend/mem0_manager.py index b3e226f..3547023 100644 --- a/backend/mem0_manager.py +++ b/backend/mem0_manager.py @@ -19,7 +19,7 @@ from monitoring import timed logger = structlog.get_logger(__name__) -# Retry decorator for database operations (Qdrant, Neo4j) +# Retry decorator for database operations (Qdrant) db_retry = retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10), @@ -28,7 +28,11 @@ db_retry = retry( reraise=True, ) -# Monkey-patch Mem0's OpenAI LLM to remove the 'store' parameter for LiteLLM compatibility +# Monkey-patch Mem0's OpenAI LLM to clear top_p when the configured LLM +# is Claude reached via an OpenAI-compatible endpoint: Claude rejects top_p +# whenever temperature is set, and OpenAILLM sends both unconditionally. +# (The 'store' branch is now redundant in mem0ai>=2.0.0 — upstream made it +# opt-in — but harmless; kept for safety.) 
from mem0.llms.openai import OpenAILLM _original_generate_response = OpenAILLM.generate_response @@ -37,10 +41,8 @@ _original_generate_response = OpenAILLM.generate_response def patched_generate_response( self, messages, response_format=None, tools=None, tool_choice="auto", **kwargs ): - # Remove 'store' parameter as LiteLLM doesn't support it if hasattr(self.config, "store"): self.config.store = None - # Remove 'top_p' to avoid conflict with temperature for Claude models if hasattr(self.config, "top_p"): self.config.top_p = None return _original_generate_response( @@ -49,7 +51,28 @@ def patched_generate_response( OpenAILLM.generate_response = patched_generate_response -logger.info("Applied LiteLLM compatibility patch: disabled 'store' parameter") +logger.info("Applied Claude/OpenAI-compatible patch: cleared top_p (and store)") + + +def _build_filters( + user_id: Optional[str], + agent_id: Optional[str] = None, + run_id: Optional[str] = None, + extra: Optional[Dict[str, Any]] = None, +) -> Dict[str, Any]: + """Build the filters dict required by mem0 v2 search/get_all. + + In mem0 v2.x, user_id/agent_id/run_id are rejected as top-level kwargs + on Memory.search and Memory.get_all — they must live inside `filters`. 
+ """ + merged: Dict[str, Any] = dict(extra) if extra else {} + if user_id is not None: + merged["user_id"] = user_id + if agent_id is not None: + merged["agent_id"] = agent_id + if run_id is not None: + merged["run_id"] = run_id + return merged class Mem0Manager: @@ -59,18 +82,15 @@ class Mem0Manager: """ def __init__(self): - # Custom endpoint configuration with graph memory enabled logger.info( "Initializing Mem0Manager with custom endpoint", model=settings.default_model, embedding_model=settings.embedding_model, embedding_dims=settings.embedding_dims, qdrant_host=settings.qdrant_host, - neo4j_uri=settings.neo4j_uri, ) config = { "version": "v1.1", - "enable_graph": True, "llm": { "provider": "openai", "config": { @@ -99,14 +119,6 @@ class Mem0Manager: "on_disk": True, }, }, - "graph_store": { - "provider": "neo4j", - "config": { - "url": settings.neo4j_uri, - "username": settings.neo4j_username, - "password": settings.neo4j_password, - }, - }, "reranker": { "provider": "cohere", "config": { @@ -208,15 +220,12 @@ class Mem0Manager: "query": query, "note": "Empty query provided, no results returned. Use a specific query to search memories.", } - # Direct Mem0 search - trust native handling + # mem0 v2: entity IDs must live inside the `filters` dict; `limit` is now `top_k`. result = self.memory.search( query=query, - user_id=user_id, - agent_id=agent_id, - run_id=run_id, - limit=limit, + filters=_build_filters(user_id, agent_id, run_id, extra=filters), + top_k=limit, threshold=threshold, - filters=filters, ) return { "memories": result.get("results", []), @@ -238,13 +247,10 @@ class Mem0Manager: ) -> List[Dict[str, Any]]: """Get all memories for a user - native Mem0 pattern.""" try: - # Direct Mem0 get_all call - trust native parameter handling + # mem0 v2: entity IDs must live inside the `filters` dict; `limit` is now `top_k`. 
result = self.memory.get_all( - user_id=user_id, - limit=limit, - agent_id=agent_id, - run_id=run_id, - filters=filters, + filters=_build_filters(user_id, agent_id, run_id, extra=filters), + top_k=limit, ) return result.get("results", []) except Exception as e: @@ -323,54 +329,28 @@ class Mem0Manager: run_id: Optional[str], limit: int = 50, ) -> Dict[str, Any]: - """Get graph relationships - using correct Mem0 get_all() method.""" - try: - # Use get_all() to retrieve memories with graph relationships - result = self.memory.get_all( - user_id=user_id, agent_id=agent_id, run_id=run_id, limit=limit - ) + """Graph relationships — deprecated in mem0 v2 (OSS graph memory removed). - # Extract relationships from Mem0's response structure - relationships = result.get("relations", []) - - # For entities, we can derive them from memory results or relations - entities = [] - if "results" in result: - # Extract unique entities from memories and relationships - entity_set = set() - - # Add entities from relationships - for rel in relationships: - if "source" in rel: - entity_set.add(rel["source"]) - if "target" in rel: - entity_set.add(rel["target"]) - - entities = [{"name": entity} for entity in entity_set] - - return { - "relationships": relationships, - "entities": entities, - "user_id": user_id, - "agent_id": agent_id, - "run_id": run_id, - "total_memories": len(result.get("results", [])), - "total_relationships": len(relationships), - } - - except Exception as e: - logger.error(f"Error getting graph relationships: {e}") - # Return empty but structured response on error - return { - "relationships": [], - "entities": [], - "user_id": user_id, - "agent_id": agent_id, - "run_id": run_id, - "total_memories": 0, - "total_relationships": 0, - "error": str(e), - } + mem0 v2.0.0 deleted the OSS graph store (Neo4j/Memgraph/Kuzu/AGE drivers). 
+ Entity relationships now influence ranking via a parallel `{collection}_entities` + Qdrant collection rather than being directly traversable. We return an empty + graph payload plus a `deprecated` marker so clients (frontend graph.html) can + render a clear "Graph view unavailable" state instead of erroring. + """ + return { + "relationships": [], + "entities": [], + "user_id": user_id, + "agent_id": agent_id, + "run_id": run_id, + "total_memories": 0, + "total_relationships": 0, + "deprecated": True, + "deprecation_note": ( + "OSS graph memory was removed in mem0 v2.0.0. Use search/get_all for " + "memory retrieval; entity links now affect ranking only." + ), + } @timed("chat_with_memory") async def chat_with_memory( @@ -392,10 +372,8 @@ class Mem0Manager: search_start_time = time.time() search_result = self.memory.search( query=message, - user_id=user_id, - agent_id=agent_id, - run_id=run_id, - limit=10, + filters=_build_filters(user_id, agent_id, run_id), + top_k=10, threshold=0.3, ) relevant_memories = search_result.get("results", []) @@ -491,7 +469,9 @@ class Mem0Manager: # Check Mem0 memory try: - self.memory.search(query="test", user_id="health_check", limit=1) + self.memory.search( + query="test", filters={"user_id": "health_check"}, top_k=1 + ) status["mem0_memory"] = "healthy" except Exception as e: status["mem0_memory"] = f"unhealthy: {str(e)}" diff --git a/backend/requirements.txt b/backend/requirements.txt index 72d8fef..f1255da 100644 --- a/backend/requirements.txt +++ b/backend/requirements.txt @@ -4,16 +4,14 @@ uvicorn[standard] python-multipart # Mem0 and AI -mem0ai +mem0ai[nlp]==2.0.2 +fastembed>=0.3.1 openai google-genai cohere # Database -qdrant-client -neo4j -langchain-neo4j -rank-bm25 +qdrant-client>=1.12.0 ollama # Utilities diff --git a/docker-compose.yml b/docker-compose.yml index 3eabf28..be50a75 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -1,7 +1,7 @@ services: - # Qdrant vector database for vector storage + # Qdrant 
vector database for vector + sparse (BM25) storage qdrant: - image: qdrant/qdrant:latest + image: qdrant/qdrant:v1.12.4 container_name: mem0-qdrant expose: - "6333" @@ -18,39 +18,9 @@ services: retries: 5 restart: unless-stopped - # Neo4j with APOC for graph relationships - neo4j: - image: neo4j:5.26.4 - container_name: mem0-neo4j - environment: - NEO4J_AUTH: ${NEO4J_AUTH:-neo4j/mem0_neo4j_password} - NEO4J_PLUGINS: '["apoc"]' - NEO4J_apoc_export_file_enabled: true - NEO4J_apoc_import_file_enabled: true - NEO4J_apoc_import_file_use__neo4j__config: true - NEO4J_ACCEPT_LICENSE_AGREEMENT: yes - NEO4J_dbms_security_procedures_unrestricted: apoc.* - NEO4J_dbms_security_procedures_allowlist: apoc.* - expose: - - "7474" # HTTP - Internal only - - "7687" # Bolt - Internal only - networks: - - mem0_network - volumes: - - neo4j_data:/data - - neo4j_logs:/logs - - neo4j_import:/var/lib/neo4j/import - - neo4j_plugins:/plugins - healthcheck: - test: ["CMD", "cypher-shell", "-u", "neo4j", "-p", "${NEO4J_PASSWORD:-mem0_neo4j_password}", "RETURN 1"] - interval: 10s - timeout: 10s - retries: 5 - restart: unless-stopped - # Backend API service backend: - build: + build: context: ./backend dockerfile: Dockerfile container_name: mem0-backend @@ -62,9 +32,6 @@ services: QDRANT_HOST: qdrant QDRANT_PORT: 6333 QDRANT_COLLECTION_NAME: ${QDRANT_COLLECTION_NAME:-mem0} - NEO4J_URI: bolt://neo4j:7687 - NEO4J_USERNAME: ${NEO4J_USERNAME:-neo4j} - NEO4J_PASSWORD: ${NEO4J_PASSWORD:-mem0_neo4j_password} LOG_LEVEL: ${LOG_LEVEL:-INFO} CORS_ORIGINS: ${CORS_ORIGINS:-http://localhost:3000} DEFAULT_MODEL: ${DEFAULT_MODEL:-claude-sonnet-4} @@ -80,8 +47,6 @@ services: depends_on: qdrant: condition: service_healthy - neo4j: - condition: service_healthy restart: unless-stopped volumes: - ./backend:/app @@ -90,10 +55,6 @@ services: volumes: qdrant_data: - neo4j_data: - neo4j_logs: - neo4j_import: - neo4j_plugins: networks: mem0_network: diff --git a/docs/MIGRATION_RUNBOOK.md b/docs/MIGRATION_RUNBOOK.md new 
file mode 100644 index 0000000..df8bfda --- /dev/null +++ b/docs/MIGRATION_RUNBOOK.md @@ -0,0 +1,197 @@ +# Migration Runbook: mem0 v0.1.x/v1.x → v2.0.2 (the "V3 pipeline") + +This runbook covers the **operational** half of the migration — backups, the +Qdrant collection rebuild for BM25, the Neo4j dump, cutover and rollback. The +**code-level** half (Dockerfile, requirements, mem0_manager rewrites, etc.) is +already committed on this branch; this document is what to follow when taking +those code changes to a stack that has live data. + +## TL;DR + +1. Snapshot Qdrant + dump Neo4j (Phase 2). +2. Deploy v2 backend to a scratch stack (Phase 3). +3. Rebuild the Qdrant collection with BM25 by warm-up-add → scroll/upsert → + swap (Phase 4). +4. Run integration tests (Phase 5). +5. Cutover production with the same steps inside a maintenance window (Phase 6). + +## Phase 1 — Pre-flight + +```bash +# 1. Capture the exact running mem0ai version (record for the ticket) +docker compose exec backend pip show mem0ai + +# 2. Tag the pre-migration commit +git tag pre-mem0-v3-migration && git push --tags + +# 3. Verify free disk on the volumes (snapshots can be sizeable) +docker system df -v | grep -E "qdrant_data|neo4j_data" +``` + +## Phase 2 — Backups (read-only on production) + +### Qdrant snapshot + +```bash +# Create snapshot of the live mem0 collection (qdrant runs inside the network as 'qdrant') +docker compose exec backend curl -X POST \ + "http://qdrant:6333/collections/mem0/snapshots?wait=true" + +# Returns JSON with the snapshot filename, e.g. mem0-XXXXXXXX.snapshot. +# Copy it off the qdrant container's volume: +docker compose exec qdrant ls /qdrant/storage/collections/mem0/snapshots/ +mkdir -p ./backups/qdrant +docker cp mem0-qdrant:/qdrant/storage/collections/mem0/snapshots/ ./backups/qdrant/ +``` + +### Neo4j offline dump (decommission path) + +Neo4j 5.x requires the database to be stopped to dump it. 
+ +```bash +mkdir -p ./backups/neo4j +docker compose stop neo4j +docker run --rm \ + --volumes-from mem0-neo4j \ + -v "$(pwd)/backups/neo4j:/dumps" \ + neo4j:5.26.4 \ + neo4j-admin database dump neo4j --to-path=/dumps +# (No need to restart neo4j — it is being decommissioned.) +``` + +Keep both backups for **at least 30 days** post-cutover. Calendar a reminder. + +### Pre-cutover per-user memory counts + +```bash +# Iterate API_KEYS users, hit /stats/{user_id}, save the count. Adjust per your auth. +for user in $(jq -r '[.[]] | unique[]' <<< "$API_KEYS"); do + echo -n "$user: " + docker compose exec backend curl -s -H "X-API-Key: " \ + "http://localhost:8000/stats/$user" | jq -r '.memory_count // 0' +done > pre-cutover-counts.txt +``` + +## Phase 3 — Deploy v2 backend (scratch stack first) + +Use a developer or staging machine with a **restored copy** of the prod +snapshot, not prod itself. + +```bash +# Restore the prod snapshot onto the scratch Qdrant +docker compose exec backend curl -X POST \ + "http://qdrant:6333/collections/mem0_legacy/snapshots/upload?priority=snapshot" \ + -H "Content-Type: multipart/form-data" \ + -F "snapshot=@/backups/qdrant/" +# (or restore as 'mem0' if you want to start with the legacy name) + +# Build + start the v2 backend +docker compose build --no-cache backend +docker compose up -d +docker compose logs -f backend +``` + +Watch for: +- `Applied Claude/OpenAI-compatible patch: cleared top_p (and store)` — patch loaded. +- `Initialized ultra-minimal Mem0Manager with custom endpoint` — startup OK. +- No errors mentioning `graph_store` or `enable_graph` (we removed them). +- On any search/get_all: no `ValueError` from filters. + +## Phase 4 — Rebuild the Qdrant collection for BM25 + +Pre-v2 collections lack the `bm25` sparse-vector slot. mem0 v2 silently +downgrades to semantic-only on them — to get full hybrid search you must +recreate the collection. + +```bash +# 1. 
Set the env var so the v2 backend creates a NEW collection with the right schema +docker compose exec backend sh -c 'QDRANT_COLLECTION_NAME=mem0_v3 \ + python -c "from mem0_manager import mem0_manager; \ + mem0_manager.memory.add([{\"role\":\"user\",\"content\":\"warm-up\"}], user_id=\"__warmup__\")"' +# This lazy-creates mem0_v3 + mem0_v3_entities with the bm25 slot. +# (Delete the warm-up memory after if you care.) + +# 2. Run the migration script — preserves id + vector + payload, no re-embed +docker compose exec backend python /app/../scripts/migrate_qdrant_to_v3.py \ + --source mem0 --target mem0_v3 \ + --qdrant-host qdrant --qdrant-port 6333 --dry-run +# Inspect the per-user counts. If OK, run for real: +docker compose exec backend python /app/../scripts/migrate_qdrant_to_v3.py \ + --source mem0 --target mem0_v3 \ + --qdrant-host qdrant --qdrant-port 6333 + +# 3. Swap names. Qdrant has no in-place rename — use snapshot+upload. +# Snapshot mem0_v3, upload as mem0_swap, then snapshot mem0 as mem0_legacy, then +# upload mem0_swap as mem0. Or simply point QDRANT_COLLECTION_NAME at mem0_v3 in +# docker-compose.yml and keep `mem0` around as the legacy backup. +``` + +Easiest path: **leave the legacy collection alone** and update +`QDRANT_COLLECTION_NAME` to `mem0_v3` in `.env` / `docker-compose.yml`. The +legacy `mem0` collection sits there as an extra backup until you delete it. + +## Phase 5 — Integration tests + +```bash +MEM0_API_KEY= python test_integration.py -v +``` + +The test script generates a fresh `TEST_USER` per run — make sure the supplied +API key maps to that user (see CLAUDE.md "There are no unit tests..." note). + +Expected: all pass. The `/graph/relationships/{user_id}` test should accept the +new `deprecated: true` payload. + +## Phase 6 — Production cutover + +Maintenance window ~30 min. + +1. Communicate the window. +2. Re-snapshot Qdrant immediately before the deploy (so the rollback snapshot + is the freshest possible). +3. 
`git pull` the migration branch (or merge to main first). +4. `docker compose build --no-cache backend && docker compose up -d backend`. +5. Run the Phase 4 collection rebuild on prod. +6. Smoke test: `/health`, one `/chat` round-trip, one `/memories` write, one + `/memories/search` read. +7. Verify per-user counts match `pre-cutover-counts.txt` (use the same loop). + +## Rollback + +### Before the first v2 write hits prod (fully safe) + +```bash +git revert +docker compose build --no-cache backend +docker compose up -d backend +``` + +### After cutover but snapshot still on disk (loses post-cutover writes) + +```bash +# Stop the backend so no more writes land on the v2 collection +docker compose stop backend + +# Restore the pre-cutover Qdrant snapshot to a fresh name, then swap +docker compose exec qdrant curl -X POST \ + "http://qdrant:6333/collections/mem0_rollback/snapshots/upload?priority=snapshot" \ + -H "Content-Type: multipart/form-data" \ + -F "snapshot=@/qdrant/snapshots/" +# Update QDRANT_COLLECTION_NAME=mem0_rollback or rename via snapshot+upload. + +# Restore Neo4j if needed +docker run --rm \ + --volumes-from mem0-neo4j \ + -v "$(pwd)/backups/neo4j:/dumps" \ + neo4j:5.26.4 \ + neo4j-admin database load neo4j --from-path=/dumps --overwrite-destination=true + +# Revert code and restart +git revert +docker compose build --no-cache backend +docker compose up -d backend +``` + +### After snapshot retention expires + +**Irreversible.** Keep the pre-cutover snapshot and Neo4j dump for ≥30 days. diff --git a/scripts/migrate_qdrant_to_v3.py b/scripts/migrate_qdrant_to_v3.py new file mode 100644 index 0000000..3d47c35 --- /dev/null +++ b/scripts/migrate_qdrant_to_v3.py @@ -0,0 +1,189 @@ +#!/usr/bin/env python3 +"""Migrate a legacy mem0 v0.1.x/v1.x Qdrant collection to a v2-compatible one. + +Why this script exists +---------------------- +mem0 v2 stores a `bm25` sparse vector alongside each dense vector to enable +hybrid search. 
Pre-v2 collections lack that slot — mem0's Qdrant adapter +silently downgrades to semantic-only writes on them. To unlock BM25 you must +recreate the collection with the sparse slot AND copy the existing points +over (preserving id, vector, payload — no re-embed needed). + +How it works +------------ +1. Connect to Qdrant. +2. Scroll all points from the source collection (with vectors + payload). +3. Upsert them into the target collection in batches. +4. Verify counts match per `user_id`. + +The target collection MUST already exist with the BM25 slot. The recommended +way to create it is to boot the v2 backend pointed at `QDRANT_COLLECTION_NAME=` +and trigger one `add()` call — mem0 lazy-creates the collection (and the sister +`_entities` collection) with the right schema. + +Usage +----- + # Dry run (no writes): + python scripts/migrate_qdrant_to_v3.py \\ + --source mem0 --target mem0_v3 \\ + --qdrant-host localhost --qdrant-port 6333 \\ + --dry-run + + # Real migration: + python scripts/migrate_qdrant_to_v3.py \\ + --source mem0 --target mem0_v3 \\ + --qdrant-host localhost --qdrant-port 6333 + + # From inside the backend container (where Qdrant resolves as `qdrant`): + docker compose exec backend python /app/../scripts/migrate_qdrant_to_v3.py \\ + --source mem0 --target mem0_v3 --qdrant-host qdrant --qdrant-port 6333 + +Prereqs +------- +- qdrant-client>=1.12.0 installed +- A fresh Qdrant snapshot of the source collection (see docs/MIGRATION_RUNBOOK.md) +- The target collection created via a v2 backend warm-up add() +""" + +import argparse +import sys +from collections import Counter +from typing import Optional + +from qdrant_client import QdrantClient +from qdrant_client.http import models + + +def parse_args() -> argparse.Namespace: + p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter) + p.add_argument("--source", required=True, help="Source (legacy) collection name") + p.add_argument("--target", 
required=True, help="Target (v2-created) collection name") + p.add_argument("--qdrant-host", default="localhost") + p.add_argument("--qdrant-port", type=int, default=6333) + p.add_argument("--batch-size", type=int, default=256, help="Scroll/upsert batch size") + p.add_argument("--dry-run", action="store_true", help="Read-only — show counts, no writes") + return p.parse_args() + + +def collection_must_exist(client: QdrantClient, name: str) -> models.CollectionInfo: + if not client.collection_exists(name): + print(f"ERROR: collection {name!r} does not exist on Qdrant.", file=sys.stderr) + sys.exit(2) + return client.get_collection(name) + + +def verify_target_has_bm25(target_info: models.CollectionInfo) -> None: + sparse = getattr(target_info.config.params, "sparse_vectors", None) + if not sparse or "bm25" not in sparse: + print( + "ERROR: target collection has no `bm25` sparse-vector slot. Did you create " + "it via a v2 backend warm-up add()? See docs/MIGRATION_RUNBOOK.md.", + file=sys.stderr, + ) + sys.exit(2) + + +def count_per_user(client: QdrantClient, collection: str) -> Counter: + counts: Counter = Counter() + offset: Optional[models.PointId] = None + while True: + points, offset = client.scroll( + collection_name=collection, + limit=1024, + with_payload=["user_id"], + with_vectors=False, + offset=offset, + ) + for p in points: + uid = (p.payload or {}).get("user_id", "") + counts[uid] += 1 + if offset is None: + break + return counts + + +def migrate( + client: QdrantClient, source: str, target: str, batch_size: int, dry_run: bool +) -> int: + transferred = 0 + offset: Optional[models.PointId] = None + while True: + points, offset = client.scroll( + collection_name=source, + limit=batch_size, + with_payload=True, + with_vectors=True, + offset=offset, + ) + if not points: + break + + if not dry_run: + client.upsert( + collection_name=target, + points=[ + models.PointStruct(id=p.id, vector=p.vector, payload=p.payload) + for p in points + ], + wait=True, + ) + 
transferred += len(points) + print(f" ... transferred {transferred} points") + + if offset is None: + break + return transferred + + +def main() -> None: + args = parse_args() + client = QdrantClient(host=args.qdrant_host, port=args.qdrant_port) + + src_info = collection_must_exist(client, args.source) + tgt_info = collection_must_exist(client, args.target) + verify_target_has_bm25(tgt_info) + + src_count = client.count(args.source, exact=True).count + tgt_count_before = client.count(args.target, exact=True).count + print(f"Source {args.source!r}: {src_count} points") + print(f"Target {args.target!r}: {tgt_count_before} points (before)") + if tgt_count_before > 1: + print( + "WARNING: target collection is non-empty (>1 point). Migration will " + "upsert into it; ids collide → existing points overwritten." + ) + + print("\nPer-user count (source):") + src_per_user = count_per_user(client, args.source) + for uid, c in src_per_user.most_common(): + print(f" {uid}: {c}") + + if args.dry_run: + print("\nDRY RUN — no writes performed.") + sys.exit(0) + + print("\nMigrating points (preserving id + vector + payload, no re-embed)...") + transferred = migrate(client, args.source, args.target, args.batch_size, dry_run=False) + + tgt_count_after = client.count(args.target, exact=True).count + print(f"\nDone. Transferred {transferred} points.") + print(f"Target {args.target!r}: {tgt_count_after} points (after)") + + print("\nPer-user count (target, after):") + tgt_per_user = count_per_user(client, args.target) + mismatches = 0 + for uid, src_c in src_per_user.most_common(): + tgt_c = tgt_per_user.get(uid, 0) + marker = "OK" if tgt_c == src_c else f"MISMATCH ({tgt_c})" + print(f" {uid}: src={src_c} tgt={tgt_c} [{marker}]") + if tgt_c != src_c: + mismatches += 1 + + if mismatches: + print(f"\nERROR: {mismatches} user(s) have count mismatches. Investigate before swap.", file=sys.stderr) + sys.exit(3) + print("\nAll per-user counts match. 
Safe to proceed with collection swap.") + + +if __name__ == "__main__": + main() diff --git a/setup.sh b/setup.sh index 3848dea..4950899 100755 --- a/setup.sh +++ b/setup.sh @@ -3,7 +3,7 @@ set -e COMPOSE_PROJECT_NAME="${COMPOSE_PROJECT_NAME:-mem0}" -VOLUMES=("qdrant_data" "neo4j_data" "neo4j_logs" "neo4j_import" "neo4j_plugins") +VOLUMES=("qdrant_data") echo "==========================================" echo " Mem0 Setup Script"