chore: migrate to mem0ai v2.0.2 (V3 memory pipeline)

Pin mem0ai[nlp]==2.0.2 and fastembed for the new hybrid-search pipeline.

- Drop OSS graph memory (removed upstream in 2.0.0, PR #4805): remove the Neo4j service, env vars, volumes, and driver deps; mark /graph/relationships deprecated.
- Rewrite Memory.search/get_all/chat/health call sites to use the v2 filters={} + top_k API (entity IDs at top level now raise ValueError).
- Switch the MCP remove_memory ownership check to the O(1) verify_memory_ownership so it cannot silently truncate at the new top_k=20 default.
- Downgrade the base image to python:3.12-slim for spaCy.
- Add scripts/migrate_qdrant_to_v3.py (scroll+upsert with per-user count parity check).
- Add docs/MIGRATION_RUNBOOK.md covering snapshot, dump, collection rebuild, cutover, and rollback procedures.
2026-05-23 14:49:45 +05:30 · 2026-05-23 14:49:45 +05:30 · 0f0addb36b
commit 0f0addb36b
parent 82accabc73
11 changed files with 548 additions and 158 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -0,0 +1,84 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project overview
+
+FastAPI backend that wraps the `mem0ai` SDK (pinned to `mem0ai[nlp]==2.0.2` — the "V3 memory pipeline") to expose memory operations (add/search/update/delete, plus memory-aware chat) over a REST API, an OpenAI-compatible `/v1/chat/completions` endpoint, and an MCP server. Memory is stored in Qdrant (vectors + BM25 sparse), with a sister `{collection}_entities` Qdrant collection auto-created by mem0 for entity linking. Embeddings come from a local Ollama instance; the LLM is a custom OpenAI-compatible endpoint. The frontend is two standalone HTML files (`index.html`, `graph.html`) that call the API directly — no build step.
+
+## Common commands
+
+```bash
+# First-time setup (creates volumes, builds, brings up stack). Prompts to reset volumes if they already exist.
+./setup.sh
+
+# Day-to-day
+docker compose up -d --build      # rebuild + start
+docker compose down               # stop (keeps volumes)
+docker compose down -v            # stop + delete data
+docker compose logs -f backend    # tail backend logs (structlog JSON)
+docker compose restart backend    # pick up code changes (no --reload; see "Volumes" gotcha)
+
+# Sanity check — assumes a host route to backend:8000 exists (see "Networking" gotcha).
+curl http://localhost:8000/health
+
+# Integration tests — hit the running stack, no mocks. See test_integration.py for the test list.
+MEM0_API_KEY=<key-from-API_KEYS> python test_integration.py
+MEM0_API_KEY=<key-from-API_KEYS> python test_integration.py -v
+```
+
+There are no unit tests and no separate lint/format/type-check setup — `test_integration.py` is the only test entry point and it requires a fully-running Docker stack. The script generates a fresh `TEST_USER = f"test_user_{int(datetime.now().timestamp())}"` per run, so for tests to pass auth checks the supplied `MEM0_API_KEY` must map to that exact user in `API_KEYS` (either set `TEST_USER` to a statically mapped user, or add a mapping for the run).
+
+## Architecture
+
+### Request flow
+1. Client hits FastAPI (`backend/main.py`) with `X-API-Key` (or `Authorization: Bearer` for `/v1/chat/completions`).
+2. `auth.py` resolves the key → `user_id` via `settings.api_key_mapping` (parsed from the `API_KEYS` env JSON). Every protected endpoint then verifies the caller's `user_id` matches the path/body `user_id` — there is no admin or cross-user access.
+3. The endpoint calls `mem0_manager.mem0_manager` (singleton in `backend/mem0_manager.py`), which delegates to the `mem0ai` SDK. The SDK in turn calls Qdrant, Ollama, Cohere (reranker), and the custom OpenAI endpoint.
+4. The `@timed("operation_name")` decorator from `backend/monitoring.py` wraps memory operations to log structured timings and feed the in-memory `stats` singleton that powers `/stats` and `/stats/{user_id}`.
+
+### Three parallel API surfaces
+All three live in the same FastAPI process and share auth + rate limiting:
+- **Native REST** (`/chat`, `/memories*`, `/graph/relationships/{user_id}` *(deprecated — returns empty payload)*, `/stats*`, `/models`, `/users`) — authenticates via `X-API-Key`.
+- **OpenAI-compatible** (`/v1/chat/completions`, also `/chat/completions`) — authenticates via `Authorization: Bearer <key>` or `X-API-Key`; supports streaming SSE. Implemented in `main.py:openai_chat_completions` and `stream_openai_response`.
+- **MCP** mounted at `/mcp` (see `backend/mcp_server.py`) — uses a Starlette `MCPAuthMiddleware` that stuffs the resolved `user_id` into a `ContextVar`, which the FastMCP tools (`add_memory`, `search_memory`, `remove_memory`, `chat`) read. The MCP session manager is started inside the main FastAPI `lifespan` in `main.py` — mounted-app lifespans don't run automatically, so don't move that startup logic.
+
+### Storage layout
+- **Qdrant** — collection name from `QDRANT_COLLECTION_NAME` (default `mem0`). Embedding dim must match the embedder; see the "Embedding model and dimensions are coupled" gotcha below. Collections created by mem0 v2 carry a `bm25` sparse-vector slot for hybrid search (semantic + keyword + entity-boost). The slot is added automatically at collection creation. Existing pre-v2 collections keep working but degrade to semantic-only search (mem0 only logs a warning), so they must be recreated to gain BM25.
+- **`{collection}_entities`** — sister Qdrant collection lazy-created by mem0 v2 on first `add()`, same dimension as the main collection. Stores entity vectors used for ranking boost. No code touches it directly.
+- **No graph store** — Neo4j and the OSS graph memory feature were removed in `mem0ai` 2.0.0 (PR #4805). The `/graph/relationships/{user_id}` endpoint is kept for client compatibility but returns `deprecated: true` with empty arrays.
+- **No SQL store** — older docs mention PostgreSQL/pgvector; that's no longer used. Qdrant only.
+
+## Important conventions / gotchas
+
+### Networking: backend is not published to the host
+`docker-compose.yml` defines the backend on the **external** `npm_network` (Nginx Proxy Manager) and only `expose`s port 8000 inside Docker. There is no `ports:` mapping. To hit it from the host you need either: (a) the NPM proxy in front of it, (b) `docker compose exec backend curl ...`, or (c) add a temporary `ports:` mapping. The `npm_network` must exist before bringing the stack up (`docker network create npm_network` if you don't run NPM).
+
+### Claude/OpenAI-compatible monkey-patch
+`mem0_manager.py` patches `mem0.llms.openai.OpenAILLM.generate_response` at import time to clear `store` and `top_p` from the config. In `mem0ai>=2.0.0`, the `store` half is redundant (upstream made `store` opt-in) but kept as a harmless safety net. The `top_p` clearing is still load-bearing: Claude (reached via the custom OpenAI-compatible endpoint) rejects `top_p` whenever `temperature` is set, and `OpenAILLM` sends both unconditionally. If you upgrade `mem0ai` and chat starts 400-ing on the custom endpoint, this patch is the first place to look.
+
+### Embedding model and dimensions are coupled
+`EMBEDDING_MODEL` and `EMBEDDING_DIMS` in `.env` / `docker-compose.yml` must agree, and they must match the dim the Qdrant collection was created with. Defaults are `qwen3-embedding:4b-q8_0` / `2560`. Switching the model requires either matching dims or recreating the Qdrant collection (`./setup.sh` → option 2 wipes volumes).
+
+### Single-model architecture
+Despite what `README.md`, `TESTING.md`, and `MEM0.md` say about intelligent routing across `o4-mini` / `gemini-2.5-pro` / `claude-sonnet-4` / `o3`, the code uses **one** model — `settings.default_model` (`claude-sonnet-4` by default). `/models` returns only that. Don't reintroduce routing without first checking with the user.
+
+### mem0 v2 API: filters dict + top_k
+mem0 v2 rejects `user_id`/`agent_id`/`run_id` as top-level kwargs on `Memory.search` and `Memory.get_all` (raises `ValueError`) — they must live inside a `filters={...}` dict. The `limit` kwarg is renamed `top_k` (default reduced 100 → 20 — pass it explicitly when you need more). `Memory.add` and `Memory.delete_all` still accept these IDs as top-level kwargs. Use the `_build_filters()` helper at the top of `mem0_manager.py` to construct the dict. Search `score` is now a fused multi-signal value (semantic + BM25 + entity boost), not raw cosine — don't compare against thresholds calibrated for the old scoring.
+
+### ADD-only memory algorithm
+`Memory.add` in mem0 v2 only emits `ADD` events; the engine no longer issues `UPDATE`/`DELETE` events based on LLM judgment, so a user's memory count never shrinks on its own. Explicit `Memory.update` / `Memory.delete` calls still work and are how the project mutates or removes existing memories.
+
+### Auth & rate limiting
+- All endpoints except `/health` require a valid `X-API-Key` (or Bearer for the OpenAI-compatible routes). `API_KEYS` is a JSON object mapping keys → user IDs. Note this contradicts `AUTH_SETUP.md`, which lists `/stats` and `/models` as public — the code is authoritative.
+- Rate limits via `slowapi` are set per endpoint in `main.py` decorators: chat 30/min, writes 60/min, reads 120/min, bulk user-delete 10/min. Keyed by API key (fallback to remote IP).
+- Memory ownership is checked via `mem0_manager.verify_memory_ownership` (O(1) `Memory.get(memory_id)`) — use this rather than fetching all user memories.
+
+### Config field aliases
+`backend/config.py` uses Pydantic `AliasChoices` so both `OPENAI_API_KEY` and `OPENAI_COMPAT_API_KEY` (and `OPENAI_BASE_URL` / `OPENAI_COMPAT_BASE_URL`) populate the same field. `docker-compose.yml` passes `OPENAI_API_KEY`; `.env.example` documents `OPENAI_COMPAT_API_KEY`. Both work.
+
+### Volumes mount the source in, but no hot reload
+`docker-compose.yml` bind-mounts `./backend:/app` and `./frontend:/app/frontend`, and `uvicorn` runs with `--workers 4` and no `--reload`. Code edits become live only on container restart (`docker compose restart backend`).
+
+### Logging
+Use `structlog.get_logger(__name__)` with **keyword arguments** (e.g. `logger.info("msg", user_id=x)`). The last few commits explicitly fixed places that mixed stdlib `logging` (which silently drops kwargs) with structlog. Don't reintroduce `logging.getLogger` in this codebase.
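The auth model that the CLAUDE.md above describes (an `API_KEYS` JSON object mapping keys to user IDs, followed by a strict "caller must equal target user" check) can be sketched roughly as follows. This is an illustration only: `parse_api_keys`, `resolve_user`, `require_same_user`, the sample keys, and the use of `PermissionError` are hypothetical and are not taken from `auth.py`, which may differ.

```python
import json
from typing import Dict


def parse_api_keys(raw: str) -> Dict[str, str]:
    """Parse an API_KEYS-style value: a JSON object mapping key -> user_id."""
    mapping = json.loads(raw)
    if not isinstance(mapping, dict):
        raise ValueError("API_KEYS must be a JSON object")
    return {str(k): str(v) for k, v in mapping.items()}


def resolve_user(api_key: str, mapping: Dict[str, str]) -> str:
    """Return the user_id mapped to a key, or raise if the key is unknown."""
    try:
        return mapping[api_key]
    except KeyError:
        raise PermissionError("invalid API key") from None


def require_same_user(authenticated_user: str, requested_user: str) -> None:
    """No admin or cross-user access: the caller must be the target user."""
    if authenticated_user != requested_user:
        raise PermissionError("access denied")


# Hypothetical sample value, for illustration only.
API_KEYS = '{"key-alice": "alice", "key-bob": "bob"}'
mapping = parse_api_keys(API_KEYS)
caller = resolve_user("key-alice", mapping)
require_same_user(caller, "alice")  # same user: passes without raising
```

Under this model a request authenticated as `alice` for `/stats/bob` would be rejected, which is also why the integration test's generated `TEST_USER` needs its own key mapping.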
--- a/backend/Dockerfile
+++ b/backend/Dockerfile
@ -1,4 +1,4 @@
-FROM python:3.13-slim
+FROM python:3.12-slim

 # Set working directory
 WORKDIR /app
--- a/backend/config.py
+++ b/backend/config.py
@ -44,20 +44,6 @@ class Settings(BaseSettings):
        ),
    )

-    # Neo4j Configuration
-    neo4j_uri: str = Field(
-        default="bolt://localhost:7687",
-        validation_alias=AliasChoices("NEO4J_URI", "neo4j_uri"),
-    )
-    neo4j_username: str = Field(
-        default="neo4j",
-        validation_alias=AliasChoices("NEO4J_USERNAME", "neo4j_username"),
-    )
-    neo4j_password: str = Field(
-        default="mem0_neo4j_password",
-        validation_alias=AliasChoices("NEO4J_PASSWORD", "neo4j_password"),
-    )
-
    # Application Configuration
    log_level: str = Field(
        default="INFO", validation_alias=AliasChoices("LOG_LEVEL", "log_level")
--- a/backend/main.py
+++ b/backend/main.py
@ -115,8 +115,8 @@ async def lifespan(app: FastAPI):
 # Initialize FastAPI app
 app = FastAPI(
    title="Mem0 Interface POC",
-    description="Minimal but fully functional Mem0 interface with PostgreSQL and Neo4j integration",
-    version="1.0.0",
+    description="Minimal Mem0 interface backed by Qdrant (mem0ai v2 hybrid-search pipeline)",
+    version="2.0.0",
    lifespan=lifespan,
 )

@ -535,7 +535,7 @@ async def search_memories(
            query=search_request.query,
            user_id=search_request.user_id,
            limit=search_request.limit,
-            threshold=search_request.threshold or 0.2,
+            threshold=search_request.threshold or 0.1,
            filters=search_request.filters,
            agent_id=search_request.agent_id,
            run_id=search_request.run_id,
@ -700,13 +700,15 @@ async def delete_user_memories(
        )


-# Graph relationships endpoint - pure Mem0 passthrough
-@app.get("/graph/relationships/{user_id}")
+# Graph relationships endpoint - DEPRECATED in mem0 v2 (OSS graph memory removed).
+# Returns an empty payload with `deprecated: true` so the frontend can render a
+# clear "Graph view unavailable" state. Kept for client compatibility.
+@app.get("/graph/relationships/{user_id}", deprecated=True)
 @limiter.limit("60/minute")
 async def get_graph_relationships(
    request: Request, user_id: str, authenticated_user: str = Depends(get_current_user)
 ):
-    """Get graph relationships - pure Mem0 passthrough."""
+    """Get graph relationships - DEPRECATED: mem0 v2 removed OSS graph memory."""
    try:
        # Verify user can only access their own graph relationships
        if authenticated_user != user_id:
--- a/backend/mcp_server.py
+++ b/backend/mcp_server.py
@ -151,15 +151,8 @@ async def remove_memory(
    user_id = get_authenticated_user()
    logger.info(f"MCP remove_memory: user={user_id}, memory_id={memory_id}")

-    # Verify ownership: get user's memories and check if memory_id exists
-    user_memories = await mem0_manager.get_user_memories(
-        user_id=user_id,
-        limit=10000  # Get all to check ownership
-    )
-
-    memory_ids = {m.get("id") for m in user_memories if m.get("id")}
-
-    if memory_id not in memory_ids:
+    # O(1) ownership check via Memory.get(); avoids get_all's v2 top_k=20 cap.
+    if not await mem0_manager.verify_memory_ownership(memory_id, user_id):
        raise ValueError(f"Memory '{memory_id}' not found or access denied")

    result = await mem0_manager.delete_memory(memory_id=memory_id)
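To illustrate why a listing-based ownership check is fragile once listing has a small default cap, here is a minimal sketch using a hypothetical in-memory stand-in. `FakeMemory`, `owns_by_listing`, and `owns_by_lookup` are invented for this example; the real `verify_memory_ownership` is not shown in this diff and may be implemented differently.

```python
from typing import Dict, List, Optional


class FakeMemory:
    """Hypothetical in-memory stand-in for the mem0 client (illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def add(self, memory_id: str, user_id: str) -> None:
        self._rows[memory_id] = {"id": memory_id, "user_id": user_id}

    def get_all(self, filters: Dict[str, str], top_k: int = 20) -> List[Dict[str, str]]:
        # Listing is capped at top_k, mirroring the reduced v2 default.
        rows = [r for r in self._rows.values() if r["user_id"] == filters["user_id"]]
        return rows[:top_k]

    def get(self, memory_id: str) -> Optional[Dict[str, str]]:
        # Single keyed lookup, independent of how many memories the user has.
        return self._rows.get(memory_id)


def owns_by_listing(mem: FakeMemory, memory_id: str, user_id: str) -> bool:
    """Old-style check: list the user's memories, then test membership."""
    return memory_id in {m["id"] for m in mem.get_all({"user_id": user_id})}


def owns_by_lookup(mem: FakeMemory, memory_id: str, user_id: str) -> bool:
    """Direct check: fetch the one memory and compare its owner."""
    row = mem.get(memory_id)
    return row is not None and row.get("user_id") == user_id


mem = FakeMemory()
for i in range(25):
    mem.add(f"m{i}", "alice")

listing_result = owns_by_listing(mem, "m24", "alice")  # False: m24 is past the cap
lookup_result = owns_by_lookup(mem, "m24", "alice")    # True
foreign_result = owns_by_lookup(mem, "m24", "bob")     # False: different owner
```

The listing variant wrongly denies access to the 25th memory unless every caller remembers to pass a large `top_k`, whereas the direct lookup does not depend on the user's memory count.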
--- a/backend/mem0_manager.py
+++ b/backend/mem0_manager.py
@ -19,7 +19,7 @@ from monitoring import timed

 logger = structlog.get_logger(__name__)

-# Retry decorator for database operations (Qdrant, Neo4j)
+# Retry decorator for database operations (Qdrant)
 db_retry = retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
@ -28,7 +28,11 @@ db_retry = retry(
    reraise=True,
 )

-# Monkey-patch Mem0's OpenAI LLM to remove the 'store' parameter for LiteLLM compatibility
+# Monkey-patch Mem0's OpenAI LLM to clear top_p when the configured LLM
+# is Claude reached via an OpenAI-compatible endpoint: Claude rejects top_p
+# whenever temperature is set, and OpenAILLM sends both unconditionally.
+# (The 'store' branch is now redundant in mem0ai>=2.0.0 — upstream made it
+# opt-in — but harmless; kept for safety.)
 from mem0.llms.openai import OpenAILLM

 _original_generate_response = OpenAILLM.generate_response
@ -37,10 +41,8 @@ _original_generate_response = OpenAILLM.generate_response
 def patched_generate_response(
    self, messages, response_format=None, tools=None, tool_choice="auto", **kwargs
 ):
-    # Remove 'store' parameter as LiteLLM doesn't support it
    if hasattr(self.config, "store"):
        self.config.store = None
-    # Remove 'top_p' to avoid conflict with temperature for Claude models
    if hasattr(self.config, "top_p"):
        self.config.top_p = None
    return _original_generate_response(
@ -49,7 +51,28 @@ def patched_generate_response(


 OpenAILLM.generate_response = patched_generate_response
-logger.info("Applied LiteLLM compatibility patch: disabled 'store' parameter")
+logger.info("Applied Claude/OpenAI-compatible patch: cleared top_p (and store)")
+
+
+def _build_filters(
+    user_id: Optional[str],
+    agent_id: Optional[str] = None,
+    run_id: Optional[str] = None,
+    extra: Optional[Dict[str, Any]] = None,
+) -> Dict[str, Any]:
+    """Build the filters dict required by mem0 v2 search/get_all.
+
+    In mem0 v2.x, user_id/agent_id/run_id are rejected as top-level kwargs
+    on Memory.search and Memory.get_all — they must live inside `filters`.
+    """
+    merged: Dict[str, Any] = dict(extra) if extra else {}
+    if user_id is not None:
+        merged["user_id"] = user_id
+    if agent_id is not None:
+        merged["agent_id"] = agent_id
+    if run_id is not None:
+        merged["run_id"] = run_id
+    return merged


 class Mem0Manager:
@ -59,18 +82,15 @@ class Mem0Manager:
    """

    def __init__(self):
-        # Custom endpoint configuration with graph memory enabled
        logger.info(
            "Initializing Mem0Manager with custom endpoint",
            model=settings.default_model,
            embedding_model=settings.embedding_model,
            embedding_dims=settings.embedding_dims,
            qdrant_host=settings.qdrant_host,
-            neo4j_uri=settings.neo4j_uri,
        )
        config = {
            "version": "v1.1",
-            "enable_graph": True,
            "llm": {
                "provider": "openai",
                "config": {
@ -99,14 +119,6 @@ class Mem0Manager:
                    "on_disk": True,
                },
            },
-            "graph_store": {
-                "provider": "neo4j",
-                "config": {
-                    "url": settings.neo4j_uri,
-                    "username": settings.neo4j_username,
-                    "password": settings.neo4j_password,
-                },
-            },
            "reranker": {
                "provider": "cohere",
                "config": {
@ -208,15 +220,12 @@ class Mem0Manager:
                    "query": query,
                    "note": "Empty query provided, no results returned. Use a specific query to search memories.",
                }
-            # Direct Mem0 search - trust native handling
+            # mem0 v2: entity IDs must live inside the `filters` dict; `limit` is now `top_k`.
            result = self.memory.search(
                query=query,
-                user_id=user_id,
-                agent_id=agent_id,
-                run_id=run_id,
-                limit=limit,
+                filters=_build_filters(user_id, agent_id, run_id, extra=filters),
+                top_k=limit,
                threshold=threshold,
-                filters=filters,
            )
            return {
                "memories": result.get("results", []),
@ -238,13 +247,10 @@ class Mem0Manager:
    ) -> List[Dict[str, Any]]:
        """Get all memories for a user - native Mem0 pattern."""
        try:
-            # Direct Mem0 get_all call - trust native parameter handling
+            # mem0 v2: entity IDs must live inside the `filters` dict; `limit` is now `top_k`.
            result = self.memory.get_all(
-                user_id=user_id,
-                limit=limit,
-                agent_id=agent_id,
-                run_id=run_id,
-                filters=filters,
+                filters=_build_filters(user_id, agent_id, run_id, extra=filters),
+                top_k=limit,
            )
            return result.get("results", [])
        except Exception as e:
@ -323,44 +329,14 @@ class Mem0Manager:
        run_id: Optional[str],
        limit: int = 50,
    ) -> Dict[str, Any]:
-        """Get graph relationships - using correct Mem0 get_all() method."""
-        try:
-            # Use get_all() to retrieve memories with graph relationships
-            result = self.memory.get_all(
-                user_id=user_id, agent_id=agent_id, run_id=run_id, limit=limit
-            )
+        """Graph relationships — deprecated in mem0 v2 (OSS graph memory removed).

-            # Extract relationships from Mem0's response structure
-            relationships = result.get("relations", [])
-
-            # For entities, we can derive them from memory results or relations
-            entities = []
-            if "results" in result:
-                # Extract unique entities from memories and relationships
-                entity_set = set()
-
-                # Add entities from relationships
-                for rel in relationships:
-                    if "source" in rel:
-                        entity_set.add(rel["source"])
-                    if "target" in rel:
-                        entity_set.add(rel["target"])
-
-                entities = [{"name": entity} for entity in entity_set]
-
-            return {
-                "relationships": relationships,
-                "entities": entities,
-                "user_id": user_id,
-                "agent_id": agent_id,
-                "run_id": run_id,
-                "total_memories": len(result.get("results", [])),
-                "total_relationships": len(relationships),
-            }
-
-        except Exception as e:
-            logger.error(f"Error getting graph relationships: {e}")
-            # Return empty but structured response on error
+        mem0 v2.0.0 deleted the OSS graph store (Neo4j/Memgraph/Kuzu/AGE drivers).
+        Entity relationships now influence ranking via a parallel `{collection}_entities`
+        Qdrant collection rather than being directly traversable. We return an empty
+        graph payload plus a `deprecated` marker so clients (frontend graph.html) can
+        render a clear "Graph view unavailable" state instead of erroring.
+        """
        return {
            "relationships": [],
            "entities": [],
@ -369,7 +345,11 @@ class Mem0Manager:
            "run_id": run_id,
            "total_memories": 0,
            "total_relationships": 0,
-                "error": str(e),
+            "deprecated": True,
+            "deprecation_note": (
+                "OSS graph memory was removed in mem0 v2.0.0. Use search/get_all for "
+                "memory retrieval; entity links now affect ranking only."
+            ),
        }

    @timed("chat_with_memory")
@ -392,10 +372,8 @@ class Mem0Manager:
            search_start_time = time.time()
            search_result = self.memory.search(
                query=message,
-                user_id=user_id,
-                agent_id=agent_id,
-                run_id=run_id,
-                limit=10,
+                filters=_build_filters(user_id, agent_id, run_id),
+                top_k=10,
                threshold=0.3,
            )
            relevant_memories = search_result.get("results", [])
@ -491,7 +469,9 @@ class Mem0Manager:

        # Check Mem0 memory
        try:
-            self.memory.search(query="test", user_id="health_check", limit=1)
+            self.memory.search(
+                query="test", filters={"user_id": "health_check"}, top_k=1
+            )
            status["mem0_memory"] = "healthy"
        except Exception as e:
            status["mem0_memory"] = f"unhealthy: {str(e)}"
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@ -4,16 +4,14 @@ uvicorn[standard]
 python-multipart

 # Mem0 and AI
-mem0ai
+mem0ai[nlp]==2.0.2
+fastembed>=0.3.1
 openai
 google-genai
 cohere

 # Database
-qdrant-client
-neo4j
-langchain-neo4j
-rank-bm25
+qdrant-client>=1.12.0
 ollama

 # Utilities
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -1,7 +1,7 @@
 services:
-  # Qdrant vector database for vector storage
+  # Qdrant vector database for vector + sparse (BM25) storage
  qdrant:
-    image: qdrant/qdrant:latest
+    image: qdrant/qdrant:v1.12.4
    container_name: mem0-qdrant
    expose:
      - "6333"
@ -18,36 +18,6 @@ services:
      retries: 5
    restart: unless-stopped

-  # Neo4j with APOC for graph relationships
-  neo4j:
-    image: neo4j:5.26.4
-    container_name: mem0-neo4j
-    environment:
-      NEO4J_AUTH: ${NEO4J_AUTH:-neo4j/mem0_neo4j_password}
-      NEO4J_PLUGINS: '["apoc"]'
-      NEO4J_apoc_export_file_enabled: true
-      NEO4J_apoc_import_file_enabled: true
-      NEO4J_apoc_import_file_use__neo4j__config: true
-      NEO4J_ACCEPT_LICENSE_AGREEMENT: yes
-      NEO4J_dbms_security_procedures_unrestricted: apoc.*
-      NEO4J_dbms_security_procedures_allowlist: apoc.*
-    expose:
-      - "7474"  # HTTP - Internal only
-      - "7687"  # Bolt - Internal only
-    networks:
-      - mem0_network
-    volumes:
-      - neo4j_data:/data
-      - neo4j_logs:/logs
-      - neo4j_import:/var/lib/neo4j/import
-      - neo4j_plugins:/plugins
-    healthcheck:
-      test: ["CMD", "cypher-shell", "-u", "neo4j", "-p", "${NEO4J_PASSWORD:-mem0_neo4j_password}", "RETURN 1"]
-      interval: 10s
-      timeout: 10s
-      retries: 5
-    restart: unless-stopped
-
  # Backend API service
  backend:
    build:
@ -62,9 +32,6 @@ services:
      QDRANT_HOST: qdrant
      QDRANT_PORT: 6333
      QDRANT_COLLECTION_NAME: ${QDRANT_COLLECTION_NAME:-mem0}
-      NEO4J_URI: bolt://neo4j:7687
-      NEO4J_USERNAME: ${NEO4J_USERNAME:-neo4j}
-      NEO4J_PASSWORD: ${NEO4J_PASSWORD:-mem0_neo4j_password}
      LOG_LEVEL: ${LOG_LEVEL:-INFO}
      CORS_ORIGINS: ${CORS_ORIGINS:-http://localhost:3000}
      DEFAULT_MODEL: ${DEFAULT_MODEL:-claude-sonnet-4}
@ -80,8 +47,6 @@ services:
    depends_on:
      qdrant:
        condition: service_healthy
-      neo4j:
-        condition: service_healthy
    restart: unless-stopped
    volumes:
      - ./backend:/app
@ -90,10 +55,6 @@ services:

 volumes:
  qdrant_data:
-  neo4j_data:
-  neo4j_logs:
-  neo4j_import:
-  neo4j_plugins:

 networks:
  mem0_network:
--- a/docs/MIGRATION_RUNBOOK.md
+++ b/docs/MIGRATION_RUNBOOK.md
@ -0,0 +1,197 @@
+# Migration Runbook: mem0 v0.1.x/v1.x → v2.0.2 (the "V3 pipeline")
+
+This runbook covers the **operational** half of the migration — backups, the
+Qdrant collection rebuild for BM25, the Neo4j dump, cutover and rollback. The
+**code-level** half (Dockerfile, requirements, mem0_manager rewrites, etc.) is
+already committed on this branch; this document is what to follow when taking
+those code changes to a stack that has live data.
+
+## TL;DR
+
+1. Snapshot Qdrant + dump Neo4j (Phase 2).
+2. Deploy v2 backend to a scratch stack (Phase 3).
+3. Rebuild the Qdrant collection with BM25 by warm-up-add → scroll/upsert →
+   swap (Phase 4).
+4. Run integration tests (Phase 5).
+5. Cutover production with the same steps inside a maintenance window (Phase 6).
+
+## Phase 1 — Pre-flight
+
+```bash
+# 1. Capture the exact running mem0ai version (record for the ticket)
+docker compose exec backend pip show mem0ai
+
+# 2. Tag the pre-migration commit
+git tag pre-mem0-v3-migration && git push --tags
+
+# 3. Check how much space the data volumes use (a snapshot or dump needs roughly that much free disk)
+docker system df -v | grep -E "qdrant_data|neo4j_data"
+```
+
+## Phase 2 — Backups (read-only on production)
+
+### Qdrant snapshot
+
+```bash
+# Create snapshot of the live mem0 collection (qdrant runs inside the network as 'qdrant')
+docker compose exec backend curl -X POST \
+  "http://qdrant:6333/collections/mem0/snapshots?wait=true"
+
+# Returns JSON with the snapshot filename, e.g. mem0-XXXXXXXX.snapshot.
+# Copy it off the qdrant container's volume:
+docker compose exec qdrant ls /qdrant/storage/collections/mem0/snapshots/
+mkdir -p ./backups/qdrant
+docker cp mem0-qdrant:/qdrant/storage/collections/mem0/snapshots/<snapshot-file> ./backups/qdrant/
+```
+
+### Neo4j offline dump (decommission path)
+
+Neo4j 5.x requires the database to be stopped to dump it.
+
+```bash
+mkdir -p ./backups/neo4j
+docker compose stop neo4j
+docker run --rm \
+  --volumes-from mem0-neo4j \
+  -v "$(pwd)/backups/neo4j:/dumps" \
+  neo4j:5.26.4 \
+  neo4j-admin database dump neo4j --to-path=/dumps
+# (No need to restart neo4j — it is being decommissioned.)
+```
+
+Keep both backups for **at least 30 days** post-cutover. Set a calendar reminder.
+
+### Pre-cutover per-user memory counts
+
+```bash
+# Iterate API_KEYS (key → user_id) and hit /stats/{user_id} with that user's own key:
+# there is no admin or cross-user access, so each user can only read their own stats.
+jq -r 'to_entries | unique_by(.value)[] | "\(.key) \(.value)"' <<< "$API_KEYS" |
+while read -r key user; do
+  echo -n "$user: "
+  docker compose exec -T backend curl -s -H "X-API-Key: $key" \
+    "http://localhost:8000/stats/$user" < /dev/null | jq -r '.memory_count // 0'
+done > pre-cutover-counts.txt
+```
+
+## Phase 3 — Deploy v2 backend (scratch stack first)
+
+Use a developer or staging machine with a **restored copy** of the prod
+snapshot, not prod itself.
+
+```bash
+# Restore the prod snapshot onto the scratch Qdrant
+docker compose exec backend curl -X POST \
+  "http://qdrant:6333/collections/mem0_legacy/snapshots/upload?priority=snapshot" \
+  -H "Content-Type: multipart/form-data" \
+  -F "snapshot=@/backups/qdrant/<snapshot-file>"
+# (or restore as 'mem0' if you want to start with the legacy name)
+
+# Build + start the v2 backend
+docker compose build --no-cache backend
+docker compose up -d
+docker compose logs -f backend
+```
+
+Watch for:
+- `Applied Claude/OpenAI-compatible patch: cleared top_p (and store)` — patch loaded.
+- `Initialized ultra-minimal Mem0Manager with custom endpoint` — startup OK.
+- No errors mentioning `graph_store` or `enable_graph` (we removed them).
+- On any search/get_all: no `ValueError` from filters.
+
+## Phase 4 — Rebuild the Qdrant collection for BM25
+
+Pre-v2 collections lack the `bm25` sparse-vector slot. mem0 v2 silently
+downgrades to semantic-only on them — to get full hybrid search you must
+recreate the collection.
+
+```bash
+# 1. Set the env var so the v2 backend creates a NEW collection with the right schema
+docker compose exec backend sh -c 'QDRANT_COLLECTION_NAME=mem0_v3 \
+  python -c "from mem0_manager import mem0_manager; \
+             mem0_manager.memory.add([{\"role\":\"user\",\"content\":\"warm-up\"}], user_id=\"__warmup__\")"'
+# This lazy-creates mem0_v3 + mem0_v3_entities with the bm25 slot.
+# (Optionally delete the warm-up memory afterwards.)
+
+# 2. Run the migration script: preserves id + vector + payload, no re-embed.
+#    scripts/ is not bind-mounted into the backend container, so copy the script in first.
+docker compose cp scripts/migrate_qdrant_to_v3.py backend:/tmp/migrate_qdrant_to_v3.py
+docker compose exec backend python /tmp/migrate_qdrant_to_v3.py \
+  --source mem0 --target mem0_v3 \
+  --qdrant-host qdrant --qdrant-port 6333 --dry-run
+# Inspect the per-user counts. If OK, run for real:
+docker compose exec backend python /tmp/migrate_qdrant_to_v3.py \
+  --source mem0 --target mem0_v3 \
+  --qdrant-host qdrant --qdrant-port 6333
+
+# 3. Swap names. Qdrant has no in-place rename — use snapshot+upload.
+# Snapshot mem0_v3, upload as mem0_swap, then snapshot mem0 as mem0_legacy, then
+# upload mem0_swap as mem0. Or simply point QDRANT_COLLECTION_NAME at mem0_v3 in
+# docker-compose.yml and keep `mem0` around as the legacy backup.
+```
+
+Easiest path: **leave the legacy collection alone** and update
+`QDRANT_COLLECTION_NAME` to `mem0_v3` in `.env` / `docker-compose.yml`. The
+legacy `mem0` collection sits there as an extra backup until you delete it.
+
+## Phase 5 — Integration tests
+
+```bash
+MEM0_API_KEY=<dev-key-mapped-to-test-user> python test_integration.py -v
+```
+
+The test script generates a fresh `TEST_USER` per run — make sure the supplied
+API key maps to that user (see CLAUDE.md "There are no unit tests..." note).
+
+Expected: all pass. The `/graph/relationships/{user_id}` test should accept the
+new `deprecated: true` payload.
+
+## Phase 6 — Production cutover
+
+Maintenance window ~30 min.
+
+1. Communicate the window.
+2. Re-snapshot Qdrant immediately before the deploy (so the rollback snapshot
+   is the freshest possible).
+3. `git pull` the migration branch (or merge to main first).
+4. `docker compose build --no-cache backend && docker compose up -d backend`.
+5. Run the Phase 4 collection rebuild on prod.
+6. Smoke test: `/health`, one `/chat` round-trip, one `/memories` write, one
+   `/memories/search` read.
+7. Verify per-user counts match `pre-cutover-counts.txt` (use the same loop).
+
+## Rollback
+
+### Before the first v2 write hits prod (fully safe)
+
+```bash
+git revert <migration-commit-sha>
+docker compose build --no-cache backend
+docker compose up -d backend
+```
+
+### After cutover but snapshot still on disk (loses post-cutover writes)
+
+```bash
+# Stop the backend so no more writes land on the v2 collection
+docker compose stop backend
+
+# Restore the pre-cutover Qdrant snapshot to a fresh name, then swap
+docker compose exec qdrant curl -X POST \
+  "http://qdrant:6333/collections/mem0_rollback/snapshots/upload?priority=snapshot" \
+  -H "Content-Type: multipart/form-data" \
+  -F "snapshot=@/qdrant/snapshots/<pre-cutover-snapshot>"
+# Then either set QDRANT_COLLECTION_NAME=mem0_rollback, or rename the restored
+# collection via snapshot + upload.
+
+# Restore Neo4j from the dump in ./backups/neo4j, if the graph data is needed
+docker run --rm \
+  --volumes-from mem0-neo4j \
+  -v "$(pwd)/backups/neo4j:/dumps" \
+  neo4j:5.26.4 \
+  neo4j-admin database load neo4j --from-path=/dumps --overwrite-destination=true
+
+# Revert code and restart
+git revert <migration-commit-sha>
+docker compose build --no-cache backend
+docker compose up -d backend
+```
+
+### After snapshot retention expires
+
+**Irreversible.** Once the pre-cutover snapshot is gone there is no rollback
+path, so keep the pre-cutover snapshot and Neo4j dump for ≥30 days.
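
The count check in step 7 of the Phase 6 cutover can be sketched as a small comparison helper. This is only an illustration: the `user_id: count` line format assumed for `pre-cutover-counts.txt` is a guess (the runbook does not specify it), and `parse_counts` / `diff_counts` are hypothetical names that do not exist in this repository.

```python
from collections import Counter


def parse_counts(text: str) -> Counter:
    """Parse lines shaped like `user_id: count` (an assumed format) into a Counter."""
    counts: Counter = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so user ids that contain ':' still parse.
        uid, _, raw = line.rpartition(":")
        counts[uid.strip()] = int(raw)
    return counts


def diff_counts(before: Counter, after: Counter) -> dict:
    """Return {user_id: (before, after)} for every user whose count changed."""
    users = set(before) | set(after)
    return {
        uid: (before.get(uid, 0), after.get(uid, 0))
        for uid in sorted(users)
        if before.get(uid, 0) != after.get(uid, 0)
    }


if __name__ == "__main__":
    before = parse_counts("alice: 3\nbob: 2\n")
    after = parse_counts("alice: 3\nbob: 1\ncarol: 4\n")
    # Users with unchanged counts are omitted from the result.
    print(diff_counts(before, after))
```

An empty result would mean the post-cutover counts match the saved file. Any entry is worth investigating before the legacy collection or its snapshot is removed.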
--- a/scripts/migrate_qdrant_to_v3.py
+++ b/scripts/migrate_qdrant_to_v3.py
@ -0,0 +1,189 @@
+#!/usr/bin/env python3
+"""Migrate a legacy mem0 v0.1.x/v1.x Qdrant collection to a v2-compatible one.
+
+Why this script exists
+----------------------
+mem0 v2 stores a `bm25` sparse vector alongside each dense vector to enable
+hybrid search. Pre-v2 collections lack that slot — mem0's Qdrant adapter
+silently downgrades to semantic-only writes on them. To unlock BM25 you must
+recreate the collection with the sparse slot AND copy the existing points
+over (preserving id, vector, payload — no re-embed needed).
+
+How it works
+------------
+1. Connect to Qdrant.
+2. Scroll all points from the source collection (with vectors + payload).
+3. Upsert them into the target collection in batches.
+4. Verify counts match per `user_id`.
+
+The target collection MUST already exist with the BM25 slot. The recommended
+way to create it is to boot the v2 backend pointed at `QDRANT_COLLECTION_NAME=<target>`
+and trigger one `add()` call — mem0 lazy-creates the collection (and the sister
+`<target>_entities` collection) with the right schema.
+
+Usage
+-----
+    # Dry run (no writes):
+    python scripts/migrate_qdrant_to_v3.py \\
+        --source mem0 --target mem0_v3 \\
+        --qdrant-host localhost --qdrant-port 6333 \\
+        --dry-run
+
+    # Real migration:
+    python scripts/migrate_qdrant_to_v3.py \\
+        --source mem0 --target mem0_v3 \\
+        --qdrant-host localhost --qdrant-port 6333
+
+    # From inside the backend container (where Qdrant resolves as `qdrant`):
+    docker compose exec backend python /app/../scripts/migrate_qdrant_to_v3.py \\
+        --source mem0 --target mem0_v3 --qdrant-host qdrant --qdrant-port 6333
+
+Prereqs
+-------
+- qdrant-client>=1.12.0 installed
+- A fresh Qdrant snapshot of the source collection (see docs/MIGRATION_RUNBOOK.md)
+- The target collection created via a v2 backend warm-up add()
+"""
+
+import argparse
+import sys
+from collections import Counter
+from typing import Optional
+
+from qdrant_client import QdrantClient
+from qdrant_client.http import models
+
+
+def parse_args() -> argparse.Namespace:
+    p = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
+    p.add_argument("--source", required=True, help="Source (legacy) collection name")
+    p.add_argument("--target", required=True, help="Target (v2-created) collection name")
+    p.add_argument("--qdrant-host", default="localhost")
+    p.add_argument("--qdrant-port", type=int, default=6333)
+    p.add_argument("--batch-size", type=int, default=256, help="Scroll/upsert batch size")
+    p.add_argument("--dry-run", action="store_true", help="Read-only — show counts, no writes")
+    return p.parse_args()
+
+
+def collection_must_exist(client: QdrantClient, name: str) -> models.CollectionInfo:
+    if not client.collection_exists(name):
+        print(f"ERROR: collection {name!r} does not exist on Qdrant.", file=sys.stderr)
+        sys.exit(2)
+    return client.get_collection(name)
+
+
+def verify_target_has_bm25(target_info: models.CollectionInfo) -> None:
+    sparse = getattr(target_info.config.params, "sparse_vectors", None)
+    if not sparse or "bm25" not in sparse:
+        print(
+            "ERROR: target collection has no `bm25` sparse-vector slot. Did you create "
+            "it via a v2 backend warm-up add()? See docs/MIGRATION_RUNBOOK.md.",
+            file=sys.stderr,
+        )
+        sys.exit(2)
+
+
+def count_per_user(client: QdrantClient, collection: str) -> Counter:
+    """Count points per `user_id` by scrolling the whole collection (payload only)."""
+    counts: Counter = Counter()
+    offset: Optional[models.PointId] = None
+    while True:
+        points, offset = client.scroll(
+            collection_name=collection,
+            limit=1024,
+            with_payload=["user_id"],
+            with_vectors=False,
+            offset=offset,
+        )
+        for p in points:
+            # Treat a missing or null user_id as "<none>" so such points are still counted.
+            uid = (p.payload or {}).get("user_id") or "<none>"
+            counts[uid] += 1
+        if offset is None:
+            break
+    return counts
+
+
+def migrate(
+    client: QdrantClient, source: str, target: str, batch_size: int, dry_run: bool
+) -> int:
+    """Copy all points from `source` to `target` in batches; return how many were read."""
+    transferred = 0
+    offset: Optional[models.PointId] = None
+    while True:
+        points, offset = client.scroll(
+            collection_name=source,
+            limit=batch_size,
+            with_payload=True,
+            with_vectors=True,
+            offset=offset,
+        )
+        if not points:
+            break
+
+        if not dry_run:
+            client.upsert(
+                collection_name=target,
+                points=[
+                    models.PointStruct(id=p.id, vector=p.vector, payload=p.payload)
+                    for p in points
+                ],
+                wait=True,
+            )
+        transferred += len(points)
+        print(f"  ... transferred {transferred} points")
+
+        if offset is None:
+            break
+    return transferred
+
+
+def main() -> None:
+    args = parse_args()
+    client = QdrantClient(host=args.qdrant_host, port=args.qdrant_port)
+
+    collection_must_exist(client, args.source)  # exits if the source is missing
+    tgt_info = collection_must_exist(client, args.target)
+    verify_target_has_bm25(tgt_info)
+
+    src_count = client.count(args.source, exact=True).count
+    tgt_count_before = client.count(args.target, exact=True).count
+    print(f"Source {args.source!r}: {src_count} points")
+    print(f"Target {args.target!r}: {tgt_count_before} points (before)")
+    # One point is expected: the warm-up add() that created the target collection.
+    if tgt_count_before > 1:
+        print(
+            "WARNING: target collection is non-empty (>1 point). Migration will "
+            "upsert into it; points whose ids collide will be overwritten."
+        )
+
+    print("\nPer-user count (source):")
+    src_per_user = count_per_user(client, args.source)
+    for uid, c in src_per_user.most_common():
+        print(f"  {uid}: {c}")
+
+    if args.dry_run:
+        print("\nDRY RUN — no writes performed.")
+        sys.exit(0)
+
+    print("\nMigrating points (preserving id + vector + payload, no re-embed)...")
+    transferred = migrate(client, args.source, args.target, args.batch_size, dry_run=False)
+
+    tgt_count_after = client.count(args.target, exact=True).count
+    print(f"\nDone. Transferred {transferred} points.")
+    print(f"Target {args.target!r}: {tgt_count_after} points (after)")
+
+    print("\nPer-user count (target, after):")
+    tgt_per_user = count_per_user(client, args.target)
+    mismatches = 0
+    for uid, src_c in src_per_user.most_common():
+        tgt_c = tgt_per_user.get(uid, 0)
+        marker = "OK" if tgt_c == src_c else f"MISMATCH ({tgt_c})"
+        print(f"  {uid}: src={src_c} tgt={tgt_c} [{marker}]")
+        if tgt_c != src_c:
+            mismatches += 1
+
+    if mismatches:
+        print(f"\nERROR: {mismatches} user(s) have count mismatches. Investigate before swap.", file=sys.stderr)
+        sys.exit(3)
+    print("\nAll per-user counts match. Safe to proceed with collection swap.")
+
+
+if __name__ == "__main__":
+    main()
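
The two scroll loops in the script (`count_per_user` and `migrate`) share one cursor pattern: fetch a page, process it, and stop when the returned offset is `None`. The sketch below isolates that pattern so it can be run without a Qdrant server. It is a simplified illustration, not the script's code: `iter_pages` and `make_fake_fetch` are hypothetical names, and the in-memory fetcher only mimics the `(points, next_offset)` return shape of `QdrantClient.scroll`.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page fetcher takes the current cursor and returns (items, next_cursor).
# In the script this role is played by QdrantClient.scroll; here it is a
# plain callable so the loop can be exercised without a Qdrant server.
PageFn = Callable[[Optional[int]], Tuple[List[str], Optional[int]]]


def iter_pages(fetch: PageFn) -> Iterator[List[str]]:
    """Yield non-empty pages until the fetcher reports no next cursor."""
    cursor: Optional[int] = None
    while True:
        items, cursor = fetch(cursor)
        if items:
            yield items
        if cursor is None:
            break


def make_fake_fetch(data: List[str], page_size: int) -> PageFn:
    """Hypothetical in-memory stand-in for a scroll endpoint."""

    def fetch(cursor: Optional[int]) -> Tuple[List[str], Optional[int]]:
        start = cursor or 0
        end = start + page_size
        # Report a next cursor only while unread items remain.
        next_cursor = end if end < len(data) else None
        return data[start:end], next_cursor

    return fetch


if __name__ == "__main__":
    pages = list(iter_pages(make_fake_fetch(["a", "b", "c", "d", "e"], 2)))
    print(pages)
```

Keeping the pagination in a generator lets the counting and upsert logic be tested against a fake fetcher, while the real run would pass a thin wrapper around the Qdrant client.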
--- a/setup.sh
+++ b/setup.sh
@ -3,7 +3,7 @@
 set -e

 COMPOSE_PROJECT_NAME="${COMPOSE_PROJECT_NAME:-mem0}"
-VOLUMES=("qdrant_data" "neo4j_data" "neo4j_logs" "neo4j_import" "neo4j_plugins")
+VOLUMES=("qdrant_data")

 echo "=========================================="
 echo "   Mem0 Setup Script"