Pin mem0ai[nlp]==2.0.2 and fastembed for the new hybrid-search pipeline. Drop OSS graph memory (removed upstream in 2.0.0, PR #4805): remove Neo4j service, env vars, volumes, and driver deps; mark /graph/relationships deprecated. Rewrite Memory.search/get_all/chat/health call sites to use the v2 filters={} + top_k API (entity IDs at top level now raise ValueError). Tighten MCP remove_memory ownership check to O(1) verify_memory_ownership so it doesn't silently truncate at the new top_k=20 default. Downgrade base image to python:3.12-slim for spaCy. Adds scripts/migrate_qdrant_to_v3.py (scroll+upsert with per-user count parity check) and docs/MIGRATION_RUNBOOK.md covering snapshot, dump, collection rebuild, cutover, and rollback procedures.
# Migration Runbook: mem0 v0.1.x/v1.x → v2.0.2 (the "V3 pipeline")

This runbook covers the **operational** half of the migration: backups, the
Qdrant collection rebuild for BM25, the Neo4j dump, cutover, and rollback. The
**code-level** half (Dockerfile, requirements, mem0_manager rewrites, etc.) is
already committed on this branch; follow this document when taking those code
changes to a stack that has live data.

## TL;DR

1. Snapshot Qdrant + dump Neo4j (Phase 2).
2. Deploy v2 backend to a scratch stack (Phase 3).
3. Rebuild the Qdrant collection with BM25 by warm-up-add → scroll/upsert →
   swap (Phase 4).
4. Run integration tests (Phase 5).
5. Cut over production with the same steps inside a maintenance window (Phase 6).
## Phase 1: Pre-flight

```bash
# 1. Capture the exact running mem0ai version (record for the ticket)
docker compose exec backend pip show mem0ai

# 2. Tag the pre-migration commit
git tag pre-mem0-v3-migration && git push --tags

# 3. Check how much data the volumes hold, then confirm the host has at least
#    that much free disk for snapshots and dumps (they can be sizeable)
docker system df -v | grep -E "qdrant_data|neo4j_data"
df -h .
```
## Phase 2: Backups (read-only on production)

### Qdrant snapshot

```bash
# Create a snapshot of the live mem0 collection. curl runs in the backend
# container because Qdrant is reachable inside the compose network as 'qdrant'.
docker compose exec backend curl -X POST \
  "http://qdrant:6333/collections/mem0/snapshots?wait=true"

# Returns JSON with the snapshot filename, e.g. mem0-XXXXXXXX.snapshot.
# Copy it off the qdrant container's volume. If the directory below is empty,
# check the snapshots path configured for your Qdrant image (commonly
# /qdrant/snapshots/mem0/) and adjust both commands.
docker compose exec qdrant ls /qdrant/storage/collections/mem0/snapshots/
mkdir -p ./backups/qdrant
docker cp mem0-qdrant:/qdrant/storage/collections/mem0/snapshots/<snapshot-file> ./backups/qdrant/
```
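
When scripting the copy step, the snapshot filename can be read from the JSON returned by the snapshot call instead of being copied by hand. The following is a minimal sketch, assuming the response uses the usual Qdrant REST envelope with a `result.name` field and `"status": "ok"`; `snapshot_name` is a hypothetical helper, not something in this repository.

```python
import json


def snapshot_name(response_body: str) -> str:
    """Return the snapshot filename from a create-snapshot response.

    Assumes the envelope {"result": {"name": ...}, "status": "ok"}.
    """
    payload = json.loads(response_body)
    if payload.get("status") != "ok":
        raise ValueError(f"snapshot request failed: {payload.get('status')!r}")
    name = (payload.get("result") or {}).get("name")
    if not name:
        raise ValueError("response has no result.name field")
    return name


if __name__ == "__main__":
    # Illustrative body only; the filename is the placeholder used above.
    sample = '{"result": {"name": "mem0-XXXXXXXX.snapshot"}, "status": "ok"}'
    print(snapshot_name(sample))
```

Failing loudly on a missing name keeps a later `docker cp` from running with an empty path.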
### Neo4j offline dump (decommission path)

In Neo4j 5.x, `neo4j-admin database dump` requires the database to be stopped.

```bash
mkdir -p ./backups/neo4j
docker compose stop neo4j
docker run --rm \
  --volumes-from mem0-neo4j \
  -v "$(pwd)/backups/neo4j:/dumps" \
  neo4j:5.26.4 \
  neo4j-admin database dump neo4j --to-path=/dumps
# (No need to restart neo4j; it is being decommissioned.)
```

Keep both backups for **at least 30 days** post-cutover. Set a calendar reminder.
### Pre-cutover per-user memory counts

```bash
# Iterate over the users in API_KEYS, hit /stats/{user_id}, and save each count.
# Assumes API_KEYS is a JSON object mapping API key -> user ID; adjust per your auth.
for user in $(jq -r '[.[]] | unique[]' <<< "$API_KEYS"); do
  echo -n "$user: "
  # -T disables the pseudo-TTY so the output pipes cleanly into jq
  docker compose exec -T backend curl -s -H "X-API-Key: <admin-or-user-key>" \
    "http://localhost:8000/stats/$user" | jq -r '.memory_count // 0'
done > pre-cutover-counts.txt
```
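
The counts file is only useful if it is compared against a second capture after cutover. The sketch below shows one way to do that comparison, assuming both captures use the `user: count` line format written by the loop above; `parse_counts` and `count_mismatches` are hypothetical helpers, and a second file such as `post-cutover-counts.txt` is an assumed name.

```python
def parse_counts(text: str) -> dict[str, int]:
    """Parse lines of the form 'user: count' into a dict."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last colon so user IDs containing ':' still parse.
        user, sep, value = line.rpartition(":")
        if not sep or not user.strip() or not value.strip():
            raise ValueError(f"malformed counts line: {line!r}")
        counts[user.strip()] = int(value.strip())
    return counts


def count_mismatches(before: dict[str, int], after: dict[str, int]):
    """Return (user, before, after) for every user whose count changed.

    A user missing from one side is reported with None for that side.
    """
    users = sorted(set(before) | set(after))
    return [(u, before.get(u), after.get(u))
            for u in users if before.get(u) != after.get(u)]


if __name__ == "__main__":
    # Inline samples stand in for the two captured files.
    pre = parse_counts("alice: 3\nbob: 0\n")
    post = parse_counts("alice: 3\nbob: 1\n")
    for user, old, new in count_mismatches(pre, post):
        print(f"{user}: {old} -> {new}")
```

An empty result means parity; anything else should block the cutover sign-off until explained.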
## Phase 3: Deploy v2 backend (scratch stack first)

Use a developer or staging machine with a **restored copy** of the prod
snapshot, not prod itself.

```bash
# Restore the prod snapshot onto the scratch Qdrant. Qdrant and a backend
# container must already be running for this exec to work, and the @path is
# read inside the backend container, so mount or copy ./backups there first.
docker compose exec backend curl -X POST \
  "http://qdrant:6333/collections/mem0_legacy/snapshots/upload?priority=snapshot" \
  -H "Content-Type: multipart/form-data" \
  -F "snapshot=@/backups/qdrant/<snapshot-file>"
# (or restore as 'mem0' if you want to start with the legacy name; Phase 4
# uses --source mem0, so pass whichever name you restored to)

# Build + start the v2 backend
docker compose build --no-cache backend
docker compose up -d
docker compose logs -f backend
```

Watch for:

- `Applied Claude/OpenAI-compatible patch: cleared top_p (and store)`: patch loaded.
- `Initialized ultra-minimal Mem0Manager with custom endpoint`: startup OK.
- No errors mentioning `graph_store` or `enable_graph` (we removed them).
- On any search/get_all: no `ValueError` from filters.
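
The checklist above can also be applied to a captured log instead of being eyeballed. This is a rough sketch, assuming the log text was saved from `docker compose logs backend`; `check_startup_logs` is a hypothetical helper, and it flags any line containing the listed tokens, which is stricter than "errors mentioning" them.

```python
# Markers taken from the checklist above.
EXPECTED_MARKERS = (
    "Applied Claude/OpenAI-compatible patch",
    "Initialized ultra-minimal Mem0Manager",
)
FORBIDDEN_TOKENS = ("graph_store", "enable_graph", "ValueError")


def check_startup_logs(log_text: str):
    """Return (missing_markers, offending_lines) for a backend log dump."""
    missing = [m for m in EXPECTED_MARKERS if m not in log_text]
    offending = [
        line for line in log_text.splitlines()
        if any(token in line for token in FORBIDDEN_TOKENS)
    ]
    return missing, offending


if __name__ == "__main__":
    sample = (
        "Applied Claude/OpenAI-compatible patch: cleared top_p (and store)\n"
        "Initialized ultra-minimal Mem0Manager with custom endpoint\n"
    )
    missing, offending = check_startup_logs(sample)
    print("missing:", missing)
    print("offending:", offending)
```

Both lists being empty corresponds to a clean startup by the criteria listed above.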
## Phase 4: Rebuild the Qdrant collection for BM25

Pre-v2 collections lack the `bm25` sparse-vector slot. mem0 v2 silently
downgrades to semantic-only search on them, so to get full hybrid search you
must recreate the collection.
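
Because the downgrade is silent, it helps to confirm which collections actually carry the sparse slot. The sketch below inspects the JSON returned by `GET /collections/<name>`, assuming Qdrant reports named sparse vectors under `result.config.params.sparse_vectors`; `has_sparse_slot` is a hypothetical helper.

```python
def has_sparse_slot(collection_info: dict, slot: str = "bm25") -> bool:
    """Report whether a collection-info response lists the given sparse vector.

    Assumes sparse vectors appear as a mapping keyed by name under
    result.config.params.sparse_vectors (absent or null when none exist).
    """
    params = (
        (collection_info.get("result") or {})
        .get("config", {})
        .get("params", {})
    )
    sparse = params.get("sparse_vectors") or {}
    return slot in sparse


if __name__ == "__main__":
    # Trimmed, illustrative responses; real ones carry many more fields.
    legacy = {"result": {"config": {"params": {"vectors": {"size": 1536}}}}}
    rebuilt = {"result": {"config": {"params": {
        "vectors": {"size": 1536},
        "sparse_vectors": {"bm25": {}},
    }}}}
    print(has_sparse_slot(legacy), has_sparse_slot(rebuilt))
```

Running this against both the legacy and the rebuilt collection gives a quick before/after check for the steps that follow.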

```bash
# 1. Set the env var so the v2 backend creates a NEW collection with the right schema
docker compose exec backend sh -c 'QDRANT_COLLECTION_NAME=mem0_v3 \
  python -c "from mem0_manager import mem0_manager; \
  mem0_manager.memory.add([{\"role\":\"user\",\"content\":\"warm-up\"}], user_id=\"__warmup__\")"'
# This lazy-creates mem0_v3 + mem0_v3_entities with the bm25 slot.
# (Delete the warm-up memory afterwards if you care.)

# 2. Run the migration script as a dry run first. It preserves id + vector +
#    payload and does not re-embed.
docker compose exec backend python /app/../scripts/migrate_qdrant_to_v3.py \
  --source mem0 --target mem0_v3 \
  --qdrant-host qdrant --qdrant-port 6333 --dry-run
# Inspect the per-user counts. If OK, run for real:
docker compose exec backend python /app/../scripts/migrate_qdrant_to_v3.py \
  --source mem0 --target mem0_v3 \
  --qdrant-host qdrant --qdrant-port 6333

# 3. Swap names. Qdrant has no in-place rename, so a true swap goes through
#    snapshot + upload: snapshot mem0_v3 and upload it as mem0_swap, snapshot
#    mem0 and upload it as mem0_legacy, then upload the mem0_swap snapshot as
#    mem0. Or simply point QDRANT_COLLECTION_NAME at mem0_v3 in
#    docker-compose.yml and keep `mem0` around as the legacy backup.
```
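
The scroll + upsert pattern behind step 2 can be illustrated without a live Qdrant. The sketch below is not the contents of `scripts/migrate_qdrant_to_v3.py`; it is a hypothetical, in-memory illustration that assumes a scroll call returns a page plus a next offset (`None` when exhausted) and that each point's payload carries a `user_id` key. Because points are written under their original IDs, rerunning the copy is idempotent, and the returned per-user tally is what a parity check would compare.

```python
from collections import Counter


def copy_points(scroll_page, upsert_batch, batch_size=256):
    """Copy all points page by page; return per-user counts of copied points.

    scroll_page(offset, limit) -> (points, next_offset), next_offset is None
    on the last page. upsert_batch(points) writes points under their own IDs.
    """
    copied = Counter()
    offset = None
    while True:
        points, offset = scroll_page(offset, batch_size)
        if points:
            upsert_batch(points)
            for point in points:
                user = (point.get("payload") or {}).get("user_id", "<none>")
                copied[user] += 1
        if offset is None:
            break
    return copied


def make_fake_scroll(points):
    """In-memory stand-in for a paged scroll over a source collection."""
    def scroll_page(offset, limit):
        start = offset or 0
        page = points[start:start + limit]
        next_offset = start + limit if start + limit < len(points) else None
        return page, next_offset
    return scroll_page


if __name__ == "__main__":
    source = [
        {"id": i, "vector": [0.0], "payload": {"user_id": "alice" if i % 2 else "bob"}}
        for i in range(5)
    ]
    target = {}
    counts = copy_points(
        make_fake_scroll(source),
        lambda batch: target.update({p["id"]: p for p in batch}),
        batch_size=2,
    )
    print(dict(counts), len(target))
```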

Easiest path: **leave the legacy collection alone** and update
`QDRANT_COLLECTION_NAME` to `mem0_v3` in `.env` / `docker-compose.yml`. The
legacy `mem0` collection then remains as an extra backup until you delete it.

## Phase 5: Integration tests

```bash
MEM0_API_KEY=<dev-key-mapped-to-test-user> python test_integration.py -v
```

The test script generates a fresh `TEST_USER` per run, so make sure the supplied
API key maps to that user (see the CLAUDE.md "There are no unit tests..." note).

Expected: all tests pass. The `/graph/relationships/{user_id}` test should
accept the new `deprecated: true` payload.

## Phase 6: Production cutover

Plan a maintenance window of about 30 minutes.

1. Communicate the window.
2. Re-snapshot Qdrant immediately before the deploy (so the rollback snapshot
   is the freshest possible).
3. `git pull` the migration branch (or merge to main first).
4. `docker compose build --no-cache backend && docker compose up -d backend`.
5. Run the Phase 4 collection rebuild on prod.
6. Smoke test: `/health`, one `/chat` round-trip, one `/memories` write, one
   `/memories/search` read.
7. Verify per-user counts match `pre-cutover-counts.txt` (use the same loop),
   allowing for any memory written by the smoke test.

## Rollback

### Before the first v2 write hits prod (fully safe)

```bash
git revert <migration-commit-sha>
docker compose build --no-cache backend
docker compose up -d backend
```

### After cutover, while the snapshot is still on disk (loses post-cutover writes)

```bash
# Stop the backend so no more writes land on the v2 collection
docker compose stop backend

# Restore the pre-cutover Qdrant snapshot to a fresh name, then swap.
# If the qdrant image does not ship curl, send the same request from any
# container on the compose network that has curl and can read the snapshot file.
docker compose exec qdrant curl -X POST \
  "http://qdrant:6333/collections/mem0_rollback/snapshots/upload?priority=snapshot" \
  -H "Content-Type: multipart/form-data" \
  -F "snapshot=@/qdrant/snapshots/<pre-cutover-snapshot>"
# Update QDRANT_COLLECTION_NAME=mem0_rollback or rename via snapshot+upload.

# Restore Neo4j if needed (the neo4j container must stay stopped during the load)
docker run --rm \
  --volumes-from mem0-neo4j \
  -v "$(pwd)/backups/neo4j:/dumps" \
  neo4j:5.26.4 \
  neo4j-admin database load neo4j --from-path=/dumps --overwrite-destination=true

# Revert code and restart
git revert <migration-commit-sha>
docker compose build --no-cache backend
docker compose up -d backend
```

### After snapshot retention expires

**Irreversible.** Keep the pre-cutover snapshot and Neo4j dump for ≥30 days.