The Qdrant snapshots directory is /qdrant/snapshots/{collection}/, not
/qdrant/storage/collections/{collection}/snapshots/ as the script assumed.
Verified against the running mem0-qdrant container on beast.
Three S-effort wins from the post-migration audit:
#1 Enable the Cohere reranker on both Memory.search call sites
   (rerank=True), over-fetch top_k=max(limit*3, 30) to give the
   reranker a 30-50 candidate pool, then truncate to the caller's
   limit. Bump the reranker config to rerank-v3.5 (4096-token context,
   multilingual, which matters for Hindi/Hinglish traffic) and top_n
   10 → 50 so the output cap doesn't truncate below typical over-fetch
   sizes. Cohere was configured but never invoked; this is the single
   biggest quality lift the audit surfaced.
#2 Add scripts/backup_qdrant.sh and scripts/restore_test.sh. Daily
   snapshot of both collections back-to-back, docker cp to a local
   YYYY-MM-DD dir, optional rclone off-host, prune local copies older
   than 14 days, emit a Prometheus textfile metric. The weekly
   restore_test.sh restores into a transient collection and asserts
   point-count parity. Closes the zero-automated-backup gap.
#3 Add CUSTOM_FACT_EXTRACTION_INSTRUCTIONS, wired via MemoryConfig's
   custom_instructions field. mem0 appends this as its own
   '## Custom Instructions' section in the additive-extraction user
   prompt (verified against generate_additive_extraction_prompt); it
   does not replace mem0's role/format guidance. Re-prioritizes the
   default consumer-organizer few-shots toward work/projects/
   relationships/recurring context, the actual usage pattern here.
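For #1, the over-fetch-then-truncate flow can be sketched as below. This is a sketch under assumptions: `search_with_rerank`, `fetch_size`, and the injected `search`/`rerank` callables are hypothetical stand-ins (the real call sites pass rerank=True to Memory.search and let mem0 invoke Cohere), so only the sizing and truncation logic is meant to carry over.

```python
from typing import Callable, List, Sequence

# Mirrors the reranker config change in #1 (top_n raised to 50).
RERANK_TOP_N = 50

SearchFn = Callable[[str, int], Sequence[str]]
RerankFn = Callable[[str, Sequence[str], int], Sequence[str]]


def fetch_size(limit: int) -> int:
    """Over-fetch rule from #1: three times the limit, never fewer than 30."""
    return max(limit * 3, 30)


def search_with_rerank(query: str, limit: int, search: SearchFn, rerank: RerankFn) -> List[str]:
    """Hypothetical helper: over-fetch candidates, rerank them, truncate to `limit`."""
    candidates = search(query, fetch_size(limit))
    # The reranker returns at most RERANK_TOP_N items, best first.
    reranked = rerank(query, candidates, RERANK_TOP_N)
    return list(reranked[:limit])


if __name__ == "__main__":
    docs = [f"memory {i}" for i in range(100)]

    def fake_search(query: str, k: int) -> List[str]:
        return docs[:k]  # pretend vector-similarity order equals list order

    def fake_rerank(query: str, cands: Sequence[str], top_n: int) -> List[str]:
        return list(reversed(cands))[:top_n]  # pretend the reranker reverses the order

    print(search_with_rerank("q", 5, fake_search, fake_rerank))
    # prints ['memory 29', 'memory 28', 'memory 27', 'memory 26', 'memory 25']
```

Injecting the two callables keeps the sketch runnable without Qdrant or a Cohere key. Because the reranker returns min(pool, 50) items, a caller asking for more than 50 results would get only 50 back; below that, the final slice is what enforces the caller's limit.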
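For #2, scripts/backup_qdrant.sh is a shell script. The Python sketch below mirrors only two of its steps so the logic is easy to check: selecting YYYY-MM-DD directories older than 14 days, and writing a Prometheus textfile-collector metric atomically. The metric names, the `qdrant_backup.prom` filename, and both function names are assumptions and are not taken from the script.

```python
import datetime as dt
import os
import tempfile
from pathlib import Path
from typing import List

RETENTION_DAYS = 14


def dirs_to_prune(backup_root: Path, today: dt.date, keep_days: int = RETENTION_DAYS) -> List[Path]:
    """Return YYYY-MM-DD backup dirs older than keep_days; anything else is left alone."""
    cutoff = today - dt.timedelta(days=keep_days)
    stale = []
    for entry in sorted(backup_root.iterdir()):
        if not entry.is_dir():
            continue
        try:
            day = dt.date.fromisoformat(entry.name)
        except ValueError:
            continue  # not a dated backup dir
        if day < cutoff:
            stale.append(entry)
    return stale


def write_textfile_metric(textfile_dir: Path, success: bool, now: float) -> Path:
    """Write a node_exporter textfile metric (hypothetical metric names)."""
    lines = [
        "# HELP qdrant_backup_success Whether the last Qdrant backup run succeeded.",
        "# TYPE qdrant_backup_success gauge",
        f"qdrant_backup_success {1 if success else 0}",
        "# HELP qdrant_backup_last_run_timestamp_seconds Unix time of the last backup run.",
        "# TYPE qdrant_backup_last_run_timestamp_seconds gauge",
        f"qdrant_backup_last_run_timestamp_seconds {now:.0f}",
    ]
    target = textfile_dir / "qdrant_backup.prom"
    # Write to a temp file, then rename, so the collector never reads a partial file.
    fd, tmp = tempfile.mkstemp(dir=textfile_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    os.replace(tmp, target)
    return target
```

Deletion is then a `shutil.rmtree(path)` per returned path. Keeping selection separate from deletion makes a dry run trivial.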
Two helpers built during the beast deployment to migrate the legacy
Neo4j knowledge graph (decommissioned in the v3 cutover) into mem0 v2
as natural-language memories.
scripts/import_neo4j_to_mem0.py
- Connects to Neo4j via Bolt, iterates per-user relationships,
  POSTs each as a /memories request.
- Two modes:
    raw:           "humanize(src) verb humanize(dest)." (snake_case → spaces)
    --llm-rewrite: minimax-m2 via OpenAI-compat proxy rewrites each
                   tuple into a grammatical English sentence; the LLM
                   may also output SKIP for non-meaningful tuples
                   (postal codes, timezone offsets, self-refs).
- Tags every imported memory with metadata.source="neo4j_legacy_import"
  plus neo4j_rel_type + import_timestamp for traceability/cleanup.
- Caches LLM rewrites by (source, rel, dest, user_id).
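The raw-mode template and the rewrite cache can be sketched in a few lines. `humanize`, `raw_sentence`, and `RewriteCache` are illustrative names. The verb is assumed to be the Neo4j relationship type, humanized and lowercased, because the commit message doesn't say where `verb` comes from. The `rewrite` callable stands in for the minimax-m2 call, and the cache here is in-memory only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

CacheKey = Tuple[str, str, str, str]  # (source, rel, dest, user_id)


def humanize(name: str) -> str:
    """snake_case -> spaces, e.g. 'acme_corp' -> 'acme corp'."""
    return name.replace("_", " ").strip()


def raw_sentence(src: str, rel: str, dest: str) -> str:
    """Raw mode: 'humanize(src) verb humanize(dest).' (verb derivation is assumed)."""
    return f"{humanize(src)} {humanize(rel).lower()} {humanize(dest)}."


@dataclass
class RewriteCache:
    """Memoizes LLM rewrites so each tuple is requested from the model at most once."""

    rewrite: Callable[[str, str, str], str]  # stand-in for the LLM call
    calls: int = 0
    _cache: Dict[CacheKey, str] = field(default_factory=dict)

    def get(self, src: str, rel: str, dest: str, user_id: str) -> Optional[str]:
        key = (src, rel, dest, user_id)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self.rewrite(src, rel, dest)
        text = self._cache[key].strip()
        # The LLM may answer SKIP for tuples not worth storing; the caller then skips the POST.
        return None if text == "SKIP" else text
```

Keying on user_id as well as the tuple means two users with an identical edge each get their own rewrite, which matters if the prompt includes per-user context.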
scripts/cleanup_neo4j_imports.py
- Finds and DELETEs all memories with source="neo4j_legacy_import"
  for given users, via the /memories DELETE endpoint (per-user API
  key, so the deletes go through mem0's normal auth + cleanup path).
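Selecting what to delete comes down to a metadata filter over each user's memories. The sketch assumes each memory is a dict with `id` and `metadata` keys (an assumption about the response shape). `import_ids_to_delete` is a hypothetical name, and the per-user DELETE calls against /memories are left out.

```python
from typing import Any, Dict, Iterable, List

IMPORT_SOURCE = "neo4j_legacy_import"


def import_ids_to_delete(memories: Iterable[Dict[str, Any]]) -> List[str]:
    """Return IDs of memories tagged by the Neo4j importer; everything else is kept."""
    ids = []
    for mem in memories:
        metadata = mem.get("metadata") or {}  # tolerate missing or null metadata
        if metadata.get("source") == IMPORT_SOURCE:
            ids.append(mem["id"])
    return ids
```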
Run on beast (2026-05-23): 2007 Neo4j edges → 615 net new memories in
mem0_v3 (30.6% yield after LLM SKIPs + mem0 fact-extraction dedup).
mem0 v3's fact extractor correctly deduplicated edges that restated
facts already in vector memory (e.g., manju's 9 existing memories
absorbed all 17 of her Neo4j edges).
Backend startup needs ~30-60s (spaCy NLP models load, mem0 v2 init,
MCP session manager, 4 workers). The Dockerfile's 5s start-period was
too short, so willfarrell/autoheal (running on the host with
AUTOHEAL_CONTAINER_LABEL=all) restarted the container before it finished
booting. Overriding the healthcheck in compose with a longer start_period
keeps probe failures during boot from counting toward the retry limit.
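A simplified model of Docker's healthcheck semantics shows why start_period matters. Failures inside the start period don't count toward `retries`, and the first passing probe marks the container healthy. `first_unhealthy_time` is a hypothetical helper. It ignores probe timeouts and `start_interval`, and it assumes the app fails every probe until booted and passes from then on. The 30s interval and 3 retries defaults match Docker's documented defaults. The values in the example below are illustrative and do not come from the Dockerfile.

```python
from typing import Optional


def first_unhealthy_time(
    boot_seconds: float,
    start_period: float,
    interval: float = 30.0,
    retries: int = 3,
    horizon: float = 600.0,
) -> Optional[float]:
    """Time (s) at which the container is marked unhealthy, or None if it turns healthy.

    Simplified model: probes run every `interval` seconds starting at t=interval;
    a probe fails until the app has booted, and failures inside the start period
    are not counted toward `retries`.
    """
    failures = 0
    t = interval
    while t <= horizon:
        if t >= boot_seconds:
            return None  # first passing probe: container is healthy
        if t >= start_period:
            failures += 1
            if failures >= retries:
                return t  # autoheal would restart the container at this point
        t += interval
    return None
```

With a 10s interval and 3 retries, a 45s boot is marked unhealthy at t=30 under a 5s start period, but survives under a 90s one.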
Existing prod storage was written by a newer Qdrant (qdrant/qdrant:latest
resolved to 1.18.1 on 2026-05-23). Pinning to 1.12.4 caused a shard-holder
deserialization panic on startup because Qdrant's storage format is
backward-compatible (newer versions read older storage) but not
forward-compatible (older versions cannot read storage written by newer ones).
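One way to catch this before Qdrant panics is a pre-start guard that refuses to run an image older than the version that last wrote the storage. `parse_version`, `can_open_storage`, and the idea of recording the writer version somewhere are all hypothetical. Qdrant is not assumed to expose such a check. The guard encodes only the older-binary-cannot-read-newer-storage rule above, not Qdrant's own upgrade-path requirements.

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'v1.18.1' -> (1, 18, 1); short versions are zero-padded, so '1.18' -> (1, 18, 0)."""
    parts = [int(p) for p in v.strip().lstrip("v").split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def can_open_storage(binary_version: str, storage_written_by: str) -> bool:
    """True if the binary is at least as new as whatever last wrote the storage."""
    return parse_version(binary_version) >= parse_version(storage_written_by)
```

Under this rule, `can_open_storage("1.12.4", "1.18.1")` is False, which is exactly the pin that panicked.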
The custom OpenAI-compatible endpoint (LiteLLM) serves the same
qwen3-embedding model and is reachable from the container in all
deployments; direct Ollama may not be. Vectors stay compatible because
the underlying model is the same.
Captured from a beast production hotfix.
Pin mem0ai[nlp]==2.0.2 and fastembed for the new hybrid-search pipeline.
Drop OSS graph memory (removed upstream in 2.0.0, PR #4805): remove Neo4j
service, env vars, volumes, and driver deps; mark /graph/relationships
deprecated. Rewrite Memory.search/get_all/chat/health call sites to use
the v2 filters={} + top_k API (entity IDs at top level now raise
ValueError). Switch the MCP remove_memory ownership check to the O(1)
verify_memory_ownership lookup so it no longer silently misses memories
beyond the new top_k=20 default. Downgrade base image to python:3.12-slim for spaCy.
Adds scripts/migrate_qdrant_to_v3.py (scroll+upsert with per-user count
parity check) and docs/MIGRATION_RUNBOOK.md covering snapshot, dump,
collection rebuild, cutover, and rollback procedures.
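The per-user parity check in scripts/migrate_qdrant_to_v3.py can be sketched as a comparison of user_id counts between the scrolled source points and the destination points. The sketch assumes each point is a dict whose `payload` carries `user_id`. The real script's data shapes and function names may differ, and `user_counts`/`parity_mismatches` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

Point = Dict[str, Any]


def user_counts(points: Iterable[Point]) -> Counter:
    """Count points per payload user_id (points without one are counted under None)."""
    return Counter((p.get("payload") or {}).get("user_id") for p in points)


def parity_mismatches(source: Iterable[Point], dest: Iterable[Point]) -> Dict[Any, Tuple[int, int]]:
    """Return {user_id: (source_count, dest_count)} for every user whose counts differ."""
    src, dst = user_counts(source), user_counts(dest)
    return {
        user: (src[user], dst[user])  # Counter returns 0 for absent keys
        for user in sorted(set(src) | set(dst), key=str)
        if src[user] != dst[user]
    }
```

An empty result means every user has the same number of points on both sides, which is the gate to pass before cutover.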
- Make Ollama URL configurable via OLLAMA_BASE_URL env var
- Add version: v1.1 to Mem0 config (required for latest features)
- Make embedding model and dimensions configurable
- Fix ownership check: O(1) lookup instead of fetching 10k records
- Add tenacity retry logic for database operations
- Add auth to /models and /users endpoints
- Add rate limiting to all endpoints (10-120/min based on operation type)
- Fix 11 info disclosure issues (detail=str(e) -> generic message)
- Fix 2 silent except blocks with proper logging
- Fix 7 raise e -> bare raise so re-raised exceptions keep a clean traceback
- Fix health check to not expose exception details
- Update tests with X-API-Key headers and security tests
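The ownership fix, like the remove_memory change in the v2 migration, replaces "list the user's memories and look for the ID" with a direct lookup by ID. The in-memory `STORE`, `list_memories`, `get_memory`, and both check functions are hypothetical stand-ins rather than verify_memory_ownership's actual implementation. They only show why a scan silently misses memories once listing is capped.

```python
from typing import Dict, List, Optional

# Hypothetical store: memory_id -> owning user_id (alice has 30 memories).
STORE: Dict[str, str] = {f"m{i}": "alice" for i in range(30)}


def list_memories(user_id: str, top_k: int = 20) -> List[str]:
    """Stand-in for a capped listing call: at most top_k IDs for the user."""
    return [mid for mid, owner in STORE.items() if owner == user_id][:top_k]


def owns_by_scan(user_id: str, memory_id: str) -> bool:
    # Old pattern: anything past the listing cap looks "not owned".
    return memory_id in list_memories(user_id)


def get_memory(memory_id: str) -> Optional[str]:
    """Stand-in for a get-by-ID call: the owner, or None if the ID is unknown."""
    return STORE.get(memory_id)


def owns_by_lookup(user_id: str, memory_id: str) -> bool:
    # New pattern: one lookup, independent of how many memories the user has.
    return get_memory(memory_id) == user_id
```

With 30 memories and a cap of 20, `owns_by_scan("alice", "m25")` wrongly returns False, while `owns_by_lookup` answers correctly regardless of the cap.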
Exposes memory operations as MCP tools over the /mcp endpoint:
- add_memory, search_memory, remove_memory, chat
- API key auth via x-api-key or Authorization header
- User isolation enforced via contextvars
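A minimal sketch of per-request user isolation with contextvars follows. It assumes the auth layer resolves an API key to a user_id before any tool code runs. `API_KEYS`, `extract_api_key`, `handle_request`, and `current_user` are hypothetical. Accepting an optional "Bearer " prefix on the Authorization header is an assumption about the accepted format.

```python
import asyncio
import contextvars
from typing import List, Mapping, Optional

current_user = contextvars.ContextVar("current_user")

API_KEYS = {"key-alice": "alice", "key-bob": "bob"}  # hypothetical key -> user map


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Accept x-api-key, or Authorization with an optional 'Bearer ' prefix (assumed)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if "x-api-key" in lowered:
        return lowered["x-api-key"]
    auth = lowered.get("authorization")
    if auth:
        return auth[7:] if auth.lower().startswith("bearer ") else auth
    return None


async def search_memory(query: str) -> str:
    # Tool code never takes user_id as an argument; it reads the request's context.
    return f"{current_user.get()}:{query}"


async def handle_request(headers: Mapping[str, str], query: str) -> str:
    key = extract_api_key(headers)
    if key not in API_KEYS:
        raise PermissionError("invalid API key")
    current_user.set(API_KEYS[key])
    return await search_memory(query)


async def main() -> List[str]:
    # gather() wraps each coroutine in a Task that runs in a copy of the context,
    # so concurrent requests cannot see each other's user.
    return await asyncio.gather(
        handle_request({"X-API-Key": "key-alice"}, "q1"),
        handle_request({"Authorization": "Bearer key-bob"}, "q2"),
    )
```

Because each asyncio Task gets its own context copy, `current_user.set` in one request never leaks into another, and the outer context stays unset.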
Security Enhancement:
- Remove external port exposure for PostgreSQL and Neo4j databases
- Replace 'ports' with 'expose' for internal-only database access
- Maintain full internal connectivity while removing the databases' host-facing attack surface
- Follow container security best practices
Benchmarking Framework:
- Add agent1.md: Professional Manager persona testing protocol
- Add agent2.md: Creative Researcher persona testing protocol
- Add benchmark1.md: Baseline test results and analysis
Benchmark Results Summary:
- Core engine quality: 4.9/5 average across both agent personas
- Memory intelligence: Exceptional context retention and relationship inference
- Automatic relationship generation: 50+ meaningful connections from minimal inputs
- Multi-project context management: Seamless switching with persistent context
- Cross-domain synthesis: AI-native capabilities for knowledge work enhancement
Key Findings:
- Core memory technology provides strong competitive moat
- Memory-enhanced conversations unique in market
- Ready for frontend wrapper development
- Establishes quality baseline for future model comparisons
Future Use: Framework enables systematic comparison across different
LLM endpoints, models, and configurations using identical test protocols.