Implements the subset of the hosted mem0 platform API that mem0ai==2.0.2
MemoryClient calls, so MemoryClient(host=..., api_key=...) works against this
server. Verified end-to-end (construct/add/search/get_all/get/history/update/delete).
- platform_compat.py: GET /v1/ping/ (returns non-empty org_id/project_id, which
the SDK's Project init requires), POST /v3/memories/{add,search}/,
POST /v3/memories/ (paginated get_all), /v1/memories/{id}/ item ops, and
GET /v1/entities/ -- all mapped onto the existing mem0_manager.
- auth.get_current_user_platform: accepts Authorization: Token (mem0 SDK),
Bearer, or X-API-Key.
- main.py: include the platform router; remove the /v1/memories* aliases added
in ea07a82 (the SDK uses /v3 and trailing-slash /v1/memories/{id}/, not those
paths); keep /v1/chat/completions and the native /memories* routes.
- docker-compose: run uvicorn with --proxy-headers --forwarded-allow-ips=* so the
proxy's https scheme is honoured. This stops trailing-slash 307 redirects from
downgrading https->http and dropping the Authorization header -- the actual
cause of the reported "POST auth broken" symptom (auth was never broken).
- test_sdk_compat.py: end-to-end MemoryClient round-trip against the server.
Part 1: strip leading newlines from chat_with_memory's LLM response
(minimax-m2 reasoning output leaks blank lines; .lstrip() at the source
covers /chat, /v1/chat/completions, and the MCP chat tool).
Part 2: replace the /v1/chat/completions handler with an httpx-based
pass-through proxy that preserves every upstream field (tool_calls,
reasoning_tokens, system_fingerprint, finish_reason, etc.) and supports
end-to-end MCP-style tool calling.
What changed:
- models.py: OpenAIChatCompletionRequest is now permissive — typed for
the common fields (tools, tool_choice, parallel_tool_calls,
response_format, max_completion_tokens, seed, stream_options,
reasoning_effort, modalities, etc.) and extra='allow' for forward-
compat. The typed response models (OpenAIChatCompletionResponse and
friends) are deleted — the handler returns upstream's JSON dict
directly so unknown fields aren't silently dropped.
- mem0_manager.py: adds httpx.AsyncClient + an openai_proxy_completion()
method that injects a "Relevant memories" system message only when
the last role is 'user' AND no tool flow is in progress, then forwards
to the upstream LLM. Non-stream returns upstream JSON; stream returns
an async iterator that yields raw upstream SSE bytes verbatim while
side-channel-parsing for the post-stream mem0.add. Codifies the
Memori #434 lessons: never mutates existing messages (only prepends
system), never touches tool_call_id, runs post-add even on mid-stream
error via try/finally.
- main.py: handler is now ~50 lines — model_dump(exclude_unset) the
request, hand off to openai_proxy_completion, return dict OR wrap in
StreamingResponse. response_model=None so FastAPI doesn't validate.
Deleted stream_openai_response (post-hoc word-chunking is gone).
Lifespan shutdown closes mem0_manager.async_http.
Research confirmed mem0 itself does not ship an HTTP /v1/chat/completions
(only the in-process mem0.proxy.main.Mem0 SDK pattern), so we replicate
the pattern without adding a litellm dependency. SSE/tool_calls patterns
are modeled after microsoft/agent-lightning's llm_proxy.
Verified locally: ast.parse OK on all three files. End-to-end smoke tests
will run on beast.
Three S-effort wins from the post-migration audit:
#1 Enable Cohere reranker on both Memory.search call sites
(rerank=True), over-fetch top_k=max(limit*3, 30) to give the
reranker a 30-50 candidate pool, then truncate to the caller's
limit. Bump reranker config to rerank-v3.5 (4096 ctx, multilingual
— matters for Hindi/Hinglish traffic) and top_n 10 → 50 so the
output cap doesn't truncate below typical over-fetch sizes. Cohere
was configured but never invoked; this is the single biggest
quality lift the audit surfaced.
#2 Add scripts/backup_qdrant.sh and scripts/restore_test.sh. Daily
snapshot of both collections back-to-back, docker cp to local
YYYY-MM-DD dir, optional rclone off-host, prune local >14d, emit
Prometheus textfile metric. Weekly restore_test.sh restores into a
transient collection and asserts point count parity. Closes the
zero-automated-backup gap.
#3 Add CUSTOM_FACT_EXTRACTION_INSTRUCTIONS, wired via MemoryConfig's
custom_instructions field. mem0 appends this as its own
'## Custom Instructions' section in the additive-extraction user
prompt (verified against generate_additive_extraction_prompt) —
does not replace mem0's role/format guidance. Re-prioritizes the
default consumer-organizer few-shots toward work/projects/
relationships/recurring context, the actual usage pattern here.
The custom OpenAI-compatible endpoint (LiteLLM) serves the same
qwen3-embedding model and is reachable from the container in all
deployments; direct Ollama may not be. Vectors stay compatible because
the underlying model is the same.
Captured from a beast production hotfix.
Pin mem0ai[nlp]==2.0.2 and fastembed for the new hybrid-search pipeline.
Drop OSS graph memory (removed upstream in 2.0.0, PR #4805): remove Neo4j
service, env vars, volumes, and driver deps; mark /graph/relationships
deprecated. Rewrite Memory.search/get_all/chat/health call sites to use
the v2 filters={} + top_k API (entity IDs at top level now raise
ValueError). Tighten MCP remove_memory ownership check to O(1)
verify_memory_ownership so it doesn't silently truncate at the new
top_k=20 default. Downgrade base image to python:3.12-slim for spaCy.
Adds scripts/migrate_qdrant_to_v3.py (scroll+upsert with per-user count
parity check) and docs/MIGRATION_RUNBOOK.md covering snapshot, dump,
collection rebuild, cutover, and rollback procedures.
- Make Ollama URL configurable via OLLAMA_BASE_URL env var
- Add version: v1.1 to Mem0 config (required for latest features)
- Make embedding model and dimensions configurable
- Fix ownership check: O(1) lookup instead of fetching 10k records
- Add tenacity retry logic for database operations
- Add auth to /models and /users endpoints
- Add rate limiting to all endpoints (10-120/min based on operation type)
- Fix 11 info disclosure issues (detail=str(e) -> generic message)
- Fix 2 silent except blocks with proper logging
- Fix 7 raise e -> raise for proper exception chaining
- Fix health check to not expose exception details
- Update tests with X-API-Key headers and security tests
Exposes memory operations as MCP tools over /mcp endpoint:
- add_memory, search_memory, remove_memory, chat
- API key auth via x-api-key or Authorization header
- User isolation enforced via contextvars