Commit graph

33 commits

Author SHA1 Message Date
Pratik Narola
ed11a00ab3 feat: mem0 platform SDK (MemoryClient) compatibility + proxy-header redirect fix
Implements the subset of the hosted mem0 platform API that mem0ai==2.0.2
MemoryClient calls, so MemoryClient(host=..., api_key=...) works against this
server. Verified end-to-end (construct/add/search/get_all/get/history/update/delete).

- platform_compat.py: GET /v1/ping/ (returns non-empty org_id/project_id, which
  the SDK's Project init requires), POST /v3/memories/{add,search}/,
  POST /v3/memories/ (paginated get_all), /v1/memories/{id}/ item ops, and
  GET /v1/entities/ -- all mapped onto the existing mem0_manager.
- auth.get_current_user_platform: accepts Authorization: Token (mem0 SDK),
  Bearer, or X-API-Key.
- main.py: include the platform router; remove the /v1/memories* aliases added
  in ea07a82 (the SDK uses /v3 and trailing-slash /v1/memories/{id}/, not those
  paths); keep /v1/chat/completions and the native /memories* routes.
- docker-compose: run uvicorn with --proxy-headers --forwarded-allow-ips=* so the
  proxy's https scheme is honoured. This stops trailing-slash 307 redirects from
  downgrading https->http and dropping the Authorization header -- the actual
  cause of the reported "POST auth broken" symptom (auth was never broken).
- test_sdk_compat.py: end-to-end MemoryClient round-trip against the server.
2026-05-26 00:09:22 +05:30
ea07a82bd7 added v1 endpoints 2026-05-25 17:45:17 +00:00
Pratik Narola
e5a4d1c7c2 feat: rewrite /v1/chat/completions as a real OpenAI-compat proxy
Part 1: strip leading newlines from chat_with_memory's LLM response
(minimax-m2 reasoning output leaks blank lines; .lstrip() at the source
covers /chat, /v1/chat/completions, and the MCP chat tool).

Part 2: replace the /v1/chat/completions handler with an httpx-based
pass-through proxy that preserves every upstream field (tool_calls,
reasoning_tokens, system_fingerprint, finish_reason, etc.) and supports
end-to-end MCP-style tool calling.

What changed:
- models.py: OpenAIChatCompletionRequest is now permissive — typed for
  the common fields (tools, tool_choice, parallel_tool_calls,
  response_format, max_completion_tokens, seed, stream_options,
  reasoning_effort, modalities, etc.) and extra='allow' for forward-
  compat. The typed response models (OpenAIChatCompletionResponse and
  friends) are deleted — the handler returns upstream's JSON dict
  directly so unknown fields aren't silently dropped.
- mem0_manager.py: adds httpx.AsyncClient + an openai_proxy_completion()
  method that injects a "Relevant memories" system message only when
  the last role is 'user' AND no tool flow is in progress, then forwards
  to the upstream LLM. Non-stream returns upstream JSON; stream returns
  an async iterator that yields raw upstream SSE bytes verbatim while
  side-channel-parsing for the post-stream mem0.add. Codifies the
  Memori #434 lessons: never mutates existing messages (only prepends
  system), never touches tool_call_id, runs post-add even on mid-stream
  error via try/finally.
- main.py: handler is now ~50 lines — model_dump(exclude_unset) the
  request, hand off to openai_proxy_completion, return dict OR wrap in
  StreamingResponse. response_model=None so FastAPI doesn't validate.
  Deleted stream_openai_response (post-hoc word-chunking is gone).
  Lifespan shutdown closes mem0_manager.async_http.

Research confirmed mem0 itself does not ship an HTTP /v1/chat/completions
(only the in-process mem0.proxy.main.Mem0 SDK pattern), so we replicate
the pattern without adding a litellm dependency. SSE/tool_calls patterns
are modeled after microsoft/agent-lightning's llm_proxy.

Verified locally: ast.parse OK on all three files. End-to-end smoke tests
will run on beast.
2026-05-23 21:08:31 +05:30
Pratik Narola
e99b382b16 fix: restore_test.sh uses upload endpoint and tightens source-file glob
file:// snapshot recovery is disabled by default in recent Qdrant
(returns 403 on /collections/.../snapshots/recover with a file:// URL).
Switched to POST /snapshots/upload with multipart form-data which
doesn't need an allowlist.

Also tightened the find -name glob from "${SOURCE_COLLECTION}_*" to
"${SOURCE_COLLECTION}_[0-9]*" so a source named "mem0_v3" does not
accidentally match "mem0_v3_entities_*" files in the same dir.
2026-05-23 19:57:41 +05:30
Pratik Narola
fd109ef892 fix: backup_qdrant.sh uses Qdrant's snapshot path (/qdrant/snapshots/)
The Qdrant snapshots directory is /qdrant/snapshots/{collection}/, not
/qdrant/storage/collections/{collection}/snapshots/ as the script assumed.
Verified against running mem0-qdrant container on beast.
2026-05-23 19:56:38 +05:30
Pratik Narola
06875473b2 feat: enable reranker, automate backups, tune extraction prompt
Three S-effort wins from the post-migration audit:

#1 Enable Cohere reranker on both Memory.search call sites
   (rerank=True), over-fetch top_k=max(limit*3, 30) to give the
   reranker a 30-50 candidate pool, then truncate to the caller's
   limit. Bump reranker config to rerank-v3.5 (4096 ctx, multilingual
   — matters for Hindi/Hinglish traffic) and top_n 10 → 50 so the
   output cap doesn't truncate below typical over-fetch sizes. Cohere
   was configured but never invoked; this is the single biggest
   quality lift the audit surfaced.

#2 Add scripts/backup_qdrant.sh and scripts/restore_test.sh. Daily
   snapshot of both collections back-to-back, docker cp to local
   YYYY-MM-DD dir, optional rclone off-host, prune local >14d, emit
   Prometheus textfile metric. Weekly restore_test.sh restores into a
   transient collection and asserts point count parity. Closes the
   zero-automated-backup gap.

#3 Add CUSTOM_FACT_EXTRACTION_INSTRUCTIONS, wired via MemoryConfig's
   custom_instructions field. mem0 appends this as its own
   '## Custom Instructions' section in the additive-extraction user
   prompt (verified against generate_additive_extraction_prompt) —
   does not replace mem0's role/format guidance. Re-prioritizes the
   default consumer-organizer few-shots toward work/projects/
   relationships/recurring context, the actual usage pattern here.
2026-05-23 19:53:59 +05:30
Pratik Narola
3a10b72051 scripts: Neo4j → mem0 v2 graph relationship import
Two helpers built during the beast deployment to migrate the legacy
Neo4j knowledge graph (decommissioned in the v3 cutover) into mem0 v2
as natural-language memories.

scripts/import_neo4j_to_mem0.py
  - Connects to Neo4j via Bolt, iterates per-user relationships,
    POSTs each as a /memories request.
  - Two modes:
      raw:           "humanize(src) verb humanize(dest)." (snake_case → spaces)
      --llm-rewrite: minimax-m2 via OpenAI-compat proxy rewrites each
                     tuple into a grammatical English sentence; the LLM
                     may also output SKIP for non-meaningful tuples
                     (postal codes, timezone offsets, self-refs).
  - Tags every imported memory with metadata.source="neo4j_legacy_import"
    plus neo4j_rel_type + import_timestamp for traceability/cleanup.
  - Caches LLM rewrites by (source, rel, dest, user_id).

scripts/cleanup_neo4j_imports.py
  - Finds and DELETEs all memories with source="neo4j_legacy_import"
    for given users, via the /memories DELETE endpoint (per-user API
    key, so the deletes go through mem0's normal auth + cleanup path).

Run on beast (2026-05-23): 2007 Neo4j edges → 615 net new memories in
mem0_v3 (30.6% yield after LLM SKIPs + mem0 fact-extraction dedup).
mem0 v3's fact extractor correctly deduplicated edges that restated
facts already in vector memory (e.g., manju's 9 existing memories
absorbed all 17 of her Neo4j edges).
2026-05-23 18:27:00 +05:30
Pratik Narola
3385b8397b fix: give backend healthcheck a 90s start_period
Backend startup needs ~30-60s (spaCy NLP models load, mem0 v2 init,
MCP session manager, 4 workers). The Dockerfile's 5s start-period was
too short, causing willfarrell/autoheal (running on the host with
AUTOHEAL_CONTAINER_LABEL=all) to kill the container before it finished
booting. Overriding the healthcheck in compose with a longer start_period
keeps failures from counting until the app is actually ready.
2026-05-23 15:14:04 +05:30
Pratik Narola
14c47238bd fix: bump Qdrant pin to v1.18.1 (was v1.12.4 — downgrade panic)
Existing prod storage was written by a newer Qdrant (qdrant/qdrant:latest
resolved to 1.18.1 on 2026-05-23). Pinning to 1.12.4 caused a shard-holder
deserialization panic on startup because Qdrant's storage format is
forward-compatible (newer reads older) but not backward.
2026-05-23 15:09:28 +05:30
Pratik Narola
7cc8fc5112 deploy: route embedder through OpenAI-compat proxy instead of Ollama
The custom OpenAI-compatible endpoint (LiteLLM) serves the same
qwen3-embedding model and is reachable from the container in all
deployments; direct Ollama may not be. Vectors stay compatible because
the underlying model is the same.

Captured from a beast production hotfix.
2026-05-23 15:03:02 +05:30
Pratik Narola
0f0addb36b chore: migrate to mem0ai v2.0.2 (V3 memory pipeline)
Pin mem0ai[nlp]==2.0.2 and fastembed for the new hybrid-search pipeline.
Drop OSS graph memory (removed upstream in 2.0.0, PR #4805): remove Neo4j
service, env vars, volumes, and driver deps; mark /graph/relationships
deprecated. Rewrite Memory.search/get_all/chat/health call sites to use
the v2 filters={} + top_k API (entity IDs at top level now raise
ValueError). Tighten MCP remove_memory ownership check to O(1)
verify_memory_ownership so it doesn't silently truncate at the new
top_k=20 default. Downgrade base image to python:3.12-slim for spaCy.

Adds scripts/migrate_qdrant_to_v3.py (scroll+upsert with per-user count
parity check) and docs/MIGRATION_RUNBOOK.md covering snapshot, dump,
collection rebuild, cutover, and rollback procedures.
2026-05-23 14:49:45 +05:30
82accabc73 fix: properly log settings as structlog kwargs instead of dropping them 2026-01-16 00:40:53 +05:30
6f9b545c15 fix: remove invalid positional arg from structlog call 2026-01-16 00:38:41 +05:30
5bcecf4649 fix: use structlog instead of logging in mem0_manager for kwargs support 2026-01-16 00:34:51 +05:30
638a591dc5 add setup script with volume reset option, fix default embedding dims 2026-01-16 00:29:29 +05:30
a190527076 fix: use npm_network for NPM proxy, expose instead of ports 2026-01-16 00:01:41 +05:30
9e86c30548 fix: pass OLLAMA_BASE_URL, EMBEDDING_MODEL, EMBEDDING_DIMS to container 2026-01-15 23:55:44 +05:30
2c1d73a1ec add OpenAI-compatible endpoint and improved login UI
- Add /v1/chat/completions and /chat/completions endpoints (OpenAI SDK compatible)
- Add streaming support with SSE for chat completions
- Add get_current_user_openai auth supporting Bearer token and X-API-Key
- Add OpenAI-compatible request/response models (OpenAIChatCompletionRequest, etc.)
- Cherry-pick improved login UI from cloud branch (styled login screen, logout button)
2026-01-15 23:29:08 +05:30
a228780146 production improvements: configurable embeddings, v1.1, O(1) ownership, retries
- Make Ollama URL configurable via OLLAMA_BASE_URL env var
- Add version: v1.1 to Mem0 config (required for latest features)
- Make embedding model and dimensions configurable
- Fix ownership check: O(1) lookup instead of fetching 10k records
- Add tenacity retry logic for database operations
2026-01-15 23:01:18 +05:30
50edce2d3c security hardening: add auth, rate limiting, fix info disclosure
- Add auth to /models and /users endpoints
- Add rate limiting to all endpoints (10-120/min based on operation type)
- Fix 11 info disclosure issues (detail=str(e) -> generic message)
- Fix 2 silent except blocks with proper logging
- Fix 7 raise e -> raise for proper exception chaining
- Fix health check to not expose exception details
- Update tests with X-API-Key headers and security tests
2026-01-15 22:41:24 +05:30
35c1bbec4e added MCP HTTP endpoint with auth
Exposes memory operations as MCP tools over /mcp endpoint:
- add_memory, search_memory, remove_memory, chat
- API key auth via x-api-key or Authorization header
- User isolation enforced via contextvars
2026-01-11 14:00:16 +05:30
997865283f added auth 2025-10-23 22:22:07 +05:30
Pratik Narola
e7b839810b added reranker 2025-10-21 14:25:51 +05:30
Pratik Narola
fb332c5346 Frontend improvement 2025-09-09 14:38:14 +05:30
Pratik Narola
971548321f references updated for qdrant 2025-09-09 13:05:16 +05:30
Pratik Narola
f625f8f556 Migrated to Qdrant 2025-09-09 12:54:46 +05:30
Pratik Narola
28a8953ac5 Stable checkpoint with chat interface 2025-09-04 17:07:49 +05:30
Pratik Narola
f929165a89 updated versions fixed issues 2025-09-02 23:40:10 +05:30
Pratik Narola
a8a676f860 clean up and fixing 2025-09-02 23:02:59 +05:30
Pratik Narola
aa7742c5ad Added support for run id agent id 2025-09-02 17:28:41 +00:00
Pratik Narola
c864fe8895 small chang 2025-08-12 16:25:43 +05:30
Pratik Narola
cac9674f0e feat: Add comprehensive benchmarking framework and secure database ports
Security Enhancement:
- Remove external port exposure for PostgreSQL and Neo4j databases
- Replace 'ports' with 'expose' for internal-only database access
- Maintain full internal connectivity while eliminating external attack vectors
- Follow container security best practices

Benchmarking Framework:
- Add agent1.md: Professional Manager persona testing protocol
- Add agent2.md: Creative Researcher persona testing protocol
- Add benchmark1.md: Baseline test results and analysis

Benchmark Results Summary:
- Core engine quality: 4.9/5 average across both agent personas
- Memory intelligence: Exceptional context retention and relationship inference
- Automatic relationship generation: 50+ meaningful connections from minimal inputs
- Multi-project context management: Seamless switching with persistent context
- Cross-domain synthesis: AI-native capabilities for knowledge work enhancement

Key Findings:
- Core memory technology provides strong competitive moat
- Memory-enhanced conversations unique in market
- Ready for frontend wrapper development
- Establishes quality baseline for future model comparisons

Future Use: Framework enables systematic comparison across different
LLM endpoints, models, and configurations using identical test protocols.
2025-08-10 23:45:27 +05:30
Pratik Narola
7689409950 Initial commit: Production-ready Mem0 interface with monitoring
- Complete Mem0 OSS integration with hybrid datastore
- PostgreSQL + pgvector for vector storage
- Neo4j 5.18 for graph relationships
- Google Gemini embeddings integration
- Comprehensive monitoring with correlation IDs
- Real-time statistics and performance tracking
- Production-grade observability features
- Clean repository with no exposed secrets
2025-08-10 17:34:41 +05:30