Commit graph

11 commits

Author SHA1 Message Date
ea07a82bd7 added v1 endpoints 2026-05-25 17:45:17 +00:00
Pratik Narola
e5a4d1c7c2 feat: rewrite /v1/chat/completions as a real OpenAI-compat proxy
Part 1: strip leading newlines from chat_with_memory's LLM response
(minimax-m2 reasoning output leaks blank lines; .lstrip() at the source
covers /chat, /v1/chat/completions, and the MCP chat tool).

Part 2: replace the /v1/chat/completions handler with an httpx-based
pass-through proxy that preserves every upstream field (tool_calls,
reasoning_tokens, system_fingerprint, finish_reason, etc.) and supports
end-to-end MCP-style tool calling.

What changed:
- models.py: OpenAIChatCompletionRequest is now permissive — typed for
  the common fields (tools, tool_choice, parallel_tool_calls,
  response_format, max_completion_tokens, seed, stream_options,
  reasoning_effort, modalities, etc.) and extra='allow' for forward-
  compat. The typed response models (OpenAIChatCompletionResponse and
  friends) are deleted — the handler returns upstream's JSON dict
  directly so unknown fields aren't silently dropped.
- mem0_manager.py: adds httpx.AsyncClient + an openai_proxy_completion()
  method that injects a "Relevant memories" system message only when
  the last role is 'user' AND no tool flow is in progress, then forwards
  to the upstream LLM. Non-stream returns upstream JSON; stream returns
  an async iterator that yields raw upstream SSE bytes verbatim while
  side-channel-parsing for the post-stream mem0.add. Codifies the
  Memori #434 lessons: never mutates existing messages (only prepends
  system), never touches tool_call_id, runs post-add even on mid-stream
  error via try/finally.
- main.py: handler is now ~50 lines — model_dump(exclude_unset) the
  request, hand off to openai_proxy_completion, return dict OR wrap in
  StreamingResponse. response_model=None so FastAPI doesn't validate.
  Deleted stream_openai_response (post-hoc word-chunking is gone).
  Lifespan shutdown closes mem0_manager.async_http.

Research confirmed mem0 itself does not ship an HTTP /v1/chat/completions
(only the in-process mem0.proxy.main.Mem0 SDK pattern), so we replicate
the pattern without adding a litellm dependency. SSE/tool_calls patterns
are modeled after microsoft/agent-lightning's llm_proxy.

Verified locally: ast.parse OK on all three files. End-to-end smoke tests
will run on beast.
2026-05-23 21:08:31 +05:30
Pratik Narola
0f0addb36b chore: migrate to mem0ai v2.0.2 (V3 memory pipeline)
Pin mem0ai[nlp]==2.0.2 and fastembed for the new hybrid-search pipeline.
Drop OSS graph memory (removed upstream in 2.0.0, PR #4805): remove Neo4j
service, env vars, volumes, and driver deps; mark /graph/relationships
deprecated. Rewrite Memory.search/get_all/chat/health call sites to use
the v2 filters={} + top_k API (entity IDs at top level now raise
ValueError). Tighten MCP remove_memory ownership check to O(1)
verify_memory_ownership so it doesn't silently truncate at the new
top_k=20 default. Downgrade base image to python:3.12-slim for spaCy.

Adds scripts/migrate_qdrant_to_v3.py (scroll+upsert with per-user count
parity check) and docs/MIGRATION_RUNBOOK.md covering snapshot, dump,
collection rebuild, cutover, and rollback procedures.
2026-05-23 14:49:45 +05:30
2c1d73a1ec add OpenAI-compatible endpoint and improved login UI
- Add /v1/chat/completions and /chat/completions endpoints (OpenAI SDK compatible)
- Add streaming support with SSE for chat completions
- Add get_current_user_openai auth supporting Bearer token and X-API-Key
- Add OpenAI-compatible request/response models (OpenAIChatCompletionRequest, etc.)
- Cherry-pick improved login UI from cloud branch (styled login screen, logout button)
2026-01-15 23:29:08 +05:30
a228780146 production improvements: configurable embeddings, v1.1, O(1) ownership, retries
- Make Ollama URL configurable via OLLAMA_BASE_URL env var
- Add version: v1.1 to Mem0 config (required for latest features)
- Make embedding model and dimensions configurable
- Fix ownership check: O(1) lookup instead of fetching 10k records
- Add tenacity retry logic for database operations
2026-01-15 23:01:18 +05:30
50edce2d3c security hardening: add auth, rate limiting, fix info disclosure
- Add auth to /models and /users endpoints
- Add rate limiting to all endpoints (10-120/min based on operation type)
- Fix 11 info disclosure issues (detail=str(e) -> generic message)
- Fix 2 silent except blocks with proper logging
- Fix 7 raise e -> raise for proper exception chaining
- Fix health check to not expose exception details
- Update tests with X-API-Key headers and security tests
2026-01-15 22:41:24 +05:30
35c1bbec4e added MCP HTTP endpoint with auth
Exposes memory operations as MCP tools over /mcp endpoint:
- add_memory, search_memory, remove_memory, chat
- API key auth via x-api-key or Authorization header
- User isolation enforced via contextvars
2026-01-11 14:00:16 +05:30
997865283f added auth 2025-10-23 22:22:07 +05:30
Pratik Narola
28a8953ac5 Stable checkpoint with chat interface 2025-09-04 17:07:49 +05:30
Pratik Narola
f929165a89 updated versions fixed issues 2025-09-02 23:40:10 +05:30
Pratik Narola
7689409950 Initial commit: Production-ready Mem0 interface with monitoring
- Complete Mem0 OSS integration with hybrid datastore
- PostgreSQL + pgvector for vector storage
- Neo4j 5.18 for graph relationships
- Google Gemini embeddings integration
- Comprehensive monitoring with correlation IDs
- Real-time statistics and performance tracking
- Production-grade observability features
- Clean repository with no exposed secrets
2025-08-10 17:34:41 +05:30