From 5c6a98308765e6954219173a16d72462fd1ea7aa Mon Sep 17 00:00:00 2001
From: Nikhil Mundra <nikhil.mundra@nutanix.com>
Date: Thu, 12 Jun 2025 15:41:17 +0530
Subject: [PATCH] add concept of code_knowledge and code_index

---
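Notes (below the three-dash line, so not part of the commit message):

The skeleton/flesh lookup this patch describes can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the tooling the patch relies on: the index shape (`{file path: [names]}`), the knowledge shape (`{"filePath::name": [notes]}`), the helper names, and the sample path `billing/payments.py` are all hypothetical, and Python's `ast` module stands in for whatever parser actually generates `code_index.json`.

```python
import ast
import json
from typing import Dict, List, Optional


def build_code_index(sources: Dict[str, str]) -> Dict[str, List[str]]:
    """Regenerate the disposable skeleton: file path -> sorted symbol names."""
    index = {}
    for path, text in sources.items():
        tree = ast.parse(text)
        index[path] = sorted(
            node.name
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        )
    return index


def stable_id(path: str, name: str) -> str:
    """Key format taken from the patch text: filePath::functionName."""
    return f"{path}::{name}"


def lookup(index, knowledge, path: str, name: str) -> Optional[List[str]]:
    """Steps 1-2: consult the skeleton first, then the flesh."""
    if name not in index.get(path, []):
        return None  # symbol is not in the index (it may be stale or renamed)
    return knowledge.get(stable_id(path, name), [])


def record_insight(index, knowledge, path: str, name: str, insight: str) -> None:
    """Step 4: store an insight, but only for symbols the index knows about."""
    if name not in index.get(path, []):
        raise KeyError(f"{stable_id(path, name)} is not present in the code index")
    knowledge.setdefault(stable_id(path, name), []).append(insight)


# Hypothetical source text standing in for a real codebase.
sources = {
    "billing/payments.py": "def processPayment(order):\n    return order\n\nclass Ledger:\n    pass\n"
}
index = build_code_index(sources)  # overwritten on every run, never hand-edited
knowledge = {}                     # persistent store, would be loaded from code_knowledge.json
record_insight(index, knowledge, "billing/payments.py", "processPayment",
               "Hypothetical note: callers expect the order to be returned unchanged.")
print(json.dumps(index, indent=2))
print(lookup(index, knowledge, "billing/payments.py", "processPayment"))
```

Rejecting writes for symbols missing from the index is one possible design choice: it keeps every knowledge key traceable to an index entry, at the cost of requiring a fresh index before recording insights about new code.
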
 latest/Global.md              | 16 ++++++++++++++++
 memory_bank_best_practices.md | 17 +++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/latest/Global.md b/latest/Global.md
index a5f1e96..1175190 100644
--- a/latest/Global.md
+++ b/latest/Global.md
@@ -92,6 +92,17 @@ This section is a comprehensive reference for each file in my Memory Bank, detai
   - **Update Frequency**: Continuously during a task.
   - **Update Triggers**: At the start, during, and end of every task.
 
+- **`code_index.json` - The "Code Skeleton" (Automated)**:
+  - **Purpose**: An automatically generated, disposable index containing only file names and the names of the functions and classes defined in them. It provides a fresh, accurate map of what exists and where.
+  - **Update Frequency**: On-demand or periodically.
+  - **CRITICAL RULE**: This file **MUST NOT** be edited manually. It is a cache to be overwritten.
+
+- **`code_knowledge.json` - The "Code Flesh" (AI-Managed)**:
+  - **Purpose**: A persistent knowledge base of granular details and subtleties for specific code elements. It is a key-value store in which each key is a stable identifier (e.g., `filePath::functionName`) derived directly from an entry in `code_index.json`.
+  - **Update Frequency**: Constantly, as new insights are discovered.
+  - **CRITICAL RULE**: To find knowledge about a function, first locate it in `code_index.json` to get its structure, then use its stable identifier as a key to look up the corresponding deep knowledge in this file.
+
+
 *I am free to create any more files if I feel like. Each specialized mode is free to create any number of files for memory bank.*
 
 ---
@@ -103,6 +114,11 @@ This section provides practical guidelines for applying my core doctrine and pro
 ### 4.1. Practical Workflow Blueprints
 - **Debugging (Audit Trail Approach)**: A systematic investigation process: Observe -> Hypothesize -> Execute & Document -> Iterate -> Synthesize.
 - **Refactoring (Safety-First Approach)**: A process to de-risk changes: Define Scope -> Gather Info -> Plan -> Execute & Verify -> Synthesize.
+- **Granular Code Analysis (Symbex Model)**: The standard method for linking conceptual knowledge to specific code.
+    1.  **Consult the Skeleton**: Use `code_index.json` to get an up-to-date map of the code structure and find the stable identifier for a target function or class.
+    2.  **Consult the Flesh**: Use the stable identifier to look up any existing granular knowledge, subtleties, or past observations in `code_knowledge.json`.
+    3.  **Synthesize and Act**: Combine the structural awareness from the index with the deep knowledge from the knowledge base to inform my next action.
+    4.  **Update the Flesh**: If a new, valuable, highly specific insight is discovered, add it to `code_knowledge.json` under the appropriate stable identifier.
 
 ### 4.2. Task Management Guidelines
 - **Creating a Task**: Update `currentTask.md` with objectives, a detailed plan, and an "Impact Analysis" for refactors.
diff --git a/memory_bank_best_practices.md b/memory_bank_best_practices.md
index 907e75e..7b32af2 100644
--- a/memory_bank_best_practices.md
+++ b/memory_bank_best_practices.md
@@ -51,6 +51,23 @@ A proven architecture for structuring this knowledge consists of the following c
 
 This structured approach ensures that when the AI needs to perform a task, it can consult a specific, relevant document rather than parsing a massive, undifferentiated blob of text, leading to more accurate and context-aware actions.
 
+### Distinguishing Between a Knowledge Base and a Code Index
+
+While the seven-file architecture provides a robust framework for conceptual knowledge, a mature system benefits from explicitly distinguishing between two types of information stores:
+
+*   **The Knowledge Base (e.g., `techContext.md`, `systemPatterns.md`)**: This is the source of truth for the *why* behind the project. It contains conceptual, synthesized information like architectural decisions, rationales, and approved patterns. It is resilient to minor code changes and is curated through disciplined workflows.
+
+*   **The Code Index (e.g., an auto-generated `code_index.json`)**: This is a disposable, automated map of the codebase. It answers the question *what* is *where*. It is highly precise but brittle, and should be treated as a cache that can be regenerated at any time. It should **never** be edited manually.
+
+**The Hybrid Model Best Practice**:
+
+The most effective approach is a hybrid model that leverages both:
+
+1.  **Maintain the Conceptual Knowledge Base**: Continue using the core memory bank files to document high-level, resilient knowledge.
+2.  **Introduce an Automated Code Index**: Use tools to periodically parse the codebase and generate a detailed index of files, classes, and functions. This index is used for fast, precise lookups.
+3.  **Bridge the Gap**: The AI uses the **Code Index** for discovery (e.g., "Where is the `processPayment` function?") and the **Knowledge Base** for understanding (e.g., "What is our standard pattern for payment processing?"). Insights gained during a task are synthesized and added to the Knowledge Base, not the temporary index.
+
+This separation of concerns provides the precision of a detailed index without the overhead of maintaining it by hand, while preserving the deep, conceptual knowledge that is crucial for long-term development.
 ## 2. Contextual Retrieval for Development Tasks
 
 Retrieval-Augmented Generation (RAG) is the process of fetching relevant information from a knowledge base to augment the AI's context before it generates a response. For software development, this is not a one-size-fits-all problem. The optimal retrieval strategy depends heavily on the specific task (e.g., debugging vs. refactoring).