From 5c6a98308765e6954219173a16d72462fd1ea7aa Mon Sep 17 00:00:00 2001
From: Nikhil Mundra
Date: Thu, 12 Jun 2025 15:41:17 +0530
Subject: [PATCH] add concept of code_knowledge and code_index

---
 latest/Global.md              | 16 ++++++++++++++++
 memory_bank_best_practices.md | 17 +++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/latest/Global.md b/latest/Global.md
index a5f1e96..1175190 100644
--- a/latest/Global.md
+++ b/latest/Global.md
@@ -92,6 +92,17 @@ This section is a comprehensive reference for each file in my Memory Bank, detai
   - **Update Frequency**: Continuously during a task.
   - **Update Triggers**: At the start, during, and end of every task.
 
+- **`code_index.json` - The "Code Skeleton" (Automated)**:
+  - **Purpose**: An automatically generated, disposable index containing only a list of file names and the function names within them. It provides a fresh, accurate map of what exists and where.
+  - **Update Frequency**: On-demand or periodically.
+  - **CRITICAL RULE**: This file **MUST NOT** be edited manually. It is a cache to be overwritten.
+
+- **`code_knowledge.json` - The "Code Flesh" (AI-Managed)**:
+  - **Purpose**: A persistent knowledge base of granular details and subtleties for specific code elements. It is a key-value store where the key is a stable identifier (e.g., `filePath::functionName`) that is directly mapped from an entry in `code_index.json`.
+  - **Update Frequency**: Constantly, as new insights are discovered.
+  - **CRITICAL RULE**: To find knowledge about a function, first locate it in `code_index.json` to get its structure, then use its stable identifier as a key to look up the corresponding deep knowledge in this file.
+
+
 *I am free to create any more files if I feel like. Each specialized mode is free to create any number of files for memory bank.*
 
 ---
@@ -103,6 +114,11 @@ This section provides practical guidelines for applying my core doctrine and pro
 ### 4.1. Practical Workflow Blueprints
 - **Debugging (Audit Trail Approach)**: A systematic investigation process: Observe -> Hypothesize -> Execute & Document -> Iterate -> Synthesize.
 - **Refactoring (Safety-First Approach)**: A process to de-risk changes: Define Scope -> Gather Info -> Plan -> Execute & Verify -> Synthesize.
+- **Granular Code Analysis (Symbex Model)**: The standard method for linking conceptual knowledge to specific code.
+  1. **Consult the Skeleton**: Use `code_index.json` to get an up-to-date map of the code structure and find the stable identifier for a target function or class.
+  2. **Consult the Flesh**: Use the stable identifier to look up any existing granular knowledge, subtleties, or past observations in `code_knowledge.json`.
+  3. **Synthesize and Act**: Combine the structural awareness from the index with the deep knowledge from the knowledge base to inform your action.
+  4. **Update the Flesh**: If a new, valuable, needle-point insight is discovered, add it to the `code_knowledge.json` file under the appropriate stable identifier.
 
 ### 4.2. Task Management Guidelines
 - **Creating a Task**: Update `currentTask.md` with objectives, a detailed plan, and an "Impact Analysis" for refactors.
diff --git a/memory_bank_best_practices.md b/memory_bank_best_practices.md
index 907e75e..7b32af2 100644
--- a/memory_bank_best_practices.md
+++ b/memory_bank_best_practices.md
@@ -51,6 +51,23 @@ A proven architecture for structuring this knowledge consists of the following c
 
 This structured approach ensures that when the AI needs to perform a task, it can consult a specific, relevant document rather than parsing a massive, undifferentiated blob of text, leading to more accurate and context-aware actions.
 
+### Distinguishing Between a Knowledge Base and a Code Index
+
+While the seven-file architecture provides a robust framework for conceptual knowledge, a mature system benefits from explicitly distinguishing between two types of information stores:
+
+* **The Knowledge Base (e.g., `techContext.md`, `systemPatterns.md`)**: This is the source of truth for the *why* behind the project. It contains conceptual, synthesized information like architectural decisions, rationales, and approved patterns. It is resilient to minor code changes and is curated through disciplined workflows.
+
+* **The Code Index (e.g., an auto-generated `code_index.json`)**: This is a disposable, automated map of the codebase. It answers the question *what* is *where*. It is highly precise but brittle, and should be treated as a cache that can be regenerated at any time. It should **never** be edited manually.
+
+**The Hybrid Model Best Practice**:
+
+The most effective approach is a hybrid model that leverages both:
+
+1. **Maintain the Conceptual Knowledge Base**: Continue using the core memory bank files to document high-level, resilient knowledge.
+2. **Introduce an Automated Code Index**: Use tools to periodically parse the codebase and generate a detailed index of files, classes, and functions. This index is used for fast, precise lookups.
+3. **Bridge the Gap**: The AI uses the **Code Index** for discovery (e.g., "Where is the `processPayment` function?") and the **Knowledge Base** for understanding (e.g., "What is our standard pattern for payment processing?"). Insights gained during a task are synthesized and added to the Knowledge Base, not the temporary index.
+
+This separation of concerns provides the precision of a detailed index without the maintenance overhead, while preserving the deep, conceptual knowledge that is crucial for long-term development.
 ## 2. Contextual Retrieval for Development Tasks
 
 Retrieval-Augmented Generation (RAG) is the process of fetching relevant information from a knowledge base to augment the AI's context before it generates a response. For software development, this is not a one-size-fits-all problem. The optimal retrieval strategy depends heavily on the specific task (e.g., debugging vs. refactoring).
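
Illustrative note on the skeleton/flesh split introduced by this patch. The patch does not specify a schema for `code_index.json` or `code_knowledge.json`, nor the tool that generates the index, so the following is only a minimal sketch under stated assumptions: the index is a mapping from file path to a list of symbol names, the knowledge store is a mapping from `filePath::functionName` to a list of notes, and the index is built with Python's standard `ast` module. The helper names (`build_code_index`, `stable_id`, `lookup`, `add_insight`) and the sample file are hypothetical.

```python
import ast
import json


def build_code_index(sources):
    """Map each file path to the function and class names it defines.

    `sources` maps file path -> Python source text. The output shape is an
    assumption; the patch does not define a schema for code_index.json.
    """
    index = {}
    for path, text in sorted(sources.items()):
        tree = ast.parse(text)
        names = [
            node.name
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
        index[path] = sorted(names)
    return index


def stable_id(path, name):
    """Build the `filePath::functionName` key described in the patch."""
    return f"{path}::{name}"


def lookup(index, knowledge, path, name):
    """Steps 1-2: confirm the symbol in the skeleton, then read the flesh."""
    if name not in index.get(path, []):
        return None  # not in the index, so any stored notes may be stale
    return knowledge.get(stable_id(path, name), [])


def add_insight(index, knowledge, path, name, note):
    """Step 4: append an insight, but only for symbols the index knows about."""
    if name not in index.get(path, []):
        raise KeyError(f"{stable_id(path, name)} is not in the code index")
    knowledge.setdefault(stable_id(path, name), []).append(note)


# Hypothetical source file, used only for illustration.
sources = {
    "billing/payments.py": "def processPayment(order):\n    return order\n",
}
code_index = build_code_index(sources)  # disposable: regenerate, never hand-edit
code_knowledge = {}                     # persistent: survives index rebuilds

add_insight(
    code_index,
    code_knowledge,
    "billing/payments.py",
    "processPayment",
    "Hypothetical note: callers expect the order to be returned unchanged.",
)
print(json.dumps(code_index, indent=2))
print(lookup(code_index, code_knowledge, "billing/payments.py", "processPayment"))
```

In this sketch the index is always rebuilt from source while the knowledge store is only appended to, which mirrors the patch's rule that the index is a cache and the knowledge file is the persistent record. Returning `None` for a symbol that is missing from the index is one possible way to surface notes that have outlived a rename or deletion. The patch itself does not say how such orphaned keys should be handled.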