add concept of code_knowledge and code_index

This commit is contained in:
Nikhil Mundra 2025-06-12 15:41:17 +05:30
parent fe30478f1f
commit 5c6a983087
2 changed files with 33 additions and 0 deletions

View file

@ -92,6 +92,17 @@ This section is a comprehensive reference for each file in my Memory Bank, detai
- **Update Frequency**: Continuously during a task. - **Update Frequency**: Continuously during a task.
- **Update Triggers**: At the start, during, and end of every task. - **Update Triggers**: At the start, during, and end of every task.
- **`code_index.json` - The "Code Skeleton" (Automated)**:
- **Purpose**: An automatically generated, disposable index containing only a list of file names and the function names within them. It provides a fresh, accurate map of what exists and where.
- **Update Frequency**: On-demand or periodically.
- **CRITICAL RULE**: This file **MUST NOT** be edited manually. It is a cache to be overwritten.
- **`code_knowledge.json` - The "Code Flesh" (AI-Managed)**:
- **Purpose**: A persistent knowledge base of granular details and subtleties for specific code elements. It is a key-value store where the key is a stable identifier (e.g., `filePath::functionName`) that is directly mapped from an entry in `code_index.json`.
- **Update Frequency**: Constantly, as new insights are discovered.
- **CRITICAL RULE**: To find knowledge about a function, first locate it in `code_index.json` to get its structure, then use its stable identifier as a key to look up the corresponding deep knowledge in this file.
*I am free to create any more files if I feel like. Each specialized mode is free to create any number of files for memory bank.* *I am free to create any more files if I feel like. Each specialized mode is free to create any number of files for memory bank.*
--- ---
@ -103,6 +114,11 @@ This section provides practical guidelines for applying my core doctrine and pro
### 4.1. Practical Workflow Blueprints ### 4.1. Practical Workflow Blueprints
- **Debugging (Audit Trail Approach)**: A systematic investigation process: Observe -> Hypothesize -> Execute & Document -> Iterate -> Synthesize. - **Debugging (Audit Trail Approach)**: A systematic investigation process: Observe -> Hypothesize -> Execute & Document -> Iterate -> Synthesize.
- **Refactoring (Safety-First Approach)**: A process to de-risk changes: Define Scope -> Gather Info -> Plan -> Execute & Verify -> Synthesize. - **Refactoring (Safety-First Approach)**: A process to de-risk changes: Define Scope -> Gather Info -> Plan -> Execute & Verify -> Synthesize.
- **Granular Code Analysis (Symbex Model)**: The standard method for linking conceptual knowledge to specific code.
1. **Consult the Skeleton**: Use `code_index.json` to get an up-to-date map of the code structure and find the stable identifier for a target function or class.
2. **Consult the Flesh**: Use the stable identifier to look up any existing granular knowledge, subtleties, or past observations in `code_knowledge.json`.
3. **Synthesize and Act**: Combine the structural awareness from the index with the deep knowledge from the knowledge base to inform your action.
4. **Update the Flesh**: If a new, valuable, needle-point insight is discovered, add it to the `code_knowledge.json` file under the appropriate stable identifier.
### 4.2. Task Management Guidelines ### 4.2. Task Management Guidelines
- **Creating a Task**: Update `currentTask.md` with objectives, a detailed plan, and an "Impact Analysis" for refactors. - **Creating a Task**: Update `currentTask.md` with objectives, a detailed plan, and an "Impact Analysis" for refactors.

View file

@ -51,6 +51,23 @@ A proven architecture for structuring this knowledge consists of the following c
This structured approach ensures that when the AI needs to perform a task, it can consult a specific, relevant document rather than parsing a massive, undifferentiated blob of text, leading to more accurate and context-aware actions. This structured approach ensures that when the AI needs to perform a task, it can consult a specific, relevant document rather than parsing a massive, undifferentiated blob of text, leading to more accurate and context-aware actions.
### Distinguishing Between a Knowledge Base and a Code Index
While the seven-file architecture provides a robust framework for conceptual knowledge, a mature system benefits from explicitly distinguishing between two types of information stores:
* **The Knowledge Base (e.g., `techContext.md`, `systemPatterns.md`)**: This is the source of truth for the *why* behind the project. It contains conceptual, synthesized information like architectural decisions, rationales, and approved patterns. It is resilient to minor code changes and is curated through disciplined workflows.
* **The Code Index (e.g., an auto-generated `code_index.json`)**: This is a disposable, automated map of the codebase. It answers the question *what* is *where*. It is highly precise but brittle, and should be treated as a cache that can be regenerated at any time. It should **never** be edited manually.
**The Hybrid Model Best Practice**:
The most effective approach is a hybrid model that leverages both:
1. **Maintain the Conceptual Knowledge Base**: Continue using the core memory bank files to document high-level, resilient knowledge.
2. **Introduce an Automated Code Index**: Use tools to periodically parse the codebase and generate a detailed index of files, classes, and functions. This index is used for fast, precise lookups.
3. **Bridge the Gap**: The AI uses the **Code Index** for discovery (e.g., "Where is the `processPayment` function?") and the **Knowledge Base** for understanding (e.g., "What is our standard pattern for payment processing?"). Insights gained during a task are synthesized and added to the Knowledge Base, not the temporary index.
This separation of concerns provides the precision of a detailed index without the maintenance overhead, while preserving the deep, conceptual knowledge that is crucial for long-term development.
## 2. Contextual Retrieval for Development Tasks ## 2. Contextual Retrieval for Development Tasks
Retrieval-Augmented Generation (RAG) is the process of fetching relevant information from a knowledge base to augment the AI's context before it generates a response. For software development, this is not a one-size-fits-all problem. The optimal retrieval strategy depends heavily on the specific task (e.g., debugging vs. refactoring). Retrieval-Augmented Generation (RAG) is the process of fetching relevant information from a knowledge base to augment the AI's context before it generates a response. For software development, this is not a one-size-fits-all problem. The optimal retrieval strategy depends heavily on the specific task (e.g., debugging vs. refactoring).