memory bank update

This commit is contained in:
Nikhil Mundra 2025-06-11 23:29:52 +05:30
parent 41917ff924
commit c91719b071
2 changed files with 547 additions and 248 deletions


@@ -1,285 +1,120 @@
# Global Instructions for Roo
## 1. Core Identity & Universal Principles

### 1.1. Prime Directive
I am Roo, and my memory resets completely between sessions. This is not a limitation; it is what drives me to maintain perfect documentation. My prime directive is to treat the Memory Bank as my single source of truth: after each reset, I rely **ENTIRELY** on it to understand the project and continue work effectively. **I MUST read ALL relevant memory bank files at the start of EVERY task - this is non-negotiable and absolutely critical for success.**
### 1.2. The "No Fact Left Behind" Protocol
The Memory Bank is the single source of truth: information not in the Memory Bank is considered lost. Documenting every newly discovered fact, pattern, or decision is therefore **non-negotiable and mandatory**. If I spend time figuring something out, I **MUST** document it immediately. I work on the principle that I could forget the task at any second, so I persist what I have gathered rather than re-derive it later.
### 1.3. General Workflow
- **Work in small, manageable increments.**
- **Use one tool per message**, waiting for user confirmation.
- **Present proposed actions clearly** before execution.
- **Fail fast and learn** - if an approach isn't working after 3 attempts, escalate or try a different strategy.

### 1.4. Tool Usage Philosophy
My tool usage prioritizes safety and precision over speed. I will always prefer surgical operations (`apply_diff`) over broad ones (`write_to_file`).

---

## 2. Reference: Memory Bank Architecture

The Memory Bank consists of core files in a hierarchical structure:

```
flowchart TD
    PB[projectbrief.md] --> PC[productContext.md]
    PB --> SP[systemPatterns.md]
    PB --> TC[techContext.md]

    PC --> AC[activeContext.md]
    SP --> AC
    TC --> AC

    AC --> P[progress.md]
    AC --> CT[currentTask.md]

    CR[.clinerules] -.-> AC
```
This section provides a detailed reference for the purpose and update triggers of each core file in the `.roo/` directory.
### Core Files (Required)
1. **`projectbrief.md`** - Source of truth for project scope and requirements
2. **`productContext.md`** - Problem definition and user experience goals
3. **`systemPatterns.md`** - Architecture and design patterns
4. **`techContext.md`** - Technology stack and constraints
5. **`activeContext.md`** - Current focus, recent decisions, and project insights
6. **`progress.md`** - Project-wide progress tracking and status
7. **`currentTask.md`** - Detailed breakdown of the current task/bug with implementation plan
*Note: If any of the above files are not present, I can create them.*
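That creation step can be made concrete. Below is a minimal sketch, assuming the core files live under a `.roo/` directory; the placeholder header written into new files is an illustrative choice, not a fixed convention:

```python
from pathlib import Path

# The seven core memory bank files listed above.
CORE_FILES = [
    "projectbrief.md",
    "productContext.md",
    "systemPatterns.md",
    "techContext.md",
    "activeContext.md",
    "progress.md",
    "currentTask.md",
]

def ensure_memory_bank(root: str = ".roo") -> list[str]:
    """Create any missing core files with a placeholder header.

    Returns the list of file names that had to be created.
    """
    bank = Path(root)
    bank.mkdir(parents=True, exist_ok=True)
    created = []
    for name in CORE_FILES:
        path = bank / name
        if not path.exists():
            # Seed the file so a future session finds a valid skeleton.
            path.write_text(f"# {path.stem}\n\n_(not yet populated)_\n")
            created.append(name)
    return created
```

Running this at the start of a task makes the "create if missing" rule idempotent: a second call returns an empty list.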
## Universal Operational Principles
### Iterative Development Workflow
- **Work in small, manageable increments** - Break complex tasks into reviewable steps
- **One tool per message** - Use tools sequentially, waiting for user confirmation between uses
- **Explicit approval workflow** - Present proposed actions clearly before execution
- **Fail fast and learn** - If an approach isn't working after 3 attempts, escalate or try a different strategy
### Tool Usage Safety Protocols
- **Read before modifying** - Always examine file contents before making changes
- **Use appropriate tools for the task**:
- Small changes → `apply_diff`
- New content addition → `insert_content`
- Find and replace → `search_and_replace`
- New files only → `write_to_file`
- **Respect file restrictions** - Honor `.rooignore` rules and mode-specific file permissions
- **Validate before execution** - Check parameters and paths before tool invocation
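The tool-selection rules above can be captured in a tiny lookup. This is a sketch only; the tool names come from the list above, while the category labels are invented here for illustration:

```python
# Map a change category to the preferred editing tool.
# Category labels are illustrative; tool names follow the guidance above.
TOOL_BY_CHANGE = {
    "small_change": "apply_diff",
    "new_content": "insert_content",
    "find_replace": "search_and_replace",
    "new_file": "write_to_file",
}

def choose_tool(change_kind: str) -> str:
    """Return the preferred tool, defaulting to the safest surgical option."""
    return TOOL_BY_CHANGE.get(change_kind, "apply_diff")
```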
### Context Management
- **Be specific with file references** - Use precise paths and line numbers when possible
- **Leverage Context Mentions** - Use `@` mentions for files, folders, problems, and Git references
- **Manage context window limits** - Be mindful of token usage, especially with large files
- **Provide meaningful examples** - Include concrete examples when requesting specific patterns or styles
### Communication Patterns
- **Clear explanations before actions** - Describe intent before using tools
- **Transparent reasoning** - Explain decision-making process and assumptions
- **Ask clarifying questions** - Use `ask_followup_question` when requirements are ambiguous
- **Provide actionable feedback** - When presenting options, make suggestions specific and implementable
### Error Handling and Recovery
- **Graceful degradation** - If a preferred approach fails, try alternative methods
- **Context preservation** - Avoid context poisoning by validating tool outputs
- **Session management** - Recognize when to start fresh vs. continuing in current context
- **Learning from failures** - Document patterns that don't work to avoid repetition
## Documentation Mandate: The "No Fact Left Behind" Protocol
**Core Principle**: The Memory Bank is the single source of truth. Information that is not in the Memory Bank is considered lost. Therefore, the documentation of every newly discovered fact, pattern, or decision is **non-negotiable and mandatory**.
### The Golden Rule of Documentation
If you had to spend time figuring something out (a configuration detail, a library's quirk, an architectural pattern, a bug's root cause), you **MUST** document it immediately. This prevents future sessions from wasting time rediscovering the same information.
### Detailed Core File Directives
This section provides explicit instructions on what each core file contains and, most importantly, **when and why to update it**.
### 2.1. `projectbrief.md`
- **Purpose**: The high-level, stable vision for the project. What are we building and why?
- **Update Frequency**: Rarely.
- **Update Triggers**:
- A fundamental shift in project goals (e.g., pivoting from a web app to a mobile app).
- A major change in the core problem being solved.
- Onboarding a new project that requires a fresh brief.
### 2.2. `productContext.md`
- **Purpose**: Defines the "who" and "how" of the user experience: user personas, the problems we are solving for them, and the key user journeys.
- **Update Frequency**: Occasionally.
- **Update Triggers**:
- Introduction of a new major feature that alters the user experience.
- A change in the target user audience.
- Discovery of new user pain points that the project must address.
#### 3. `techContext.md`
- **Purpose**: A living document detailing the "what" and "how" of our technology stack. It's not just a list of technologies, but a guide to their specific usage within this project.
- **Update Frequency**: Frequently. This file should be updated almost as often as code is written.
- **Update Triggers**:
- **Immediately upon discovering a new library/framework detail**: This is critical. If you learn something new about a library's API, its performance characteristics, or a common pitfall, it goes here.
- **Adding a new dependency**: Document what it is, why it was added, and any configuration notes.
- **Making a technology choice**: e.g., "We chose `gRPC` over `REST` for inter-service communication because..."
- **Defining environment setup**: How to get the project running from scratch.
- **Example Scenario (`Kubernetes Client-Go`)**:
- **Initial Discovery**: You learn the project uses `client-go`. You add a section to `techContext.md`.
```markdown
## Kubernetes Client-Go
- The project uses the official Go client for interacting with the Kubernetes API.
- Primary package: `k8s.io/client-go`
```
- **Deeper Learning**: You discover that the project consistently uses dynamic clients and unstructured objects to handle Custom Resources (CRDs).
```markdown
### Dynamic Client Usage
- **Pattern**: The controller heavily relies on the dynamic client (`dynamic.Interface`) for interacting with our CRDs. This is preferred over generated clientsets to keep the controller agile.
- **Key Function**: `dynamic.NewForConfig(...)` is the standard entry point.
- **Gotcha**: When working with `unstructured.Unstructured`, always use the helper functions from `k8s.io/apimachinery/pkg/apis/meta/v1/unstructured/unstructured.go` to avoid panics when accessing nested fields. Direct map access is an anti-pattern in this codebase.
```
- **Performance Tweak**: You find that watches on a particular resource are overwhelming the API server and implement a client-side rate limiter.
````markdown
### Informer Rate Limiting
- **Problem**: Watches on the `FooResource` CRD were causing excessive API server load.
- **Solution**: All informers for `FooResource` MUST be configured with a client-side rate limiter.
- **Implementation**:
```go
// Example of required rate limiter
factory.WithTweak(func(options *metav1.ListOptions) {
    options.Limit = 100
})
```
````
#### 4. `systemPatterns.md`
- **Purpose**: The blueprint for how we build things. This file documents the recurring architectural and coding patterns specific to this project. It answers: "What is the 'project way' of doing X?"
- **Update Triggers**:
- **Discovering a new, recurring pattern**: If you see the same solution in two or more places, it's a pattern. Document it.
- **Establishing a new pattern**: When you implement a new foundational solution (e.g., a new error handling strategy, a generic retry mechanism).
- **Refactoring an existing pattern**: When a pattern is improved, document the change and the rationale.
- **Example**:
```markdown
## Idempotent Kafka Consumers
- **Pattern**: All Kafka consumers must be idempotent.
- **Implementation**: Each message handler must first check a Redis cache using the message's unique ID. If the ID exists, the message is a duplicate and should be skipped. If not, the ID is written to Redis with a TTL before processing begins.
- **Rationale**: Guarantees exactly-once processing semantics even if Kafka delivers a message multiple times.
```

### 2.3. `techContext.md`
- **Purpose**: A living document for the project's technology stack and its nuances.
- **Update Frequency**: Frequently.
- **Example**: Documenting a library's performance quirks or API workarounds.
#### 5. `activeContext.md`
- **Purpose**: A high-bandwidth, short-term memory file. It's a journal of the current work stream, capturing the "what's happening now" and "what I'm thinking."
- **Update Frequency**: Constantly. This is the most frequently updated file during a task.
- **Update Triggers**:
- **Making a micro-decision**: "I'm choosing to use a channel here instead of a mutex because..."
- **Encountering a roadblock**: "The build is failing due to dependency X. I need to investigate."
- **Recording a temporary finding**: "The value of `foo` is `nil` at this point, which is unexpected. I need to trace why."
- **Summarizing a conversation or feedback loop.**
- **Lifecycle**: Information in `activeContext.md` is often ephemeral. Once a task is complete, valuable, long-term insights from `activeContext.md` should be migrated to `techContext.md` or `systemPatterns.md`, and the rest can be cleared for the next task.
### 2.4. `systemPatterns.md`
- **Purpose**: The blueprint for how we build things; our project-specific patterns and anti-patterns.
- **Update Frequency**: Frequently.
- **Example**: Documenting the "Idempotent Kafka Consumer" pattern with code snippets and rationale.
#### 6. `progress.md` & `currentTask.md`
- **Purpose**: Task and project management.
- **Update Frequency**: At the start, during, and end of every task.
- **Update Triggers**:
- **`currentTask.md`**: Updated whenever a step in the implementation plan is started, blocked, or completed.
- **`progress.md`**: Updated after a major feature is completed or a significant milestone is reached, summarizing the impact on the overall project.
### 2.5. `activeContext.md`
- **Purpose**: A short-term memory file; a journal of the current work stream.
- **Update Frequency**: Constantly during a task.
- **Lifecycle**: Ephemeral. Valuable insights are migrated to permanent memory files post-task.
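This kind of journaling benefits from a consistent entry shape. A minimal sketch of an append-only helper for `activeContext.md` (the path and the timestamped bullet format are illustrative assumptions, not a fixed convention):

```python
from datetime import datetime, timezone
from pathlib import Path

def log_observation(text: str, path: str = ".roo/activeContext.md") -> str:
    """Append a timestamped journal entry to the active-context file.

    Returns the entry that was written.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"- [{stamp}] {text}\n"
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("a", encoding="utf-8") as f:  # append, never overwrite
        f.write(entry)
    return entry
```

Appending rather than rewriting preserves the chronological audit trail that later gets distilled into the permanent memory files.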
### 2.6. `progress.md`
- **Purpose**: Tracks project-wide progress and major milestones.
- **Update Frequency**: After any significant feature is completed.

### 2.7. `currentTask.md`
- **Purpose**: A detailed breakdown and implementation plan for the current task.
- **Update Frequency**: Continuously throughout a single task.

I may also create additional files whenever they would help; each specialized mode is free to create any number of memory bank files.

## Task Management Guidelines

### Creating a New Task

When starting a new task:

1. **Create or update `currentTask.md`** with:
- Task description and objectives
- Context and requirements
- Detailed step-by-step implementation plan
- Checklist format for tracking progress:
```markdown
- [ ] Step 1: Description
- [ ] Step 2: Description
```
2. **Apply project patterns** from `.roo/rules`

3. **For refactoring tasks**, add a "Refactoring Impact Analysis" section:
```markdown
## Refactoring Impact Analysis
- Components affected: [List]
- Interface changes: [Details]
- Migration steps: [Steps]
- Verification points: [Tests]
```

### During Task Implementation

1. **Update `currentTask.md`** after each significant milestone:
- Mark completed steps: `- [x] Step 1: Description`
- Add implementation notes beneath relevant steps
- Document any challenges and solutions
- Add new steps as they become apparent
2. **Update `.roo/rules`** with any new project patterns
3. **For large refactors**, create/update `refactoring_map.md` with:
- Old vs new component names/relationships
- Changed interfaces and contracts
- Migration progress tracking

### Completing a Task

1. Ensure all steps in `currentTask.md` are marked complete
2. Summarize key learnings and outcomes
3. Update `progress.md` with project-wide impact
4. Update `.roo/rules` with new project patterns
5. Update affected sections in all relevant memory bank files
6. Either archive the task or prepare `currentTask.md` for the next task
7. Follow the task completion workflow for Git and Jira updates

### Task Interruption

If a task is interrupted, ensure `currentTask.md` is comprehensively updated with:

1. Current status of each step
2. Detailed notes on what was last being worked on
3. Known issues or challenges
4. Next actions when work resumes

---

## 3. Reference: Practical Workflow Blueprints

This section provides concrete examples of how to apply the memory bank in practice.

### 3.1. The Debugging Workflow: An Audit Trail Approach
This workflow transforms debugging from a chaotic process into a systematic investigation.

1. **Initial Observation**: Document the error and initial thoughts in `activeContext.md`.
2. **Formulate Hypothesis**: Create a specific, testable hypothesis and plan in `currentTask.md`.
3. **Execute and Document**: Execute the plan and immediately document the results and conclusion in `activeContext.md`.
4. **Iterate or Resolve**: If the hypothesis is disproven, formulate a new one. If confirmed, proceed to a fix.
5. **Post-Task Synthesis**: After the bug is fixed, review the audit trail and synthesize the key learnings (root cause, solution) into `techContext.md` or `systemPatterns.md`.

### 3.2. The Refactoring Workflow: A Safety-First Approach
This workflow de-risks refactoring by forcing a thorough analysis *before* any code is changed.

1. **Define Scope and Goals**: Create a task in `currentTask.md` with a "Refactoring Impact Analysis" section.
2. **Information Gathering**: Use retrieval tools to understand the "blast radius" of the change, logging all findings in `activeContext.md`.
3. **Create a Detailed Migration Plan**: Update `currentTask.md` with a step-by-step plan for the refactor.
4. **Execute and Verify**: Follow the plan, logging the outcome of each step in `activeContext.md`.
5. **Post-Task Synthesis**: Update `systemPatterns.md` and other relevant files to reflect the new state of the system.

---

## 4. Core Operational Loop & Mandatory Checkpoint (HIGHEST PRIORITY)

This is my strict, non-negotiable operational loop. I will follow it after every user request.

### 4.1. The Loop: Plan -> Act -> Document -> Repeat
1. **Plan**: Analyze the user's request and the Memory Bank to create a step-by-step plan in `currentTask.md`.
2. **Act**: Execute a single step from the plan using one tool.
3. **Document**: After the action, I **MUST** complete the "Mandatory Post-Action Checkpoint" before planning my next action.
### 4.2. The Mandatory Post-Action Checkpoint
After EVERY tool use, I MUST output and complete this checklist.

**1. Action Summary:**
- **Tool Used**: `[Name of the tool]`
- **Target**: `[File path or component]`
- **Outcome**: `[Success, Failure, or Observation]`

**2. Memory Bank Audit:**
- **Was a new fact discovered?**: `[Yes/No]`
- **Was an assumption validated/invalidated?**: `[Yes/No/N/A]`
- **Which memory file needs updating?**: `[activeContext.md (for observations), techContext.md (for new tech facts), systemPatterns.md (for new patterns), or N/A]`

**3. Proposed Memory Update:**
- **File to Update**: `[File path of the memory file or N/A]`
- **Content to Add/Modify**:
```diff
[Provide the exact content to be written. If no update is needed, you MUST justify by confirming that no new, persistent knowledge was generated.]
```

## Quality and Safety Standards

### Code Quality Requirements
- **Complete, runnable code** - Never use placeholders or incomplete snippets
- **Proper error handling** - Include appropriate error checking and user feedback
- **Consistent formatting** - Follow established project conventions
- **Clear documentation** - Add comments for complex logic and public APIs

### Security Considerations
- **Validate user inputs** - Check for malicious patterns in commands and file operations
- **Respect file permissions** - Honor `.rooignore` and mode-specific restrictions
- **Secure command execution** - Avoid shell injection and dangerous command patterns
- **Protect sensitive data** - Be cautious with API keys, credentials, and personal information

### Performance Guidelines
- **Efficient tool usage** - Choose the most appropriate tool for each task
- **Resource management** - Be mindful of file sizes, memory usage, and processing time
- **Batch operations** - Group related changes to minimize tool calls
- **Context optimization** - Manage token usage effectively
## 5. Instruction Priority

**Priority Order (Highest to Lowest):**
1. **User's Explicit Instructions**: Direct commands or feedback from the user in the current session ALWAYS take absolute precedence.
2. **Section 4 of This Document**: The Core Loop and Mandatory Checkpoint are my most important rules.
3. **Memory Bank Files**: Project-specific context and patterns from `.roo/rules` and other memory bank files.
4. **Sections 1-3 of This Document**: Guiding principles and reference material.

**I MUST strictly adhere to this priority order.** If a user instruction conflicts with this document or `.roo/rules`, I will follow the user's instruction but consider noting the deviation and its reason in `activeContext.md` or `.roo/rules` if it represents a new standard or exception.
## Critical Operational Notes
- **Memory Bank consultation is NOT OPTIONAL** - It's the foundation of continuity across sessions
- **Documentation updates are NOT OPTIONAL** - They ensure future sessions can continue effectively
- **When in doubt about project context, ALWAYS consult the Memory Bank** before proceeding
- **Maintain consistency with established patterns** unless explicitly directed otherwise
- **Document all significant decisions and their rationale** for future reference
- **Use natural language effectively** - Communicate clearly and avoid unnecessary technical jargon
- **Maintain user agency** - Always respect user approval workflows and decision-making authority
## Integration with Roo Code Features
### Tool Integration
- **Leverage MCP servers** when available for specialized functionality
- **Use browser automation** appropriately for web-related tasks
- **Apply custom modes** when task-specific expertise is beneficial
- **Utilize context mentions** to provide precise file and project references
### Workflow Optimization
- **Mode switching** - Recommend appropriate mode changes when beneficial
- **Boomerang tasks** - Break complex projects into specialized subtasks when appropriate
- **Checkpoints** - Leverage automatic versioning for safe experimentation
- **Custom instructions** - Apply project-specific guidelines consistently
If I detect a conflict between these priorities that I cannot resolve (e.g., a user request contradicts a system pattern), I will not proceed. Instead, I will state the conflict and ask for clarification.

This document provides the foundation for all Roo modes and should be consulted at the beginning of every session to ensure continuity and effectiveness.


@@ -0,0 +1,464 @@
# Best Practices for an AI's File-Based Memory Bank in Software Development
This report details best practices for creating, using, and maintaining an external, file-based knowledge base (a "memory bank") to enhance a generative AI's performance across the full software development lifecycle.
## 1. Knowledge Structuring for Comprehension
An effective memory bank must structure information to provide not just factual data, but deep contextual understanding—the "why" behind the "what." Based on established practices within complex AI systems, a modular, hierarchical file structure is paramount. This approach separates concerns, allowing the AI to retrieve precisely the type of knowledge needed for a given task.
### Core Concept: A Hierarchical, Multi-File System
Instead of a single monolithic knowledge file, the best practice is to use a distributed system of markdown files, each with a distinct purpose. This mirrors how human expert teams manage project knowledge.
### Best Practices: The Seven-File Memory Bank Architecture
A proven architecture for structuring this knowledge consists of the following core files:
1. **`projectbrief.md` - The "Why We're Building This" File**:
* **Purpose**: Contains the high-level, stable vision for the project. It defines the core business goals, target audience, and overall project scope.
* **Content**: Mission statement, key features, success metrics.
* **Update Frequency**: Rarely. Only updated upon a major strategic pivot.
2. **`productContext.md` - The "User Experience" File**:
* **Purpose**: Defines the problem space from a user's perspective. It details user personas, pain points, and key user journeys.
* **Content**: User stories, workflow diagrams, UX principles.
* **Update Frequency**: Occasionally, when new user-facing features are added or the target audience changes.
3. **`techContext.md` - The "How It Works" File**:
* **Purpose**: A living document detailing the project's technology stack, including libraries, frameworks, and infrastructure. Crucially, this file captures the *nuances* of the tech stack.
* **Content**: List of dependencies, setup instructions, API usage notes, performance gotchas, known workarounds for library bugs.
* **Update Frequency**: Frequently. This should be updated immediately upon discovering any new technical detail.
4. **`systemPatterns.md` - The "Project Way" File**:
* **Purpose**: Documents the recurring architectural and coding patterns specific to the project. It answers the question: "What is the standard way of doing X here?"
* **Content**: Descriptions of patterns (e.g., "Idempotent Kafka Consumers"), code examples of the pattern, and the rationale behind choosing it. Includes both approved patterns and documented anti-patterns.
* **Update Frequency**: Frequently, as new patterns are established or existing ones are refactored.
5. **`activeContext.md` - The "Scratchpad" File**:
* **Purpose**: A short-term memory file for the AI's current work stream. It's a journal of micro-decisions, observations, and temporary findings during a task.
* **Content**: "I'm choosing X because...", "Encountered roadblock Y...", "The value of Z is `null` here, which is unexpected."
* **Update Frequency**: Constantly. Information from this file is often migrated to `techContext.md` or `systemPatterns.md` after a task is complete.
6. **`progress.md` - The "Project Log" File**:
* **Purpose**: Tracks project-wide progress and major milestones. Provides a high-level overview of what has been accomplished.
* **Content**: Changelog of major features, release notes, milestone completion dates.
* **Update Frequency**: After any significant feature is completed.
7. **`currentTask.md` - The "To-Do List" File**:
* **Purpose**: A detailed breakdown and implementation plan for the specific task the AI is currently working on.
* **Content**: Task description, acceptance criteria, step-by-step checklist of implementation steps.
* **Update Frequency**: Continuously throughout a single task.
This structured approach ensures that when the AI needs to perform a task, it can consult a specific, relevant document rather than parsing a massive, undifferentiated blob of text, leading to more accurate and context-aware actions.
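One practical payoff of this structure is that parts of it are machine-readable. For example, the checklist in `currentTask.md` can be parsed to report task progress. A minimal sketch, assuming the common GitHub-style `- [ ]` / `- [x]` checkbox syntax:

```python
import re

# Matches markdown checklist items such as "- [ ] Step 1" or "- [x] Step 2".
CHECKBOX = re.compile(r"^\s*-\s*\[([ xX])\]\s+(.*)$")

def checklist_progress(markdown: str) -> tuple[int, int]:
    """Return (completed, total) checklist items found in a markdown document."""
    done = total = 0
    for line in markdown.splitlines():
        m = CHECKBOX.match(line)
        if m:
            total += 1
            if m.group(1).lower() == "x":
                done += 1
    return done, total
```

An agent can call this at session start to decide whether to resume an in-flight task or begin a new one.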
## 2. Contextual Retrieval for Development Tasks
Retrieval-Augmented Generation (RAG) is the process of fetching relevant information from a knowledge base to augment the AI's context before it generates a response. For software development, this is not a one-size-fits-all problem. The optimal retrieval strategy depends heavily on the specific task (e.g., debugging vs. refactoring).
### Core Concept: Task-Specific Retrieval
An effective AI must employ a hybrid retrieval model, combining different techniques based on the immediate goal. The memory bank's structured nature is the key enabler for this.
### Best Practices: Hybrid Retrieval Strategies
1. **Keyword and Regex Search for Concrete Symbols (`search_files`)**:
* **Use Case**: The most critical retrieval method for most coding tasks. It's used for finding specific function names, variable declarations, API endpoints, or error messages.
* **How it Works**: When a developer needs to understand where a function is called or how a specific component is used, a precise, literal search is more effective than a "fuzzy" semantic search. The `search_files` tool, which leverages regular expressions, is ideal for this.
* **Example (Debugging)**: An error message `undefined is not a function` points to a specific variable. A regex search for that variable name across the relevant files is the fastest way to find the source of the bug.
* **Example (Refactoring)**: When renaming a function, a global search for its exact name is required to find all call sites.
2. **Semantic Search for Conceptual Understanding and Code Discovery**:
* **Use Case**: Best for finding abstract concepts, architectural patterns, or the rationale behind a decision when the exact keywords are unknown. It is also highly effective for code discovery, i.e., finding relevant files to modify for a given task without knowing the file names in advance.
* **How it Works**: This method uses vector embeddings to find documents (or source code files) that are semantically similar to a natural language query. For example, a query like "how do we handle user authentication?" should retrieve relevant sections from `systemPatterns.md`, while a query like "Where should I add a new summarization prompt?" should retrieve the specific source files that deal with prompt templating.
* **Implementation (Codebase RAG)**: A practical implementation for code search involves:
1. **Indexing**: Traverse the entire codebase, reading the content of each source file (`.py`, `.js`, `.java`, etc.).
2. **Embedding**: For each file's content, generate a vector embedding using a model like OpenAI's `text-embedding-ada-002` or an open-source alternative like Sentence-BERT.
3. **Vector Store**: Store these embeddings in a local vector store using a library like `Annoy`, `FAISS`, or a managed vector database. This store maps the embedding back to its original file path.
4. **Retrieval**: When a user asks a question, generate an embedding for the query and use the vector store to find the `top-k` most similar file embeddings.
5. **Synthesis**: Pass the content of these `top-k` files to a powerful LLM, which can then analyze the code and provide a detailed answer or a set of instructions.
* **Advanced Tip**: The quality of retrieval can sometimes be improved by creating and querying multiple vector indices built with different embedding models, though this increases maintenance overhead.
3. **Manual, User-Guided Retrieval (`@mentions`)**:
* **Use Case**: Often the most efficient method. The developer, who has the most context, directly tells the AI which files are relevant.
* **How it Works**: Features like VS Code's `@mentions` allow the user to inject the content of specific files or directories directly into the AI's context. This bypasses the need for the AI to guess, providing a precise and immediate context.
* **Example**: A developer working on a new feature in `src/components/NewFeature.js` can start a prompt with "Help me finish this component: @src/components/NewFeature.js" to instantly provide the necessary context.
4. **Graph-Based Retrieval for Code Navigation**:
* **Use Case**: For understanding complex codebases by exploring relationships between different code elements (functions, classes, modules).
* **How it Works**: This advanced technique models the codebase as a graph, where nodes are code entities and edges represent relationships (e.g., "calls," "imports," "inherits from"). A query can then traverse this graph to find, for example, all functions that could be affected by a change in a specific class.
* **Implementation**: Requires specialized tools to parse the code and build the graph, such as Sourcegraph's code intelligence or custom language-specific indexers.
By combining these methods, the AI can dynamically select the best tool for the job, ensuring it has the most relevant and precise information to assist with any development task.
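The keyword and semantic strategies can be combined in a small sketch. This is illustrative only: a normalized token-count vector stands in for a real embedding model (such as those named above), and an in-memory dict stands in for a real vector store.

```python
import math
import re
from collections import Counter

def embed(text: str) -> dict[str, float]:
    """Toy 'embedding': a normalized bag-of-words vector.

    A real system would call an embedding model here instead.
    """
    counts = Counter(re.findall(r"[a-z0-9_]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two pre-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def keyword_search(files: dict[str, str], pattern: str) -> list[str]:
    """Exact/regex retrieval: best for concrete symbols and error strings."""
    rx = re.compile(pattern)
    return [path for path, text in files.items() if rx.search(text)]

def semantic_search(files: dict[str, str], query: str, top_k: int = 2) -> list[str]:
    """Fuzzy retrieval: rank files by similarity to a natural-language query."""
    q = embed(query)
    ranked = sorted(files, key=lambda path: cosine(q, embed(files[path])), reverse=True)
    return ranked[:top_k]
```

A retrieval layer might run `keyword_search` when the query contains a concrete symbol and fall back to `semantic_search` for natural-language questions.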
## 3. Systematic Knowledge Capture
A memory bank's value degrades quickly if it is not continuously updated. The most effective AI systems integrate knowledge capture directly into their core workflow, ensuring that new insights are documented the moment they are discovered. This prevents knowledge loss and reduces redundant work in the future.
### Core Concept: The "No Fact Left Behind" Protocol
If time was spent discovering a piece of information (a configuration detail, a bug's root cause, a library's quirk), it **must** be documented immediately. The cost of documentation is paid once, while the cost of rediscovery is paid by every developer (or AI instance) who encounters the same issue in the future.
### Best Practices: Integrating Documentation into Workflows
1. **Post-Debugging Root Cause Analysis (RCA) Update**:
* **Trigger**: Immediately after a bug is fixed.
* **Action**: The AI (or developer) should update the `techContext.md` or `systemPatterns.md` file.
* **Content**:
* A brief description of the bug's symptoms.
* The identified root cause.
* The solution that was implemented.
* (Optional) A code snippet demonstrating the anti-pattern that caused the bug and the corrected pattern.
* **Rationale**: This turns every bug fix into a permanent piece of institutional knowledge, preventing the same class of error from recurring.
2. **Architectural Decision Records (ADRs) in `systemPatterns.md`**:
* **Trigger**: Whenever a significant architectural or technological choice is made (e.g., choosing a new database, library, or design pattern).
* **Action**: Create a new entry in `systemPatterns.md` or `techContext.md`.
* **Content**: The entry should follow the "Architectural Decision Record" (ADR) format:
* **Title**: A short summary of the decision.
* **Context**: What was the problem or decision that needed to be made?
* **Decision**: What was the chosen solution?
* **Consequences**: What are the positive and negative consequences of this decision? What trade-offs were made?
* **Rationale**: This provides a clear history of *why* the system is built the way it is, which is invaluable for new team members and for future refactoring efforts.
3. **Real-time "Scratchpad" for In-Progress Tasks (`activeContext.md`)**:
* **Trigger**: Continuously during any development task.
* **Action**: The AI should "think out loud" by logging its observations, assumptions, and micro-decisions into the `activeContext.md` file.
* **Content**: "Trying to connect to the database, but the connection is failing. I suspect the firewall rules. I will check the configuration in `config/production.json`."
* **Rationale**: This provides a high-fidelity log of the AI's thought process, which is essential for debugging the AI's own behavior and for allowing a human to seamlessly take over a task. At the end of the task, any valuable, long-term insights from this file should be migrated to the appropriate permanent memory bank file.
4. **Automated Knowledge Extraction from Code**:
* **Trigger**: Periodically, or on-demand.
* **Action**: Use automated tools to scan the codebase and update the memory bank.
* **Content**:
* Run a tool to list all API endpoints and update a section in `techContext.md`.
* Scan for all `TODO` or `FIXME` comments and aggregate them into a technical debt summary in `progress.md`.
* Use static analysis to identify common anti-patterns and update `systemPatterns.md` with examples.
* **Rationale**: This reduces the manual burden of documentation and ensures that the memory bank reflects the current state of the code.
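Practice 4's `TODO`/`FIXME` aggregation can be sketched in a few lines. This assumes file contents have already been collected into memory (the walking of the file system is omitted, and the function and file names are illustrative):

```javascript
// Scan source text for debt markers and render a summary suitable for
// pasting into progress.md. Input is a { filename: contents } map.
function collectDebtMarkers(files) {
  const markers = [];
  const pattern = /\/\/\s*(TODO|FIXME):?\s*(.*)/;
  for (const [filename, contents] of Object.entries(files)) {
    contents.split("\n").forEach((line, i) => {
      const match = line.match(pattern);
      if (match) {
        markers.push({ file: filename, line: i + 1, kind: match[1], note: match[2].trim() });
      }
    });
  }
  return markers;
}

function renderDebtSummary(markers) {
  const lines = ["## Technical Debt Summary"];
  for (const m of markers) {
    lines.push(`- [ ] **${m.kind}** ${m.file}:${m.line} - ${m.note}`);
  }
  return lines.join("\n");
}

const files = {
  "src/userService.js": "const id = user?.id;\n// TODO: handle missing user explicitly\n",
  "src/payment.js": "// FIXME: retry on Stripe rate-limit errors\ncharge();\n",
};
console.log(renderDebtSummary(collectDebtMarkers(files)));
```

A scheduled task can run this over the repository and overwrite the corresponding section of `progress.md`, keeping the summary in lockstep with the code.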
## 4. Effective Context Synthesis
Retrieving the right information is only half the battle. The AI must then intelligently synthesize this retrieved knowledge with the user's immediate request and the current problem context (e.g., an error log, a piece of code to be refactored).
### Core Concept: Contextual Grounding and Prioritization
The AI should not treat all information as equal. It must "ground" its reasoning in the provided context, using the memory bank as a source of wisdom and guidance rather than a rigid set of instructions.
### Best Practices: Merging and Prioritizing Information
1. **Explicit Context Labeling in Prompts**:
* **How it Works**: When constructing the final prompt for the LLM, the AI should explicitly label the source of each piece of information. This allows the model to understand the hierarchy and nature of the context.
* **Example**:
```
Here is the problem to solve:
[USER_REQUEST]
"Fix this bug."
[/USER_REQUEST]
[CURRENT_CONTEXT: ERROR_LOG]
"TypeError: Cannot read properties of undefined (reading 'id') at /app/src/services/userService.js:25"
[/CURRENT_CONTEXT]
[RETRIEVED_CONTEXT: systemPatterns.md]
"## Null-Safe Object Access
All services must perform null-checking before accessing properties on objects returned from the database.
Anti-Pattern: const id = user.id;
Correct Pattern: const id = user?.id;"
[/RETRIEVED_CONTEXT]
Based on the retrieved context, analyze the error log and provide a fix for the user's request.
```
* **Rationale**: This structured approach helps the model differentiate between the immediate problem and the guiding principles, leading to more accurate and relevant solutions.
2. **Prioritization Hierarchy**:
* **How it Works**: The AI must have a clear order of precedence when information conflicts.
1. **User's Explicit Instruction**: The user's direct command in the current prompt always takes top priority.
2. **Current Problem Context**: Facts from the immediate problem (error logs, code to be refactored) are next.
3. **Retrieved Memory Bank Context**: Project-specific patterns and knowledge from the memory bank.
4. **General Knowledge**: The model's pre-trained general knowledge.
* **Rationale**: This prevents the AI from, for example, ignoring a direct user request because a memory bank pattern suggests a different approach. The memory bank guides, but the user directs.
3. **Conflict Resolution and Clarification**:
* **Trigger**: When a retrieved memory bank pattern directly contradicts the user's request or the immediate problem context.
* **Action**: The AI should not silently ignore the conflict. It should highlight the discrepancy and ask for clarification.
* **Example**: "You've asked me to add a synchronous API call here. However, our `systemPatterns.md` file states that all I/O operations must be asynchronous to avoid blocking the event loop. How would you like me to proceed?"
* **Rationale**: This makes the AI a collaborative partner, leveraging its knowledge to prevent potential mistakes while still respecting the user's authority.
4. **Avoid Context Poisoning**:
* **Core Principle**: The AI must be skeptical of its own retrieved context, especially if the results seem nonsensical or lead to repeated failures.
* **Action**: If a solution based on retrieved context fails, the AI should try to solve the problem *without* that specific piece of context on the next attempt. If it succeeds, it should flag the retrieved context as potentially outdated or incorrect in `activeContext.md`.
* **Rationale**: This prevents a single piece of bad information in the memory bank from derailing the entire problem-solving process. It creates a feedback loop for identifying and eventually correcting outdated knowledge.
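The labeling convention from practice 1 and the precedence order from practice 2 can be applied mechanically when the final prompt is assembled. A minimal sketch follows; the tag names mirror the example above and are a convention of this document, not a requirement of any particular LLM API:

```javascript
// Assemble a labeled prompt in priority order: user request first, then
// current problem context, then retrieved memory bank context.
function buildPrompt({ userRequest, currentContext = [], retrievedContext = [] }) {
  const parts = [`[USER_REQUEST]\n${userRequest}\n[/USER_REQUEST]`];
  for (const { source, text } of currentContext) {
    parts.push(`[CURRENT_CONTEXT: ${source}]\n${text}\n[/CURRENT_CONTEXT]`);
  }
  for (const { source, text } of retrievedContext) {
    parts.push(`[RETRIEVED_CONTEXT: ${source}]\n${text}\n[/RETRIEVED_CONTEXT]`);
  }
  parts.push("Based on the retrieved context, analyze the problem and respond to the user's request.");
  return parts.join("\n\n");
}

const prompt = buildPrompt({
  userRequest: "Fix this bug.",
  currentContext: [
    { source: "ERROR_LOG", text: "TypeError: Cannot read properties of undefined (reading 'id')" },
  ],
  retrievedContext: [
    { source: "systemPatterns.md", text: "All services must null-check objects returned from the database." },
  ],
});
console.log(prompt);
```

Keeping prompt assembly in one function also gives a single place to enforce the hierarchy, for example by truncating retrieved context before truncating the error log when the context window is tight.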
## 5. Memory Bank Maintenance and Evolution
A memory bank, like a codebase, requires regular maintenance to prevent decay and ensure it remains a trusted, up-to-date "single source of truth." Without active management, it can become cluttered with outdated information, leading to context poisoning and incorrect AI behavior.
### Core Concept: Treat Knowledge as a First-Class Citizen
The health of the memory bank is as important as the health of the application code. Maintenance should be a scheduled, ongoing process, not an afterthought.
### Best Practices: Keeping the Memory Bank Healthy
1. **Scheduled Knowledge Pruning**:
* **Trigger**: After a major refactor, library upgrade, or feature deprecation.
* **Action**: A dedicated task should be created to review and prune the memory bank. The AI, guided by a developer, should search for information related to the changed components.
* **Example**: After migrating from a REST API to gRPC, search `techContext.md` and `systemPatterns.md` for "REST" and "axios" to identify and remove or archive outdated patterns and implementation details.
* **Rationale**: This actively combats knowledge decay and ensures the AI is not relying on obsolete information.
2. **Periodic Consolidation and Review**:
* **Trigger**: On a regular schedule (e.g., quarterly) or before a major new project phase.
* **Action**: Review the `activeContext.md` files from recent tasks to identify recurring themes or valuable insights that were not promoted to the permanent memory bank. Consolidate scattered notes into well-structured entries in `techContext.md` or `systemPatterns.md`.
* **Rationale**: This process turns short-term operational knowledge into long-term strategic assets and improves the overall signal-to-noise ratio of the memory bank.
3. **Gap Analysis and Backfilling**:
* **Trigger**: When the AI or a developer frequently cannot find information on a specific topic, or when a new team member has questions that aren't answered by the memory bank.
* **Action**: Create a task to explicitly research and document the missing knowledge. This could involve the AI using its research tools or a developer writing a new section.
* **Example**: If developers are consistently asking "How do I set up the local environment for the new microservice?", it's a clear signal to create a detailed setup guide in `techContext.md`.
* **Rationale**: This is a demand-driven approach to knowledge management, ensuring that the most valuable and needed information is prioritized.
4. **Immutability for Historical Records**:
* **Core Principle**: While patterns and tech details evolve, the history of *why* decisions were made should be preserved.
* **Action**: When a pattern is deprecated, do not delete its Architectural Decision Record (ADR). Instead, mark it as "Superseded by [link to new ADR]" and move it to an "archive" section.
* **Rationale**: This preserves the historical context of the project, which is invaluable for understanding the evolution of the architecture and avoiding the repetition of past mistakes. The project's history is as important as its current state.
## 6. Practical Workflow Blueprints: From Theory to Action
While the structure of the memory bank is foundational, its true power is realized through disciplined, auditable workflows. This section provides practical, step-by-step blueprints for common development tasks, turning the memory bank into an active participant in the development process.
### The Debugging Workflow: An Audit Trail Approach
Debugging is often a chaotic process of trial and error. A memory-driven approach transforms it into a systematic investigation, creating an invaluable audit trail that prevents loops and captures knowledge from both successes and failures.
**Core Principle**: Every action and observation is documented *before* it is executed, creating a clear, chronological record of the debugging session. The `activeContext.md` serves as the primary logbook for this process.
**Step-by-Step Blueprint**:
1. **Initial Observation & Triage**:
* **Action**: An error is reported (e.g., from a log file, a failed test, or user report).
* **Memory Update (`activeContext.md`)**: Create a new timestamped entry:
```markdown
**[TIMESTAMP] - DEBUGGING SESSION STARTED**
**Observation**: Received error `TypeError: Cannot read properties of undefined (reading 'id')` in `userService.js:25` when processing user login.
**Initial Thought**: This suggests the `user` object is null or undefined when we try to access its `id` property.
```
2. **Formulate Hypothesis and Plan**:
* **Action**: Based on the initial observation, form a specific, testable hypothesis.
* **Memory Update (`currentTask.md`)**: Create a new checklist item for the investigation plan.
```markdown
- [ ] **Hypothesis 1**: The `findUserByEmail` function is returning `null` for valid emails.
- [ ] **Plan**: Add a log statement immediately after the `findUserByEmail` call in `userService.js` to inspect the `user` object.
- [ ] **Plan**: Re-run the login process with a known valid email.
```
3. **Execute and Document Results**:
* **Action**: Execute the plan (add the log, re-run the test).
* **Memory Update (`activeContext.md`)**: Document the outcome immediately, referencing the hypothesis.
```markdown
**[TIMESTAMP] - EXECUTING TEST FOR HYPOTHESIS 1**
**Action**: Added `console.log('User object:', user);` at `userService.js:24`.
**Result**: Test re-run. Log output: `User object: null`.
**Conclusion**: **Hypothesis 1 is CONFIRMED**. The `findUserByEmail` function is the source of the null value.
```
4. **Iterate or Resolve**:
* **If Hypothesis is Disproven**:
* **Memory Update (`activeContext.md`)**:
```markdown
**Conclusion**: **Hypothesis 1 is DISPROVEN**. The log shows a valid user object. The error must be downstream.
```
* **Memory Update (`currentTask.md`)**: Mark the hypothesis as failed.
```markdown
- [x] ~~**Hypothesis 1**: The `findUserByEmail` function is returning `null`...~~ (Disproven)
```
* **Action**: Return to Step 2 to formulate a new hypothesis based on the accumulated observations.
* **If Hypothesis is Confirmed**:
* **Action**: Proceed to formulate a fix.
* **Memory Update (`currentTask.md`)**:
```markdown
- [x] **Hypothesis 1**: The `findUserByEmail` function is returning `null`. (Confirmed)
- [ ] **Fix Plan**: Investigate the implementation of `findUserByEmail` in `userRepository.js`.
```
5. **Post-Task Synthesis (The "Learning" Step)**:
* **Trigger**: After the bug is fully resolved and the task is complete.
* **Action**: Review the entire audit trail in `activeContext.md` and `currentTask.md`. Synthesize the key learnings into the permanent knowledge base.
* **Memory Update (`techContext.md` or `systemPatterns.md`)**:
```markdown
### Root Cause Analysis: Null User on Login (YYYY-MM-DD)
- **Symptom**: `TypeError` during login process.
- **Root Cause**: The `findUserByEmail` function in the repository layer did not correctly handle cases where the database query returned no results, leading to a `null` return value that was not checked in the service layer.
- **Permanent Solution**: Implemented a null-safe check in `userService.js` and updated the repository to throw a `UserNotFoundError` instead of returning `null`.
- **Pattern Update**: All service-layer functions must validate data returned from repositories before use.
```
This disciplined, memory-centric workflow ensures that every debugging session improves the system's overall robustness and knowledge, effectively preventing the same problem from being debugged twice.
### The Refactoring Workflow: A Safety-First Approach
Refactoring is a high-risk activity. Without a clear plan and understanding of the system, it's easy to introduce regressions. A memory-driven workflow de-risks this process by forcing a thorough analysis *before* any code is changed.
**Core Principle**: Understand before acting. Use the memory bank to build a complete picture of the component to be refactored, its dependencies, and its role in the larger system.
**Step-by-Step Blueprint**:
1. **Define Scope and Goals**:
* **Action**: A developer decides to refactor a component (e.g., "Refactor the `LegacyPaymentProcessor` to use the new `StripeProvider`").
* **Memory Update (`currentTask.md`)**: Create a new task with a clear goal and, most importantly, a "Refactoring Impact Analysis" section.
```markdown
**Task**: Refactor `LegacyPaymentProcessor`.
**Goal**: Replace the outdated SOAP integration with the new Stripe REST API via `StripeProvider`.
**Success Criteria**: All existing payment-related tests must pass. No new linting errors. The `LegacyPaymentProcessor` file is deleted.
## Refactoring Impact Analysis
- **Components to be Analyzed**: [TBD]
- **Affected Interfaces**: [TBD]
- **Verification Points**: [TBD]
```
2. **Information Gathering (The "Blast Radius" Analysis)**:
* **Action**: Use retrieval tools to understand every part of the system that touches the component being refactored.
* **Memory Update (`activeContext.md`)**: Log the findings of the investigation.
```markdown
**[TIMESTAMP] - REFACTORING ANALYSIS for `LegacyPaymentProcessor`**
- **Keyword Search**: `search_files` for "LegacyPaymentProcessor" reveals it is used in:
- `services/CheckoutService.js`
- `tests/integration/payment.test.js`
- **Pattern Review**: `systemPatterns.md` has an entry for "Payment Provider Integration" that we must follow.
- **Technical Context**: `techContext.md` notes a specific rate limit on the Stripe API that we need to handle.
```
* **Memory Update (`currentTask.md`)**: Update the impact analysis with the findings.
```markdown
- **Components to be Analyzed**: `services/CheckoutService.js`, `tests/integration/payment.test.js`
- **Affected Interfaces**: The `processPayment(amount, user)` method signature must be maintained.
- **Verification Points**: `tests/integration/payment.test.js` is the primary test suite.
```
3. **Create a Detailed Migration Plan**:
* **Action**: Based on the analysis, create a step-by-step plan for the refactor.
* **Memory Update (`currentTask.md`)**: Fill out the plan.
```markdown
- [ ] **Step 1**: Create a new `NewPaymentProcessor.js` that implements the same interface as `LegacyPaymentProcessor` but uses `StripeProvider`.
- [ ] **Step 2**: Modify `services/CheckoutService.js` to import and instantiate `NewPaymentProcessor` instead of the legacy one.
- [ ] **Step 3**: Run the `payment.test.js` suite. All tests should pass.
- [ ] **Step 4**: If tests pass, delete `LegacyPaymentProcessor.js`.
- [ ] **Step 5**: Update `systemPatterns.md` to deprecate the old payment pattern.
```
4. **Execute and Verify**:
* **Action**: Follow the plan step-by-step, executing the code changes and running the tests.
* **Memory Update (`activeContext.md`)**: Log the outcome of each step.
```markdown
**[TIMESTAMP] - EXECUTING REFACTOR PLAN**
- **Step 1**: `NewPaymentProcessor.js` created.
- **Step 2**: `CheckoutService.js` updated.
- **Step 3**: Ran tests. **Result**: All 15 tests passed.
- **Step 4**: Deleted `LegacyPaymentProcessor.js`.
```
5. **Post-Task Synthesis**:
* **Action**: Update the permanent knowledge base to reflect the new state of the system.
* **Memory Update (`systemPatterns.md`)**:
```markdown
### Payment Provider Integration (Updated YYYY-MM-DD)
**Status**: Active
**Pattern**: All payment processing must now go through the `StripeProvider` via the `NewPaymentProcessor`.
---
**Status**: Deprecated
**Pattern**: The `LegacyPaymentProcessor` using a SOAP integration is no longer in use.
```
This structured refactoring process minimizes risk by ensuring a deep understanding of the system *before* making changes and provides a clear, verifiable path to completion.
## 7. Enforcing Compliance: The Mandatory Checkpoint
The most sophisticated memory bank structure is useless if the AI forgets to use it. Experience shows that simply instructing an AI to "update the memory bank" is unreliable. The AI, in its focus on solving the immediate problem, will often skip this crucial step. To solve this, the update process must be a **mandatory, non-skippable checkpoint** in the AI's core operational loop.
### Core Concept: The Post-Action Mandatory Checklist
Instead of a passive instruction, we introduce an active, required step that the AI *must* complete after every single action. This is enforced by structuring the AI's custom instructions to require a specific, formatted output before it can proceed.
### Best Practice: The Forced Self-Correction Prompt
This technique is added to the "Custom Instructions" of every specialized mode. After every tool use, the AI is instructed that it **cannot** plan its next action until it has first filled out the following checklist in its response.
**Example Implementation in a Mode's Custom Instructions:**
````markdown
**--- MANDATORY POST-ACTION CHECKPOINT ---**
After EVERY tool use, before you do anything else, you MUST output the following checklist and fill it out. Do not proceed to the next step until this is complete.
**1. Action Summary:**
- **Tool Used**: `[Name of the tool, e.g., apply_diff]`
- **Target**: `[File path or component, e.g., memory_bank_best_practices.md]`
- **Outcome**: `[Success, Failure, or Observation]`
**2. Memory Bank Audit:**
- **Was a new fact discovered?**: `[Yes/No]` (e.g., a bug's root cause, a successful test result, a new system pattern)
- **Was an existing assumption validated or invalidated?**: `[Yes/No/N/A]`
- **Which memory file needs updating?**: `[e.g., activeContext.md, techContext.md, N/A]`
**3. Proposed Memory Update:**
- **File to Update**: `[File path of the memory file]`
- **Content to Add/Modify**:
```diff
[Provide the exact content to be written to the memory file. Use a diff format if modifying.]
```
- **If no update is needed, state "No update required because..." and provide a brief justification.**
**--- END OF CHECKPOINT ---**
Only after you have completed this checklist may you propose the next tool use for your plan.
````
### Why This Works
1. **Forces a Pause**: It breaks the AI's "flow" and forces it to stop and consider the meta-task of documentation.
2. **Structured Output**: LLMs are excellent at filling out structured templates. Requiring this specific format makes compliance more likely than a general instruction.
3. **Creates an Audit Trail**: The AI's thought process regarding documentation becomes explicit and reviewable by the user.
4. **Justification for Inaction**: Forcing the AI to justify *not* updating the memory bank is as important as the update itself. It prevents lazy inaction.
By making the memory update an integral and mandatory part of the action-feedback loop, we can transform the memory bank from a passive repository into a living, breathing component of the development process, ensuring that no fact is left behind.
## 8. Advanced Concepts: The Self-Healing Knowledge Base
The previous sections describe a robust, practical system for memory management. However, to create a truly resilient and intelligent system, we can introduce advanced concepts that allow the memory bank not only to store knowledge but also to actively validate, refine, and connect it.
### 1. Automated Knowledge Validation
**The Problem**: Documentation, even in a well-maintained memory bank, can become outdated. A setup script in `techContext.md` might be broken by a new dependency, but this "bug" in the knowledge is only discovered when a human tries to use it and fails.
**The Solution**: Treat the memory bank's knowledge as testable code. Create automated tasks that validate the accuracy of the documentation.
* **Blueprint: The "Memory Bank QA" Task**:
* **Trigger**: Can be run on a schedule (e.g., nightly) or after major changes.
* **Action**: The AI is given a specific, high-value task to perform using *only* the information from a single memory bank file.
* **Example**: "Create a new task. Using *only* the instructions in `techContext.md`, write a shell script that sets up a new local development environment from scratch and runs the full test suite. Execute the script."
* **Outcome**:
* **If the script succeeds**: The knowledge is validated.
* **If the script fails**: A high-priority bug is automatically filed against the *documentation itself*, complete with the error logs. This signals that the `techContext.md` file needs to be updated.
* **Rationale**: This transforms the memory bank from a passive repository into an active, testable asset. It ensures that critical knowledge (like environment setup) is never stale.
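One building block of such a QA task is extracting the executable snippets from the documentation itself so a runner can execute them and file a documentation bug on failure. A deliberately naive sketch, assuming shell blocks are fenced with `bash`/`sh`/`shell` and never nested (a real tool should use a proper markdown parser):

```javascript
// Pull every fenced shell block out of a memory bank file's contents.
function extractShellBlocks(markdown) {
  const blocks = [];
  let current = null;
  for (const line of markdown.split("\n")) {
    if (current === null && /^```(bash|sh|shell)\s*$/.test(line)) {
      current = []; // opening fence: start collecting
    } else if (current !== null && /^```\s*$/.test(line)) {
      blocks.push(current.join("\n")); // closing fence: emit the block
      current = null;
    } else if (current !== null) {
      current.push(line);
    }
  }
  return blocks;
}

const techContext = [
  "## Local Setup",
  "```bash",
  "npm install",
  "npm test",
  "```",
].join("\n");
console.log(extractShellBlocks(techContext));
```

A QA runner would execute each extracted block in a clean environment and attach the failing block plus its error output to the bug it files against the documentation.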
### 2. Granular, Section-Based Retrieval
**The Problem**: In a large, mature project, core files like `techContext.md` or `systemPatterns.md` can become thousands of lines long. Retrieving the entire file for every query is inefficient, costly, and can overflow the AI's context window.
**The Solution**: Evolve the system to retrieve specific, relevant sections of a document instead of the entire file.
* **Implementation Steps**:
1. **Enforce Strict Structure**: Mandate that every distinct concept or pattern in the memory bank files be under its own unique markdown heading.
2. **Two-Step Retrieval**: The AI's retrieval process is modified:
* **Step 1 (Table of Contents Scan)**: First, the AI retrieves only the markdown headings from the target file to create a "table of contents."
* **Step 2 (Targeted Fetch)**: The AI uses the LLM to determine which heading is most relevant to the query and then performs a second retrieval for *only the content under that specific heading*.
* **Rationale**: This dramatically improves the efficiency and precision of the retrieval process, allowing the system to scale to massive projects without overwhelming the AI's context limits.
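The two-step retrieval can be sketched with two small helpers. Choosing which heading is relevant is left to the LLM (or any ranking function); the helpers assume well-formed markdown headings:

```javascript
// Step 1: build a "table of contents" from the file's markdown headings.
function tableOfContents(markdown) {
  return markdown.split("\n").filter((line) => /^#{1,6}\s/.test(line));
}

// Step 2: fetch only the content under one chosen heading, stopping at
// the next heading of the same or higher level.
function sectionUnder(markdown, heading) {
  const lines = markdown.split("\n");
  const start = lines.indexOf(heading);
  if (start === -1) return null;
  const level = heading.match(/^#+/)[0].length;
  const body = [];
  for (const line of lines.slice(start + 1)) {
    const m = line.match(/^#+/);
    if (m && m[0].length <= level) break; // next sibling or parent heading
    body.push(line);
  }
  return body.join("\n").trim();
}

const doc = [
  "# Tech Context",
  "## Database",
  "Use PostgreSQL for relational data.",
  "## API Conventions",
  "All I/O must be asynchronous.",
].join("\n");
console.log(tableOfContents(doc));
console.log(sectionUnder(doc, "## Database"));
```

With this in place, a query costs one cheap scan of the headings plus one targeted fetch, instead of loading a multi-thousand-line file into the context window.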
### 3. The Visual Knowledge Graph
**The Problem**: The relationships between different pieces of knowledge (a decision in an ADR, a pattern in `systemPatterns.md`, a quirk in `techContext.md`) are implicit. A developer cannot easily see how a decision made six months ago led to the specific code pattern they are looking at today.
**The Solution**: Introduce a syntax for creating explicit, machine-readable links between knowledge fragments, and use a tool to visualize these connections.
* **Implementation Steps**:
1. **Introduce a Linking Syntax**: Establish a simple, consistent syntax for cross-referencing, such as `[ADR-005]` for architectural decisions, `[PATTERN-AuthN]` for system patterns, or `[BUG-123]` for root cause analyses.
2. **Embed Links in Documentation**: When documenting a new pattern, explicitly link it to the ADR that prompted its creation. When writing an RCA, link it to the pattern that was violated.
3. **Automated Graph Generation**: Create a script that periodically parses all markdown files in the memory bank. This script identifies the links and generates a graph data file (e.g., in JSON or GML format).
4. **Visualization**: Use a library like D3.js, Cytoscape.js, or a tool like Obsidian to render the data file as an interactive, searchable graph.
* **Rationale**: This provides a "God view" of the project's collective knowledge. It allows developers and the AI to understand not just individual facts, but the entire causal chain of decisions, patterns, and technical nuances that define the system. It makes the project's architectural history explorable and transparent.
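The graph-generation script from step 3 can be sketched as a simple scan for the linking syntax. The tag pattern follows the convention proposed above, and the filenames are illustrative; the output is a plain node/edge list that D3.js, Cytoscape.js, or Obsidian-style tooling can render:

```javascript
// Scan memory bank files for cross-reference tags like [ADR-005],
// [PATTERN-AuthN], or [BUG-123] and emit a graph as nodes and edges.
// Input is a { filename: contents } map.
function buildKnowledgeGraph(files) {
  const tagPattern = /\[(ADR|PATTERN|BUG)-[A-Za-z0-9]+\]/g;
  const nodes = new Set();
  const edges = [];
  for (const [filename, contents] of Object.entries(files)) {
    nodes.add(filename); // each document is itself a node
    for (const match of contents.match(tagPattern) ?? []) {
      nodes.add(match);
      edges.push({ from: filename, to: match });
    }
  }
  return { nodes: [...nodes], edges };
}

const files = {
  "systemPatterns.md": "The AuthN pattern [PATTERN-AuthN] was introduced by [ADR-005].",
  "techContext.md": "Rate-limit handling traces back to [BUG-123].",
};
console.log(JSON.stringify(buildKnowledgeGraph(files), null, 2));
```

Run periodically (for example, in CI), this keeps the visual knowledge graph synchronized with the markdown files without any manual curation.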