Prompt Engineering Best Practices for Roo Modes
This document will store notes and findings about effective prompt engineering techniques relevant to creating powerful and reliable Roo Code modes.
Key Areas of Research:
- Emphasizing critical instructions.
- Structuring prompts for complex tool use.
- Persona definition for AI agents.
- Eliciting step-by-step reasoning.
- Handling ambiguity and errors.
- Ensuring adherence to negative constraints (e.g., "do not do X").
- Role of examples in prompts.
- Chain-of-Thought (CoT) and Tree-of-Thought (ToT) prompting integration within mode instructions.
- Context length management strategies within prompts.
Insights for Persona Definition (from promptingguide.ai/agents/introduction)
- Proactive Traits for AI Agents:
- Define the agent's ability to plan and reflect: "analyze a problem, break it down into steps, and adjust their approach based on new information."
- Encourage the agent to propose steps rather than waiting for explicit micro-instructions.
- Prompt the agent to anticipate issues or suggest alternative/improved paths based on its analysis.
- Collaborative Traits for AI Agents:
- Emphasize the use of memory (learning from past experiences, user feedback, project context) to inform decisions and interactions.
- Reinforce the iterative workflow (propose, seek feedback/approval, execute) as a collaborative process.
- The agent should be framed as a partner that learns and adapts through interaction.
Insights from PromptHub Blog on AI Agents (prompthub.us/blog/prompt-engineering-for-ai-agents)
- Structured Tool Definition: Clearly define tool syntax (e.g., XML-like) and provide examples within the system prompt for consistency and debuggability (Example: Cline). A sketch of this pattern appears after this list.
- Iterative Execution with Confirmation: Enforce a one-tool-per-message limit and require user confirmation after each execution to create a robust feedback loop and minimize cascading errors (Example: Cline).
- Phased Approach (Plan/Act): Consider instructing agents to operate in distinct phases, such as a planning/strategizing phase before an execution phase, to improve clarity and reduce errors (Example: Cline's Plan Mode vs. Act Mode).
- Emphasis on Critical Instructions: Using strong visual cues like capitalization (e.g., "ULTRA IMPORTANT", "CRITICAL") or bolding for non-negotiable instructions is a valid technique to improve adherence (Example: Bolt, Roo's existing use of "MUST" / "HIGHEST PRIORITY").
- Holistic Contextual Analysis: Instruct the agent to consider the entire project context, dependencies, and potential impacts comprehensively before taking significant actions or generating artifacts (Example: Bolt's "Think HOLISTICALLY and COMPREHENSIVELY").
- Clarity and Simplicity in Prompts: Reiteration of core principles: use simple language, provide all necessary context (don't assume agent knowledge), be specific in instructions, and iterate on prompts based on performance.
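A minimal sketch of the first two points above (an XML-style tool definition in the system prompt plus a one-tool-per-message loop). The tool names, tag format, and `extract_single_tool_call` helper are illustrative assumptions, not Cline's or Roo's actual implementation.

```python
# Sketch: embed an XML-style tool definition in a system prompt and enforce the
# one-tool-per-message rule by parsing at most one tool call per model reply.
import re

TOOL_DEFINITIONS = """You can use exactly ONE tool per message.
Format every tool call as:
<tool_name>
  <parameter_name>value</parameter_name>
</tool_name>

Available tools (illustrative):
- read_file: <read_file><path>relative/path</path></read_file>
- execute_command: <execute_command><command>shell command</command></execute_command>

After each tool call, WAIT for the user's confirmation and the tool result
before proceeding to the next step."""

def extract_single_tool_call(reply: str):
    """Return (tool_name, inner_xml) for the first recognized tool call, or None.
    Ignoring anything after the first call approximates the one-tool-per-message rule."""
    match = re.search(r"<(read_file|execute_command)>(.*?)</\1>", reply, re.DOTALL)
    if not match:
        return None
    return match.group(1), match.group(2).strip()

if __name__ == "__main__":
    reply = "I will inspect the file first.\n<read_file>\n  <path>src/app.py</path>\n</read_file>"
    print(extract_single_tool_call(reply))  # ('read_file', '<path>src/app.py</path>')
```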
Insights for Reliability and Error Handling (from learnprompting.org/docs/reliability/introduction)
- General Reliability Aim: Most prompting techniques inherently aim to improve completion accuracy and thus reliability.
- Self-Consistency: A key technique for improving reasoning and reliability.
- Common LLM Challenges Affecting Reliability:
- Hallucinations.
- Flawed explanations (even with Chain-of-Thought).
- Biases: Majority label, recency, common token bias.
- Zero-Shot CoT can be particularly biased on sensitive topics.
- High-Level Solutions for Improving Reliability:
- Calibrators: To remove a priori biases from model outputs.
- Verifiers: To score or validate the quality/accuracy of completions.
- Promoting Diversity in Completions: Can help in ensembling or selecting the best output.
- Further Exploration Needed: The sub-topics on learnprompting.org (Prompt Debiasing, Prompt Ensembling, LLM Self-Evaluation, Calibrating LLMs) might contain more specific techniques for robust tool use and error handling within prompts.
Insights for Emphasizing Critical Instructions (from codingscape.com & arXiv:2312.16171v1)
- Direct & Affirmative Language: Use direct commands (e.g., "Do X") and avoid polite fillers ("please," "thank you") or negative phrasing ("Do not do Y"; instead, state what should be done).
- Structural Clarity & Delimiters:
- Use clear delimiters to separate sections of the prompt (e.g., `###Instruction###`, `###Context###`, `###Examples###`); see the sketch at the end of this section.
- Use line breaks to clearly separate distinct instructions, examples, and input data.
- Keywords for Emphasis & Imperatives:
- Incorporate strong imperative phrases like "Your task is..." and, critically, "You MUST..." (aligns with existing Roo strategy).
- Consider negative reinforcement for non-negotiable constraints: "You will be penalized if [condition is violated]."
- Repetition: Repeating a critical word, phrase, or instruction multiple times within the prompt can increase its salience and the likelihood of adherence.
- Role Assignment: Clearly defining the AI's role (as Roo does with modes) is fundamental to guiding its behavior and how it interprets and follows instructions.
- Stating Requirements Explicitly: Clearly and unambiguously list all requirements, rules, keywords, or constraints the model must follow.
- Output Priming (Principle 21): Concluding the prompt with the beginning of the desired output can guide the LLM. While not direct instruction adherence, it shapes the response in a constrained way.
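As a rough illustration of how these techniques combine, the sketch below assembles a prompt with ###-delimited sections, a direct "You MUST" imperative, a penalty clause, and output priming. The section names and the `build_prompt` helper are illustrative assumptions.

```python
# Sketch: combine delimiters, emphasis keywords, and output priming in one prompt.
def build_prompt(instruction: str, context: str, examples: list[str]) -> str:
    example_block = "\n".join(f"- {e}" for e in examples)
    return (
        "###Instruction###\n"
        f"Your task is: {instruction}\n"
        "You MUST follow every requirement below. "
        "You will be penalized if any requirement is violated.\n\n"
        "###Context###\n"
        f"{context}\n\n"
        "###Examples###\n"
        f"{example_block}\n\n"
        "###Output###\n"
        "Answer:"  # output priming: the beginning of the desired response
    )

print(build_prompt(
    instruction="Summarize the module's public API in a bulleted list.",
    context="The module exposes three functions: load(), transform(), save().",
    examples=["load(path): reads raw records from disk"],
))
```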
Insights for "TDD Light" / Suggesting Basic Test Cases (from dev.to article & general principles)
- Goal: For a general coding mode, the aim isn't full TDD or complete test generation, but rather to encourage light consideration of testing for new functionality.
- Prompting for Test Ideas/Scenarios: Instead of asking the LLM to write full tests, prompts can guide it to:
- Suggest a few basic test scenarios (e.g., "Identify 2-3 critical test cases for the function X, covering positive, negative, and an edge case.").
- Outline high-level aspects to test (e.g., "What are the main behaviors of component Y that should be verified?").
- Prompting for Test Structure/Scaffolding:
- Ask for a basic test file structure if appropriate for the project context (e.g., "Suggest a Jest test file outline for `newModule.js`, including common imports and describe/it blocks.").
- Focus on High-Level Thinking: Prompts should direct the LLM to think about what to test at a high level, rather than getting lost in detailed test implementation logic, which might be better handled by a dedicated QA/Testing mode or the developer.
- Leverage Context: If the project's testing framework, style, or existing test files are known (e.g., through prior `read_file` or `search_files` actions), this context should be provided when asking for test-related suggestions to ensure relevance.
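One possible way to operationalize this "TDD light" guidance is a small prompt builder that injects whatever framework or existing-test context is available. The helper and its parameters below are hypothetical, not part of any Roo mode today.

```python
# Sketch: build a lightweight test-suggestion prompt, adding project context when known.
def build_test_suggestion_prompt(function_name: str,
                                 framework: str = "",
                                 existing_test_excerpt: str = "") -> str:
    parts = [
        f"Identify 2-3 critical test cases for the function {function_name}, "
        "covering a positive case, a negative case, and one edge case."
    ]
    if framework:
        parts.append(
            f"Then suggest a basic {framework} test file outline for it, "
            "including common imports and describe/it blocks."
        )
    if existing_test_excerpt:
        parts.append(
            "Match the style of this existing test file excerpt:\n"
            f"{existing_test_excerpt}"
        )
    parts.append("Keep the suggestions high level; do not write full test implementations.")
    return "\n\n".join(parts)

print(build_test_suggestion_prompt("parseConfig", framework="Jest"))
```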
Insights for Error Handling & Code Generation Reliability (from prompthub.us on Code Gen Errors & general principles)
- Anticipate Common Code Generation Errors: Be aware that LLMs can produce:
- Logical Errors: Misinterpreting requirements, incorrect conditions.
- Incomplete Code: Missing crucial steps or blocks.
- Context Misunderstanding: Failing to grasp the broader context, especially with partial codebase views.
- Syntactic Errors: Issues with language syntax, function calls, etc.
- Garbage/Meaningless Code: Especially with overly long or ambiguous prompts.
- Promote Clarity and Conciseness (Especially for Code Gen):
- Use clear, specific, and concise prompts. Shorter prompts (<50 words in one study) often perform better for code generation, reducing the likelihood of errors or irrelevant output.
- Avoid overly long prompts unless using structured techniques like few-shot prompting with clear examples.
- Iterative Refinement for Errors: The agent's workflow should support:
- Presenting generated code/tool actions to the user.
- Allowing the user to provide feedback on errors (e.g., compiler messages, incorrect behavior).
- The agent then attempting to fix the error based on this feedback.
- Prompting for Code Error Recovery: When an error in generated code is identified, the follow-up prompt to the LLM should ideally include:
- The original requirement or task description.
- The specific code snippet that caused the error.
- The exact error message or a clear description of the incorrect behavior.
- A clear request to analyze the issue, explain the cause, and provide a corrected version of the code.
- Example: "For the task '[original task]', I generated:
This resulted in the error:# ...erroneous code...
TypeError: unsupported operand type(s) for +: 'int' and 'str'
. Please explain why this error occurred and provide the corrected Python code."
- Prompting for Tool Failure Recovery: If a tool call fails, the follow-up prompt should include:
- The intended action and the specific tool call made (including parameters).
- The error message returned by the tool.
- A request for the LLM to analyze the failure and suggest:
- A corrected tool call (if parameters seem incorrect).
- An alternative tool or a different approach to achieve the objective.
- A clarifying question to the user if the error's cause is ambiguous.
- Example: "I tried to use
apply_diff
forfile.py
with [diff content], but it failed with:Error: SEARCH block not found at line X
. Please analyze this. Was the:start_line:
hint likely incorrect, or is there a problem with the SEARCH content? Should I tryread_file
first to confirm the content around line X?"
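The two recovery patterns above could be packaged as simple prompt builders, sketched below. The helper functions and field names are illustrative; the `apply_diff` usage example mirrors the one given in this section.

```python
# Sketch: follow-up prompts for code-error recovery and tool-failure recovery.
def code_error_recovery_prompt(task: str, code: str, error: str) -> str:
    return (
        f"For the task '{task}', I generated:\n\n{code}\n\n"
        f"This resulted in the error:\n{error}\n\n"
        "Please explain why this error occurred and provide a corrected version of the code."
    )

def tool_failure_recovery_prompt(tool: str, params: dict, error: str) -> str:
    rendered = "\n".join(f"  {k}: {v}" for k, v in params.items())
    return (
        f"I tried to use {tool} with these parameters:\n{rendered}\n"
        f"It failed with:\n{error}\n\n"
        "Please analyze the failure and suggest either a corrected tool call, "
        "an alternative approach, or a clarifying question if the cause is ambiguous."
    )

print(tool_failure_recovery_prompt(
    "apply_diff",
    {"path": "file.py", "start_line": 42},
    "Error: SEARCH block not found at line 42",
))
```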
Insights for General Coding Assistant System Prompts (from GitHub: mustvlad/ChatGPT-System-Prompts)
- Clear Role Definition: Start with a concise definition of the AI's role (e.g., "You are an AI programming assistant.").
- Emphasize Requirement Adherence: Use direct language like "Follow the user's requirements carefully and to the letter."
- Mandate a Detailed Planning Step: Instruct the AI to first think step-by-step and describe its plan in detail (e.g., using pseudocode) before generating any code. This forces a structured approach.
- Example Instruction: "First, think step-by-step and describe your plan for what to build in pseudocode, written out in great detail."
- Specify Output Structure: Clearly define the expected output sequence and format.
- Example Instruction: "Then, output the code in a single code block."
- Request Conciseness: Ask the AI to minimize extraneous prose to keep the output focused on the plan and the code.
- Example Instruction: "Minimize any other prose."
Insights for Generating Textual Diagrams (e.g., Mermaid) (from HackerNoon article & general principles)
- Direct Instruction for Specific Format: Clearly instruct the LLM to generate code in the desired textual diagramming language, most commonly Mermaid. E.g., "Generate the Mermaid code for a flowchart that describes..."
- Provide Clear, Detailed Textual Description: The LLM needs a comprehensive natural language description of the process, system, or structure to be diagrammed. This includes:
- Entities/Nodes/Objects.
- Relationships/Connections/Flows between them.
- Sequence of events (for sequence diagrams or flowcharts).
- Conditions/Decisions (for flowcharts or decision trees).
- Hierarchy (for class diagrams or mind maps).
- Specify Diagram Type: Explicitly state the type of diagram required (e.g., `flowchart`, `sequenceDiagram`, `classDiagram`, `stateDiagram`, `gantt`, `mindmap`), as this often corresponds to the root declaration in Mermaid.
- Simplicity is Key for LLM Generation: While Mermaid can represent complex diagrams, start by prompting for simpler structures. Complex diagrams might require iterative prompting or manual refinement of the generated code.
- Instruct to Use Correct Syntax: Remind the LLM to adhere to the specific syntax of the diagramming language (e.g., "Ensure the output is valid Mermaid syntax.").
- Example Prompt Structure (for Ask Mode): "Based on your explanation of [concept X], please generate the Mermaid code for a [diagram type, e.g., flowchart] that visually represents [the key steps/components/relationships you just described]."
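A small prompt-builder sketch that applies these points (explicit diagram type, a detailed description, and a syntax reminder). The helper and its wording are illustrative assumptions.

```python
# Sketch: build a Mermaid-generation prompt with an explicit diagram type and syntax reminder.
def build_mermaid_prompt(diagram_type: str, description: str) -> str:
    return (
        f"Generate the Mermaid code for a {diagram_type} that represents the following:\n"
        f"{description}\n\n"
        f"Start the diagram with the '{diagram_type}' declaration, keep the structure simple, "
        "and ensure the output is valid Mermaid syntax inside a single code block."
    )

print(build_mermaid_prompt(
    "flowchart",
    "A user submits a form; if validation passes, the data is saved and a "
    "confirmation is shown; otherwise an error message is displayed.",
))
```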
Insights for Autonomous Agents / Self-Optimizing Prompts (from arXiv:2407.11000 Abstract)
- Concept of Autonomous Prompt Engineering: LLMs (like GPT-4 with a toolbox like APET) can be enabled to autonomously apply prompt engineering techniques to dynamically optimize prompts for specific tasks.
- Key Techniques for Autonomous Optimization by an LLM:
- Expert Prompting: Dynamically assigning or refining a very specific expert persona to itself for the task at hand.
- Chain of Thought (CoT): The LLM internally uses step-by-step reasoning to break down the problem or the prompt optimization task.
- Tree of Thoughts (ToT): The LLM explores multiple reasoning paths or prompt variations and evaluates them to select the best one.
- Implication for Roo Modes: While Roo modes are predefined, the custom instructions within a mode could guide Roo to:
- Internally use CoT or ToT for complex problem-solving or before generating significant code/plans.
- Refine its understanding of its 'expert role' based on the task specifics if the prompt allows for such meta-cognition.
Insights for Prompting AI to Anticipate Follow-Up Questions (from kirenz.github.io & general principles)
- Core Idea: An AI assistant can enhance user experience by proactively identifying and offering to address likely follow-up questions or points of confusion after an initial explanation.
- Self-Reflection Step: Instruct the AI, after providing its main answer, to perform a brief internal review of its explanation to identify areas that:
- Are particularly complex or dense.
- Might lead to common misunderstandings.
- Could benefit from further detail or a different angle of explanation.
- Proactive Offer to Elaborate: Based on self-reflection, prompt the AI to offer further assistance on 1-2 specific, relevant points.
- Example Instruction: "After your main explanation, identify one or two aspects that the user might want more detail on. Offer to elaborate on these specific points, for example: 'Would you like a more detailed explanation of [specific aspect A] or [related concept B]?'"
- Concise Preemptive Clarification (Optional): For very common or critical points of potential confusion related to the topic, the AI could be instructed to include a brief, preemptive clarification within its main answer.
- Example Instruction: "If explaining [topic X], and you know that [common misconception Y] often arises, briefly address and clarify Y proactively."
- Balance with Conciseness: The aim is not to make the initial explanation overly long by preempting everything, but to intelligently offer the most relevant next steps for deeper understanding. The offer to elaborate should be concise.
Insights for Prompting LLM Source Citation (from arXiv:2307.02185v3 & general principles)
- Focus on Actively Retrieved Sources: For an AI assistant like Ask Mode, citation instructions should primarily target sources it explicitly fetches during the current interaction (e.g., via web search or reading a provided document URL). Citing from its vast training data (parametric content) is generally unreliable and hard to verify.
- Conditional Citation Prompt: Instruct the AI to cite when its explanation relies on specific, non-common-knowledge information from a retrieved source.
- Example Instruction: "If your explanation includes specific facts, figures, direct quotes, or detailed information that is not common knowledge and is taken directly from a document or webpage you accessed during this session, YOU MUST clearly indicate the source. For example, state 'According to [Source Name/Website Title], ...' or 'As detailed on [Website Name], ...'."
- Distinguish from General Knowledge: Allow the AI to state when information comes from its general training.
- Example Instruction: "For general concepts or widely known information assimilated during your training, you do not need to provide a specific external citation. You can state that it's based on your general knowledge."
- Handling Synthesized Information: If information is synthesized from multiple retrieved sources for a single point.
- Example Instruction: "If you synthesize a specific point from several retrieved sources, you can mention the primary sources, e.g., 'This understanding is based on information from [Source A] and [Source B].'"
- Format of Citation: Keep it practical for an LLM.
- Example Instruction: "When citing a web source, mention the source name (e.g., 'MDN Web Docs', 'Stack Overflow') or the main title of the article/page if easily identifiable from the browser tool. Full URLs are generally not needed unless specifically relevant or if it's the only identifier."
- Avoid Fabricating Citations (CRITICAL):
- Example Instruction: "YOU MUST NOT invent sources, fabricate URLs, or cite documents you have not accessed during this current interaction. If you cannot confidently attribute specific information to a retrieved source, state that it is from your general knowledge or that you cannot provide a specific source."
- User-Provided Documents: If the user provides a document (e.g., via `@mention`) and the AI quotes or uses specific, non-trivial information from it, it should refer to that document.
- Example Instruction: "If you are referencing specific details from a document the user provided (e.g., `'@user_document.pdf' (see below for file content)`), state that the information is from 'the user-provided document [document name]'."
Insights for Structuring Complex Explanations (from promptingguide.ai/techniques & general principles)
- Chain-of-Thought (CoT) for Explanations: Instruct the AI to break down complex topics and explain its reasoning step-by-step. This helps in creating a logical flow.
- Example Instruction: "When explaining [complex topic], adopt a Chain-of-Thought approach. First, outline the key steps or components, then elaborate on each one sequentially, showing how they connect."
- Generate Knowledge First (for very complex topics): Before diving into a full explanation, prompt the AI to first list key facts, definitions, or sub-concepts related to the topic. This generated knowledge can then be used as a scaffold for the main explanation.
- Example Instruction: "To explain [highly complex system X], first generate a list of its 3-5 core components and their primary functions. Then, use this list to structure your detailed explanation of how system X works."
- Hierarchical or Sequential Structure: Guide the AI to build explanations from foundational concepts upwards, or to follow a logical sequence (e.g., chronological, cause-and-effect).
- Example Instruction: "Explain [concept Y] by first defining [foundational term A], then explaining [related concept B] and how it builds on A, and finally detailing [concept Y] itself."
- Synthesize Information Coherently: If the AI has gathered information from multiple sources (e.g., different file sections, web search results), explicitly instruct it to synthesize this information into a single, coherent, and well-structured explanation, rather than just listing facts from each source.
- Example Instruction: "After reviewing [source 1] and [source 2], synthesize the information into a comprehensive explanation of [topic Z], ensuring a logical flow and clear connections between the points from each source."
- Consider Audience and Perspectives (Implicit ToT): Encourage the AI to tailor the explanation's depth and angle to the likely user.
- Example Instruction: "Explain [topic W] in a way that is clear for a beginner but also includes technical details that an experienced developer would find useful. Consider different angles from which this topic can be understood."
- Use Formatting for Readability: Reinforce the use of Markdown for headings, lists, bolding, and code blocks to structure long or complex explanations, as already mentioned in Ask Mode's instructions.
Insights for Explanation Accuracy & Self-Correction (from learnprompting.org & general principles)
- Principle of Self-Checking/Internal Review: While full multi-step self-verification (generating multiple answers and verifying by predicting masked conditions) might be too complex for a single turn in Ask Mode, the underlying principle of self-critique is valuable.
- Prompt for Internal Review: Instruct the AI to perform a brief internal review of its explanation before presenting it. This review should check for:
- Logical Consistency: "Do the points in my explanation flow logically and support each other?"
- Factual Accuracy (within retrieved context): "Does my explanation accurately reflect the information I gathered from provided files or web search? Have I avoided misinterpreting or overstating?"
- Clarity and Unambiguity: "Is my explanation clear? Are there any statements that could be easily misunderstood?"
- Directness: "Does my explanation directly and comprehensively answer the user's question?"
- Example Instruction: "Before finalizing your explanation, take a moment to internally review its clarity, logical consistency, and accuracy based on the information you've gathered. Ensure it directly addresses the user's query."
- Cross-Referencing (if multiple sources used): If the explanation synthesizes information from multiple retrieved documents or web pages.
- Example Instruction: "If you are combining information from several sources (e.g., multiple documents or web pages you've read), briefly cross-check for consistency between them on key points before presenting your synthesized answer."
- Acknowledge Uncertainty or Limitations: If, after internal review and information gathering, the AI is still uncertain about a specific detail or cannot fully answer.
- Example Instruction: "If you are uncertain about any part of your explanation or cannot find definitive information after searching, clearly state this limitation to the user (e.g., 'Based on the available information, X is the case, but details on Y are not specified in the sources I accessed.'). DO NOT invent information to fill gaps."
- Iterative Refinement with User Feedback: Reinforce that user feedback is a form of verification. If the user points out an inaccuracy or lack of clarity, the AI should use that feedback to refine its understanding and explanation. (This is already in AskMode.md but worth noting its connection to verification).
Insights for Debugging via Command Output Analysis (from elastic/sysgrok & general principles)
- Structured Diagnostic Process (Iterative): For complex debugging, guide the AI through a multi-step process:
- Hypothesize & Suggest Commands: Based on the problem description, prompt the AI to hypothesize causes and suggest specific diagnostic commands (
execute_command
). - Execute Commands: (User or AI if permitted and safe).
- Analyze Individual Command Outputs: Instruct the AI to meticulously analyze
stdout
,stderr
, and theexit code
of each command. The analysis should be contextualized to the problem.- Example Instruction: "The command
[command_executed]
produced the following output:\nSTDOUT:\n\n[stdout_content]\n
\nSTDERR:\n\n[stderr_content]\n
\nExit Code:[exit_code]
. Analyze this output in relation to the reported problem: '[problem_description]'. What does this tell you about the potential cause? Does it confirm or refute any hypotheses?"
- Example Instruction: "The command
- Synthesize Summaries: After analyzing outputs from several commands, prompt the AI to synthesize all its individual analyses.
- Example Instruction: "Based on your analysis of the outputs from [command A], [command B], and [command C], what is your current leading hypothesis for the root cause of '[problem_description]'? Explain your reasoning."
- Propose Next Steps: Based on the synthesis, the AI should suggest further diagnostic commands or a potential fix.
- Hypothesize & Suggest Commands: Based on the problem description, prompt the AI to hypothesize causes and suggest specific diagnostic commands (
- Interpreting Exit Codes: Explicitly instruct the AI that an exit code of 0 typically indicates success, while non-zero usually signifies an error. The meaning of specific non-zero codes can be OS/application-dependent.
- Correlating Information: Encourage the AI to correlate command output with other available information, such as log files (`read_file`), application behavior described by the user, or known issues.
- Tool for Thought (Internal CoT): Even if not explicitly outputting each step, the AI should internally follow a "Chain of Thought" when analyzing outputs and forming hypotheses.
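The per-command analysis step above can be packaged as a prompt builder that carries stdout, stderr, and the exit-code convention into the question posed to the model. The helper below is an illustrative sketch, not an existing Roo utility.

```python
# Sketch: package a command result for the analysis prompt; exit-code wording
# follows the convention stated in this section (0 = success, non-zero = error).
def build_output_analysis_prompt(command: str, stdout: str, stderr: str,
                                 exit_code: int, problem: str) -> str:
    status = "success" if exit_code == 0 else f"an error (non-zero exit code {exit_code})"
    return (
        f"The command `{command}` produced the following output:\n"
        f"STDOUT:\n{stdout or '(empty)'}\n\n"
        f"STDERR:\n{stderr or '(empty)'}\n\n"
        f"Exit code: {exit_code} -- conventionally indicating {status}.\n\n"
        f"Analyze this output in relation to the reported problem: '{problem}'. "
        "What does it tell you about the potential cause? "
        "Does it confirm or refute any current hypotheses?"
    )

print(build_output_analysis_prompt(
    "python manage.py check", "", "ImproperlyConfigured: SECRET_KEY must not be empty.",
    1, "the app crashes on startup",
))
```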
Insights for Orchestrator Mode: Context Passing & Task Decomposition (from promptingguide.ai/research/llm-agents & general principles)
- Orchestrator's Own Planning (CoT/ToT): The Orchestrator itself should use structured reasoning (like Chain-of-Thought or Tree-of-Thoughts) to decompose the main user request into a logical sequence of subtasks before delegating.
- Example Instruction for Orchestrator: "Given the user's goal '[user goal]', first, develop a comprehensive, step-by-step plan outlining the sequence of subtasks required. For each subtask, identify the most suitable specialized mode for delegation."
- Structuring `new_task` Messages (Context for Sub-modes): When the Orchestrator delegates using the `new_task` tool, the `message` parameter provided to the sub-mode is its primary context. This message MUST be carefully structured to include the following (see the sketch after this list):
- Clear Subtask Objective: A precise statement of what the sub-mode needs to accomplish.
- Essential Inputs/Parameters: Any specific data, file paths, variables, or parameters the sub-mode requires to perform its task.
- Relevant Prior Context: A concise summary of key decisions made by the Orchestrator or critical outputs from previously completed subtasks that directly inform this subtask. (E.g., "The 'DataValidationMode' previously confirmed [data X] is valid. Your task as 'ProcessingMode' is to process [data X] by applying [specific transformation Y].")
- Expected Output/Artifact/Result: Clearly state what the sub-mode is expected to produce or what information should be included in its `attempt_completion` result for the Orchestrator to use.
- Brief Role Reminder (Optional but helpful): A quick reminder of the sub-mode's role can help focus its execution (e.g., "As CodeMode, your task is to implement...").
- Example Instruction for Orchestrator: "When creating a `new_task` `message` for a sub-mode, ensure it contains: a clear objective for that subtask, all necessary input data or references, a brief summary of any prerequisite outcomes from prior subtasks, and a description of the expected output or result from this subtask."
- Mode Awareness for Delegation: The Orchestrator must be instructed to intelligently select the most appropriate specialized mode for each subtask.
- Example Instruction for Orchestrator: "YOU MUST select the most suitable specialized mode (e.g.,
Code
,Ask
,Debug
,DeepResearch
) for each subtask based on its defined role (roleDefinition
) and intended purpose (whenToUse
). Clearly state which mode you are delegating to when informing the user."
- Example Instruction for Orchestrator: "YOU MUST select the most suitable specialized mode (e.g.,
- Managing Orchestration State & Logging (for complex tasks): For orchestrations involving many steps, the Orchestrator should maintain an understanding of the overall progress.
- Example Instruction for Orchestrator: "For complex projects with multiple subtasks, keep an internal 'mental map' of the overall plan, which subtasks are complete, their key outcomes, and what remains. If the orchestration is very long or involves many steps, consider proposing to the user the creation of an
orchestration_log.md
file to track major delegations, decisions, and results. If the user approves creation of such a log (and it's not a mode config file), you may need to delegate its updates to an appropriate mode likeCodeMode
orAskMode
(for summarizing)."
- Example Instruction for Orchestrator: "For complex projects with multiple subtasks, keep an internal 'mental map' of the overall plan, which subtasks are complete, their key outcomes, and what remains. If the orchestration is very long or involves many steps, consider proposing to the user the creation of an
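A sketch of the `new_task` message structure described above, expressed as a small builder. The field layout and the `build_subtask_message` helper are illustrative assumptions, not Roo's actual delegation API.

```python
# Sketch: assemble a structured new_task message (objective, inputs, prior context,
# expected output, optional role reminder).
def build_subtask_message(objective: str, inputs: dict, prior_context: str,
                          expected_output: str, role_reminder: str = "") -> str:
    rendered_inputs = "\n".join(f"- {k}: {v}" for k, v in inputs.items())
    sections = [
        f"Objective: {objective}",
        f"Inputs:\n{rendered_inputs}",
        f"Relevant prior context: {prior_context}",
        f"Expected result (report this in attempt_completion): {expected_output}",
    ]
    if role_reminder:
        sections.insert(0, role_reminder)
    return "\n\n".join(sections)

print(build_subtask_message(
    objective="Implement the CSV export endpoint.",
    inputs={"file": "src/api/export.py", "format": "CSV, comma-separated, UTF-8"},
    prior_context="DataValidationMode confirmed the report schema is valid.",
    expected_output="A summary of the changes made and the path of the modified file.",
    role_reminder="As CodeMode, your task is to implement the following subtask.",
))
```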
Insights for QA Tester Mode: AI for Exploratory Testing (from HeadSpin Blog)
- AI as a Collaborative Tool for QA: Generative AI (like ChatGPT) should be viewed as a partner to human testers, offering insights and inspiration, not a replacement. Human evaluation, learning, and iterative exploration remain crucial.
- Generating Exploratory Test Ideas:
- Prompt AI with user stories or functionalities to brainstorm fresh test ideas.
- Example Prompt (from article): "Given a user story about importing attachments via a CSV file, generate exploratory test ideas focusing on functionality and usability."
- Creating Exploratory Test Charters:
- AI can assist in crafting test charters (outlining objectives and focus areas) by suggesting scenarios tailored to quality attributes (security, usability).
- Example Prompt (from article): "Create test charters related to security considerations for a feature that involves importing attachments using Xray Test Case Importer."
- Summarizing Testing Sessions:
- AI can extract and consolidate key observations, defects, and quality assessments from testing notes.
- Example Prompt (from article): "Based on testing notes for a banking app exploration session, summarize the session's results and assess the confidence in its quality aspects."
- Enumerating Risks:
- AI can help brainstorm and analyze potential risks associated with user stories or requirements.
- Example Prompt (from article): "Identify risks associated with a requirement involving importing attachments via CSV files using Xray Test Case Importer."
- Exploring Scenario Variation:
- AI can generate variations of test scenarios to explore different application paths and behaviors under various conditions.
- Example Prompt (from article): "Explore performance testing scenarios for a mobile banking app to evaluate responsiveness under heavy user load."
- Generating Sample Data:
- AI can automate sample data generation (e.g., realistic datasets, user profiles with simulated transactions).
- Example Prompt (from article): "Generate sample financial transactions (deposits, withdrawals, transfers) to test a banking application's functionality."
- Key AI Capabilities for Testing (as per article):
- Automatic Test Case Generation: Analyze code/behavior to generate comprehensive test cases (positive/negative scenarios).
- Defect Detection (Predictive Analytics): Leverage historical data/code analysis to predict potential defects early.
- Test Data Generation: Create diverse test data for varied conditions.
- Test Automation (Script Generation): Describe scenarios in natural language for AI to generate test scripts.
- Challenges of Generative AI in Exploratory Testing (as per article):
- Contextual Understanding: May struggle with full context, purpose, audience relevance.
- Time-Consuming Interactions: Can be lengthy despite intended ease.
- Training Dataset Dependency: Relies heavily on its training data.
- Limitations in Contextual Depth: Input size restrictions can limit rich outputs.
- Concerns: Hallucination of facts, reliance on non-verifiable sources, potential bias.
Insights for QA Tester Mode: AI Prompt Templates for Test Case Generation (from Faqprime)
The Faqprime article "AI Prompts for Test Case Generation" provides a rich set of templates. Key types of test case generation prompts include:
- Positive Scenario Test Case:
- Task: Create a positive scenario test case for a given feature.
- Assumption: User wants to complete a specific action successfully.
- Details: Step-by-step navigation, relevant data input, flawless action execution, expected outcome, preconditions.
- Negative Scenario Test Case:
- Task: Write a negative scenario test case for specified functionality.
- Assumption: User intentionally enters incorrect/invalid data.
- Details: Steps to reproduce, expected error message/failure, post-condition checks.
- Boundary Value Test Case:
- Task: Generate a boundary value test case for a given input field with acceptable limits (e.g., quantity, date range).
- Details: Values for lower/upper bounds, how the system should handle them, expected outcomes.
- Regression Test Case:
- Task: Develop a regression test case to safeguard existing functionality after recent code changes.
- Details: Specific feature/area, preconditions, detailed steps, expected outcomes to ensure no unintended side effects.
- Data Validation Test Case:
- Task: Write a data validation test case for an input form.
- Details: Set of valid/invalid inputs for a field (e.g., email, phone), how the system should validate/reject, corresponding error messages/responses for invalid inputs.
- Usability Test Case:
- Task: Create a usability test case to evaluate user-friendliness of an interface feature.
- Details: Task a typical user should accomplish, detailed steps, potential challenges/confusion, expected ease of use/difficulties.
- Performance Test Case:
- Task: Develop a performance test case to assess system responsiveness under load.
- Details: Test scenario (e.g., concurrent users/transactions), acceptable response time, expected performance metrics.
- Compatibility Test Case (Browser/Device & OS):
- Task: Ensure software functions correctly across different browsers/devices or operating systems.
- Details: Specific combinations to test, features/functionalities to validate, expected results/behavior for each combination.
- Security Test Case:
- Task: Generate a security test case to identify potential vulnerabilities.
- Assumption: Scenario where unauthorized access or data breaches may occur.
- Details: Actions to simulate breach attempt, expected system response/security measures.
- Accessibility Test Case (General & Compliance):
- Task: Evaluate software's compliance with accessibility standards (e.g., for users with disabilities, keyboard navigation, screen reader compatibility).
- Details: Task users with disabilities should perform, assistive technologies used, expected level of accessibility, specific criteria/guidelines.
- Error Handling Test Case:
- Task: Create an error handling test case for a specific feature.
- Assumption: Unexpected error occurs during user interaction.
- Details: Steps leading to error, error message/behavior to be displayed, expected corrective action/resolution.
- Cross-Module Integration Test Case:
- Task: Ensure seamless interaction between different modules/components.
- Details: Modules to test, specific actions, expected data flow and results between modules.
- Data Integrity Test Case:
- Task: Validate accuracy and consistency of data stored in the system.
- Assumption: Data is entered, modified, or deleted.
- Details: Steps to replicate data change, expected state of data after operation, alignment with integrity constraints.
- Load Balancing Test Case:
- Task: Assess ability to distribute incoming traffic/requests across multiple servers/instances.
- Details: Load distribution scenario, number of requests/users, expected distribution pattern and system behavior under different load levels.
- Recovery and Rollback Test Case:
- Task: Evaluate system's ability to recover from failures.
- Assumption: System encounters an unexpected error/interruption.
- Details: Steps taken by system to recover, expected state after recovery or data rollback.
- Localization Test Case (and Internationalization):
- Task: Verify correct adaptation to different languages/regions (e.g., date formats, currency symbols).
- Details: Feature/content to test, languages/regions, steps to change settings and validate translation/display.
- Data Migration Test Case:
- Task: Ensure successful data transfer between systems/formats.
- Details: Source/target systems, data migration process, expected state of data in target system (accuracy, completeness, relationships).
- Concurrency Test Case:
- Task: Evaluate system behavior under simultaneous user interactions.
- Assumption: Multiple users perform actions concurrently.
- Details: Specific steps, expected outcomes, handling of potential conflicts/synchronization issues.
- API Integration Test Case:
- Task: Ensure seamless communication between components via APIs.
- Details: Components interacting, API requests/responses, expected data exchange and system behavior.
- User Permissions Test Case:
- Task: Verify different user roles have appropriate access/restrictions.
- Details: User roles, specific actions/features each role should access, expected behavior for actions beyond permissions.
- Stress Testing Test Case:
- Task: Assess stability/performance under extreme load conditions.
- Details: Stress test scenario, load level (high concurrent users, data volume), actions to perform, expected behavior, system response, performance metrics.
- End-to-End Workflow Test Case:
- Task: Cover a complete user journey involving multiple steps/interactions.
- Details: Sequence of actions, user inputs, system responses, expected outcomes throughout the workflow, covering various scenarios/edge cases.
- Network Disruption Test Case:
- Task: Evaluate behavior during network interruptions.
- Assumption: Network connection is suddenly lost/unstable.
- Details: User actions, system responses, expected behavior, recovery process once network is restored.
- Negative Data Input Test Case (Security Focus):
- Task: Assess how software handles invalid or malicious inputs (e.g., SQL injection, XSS).
- Details: Input fields/areas, examples of negative inputs, expected system response to prevent security risks/data corruption.
- Installation and Setup Test Case:
- Task: Ensure software is installed/configured correctly.
- Details: Installation process (prerequisites, config options), steps for setup/initial configuration, expected outcomes/behavior after successful installation.
- Data Replication and Synchronization Test Case:
- Task: Verify accuracy/consistency of data replication across databases/systems.
- Details: Source/target databases, replication process, expected state of replicated data (synchronized, error-free).
- Performance under Peak Load Test Case:
- Task: Evaluate behavior under peak load conditions (sudden surge in users/transactions).
- Details: Test scenario, load level, steps to monitor performance metrics, assess responsiveness/stability.
Insights for QA Tester Mode: Essential AI Testing Prompts (from OurSky.com blog)
The OurSky.com article "4 Essential AI Testing Prompts to Streamline Your QA Process" offers practical prompt structures for various QA scenarios:
- Security Advisory for Testing Prompts:
- Confidentiality: Test information can be sensitive (systems, business logic, vulnerabilities).
- Trusted Sources: Share only with secure LLM providers or use self-hosted models.
- Sanitize Data: Remove sensitive data, credentials, internal identifiers.
- Self-Hosting: Consider self-hosted models (e.g., Ollama) for maximum security.
- 1. Post-Sprint Exploratory Testing Prompt:
- Goal: Find bugs on new features/changes after a sprint.
- Prompt Structure: "This is the updates done for the project [Project Name] in the past sprint, please summarize the updates, and list the corresponding test cases to test the updated functions manually. Print it in a document."
- Context/Input: Upload a CSV of issues completed in the sprint (title, description).
- Quick Tips: Provide all updated features as context. Compare results from different LLMs.
- 2. Pre-Release Smoke Testing Prompt:
- Goal: Quick validation of critical functionalities before release.
- Prompt Structure: "You are a product owner going to release a new version of [Project Name]. Generate a set of Smoke Test cases for manual QA to ensure these main flows work without any critical issues. Categorize the test cases and divide into sections. Make a checklist for each section with columns ID, Scenario, Expected behaviour, Pass/Fail and Remarks. Keep it short and clean. Core functionalities: [List your core features here]"
- Context/Input: List of core features.
- Quick Tips: List all core features. Keep the list to ~20 items; don't be too long.
- 3. Complex Feature Testing Prompt:
- Goal: Thorough testing of features with complex business rules, requiring high coverage and depth.
- Prompt Structure: "You are a manual QA tester. Given the user flow and rules of the system, create a set of comprehensive functional test cases to test each step in the user flow, covering positive test scenarios, negative test scenarios, edge cases and additional cases. Categorize the test cases and divide into sections. Each section shows a table format with a meaningful "ID", "Test Case", "Sample Test Data" and "Expected Behaviour" columns. Keep the sentences short and clean. Happy flow: [List your user flow steps] Rules: [List your business rules]"
- Context/Input: User flow steps ("happy flow"), business rules.
- Quick Tips: Include a complete happy flow, starting from the feature's entry point (not system login). List all validation requirements and business rules. Break down large features into smaller, focused test suites. Try multiple prompt attempts and combine the best test cases.
- 4. Post-Bug-Fix Regression Testing Prompt:
- Goal: Ensure bug fixes don't break related functionalities.
- Prompt Structure: "You are a manual QA tester. Below is a bug reported. The bug is already fixed by developer. Please suggest a list of regression test cases to check if any related area is affected after the fix. Return your results in a checklist table with columns "ID", "Aspect", "Description", "Expected Result", "Priority". Original bug report: [Include full bug report with environment details and steps to reproduce]"
- Context/Input: Complete original bug report (title, environment details, testing platforms, description, steps to reproduce, expected behaviors).
- Quick Tips: Include the complete bug report. Focus on issues one by one; don't retest multiple issues at a time. (Article mentions using IssueSnap for clear bug reports).
- Common Pitfalls in Prompting for Test Cases:
- Too Vague: E.g., "Test the login feature" vs. specifying requirements like password rules, error handling, session management.
- Missing Context: Always include relevant business rules and technical constraints.
- Unclear Priorities: Specify which scenarios are most critical.
- Incomplete Information: Include all necessary details about the feature or bug.
- Key Takeaway: The quality of AI-generated test cases depends directly on the quality (detail, specificity) of the prompts.
Insights for QA Tester Mode: AI Memory Bank and Context Recall (Conceptual from Enlighter.ai)
The Enlighter.ai overview of "Memory Bank" for the Cursor IDE provides conceptual validation for using structured documentation as a persistent memory for AI assistants.
- Concept of Memory Bank:
- A "structured documentation system" designed to help AI coding assistants "maintain context... across sessions" and "remember project details over time."
- This aims to "eliminate repetitive explanations and maintain continuity."
- Implications for QA Tester Mode's Memory Usage:
- Consulting Memory: The QA mode should be prompted to regularly consult designated "memory bank" files (e.g., `project_brief.md`, `product_context.md`, `qa_memory_log.md`) at the start of testing tasks or when needing to recall established project details, past test strategies, known critical bugs, or specific product behaviors.
- Example Prompt Snippet: "Before designing test cases for [feature X], YOU MUST first consult `product_context.md` and `qa_memory_log.md` for any existing information relevant to this feature or related components."
- Updating Memory (Potentially): If the QA mode uncovers critical, persistent information (e.g., a newly established baseline behavior, a recurring tricky test environment setup, a fundamental misunderstanding of a feature that's now clarified), it could be prompted to suggest an update or append a note to a designated memory file. This would require careful file write permissions.
- Example Prompt Snippet (if updating is allowed): "During your testing of [feature Y], you discovered [critical new insight Z]. Summarize this insight and, if it represents a stable, important piece of context for future testing of Y, propose an entry to append to `qa_memory_log.md`."
- Structured Information: The "memory bank" files should ideally be structured (e.g., using markdown with clear headings) to help the AI parse and retrieve information more effectively. The QA mode could be prompted to search for specific sections or keywords within these files.
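A minimal sketch of the consult/update pattern described above, assuming the memory bank is a set of markdown files in the workspace. The file names follow the examples in this section; the helper functions themselves are hypothetical.

```python
# Sketch: read designated memory-bank files before planning tests, and append a
# dated note when a durable insight is approved for storage.
from datetime import date
from pathlib import Path

MEMORY_FILES = ["project_brief.md", "product_context.md", "qa_memory_log.md"]

def load_memory(base_dir: str = ".") -> str:
    """Concatenate whichever memory-bank files exist, with clear headers."""
    chunks = []
    for name in MEMORY_FILES:
        path = Path(base_dir) / name
        if path.exists():
            chunks.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(chunks) or "(no memory bank files found)"

def append_memory_note(note: str, base_dir: str = ".") -> None:
    """Append an approved insight to qa_memory_log.md under a dated heading."""
    path = Path(base_dir) / "qa_memory_log.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"\n### {date.today().isoformat()}\n{note}\n")
```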
Insights for QA Tester Mode: AI for Intelligent Regression and Sanity Testing (from Amzur.com)
The Amzur.com article on "Impact Of AI In Regression Testing" provides insights into how AI can enhance regression and, by extension, sanity testing.
- Intelligent Test Selection for Regression Testing:
- Concept: AI can analyze historical test data, application code changes, and bug reports to identify critical areas needing regression testing and prioritize test cases based on risk or severity.
- Prompting Strategy:
- Provide the AI with context about recent changes (e.g., "The following modules/files were changed: [list changes/diff summary]. The following bugs were recently fixed: [list bug IDs/summaries].").
- If a "memory bank" of past test results or bug patterns exists, instruct the AI to consider it.
- Ask the AI to: "Based on the recent changes and historical data (if available), identify the top 5-10 most critical areas/features that require regression testing. For each area, suggest 2-3 specific test cases (or types of test cases) that should be executed to ensure no new defects were introduced."
- "Prioritize the suggested regression tests based on potential risk and impact of failure."
- Defect Prediction to Guide Testing:
- Concept: AI can use historical data to predict defect-prone areas.
- Prompting Strategy:
- If historical defect data is available (e.g., in a structured `bug_reports.md` or a QA memory file), prompt the AI: "Analyze the historical bug reports in `bug_reports.md`. Are there any patterns or modules that appear to be consistently defect-prone? If so, suggest additional exploratory or regression tests focusing on these areas for the current release/feature."
- Sanity Testing (as a focused subset of Regression):
- Concept: Sanity tests are quick checks on the most critical functionalities related to a recent small change, performed before a full regression suite.
- Prompting Strategy:
- Provide context about a specific recent change/fix: "A bug related to [specific function, e.g., user login password reset] was just fixed in [file/module X]."
- Ask the AI: "Suggest 3-5 critical sanity test cases to quickly verify that [specific function] is now working as expected and that the immediate surrounding critical functionalities (e.g., basic login, new user registration if related) have not been broken by this fix. These tests should be executable quickly."
- Self-Healing Concepts (Adapting Test Scripts - for advanced scenarios):
- Concept: AI can identify changes in UI or application behavior and suggest updates to existing automated test scripts.
- Prompting Strategy (if test scripts are accessible via `read_file`):
- Provide the AI with an existing test script and context about a UI change: "The login button's ID changed from 'loginBtn' to 'submitLogin'. Review the following Playwright test script snippet for the login page. Identify where it needs to be updated to reflect this change and provide the corrected snippet."
- This is more about guided maintenance than fully autonomous self-healing but leverages the AI's code understanding.
Insights for QA Tester Mode: Effective AI Communication & Collaboration (from QESTIT blog)
The QESTIT article "Mastering AI Communication - A Guide to Effective Prompting" offers general principles for AI prompting that are highly relevant for QA collaboration and communication:
- Crafting Clear and Specific Prompts (for AI's own understanding and for its outputs):
- Principle: Vague prompts lead to generic AI responses. Precise, well-structured prompts with detailed context enable AI to provide highly relevant insights and actionable answers.
- Application for QA Mode Communication:
- When the QA mode needs to ask the user or a dev mode for clarification, it should be prompted to formulate clear, specific questions, providing necessary context.
- When the QA mode reports bugs or test summaries, it should be prompted to present this information with clarity, specificity, and sufficient context for the recipient to understand.
- Example (General vs. Optimized Query by AI):
- If AI needs to ask about a feature: (Vague) "Tell me about the login feature." (Optimized) "Regarding the login feature, please clarify the expected behavior when a user enters an incorrect password three times consecutively. Specifically, should the account be locked, and if so, for how long?"
- Breaking Down Complex Tasks/Queries:
- Principle: For complex requests to an AI, breaking them down into smaller steps yields better outcomes.
- Application for QA Mode Communication:
- If the QA mode needs to convey a complex test scenario or a multi-faceted bug, it should be prompted to break down its explanation or report into smaller, logical parts.
- If asking a complex question, it should consider a step-by-step inquiry.
- Example (QA Mode reporting a complex bug): Instead of one large paragraph, prompt the QA mode to report: "1. Observed behavior. 2. Steps to reproduce. 3. Expected behavior. 4. Environment details. 5. Potential impact."
- Refining AI Responses/Requests with Feedback Loops (Internal & External):
- Principle: Iteratively evaluate AI responses and refine prompts. If an AI's output (e.g., a generated test case, a bug report draft) is insufficient, provide precise feedback to improve it.
- Application for QA Mode Communication:
- The QA mode itself can be prompted to review its own generated bug reports or test summaries for clarity and completeness before presenting them.
- If the user/dev provides feedback that a bug report is unclear, the QA mode should be able to take that feedback to refine and resubmit.
- Example (QA Mode self-correction of a bug report draft): "Review the bug report you just drafted. Is the title descriptive? Are the reproduction steps unambiguous? Is the expected result clearly contrasted with the actual result? Refine if necessary."
- Generating Alternative AI Responses/Queries/Suggestions:
- Principle: Exploring multiple variations can lead to better solutions or communication strategies.
- Application for QA Mode Communication:
- If the QA mode is unsure how to best phrase a question to a developer about a technical ambiguity, it could be prompted to generate 2-3 alternative phrasings of its question.
- When suggesting test strategies, it could propose alternatives.
- Example (QA Mode seeking clarification): "I need to understand if [ambiguous condition X] is expected. Here are two ways I could ask the developer: 1. 'Is X the correct behavior when Y occurs?' 2. 'Could you clarify if condition X is by design or a potential issue when Y is true?' Which is clearer?"
Insights for General AI Agent System Prompts (from PromptingGuide.ai on LLM Agents)
The PromptingGuide.ai article on "LLM Agents" outlines core components and considerations for building LLM-based agents, which can inform the system prompts for Roo modes.
- Core LLM Agent Framework Components:
- Agent/Brain (LLM as main controller):
- The system prompt is crucial for defining how the agent operates.
- Persona/Role Definition: The prompt should clearly define the agent's role, capabilities, and potentially personality. This can be handcrafted. (Aligns with Roo's `roleDefinition`.)
- Planning Module (Task Decomposition):
- The agent needs to break down complex user requests into manageable steps or subtasks.
- Prompting Techniques for Planning: Instruct the agent to use Chain of Thought (CoT) for single-path reasoning or implicitly guide towards Tree of Thoughts (ToT) for exploring multiple approaches if applicable.
- Planning with Feedback (Reflection & Refinement): For complex or long-running tasks, the agent should be prompted to:
- Reflect on past actions and observations.
- Iteratively refine its plan or correct mistakes.
- (Conceptually similar to ReAct: Thought -> Action -> Observation loop).
- Memory Module (Short-term & Long-term):
- Short-Term Memory: Primarily managed by the LLM's context window (in-context learning). The prompt should guide what immediate past interactions are most relevant.
- Long-Term Memory: Achieved by prompting the agent to interact with external storage (e.g., vector databases, or for Roo, specific "memory bank" files like `project_context.md`, `qa_memory_log.md`). The prompt should instruct when and how to consult or update this long-term memory.
- Memory Formats: Information in memory files should ideally be structured (e.g., natural language with clear headings, structured lists) for easier retrieval and understanding by the LLM.
- Tools Module (Interaction with External Environment):
- The system prompt must clearly define the available tools, their syntax, and when/how to use them. (Aligns with Roo's tool descriptions provided in its system prompt).
- Key Challenges & Prompting Considerations for LLM Agents:
- Role-Playing Capability: Ensure the persona/role defined in the prompt is clear and detailed enough for the LLM to adopt effectively, especially for specialized roles like a QA Tester.
- Long-Term Planning & Context Length:
- For tasks requiring many steps, prompt the agent to summarize progress and key decisions periodically to manage context.
- Encourage breaking down very long tasks into phases, potentially using `new_task` if the mode has orchestrator-like capabilities for sub-problems.
- Prompt Robustness & Reliability:
- System prompts for agents are complex. Emphasize critical instructions (as discussed in other sections).
- Iteratively test and refine the overall system prompt based on agent performance.
- Hallucination & Knowledge Boundary:
- Prompt the agent to clearly distinguish between information derived from provided context/tools versus its general training knowledge.
- Instruct it to state uncertainty or inability to answer if information is not found in provided sources, rather than hallucinating. (Aligns with "Avoid Fabricating Citations" for Ask Mode).
- Efficiency: While not directly a prompting issue, be mindful that complex agentic behavior involving many LLM calls (for planning, tool use, reflection) can impact speed and cost. Prompts should aim for clarity to achieve goals with reasonable efficiency.
- Application to Roo Mode System Prompts:
- The `roleDefinition` should be strong and clear.
- `customInstructions` should guide the mode's planning process (e.g., "First, create a plan...").
- Tool usage instructions should be clear, and the mode should be prompted to explain its intended tool use.
- Encourage self-reflection or review of its own outputs (e.g., "Before submitting the bug report, review it for clarity and completeness.").
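For illustration only, these elements might sit together in a mode definition roughly like the following. The exact schema of a Roo mode is not reproduced here; the keys simply mirror the field names mentioned in this document, and all values are placeholders.

```python
# Sketch: how roleDefinition, whenToUse, and customInstructions could combine
# for a hypothetical QA Tester mode (illustrative structure, not Roo's schema).
QA_TESTER_MODE = {
    "slug": "qa-tester",
    "roleDefinition": "You are a meticulous QA tester who designs and executes "
                      "test cases, reports defects clearly, and verifies fixes.",
    "whenToUse": "Use for designing test plans, exploratory testing ideas, and "
                 "regression or sanity test selection.",
    "customInstructions": "\n".join([
        "First, create a plan before generating any test artifacts.",
        "Before designing test cases, read product_context.md and qa_memory_log.md for relevant context.",
        "Explain which tool you intend to use and why before using it.",
        "Before submitting a bug report, review it for clarity and completeness.",
    ]),
}

print(QA_TESTER_MODE["customInstructions"])
```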
Insights for Enhanced Planning Mode: Analyzing Failure Context (CARE Framework from NN/g)
The Nielsen Norman Group's "CARE: Structure for Crafting AI Prompts" article provides a useful framework for structuring prompts, which can be adapted for how the Enhanced Planning Mode analyzes failure contexts provided by other modes.
- CARE Framework:
- Context: Describe the situation.
- Ask: Request specific action.
- Rules: Provide constraints.
- Examples: Demonstrate what you want (good or bad).
-
Applying CARE to Failure Context Analysis by Enhanced Planning Mode: When Enhanced Planning Mode is invoked due to a failure in another mode, it needs to ingest and understand the failure context. It can be prompted to internally (or as part of its documented analysis) use a CARE-like structure to process this information:
- C - Context of the Failure:
- "What was the original task or goal the failing mode was trying to achieve?"
- "What was the state of the system/environment when the failure occurred (e.g., specific files being edited, commands run, user inputs)?"
- "What information or inputs was the failing mode working with?"
- "What was the sequence of actions taken by the failing mode leading up to the failure?"
- (This information should ideally be passed from the failing mode or be reconstructable from logs/history).
- A - Ask (Analyze the Failure):
- "What is the specific error message or observed incorrect behavior?"
- "What are the key pieces of information to extract from the error message, logs, or failure description?"
- "Based on the context and error, what are 1-3 potential root causes for this failure?" (Initial hypothesis generation).
- "What additional information might be needed to confirm a root cause?"
- R - Rules & Constraints:
- "What are the known constraints or limitations of the tools or processes involved?"
- "Are there any project-specific rules or best practices that might have been violated or are relevant to this failure?"
- "What are the boundaries of the failing mode's capabilities that might have been exceeded?"
- E - Examples (if available in memory/history):
- "Has a similar failure or error pattern been observed before in this project (check memory bank/
project_context.md
)?" - "If so, what were the previous resolutions or successful workarounds?"
- "Are there examples of successful execution of similar tasks that can be contrasted with this failure?"
- "Has a similar failure or error pattern been observed before in this project (check memory bank/
- C - Context of the Failure:
-
Prompting Enhanced Planning Mode for Failure Analysis:
- The system prompt for Enhanced Planning Mode can instruct it: "When analyzing a failure from another mode, first systematically gather and document the Context of the failure (original goal, steps taken, environment). Then, Ask critical questions to diagnose the error (specific error, information to extract, potential causes). Consider relevant Rules and constraints. Finally, check for Examples of similar past failures or successes in the project's memory bank if available. Use this structured analysis to inform your subsequent Chain-of-Thought and Tree-of-Thought planning for a new solution."
- This structured approach helps ensure a thorough understanding of the failure before attempting to re-plan.
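A lightweight way to operationalize this is to have the Enhanced Planning Mode fill in a CARE-structured record during its analysis. The sketch below is illustrative only; the field names are assumptions derived from the questions listed above.

```typescript
// Sketch (not a prescribed schema) of a CARE-structured failure record the
// Enhanced Planning Mode could populate while analyzing a failure handed over
// by another mode.
interface FailureAnalysisCARE {
  context: {
    originalGoal: string;          // what the failing mode was trying to achieve
    environmentState: string;      // files being edited, commands run, user inputs
    inputsAvailable: string[];
    actionSequence: string[];      // steps taken leading up to the failure
  };
  ask: {
    observedError: string;
    keyEvidence: string[];         // extracted from error messages or logs
    candidateRootCauses: string[]; // 1-3 initial hypotheses
    missingInformation: string[];
  };
  rules: {
    toolOrProcessConstraints: string[];
    projectRulesInvolved: string[];
    capabilityBoundaries: string[];
  };
  examples: {
    similarPastFailures: string[]; // from the memory bank / project_context.md
    priorResolutions: string[];
    contrastingSuccesses: string[];
  };
}
```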
Insights for Enhanced Planning Mode: Optimizing modelcontextprotocol/sequentialthinking Usage (Iterative Refinement from Latitude Blog)
The Latitude blog "Iterative Prompt Refinement: Step-by-Step Guide" provides a general framework for improving AI outputs that can be applied to how the Enhanced Planning Mode interacts with and guides the modelcontextprotocol/sequentialthinking (SQ) MCP tool for its own complex planning tasks. The Enhanced Planning Mode should essentially perform iterative prompt refinement on its inputs to the SQ tool.
-
Core Iterative Refinement Process for SQ Tool Usage:
- Formulate Initial Thought/Plan (Input to SQ): The Enhanced Planning Mode (EPM) starts by creating a clear, focused initial thought for the SQ tool, setting specific expectations for the planning sub-task. It can assign a "role" or perspective to its thought if helpful.
- Execute SQ Tool & Assess Output: EPM calls the SQ tool and then methodically evaluates the generated thought sequence from SQ. Key aspects to assess:
- Accuracy: Is the reasoning sound?
- Relevance: Does the thought sequence align with the EPM's current planning objective?
- Format/Structure: Is the SQ output structured in a way that's useful for EPM's overall plan?
- Completeness: Does the SQ output cover all necessary aspects, or are there gaps?
- Adjust Next Thought/Input to SQ (Refinement): Based on the assessment, EPM refines its next input to the SQ tool. This refinement can involve:
- Modifying the thought string: Making it more specific, adding constraints (e.g., "Focus only on X," "Consider Y before Z"), providing examples of desired reasoning steps, clarifying ambiguous terms.
- Adjusting SQ parameters:
- If SQ needs more steps: Increase totalThoughts.
- If a previous SQ thought was flawed: Set isRevision to true and specify revisesThought for the SQ tool to correct its path.
- If an alternative path needs exploration: Use branchFromThought and a branchId.
- If more thinking is generally needed: Ensure nextThoughtNeeded remains true.
- Providing feedback implicitly: The structure of the EPM's next thought to SQ inherently contains feedback on the previous SQ output.
- Test and Repeat: EPM continues this cycle of calling SQ, assessing, and refining its input to SQ until a satisfactory detailed plan or analysis is achieved from the SQ tool. EPM should document its own iterative process if the interaction with SQ is complex.
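The sketch below illustrates this refinement loop in code. The parameter names (totalThoughts, nextThoughtNeeded, isRevision, revisesThought, branchFromThought, branchId) come from the notes above; callSequentialThinking and assessSqOutput are placeholders for the actual MCP tool call and EPM's own assessment logic, and the exact call signature is an assumption.

```typescript
// Hypothetical shape of a single sequentialthinking (SQ) input; parameter names
// follow the notes above, the wrapper functions are assumed placeholders.
interface SqThoughtInput {
  thought: string;              // the EPM-crafted thought for this step
  thoughtNumber: number;
  totalThoughts: number;        // increase if SQ needs more steps
  nextThoughtNeeded: boolean;   // keep true while more thinking is needed
  isRevision?: boolean;         // set when correcting a flawed earlier thought
  revisesThought?: number;      // which earlier thought is being revised
  branchFromThought?: number;   // explore an alternative path from this thought
  branchId?: string;
}

declare function callSequentialThinking(input: SqThoughtInput): Promise<string>;
declare function assessSqOutput(output: string): {
  ok: boolean;                  // accuracy, relevance, format, completeness all acceptable
  flawedThought?: number;       // index of a flawed SQ thought, if any
  refinedThought: string;       // EPM's refined thought for the next call
};

// Iterative refinement loop: call SQ, assess the output, adjust the next input.
async function refinePlanWithSq(initialThought: string): Promise<string[]> {
  const outputs: string[] = [];
  let input: SqThoughtInput = {
    thought: initialThought,
    thoughtNumber: 1,
    totalThoughts: 5,
    nextThoughtNeeded: true,
  };

  // Hard cap on iterations to respect the "avoid over-refining" guidance above.
  while (input.nextThoughtNeeded && input.thoughtNumber <= 15) {
    const output = await callSequentialThinking(input);
    outputs.push(output);

    const assessment = assessSqOutput(output);
    const totalThoughts = assessment.ok
      ? input.totalThoughts
      : input.totalThoughts + 1;            // SQ needs more steps: increase totalThoughts
    const next: SqThoughtInput = {
      thought: assessment.refinedThought,   // implicit feedback carried in the next thought
      thoughtNumber: input.thoughtNumber + 1,
      totalThoughts,
      nextThoughtNeeded: input.thoughtNumber + 1 <= totalThoughts,
    };
    if (assessment.flawedThought !== undefined) {
      next.isRevision = true;               // correct a flawed earlier SQ thought
      next.revisesThought = assessment.flawedThought;
    }
    input = next;
  }
  return outputs;
}
```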
-
Applying Advanced Prompting Techniques to SQ Inputs:
- Chain-of-Thought (CoT) for EPM's input to SQ: When EPM formulates its thought for the SQ tool, especially for complex planning sub-problems, it should internally use CoT to structure that thought clearly, guiding SQ more effectively.
- Few-Shot Learning for EPM's input to SQ: If EPM wants SQ to follow a very specific reasoning pattern or output structure for its thoughts, EPM could include 1-2 examples of desired "mini-thoughts" within its main thought string provided to the SQ tool.
-
Best Practices for EPM's Interaction with SQ:
- Clarity and Structure in thought inputs: EPM should use straightforward language when crafting the thought for SQ.
- Systematic Testing of SQ sequences: EPM should evaluate each SQ-generated thought sequence against its planning goals.
- Avoid Over-Refining: If SQ is producing diminishing returns despite EPM's refined inputs, EPM should recognize this and perhaps finalize its plan with the best available SQ output or switch to a different analysis technique (e.g., direct research).
-
This iterative process allows the Enhanced Planning Mode to dynamically guide the sequentialthinking tool, treating it as a powerful but steerable reasoning engine to build out detailed components of its overall plan.
Insights for Enhanced Planning Mode: Outputting Actionable Plans (CREATE Method from OnStrategyHQ)
The OnStrategyHQ article "An AI Prompt Guide to CREATE a Great Strategic Plan" introduces the CREATE method for prompt engineering, which offers valuable principles for generating clear, structured, and actionable plans. While aimed at strategic plans, the core ideas apply to technical implementation plans produced by the Enhanced Planning Mode.
-
CREATE Method for Prompting:
- Character: Define the AI's role (e.g., "You are a meticulous technical planner...").
- Request: Clearly state what plan is needed (e.g., "Create a step-by-step implementation plan for [feature X]...").
- Examples: If available, provide examples of similar successful plans or desired plan structures.
- Adjustments: Iteratively refine the plan based on initial AI output (this is part of the EPM's internal process).
- Tell (Format): Crucially, specify the desired output format for the plan.
- Application for EPM: The EPM's internal prompt to generate the final currentTask.md should explicitly state the required markdown structure (headings, sub-headings, checklists for steps, sections for dependencies, success criteria, as already defined in EnhancedPlanningMode.md).
- Example for EPM's internal prompt: "Generate the final implementation plan for currentTask.md using the following structure: H1 for Task Name, H2 for Overview, H2 for Selected Approach, H2 for Implementation Plan (using nested checklists for steps and sub-tasks), H2 for Dependencies (bullet list), H2 for Success Criteria (bullet list), H2 for Notes."
- Extras (Context & Data): Provide all relevant information and context needed to create the plan.
- Application for EPM: The EPM should feed all its analyzed data (from CoT, ToT, failure context, research) into its final plan generation step.
- Pro Tip from Article (Self-Prompting for Info): If EPM is unsure it has all necessary details before generating a complex plan section, it can internally prompt itself (or use ask_followup_question to the user): "What additional information or clarifications are needed to create a comprehensive plan for [specific sub-task/component]?"
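To illustrate the "Tell" step, here is a sketch of rendering the plan structure described above into markdown for currentTask.md. The interface fields and function are assumptions; only the section layout follows the structure given in these notes.

```typescript
// Sketch of the "Tell" step: turning a plan object into the currentTask.md
// layout named above (H1 task name, H2 sections, checklists, bullet lists).
interface PlanStep { description: string; subTasks?: string[] }
interface ImplementationPlan {
  taskName: string;
  overview: string;
  selectedApproach: string;
  steps: PlanStep[];
  dependencies: string[];
  successCriteria: string[];
  notes?: string;
}

function renderCurrentTaskMd(plan: ImplementationPlan): string {
  const steps = plan.steps
    .map(s => `- [ ] ${s.description}` +
      (s.subTasks ?? []).map(t => `\n  - [ ] ${t}`).join(""))
    .join("\n");
  return [
    `# ${plan.taskName}`,
    `## Overview\n${plan.overview}`,
    `## Selected Approach\n${plan.selectedApproach}`,
    `## Implementation Plan\n${steps}`,
    `## Dependencies\n${plan.dependencies.map(d => `- ${d}`).join("\n")}`,
    `## Success Criteria\n${plan.successCriteria.map(c => `- ${c}`).join("\n")}`,
    `## Notes\n${plan.notes ?? ""}`,
  ].join("\n\n");
}
```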
-
Key Principles for Actionable Plan Output:
- Structured Output is Essential: The "Tell" aspect of CREATE emphasizes defining the output format. The EPM's existing plan structure for currentTask.md (Overview, Selected Approach, Implementation Plan with checklists, Dependencies, Success Criteria) is well-aligned with this.
- Clarity in Steps: Each step in the implementation plan should be clear, concise, and actionable. Sub-tasks help break down complexity.
- Explicit Dependencies: Clearly listing dependencies helps in sequencing and identifying potential blockers.
- Measurable Success Criteria: Defining how to validate the successful completion of the plan and its components is vital for actionability.
- AI Output as a Draft: The article stresses that AI-generated plans are a starting point or a first draft. The EPM should present its plan as such, expecting review and potential refinement by the user or the next executing mode.
- EPM Communication: "Here is the detailed implementation plan based on my analysis. Please review, especially the proposed steps and success criteria. Let me know if any adjustments are needed before proceeding."
-
Integrating with EPM's Workflow:
- The EPM's internal CoT and ToT analyses serve as the "Character" (itself as a planner), "Request" (the planning task), "Examples" (its own structured thought processes), and "Extras" (gathered information).
- The "Tell" component is applied when it formats the final output into
currentTask.md
. - The "Adjustments" happen iteratively as it builds its internal CoT/ToT and refines its understanding before committing to the final documented plan.
Insights for Enhanced Planning Mode: Deciding When to Trigger Deep Research (from AI for Education)
The AI for Education article "Prompting Techniques for Specialized LLMs," particularly its section on "Deep Research Agents," provides valuable insights for how the Enhanced Planning Mode (EPM) can determine when to initiate its own deep research phase or suggest a switch to the dedicated Deep Research Mode.
-
Characteristics of Tasks Requiring Deep Research:
- Demand depth, accuracy, and consideration of multiple (often external) perspectives.
- Involve iterative searching, analyzing, synthesizing, and refining information.
- Often require the use of external tools like web search or document retrieval to gather current and accurate information.
-
Triggers for EPM to Initiate Its Own "Deep Research" Step (or suggest switching to Deep Research Mode): EPM should consider initiating research when its internal planning (CoT, ToT, analysis of failure context) reveals:
- Significant Knowledge Gaps: The current understanding (from provided context, memory bank, or initial analysis) is insufficient to confidently select or detail a solution path.
- Prompt for EPM (internal or to user): "My initial analysis of [problem X] indicates a lack of information regarding [specific area Y]. To create a robust plan, further research into Y is needed. Shall I proceed with a targeted web search for [Y-related keywords]?"
- Lack of Evidence for Solution Paths: Multiple potential solutions identified via ToT lack strong evidence, best practice validation, or clear pros/cons without external input.
- Prompt for EPM: "I have identified approaches A, B, and C. However, to determine the most suitable approach for [project context], research is needed on the comparative performance/reliability of these options in similar scenarios."
- Unresolved Complex Challenges: Potential challenges identified during CoT require exploring known solutions, workarounds, or best practices not readily available.
- Prompt for EPM: "A key challenge is [challenge Z]. I need to research common solutions or mitigation strategies for Z before finalizing the plan."
- New Technologies or External Dependencies: The task involves unfamiliar technologies, APIs, or complex external systems where internal knowledge is limited.
- Prompt for EPM: "The plan involves integrating with [new API/technology W]. I need to research its documentation, best practices for integration, and potential pitfalls."
- Need for External Benchmarking or Standards: The solution needs to align with industry standards, or compare against external benchmarks, requiring information beyond the project's immediate context.
- Prompt for EPM: "To ensure the proposed solution for [feature Q] meets industry best practices, I need to research current standards for [relevant area P]."
-
Prompting for the Research Itself (when EPM does its own research):
- Be Specific: Define the scope of the research, preferred source types (e.g., "technical documentation," "peer-reviewed articles," "official community forums"), and the desired output format of the research summary (e.g., "a list of pros and cons," "a summary of best practices," "a comparison table").
- Use Action Cues: Guide the research with prompts like: "Compare [tool A] and [tool B] for [task C]," "Summarize the top 3 best practices for implementing [pattern D]," "Find known issues and workarounds for [library E]."
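A small sketch of composing such a research prompt from the elements above (action cue, scope, preferred source types, output format). The ResearchRequest shape and buildResearchPrompt function are purely illustrative, not an existing API.

```typescript
// Illustrative helper for assembling a targeted research prompt.
interface ResearchRequest {
  actionCue: string;          // e.g., "Compare tool A and tool B for task C"
  scope: string;              // what is in and out of scope
  preferredSources: string[]; // e.g., "technical documentation", "official community forums"
  outputFormat: string;       // e.g., "a comparison table", "a list of pros and cons"
}

function buildResearchPrompt(r: ResearchRequest): string {
  return [
    `${r.actionCue}.`,
    `Scope: ${r.scope}.`,
    `Prefer these source types: ${r.preferredSources.join(", ")}.`,
    `Present the findings as ${r.outputFormat}.`,
  ].join(" ");
}
```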
-
Decision Point for EPM:
- After its initial CoT/ToT analysis, EPM should have an internal checkpoint: "Is the current information sufficient to create a high-confidence, actionable plan?"
- If NO, and the reason aligns with the triggers above, it should proceed to its "Deep Research" step as outlined in its workflow, or if the research scope is very broad or requires extensive exploration, it might suggest to the user: "The complexity of this planning task requires extensive external research. Would you like to switch to Deep Research Mode for a more thorough investigation before I finalize the plan?"
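The checkpoint can be thought of as a simple decision function over the triggers listed above. The sketch below is an assumption about how such a check might be expressed; the threshold for suggesting Deep Research Mode is arbitrary.

```typescript
// Minimal sketch of the EPM's internal "is my information sufficient?" checkpoint.
// The flags mirror the five triggers listed above; types and thresholds are assumptions.
interface PlanningAssessment {
  significantKnowledgeGaps: boolean;
  unvalidatedSolutionPaths: boolean;
  unresolvedComplexChallenges: boolean;
  unfamiliarTechnologies: boolean;
  needsExternalBenchmarks: boolean;
}

type ResearchDecision = "proceed_with_plan" | "targeted_research" | "suggest_deep_research_mode";

function decideResearchStep(a: PlanningAssessment): ResearchDecision {
  const triggers = [
    a.significantKnowledgeGaps,
    a.unvalidatedSolutionPaths,
    a.unresolvedComplexChallenges,
    a.unfamiliarTechnologies,
    a.needsExternalBenchmarks,
  ].filter(Boolean).length;

  if (triggers === 0) return "proceed_with_plan";          // information is sufficient
  if (triggers >= 3) return "suggest_deep_research_mode";  // broad scope: hand off to Deep Research Mode
  return "targeted_research";                              // narrow gap: EPM does its own targeted search
}
```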
Insights for Deep Thinker Mode: Promoting Objective Analysis & Avoiding Cognitive Biases/Overthinking (from Textio Blog & General Principles)
The Textio blog "Mindful AI: Crafting prompts to mitigate the bias in generative AI" highlights how prompt design can inadvertently introduce or, conversely, help mitigate bias in AI outputs. These principles are crucial for the Deep Thinker mode to ensure its analysis is objective and to help it recognize when it might be "overthinking" due to biased assumptions rather than factual analysis.
-
Awareness of Prompt-Induced Bias:
- Subtle Prompt Differences, Significant Bias: Even minor, seemingly neutral details in a prompt (or an internal "thought" guiding the sequentialthinking tool) can lead the LLM to generate outputs reflecting stereotypes or biases present in its training data.
- Deep Thinker Implication: The mode must be cautious that its own iterative thoughts (inputs to the sequentialthinking tool) do not introduce or amplify biases. It should strive for neutrality in how it frames its analytical steps.
-
Strategies for More Objective Analytical Thinking (Prompting Deep Thinker):
- Focus on Facts and Observable Behaviors:
- Instruct the Deep Thinker to ground its analysis primarily in the provided factual context, observable data, and system behaviors, rather than making assumptions about intent, personality, or unstated factors unless explicitly asked to explore hypotheses.
- Example Prompt Snippet for Deep Thinker's internal guidance: "When analyzing [situation X], prioritize the documented facts and observed behaviors. Clearly distinguish between direct observations and inferred interpretations."
- Explicitly Question Assumptions:
- A core part of "deep thinking" should be to identify and question underlying assumptions—both its own and those potentially present in the input material.
- Example for sequentialthinking tool usage: A recurring type of thought could be: "What assumptions am I making in the previous thought? Are these assumptions validated by evidence, or are they potentially introducing bias?"
- Avoid Irrelevant Contextual Details in Internal "Prompts":
- Just as Textio advises leaving out irrelevant details like alma mater from performance review prompts, the Deep Thinker should be guided not to fixate on or introduce irrelevant contextual details into its analytical threads if they risk triggering biased associations in the LLM.
- Structured Analysis of Different Facets (as already in DeepThinkerMode.md):
-
Recognizing and Managing "Overthinking" (Connecting to Bias and Assumptions):
- The instruction for Deep Thinker to "Recognize Diminishing Returns" and avoid overthinking is crucial. This can be linked to bias detection.
- Prompting for Self-Correction when Overthinking:
- If a line of thought becomes circular or excessively speculative, the Deep Thinker should be prompted to ask itself: "Is this continued line of thought based on new factual input, or am I extrapolating too far based on initial assumptions or potential biases? Is there a risk of [specific cognitive bias, e.g., confirmation bias] here?"
- "If further thought on sub-point X requires making significant unvalidated assumptions, I should pause this line and either seek clarifying input (if appropriate for Deep Thinker's role) or shift focus to another aspect that is better grounded in available information."
- This makes the "avoid overthinking" instruction more actionable by linking it to a check for ungrounded speculation or bias.
-
Synthesizing Diverse Perspectives (as a Bias Mitigation Strategy):
- When analyzing a complex topic, if the Deep Thinker is prompted to explore multiple (potentially conflicting) perspectives or interpretations (as per its "Meaning & Interpretation" and "Potential Actions/Perspectives" analysis areas), this can itself be a strategy to mitigate the impact of a single biased viewpoint emerging from the LLM.
Insights for Code Reviewer Mode: Comprehensive Checklist Items (from Swimm.io)
The Swimm.io "Ultimate 10-Step Code Review Checklist" provides a detailed list of aspects to cover during a code review, which can be incorporated into the Code Reviewer mode's analysis process.
-
1. Functionality:
- Does the code implement the intended functionality?
- Are all requirements met?
- Are edge cases and potential error scenarios handled appropriately?
- Is the code’s behavior consistent with the project’s specifications?
-
2. Readability and Maintainability:
- Is the code well-organized and easy to read?
- Are naming conventions consistent and descriptive?
- Is the code properly indented and formatted?
- Are comments used appropriately to explain complex or non-obvious code segments?
-
3. Code Structure and Design:
- Does the code follow established design patterns and architectural guidelines?
- Is the code modular and maintainable?
- Are functions and classes of reasonable size and complexity?
- Does the code adhere to the principles of separation of concerns and single responsibility?
-
4. Performance and Efficiency:
- Are there any potential performance bottlenecks or inefficiencies (e.g., unnecessary loops, suboptimal algorithms)?
- Is memory usage optimized (e.g., avoiding memory leaks)?
- Are algorithms and data structures appropriate and efficient?
- Are there any opportunities for caching or parallelization?
-
5. Error Handling and Logging:
- Does the code include proper error handling mechanisms?
- Are exceptions used appropriately and caught at the correct level?
- Is logging implemented for debugging and troubleshooting purposes?
- Are error messages clear, descriptive, and actionable?
-
6. Security:
- Does the code follow secure coding practices?
- Are there any potential security vulnerabilities (e.g., SQL injections, cross-site scripting, improper access controls)?
- Is user input validated and sanitized properly?
- Are authentication and authorization mechanisms implemented correctly?
-
7. Test Coverage:
- Does the code include appropriate unit tests or integration tests?
- Is the test coverage sufficient for the critical functionality and edge cases?
- Are the tests passing and up-to-date?
- Is the test code well-structured, readable, and maintainable?
-
8. Code Reuse and Dependencies:
- Is the code properly reusing existing libraries, frameworks, or components?
- Are dependencies managed correctly and up-to-date?
- Are any unnecessary dependencies or duplicate code segments removed?
- Are dependencies secure, actively maintained, and of sufficient quality?
-
9. Compliance with Coding Standards:
- Does the code comply with company or project-specific coding standards and guidelines (e.g., from a memory bank file like .clinerules or coding_standards.md)?
- (AI Note: While AI can't use linters, it should check for patterns typically enforced by them if standards are known.)
-
10. Documentation:
- Are inline comments used effectively to explain complex or non-obvious code segments?
- Do functions, methods, and classes have descriptive comments or docstrings?
- Is there high-level documentation for complex modules or components?
- Is documentation regularly updated (or does the code change necessitate doc updates)?
-
Additional Review Tips (Adaptable for AI):
- Focus on Critical Sections First: Prioritize review of new features, complex logic, or sensitive data handling.
- Automated Checks as Baseline: Assume basic linting/styling is handled; focus AI review on deeper logic, design, security, and maintainability aspects.
- Overall Design Fit: Check if new code aligns with the project’s architectural goals and existing design patterns (requires context from memory bank or broader codebase understanding).
Insights for Code Reviewer Mode: Detailed Checklists & Best Practices (from GetDX.com)
The GetDX.com article "Enhance your code quality with our guide to code review checklists" offers specific checklists and process best practices for code reviews.
-
General Code Review Checklist Items:
- Code Quality & Maintainability:
- Adherence to project standards.
- Follows modular programming principles.
- Uses meaningful variable, function, and class names for readability.
- Technical documentation is clear and up-to-date.
- Error Handling & Logging:
- Uses try-catch blocks appropriately.
- Logs errors without exposing sensitive information.
- Testing & Validation:
- Unit tests cover core functionality and edge cases.
- Automated tests are integrated into CI/CD pipelines.
-
Security Code Review Checklist Items:
- Input Validation & Injection Prevention:
- Validates all input for type, length, and format.
- Protects against SQL Injection & XSS.
- Authentication & Authorization:
- Uses secure password storage (e.g., bcrypt, Argon2).
- Implements least privilege access principles.
- Sensitive Data Handling:
- Encrypts sensitive data at rest and in transit.
- Avoids logging or exposing sensitive data in URLs.
- Dependency Management:
- Ensures third-party libraries are updated and checked for vulnerabilities.
-
Performance Code Review Checklist Items:
- Algorithm & Query Efficiency:
- Code is optimized for time and space complexity.
- SQL queries are indexed and avoid N+1 problems.
- Memory & Resource Management:
- Efficient memory allocation, preventing leaks.
- Avoids excessive object retention and redundant processing.
- Concurrency & Caching:
- Uses threading & concurrency efficiently.
- Implements caching for frequently accessed resources.
-
API & Integration Code Review Checklist Items:
- API Design & Usage:
- Follows RESTful or GraphQL best practices.
- API responses use appropriate HTTP status codes.
- Security & Authentication:
- Uses OAuth, API keys, or JWT for authentication.
- API requests are properly validated.
- Error Handling & Performance:
- API provides meaningful error responses.
- Minimizes and compresses payload sizes.
-
Frontend Code Review Checklist Items:
- Code Structure & Maintainability:
- Uses a modular component structure.
- Adheres to accessibility best practices (WCAG compliance).
- Performance Optimization:
- Minimizes unnecessary re-renders and optimizes assets.
- Implements efficient event handling.
- Security:
- Frontend validation complements backend security.
- No sensitive keys or credentials are exposed.
-
Mobile App Code Review Checklist Items:
- UI/UX & Performance:
- Adheres to platform-specific design guidelines.
- UI is responsive across different screen sizes.
- Memory & Battery Optimization:
- Manages memory efficiently to avoid excessive battery drain.
- Offline Handling:
- Gracefully handles network disconnections.
- Uses caching for improved performance.
-
Code Review Process Best Practices (for AI adaptation):
- Clear Goals: Define what a successful review entails for the AI (e.g., identifying X types of issues, checking against Y standards).
- Automate Repetitive Checks (AI's Focus): Instruct the AI to assume basic linting/formatting is done and to focus its review on deeper aspects: business logic, security risks, architectural alignment, maintainability, complex error handling, performance, etc.
- Small, Focused Scope: When possible, provide the AI with smaller, focused code sections (e.g., a single function, a PR diff) rather than entire large files at once, to improve the quality and relevance of its feedback.
- Constructive & Actionable Feedback: Prompt the AI to:
- Be specific in its comments (file, line number, problematic snippet).
- Explain why something is an issue (impact).
- Suggest how it could be improved (without writing the full fix, but offering a direction or pattern).
- Balance critical feedback with acknowledging good practices found.
- Continuous Improvement (for AI's knowledge): If the AI reviewer misses something consistently, this feedback can be used to refine its custom instructions or prompt for future reviews (meta-learning for the system managing Roo).
Insights for Code Reviewer Mode: Expanded Checklist Items (from Bito.ai)
The Bito.ai "What to Look for in a Code Review: 24 Points Checklist" offers further detailed criteria for code reviews, categorized for clarity.
-
I. Code Quality:
- 1. Clarity and Readability:
- Purpose/functionality clear.
- Self-explanatory and logically grouped names (variables, functions, classes).
- Avoids complex structures unnecessarily.
- Consistent coding style (naming, indentation, spacing). Adherence to language-specific standards (e.g., PascalCase in Java, snake_case in Python).
- 2. DRY (Don’t Repeat Yourself) Principle:
- Identify repeated code for refactoring into functions/components.
- Balance reusability with simplicity (avoid over-engineering).
- 3. SOLID Principles (for Object-Oriented Design):
- Single Responsibility Principle: Each class/module has one responsibility.
- Open/Closed Principle: Open for extension, closed for modification.
- Liskov Substitution Principle: Derived classes can substitute base classes.
- Interface Segregation Principle: Don't force unnecessary methods on implementing classes.
- Dependency Inversion Principle: Depend on abstractions, not concretions.
- 4. Error Handling:
- Comprehensive coverage of potential errors (technical and business logic).
- Consistent error handling strategy (exceptions or error codes).
- Graceful error handling without exposing sensitive information.
-
II. Code Performance:
- 5. Efficiency:
- Assess algorithms/data structures for time/space efficiency and complexity.
- Consider alternatives for large datasets.
- Profile to identify hotspots; optimize these areas.
- Avoid premature optimization that complicates code; justify with metrics.
- 6. Resource Management:
- Efficient management of memory, file handles, etc.
- Proper allocation/deallocation to avoid leaks (e.g., C#'s using, Java's try-with-resources).
- Ensure cleanup even with exceptions (e.g., finally blocks).
- 7. Scalability:
- Ability to handle increased loads.
- Identify potential bottlenecks.
- Architecture supports horizontal/vertical scaling.
- Modular design for independent scaling of components.
- 8. Concurrency:
- Correct handling of multi-threading/synchronization.
- Address race conditions, deadlocks.
- Justified and efficient use of concurrency.
- Rigorous testing under concurrent load.
-
III. Security:
- 9. Input Validation:
- Validate all inputs (user and inter-system) for type, length, format, range.
- Defensive checks.
- Sanitize inputs for SQL/HTML to prevent malicious content.
- Clear feedback for invalid inputs without revealing system details.
- 10. Authentication and Authorization Checks:
- Use standard protocols/libraries.
- Ensure all access points to protected resources are secure.
- Apply the principle of least privilege.
- 11. Secure Coding Practices:
- Awareness of common vulnerabilities (SQL injection, XSS).
- Preventive measures (prepared statements, input sanitization).
- Use security auditing tools; stay updated on best practices.
- 12. Data Encryption:
- Encrypt sensitive data in transit (e.g., HTTPS) and at rest.
- Use up-to-date, standard encryption methods.
- Secure key management.
- Comply with relevant regulations (GDPR, HIPAA).
-
IV. Testing and Reliability:
- 13. Unit Tests:
- Comprehensive: cover critical paths, common, edge, and error scenarios.
- Well-structured, easy to maintain, independent of external systems.
- Mocking used to isolate code under test.
- 14. Integration Tests:
- Ensure different system parts work together effectively.
- Cover real-world scenarios and failure modes.
- Test environment mimics production.
- 15. Test Coverage:
- Evaluate coverage for critical paths using tools.
- Balance high coverage with test quality.
- 16. Code Consistency:
- Maintain consistency in coding practices (naming, structures, patterns) across the codebase.
- Document and update coding conventions.
-
V. Documentation and Comments:
- 17. Code Comments:
- Explain complex logic and decisions, avoid redundancy with code.
- Up-to-date, reflect recent changes, adhere to project style.
- 18. Technical Documentation:
- Accurate, complete, clear.
- Reflects current code state, system structure, interactions.
- Accessible to newcomers.
- 19. README/Changelogs:
- README: informative, current (setup, usage, links).
- Changelogs: chronologically list significant changes.
- Clear and well-formatted.
-
VI. Compliance and Standards:
- 20. Code Standards:
- Confirm adherence to coding standards.
- Suggest automated linting tools for enforcement.
- Document standards for team alignment.
- 21. Legal Compliance:
- Check compliance with licenses, data protection laws (e.g., GDPR), IP rights.
- Proper handling of user data, authorization for code use.
- 22. Accessibility:
- Review for compliance with accessibility best practices (e.g., WCAG).
- Suggest tools for testing.
- Documentation covers accessibility features (e.g., ARIA roles).
-
VII. Design and Architecture:
- 23. Design Patterns:
- Evaluate appropriate and consistent use of design patterns.
- Ensure they address specific problems effectively.
- Documentation explains rationale for pattern choice.
- 24. Architecture Decisions:
- Assess code alignment with architectural principles (scalability, performance, flexibility, maintainability).
- Identify potential architectural bottlenecks.
Insights for Code Reviewer Mode: 8 Essential Steps Checklist (from Axify.io)
The Axify.io article "The Ultimate Developer’s Code Review Checklist (8 Essential Steps)" provides another valuable set of criteria for code reviews.
-
1. Functionality:
- Does the code accomplish its intended purpose (as per specifications/user stories)?
- Check for logic errors, off-by-one mistakes, and edge cases.
-
2. Readability and Expressiveness:
- Is the code easy to understand and maintain?
- Are variable and function names clear and meaningful?
- Is the code structure well-organized?
- Are comments used where necessary (for complex logic)?
- Does the code adhere to any team style guide?
-
3. Security Flaws:
- Check for common vulnerabilities (e.g., SQL injection, cross-site scripting (XSS), missing validation, insecure data storage).
- Ensure user inputs are sanitized.
- Ensure sensitive data (passwords, keys) is handled securely (e.g., not hardcoded, retrieved from a secure vault).
- Consider OWASP cheat sheets for guidance.
-
4. Performance:
- Evaluate for memory efficiency.
- Assess appropriate use of data structures.
- Look for potential bottlenecks.
- Optimize loops, recursive calls, and database queries.
-
5. Unit and Integration Tests:
- Are adequate unit tests provided? Do they cover all functionalities and edge cases?
- Are integration tests in place to verify interactions with other system parts?
- Check for test coverage, proper mocking of dependencies, and clear assertions.
-
6. Supporting Documentation:
- Are there inline comments explaining complex logic?
- Is external documentation (e.g., README files, API docs) provided and adequate?
- Does documentation cover usage, parameters, and expected behavior?
-
7. Adherence to Standards and Principles:
- Does the code follow established coding conventions and best practices for the project or language?
- Are design principles like SOLID and DRY (Don’t Repeat Yourself) applied?
-
8. Error and Exception Handling:
- Does the code robustly handle errors, exceptions, and edge cases?
- Are specific exceptions caught and handled appropriately (rather than generic Exception)?
-
Implementing the Checklist (Process Considerations for AI Adaptation):
- Integration into Workflow: The AI reviewer should be triggered as part of a standard workflow (e.g., on PR creation).
- Tool-Assisted Review: The AI should assume basic static analysis/linting is done, focusing on deeper issues.
- Education & Culture (AI's Role): The AI's feedback should be framed constructively, explaining the 'why' behind suggestions to foster learning, even if the recipient is another AI mode or a human.
Insights for Code Reviewer Mode: Structuring Constructive & Actionable Feedback (from TeamAI)
The TeamAI article "Best Prompt for Constructive Code Reviews and Feedback" offers a structured approach and detailed directions for providing feedback, which can guide how the Code Reviewer AI should be prompted to formulate its review comments.
-
Core Task for AI Reviewer (as per TeamAI prompt): "Provide best practices for delivering constructive code reviews that help improve code quality and support the development of the engineer."
-
Key Directions for AI to Generate Constructive Feedback:
- Thorough Review Foundation: (AI already does this by reading code) Focus on code quality, readability, maintainability, and adherence to coding standards and best practices.
- Start with Positive Observations: Instruct the AI to identify and briefly acknowledge any strengths or well-implemented aspects of the code before diving into issues.
- Prompt Snippet: "Begin your review by noting 1-2 positive aspects of the code, if applicable."
- Be Specific with Critical Feedback & Provide Examples:
- For each issue, the AI MUST clearly state the problem, provide the specific file/line number, and include the relevant code snippet.
- It MUST explain the potential negative impact of the issue (e.g., "This could lead to a NullPointerException if X occurs because...").
- Prompt Snippet: "For each identified issue, provide: a) The file path and line number. b) The problematic code snippet. c) A clear explanation of why it's an issue and its potential impact."
- Offer Actionable Suggestions for Improvement:
- The AI should suggest how the issue could be addressed, or what principle to apply. It should not necessarily write the corrected code but guide the developer.
- If relevant, it could be prompted to suggest looking into specific documentation, patterns, or examples (if known or found via quick research).
- Prompt Snippet: "For each issue, offer a high-level actionable suggestion for improvement or a principle to consider (e.g., 'Consider refactoring this into smaller functions to improve readability and testability,' or 'This section could benefit from applying the Strategy pattern.')."
- Maintain a Respectful and Supportive Tone:
- Instruct the AI to use neutral, objective language. Frame feedback as suggestions ("Consider doing X," "It might be clearer if Y") rather than harsh criticisms or demands.
- Avoid language that could be perceived as personal.
- Prompt Snippet: "Ensure all feedback is phrased respectfully and constructively, focusing on the code and not the author. Use suggestive language."
- Encourage Discussion (Implicitly): While an AI can't "discuss," its feedback should be phrased to invite thought, not shut down alternatives.
- Prioritize Feedback:
- Instruct the AI to categorize or flag issues by severity (e.g., Critical, Major, Minor, Suggestion) or impact.
- It should focus its main comments on more significant issues.
- Prompt Snippet: "If multiple issues are found, attempt to prioritize them by potential impact (e.g., Critical, Important, Minor Suggestion)."
- Acknowledge Multiple Valid Approaches (If Applicable):
- If the AI suggests an alternative, it can briefly acknowledge that other solutions might exist but explain why its suggestion is preferred in this context (e.g., "While approach A works, approach B might be more maintainable here because...").
- Reinforce Best Practices & Standards:
- When an issue relates to a deviation from known best practices or project-specific standards (from memory bank files), the AI should reference this.
- Prompt Snippet: "If an issue violates a known coding standard from
coding_standards.md
or a general best practice (e.g., SOLID, DRY), briefly mention this."
- End with a Positive or Encouraging Note (Optional):
- The AI could conclude its review summary with a generally positive or appreciative statement if appropriate.
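One way to make these directions mechanical is to give each finding a fixed structure before it is written into the review. The field names and formatter below are assumptions, chosen to carry every element required above (severity, location, snippet, impact, suggestion, standard reference).

```typescript
// Sketch of a structured review finding matching the directions above.
type Severity = "Critical" | "Important" | "Minor Suggestion";

interface ReviewFinding {
  severity: Severity;
  filePath: string;
  line: number;
  snippet: string;            // the problematic code
  issue: string;              // why it is a problem and its potential impact
  suggestion: string;         // high-level, actionable direction (not a full fix)
  standardReference?: string; // e.g., a rule from coding_standards.md, SOLID, DRY
}

// Render one finding as a review comment, keeping suggestive, respectful phrasing
// up to the text placed in `issue` and `suggestion`.
function formatFinding(f: ReviewFinding): string {
  return [
    `[${f.severity}] ${f.filePath}:${f.line}`,
    `Code: ${f.snippet}`,
    `Issue: ${f.issue}`,
    `Suggestion: ${f.suggestion}`,
    f.standardReference ? `Reference: ${f.standardReference}` : "",
  ].filter(Boolean).join("\n");
}
```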
-
Contextual Inputs for the AI Reviewer (from TeamAI prompt):
- The coding language or framework being reviewed.
- Paths to any team coding standards or style guides (e.g., project_brief.md, .clinerules, coding_standards.md in the memory bank). The AI MUST consult these.
-
Applying to review.md and Final Report:
- These principles should guide how the AI reviewer populates its review.md file iteratively.
Insights for Code Reviewer Mode: Identifying Code Smells & Anti-Patterns (from Geekpedia)
The Geekpedia article "Using AI to Detect Anti-Patterns in Code" explains anti-patterns and code smells, and how AI can assist in their detection. This is valuable for prompting the Code Reviewer mode to look beyond direct bugs.
-
Definitions:
- Anti-Patterns: Common, yet ineffective or counterproductive solutions to recurring design/programming problems. They are pitfalls that lead to technical debt and maintenance issues.
- Examples: God Class (too many responsibilities), Spaghetti Code (tangled control structure), Feature Envy (object overly interested in another's methods/properties).
- Code Smells: Surface-level indicators that something might be wrong in the code, often hinting at deeper design issues (which could be anti-patterns).
- Examples: Long Method, Large Class, Primitive Obsession (overuse of primitive data types instead of small objects).
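For illustration, here is a tiny, invented example of two of the smells named above (Feature Envy and Primitive Obsession), with comments indicating the refactoring direction a reviewer might suggest. The classes are hypothetical and exist only to demonstrate the patterns.

```typescript
// Hypothetical illustration of two code smells defined above.
class Order {
  constructor(public items: { price: number; qty: number }[]) {}
}

class InvoicePrinter {
  // Feature Envy: this method is far more interested in Order's internals than
  // in InvoicePrinter's own state; the total calculation likely belongs on
  // Order itself (e.g., an Order.total() method).
  printTotal(order: Order): void {
    let total = 0;
    for (const item of order.items) {
      total += item.price * item.qty;
    }
    console.log(`Total: ${total}`);
  }
}

// Primitive Obsession: passing raw numbers/strings for a monetary amount and
// currency instead of a small Money value object that enforces invariants.
function applyDiscount(amount: number, currency: string, percent: number): number {
  return amount - amount * (percent / 100);
}
```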
-
Prompting AI to Identify Code Smells & Anti-Patterns: The Code Reviewer mode can be instructed to look for these during its "Systematic Code Examination":
- Explicitly Check for Known Anti-Patterns:
- Prompt Snippet: "Review this class: [code snippet]. Does it exhibit characteristics of a 'God Class' by centralizing too many responsibilities or having knowledge of too many other classes? Explain your reasoning."
- Prompt Snippet: "Analyze the control flow in this module: [code snippet]. Are there signs of 'Spaghetti Code,' such as deeply nested conditionals, excessive use of global variables for flow control, or a lack of clear structure?"
- Prompt Snippet: "Examine the interaction between ClassA and ClassB: [code snippets]. Does ClassA show 'Feature Envy' towards ClassB by frequently accessing ClassB's internal data or methods to perform operations that might better belong in ClassB itself?"
- Identify Common Code Smells as Indicators:
- Prompt Snippet: "Is this method excessively long (Long Method smell)? If so, suggest logical points where it could be broken down into smaller, more focused methods."
- Prompt Snippet: "Does this class seem too large or handle too many distinct responsibilities (Large Class smell)? What are the different responsibilities it seems to manage?"
- Prompt Snippet: "Is there an over-reliance on primitive data types in this section where creating small classes or structs might improve clarity and type safety (Primitive Obsession smell)?"
- Prompt Snippet: "Is there evidence of duplicated code blocks that could be refactored into a shared function or component (Duplicated Code smell)?"
- Contextual Analysis:
- Remind the AI that what constitutes an anti-pattern can sometimes be context-dependent.
- Prompt Snippet: "While evaluating for [anti-pattern X], consider the specific context and constraints of this project. Is this pattern problematic here, or is it a justifiable trade-off?"
- Suggest High-Level Refactoring:
- If an anti-pattern or significant smell is identified, the AI should suggest the need for refactoring and the general approach, rather than writing the refactored code.
- Prompt Snippet: "This class appears to be a God Class. Consider refactoring by identifying distinct responsibilities and extracting them into separate, cohesive classes."
-
Integrating into Review Process:
- These checks should be part of the "Maintainability" and "Code Structure and Design" aspects of the review.
- Findings related to smells/anti-patterns should be documented in review.md with clear explanations and locations.
Insights for Code Reviewer Mode: AI for Detecting Code Smells & Suggesting Optimizations (from arXiv:2404.18496v1)
The arXiv paper "AI-powered Code Review with LLMs: Early Results" describes an LLM-based AI agent model for code review, including a "Code Smell Agent" and "Code Optimization Agent." This provides insights into prompting for deeper code analysis.
-
Concept of Specialized AI Review Agents:
- Code Smell Agent: Trained to detect symptoms of deeper problems in code design and implementation, recognize anti-patterns, and suggest refactoring to improve maintainability and performance. Focuses on subtle, non-obvious patterns.
- Code Optimization Agent: Provides recommendations for improving code and can suggest optimizations for execution speed, memory usage, and overall maintainability.
- Bug Report Agent: Identifies potential bugs by analyzing patterns and anomalies.
- (While Roo's Code Reviewer is a single mode, it can be prompted to wear these different "hats" during its review).
-
AI Capabilities Beyond Static Analysis:
- LLMs can be trained on code reviews, bug reports, and best practices to understand context and provide deeper insights than traditional static analysis tools.
- They can predict potential future risks in the code.
- They aim not just to find issues but to provide actionable suggestions for improvement and educate developers.
-
Examples of Issues AI Can Identify (from the paper's preliminary results):
- Critical bugs (e.g., unicode parsing failures).
- Code smells (e.g., hard-coded parameters, use of global variables, unclear function names, overly rigid decision trees).
- Inefficiencies (e.g., outdated algorithms, suboptimal data handling for large corpora, inefficient training loops, outdated caching).
- Lack of error handling.
-
Prompting Strategies for Code Reviewer Mode (inspired by the paper):
- Prompt for Code Smell Detection:
- Prompt Snippet: "Analyze this code for common code smells such as [list specific smells like 'Long Method', 'Large Class', 'Feature Envy', 'Data Clumps', 'Primitive Obsession', 'Shotgun Surgery', 'Message Chains', 'Inappropriate Intimacy']. For each identified smell, explain why it's a concern and suggest a general refactoring approach."
- Prompt for Optimization Suggestions (High-Level):
- Prompt Snippet: "Review this module for potential optimizations in terms of performance (e.g., algorithm efficiency, resource usage) or maintainability. Are there any sections where alternative approaches or design patterns might yield significant improvements? Describe the potential improvement and the suggested approach at a high level."
- Prompt for Proactive Risk Identification:
- Prompt Snippet: "Beyond immediate bugs, does this code introduce any potential future risks regarding scalability, maintainability, or security if left as is? Explain your reasoning."
- Frame Feedback Educationally:
- Prompt Snippet: "When suggesting an improvement, briefly explain the underlying best practice or design principle that motivates the suggestion (e.g., 'This change would better adhere to the Single Responsibility Principle because...')."
-
Leveraging Training Data Concepts (for prompting):
- The paper mentions training on code reviews, bug reports, and best practices. When prompting the Code Reviewer mode, providing context from similar sources (e.g., project's own bug history if available in a memory bank, or general best practice documents) can help it make more relevant suggestions.
Insights for Code Reviewer Mode: Leveraging Project-Specific Context (Memory Bank)
Based on search snippets (Medium/Vishal Rajput, Qodo.ai) and general principles of AI context, the Code Reviewer mode must be explicitly prompted to use project-specific guidelines from its "memory bank" files.
-
Core Principle: AI code reviewers should enforce not only general best practices but also unique project/team coding standards, architectural patterns, and known conventions.
-
Prompting Strategies for Using Memory Bank:
- Mandatory Context Ingestion:
- At the beginning of any review task, the Code Reviewer mode MUST be instructed to read and understand relevant "memory bank" files. These could include:
- project_brief.md (for overall goals, tech stack)
- system_architecture.md or systemPatterns.md (for architectural guidelines)
- coding_standards.md or .clinerules (for specific coding conventions, naming, formatting)
- known_issues_and_workarounds.md (for recurring problems or established solutions)
- Prompt Snippet: "Before reviewing the provided code, YOU MUST first read and thoroughly understand the guidelines and context provided in the following project documents: coding_standards.md, system_architecture.md. Pay close attention to [specific section if relevant, e.g., 'the API design principles']."
- Prioritize Project-Specific Standards:
- Instruct the AI that documented project-specific standards generally take precedence over generic best practices, unless a project standard introduces a clear security risk or a major, universally recognized anti-pattern.
- Prompt Snippet: "When reviewing, prioritize adherence to the conventions outlined in
coding_standards.md
. If you identify a conflict between a project standard and a general best practice, note the project standard but also mention the general best practice and any potential trade-offs or risks if the project standard is followed in this specific instance."
- Specific Checks Against Guidelines:
- Prompt the AI to perform targeted checks against rules or patterns defined in the memory bank.
- Example Prompts for AI's Internal Checklist Generation:
- "Verify that all public method names in this Java code adhere to the
camelCaseForMethods
convention specified incoding_standards.md
." - "Does the error handling strategy in this Python module align with the 'centralized logging approach' detailed in
project_brief.md
?" - "Is the use of the
[SpecificSingletonPattern]
in this module consistent with its approved usage contexts described insystemPatterns.md
?" - "The file
known_issues_and_workarounds.md
mentions a common pitfall related to [specific library X]. Check if the current code avoids this pitfall."
- "Verify that all public method names in this Java code adhere to the
- Referencing Memory Bank in Feedback:
- When the AI identifies an issue that violates a project-specific guideline, it should explicitly reference the source document in its feedback.
- Prompt Snippet: "If a piece of code deviates from a rule in
coding_standards.md
, your review comment MUST state: 'This deviates from the project standard [Rule X] outlined incoding_standards.md
(line Y), which requires [brief explanation of rule]. Consider refactoring to align.'"
-
Integrating into Review Process:
- The "Understand Project Context" step in
CodeReviewerMode.md
's custom instructions should make reading these memory bank files the absolute first action. - The "Adherence to Standards" checklist item should explicitly include project-specific standards from these files.
- The "Understand Project Context" step in