# browser_action
The `browser_action` tool enables web automation and interaction via a Puppeteer-controlled browser. It allows Roo to launch browsers, navigate to websites, click elements, type text, and scroll pages with visual feedback through screenshots.
## Parameters
The tool accepts these parameters:
- `action` (required): The action to perform:
  * `launch`: Start a new browser session at a URL
  * `click`: Click at specific x,y coordinates
  * `type`: Type text via the keyboard
  * `scroll_down`: Scroll down one page height
  * `scroll_up`: Scroll up one page height
  * `close`: End the browser session
- `url` (optional): The URL to navigate to when using the `launch` action
- `coordinate` (optional): The x,y coordinates for the `click` action (e.g., "450,300")
- `text` (optional): The text to type when using the `type` action
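
The pairing of actions and parameters listed above can be sketched as a small validation step. This is an illustrative Python sketch, not Roo's implementation: the function name `validate_browser_action` and the assumption that `launch`, `click`, and `type` each need their matching parameter are inferred from the parameter descriptions.

```python
from typing import Optional

# Parameter each action needs, inferred from the descriptions above (assumption).
REQUIRED_PARAM = {
    "launch": "url",
    "click": "coordinate",
    "type": "text",
    "scroll_down": None,
    "scroll_up": None,
    "close": None,
}


def validate_browser_action(action: str, url: Optional[str] = None,
                            coordinate: Optional[str] = None,
                            text: Optional[str] = None) -> Optional[str]:
    """Return an error message, or None when the request looks well formed."""
    if action not in REQUIRED_PARAM:
        return f"unknown action: {action}"
    params = {"url": url, "coordinate": coordinate, "text": text}
    needed = REQUIRED_PARAM[action]
    # Treat a missing or empty value as "not provided".
    if needed is not None and not params[needed]:
        return f"action '{action}' needs the '{needed}' parameter"
    return None
```

For instance, `validate_browser_action("launch", url="https://example.com")` returns `None`, while a `click` without `coordinate` yields an error message.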
## What It Does
This tool creates an automated browser session that Roo can control to navigate websites, interact with elements, and perform tasks that require browser automation. Each action provides a screenshot of the current state, enabling visual verification of the process.
## When is it used?
- When Roo needs to interact with web applications or websites
- When testing user interfaces or web functionality
- When capturing screenshots of web pages
- When demonstrating web workflows visually
## Key Features
- Provides visual feedback with screenshots after each action and captures console logs
- Supports complete workflows from launching to page interaction to closing
- Enables precise interactions via coordinates, keyboard input, and scrolling
- Maintains consistent browser sessions with intelligent page loading detection
- Operates in two modes: local (isolated Puppeteer instance) or remote (connects to existing Chrome)
- Handles errors gracefully with automatic session cleanup and detailed messages
- Optimizes screenshots using WebP format (with PNG fallback) and adjustable quality settings
- Tracks interaction state with position indicators and action history
## Browser Modes
The tool operates in two distinct modes:
### Local Browser Mode (Default)
- Downloads and manages a local Chromium instance through Puppeteer
- Creates a fresh browser environment with each launch
- No access to existing user profiles, cookies, or extensions
- Consistent, predictable behavior in a sandboxed environment
- Completely closes the browser when the session ends
### Remote Browser Mode
- Connects to an existing Chrome/Chromium instance running with remote debugging enabled
- Can access existing browser state, cookies, and potentially extensions
- Faster startup as it reuses an existing browser process
- Supports connecting to browsers in Docker containers or on remote machines
- Only disconnects from the browser (does not close it) when the session ends
- Requires Chrome to be running with remote debugging port open (typically port 9222)
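
Before relying on remote mode, it can help to confirm that Chrome is actually listening on the debugging port. The sketch below queries the `/json/version` HTTP endpoint that Chrome exposes when remote debugging is enabled. The function names and the `localhost` default are assumptions for illustration; how Roo itself discovers and connects to the browser is not described here.

```python
import json
import urllib.request
from typing import Optional


def version_endpoint(host: str = "localhost", port: int = 9222) -> str:
    """Build the URL of Chrome's DevTools version endpoint."""
    return f"http://{host}:{port}/json/version"


def find_websocket_url(host: str = "localhost", port: int = 9222,
                       timeout: float = 2.0) -> Optional[str]:
    """Return the browser's WebSocket debugger URL, or None if unreachable."""
    try:
        with urllib.request.urlopen(version_endpoint(host, port),
                                    timeout=timeout) as resp:
            return json.load(resp).get("webSocketDebuggerUrl")
    except (OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None
```

A `None` result usually means Chrome was not started with remote debugging enabled or the port is not reachable from this machine.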
## Limitations
- While the browser is active, only the `browser_action` tool can be used
- Browser coordinates are viewport-relative, not page-relative
- Click actions must target visible elements within the viewport
- Browser sessions must be explicitly closed before using other tools
- Browser window has configurable dimensions (default 900x600)
- Cannot directly interact with browser DevTools
- Browser sessions are temporary and not persistent across Roo restarts
- Works only with Chrome/Chromium browsers, not Firefox or Safari
- Local mode has no access to existing cookies; remote mode requires Chrome with debugging enabled
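
The viewport-relative coordinate rule can be illustrated with a few helpers. This is a sketch under stated assumptions: the helper names are hypothetical, and the 900x600 size is the default mentioned in the list above.

```python
from typing import Tuple

Point = Tuple[int, int]

DEFAULT_VIEWPORT: Point = (900, 600)  # default window size stated above


def parse_coordinate(value: str) -> Point:
    """Parse an "x,y" string such as "450,300" into integers."""
    x_str, y_str = value.split(",")
    return int(x_str), int(y_str)


def in_viewport(point: Point, viewport: Point = DEFAULT_VIEWPORT) -> bool:
    """True when the point lies inside the visible viewport."""
    x, y = point
    width, height = viewport
    return 0 <= x < width and 0 <= y < height


def page_to_viewport(page_point: Point, scroll_offset: Point) -> Point:
    """Convert a page-relative point to a viewport-relative one."""
    return (page_point[0] - scroll_offset[0],
            page_point[1] - scroll_offset[1])
```

An element at page position (450, 900) is outside a 600-pixel-tall viewport, so it cannot be clicked directly. After scrolling down 600 pixels, the same element sits at viewport position (450, 300).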
## How It Works
When the `browser_action` tool is invoked, it follows this process:
1. **Action Validation and Browser Management**:
   - Validates the required parameters for the requested action
   - For `launch`: Initializes a browser session (either a local Puppeteer instance or remote Chrome)
   - For interaction actions: Uses the existing browser session
   - For `close`: Terminates or disconnects from the browser, depending on the mode
2. **Page Interaction and Stability**:
   - Ensures pages are fully loaded using DOM stability detection via the `waitTillHTMLStable` algorithm
   - Executes requested actions (navigation, clicking, typing, scrolling) with proper timing
   - Monitors network activity after clicks and waits for navigation when necessary
3. **Visual Feedback**:
   - Captures optimized screenshots using WebP format (with PNG fallback)
   - Records browser console logs for debugging purposes
   - Tracks mouse position and maintains paginated history of actions
4. **Session Management**:
   - Maintains browser state across multiple actions
   - Handles errors and automatically cleans up resources
   - Enforces proper workflow sequence (launch → interactions → close)
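
The DOM stability detection mentioned in step 2 can be approximated by polling the size of the page's HTML until it stops changing. The following is a minimal sketch of that general idea, not the actual `waitTillHTMLStable` code: the function name, the default timings, and the `get_html_size` callback (a stand-in for reading the page content from the browser) are all assumptions.

```python
import time
from typing import Callable, Optional


def wait_until_stable(get_html_size: Callable[[], int],
                      timeout_s: float = 5.0,
                      interval_s: float = 1.0,
                      stable_checks_needed: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the page's HTML size until it stops changing.

    Returns True once the size has been identical for `stable_checks_needed`
    consecutive comparisons, or False if the time budget runs out first.
    """
    max_checks = int(timeout_s / interval_s)
    last_size: Optional[int] = None
    stable_count = 0
    for _ in range(max_checks):
        size = get_html_size()
        if last_size is not None and size == last_size:
            stable_count += 1
        else:
            stable_count = 0  # content changed, start counting again
        if stable_count >= stable_checks_needed:
            return True
        last_size = size
        sleep(interval_s)
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. In a real browser integration, `get_html_size` would return the length of the current page's HTML.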
## Workflow Sequence
Browser interactions must follow this specific sequence:
1. **Session Initialization**: All browser workflows must start with a `launch` action
2. **Interaction Phase**: Multiple `click`, `type`, and scroll actions can be performed
3. **Session Termination**: All browser workflows must end with a `close` action
4. **Tool Switching**: After closing the browser, other tools can be used
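
The sequence above behaves like a small state machine. The sketch below models it in Python for illustration only: the class name `BrowserWorkflow` is hypothetical, and allowing a second `launch` while a session is already open is an assumption, not documented behavior.

```python
class BrowserWorkflow:
    """Tracks whether a browser session is open and rejects out-of-order actions."""

    INTERACTIONS = {"click", "type", "scroll_down", "scroll_up"}

    def __init__(self) -> None:
        self.session_open = False

    def check(self, action: str) -> None:
        """Raise ValueError if `action` is not allowed in the current state."""
        if action == "launch":
            # Assumption: launching again while open simply starts over.
            self.session_open = True
        elif action in self.INTERACTIONS or action == "close":
            if not self.session_open:
                raise ValueError(f"'{action}' requires a prior 'launch'")
            if action == "close":
                self.session_open = False
        else:
            raise ValueError(f"unknown action: {action}")

    def can_use_other_tools(self) -> bool:
        """Other tools are available only when no browser session is open."""
        return not self.session_open
```

Calling `check` with `launch`, then `click`, then `close` succeeds, and `can_use_other_tools()` returns `True` again afterwards. Calling `check("click")` before any `launch` raises `ValueError`.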
## Examples When Used
- When creating a web form submission process, Roo launches a browser, navigates to the form, fills out fields with the `type` action, and clicks submit.
- When testing a responsive website, Roo navigates to the site and uses scroll actions to examine different sections.
- When capturing screenshots of a web application, Roo navigates through different pages and takes screenshots at each step.
- When demonstrating an e-commerce checkout flow, Roo simulates the entire process from product selection to payment confirmation.
## Usage Examples
Each example shows the `action` value first, followed by its parameter value where one applies. Launching a browser and navigating to a website:
```
launch
https://example.com
```
Clicking at specific coordinates (e.g., a button):
```
click
450,300
```
Typing text into a focused input field:
```
type
Hello, World!
```
Scrolling down to see more content:
```
scroll_down
```
Closing the browser session:
```
close
```