Desktop Automation (Windows)

Windows Screenshot

Capture the screen, a monitor, or a named window to a PNG.

windows_screenshot

Overview

The visual counterpart to windows_filesystem's read operations: it lets an agent capture the user's whole screen, a particular monitor, or a specific window, so that a downstream step (vision LLM, bug-report generator, visual diff) has an image to work with. The agent never chooses where the file lands; the output directory is fixed by configuration.

How it works

The tool implements three modes: screen (captures the primary monitor via java.awt.Robot), monitor (captures any monitor by 0-based index, for multi-display setups), and window (finds the bounds of a window whose title contains a given substring via PowerShell and User32, then captures that rectangle). Output is written to a single configured directory under a timestamped filename. Each capture is a MutationPlan routed through the approval gate, because a screenshot is both a write to disk and a privacy event, not a free read. Pass apply=true to execute; otherwise the tool returns a dry-run plan.
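The capture path described above can be sketched with the standard AWT and ImageIO APIs. This is a minimal illustration, not the WindowsScreenshotTool source: the class name ScreenCaptureSketch, its method names, and the filename pattern are all assumptions, and the PowerShell lookup for window bounds is left out (only the conversion from a User32-style RECT is shown).

```java
import java.awt.AWTException;
import java.awt.GraphicsDevice;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import javax.imageio.ImageIO;

/** Hypothetical sketch; every name here is illustrative, not the tool's API. */
final class ScreenCaptureSketch {

    // Assumed filename pattern; the real tool's timestamp format is not documented here.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    /** Builds a name such as "window-20240102-030405.png". */
    static String timestampedName(String mode, LocalDateTime when) {
        return mode + "-" + STAMP.format(when) + ".png";
    }

    /** Bounds of the monitor at a 0-based index, in virtual-desktop coordinates. */
    static Rectangle monitorBounds(int index) {
        GraphicsDevice[] devices =
                GraphicsEnvironment.getLocalGraphicsEnvironment().getScreenDevices();
        if (index < 0 || index >= devices.length) {
            throw new IllegalArgumentException("No monitor at index " + index);
        }
        return devices[index].getDefaultConfiguration().getBounds();
    }

    /** Converts a User32-style RECT (left, top, right, bottom) to an AWT rectangle. */
    static Rectangle fromWindowRect(int left, int top, int right, int bottom) {
        return new Rectangle(left, top, right - left, bottom - top);
    }

    /** Captures the given area and writes it as a PNG inside outputDir. */
    static Path capture(Rectangle area, String mode, Path outputDir)
            throws AWTException, IOException {
        BufferedImage image = new Robot().createScreenCapture(area);
        Files.createDirectories(outputDir);
        Path target = outputDir.resolve(timestampedName(mode, LocalDateTime.now()));
        ImageIO.write(image, "png", target.toFile());
        return target.toAbsolutePath();
    }
}
```

All three modes reduce to picking a Rectangle and handing it to the same capture step, which is why the sketch keeps bounds lookup separate from the write to disk. Note that capture and monitorBounds need a real display and will fail in a headless JVM.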

Example

When a user asks:

Snapshot the Excel budget window so the next agent can OCR it.

the agent calls the tool:

windows_screenshot(operation="window", title="Budget", apply=true)

and gets back an approval prompt, followed by a PNG in the configured output directory and its absolute path, so a vision agent can pick it up.
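The difference between a call with apply=true and a dry run can be pictured as a small plan-then-apply gate. The sketch below is an assumption about the general pattern, not SwarmAI's MutationPlan API: ApprovalGateSketch, Plan, Outcome, and the message strings are hypothetical, and the approved flag stands in for either the y/N prompt or the auto-approve setting.

```java
import java.util.function.Supplier;

/** Hypothetical plan-then-apply gate; names and messages are illustrative only. */
final class ApprovalGateSketch {

    /** A described side effect plus the action that performs it. */
    record Plan(String description, Supplier<String> action) {}

    /** What the caller gets back: whether anything ran, and a detail string. */
    record Outcome(boolean executed, String detail) {}

    static Outcome run(Plan plan, boolean apply, boolean approved) {
        if (!apply) {
            // Without apply=true nothing is executed; the caller only sees the plan.
            return new Outcome(false, "DRY RUN: " + plan.description());
        }
        if (!approved) {
            // apply=true still needs approval before the side effect happens.
            return new Outcome(false, "DENIED: " + plan.description());
        }
        // Approved: perform the action and report its result (here, a file path).
        return new Outcome(true, plan.action().get());
    }
}
```

Keeping the side effect behind a Supplier means the plan can be described, logged, and rejected without the capture ever being taken.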

Configuration

Set these before calling the tool. Values marked required must be present or the tool call will fail.

swarmai.tools.windows.enabled required

Master switch for the Windows tool category.

swarmai.tools.windows.screenshot.output-dir optional

Allowlisted directory where captured PNGs are written.

swarmai.tools.windows.auto-approve optional

Skip the y/N prompt for capture writes. Intended for non-interactive runs only.
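Because output-dir acts as an allowlist, a defensive implementation would typically confirm that every resolved target stays inside that directory before writing. The following is a sketch of that check under stated assumptions: OutputDirSketch and resolveInside are hypothetical names, and the real tool may enforce the restriction differently.

```java
import java.nio.file.Path;

/** Hypothetical confinement check for a fixed output directory. */
final class OutputDirSketch {

    /** Resolves fileName inside outputDir and rejects any path that escapes it. */
    static Path resolveInside(Path outputDir, String fileName) {
        Path root = outputDir.toAbsolutePath().normalize();
        // normalize() collapses ".." segments so the prefix check below is meaningful.
        Path target = root.resolve(fileName).normalize();
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException(
                    "Path escapes output directory: " + fileName);
        }
        return target;
    }
}
```

With this check, a name such as "../evil.png" is rejected, while "a.png" resolves to a file directly under the configured directory.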

Use it in a workflow

Wire this tool into a SwarmAI crew. Use the YAML DSL for declarative workflows, or the Java builder API when you want full programmatic control.

YAML DSL

# bug-report.yaml
name: bug-report-crew
process: SEQUENTIAL

agents:
  - id: capturer
    role: Bug Capturer
    goal: Snapshot the failing app window and attach it to a bug report
    tools:
      - windows_screenshot

tasks:
  - id: capture-task
    agent: capturer
    description: Find the window with 'Error' in the title and capture it to a PNG.

Java

import ai.intelliswarm.swarmai.agent.Agent;
import ai.intelliswarm.swarmai.task.Task;
import ai.intelliswarm.swarmai.swarm.Swarm;
import ai.intelliswarm.swarmai.swarm.SwarmOutput;
import ai.intelliswarm.swarmai.process.ProcessType;
import ai.intelliswarm.swarmai.tool.windows.WindowsScreenshotTool;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;

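// Hypothetical wrapper class: register it as a Spring bean so the fields below are injected.
public class BugReportCrew {

    @Autowired ChatClient chatClient;
    @Autowired WindowsScreenshotTool windowsScreenshotTool;

    public SwarmOutput run() {
        // The agent that owns the screenshot tool.
        Agent capturer = Agent.builder()
            .role("Bug Capturer")
            .goal("Snapshot the failing app window and attach it to a bug report")
            .chatClient(chatClient)
            .tool(windowsScreenshotTool)
            .build();

        // The single task assigned to that agent.
        Task capturerTask = Task.builder()
            .description("Find the first window matching 'Error' and capture its bounds.")
            .agent(capturer)
            .build();

        // Run the crew sequentially and return its output.
        return Swarm.builder()
            .agent(capturer)
            .task(capturerTask)
            .process(ProcessType.SEQUENTIAL)
            .build()
            .kickoff();
    }
}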

What it's good for

Real scenarios where agents put this tool to work.

Attach 'before/after' screenshots to bug-report-generating crews
Capture a specific app window mid-workflow for visual diffing
Document a desktop-tidy run by snapshotting the result
Pair with windows_window to focus a target before capturing it

Source

Implementation lives at swarmai-tools/src/main/java/ai/intelliswarm/swarmai/tool/windows/WindowsScreenshotTool.java in the swarm-ai repository.