Web & HTTP
Fetch a web page and extract clean structured content.
web_scrape
Visits a web page and pulls out just the readable stuff (the headline, the article, and any tables) while ignoring the menus, ads, and footers. Think of it as handing the agent the clean version of the page.
Does a plain HTTP fetch on the URL, parses the HTML, and extracts the title, body text, headings, tables, and optionally the link list. Supports a CSS selector so you can target a specific region of the page. It does not run JavaScript — for pages that need it, use the 'browse' tool instead.
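The extraction step described above can be illustrated with a small, dependency-free sketch. This is not the WebScrapeTool source: the class name ScrapeSketch, its method names, and the choice of which elements count as page chrome are assumptions made for illustration. It works on an HTML string that has already been fetched, and it uses regular expressions only to stay self-contained; a real extractor would normally use a proper HTML parser, which is also what CSS selector support requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the actual WebScrapeTool implementation. */
public class ScrapeSketch {

    // Assumption: these elements are treated as non-content and dropped whole.
    private static final Pattern NOISE = Pattern.compile(
            "(?is)<(script|style|head|nav|header|footer|aside)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern TITLE =
            Pattern.compile("(?is)<title\\b[^>]*>(.*?)</title\\s*>");
    private static final Pattern HEADING =
            Pattern.compile("(?is)<h[1-6]\\b[^>]*>(.*?)</h[1-6]\\s*>");
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns the text of the first title element, or an empty string. */
    public static String title(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? clean(m.group(1)) : "";
    }

    /** Returns the text of every h1..h6 element outside the stripped regions. */
    public static List<String> headings(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = HEADING.matcher(stripNoise(html));
        while (m.find()) {
            out.add(clean(m.group(1)));
        }
        return out;
    }

    /** Returns the visible text left once the noise elements and all tags are removed. */
    public static String bodyText(String html) {
        return clean(stripNoise(html));
    }

    private static String stripNoise(String html) {
        return NOISE.matcher(html).replaceAll(" ");
    }

    // Replaces tags with spaces and collapses whitespace; HTML entities are not decoded.
    private static String clean(String fragment) {
        return TAG.matcher(fragment).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Hello  World</title><style>p{color:red}</style></head>"
                + "<body><nav><a href=\"/\">Home</a></nav>"
                + "<h1>Post</h1><p>First paragraph.</p>"
                + "<footer>Copyright</footer></body></html>";
        System.out.println(title(html));     // Hello World
        System.out.println(headings(html));  // [Post]
        System.out.println(bodyText(html));  // Post First paragraph.
    }
}
```

Because the sketch only ever sees the HTML returned by the server, anything a page renders with JavaScript is absent from its output, which is the limitation noted above.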
When a user asks:
Pull the article text from this blog post.
the agent calls the tool:
web_scrape(url="https://example.com/blog/post")
and gets back: the article's title, body text, and any tables, with menus, ads, and footer stripped out.
Wire this tool into a SwarmAI crew. Use the YAML DSL for declarative workflows, or the Java builder API when you want full programmatic control.
YAML DSL
# content-harvest.yaml
name: content-harvest-crew
process: SEQUENTIAL
agents:
  - id: extractor
    role: Content Extractor
    goal: Pull clean article text from public web pages
    tools:
      - web_scrape
tasks:
  - id: content-harvest-task
    agent: extractor
    description: Visit the supplied URL and return just the body text, title, and any tables.
Java
import ai.intelliswarm.swarmai.agent.Agent;
import ai.intelliswarm.swarmai.task.Task;
import ai.intelliswarm.swarmai.swarm.Swarm;
import ai.intelliswarm.swarmai.swarm.SwarmOutput;
import ai.intelliswarm.swarmai.process.ProcessType;
import ai.intelliswarm.swarmai.tool.common.WebScrapeTool;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;

// Injected by Spring; declare these as fields of a Spring-managed bean.
@Autowired ChatClient chatClient;
@Autowired WebScrapeTool webScrapeTool;

// Agent that is allowed to call web_scrape.
Agent extractor = Agent.builder()
    .role("Content Extractor")
    .goal("Pull clean article text from public web pages")
    .chatClient(chatClient)
    .tool(webScrapeTool)
    .build();

// Task assigned to that agent.
Task extractorTask = Task.builder()
    .description("Visit the supplied URL and return just the body text, title, and any tables.")
    .agent(extractor)
    .build();

// Assemble the crew and run the task sequentially.
SwarmOutput result = Swarm.builder()
    .agent(extractor)
    .task(extractorTask)
    .process(ProcessType.SEQUENTIAL)
    .build()
    .kickoff();
Real scenarios where agents put this tool to work.
Implementation lives at swarmai-tools/src/main/java/ai/intelliswarm/swarmai/tool/common/WebScrapeTool.java in the swarm-ai repository.