Browserless has added four new endpoints to make scraping workflows easier to run end-to-end: smart-scrape, search, map, and crawl. Together, they cover the full path from discovery to extraction with less orchestration code.
Here's exactly what each of the four endpoints can do:
- Smart Scrape API -- Scrape a single URL and let Browserless escalate automatically from fast HTTP fetching to proxies, a browser, and CAPTCHA solving when the page needs it.
- Search API -- Search across web, news, and images, then optionally scrape each result into markdown, HTML, links, or screenshots.
- Map API -- Discover and deduplicate URLs on a site, with sitemap controls, relevance ordering, geo-targeting, and filtering options.
- Crawl API -- Start an async crawl job, poll for status and results, filter paths, configure page-level scraping, and receive webhook events as pages complete.
When to use each API
Use smart-scrape as your default scraping endpoint when you have a URL and want its content without configuring proxies, rendering, or captcha handling yourself. A single call can return HTML, markdown, screenshots, PDFs, and links, making it ideal for AI agents and MCP workflows that need reliable extraction without managing complexity.
Use search when your starting point is a query, not a URL. It's the endpoint for finding pages across the public web, news, or images, with the option to immediately turn the top results into LLM-ready content in the same request.
Use map when your target is one site and you want a clean inventory of URLs before you scrape anything. It's a good first pass for docs sites, ecommerce catalogs, help centers, and blog archives where you want relevance sorting and sitemap-aware discovery without starting a full crawl yet.
Use crawl when you want to scrape an entire site asynchronously. It discovers URLs and extracts their content in one workflow, with operational controls like depth limits, path filters, retries, and webhooks for downstream systems.
Smart Scrape API
What Smart Scrape API does
/smart-scrape, which is available on all plans, is the endpoint for turning one URL into usable output without pre-classifying the site first.
Browserless starts with a fast HTTP fetch, retries through a residential proxy if the target blocks datacenter traffic, escalates to a full stealth browser for JavaScript-rendered pages, and adds CAPTCHA solving when a challenge is detected.
The response reports which approach actually succeeded in its strategy field, and lists the full escalation path it tried in attempted.
It's useful for more than just plain HTML retrieval. You can request HTML, markdown, screenshots, PDF, and links in one call. Failures still come back as HTTP 200 with ok: false plus an error message you can handle in your application logic.
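Because failures arrive as HTTP 200 with ok: false, your client should branch on the body rather than the status code. A minimal sketch of that handling in Python, assuming a response shape with the ok, strategy, attempted, and error fields described above:

```python
def handle_smart_scrape(body: dict) -> dict:
    """Route a /smart-scrape JSON response based on its ok flag.

    Assumes the response carries `ok`, `strategy`, `attempted`,
    and (on failure) an `error` field, per the description above.
    """
    if body.get("ok"):
        # `strategy` names the approach that finally worked.
        return {"status": "success", "strategy": body.get("strategy")}
    # Failures still arrive as HTTP 200, so branch on `ok`, not status code.
    return {
        "status": "failed",
        "error": body.get("error"),
        "attempted": body.get("attempted", []),
    }
```

The point of the sketch is the branch: a non-2xx check alone would silently treat a blocked page as a success.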
Quickstart code
The minimal shape is a POST with a url and a formats array. There's also a timeout query parameter if you need to cap how long each strategy attempt can run.
curl --request POST \
--url 'https://production-sfo.browserless.io/smart-scrape?token=YOUR_API_TOKEN_HERE' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://news.ycombinator.com/",
"formats": ["html", "markdown", "links", "screenshot", "pdf"]
}'
A real-world example of how you can use Smart Scrape API
You're building a knowledge base for an AI assistant from external docs, support articles, and product pages. Some sources are static, some are client-rendered, and a few throw challenges at you. smart-scrape lets you keep that as one integration:
- Request markdown for chunking and embedding.
- Keep HTML for structure-aware parsing.
- Add screenshot for vision model context.
- Include PDF for archival and human reference.
- Pull links to discover related pages.
Let Browserless choose the cheapest successful strategy for each page.
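One way to wire that up client-side is a small dispatcher that routes each format in the response to its own pipeline stage. This is an illustrative sketch, not part of the API; the handler names are placeholders for whatever your pipeline does with each format:

```python
def route_formats(result: dict, handlers: dict) -> list:
    """Dispatch each format present in a smart-scrape result.

    `handlers` maps a format name ("markdown", "html", "links", ...)
    to a callable; formats absent from the response are skipped, so
    one integration handles pages that only yielded some formats.
    """
    processed = []
    for fmt, handler in handlers.items():
        if fmt in result:
            handler(result[fmt])
            processed.append(fmt)
    return processed
```

A single function like this keeps the "one integration, many sources" promise: pages that only returned markdown and pages that returned everything flow through the same code path.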
Search API
What Search API does
/search, which is currently in beta and available on Browserless's Cloud plans, is the discovery endpoint. It can search across web, news, and images in one request, and it can optionally scrape each result URL so the response contains both the result metadata and the extracted content you actually want to work with.
Browserless also supports language targeting, country and location controls, time filters, and content categories such as github, research, and pdf.
The endpoint is especially useful for research, monitoring, and RAG ingestion. Instead of building one job to search and another to scrape the top hits, you can search, filter, and return markdown, HTML, links, or screenshots from the selected results in the same call.
Quickstart code
The simplest request is just a query. When you add scrapeOptions, Browserless fetches each result and returns your requested output format.
curl --request POST \
--url 'https://production-sfo.browserless.io/search?token=YOUR_API_TOKEN_HERE' \
--header 'Content-Type: application/json' \
--data '{
"query": "browser automation best practices",
"sources": ["web", "news"],
"tbs": "month",
"scrapeOptions": {
"formats": ["markdown"],
"onlyMainContent": true
}
}'
A real-world example of how you can use Search API
Use Search API to build a research feed for your team:
- Query a topic.
- Pull from both web and news.
- Narrow it to the last month with tbs.
- Return the top results as cleaned markdown for storage, embedding, or review.
If you need a narrower source profile, category filters let you bias toward GitHub repos, academic sources, or PDFs.
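The consumption side of that feed can stay simple. A sketch, assuming each scraped result carries url, title, and a markdown field when scrapeOptions requested it (field names follow the examples above and may differ slightly in practice):

```python
def build_feed(results: list) -> list:
    """Turn scraped search results into feed entries.

    Keeps only results the scraper actually extracted markdown for,
    so downstream embedding or review never sees empty documents.
    """
    feed = []
    for r in results:
        md = r.get("markdown")
        if not md:
            continue  # result had no scraped content; skip it
        feed.append({"url": r["url"], "title": r.get("title", ""), "content": md})
    return feed
```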
Map API
What Map API does
/map, which is available on Browserless's Cloud plans, discovers all URLs inside a site. You send a base URL and get back a deduplicated links array with optional titles and descriptions -- a clean way to understand a site's shape before you decide what to scrape next.
The endpoint also gives you more control than a basic sitemap pull. You can rank discovered URLs against a search query, choose whether discovery uses the sitemap, on-page links, or both, include or exclude subdomains, ignore query parameters to cut down duplicate variants, and apply geo-targeting through location.country and location.languages.
Quickstart code
The minimal request is just the base URL, but map is even more useful when you add relevance sorting and filtering controls.
curl --request POST \
--url 'https://production-sfo.browserless.io/map?token=YOUR_API_TOKEN_HERE' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://www.browserless.io",
"search": "pricing",
"sitemap": "include",
"includeSubdomains": false,
"ignoreQueryParameters": true
}'
A real-world example of how you can use Map API
Imagine you need to index only the useful parts of a large docs site. You can:
- Map the site once.
- Rank results against a query such as authentication or pricing.
- Drop duplicate URL variants by ignoring query parameters.
- Hand a much cleaner queue to the next stage of your pipeline.
- Hand a much cleaner queue to the next stage of your pipeline.
It's usually a better first move than crawling the whole domain blind.
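If you want an extra keyword pass over the links array before handing it to the next stage, a client-side filter like this works. It is a sketch: it mirrors ignoreQueryParameters by stripping query strings locally and preserves whatever ordering the API returned:

```python
from urllib.parse import urlsplit

def build_scrape_queue(links: list, keyword: str) -> list:
    """Turn a /map links array into a deduplicated scrape queue.

    Keeps only URLs whose path mentions `keyword`, drops query
    strings, and preserves the order the API returned.
    """
    seen, queue = set(), []
    for link in links:
        parts = urlsplit(link)
        clean = f"{parts.scheme}://{parts.netloc}{parts.path}"
        if keyword in parts.path and clean not in seen:
            seen.add(clean)
            queue.append(clean)
    return queue
```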
Crawl API
What Crawl API does
/crawl, which is currently in beta and available on Browserless's Cloud plans, is the async endpoint for multi-page work. You start a job with POST /crawl, get back a crawl ID and a status URL, then poll GET /crawl/{id} to track progress and retrieve results. Browserless also exposes GET /crawl to list jobs and DELETE /crawl/{id} to cancel a running crawl.
The endpoint is built for operational control, not just bulk collection. You can set depth and retry limits, control whether the crawl follows subdomains or external links, choose sitemap behavior, filter paths with regex patterns, configure per-page scrape output, and subscribe to page, completed, and failed webhook events.
Results are paginated. Keep two expiry windows in mind: crawl results are available for 24 hours after a job completes, and each page's contentUrl expires after 1 hour, so process or store content promptly.
Quickstart code
The basic flow is two-step:
- Start the crawl.
- Poll the crawl ID for progress and results.
# 1. Start the crawl
curl --request POST \
--url 'https://production-sfo.browserless.io/crawl?token=YOUR_API_TOKEN_HERE' \
--header 'Content-Type: application/json' \
--data '{
"url": "https://docs.browserless.io",
"maxDepth": 2,
"includePaths": ["^/rest-apis"],
"scrapeOptions": {
"formats": ["markdown", "html"],
"onlyMainContent": true
}
}'
# 2. Poll for results
curl --request GET \
--url 'https://production-sfo.browserless.io/crawl/CRAWL_ID_HERE?token=YOUR_API_TOKEN_HERE'
A real-world example of how you can use Crawl API
If you're indexing a docs site, crawl gives you the right primitives out of the box. You can:
- Constrain the job to certain sections with includePaths.
- Avoid low-value branches with excludePaths.
- Keep the extracted page body cleaner with onlyMainContent.
- Process pages incrementally as webhook events arrive instead of waiting for one giant synchronous response.
Using these endpoints for AI workflows
All four endpoints produce output that slots directly into AI workflows, especially /search, /smart-scrape, and /crawl. These three support markdown output for LLM consumption.
- /smart-scrape offers the widest format set: markdown, HTML, links, screenshots, PDFs, and auto-parsed JSON for API-style targets.
- /search adds markdown, HTML, links, and screenshots via scrapeOptions.
- /crawl returns markdown, HTML, and raw text for each page.
For example, when scrapeOptions is provided, /search returns the content from each result URL in your requested format -- ready for embedding, summarization, or storage. Likewise, each page /crawl scrapes comes back as structured data ready for LLM consumption.
Learn more about how these new API endpoints work with AI tooling in our MCP announcement blog.
API endpoint FAQs
How do timeouts work across these endpoints?
Timeout handling is not identical everywhere.
- /smart-scrape documents a timeout query parameter.
- /search and /map accept a timeout field in the JSON request body.
- /crawl lets you set a per-page navigation timeout through scrapeOptions.timeout, with a documented default of 30000 ms for cloud, plus an optional waitFor delay after page load.
When should you use async instead of a single scrape?
Use /smart-scrape when you care about one known URL and you want the system to figure out the lightest successful extraction strategy.
Use /crawl when you need multi-page collection, status polling, path filters, retries, pagination, or webhook-driven processing. That's the line between a fetch job and a crawl system.
How do you filter what gets discovered or scraped?
- /search gives you filters for source type, language, country, location, recency, and categories.
- /map gives you relevance sorting, sitemap behavior, subdomain control, and query-parameter deduplication.
- /crawl adds regex-based includePaths and excludePaths, plus controls for crawl depth and whether to follow subdomains or external links.
What's the best way to get cleaner page bodies?
For /search, use scrapeOptions.onlyMainContent and, when needed, stripNonContentTags to remove <script> and <style> elements, or includeTags and excludeTags to target specific CSS selectors.
For /crawl, the same idea applies through scrapeOptions, and onlyMainContent defaults to true there.
For /smart-scrape, request markdown when you want a cleaner content representation without building your own conversion step.
How do you choose between map and crawl on the same site?
Start with /map when you want to understand the site and decide what matters. Move to /crawl when you already know the sections you want and need structured, async extraction at scale. In practice, /map is often the planning step and /crawl is the execution step.
Note that the sitemap parameter uses different values for each endpoint. Map accepts "include", "skip", or "only", while crawl accepts "auto", "force", or "skip".
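Because the two endpoints accept different sitemap vocabularies, a small client-side guard can catch the mix-up before the request goes out. An illustrative check based on the values listed above:

```python
def sitemap_value(endpoint: str, mode: str) -> str:
    """Validate a sitemap setting against the values each endpoint accepts.

    Per the note above: /map takes include/skip/only, while /crawl
    takes auto/force/skip. Catching the mismatch locally beats
    debugging a rejected request.
    """
    allowed = {
        "map": {"include", "skip", "only"},
        "crawl": {"auto", "force", "skip"},
    }
    if mode not in allowed[endpoint]:
        raise ValueError(f"{endpoint} does not accept sitemap={mode!r}")
    return mode
```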