Introducing 4 New Browserless Endpoints for Faster Automation

Browserless has added four new endpoints to make scraping workflows easier to run end-to-end: smart-scrape, search, map, and crawl. Together, they cover the full path from discovery to extraction with less orchestration code.

Here's exactly what each of the four endpoints can do:

  • Smart Scrape API -- Scrape a single URL and let Browserless escalate automatically from fast HTTP fetching to proxies, a browser, and CAPTCHA solving when the page needs it.
  • Search API -- Search across web, news, and images, then optionally scrape each result into markdown, HTML, links, or screenshots.
  • Map API -- Discover and deduplicate URLs on a site, with sitemap controls, relevance ordering, geo-targeting, and filtering options.
  • Crawl API -- Start an async crawl job, poll for status and results, filter paths, configure page-level scraping, and receive webhook events as pages complete.

When to use each API

Use smart-scrape as your default scraping endpoint when you have a URL and want its content without configuring proxies, rendering, or captcha handling yourself. A single call can return HTML, markdown, screenshots, PDFs, and links, making it ideal for AI agents and MCP workflows that need reliable extraction without managing complexity.

Use search when your starting point is a query, not a URL. It's the endpoint for finding pages across the public web, news, or images, with the option to immediately turn the top results into LLM-ready content in the same request.

Use map when your target is one site and you want a clean inventory of URLs before you scrape anything. It's a good first pass for docs sites, ecommerce catalogs, help centers, and blog archives where you want relevance sorting and sitemap-aware discovery without starting a full crawl yet.

Use crawl when you want to scrape an entire site asynchronously. It discovers URLs and extracts their content in one workflow, with operational controls like depth limits, path filters, retries, and webhooks for downstream systems.

Smart Scrape API

What Smart Scrape API does

/smart-scrape, which is available on all plans, is the endpoint for turning one URL into usable output without pre-classifying the site first.

Browserless starts with a fast HTTP fetch, retries through a residential proxy if the target blocks datacenter traffic, escalates to a full stealth browser for JavaScript-rendered pages, and adds CAPTCHA solving when a challenge is detected.

The response reports which approach actually worked in its strategy field and lists the full escalation path it tried in attempted.

It's useful for more than just plain HTML retrieval. You can request HTML, markdown, screenshots, PDF, and links in one call. Failures still come back as HTTP 200 with ok: false plus an error message you can handle in your application logic.

Quickstart code

The minimal shape is a POST with a url and a formats array. There's also a timeout query parameter if you need to cap how long each strategy attempt can run.

curl --request POST \
  --url 'https://production-sfo.browserless.io/smart-scrape?token=YOUR_API_TOKEN_HERE' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://news.ycombinator.com/",
    "formats": ["html", "markdown", "links", "screenshot", "pdf"]
  }'
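Because failures also arrive as HTTP 200, your application should branch on the response body rather than the status code. Here's a minimal Python sketch of that handling, assuming the fields described above (ok, strategy, attempted, error); confirm the exact response schema against the Smart Scrape docs:

```python
def summarize_smart_scrape(resp: dict) -> str:
    """Return a one-line status for a /smart-scrape JSON response.

    Assumes the fields described in this post: `ok`, `strategy` (what worked),
    `attempted` (the strategies tried), and `error` on failure.
    """
    if resp.get("ok"):
        return f"succeeded via {resp['strategy']} after trying {resp.get('attempted', [])}"
    return f"failed: {resp.get('error', 'unknown error')} (tried {resp.get('attempted', [])})"

# Failures still come back as HTTP 200, so branch on the body, not the status.
success = {"ok": True, "strategy": "browser", "attempted": ["http", "proxy", "browser"]}
failure = {"ok": False, "error": "challenge not solved", "attempted": ["http", "proxy", "browser"]}

print(summarize_smart_scrape(success))
print(summarize_smart_scrape(failure))
```

Logging the attempted path is handy for spotting targets that consistently force the expensive browser strategy.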

A real-world example of how you can use Smart Scrape API

You're building a knowledge base for an AI assistant from external docs, support articles, and product pages. Some sources are static, some are client-rendered, and a few throw challenges at you. smart-scrape lets you keep that as one integration:

  • Request markdown for chunking and embedding.
  • Keep HTML for structure-aware parsing.
  • Add screenshot for vision model context.
  • Include PDF for archival and human reference.
  • Pull links to discover related pages.

Let Browserless choose the cheapest successful strategy for each page.

📖 Smart Scrape Docs →

Search API

What Search API does

/search, which is currently in beta and available on Browserless's Cloud plans, is the discovery endpoint. It can search across web, news, and images in one request, and it can optionally scrape each result URL so the response contains both the result metadata and the extracted content you actually want to work with.

Browserless also supports language targeting, country and location controls, time filters, and content categories such as github, research, and pdf.

The endpoint is especially useful for research, monitoring, and RAG ingestion. Instead of building one job to search and another to scrape the top hits, you can search, filter, and return markdown, HTML, links, or screenshots from the selected results in the same call.

Quickstart code

The simplest request is just a query. When you add scrapeOptions, Browserless fetches each result and returns your requested output format.

curl --request POST \
  --url 'https://production-sfo.browserless.io/search?token=YOUR_API_TOKEN_HERE' \
  --header 'Content-Type: application/json' \
  --data '{
    "query": "browser automation best practices",
    "sources": ["web", "news"],
    "tbs": "month",
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }'
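Once the response is back, you'll typically want to pull the scraped markdown out of each result. A short Python sketch of that step, where the field names (results, url, markdown) are illustrative assumptions; check the Search API docs for the exact response schema:

```python
def collect_markdown(search_resp: dict) -> dict:
    """Map result URL -> scraped markdown from a /search response.

    The `results`/`url`/`markdown` field names here are illustrative; verify
    them against the Search API response schema.
    """
    docs = {}
    for result in search_resp.get("results", []):
        md = result.get("markdown")
        if md:  # keep only results that were actually scraped
            docs[result["url"]] = md
    return docs

resp = {
    "results": [
        {"url": "https://example.com/a", "markdown": "# Best practices\n..."},
        {"url": "https://example.com/b"},  # metadata only, not scraped
    ]
}
print(collect_markdown(resp))
```

The resulting URL-to-markdown mapping drops straight into an embedding or summarization pipeline.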

A real-world example of how you can use Search API

Use Search API to build a research feed for your team:

  • Query a topic.
  • Pull from both web and news.
  • Narrow it to the last month with tbs.
  • Return the top results as cleaned markdown for storage, embedding, or review.

If you need a narrower source profile, category filters let you bias toward GitHub repos, academic sources, or PDFs.

📖 Search API Docs →

Map API

What Map API does

/map, which is available on Browserless's Cloud plans, discovers all URLs inside a site. You send a base URL and get back a deduplicated links array with optional titles and descriptions -- a clean way to understand a site's shape before you decide what to scrape next.

The endpoint also gives you more control than a basic sitemap pull. You can rank discovered URLs against a search query, choose whether discovery uses the sitemap, on-page links, or both, include or exclude subdomains, ignore query parameters to cut down duplicate variants, and apply geo-targeting through location.country and location.languages.

Quickstart code

The minimal request is just the base URL, but map is even more useful when you add relevance sorting and filtering controls.

curl --request POST \
  --url 'https://production-sfo.browserless.io/map?token=YOUR_API_TOKEN_HERE' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://www.browserless.io",
    "search": "pricing",
    "sitemap": "include",
    "includeSubdomains": false,
    "ignoreQueryParameters": true
  }'
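The ignoreQueryParameters option does this deduplication server-side; for illustration, here's the same idea as a client-side Python pass over a links array, using only the standard library:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_links(links: list) -> list:
    """Strip query strings and fragments from a /map `links` array, then
    deduplicate while preserving order. Mirrors what `ignoreQueryParameters`
    does on the server; shown here purely for illustration."""
    seen, out = set(), []
    for link in links:
        parts = urlsplit(link)
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        if clean not in seen:
            seen.add(clean)
            out.append(clean)
    return out

links = [
    "https://www.browserless.io/pricing?utm_source=x",
    "https://www.browserless.io/pricing",
    "https://www.browserless.io/blog#top",
]
print(normalize_links(links))
```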

A real-world example of how you can use Map API

Imagine you need to index only the useful parts of a large docs site. You can:

  • Map the site once.
  • Rank results against a query such as authentication or pricing.
  • Drop duplicate URL variants by ignoring query parameters.
  • Hand a much cleaner queue to the next stage of your pipeline.

It's usually a better first move than crawling the whole domain blind.

📖 Map API Docs →

Crawl API

What Crawl API does

/crawl, which is currently in beta and available on Browserless's Cloud plans, is the async endpoint for multi-page work. You start a job with POST /crawl, get back a crawl ID and a status URL, then poll GET /crawl/{id} to track progress and retrieve results. Browserless also exposes GET /crawl to list jobs and DELETE /crawl/{id} to cancel a running crawl.

The endpoint is built for operational control, not just bulk collection. You can set depth and retry limits, control whether the crawl follows subdomains or external links, choose sitemap behavior, filter paths with regex patterns, configure per-page scrape output, and subscribe to page, completed, and failed webhook events.

Results are paginated. Keep two expiry windows in mind: crawl results are available for 24 hours after a job completes, and each page's contentUrl expires after 1 hour, so process or store content promptly.

Quickstart code

The basic flow is two-step:

  • Start the crawl.
  • Poll the crawl ID for progress and results.

# 1. Start the crawl
curl --request POST \
  --url 'https://production-sfo.browserless.io/crawl?token=YOUR_API_TOKEN_HERE' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://docs.browserless.io",
    "maxDepth": 2,
    "includePaths": ["^/rest-apis"],
    "scrapeOptions": {
      "formats": ["markdown", "html"],
      "onlyMainContent": true
    }
  }'

# 2. Poll for results
curl --request GET \
  --url 'https://production-sfo.browserless.io/crawl/CRAWL_ID_HERE?token=YOUR_API_TOKEN_HERE'
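In application code, step 2 usually becomes a small polling loop. A Python sketch, where the status values ("completed", "failed") are assumptions to verify against the Crawl API docs; the fetcher is injected so the loop itself stays testable:

```python
import time

def wait_for_crawl(fetch_status, poll_interval=5, max_wait=600):
    """Poll a crawl job until it finishes.

    `fetch_status` is any callable returning the parsed GET /crawl/{id} JSON
    (e.g. a thin HTTP wrapper). The terminal `status` values used here are
    illustrative; confirm them against the Crawl API docs. Remember the two
    expiry windows: each page's contentUrl lasts 1 hour, and results are gone
    24 hours after completion, so download content as soon as it appears.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(poll_interval)
    raise TimeoutError("crawl did not finish within max_wait seconds")
```

Since results are paginated, the loop body is also a natural place to fetch and persist any pages already available rather than waiting for the terminal status.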

A real-world example of how you can use Crawl API

If you're indexing a docs site, crawl gives you the right primitives out of the box. You can:

  • Constrain the job to certain sections with includePaths.
  • Avoid low-value branches with excludePaths.
  • Keep the extracted page body cleaner with onlyMainContent.
  • Process pages incrementally as webhook events arrive instead of waiting for one giant synchronous response.
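The webhook-driven variant of that last point can be sketched as a small event dispatcher. The event names (page, completed, failed) match the ones listed earlier; the payload fields are assumptions, so check the docs for the exact shapes:

```python
def handle_crawl_event(event: dict, store, alert):
    """Dispatch Crawl API webhook events as they arrive.

    `store` and `alert` are your own callables (e.g. write to an index,
    notify a channel). Payload fields like `url`, `content`, `id`, and
    `error` are illustrative, not a documented schema.
    """
    kind = event.get("type")
    if kind == "page":
        store(event["url"], event.get("content"))  # index each page incrementally
    elif kind == "completed":
        alert(f"crawl {event.get('id')} finished")
    elif kind == "failed":
        alert(f"crawl {event.get('id')} failed: {event.get('error')}")

pages, alerts = [], []
handle_crawl_event({"type": "page", "url": "https://docs.browserless.io/x", "content": "..."},
                   store=lambda u, c: pages.append(u), alert=alerts.append)
handle_crawl_event({"type": "completed", "id": "abc123"},
                   store=lambda u, c: None, alert=alerts.append)
print(pages, alerts)
```

Wired behind a web framework's POST route, this lets your index grow page by page instead of blocking on the full crawl.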

📖 Crawl API Docs →

Using these endpoints for AI workflows

All four endpoints produce output that slots directly into AI workflows; /search, /smart-scrape, and /crawl in particular all support markdown output for LLM consumption.

  • /smart-scrape offers the widest format set: markdown, HTML, links, screenshots, PDFs, and auto-parsed JSON for API-style targets.
  • /search adds markdown, HTML, links, and screenshots via scrapeOptions.
  • /crawl returns markdown, HTML, and raw text for each page.

For example, when scrapeOptions is provided, /search returns the content from each result URL in your requested format -- ready for embedding, summarization, or storage. Meanwhile, each page that /crawl scrapes comes back as structured data ready for LLM consumption.

Learn more about how these new API endpoints work with AI tooling in our MCP announcement blog.

API endpoint FAQs

How do timeouts work across these endpoints?

Timeout handling is not identical everywhere.

  • /smart-scrape documents a timeout query parameter.
  • /search and /map accept a timeout field in the JSON request body.
  • /crawl lets you set per-page navigation timeout through scrapeOptions.timeout, with a documented default of 30000 ms for cloud, plus an optional waitFor delay after page load.
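The three placements above can be sketched side by side in Python. The 30000 ms value for /crawl is the documented default; the other timeout values and units here are illustrative, so verify them per endpoint:

```python
# Where the timeout lives differs per endpoint:
# /smart-scrape: query parameter on the URL.
smart_scrape_url = (
    "https://production-sfo.browserless.io/smart-scrape"
    "?token=YOUR_API_TOKEN_HERE&timeout=30000"
)

# /search and /map: a field in the JSON request body.
search_body = {"query": "browser automation", "timeout": 30000}
map_body = {"url": "https://example.com", "timeout": 30000}

# /crawl: per-page navigation timeout inside scrapeOptions,
# plus an optional waitFor delay after page load.
crawl_body = {
    "url": "https://example.com",
    "scrapeOptions": {"timeout": 30000, "waitFor": 500},
}

print("timeout=30000" in smart_scrape_url)
```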

When should you use async instead of a single scrape?

Use /smart-scrape when you care about one known URL and you want the system to figure out the lightest successful extraction strategy.

Use /crawl when you need multi-page collection, status polling, path filters, retries, pagination, or webhook-driven processing. That's the line between a fetch job and a crawl system.

How do you filter what gets discovered or scraped?

  • /search gives you filters for source type, language, country, location, recency, and categories.
  • /map gives you relevance sorting, sitemap behavior, subdomain control, and query-parameter deduplication.
  • /crawl adds regex-based includePaths and excludePaths, plus controls for crawl depth and whether to follow subdomains or external links.

What's the best way to get cleaner page bodies?

For /search, use scrapeOptions.onlyMainContent and, when needed, stripNonContentTags to remove <script> and <style> elements, or includeTags and excludeTags to target specific CSS selectors.

For /crawl, the same idea applies through scrapeOptions, and onlyMainContent defaults to true there.

For /smart-scrape, request markdown when you want a cleaner content representation without building your own conversion step.
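Put together, a /search request body combining these cleanup options might look like the sketch below. The parameter names come from this post; the selector values are hypothetical, and the exact selector syntax is something to verify against the Search API docs:

```python
# Illustrative /search body: cleaner page bodies via scrapeOptions.
search_body = {
    "query": "browser automation best practices",
    "scrapeOptions": {
        "formats": ["markdown"],
        "onlyMainContent": True,           # drop nav, footers, sidebars
        "stripNonContentTags": True,       # remove <script> and <style> elements
        "excludeTags": [".cookie-banner", "#newsletter"],  # hypothetical selectors
    },
}
print(sorted(search_body["scrapeOptions"]))
```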

How do you choose between map and crawl on the same site?

Start with /map when you want to understand the site and decide what matters. Move to /crawl when you already know the sections you want and need structured, async extraction at scale. In practice, /map is often the planning step and /crawl is the execution step.

Note that the sitemap parameter uses different values for each endpoint. Map accepts "include", "skip", or "only", while crawl accepts "auto", "force", or "skip".