Smart Scrape API

Scrape any URL.
Let Browserless figure out the rest.

The Smart Scrape API adapts to any challenge automatically, escalating from fast HTTP fetching to headless browsers and CAPTCHA solving as needed.

How our smart scraper works

Browserless's smart scraper automatically escalates from fast HTTP fetching to headless browsers and CAPTCHA solving as required.

Start fast and adapt

Smart Scrape API starts with the fastest, cheapest approach and moves to more powerful strategies if needed.

Exactly what you want

The API stops as soon as it has your data, or you can add a timeout query parameter for more control.

All in a single request

Specify whether you want HTML, Markdown, screenshots, PDFs, or extracted links as output.

How Smart Scrape adapts

With a simple request, our scraper API will find the right web scraping method, stop when successful, and share full details of the approaches used.

Fast HTTP fetch

A lightweight HTTP request that mimics a real browser's network fingerprint, handling the majority of static and server-rendered sites in under 2 seconds.

Fastest

↓ It's blocked by datacenter IP detection

Proxied HTTP fetch

The same request retried through a residential proxy, bypassing datacenter IP blocks without the overhead of launching a full browser.

Proxied

↓ The page requires JavaScript rendering

Headless browser

A full stealth browser renders the page, handling single-page apps, client-rendered content, and any site that needs JavaScript to load its data.

Headless

↓ CAPTCHA or bot challenge detected

Browser and CAPTCHA solving

Automatically detects and solves most CAPTCHA types, including Cloudflare Turnstile, before extracting your data.

Full unlock
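The escalation described above can be pictured as a loop that tries each tier in order and stops at the first success. The sketch below is illustrative only: the strategy names and the `runStrategy` callback are hypothetical stand-ins, not the Browserless implementation or its real strategy identifiers.

```javascript
// Hypothetical tier names, ordered from cheapest to most powerful.
const STRATEGIES = ["http-fetch", "proxy-http-fetch", "browser", "browser-captcha"];

// Try each strategy in order, stop at the first success, and record the path taken.
// `runStrategy(strategy, url)` is an assumed callback that resolves with page
// content or throws when that tier is blocked.
async function escalate(url, runStrategy, strategies = STRATEGIES) {
  const attempted = [];
  let message = null;
  for (const strategy of strategies) {
    attempted.push(strategy);
    try {
      const content = await runStrategy(strategy, url);
      return { ok: true, strategy, attempted, content, message: null };
    } catch (err) {
      message = err.message; // remember why this tier failed, then escalate
    }
  }
  // Every tier failed: report the last strategy tried and its error.
  const last = attempted.length > 0 ? attempted[attempted.length - 1] : null;
  return { ok: false, strategy: last, attempted, content: null, message };
}

// Demo with a fake runner: the HTTP tiers fail, the browser tier succeeds.
const fakeRunner = async (strategy) => {
  if (strategy === "browser") return "<html>rendered</html>";
  throw new Error(`${strategy} was blocked`);
};

escalate("https://example.com", fakeRunner).then((result) => {
  console.log(result.strategy, result.attempted);
});
```

The returned object mirrors the idea behind the `strategy` and `attempted` response fields: one names the tier that produced the result, the other lists every tier tried along the way.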

One request. Five output formats.

Pass the formats you need in a single request. Smart Scrape returns all of them together.

html

The full rendered HTML of the page, always included in the content field regardless of other formats requested.

markdown

The page content converted to clean Markdown, with scripts, styles, and non-visible elements stripped out.

screenshot

A full-page screenshot returned as a base64-encoded PNG. Requires the use of a headless browser.

pdf

The page rendered as a base64-encoded PDF. Like screenshots, this always uses a headless browser.

links

All <a href> links extracted from the page, with relative URLs resolved to absolute and non-HTTP links filtered out.
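To make the `links` behavior concrete, here is a minimal sketch of resolving relative URLs and dropping non-HTTP links. It approximates the behavior described above using the standard `URL` class; the `normalizeLinks` helper is hypothetical and is not the code Smart Scrape runs.

```javascript
// Resolve each href against the page URL and keep only http(s) links.
function normalizeLinks(hrefs, baseUrl) {
  const links = [];
  for (const href of hrefs) {
    let resolved;
    try {
      resolved = new URL(href, baseUrl); // relative hrefs resolve against baseUrl
    } catch {
      continue; // skip hrefs that cannot be parsed at all
    }
    if (resolved.protocol === "http:" || resolved.protocol === "https:") {
      links.push(resolved.href);
    }
  }
  return links;
}

const sampleHrefs = [
  "/pricing",
  "docs/start",
  "mailto:hi@example.com",
  "https://example.org/a",
  "javascript:void(0)",
];
console.log(normalizeLinks(sampleHrefs, "https://example.com/blog/post"));
```

In this sample the `mailto:` and `javascript:` entries are filtered out, and the two relative paths come back as absolute `https://example.com/...` URLs.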

Quickstart code to harness Smart Scrape API

POST a URL and the formats you need. Smart Scrape decides the strategy, escalates if it needs to, and returns everything in one JSON response.

✓ Available in cURL, JavaScript, and Python
✓ The strategy field tells you which approach succeeded
✓ The attempted array shows the full sequence tried
✓ JSON API endpoints are auto-parsed: the content field returns a parsed object, not a string
✓ Timeout is configurable per request via query parameter

const scrape = async () => {
  const TOKEN = 'YOUR_API_TOKEN_HERE';
  const url = `https://production-sfo.browserless.io/smart-scrape?token=${TOKEN}`;

  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: 'https://example.com',
      formats: ['html', 'markdown', 'links']
    })
  });

  // Auth, timeout, and rate-limit failures come back as HTTP error statuses.
  if (!response.ok) {
    throw new Error(`Smart Scrape request failed with HTTP ${response.status}`);
  }

  const result = await response.json();
  // result.strategy → which approach succeeded
  // result.markdown → clean Markdown content
  // result.links   → extracted absolute URLs
  return result;
};
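For per-request timeouts, the endpoint URL can be built with the standard `URL` class so the query string is encoded correctly. This is a sketch under two assumptions not confirmed here: that the query parameter is named `timeout`, and that its value is in milliseconds. The `buildSmartScrapeUrl` helper is hypothetical.

```javascript
// Build the Smart Scrape endpoint URL, optionally with a timeout.
// Assumptions: the parameter is named `timeout` and is given in milliseconds.
function buildSmartScrapeUrl(token, timeoutMs) {
  const endpoint = new URL("https://production-sfo.browserless.io/smart-scrape");
  endpoint.searchParams.set("token", token);
  if (timeoutMs !== undefined) {
    endpoint.searchParams.set("timeout", String(timeoutMs));
  }
  return endpoint.toString();
}

console.log(buildSmartScrapeUrl("YOUR_API_TOKEN_HERE", 30000));
```

The resulting string can be passed to `fetch` in place of the hand-built template string in the quickstart.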

What you get from the Smart Scrape API

Successful scrape responses include the ok field to confirm success. Auth, timeout, and rate-limit failures use standard HTTP error statuses.

Field	Type	Description
`ok`	`boolean`	Whether the scrape succeeded.
`statusCode`	`number | null`	The HTTP status code from the target site, or `null` on network errors.
`content`	`string | object | null`	Page content as an HTML string, or a parsed JSON object if the target returns `application/json`. `null` on failure.
`contentType`	`string | null`	The content type of the scraped page.
`headers`	`object`	HTTP response headers from the target site.
`strategy`	`string`	The strategy that produced the result, or was being attempted on failure.
`attempted`	`string[]`	All strategies attempted, in order.
`message`	`string | null`	Error message on failure, `null` on success.
`screenshot`	`string | null`	Base64-encoded PNG screenshot, when `screenshot` is in `formats`.
`pdf`	`string | null`	Base64-encoded PDF, when `pdf` is in `formats`.
`markdown`	`string | null`	Markdown conversion of the page, when `markdown` is in `formats`.
`links`	`string[] | null`	Extracted links, when `links` is in `formats`.
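A short sketch of how a client might read these fields: check `ok`, branch on whether `content` is a string or an already-parsed object, and decode the base64 outputs into bytes. The field names come from the table above; the `summarizeResult` helper, the shape it returns, and the sample object (including its strategy names) are made up for illustration and are not real API output.

```javascript
// Turn a Smart Scrape JSON body into a small summary a caller can act on.
function summarizeResult(result) {
  if (!result.ok) {
    // On failure, `message` explains why and `attempted` shows what was tried.
    return { ok: false, reason: result.message, tried: result.attempted };
  }
  const summary = { ok: true, strategy: result.strategy, tried: result.attempted };
  // `content` is a string for HTML pages and a parsed object for JSON endpoints.
  summary.contentKind = typeof result.content === "string" ? "html" : "json";
  // Binary outputs arrive base64-encoded; decode them before writing to disk.
  if (result.screenshot) summary.screenshotBytes = Buffer.from(result.screenshot, "base64");
  if (result.pdf) summary.pdfBytes = Buffer.from(result.pdf, "base64");
  return summary;
}

// Made-up sample body for demonstration only.
const sampleBody = {
  ok: true,
  strategy: "browser",
  attempted: ["http-fetch", "browser"],
  content: "<html></html>",
  message: null,
  screenshot: Buffer.from("fake-png").toString("base64"),
  pdf: null,
};

const summary = summarizeResult(sampleBody);
console.log(summary.contentKind, summary.screenshotBytes.length);
```

In Node.js, the decoded `Buffer` values can be written straight to a `.png` or `.pdf` file with `fs.writeFile`.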

Trusted by developers

What our customers say

Teams choose Browserless to stop managing browser infrastructure and start shipping.

“I found Browserless and had our Puppeteer code running within an hour. The scrapes are now 5x faster and 1/3rd of the price, plus the support has been excellent.”

Nicklas Smit

Full-Stack Developer, Takeoff Copenhagen

“I set aside a day for the integration, but it only took a couple of hours. I didn’t need to become an expert in managing proxy servers or virtual computers.”

Mike Heap

Founder, My AskAI

“Browserless helped us focus on the problem we were trying to solve, and less on scaling an automation infrastructure.”

Browserless customer

Enterprise team

Smart Scrape FAQs

Ready to try the Smart Scrape API?

Start free. No credit card required. Production-ready in minutes.

View Pricing

Smart Scrape FAQs

How does the cascading strategy pipeline work?

What output formats can I request?

How do I know which strategy was used?

Does Smart Scrape handle JSON API endpoints?

Can I control how long each strategy attempt is allowed to run?
