Smart Scrape API

Scrape any URL.
Let Browserless figure out the rest.

The Smart Scrape API adapts to any challenge automatically, escalating from fast HTTP fetching to headless browsers and CAPTCHA solving as needed.

How our smart scraper works

Browserless's smart scraper automatically escalates from fast HTTP fetching to headless browsers and CAPTCHA solving as required.

Start fast and adapt
The Smart Scrape API starts with the fastest, cheapest approach and moves to more powerful strategies only if needed.
Exactly what you want
The API stops as soon as it has your data, or you can add a timeout query parameter for more control.
All in a single request
Specify whether you want HTML, Markdown, screenshots, PDFs, or extracted links as output.

How Smart Scrape adapts

With a simple request, our scraper API will find the right web scraping method, stop when successful, and share full details of the approaches used.

1. Fast HTTP fetch (Fastest)

A lightweight HTTP request that mimics a real browser's network fingerprint, handling the majority of static and server-rendered sites in under 2 seconds.

Escalates when: the request is blocked by datacenter IP detection.

2. Proxied HTTP fetch (Proxied)

The same request retried through a residential proxy, bypassing datacenter IP blocks without the overhead of launching a full browser.

Escalates when: the page requires JavaScript rendering.

3. Headless browser (Headless)

A full stealth browser renders the page, handling single-page apps, client-rendered content, and any site that needs JavaScript to load its data.

Escalates when: a CAPTCHA or bot challenge is detected.

4. Browser and CAPTCHA solving (Full unlock)

Automatically detects and solves most CAPTCHA types, Cloudflare Turnstile, and other bot challenges before extracting your data.
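
The escalation ladder above can be pictured as a loop over strategies ordered from cheapest to most capable. The sketch below is purely illustrative and does not show Browserless internals: the `escalate` helper, the strategy names, and the stub functions are all hypothetical. It only demonstrates the pattern of trying each approach in turn, stopping at the first success, and recording what was attempted.

```javascript
// Illustrative only: a generic "escalate until success" loop.
// The helper, strategy names, and stub behaviour are hypothetical, not Browserless internals.
const escalate = async (strategies, url) => {
  const attempted = [];
  let lastError = null;

  for (const { name, run } of strategies) {
    attempted.push(name);
    try {
      const content = await run(url);
      // Stop as soon as one strategy returns data
      return { ok: true, strategy: name, attempted, content, message: null };
    } catch (err) {
      lastError = err; // remember why this step failed, then try the next one
    }
  }

  // Every strategy failed: report the last one tried and its error
  return {
    ok: false,
    strategy: attempted[attempted.length - 1] ?? null,
    attempted,
    content: null,
    message: lastError ? lastError.message : 'no strategies provided',
  };
};

// Stub strategies standing in for the four steps described above
const demoStrategies = [
  { name: 'http-fetch', run: async () => { throw new Error('blocked by datacenter IP detection'); } },
  { name: 'proxied-http-fetch', run: async () => { throw new Error('page requires JavaScript rendering'); } },
  { name: 'headless-browser', run: async (url) => `<html><!-- rendered ${url} --></html>` },
  { name: 'browser-captcha', run: async () => '<html></html>' },
];
```

Calling `escalate(demoStrategies, 'https://example.com')` resolves with `strategy` set to `'headless-browser'` and three entries in `attempted`, because the first two stubs throw and the third returns content.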

One request. Five output formats.

Pass the formats you need in a single request. Smart Scrape returns all of them together.

html
The full rendered HTML of the page, always included in the content field regardless of other formats requested.
markdown
The page content converted to clean Markdown, with scripts, styles, and non-visible elements stripped out.
screenshot
A full-page screenshot returned as a base64-encoded PNG. Requires the use of a headless browser.
pdf
The page rendered as a base64-encoded PDF. Like screenshots, this always uses a headless browser.
links
All <a href> links extracted from the page, with relative URLs resolved to absolute and non-HTTP links filtered out.
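
Since screenshots and PDFs are returned as base64 strings, they need decoding before they can be saved or processed. The following is a minimal Node.js sketch, assuming a parsed response object that carries `screenshot` and `pdf` fields as described above; the `decodeBinaryOutputs` helper is hypothetical and not part of any Browserless SDK.

```javascript
// Hypothetical helper: turn base64 outputs into Buffers ready to write to disk.
// Fields that were not requested (or are null on failure) decode to null.
const decodeBinaryOutputs = (result) => {
  const decoded = {};
  for (const field of ['screenshot', 'pdf']) {
    const encoded = result[field];
    decoded[field] =
      typeof encoded === 'string' ? Buffer.from(encoded, 'base64') : null;
  }
  return decoded;
};
```

The returned Buffers can then be passed to `fs.writeFile` with file names of your choosing, for example `page.png` and `page.pdf`.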

Quickstart code for the Smart Scrape API

POST a URL and the formats you need. Smart Scrape decides the strategy, escalates if it needs to, and returns everything in one JSON response.

  • Available in cURL, JavaScript, and Python
  • The strategy field tells you which approach succeeded
  • The attempted array shows the full sequence tried
  • JSON API endpoints are auto-parsed: the content field returns a parsed object, not a string
  • Timeout is configurable per request via query parameter
const scrape = async () => {
  const TOKEN = "YOUR_API_TOKEN_HERE";
  const url = `https://production-sfo.browserless.io/smart-scrape?token=${TOKEN}`;

  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: 'https://example.com',
      formats: ['html', 'markdown', 'links']
    })
  });

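  // Auth, timeout, and rate-limit failures come back as HTTP error statuses
  if (!response.ok) {
    throw new Error(`Smart Scrape request failed with HTTP ${response.status}`);
  }

  const result = await response.json();
  // result.strategy → which approach succeeded
  // result.markdown → clean Markdown content
  // result.links   → extracted absolute URLs
  return result;
};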
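
The per-request timeout is passed as a query parameter alongside the token. A small sketch of building that URL follows. The `buildSmartScrapeUrl` helper is hypothetical, and the example value `30000` assumes the timeout is expressed in milliseconds, which this page does not state, so confirm the unit in the API reference before relying on it.

```javascript
// Hypothetical helper: build the Smart Scrape endpoint URL with an optional timeout.
const buildSmartScrapeUrl = (token, timeout) => {
  const endpoint = new URL('https://production-sfo.browserless.io/smart-scrape');
  endpoint.searchParams.set('token', token);
  if (timeout !== undefined) {
    // Assumption: the unit of this value is not stated on this page
    endpoint.searchParams.set('timeout', String(timeout));
  }
  return endpoint.toString();
};
```

Passing the resulting string to `fetch` in place of the hard-coded URL in the quickstart leaves the rest of the request unchanged.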

What you get from the Smart Scrape API

Scrape responses include an ok field that indicates whether the scrape succeeded. Auth, timeout, and rate-limit failures use standard HTTP error statuses.

Field        Type                    Description
ok           boolean                 Whether the scrape succeeded.
statusCode   number | null           The HTTP status code from the target site, or null on network errors.
content      string | object | null  Page content as HTML string, or a parsed JSON object if the target returns application/json. null on failure.
contentType  string | null           The content type of the scraped page.
headers      object                  HTTP response headers from the target site.
strategy     string                  The strategy that produced the result, or was being attempted on failure.
attempted    string[]                All strategies attempted, in order.
message      string | null           Error message on failure, null on success.
screenshot   string | null           Base64-encoded PNG screenshot, when screenshot is in formats.
pdf          string | null           Base64-encoded PDF, when pdf is in formats.
markdown     string | null           Markdown conversion of the page, when markdown is in formats.
links        string[] | null         Extracted links, when links is in formats.

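
Because `content` can be a string, a parsed object, or null, client code usually needs a small branch before using it. The sketch below shows one way to do that using only the response fields listed above. The `describeResult` helper is hypothetical, and any strategy names you pass in sample data are placeholders rather than documented values.

```javascript
// Hypothetical helper: summarise a Smart Scrape result using the response fields listed above.
const describeResult = (result) => {
  if (!result.ok) {
    // On failure, message carries the error and strategy is the one being attempted
    return `failed during ${result.strategy} after trying [${result.attempted.join(', ')}]: ${result.message}`;
  }
  // content is a string for HTML pages and a parsed object for application/json targets
  const kind = typeof result.content === 'string' ? 'string' : 'parsed JSON object';
  return `ok via ${result.strategy} (target status ${result.statusCode}), content: ${kind}`;
};
```

Logging `describeResult(result)` after each call gives a one-line record of which strategy ran and what kind of content came back.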
Trusted by developers

What our customers say

Teams choose Browserless to stop managing browser infrastructure and start shipping.

I found Browserless and had our Puppeteer code running within an hour. The scrapes are now 5x faster and 1/3rd of the price, plus the support has been excellent.

Nicklas Smit

Full-Stack Developer, Takeoff Copenhagen

I set aside a day for the integration, but it only took a couple of hours. I didn’t need to become an expert in managing proxy servers or virtual computers.

Mike Heap

Founder, My AskAI

Browserless helped us focus on the problem we were trying to solve, and less on scaling an automation infrastructure.

Browserless customer

Enterprise team

Ready to try the Smart Scrape API?

Start free. No credit card required. Production-ready in minutes.