Introduction
Cloudflare is a leading web protection service, widely used to block bots and scrapers through sophisticated browser fingerprinting, JavaScript challenges, and CAPTCHAs. These defenses can stop most traditional scrapers cold. However, with Playwright, a modern browser automation tool, you can emulate real users more convincingly and bypass many of Cloudflare's layers. In this guide, you’ll learn how to build a resilient scraping setup using Playwright, stealth plugins, proxies, and human-like behavior to access data from Cloudflare-protected websites.
How Cloudflare Blocks Scrapers
Cloudflare uses a layered system to detect and block scrapers. Most of these defenses are invisible to regular users but are designed to catch any sign of automation. To get around them, it’s not enough to use a headless browser; you need to understand how these systems work and what they’re looking for.
The first layer involves JavaScript and browser checks. When a page loads, Cloudflare runs scripts that look for expected browser behaviors, such as how the browser renders content, how quickly it executes JS, and whether specific properties exist in window.navigator. Tools like Playwright can run JavaScript, but running them in a default configuration often leaves signs that a real user isn't present. That's usually enough to trigger a block.
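To make that concrete, here's the flavor of signal such a script might inspect (a simplified illustration, not Cloudflare's actual detection code):

```js
// Illustrative only — a simplified version of the checks a detection script can run
const looksAutomated =
  navigator.webdriver === true ||   // automation frameworks set this flag
  navigator.plugins.length === 0 || // headless browsers often expose no plugins
  navigator.languages.length === 0; // real users almost always have languages set
```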
Then there’s TLS and JA3 fingerprinting. Every browser has a specific way it initiates secure connections, and Cloudflare captures that fingerprint during the TLS handshake. Scrapers that use different TLS configurations, especially those that don’t match popular browsers, stand out. Even if the script looks like it’s coming from Chrome, the TLS fingerprint might say otherwise.
CAPTCHAs are another defense mechanism, not just for login forms. Cloudflare can serve hCaptcha or Turnstile challenges whenever it detects something suspicious, like repeated access from the same IP, strange headers, or automation signatures. These challenges can stop your scraper completely unless you detect and solve them dynamically.
Cloudflare also looks at IP reputation and request patterns. Your IP might get flagged if it is part of a known proxy pool or has made too many rapid requests. Even a small spike in traffic can result in throttling or temporary bans. Rotating IPs, managing session cookies, and pacing your requests are all necessary if you want to keep access over time.
To overcome these defenses, your scraper needs to behave like a real browser and a real user. That means mimicking everything from connection-level details to UI behavior without cutting corners.
Setting Up Playwright with Stealth Mode
Playwright gives you direct access to real browser instances (Chromium, Firefox, and WebKit), all of which support full JavaScript execution and page rendering.
But to get past Cloudflare reliably, a standard browser session isn’t enough. You’ll need to take extra steps to hide signs of automation.
That's where playwright-extra and stealth plugins come in. These tools modify browser characteristics that Cloudflare often checks, such as navigator.webdriver, missing WebGL features, or the presence of headless-specific headers.
To get started, install the required packages in your Node.js project:
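```bash
# playwright-extra loads stealth evasions from the puppeteer-extra plugin ecosystem
npm install playwright playwright-extra puppeteer-extra-plugin-stealth
```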
Then, create a custom Playwright instance that uses the stealth plugin:
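```js
// A minimal sketch: wrap Playwright's Chromium with the stealth plugin.
// The target URL below is just a placeholder.
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();

chromium.use(stealth);

(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();
```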
Once that’s set up, you can randomize elements of your browser fingerprint. Viewport dimensions, user-agent strings, language headers, and timezone values contribute to whether the session looks human or automated. These small details matter because Cloudflare’s detection looks at inconsistencies across multiple signals.
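A sketch of what that randomization can look like when creating a context (all values below are placeholders you would rotate per session):

```js
const { chromium } = require('playwright-extra');

(async () => {
  const browser = await chromium.launch();
  // Placeholder values — rotate these per session from your own pools
  const context = await browser.newContext({
    viewport: { width: 1366, height: 768 },
    userAgent:
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    locale: 'en-US',                // drives Accept-Language and navigator.language
    timezoneId: 'America/New_York', // should match your proxy's region
  });
  const page = await context.newPage();
  await page.goto('https://example.com'); // placeholder target
  await browser.close();
})();
```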
Sessions should also be persistent: reusing cookies and local storage data across requests helps make your scraper less suspicious. You can save and load the browser context from disk instead of starting from a clean slate every time.
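A sketch using Playwright's built-in storageState (this assumes a browser and context from the earlier snippets; the file path is arbitrary):

```js
// Persist cookies and localStorage after a successful session...
await context.storageState({ path: 'session-state.json' });

// ...then restore them when creating the next context
const restoredContext = await browser.newContext({
  storageState: 'session-state.json',
});
```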
With this setup, your Playwright sessions behave more like real browser activity and less like automation. That gives you a better chance of bypassing Cloudflare without being blocked on the first request.
Rotating Proxies and Handling CAPTCHA Challenges
Cloudflare watches traffic patterns across IP addresses. If an IP sends too many requests, triggers multiple challenges, or matches known bad behavior, it can be throttled or blocked entirely.
To reduce the chances of that happening, you can rotate through residential or datacenter proxies using the --proxy-server flag in Playwright. This gives each session a different IP, which helps distribute your request volume and avoid detection.
Here’s how to launch a Playwright browser with a proxy:
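```js
const { chromium } = require('playwright-extra');

(async () => {
  const browser = await chromium.launch({
    // Playwright's proxy option is the equivalent of Chromium's --proxy-server flag.
    // Host, port, and credentials below are placeholders for your provider's values.
    proxy: {
      server: 'http://proxy.example.com:8000',
      username: 'proxy-user',
      password: 'proxy-pass',
    },
  });
  const page = await browser.newPage();
  await page.goto('https://example.com'); // placeholder target
  await browser.close();
})();
```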
Cloudflare might still challenge the request with a CAPTCHA even with a fresh IP. When that happens, your scraper must detect and solve it or skip the page. One way to do that is by checking for an iframe that loads a CAPTCHA, like this:
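```js
// Heuristic check — these selectors cover common hCaptcha and Cloudflare
// Turnstile embeds, but they won't match every deployment.
const captchaFrame = await page.$(
  'iframe[src*="hcaptcha.com"], iframe[src*="challenges.cloudflare.com"]'
);

if (captchaFrame) {
  // Hand off to a solver, rotate the proxy, or skip this page
  console.log('CAPTCHA detected');
}
```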
For sites using hCaptcha or reCAPTCHA, third-party services like 2Captcha or CapMonster can solve challenges programmatically. These services take a sitekey and page URL, return a token, and then you inject that token into the form. Some tools (like @extra/recaptcha) simplify this by automating the full solve step:
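```js
const { chromium } = require('playwright-extra');
const RecaptchaPlugin = require('@extra/recaptcha');

// The token below is a placeholder for your 2Captcha API key
chromium.use(
  RecaptchaPlugin({
    provider: { id: '2captcha', token: 'YOUR_2CAPTCHA_API_KEY' },
    visualFeedback: true, // highlights solved CAPTCHAs when running headful
  })
);

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/protected'); // placeholder URL
  await page.solveRecaptchas(); // detects, solves, and injects the token
  // ...continue scraping
  await browser.close();
})();
```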
Once the CAPTCHA is handled or skipped, you can proceed with your scrape as normal. To prevent the CAPTCHA from appearing again on the next visit, save cookies and session storage data after a successful scrape. This creates continuity between visits and helps the session look more human.
Rotating proxies, solving CAPTCHAs, and maintaining session state work together to help you get through Cloudflare’s layers without interruption. Scrapers that skip these steps usually don’t last more than a few requests.
Scraping After Cloudflare with Playwright
Once you've configured stealth mode and rotated proxies, you can move on to an actual scraping flow. This part combines everything: proxy setup, fingerprint masking, challenge detection, and data extraction. You're not just testing if the page loads; you’re trying to access real content without getting flagged midway through the session.
To start, launch Playwright with your proxy configured. Ensure it’s a working residential or datacenter proxy with low block rates. Then, set a real user-agent string to replace the default one, which is often tied to automation.
Once the browser is ready, load the target page. For Cloudflare-protected sites, it's better to wait a few extra seconds to let JavaScript challenges pass quietly in the background. These challenges don't always show visual feedback, so timing can matter.
Check if a CAPTCHA exists; you can detect this by searching for known patterns in iframes or specific HTML containers. If it’s present, you can call your CAPTCHA handler, retry with a different proxy, or mark the proxy as burned.
Once the content is accessible, you've cleared the challenge and landed on the real page. Store session cookies and scrape the data you need. From here, your scraper can move forward confidently: pull the data, close the session cleanly, or continue cycling through the next URL with a new proxy. A sketch of this full flow follows.
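Proxy details, the user agent, selectors, and the target URL below are all placeholders to adapt:

```js
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();
chromium.use(stealth);

(async () => {
  const browser = await chromium.launch({
    proxy: { server: 'http://proxy.example.com:8000' }, // placeholder proxy
  });
  const context = await browser.newContext({
    userAgent:
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    locale: 'en-US',
    timezoneId: 'America/New_York', // keep aligned with the proxy's region
  });
  const page = await context.newPage();

  await page.goto('https://example.com/target', { waitUntil: 'networkidle' });
  await page.waitForTimeout(5000); // let background JS challenges settle

  const captcha = await page.$('iframe[src*="challenges.cloudflare.com"]');
  if (captcha) {
    // Route to your CAPTCHA handler, or mark this proxy as burned and retry
    console.log('Challenge still present — rotating proxy');
  } else {
    await context.storageState({ path: 'session-state.json' }); // persist the cleared session
    const heading = await page.textContent('h1'); // placeholder extraction
    console.log(heading);
  }

  await browser.close();
})();
```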
This pattern works well for batches of URLs, especially if you’re tracking proxy performance and retrying only the ones that hit a CAPTCHA or fail the JS checks. It’s repeatable and flexible without relying on third-party scraping tools.
Using BQL to Handle Stubborn Cloudflare Pages
Sometimes, even a well-configured Playwright setup doesn’t cut through Cloudflare. You’ve got stealth, proxies, and CAPTCHA solvers, all wired up, but the site still detects automation and either blocks access or keeps you stuck in a challenge loop.
When that happens consistently, offloading the scraping to a headless browser-as-a-service like Browserless, specifically using BQL (Browserless Query Language), is often the most reliable path forward.
Cloudflare CAPTCHAs are one of the biggest pain points in browser automation, especially when Cloudflare serves them in iframes, shadow DOMs, or in response to unusual behavior.
With Playwright, you’d usually need to detect the CAPTCHA manually, integrate with a solver like 2Captcha, inject tokens, and cross your fingers.
Browserless simplifies all of that with two built-in mutations:
- verify: For Cloudflare's "Are you human?" checks (the simple click-to-proceed pages).
- solve: For full hCaptcha and reCAPTCHA challenges.
Example 1: Bypass Cloudflare’s “Human Check”
This works for the common Cloudflare interstitials that just want a human presence, no CAPTCHA solving involved. If found, BQL clicks the verify button for you.
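A sketch of the query, using Browserless's verify mutation (the URL is a placeholder):

```graphql
mutation VerifyHumanCheck {
  # Placeholder URL — point this at the Cloudflare-protected page
  goto(url: "https://protected-site.example.com") {
    status
  }
  # Clicks Cloudflare's "verify you are human" checkbox if one is found
  verify(type: cloudflare) {
    found
    solved
    time
  }
}
```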
Example 2: Solve hCaptcha or reCAPTCHA Automatically
BQL detects the CAPTCHA, finds the form, solves it, and returns structured feedback without installing third-party solvers or APIs.
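A sketch using the solve mutation (the URL is a placeholder; swap the type for reCAPTCHA):

```graphql
mutation SolveCaptcha {
  goto(url: "https://protected-site.example.com") {
    status
  }
  # Detects and solves an hCaptcha on the page; use type: recaptcha for reCAPTCHA
  solve(type: hcaptcha) {
    found
    solved
    time
  }
}
```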
Conclusion
Cloudflare makes scraping harder than ever, especially with new challenges like Turnstile, stronger fingerprinting, and aggressive rate-limiting. Playwright still gives you a powerful edge, especially when combined with stealth plugins, solid proxy hygiene, and session-aware automation. But if you're hitting limits with local scripts or dealing with constant maintenance to keep things working, it's worth leveling up. Browserless with BQL is built for these kinds of scraping jobs. It handles the stealth, scale, and infrastructure so you can focus on getting the data you need. Start using BQL and spend less time solving CAPTCHAs and more time shipping scrapers that actually work.
FAQs
Can Playwright bypass Cloudflare Turnstile and hCaptcha in 2025?
Yes, Playwright can sometimes bypass both Cloudflare Turnstile and hCaptcha, but it's not something that works out of the box. These increasingly sophisticated challenges require more than running a headless browser. You'll need to integrate third-party CAPTCHA-solving services like 2Captcha or CapMonster, detect the presence of CAPTCHAs on the page using selectors (such as iframe containers for Turnstile or hCaptcha), and solve them programmatically before continuing with scraping. Using playwright-extra with stealth plugins helps reduce the chances of being flagged before the challenge even appears. Pairing that with persistent browser sessions and cookies can help reduce how often CAPTCHAs are triggered across multiple requests.
Why does Cloudflare still block Playwright even with stealth mode enabled?
Playwright can still get blocked even with stealth mode enabled because Cloudflare looks far beyond simple browser signals. While stealth plugins help mask things like navigator.webdriver and common headless indicators, they don't cover deeper-level fingerprinting. Cloudflare analyzes TLS signatures (like JA3), HTTP/2 frame order, and browser consistency; for example, it checks whether your timezone, language, and IP region align. It raises suspicion if your proxy is in one country but your browser fingerprint says you're in another. Randomizing viewport, fonts, geolocation, and language headers is helpful, but sometimes it's not enough without matching all fingerprint layers.
What’s the best proxy setup for scraping Cloudflare-protected websites with Playwright?
The most effective proxy setup for scraping Cloudflare-protected sites with Playwright involves using rotating residential or mobile proxies that support authentication. Datacenter proxies tend to get flagged quickly, especially if shared across users. For Playwright, proxies can be passed via the --proxy-server launch argument. It's also important to keep your browser profile consistent with the proxy: language headers, timezone, and user agent should all align with the IP's country and region. If your target site uses geo-based filtering or fingerprinting, matching these values can help reduce the chance of being challenged.
When should you switch from Playwright to Browserless or BQL for Cloudflare scraping?
If you're running into repeated blocks, solving CAPTCHAs manually, or struggling to scale scraping across multiple pages or sessions, it's probably time to switch to Browserless or BQL. These are cloud-native solutions designed for scraping at scale, with built-in support for stealth, proxy rotation, session management, and CAPTCHA solving. You don't have to manage browser instances or infrastructure manually. BQL (Browserless Query Language) simplifies scraping by allowing you to define scrape logic declaratively through an API. It's especially useful when scraping thousands of pages concurrently or maintaining stable, long-running scraping jobs without micromanaging every browser interaction.