Introduction
Cloudflare is a leading web protection service, widely used to block bots and scrapers through sophisticated browser fingerprinting, JavaScript challenges, and CAPTCHAs. These defenses can stop most traditional scrapers cold. However, with Playwright, a modern browser automation tool, you can emulate real users more convincingly and bypass many of Cloudflare's layers. In this guide, you’ll learn how to build a resilient scraping setup using Playwright, stealth plugins, proxies, and human-like behavior to access data from Cloudflare-protected websites.
How Cloudflare Blocks Scrapers
Cloudflare uses a layered system to detect and block scrapers. Most of these defenses are invisible to regular users but are designed to catch any sign of automation. To get around them, it’s not enough to use a headless browser; you need to understand how these systems work and what they’re looking for.
The first layer involves JavaScript and browser checks. When a page loads, Cloudflare runs scripts that look for expected browser behaviors, such as how the browser renders content, how quickly it executes JS, and whether specific properties exist in window.navigator. Tools like Playwright can run JavaScript, but running them in a default configuration often leaves signs that a real user isn't present. That's usually enough to trigger a block.
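To make that concrete, here's the flavor of signal such a script might inspect (a simplified illustration, not Cloudflare's actual detection code):

```js
// Illustrative only — a simplified version of the checks a detection script can run
const looksAutomated =
  navigator.webdriver === true ||   // automation frameworks set this flag
  navigator.plugins.length === 0 || // headless browsers often expose no plugins
  navigator.languages.length === 0; // real users almost always have languages set
```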
Then there’s TLS and JA3 fingerprinting. Every browser has a specific way it initiates secure connections, and Cloudflare captures that fingerprint during the TLS handshake. Scrapers that use different TLS configurations, especially those that don’t match popular browsers, stand out. Even if the script looks like it’s coming from Chrome, the TLS fingerprint might say otherwise.
CAPTCHAs are another defense mechanism, not just for login forms. Cloudflare can serve hCaptcha or Turnstile challenges whenever it detects something suspicious, like repeated access from the same IP, strange headers, or automation signatures. These challenges can stop your scraper completely unless you detect and solve them dynamically.
Cloudflare also looks at IP reputation and request patterns. Your IP might get flagged if it is part of a known proxy pool or has made too many rapid requests. Even a small spike in traffic can result in throttling or temporary bans. Rotating IPs, managing session cookies, and pacing your requests are all necessary if you want to keep access over time.
To overcome these defenses, your scraper needs to behave like a real browser and a real user. That means mimicking everything from connection-level details to UI behavior without cutting corners.
Setting Up Playwright with Stealth Mode
Playwright gives you direct access to real browser instances (Chromium, Firefox, and WebKit), all of which support full JavaScript execution and page rendering.
But to get past Cloudflare reliably, a standard browser session isn’t enough. You’ll need to take extra steps to hide signs of automation.
That's where playwright-extra and stealth plugins come in. These tools modify browser characteristics that Cloudflare often checks, such as navigator.webdriver, missing WebGL features, or the presence of headless-specific headers.
To get started, install the required packages in your Node.js project:
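```bash
# playwright-extra loads stealth evasions from the puppeteer-extra plugin ecosystem
npm install playwright playwright-extra puppeteer-extra-plugin-stealth
```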
Then, create a custom Playwright instance that uses the stealth plugin:
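```js
// A minimal sketch: wrap Playwright's Chromium with the stealth plugin.
// The target URL below is just a placeholder.
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();

chromium.use(stealth);

(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.title());
  await browser.close();
})();
```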
Once that’s set up, you can randomize elements of your browser fingerprint. Viewport dimensions, user-agent strings, language headers, and timezone values contribute to whether the session looks human or automated. These small details matter because Cloudflare’s detection looks at inconsistencies across multiple signals.
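A sketch of what that randomization can look like when creating a context (all values below are placeholders you would rotate per session):

```js
const { chromium } = require('playwright-extra');

(async () => {
  const browser = await chromium.launch();
  // Placeholder values — rotate these per session from your own pools
  const context = await browser.newContext({
    viewport: { width: 1366, height: 768 },
    userAgent:
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    locale: 'en-US',                // drives Accept-Language and navigator.language
    timezoneId: 'America/New_York', // should match your proxy's region
  });
  const page = await context.newPage();
  await page.goto('https://example.com'); // placeholder target
  await browser.close();
})();
```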
Sessions should also be persistent: reusing cookies and local storage data across requests helps make your scraper less suspicious. You can save and load the browser context from disk instead of starting from a clean slate every time.
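A sketch using Playwright's built-in storageState (this assumes a browser and context from the earlier snippets; the file path is arbitrary):

```js
// Persist cookies and localStorage after a successful session...
await context.storageState({ path: 'session-state.json' });

// ...then restore them when creating the next context
const restoredContext = await browser.newContext({
  storageState: 'session-state.json',
});
```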
With this setup, your Playwright sessions behave more like real browser activity and less like automation. That gives you a better chance of bypassing Cloudflare without being blocked on the first request.
Rotating Proxies and Handling CAPTCHA Challenges
Cloudflare watches traffic patterns across IP addresses. If an IP sends too many requests, triggers multiple challenges, or matches known bad behavior, it can be throttled or blocked entirely.
To reduce the chances of that happening, you can rotate through residential or datacenter proxies using the --proxy-server flag in Playwright. This gives each session a different IP, which helps distribute your request volume and avoid detection.
Here’s how to launch a Playwright browser with a proxy:
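```js
const { chromium } = require('playwright-extra');

(async () => {
  const browser = await chromium.launch({
    // Playwright's proxy option is the equivalent of Chromium's --proxy-server flag.
    // Host, port, and credentials below are placeholders for your provider's values.
    proxy: {
      server: 'http://proxy.example.com:8000',
      username: 'proxy-user',
      password: 'proxy-pass',
    },
  });
  const page = await browser.newPage();
  await page.goto('https://example.com'); // placeholder target
  await browser.close();
})();
```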
Cloudflare might still challenge the request with a CAPTCHA even with a fresh IP. When that happens, your scraper must detect and solve it or skip the page. One way to do that is by checking for an iframe that loads a CAPTCHA, like this:
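```js
// Heuristic check — these selectors cover common hCaptcha and Cloudflare
// Turnstile embeds, but they won't match every deployment.
const captchaFrame = await page.$(
  'iframe[src*="hcaptcha.com"], iframe[src*="challenges.cloudflare.com"]'
);

if (captchaFrame) {
  // Hand off to a solver, rotate the proxy, or skip this page
  console.log('CAPTCHA detected');
}
```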
For sites using hCaptcha or reCAPTCHA, third-party services like 2Captcha or CapMonster can solve challenges programmatically. These services take a sitekey and page URL, return a token, and then you inject that token into the form. Some tools (like @extra/recaptcha) simplify this by automating the full solve step:
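```js
const { chromium } = require('playwright-extra');
const RecaptchaPlugin = require('@extra/recaptcha');

// The token below is a placeholder for your 2Captcha API key
chromium.use(
  RecaptchaPlugin({
    provider: { id: '2captcha', token: 'YOUR_2CAPTCHA_API_KEY' },
    visualFeedback: true, // highlights solved CAPTCHAs when running headful
  })
);

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/protected'); // placeholder URL
  await page.solveRecaptchas(); // detects, solves, and injects the token
  // ...continue scraping
  await browser.close();
})();
```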
Once the CAPTCHA is handled or skipped, you can proceed with your scrape as normal. To prevent the CAPTCHA from appearing again on the next visit, save cookies and session storage data after a successful scrape. This creates continuity between visits and helps the session look more human.
Rotating proxies, solving CAPTCHAs, and maintaining session state work together to help you get through Cloudflare’s layers without interruption. Scrapers that skip these steps usually don’t last more than a few requests.
Scraping After Cloudflare with Playwright
Once you've configured stealth mode and rotated proxies, you can move on to an actual scraping flow. This part combines everything: proxy setup, fingerprint masking, challenge detection, and data extraction. You're not just testing if the page loads; you’re trying to access real content without getting flagged midway through the session.
To start, launch Playwright with your proxy configured. Ensure it’s a working residential or datacenter proxy with low block rates. Then, set a real user-agent string to replace the default one, which is often tied to automation.
Once the browser is ready, load the target page. For Cloudflare-protected sites, it's better to wait a few extra seconds to let JavaScript challenges pass quietly in the background. These challenges don't always show visual feedback, so timing can matter.
Check if a CAPTCHA exists; you can detect this by searching for known patterns in iframes or specific HTML containers. If it’s present, you can call your CAPTCHA handler, retry with a different proxy, or mark the proxy as burned.
Once the content is accessible, you've cleared the challenge and landed on the real page. Store session cookies and scrape the data you need. From here, your scraper can move forward confidently: pull the data, close the session cleanly, or continue cycling through the next URL with a new proxy. A sketch of this full flow follows.
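Proxy details, the user agent, selectors, and the target URL below are all placeholders to adapt:

```js
const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth')();
chromium.use(stealth);

(async () => {
  const browser = await chromium.launch({
    proxy: { server: 'http://proxy.example.com:8000' }, // placeholder proxy
  });
  const context = await browser.newContext({
    userAgent:
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    locale: 'en-US',
    timezoneId: 'America/New_York', // keep aligned with the proxy's region
  });
  const page = await context.newPage();

  await page.goto('https://example.com/target', { waitUntil: 'networkidle' });
  await page.waitForTimeout(5000); // let background JS challenges settle

  const captcha = await page.$('iframe[src*="challenges.cloudflare.com"]');
  if (captcha) {
    // Route to your CAPTCHA handler, or mark this proxy as burned and retry
    console.log('Challenge still present — rotating proxy');
  } else {
    await context.storageState({ path: 'session-state.json' }); // persist the cleared session
    const heading = await page.textContent('h1'); // placeholder extraction
    console.log(heading);
  }

  await browser.close();
})();
```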
This pattern works well for batches of URLs, especially if you’re tracking proxy performance and retrying only the ones that hit a CAPTCHA or fail the JS checks. It’s repeatable and flexible without relying on third-party scraping tools.
Using BQL to Handle Stubborn Cloudflare Pages
Sometimes, even a well-configured Playwright setup doesn’t cut through Cloudflare. You’ve got stealth, proxies, and CAPTCHA solvers, all wired up, but the site still detects automation and either blocks access or keeps you stuck in a challenge loop.
When that happens consistently, offloading the scraping to a headless browser-as-a-service like Browserless, specifically using BQL (Browserless Query Language), is often the most reliable path forward.
Cloudflare CAPTCHAs are one of the biggest pain points in browser automation, especially when Cloudflare serves them in iframes, shadow DOMs, or in response to unusual behavior.
With Playwright, you’d usually need to detect the CAPTCHA manually, integrate with a solver like 2Captcha, inject tokens, and cross your fingers.
Browserless simplifies all of that with two built-in mutations:
- verify: For Cloudflare's "Are you human?" checks (the simple click-to-proceed pages).
- solve: For full hCaptcha and reCAPTCHA challenges.
Example 1: Bypass Cloudflare’s “Human Check”
This works for the common Cloudflare interstitials that just want a human presence, no CAPTCHA solving involved. If found, BQL clicks the verify button for you.
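A sketch of the query, using Browserless's verify mutation (the URL is a placeholder):

```graphql
mutation VerifyHumanCheck {
  # Placeholder URL — point this at the Cloudflare-protected page
  goto(url: "https://protected-site.example.com") {
    status
  }
  # Clicks Cloudflare's "verify you are human" checkbox if one is found
  verify(type: cloudflare) {
    found
    solved
    time
  }
}
```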
Example 2: Solve hCaptcha or reCAPTCHA Automatically
BQL detects the CAPTCHA, finds the form, solves it, and returns structured feedback without installing third-party solvers or APIs.
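A sketch using the solve mutation (the URL is a placeholder; swap the type for reCAPTCHA):

```graphql
mutation SolveCaptcha {
  goto(url: "https://protected-site.example.com") {
    status
  }
  # Detects and solves an hCaptcha on the page; use type: recaptcha for reCAPTCHA
  solve(type: hcaptcha) {
    found
    solved
    time
  }
}
```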
Conclusion
Cloudflare makes scraping harder than ever, especially with new challenges like Turnstile, stronger fingerprinting, and aggressive rate-limiting. Playwright still gives you a powerful edge, especially when combined with stealth plugins, solid proxy hygiene, and session-aware automation. But if you're hitting limits with local scripts or dealing with constant maintenance to keep things working, it's worth leveling up. Browserless with BQL is built for these kinds of scraping jobs. It handles the stealth, scale, and infrastructure so you can focus on getting the data you need. Start using BQL and spend less time solving CAPTCHAs and more time shipping scrapers that actually work.
FAQs
Can Playwright bypass Cloudflare Turnstile and hCaptcha in 2025?
Yes, Playwright can sometimes bypass both Cloudflare Turnstile and hCaptcha, but it's not something that works out of the box. These increasingly sophisticated challenges require more than running a headless browser. You'll need to integrate third-party CAPTCHA-solving services like 2Captcha or CapMonster, detect the presence of CAPTCHAs on the page using selectors (such as iframe containers for Turnstile or hCaptcha), and solve them programmatically before continuing with scraping. Using playwright-extra with stealth plugins helps reduce the chances of being flagged before the challenge even appears. Pairing that with persistent browser sessions and cookies can help reduce how often CAPTCHAs are triggered across multiple requests.
Why does Cloudflare still block Playwright even with stealth mode enabled?
Playwright can still get blocked even with stealth mode enabled because Cloudflare looks far beyond simple browser signals. While stealth plugins help mask things like navigator.webdriver and common headless indicators, they don't cover deeper-level fingerprinting. Cloudflare analyzes TLS signatures (like JA3), HTTP/2 frame order, and browser consistency; for example, it checks whether your timezone, language, and IP region align. It raises suspicion if your proxy is in one country but your browser fingerprint says you're in another. Randomizing viewport, fonts, geolocation, and language headers is helpful, but sometimes it's not enough without matching all fingerprint layers.
What’s the best proxy setup for scraping Cloudflare-protected websites with Playwright?
The most effective proxy setup for scraping Cloudflare-protected sites with Playwright involves using rotating residential or mobile proxies that support authentication. Datacenter proxies tend to get flagged quickly, especially if shared across users. For Playwright, proxies can be passed via the --proxy-server launch argument. It's also important to keep your browser profile consistent with the proxy: language headers, timezone, and user agent should all align with the IP's country and region. If your target site uses geo-based filtering or fingerprinting, matching these values can help reduce the chance of being challenged.
When should you switch from Playwright to Browserless or BQL for Cloudflare scraping?
If you're running into repeated blocks, solving CAPTCHAs manually, or struggling to scale scraping across multiple pages or sessions, it's probably time to switch to Browserless or BQL. These are cloud-native solutions designed for scraping at scale, with built-in support for stealth, proxy rotation, session management, and CAPTCHA solving. You don't have to manage browser instances or infrastructure manually. BQL (Browserless Query Language) simplifies scraping by allowing you to define scrape logic declaratively through an API. It's especially useful when scraping thousands of pages concurrently or maintaining stable, long-running scraping jobs without micromanaging every browser interaction.