Key Takeaways
- Playwright Codegen is a starting point, not production-ready. It speeds up prototyping, but the raw output is brittle; to scale, scripts need hardened selectors, a modular design, and proper session handling.
- Browserless makes automation scalable and resilient. Running Playwright through Browserless unlocks auto-scaling, session pooling, and observability, while BrowserQL handles stealth automation, CAPTCHA solving, and anti-bot evasion.
- Operational maturity is what makes automation production-grade. Concurrency controls, proxy rotation, retries, and debugging hooks (logs, video, tracing) transform simple recorded scripts into reliable systems that can handle real-world workloads.
Introduction
Playwright Codegen is one of the quickest ways to kickstart browser automation. Just hit record, click around, and it spits out runnable scripts in your language of choice, automatically translating your interactions into code so you can get an automation idea off the ground without hand-coding everything. Of course, the raw output isn’t production-ready: selectors can be brittle, auth flows aren’t hardened, and things get messy at scale. In this post, we’ll look at how to make the most of Codegen for test creation, share a few best practices to keep scripts reliable, and show how Browserless + BrowserQL can turn those quick prototypes into solid, production-grade automation.
What is Playwright Codegen?
Playwright Codegen is a test generation utility designed to rapidly scaffold end-to-end tests by recording real user interactions (clicks, form fills, and navigations) in a browser session and converting them into executable Playwright code as you navigate the application. Run:
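A minimal invocation looks like this (the URL is a placeholder for whatever app you want to record against):

```shell
npx playwright codegen https://example.com
```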
…and you get two windows: a browser for interaction, and the Playwright Inspector showing a real-time, line-by-line trace of generated code. As you interact, the code reflects raw user behavior (navigations, clicks, text inputs) rendered as page.goto, page.click, page.fill, etc., with basic selector inference.
While this is helpful for quickly capturing flows like logins or form submissions, the generated code is entirely procedural and lacks abstraction, validation, or resilience. Selectors are inferred using heuristics (often text=, nth-child, or XPath variants) that are readable but structurally fragile. Any DOM shift or styling refactor will silently break the tests.
Codegen also doesn’t include assertions by default. If you don’t proactively layer in checks like await expect(locator).toHaveText(…), your test will pass even if the UI fails silently. Timing is another issue: it may inject waitForTimeout(…) in response to perceived latency, creating false confidence that the script is stable, when in reality it’s sleeping instead of synchronizing on app state.
You can and should intervene via the Inspector. It lets you:
- Inspect element locators and swap in more robust selectors (e.g. data-testid)
- Insert runtime assertions to harden transitions (e.g., validating a redirect after login)
- Trim unnecessary actions or fine-tune logic before saving the script
But out of the box, Codegen is more of a recorder than a framework. It gives you the skeleton, but not the muscles.
Creating a New Test with Playwright Codegen
You can record a new test script by starting a session in Playwright Codegen. It allows you to perform actions and interact with your website or web page in a real browser, and Playwright writes the code for you.
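To start a recording session, launch Codegen against your site (the URL is illustrative):

```shell
npx playwright codegen https://your-app.example.com
```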
This command opens two windows:
- A browser window, where you perform actions and interact with the website just like a normal user.
- The Playwright Inspector, which displays the test code being generated in real time.
Every click, keystroke, and navigation you perform on the page gets recorded as actual Playwright code, producing a generated script like this:
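A short recorded session might produce a script along these lines (the site, labels, and credentials are purely illustrative, not real Codegen output for any specific app):

```javascript
const { test } = require('@playwright/test');

test('recorded login flow', async ({ page }) => {
  // Each line maps to one recorded interaction.
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.getByRole('link', { name: 'Dashboard' }).click();
});
```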
Why this is useful:
- You get working code fast, without manually figuring out selectors.
- It mirrors real user behavior, which is exactly what you want in end-to-end tests.
- It’s a great way to bootstrap a new test suite or prototype flows before refining.
Using the Playwright Inspector (and Why It’s Awesome)
The Inspector is more than just a code viewer; it’s an interactive tool that helps you refine your tests as you go.
How it helps:
- Live code preview: You see exactly what Playwright is generating, so you can spot brittle selectors or missing steps early.
- Element picker: Hover over elements in the browser and pick better locators (like data-test attributes) instead of relying on flaky auto-generated selectors.
- Inline editing: You can tweak code right in the Inspector before pasting it into your test file. You can also insert or record actions at the current cursor position within the script, making it easy to target precise points for automation steps.
- Add assertions on the fly: For example, after a login, you might add:
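A sketch of what such a post-login assertion might look like, inserted into the test body (the URL pattern and heading text are assumptions about your app):

```javascript
// Confirm the login actually landed on the dashboard before continuing.
await expect(page).toHaveURL(/\/dashboard/);
await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
```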
This turns your recording session into a more intentional test-writing process. Instead of just dumping raw code, you’re building reliable, maintainable tests as you go.
Putting It All Together
When you’re done recording:
- Copy the script generated by Codegen into a spec file (e.g. login.spec.js).
- Drop the spec into your test suite.
- Run the test with:
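Assuming the spec was saved as login.spec.js, the run command is:

```shell
npx playwright test login.spec.js
```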
A successful script run validates that the recorded flow works as expected.
From there, you can keep refining: clean up selectors, abstract common steps, or parameterize values.
Pro tip: Use data-test or data-testid attributes in your app. They make your selectors rock solid, preventing tests from breaking whenever a class name changes.
Codegen Output vs. Production-Grade Automation
The output from Codegen is sufficient to simulate happy-path interactions and generate initial automated test scripts.
But if you’re running automation in CI, where browser tests execute across different environments such as local machines, cloud infrastructure, or real devices, and your UI keeps evolving, you’ll need to apply engineering rigor.
What Codegen gives you:
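Typically, something flat and procedural like this (selectors and URL are illustrative):

```javascript
// Raw Codegen output: linear steps, brittle selectors, no assertions.
await page.goto('https://example.com/login');
await page.click('text=Sign in');
await page.fill('input[name="email"]', 'user@example.com');
await page.fill('input[name="password"]', 'hunter2');
await page.click('text=Submit');
```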
What it should become:
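A hardened sketch of the same flow, refactored into a page object with resilient selectors and an explicit post-condition (class and test-ID names are illustrative assumptions):

```javascript
const { expect } = require('@playwright/test');

// Encapsulates the login flow so specs stay short and selectors live in one place.
class LoginPage {
  constructor(page) {
    this.page = page;
    this.email = page.getByTestId('email-input');
    this.password = page.getByTestId('password-input');
    this.submit = page.getByRole('button', { name: 'Sign in' });
  }

  async login(user, pass) {
    await this.page.goto('https://example.com/login');
    await this.email.fill(user);
    await this.password.fill(pass);
    await this.submit.click();
    // Validate application state instead of trusting the click succeeded.
    await expect(this.page).toHaveURL(/\/dashboard/);
  }
}

module.exports = { LoginPage };
```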
This shift introduces:
- Encapsulation: common flows abstracted into helpers
- Assertions: validation of application state after each action
- Resilience: decoupling from fragile selectors and static timing
- Maintainability: structure that scales across hundreds of tests
Codegen is a great prototyping tool; it saves time on scaffolding flows, especially when exploring new features or reverse-engineering UI logic. But in production contexts (CI pipelines, regression coverage, distributed execution), you’ll need hardened selectors, runtime assertions, error handling, and session management.
Think of Codegen as an interactive macro recorder: great for scaffolding, insufficient for scale. Use it to capture the what, then apply engineering to define the how and why.
Generating a Clean, Reliable Baseline Script
Before you begin, make sure Playwright is installed from the command line; it’s a prerequisite for generating any scripts at all.
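If you haven’t installed it yet, the standard setup is:

```shell
# Scaffold a new Playwright project (installs @playwright/test and browsers):
npm init playwright@latest

# Or add it to an existing project:
npm i -D @playwright/test
npx playwright install
```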
The default output from npx playwright codegen is useful for quick demos, but without tuning the environment, it’s unlikely to survive even modest changes in your app or CI pipeline.
Recording scripts in a vacuum without accounting for authentication state, viewport, locale, or session artifacts results in fragile tests that break under real-world load, parallelization, or internationalization.
A production-grade baseline requires more than capturing clicks and typing. It means baking in an operational context during the recording phase: simulating target devices, persisting authentication, enforcing viewport consistency, and capturing debug artifacts for postmortem analysis.
Instead of relying on brittle defaults, you can run Codegen with flags that reflect your actual test environment. This ensures the scripts it generates are not only runnable but structurally aligned with CI/CD, infrastructure, and production behavior.
You can start by launching Codegen from the command line by running the following command:
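For example, an environment-aware invocation combining several of the flags discussed below (the URL and file names are placeholders):

```shell
npx playwright codegen https://example.com \
  --save-storage=auth.json \
  --viewport-size=1280,720 \
  --channel=chrome \
  --output=login.spec.ts
```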
This command will generate a recorded script that captures your interactions as you navigate across multiple pages, providing a robust starting point for your automated tests.
Core Environment Flags That Make Codegen Smarter
These flags help you control the browser environment and simulate the conditions your users actually encounter:
- --save-storage=auth.json
Captures login/session state during recording. Later, load it via storageState: 'auth.json' in your Playwright configuration file to skip repetitive auth flows, improving speed and test reliability.
- --device="iPhone 12"
Applies touch input, screen size, and user-agent emulation. Vital for recording mobile/responsive flows in apps that conditionally render mobile-specific UIs.
- --lang="fr-FR"
Forces the browser to use a specific locale for headers, date formatting, and text rendering. Useful for verifying localized content and language-specific UX issues.
- --proxy-server="http://user:pass@host:8080"
Routes traffic through a proxy server. Ideal for testing geo-restricted content, simulating IP rotation, or validating region-specific pricing or UI.
- --viewport-size=1280,720
Forces screen size during recording. Prevents selector drift from responsive breakpoints and ensures pixel-perfect playback across CI agents.
- --channel=chrome
Locks the recording to a specific Chromium release. Eliminates browser drift between dev machines and CI containers, a subtle but common source of inconsistency.
- --output=login.spec.ts
Writes the generated script directly to a file. Helps with traceability, diffing, and version control of generated tests.
Debugging, Observability, and Long-Flow Support
These flags capture the context and artifacts you’ll need when your test eventually fails or just times out:
- --save-trace=trace.zip
Records a full trace of all interactions, snapshots, and network events. After a test failure, you can run npx playwright show-trace trace.zip to replay the session visually, frame by frame.
- --timeout=30000
Extends the per-step timeout to 30 seconds, useful for slow environments, long authentication handshakes, or multi-step wizards that exceed the default.
- --color-scheme=dark
Emulates OS-level dark mode. If your app adapts styling or layout based on color preference, this flag lets you capture those differences during recording.
How this helps
The key difference here is intent: you’re not recording a demo, you’re generating code that is durable and portable. One that works in CI, reflects your actual environments (devices, locales, networks), and can be debugged or reused confidently.
When you layer in session persistence, standardized viewports, and debug metadata, you turn Codegen into a proper scaffolding tool, not just a macro recorder.
Running Codegen with flags like these drastically reduces the need for post-recording cleanup. You’re skipping over login loops, avoiding brittle breakpoints caused by responsive UIs, and logging everything you need to debug on failure. The result is scripts that aren’t just testable, they’re CI-ready.
Hardening and Maintaining Playwright Scripts
When working with Playwright, robust test automation requires more than just relying on the code generated by tools like Codegen. While Codegen can quickly create scripts, it's important to inspect HTML elements and customize the generated code to ensure your tests are reliable and maintainable.
Choosing Resilient Selectors
Selectors are the bedrock of every browser automation script. Unfortunately, the ones generated by Codegen are often the weakest link. It’s tempting to leave in page.click('//div[4]/span[2]') or getByText('Click here'), but both are fragile. Hard-coded XPaths break with layout shifts, and text-based selectors fail with even minor wording or localization changes.
Here’s a better mental model: treat selectors like stability tiers. At the bottom are the brittle patterns: raw XPath and text-only selectors. These should be avoided in anything beyond a throwaway prototype spec.
Class-based selectors (e.g. .button-primary) are better but still tightly coupled to styling. A minor CSS refactor can render them useless. Attribute-based selectors like page.locator("input[name='email']") offer more reliability, though uniqueness isn’t always guaranteed.
The real sweet spot lies with semantic, structure-aware selectors. When you write selectors in your spec files, use getByRole() to tap into the accessibility tree, which remains relatively stable through UI changes.
Even better, introduce data-testid attributes in your app’s HTML. These are specifically designed for testing: clear, unambiguous, and decoupled from user-facing content.
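Both styles in practice (the role name and test ID are assumptions about your markup):

```javascript
// Semantic selector via the accessibility tree: survives styling refactors.
await page.getByRole('button', { name: 'Sign in' }).click();

// Dedicated test hook: decoupled from both styling and user-facing copy.
await page.getByTestId('checkout-submit').click();
```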
These selectors don’t just improve resilience; they also make your tests easier to read and debug. When your scripts rely on roles, labels, or test IDs, you’re no longer guessing why a click() failed; you can trace it straight back to a known DOM structure.
Structuring Scripts for Maintainability
Once you’ve hardened your selectors, the next step is reducing complexity in your test logic. Codegen tends to output flat, verbose test scripts with repeated actions like login or setup. These test scripts work, but they’re not scalable. A better approach is to factor shared flows into reusable helpers.
Instead of repeating form-filling logic like this:
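…you can extract it into a shared helper. The module path, selectors, and URL below are illustrative:

```javascript
// helpers/auth.js -- one place to own the login flow
async function login(page, email, password) {
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill(password);
  await page.getByRole('button', { name: 'Sign in' }).click();
}

module.exports = { login };
```

In a spec file, the whole flow then collapses to a single call, e.g. `await login(page, 'user@example.com', 'hunter2');`.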
Not only does this reduce duplication, it makes your intent obvious. Helpers also isolate volatility: if the login flow changes, you update one place, not twenty.
Another common Codegen pitfall is using hard waits like:
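For example:

```javascript
// A hard wait: sleeps for 5 seconds regardless of what the app is doing.
await page.waitForTimeout(5000);
```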
Hard waits like these are among the top causes of flaky tests. They slow down your suite and fail unpredictably across environments. Playwright already auto-waits for most actions (visibility, clickability, navigation), so you often don’t need manual waits at all.
When you do need precise control, prefer event-driven waits, especially when your test scripts interact with multiple pages during testing:
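A sketch of synchronizing on concrete signals rather than fixed delays (the URL patterns and API path are illustrative):

```javascript
// Click, then wait for the resulting navigation to settle.
await page.getByRole('button', { name: 'Sign in' }).click();
await page.waitForURL('**/dashboard');

// Or synchronize on a specific network response completing successfully.
await page.waitForResponse(
  (resp) => resp.url().includes('/api/session') && resp.ok()
);
```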
Manage your test suite like a real application. Store configuration in playwright.config.ts, use storageState to persist auth sessions, and lean on tracing, screenshots, and video capture for debugging. Don’t let your automation logic live in a dozen nearly-identical .spec.ts files; organize it into helpers, configs, and page objects.
The more you treat test code like production code, the fewer flaky surprises you’ll face. Durable test scripts aren’t about luck; they’re the result of consistent structure, smart selector strategies, and intentional design.
Scaling Automation with Browserless and BrowserQL
Codegen gets you started fast, but running Playwright tests locally can only take you so far. When you need to scale beyond a handful of flows, deal with flaky network conditions, or test across different regions, your local machine becomes the bottleneck. That’s where Browserless helps.
Browserless acts as a remote execution platform for headless browsers. Rather than spinning up Chromium locally, you connect your Playwright scripts via a WebSocket endpoint using connectOverCDP().
This lets you run Codegen-generated scripts inside a robust cloud infrastructure, with built-in support for parallelism, session isolation, and debugging hooks.
Browserless allows you to run Playwright scripts in a Chromium browser environment remotely, making it easy to leverage the full capabilities of the Chromium browser for your automated tests.
Let’s say you recorded a flow with Codegen, for example:
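Suppose the recorded flow, running against a local browser, looks like this (site and selectors are illustrative):

```javascript
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch(); // local Chromium
  const page = await browser.newPage();
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await browser.close();
})();
```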
That gives you a working script. But instead of running it locally, you can scale it like this:
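A sketch of the same flow pointed at Browserless instead: only the launch line changes, swapping chromium.launch() for connectOverCDP(). The endpoint URL and token are placeholders for your own deployment:

```javascript
const { chromium } = require('playwright');

(async () => {
  // Connect to a remote Browserless instance over CDP instead of launching locally.
  const browser = await chromium.connectOverCDP(
    'wss://your-browserless-host?token=YOUR_API_TOKEN'
  );
  const context = await browser.newContext();
  const page = await context.newPage();

  // The Codegen-recorded steps run unchanged.
  await page.goto('https://example.com/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();

  await page.screenshot({ path: 'result.png' }); // debug artifact
  await browser.close();
})();
```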
This connects directly to Browserless, executes your Codegen flow, and outputs test artifacts like screenshots or video recordings. It’s the fastest way to productionize a script without re-architecting your test framework.
Behind the scenes, BrowserQL (Browserless’s orchestration layer) adds stealth automation techniques like fingerprint spoofing, human-like input timing, CAPTCHA solving, and proxy routing. That means you don’t need to implement anti-detection logic manually or patch flaky selectors with endless retries.
On top of that, you gain observability. Every session can be traced, logged, and replayed with full video, perfect for debugging flaky tests without reproducing them locally.
Here’s what this gives you:
- Effortless scaling: Run 10, 100, or 1,000 concurrent Playwright sessions in the cloud.
- Reduced flakiness: Built-in stealth, auto-waiting, and anti-bot protection.
- Global coverage: Run tests via proxies to validate geo-specific content.
- CI-friendly: Swap your browser launch method; everything else stays the same.
- Insight-rich: Access logs, video, and trace data per test run.
This setup lets you move seamlessly from Codegen → prototype → production, using the same script foundation and building up infrastructure maturity step-by-step.
Note: For managing and running Playwright automation scripts, we recommend using Visual Studio Code. It offers excellent integration with Playwright through extensions, making it easier to organize, record, and debug your tests directly from the editor.
Conclusion
Playwright Codegen accelerates the first step of building automation, but unrefined scripts won’t survive scale on their own. Hardened selectors, proper session handling, and operational resilience are what separate quick demos from production automation. Browserless and BrowserQL provide the infrastructure, stealth, and observability needed to close that gap and keep workflows running reliably at scale. Sign up for a free Browserless trial and see how scalable, detection-resistant automation works in practice.
FAQs
How can I improve the reliability of Playwright codegen scripts?
You can improve reliability by hardening selectors with getByRole or getByTestId, using assertions to validate state before actions, and eliminating static waits in favor of Playwright’s built-in auto-waiting. Refactoring into page objects and modular helpers also makes scripts easier to maintain at scale.
How do I connect Playwright to Browserless?
Playwright connects to Browserless using a WebSocket endpoint. Replace the local browser launch with connectOverCDP or the browserWSEndpoint option to run scripts in a scalable, cloud-based pool instead of consuming local resources.
What is BrowserQL?
BrowserQL is a query language that sits on top of Browserless to handle stealth automation patterns. It provides built-in support for CAPTCHA solving, anti-bot evasion, fingerprint spoofing, and behavioral modeling, which dramatically improves script stability against detection.
What are the best ops practices for large-scale browser automation?
At scale, automation requires session pooling, retry logic, and concurrency controls to avoid resource exhaustion. Observability tools such as tracing, video recording, and error replay links help track down failures quickly, while proxy rotation and geo-targeting manage access across regions.
How do I automate ethically with these tools?
Always respect site terms of service, honor robots.txt, and follow privacy laws when handling data. Maintain audit logs for accountability, avoid scraping personal or sensitive information without consent, and build consent-aware flows to keep automation aligned with compliance standards.
