How to Fix Slow Puppeteer Scripts With Three Underused Techniques

May 29, 2024

Puppeteer is a great library for controlling Chrome, but the scripts can easily get bogged down. Excessive calls, loading lots of elements or repeatedly logging in can all use up precious time and resources.

The techniques in this article come from our experience helping thousands of Browserless users optimize their scripts. These are the three main pieces of advice we regularly give developers.

Three lesser-known Puppeteer features to try

Maybe you’re using Puppeteer for automated testing, web scraping or generating screenshots. Regardless of your use case, we’d recommend checking out these techniques:

1. Ignore irrelevant elements with targetFilter

Puppeteer interacts deeply with the Chrome DevTools Protocol, where almost every entity in the browser is treated as a “target” — this could include pages, iframes, or service workers.

However, attaching to every target can be inefficient when the automation doesn’t need to interact with all of them.

Target filters solve this issue by specifying which targets the library should engage with. They let you focus on relevant pages rather than every element spawned by the browser, such as embedded iframes or service workers, which might not be necessary for specific automation tasks.

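The targetFilter option is passed when you launch or connect to a browser. In the Puppeteer typings at the time of writing, it belongs to the BrowserConnectOptions interface rather than BrowserContextOptions (which holds per-context settings such as proxyServer and proxyBypassList):

interface BrowserConnectOptions {
  // other connection options omitted
  targetFilter?: TargetFilterCallback;
}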

This feature has several benefits, such as:

Reducing process overhead: Puppeteer doesn’t need to set up listeners or utilities for ignored targets, which reduces resource consumption and process overhead.
Limiting interactions: This filter allows you to limit interactions and leave fewer traces. As a result, you can reduce the chance of bot detection, especially on sites that spin up service workers to detect automation scripts.
Increasing execution speed: Since there’s less overhead, the browser has fewer extraneous processes to manage, allowing for faster execution times.
Improving debugging speed: You can filter irrelevant targets during debugging, which lets you find the root cause faster while focusing on the targets that are key to your use case.

Most developers only operate at the page level and ignore service workers or sub-frames. So target filters are a good way to streamline operations, and they help against bot detection by minimizing your script’s footprint in the browser environment.
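As a rough illustration of the filter described above, here is a minimal sketch of a callback that skips worker targets. The function name skipWorkers, the TargetLike stand-in type, and the list of ignored type names are assumptions made for this example, not part of Puppeteer. Check the target type names against the Puppeteer version you run.

```typescript
// Structural stand-in for Puppeteer's Target. Recent versions expose type() as a
// method; some older releases passed a plain object with a `type` string, so
// this sketch accepts both shapes.
interface TargetLike {
  type: string | (() => string);
}

// Assumed type names, taken from Chrome DevTools Protocol target types.
const IGNORED_TARGET_TYPES = new Set(["service_worker", "shared_worker"]);

// Returns true for targets Puppeteer should attach to, false for ones to skip.
function skipWorkers(target: TargetLike): boolean {
  const kind = typeof target.type === "function" ? target.type() : target.type;
  return !IGNORED_TARGET_TYPES.has(kind);
}

// Intended use (not executed here, requires Puppeteer):
//   const browser = await puppeteer.launch({ targetFilter: skipWorkers });
```

A deny-list is used here rather than an allow-list because filtering too aggressively can hide targets that Puppeteer itself relies on to open pages. Start by excluding only what you are sure you do not need.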

2. Use waitFor instead of event listeners

Event listeners allow developers to respond to browser activities dynamically. However, heavily relying on this approach for complex automation tasks can lead to convoluted and resource-intensive code.

When simulating keystrokes using Puppeteer, you send key down and up events for each character typed. Every character you type is logged as a separate event, slowing down operations.

If you’re monitoring every image load or tracking multiple network requests, it could quickly become messy. It gets even worse if you’re using slowMo to try and simulate realistic typing where each delayed keystroke then has an event.

Bonus tip: To speed up text input, you can use the `evaluate` function to inject a text string into a field all at once, instead of going letter-by-letter.
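The bonus tip above might look something like the sketch below. It uses page.$eval, a close relative of evaluate that runs a function inside the page and hands it the first element matching a selector. The helper names and the FieldLike and PageLike stand-in types are assumptions made for this example. In real code you would use the Page type from Puppeteer instead.

```typescript
// Minimal structural stand-ins so the sketch does not need Puppeteer installed.
interface FieldLike {
  value: string;
  dispatchEvent(event: Event): boolean;
}
interface PageLike {
  $eval(
    selector: string,
    pageFunction: (el: FieldLike, text: string) => void,
    text: string
  ): Promise<void>;
}

// Runs inside the page: sets the whole string at once, then fires a single
// "input" event so listeners on the field still notice the new value.
function setFieldValue(el: FieldLike, text: string): void {
  el.value = text;
  el.dispatchEvent(new Event("input", { bubbles: true }));
}

// Hypothetical helper: fills the field in one round trip instead of sending
// a key down and key up event for every character.
async function fillInstantly(page: PageLike, selector: string, text: string): Promise<void> {
  await page.$eval(selector, setFieldValue, text);
}
```

Some front-end frameworks track input values in their own way and may need more than a single input event, so treat this as a starting point and confirm that the site actually registers the value.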

Instead of event listeners, use Puppeteer’s waitFor* methods to manage browser interactions. They pause the script until a specific condition is met, which reduces the need for multiple event listeners.

There are several benefits of doing this:

Reduces complexity: By eliminating the need for multiple event listeners, waitFor simplifies the codebase.
Increases performance: Less overhead from managing numerous event listeners improves the script’s execution speed.
Improves code maintenance: With fewer components to manage, the code is easier to maintain and debug.

Here are a few use cases for this function:

Page load events: Use page.waitForNavigation() to wait for a navigation event like a URL change or page reload.
Custom conditions: Use page.waitForFunction(fn) to wait until a function evaluated in the page returns a truthy value.
Specific elements: Use page.waitForSelector(selector) to wait for a particular element to appear in the DOM.
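To show how these waits can replace hand-rolled listeners, here is a minimal sketch that chains waitForNavigation, waitForSelector and waitForFunction. The WaitingPage stand-in type, the submitAndWait helper, its selector parameters and the window.appReady flag are all hypothetical and exist only for illustration.

```typescript
// Minimal structural stand-in for the parts of Puppeteer's Page used below.
interface WaitingPage {
  click(selector: string): Promise<void>;
  waitForNavigation(options?: { waitUntil?: string }): Promise<unknown>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
  waitForFunction(pageFunction: string, options?: { timeout?: number }): Promise<unknown>;
}

// Hypothetical flow: submit a form, then wait for the next page to be usable.
async function submitAndWait(
  page: WaitingPage,
  submitSelector: string,
  resultSelector: string
): Promise<void> {
  // Register the navigation wait before clicking so the navigation is not missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: "networkidle2" }),
    page.click(submitSelector),
  ]);
  // Wait for a particular element to appear in the DOM.
  await page.waitForSelector(resultSelector, { timeout: 10_000 });
  // Wait for a custom condition; `window.appReady` is an assumed flag set by the site.
  await page.waitForFunction("window.appReady === true", { timeout: 10_000 });
}
```

Each wait rejects if its timeout elapses, so a single try/catch around the call is usually enough to handle a page that never reaches the expected state.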

3. Save logins or session states with cookies

Session management in Puppeteer usually involves logging into accounts or managing browser states across sessions. However, automating tasks requiring frequent logins or maintaining session states can become resource-intensive.

Instead of logging in or repeating the same action each time, use cookies to restore the previous state. Here’s how you can do that:

Use the page.cookies() method to get the cookies for the current page and store them in a file.
In a later run, set the saved cookies back into the browser with the page.setCookie(...cookies) method before navigating to the site.
To clear the browsing session, delete cookies with page.deleteCookie(...cookies) and clear other site data by running localStorage.clear() inside the page, for example via page.evaluate().
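The save and restore steps above can be sketched roughly as follows. The helper names, the CookieLike and CookiePage stand-in types, and the idea of a single JSON file are assumptions for this example. Newer Puppeteer releases may also offer browser-level cookie methods, so check the documentation for your version.

```typescript
import { readFile, writeFile } from "node:fs/promises";

// Minimal structural stand-ins; in real code use Page from "puppeteer".
interface CookieLike {
  name: string;
  value: string;
  [key: string]: unknown;
}
interface CookiePage {
  cookies(): Promise<CookieLike[]>;
  setCookie(...cookies: CookieLike[]): Promise<void>;
}

// Hypothetical helper: dump the current page's cookies to a JSON file.
async function saveCookies(page: CookiePage, path: string): Promise<void> {
  const cookies = await page.cookies();
  await writeFile(path, JSON.stringify(cookies, null, 2), "utf8");
}

// Hypothetical helper: load cookies from disk and apply them to the page.
// Returns false when there is nothing to restore, so the caller can log in instead.
async function restoreCookies(page: CookiePage, path: string): Promise<boolean> {
  let raw: string;
  try {
    raw = await readFile(path, "utf8");
  } catch {
    return false; // an unreadable or missing file is treated as "no saved session"
  }
  const cookies = JSON.parse(raw) as CookieLike[];
  if (cookies.length === 0) return false;
  await page.setCookie(...cookies);
  return true;
}
```

A typical flow would call restoreCookies before the first page.goto(), fall back to the normal login steps when it returns false, and call saveCookies once the login has succeeded. Saved cookies are credentials, so store the file somewhere private.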

There are several benefits of using cookies, such as letting you:

Manage authentication states without repeatedly simulating login steps.
Bypass the login process and reduce the load on the server and client.
Avoid security flaws associated with script-based login automation.
Reduce load times by specifying a user data directory to reuse cache and cookies across multiple sessions.

Also, with cookies your actions look more like those of a regular user, which helps when you are dealing with bot detection measures.

Bonus tip: Browserless makes it easy to reuse browsers with our /reconnect API, so you can reconnect to a logged-in session.

Keeping Chrome from devouring resources

Even though Puppeteer offers excellent capabilities for browser automation, Chrome deployments can use up lots of resources and be time-consuming to manage. Chrome is notorious for memory leaks and slow cold start times, which further slow down automations.

That’s why we offer a pool of managed browsers.

With Browserless, you can focus on scripting tasks like scraping, testing or generating PDFs. You can then set the endpoint to use our browsers, complete with extras like inbuilt proxies and stealth settings.

Take Browserless for a test drive using the 7-day free trial.