WaitUntil option for Puppeteer and Playwright
Picking the best waitUntil option can be tricky. Puppeteer and Playwright offer several options for deciding when a site counts as “done” loading, and the one you choose determines which resources your browser waits for before it continues.
If you need to scale your browser automation (data extraction, testing, PDF, or screenshot generation), browserless is for you. Whatever the task, you want to load all the resources you need in the shortest time possible.
Why do I need to use the waitUntil option?
The waitUntil options let you define up front when the site is ready for further processing. The best option depends on the following factors and challenges:
- The network behavior of the target site.
- Which library you’re using.
- Your use case.
Ideal event to use based on the network behavior of the target site
Here’s the order in which the events on a page fire. It’s useful to know, because it tells you which event best fits your use case. Both Playwright’s and Puppeteer’s documentation cover these events, though there are some differences. Here’s a quick summary:
- commit (Playwright only): fired when the network response has been received (headers parsed, session history updated) and the document has started loading.
- domcontentloaded: fired when the HTML document has been fully loaded and parsed; stylesheets, images, and subframes may still be loading.
- load: fired when the whole page has loaded, including dependent resources such as stylesheets, scripts, and images.
- networkidle2 (Puppeteer only): fired when there have been no more than 2 network connections for at least 500 ms.
- networkidle0 (Puppeteer) / networkidle (Playwright): fired when there have been no network connections for at least 500 ms.
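When the waitUntil value comes from an untyped source such as a config file or CLI flag, it can help to check it against the events each library accepts before navigating. A minimal sketch, assuming such a setup: the `WAIT_UNTIL` table and the `isValidWaitUntil` guard are hypothetical names, and the value lists follow the events above (commit is Playwright-only; networkidle2 and networkidle0 are Puppeteer-only).

```typescript
// waitUntil values accepted by page.goto() in each library, per the event list above.
const WAIT_UNTIL = {
  puppeteer: ["domcontentloaded", "load", "networkidle2", "networkidle0"],
  playwright: ["commit", "domcontentloaded", "load", "networkidle"],
} as const;

type Library = keyof typeof WAIT_UNTIL;
type WaitUntilFor<L extends Library> = (typeof WAIT_UNTIL)[L][number];

// Hypothetical guard: narrows an arbitrary string to a waitUntil value the given library accepts.
function isValidWaitUntil<L extends Library>(lib: L, value: string): value is WaitUntilFor<L> {
  const accepted: readonly string[] = WAIT_UNTIL[lib];
  return accepted.indexOf(value) !== -1;
}

// Usage: validate a value read from config before handing it to page.goto().
const fromConfig: string = "networkidle2";
if (isValidWaitUntil("puppeteer", fromConfig)) {
  console.log(`Puppeteer accepts ${fromConfig}`);
}
console.log(isValidWaitUntil("playwright", fromConfig)); // false: Playwright has no networkidle2
```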
Differences between Puppeteer and Playwright
The default waitUntil event is load in both libraries, which usually gives the best result in the least time, striking a balance between speed and completeness. Playwright additionally has the commit event, which may be worth trying for sites that quickly load the information you need, e.g. a response header to check whether the site is online, or the page’s title.
- Puppeteer offers two network-idle events, networkidle0 and networkidle2. They are useful when the load event isn’t quite loading everything you need, but use them wisely: we’ve noticed they are the most prone to timing out (you can use the timeout option to raise the default 30-second timeout). Use:
  - networkidle2 if the site uses polling techniques, where network connections don’t close once data is passed on because the site keeps sending data over time.
  - networkidle0 if the site uses fetch requests or something similar that closes the network connection once data is passed on (common in SPAs).
- Playwright only has the networkidle event, which is the equivalent of Puppeteer’s networkidle0. Playwright’s documentation discourages using it for testing and recommends relying on web assertions to assess readiness instead.
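The differences above can be captured in a small translation table plus an options helper. This is a sketch with hypothetical names (`PUPPETEER_TO_PLAYWRIGHT`, `gotoOptions`), not part of either library: networkidle0 maps to Playwright’s networkidle, networkidle2 has no counterpart, and the timeout is raised only for the network-idle events, using an arbitrary 60-second example value.

```typescript
type PuppeteerWaitUntil = "load" | "domcontentloaded" | "networkidle0" | "networkidle2";
type PlaywrightWaitUntil = "commit" | "load" | "domcontentloaded" | "networkidle";

// Closest Playwright event for each Puppeteer event. networkidle2 has no
// Playwright counterpart, so it maps to null and the caller decides what to do.
const PUPPETEER_TO_PLAYWRIGHT: Record<PuppeteerWaitUntil, PlaywrightWaitUntil | null> = {
  load: "load",
  domcontentloaded: "domcontentloaded",
  networkidle0: "networkidle", // same rule: no connections for at least 500 ms
  networkidle2: null,
};

interface GotoOptions<E extends string> {
  waitUntil: E;
  timeout?: number; // milliseconds; omitted means the library default is used
}

// Hypothetical helper: only the network-idle events get a longer timeout,
// since they are the most prone to timing out. 60 s is an example value.
function gotoOptions<E extends PuppeteerWaitUntil | PlaywrightWaitUntil>(
  waitUntil: E,
  idleTimeoutMs = 60_000,
): GotoOptions<E> {
  return waitUntil.indexOf("networkidle") === 0
    ? { waitUntil, timeout: idleTimeoutMs }
    : { waitUntil };
}

console.log(JSON.stringify(gotoOptions("networkidle2"))); // {"waitUntil":"networkidle2","timeout":60000}
console.log(PUPPETEER_TO_PLAYWRIGHT.networkidle2); // null
```

With Puppeteer the result can be passed straight through, as in `await page.goto(url, gotoOptions("networkidle2"))`. For Playwright, translate the event first and decide what to do with a `null`, for example falling back to load plus a web assertion.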
When using Playwright, this option makes less of a difference thanks to Playwright’s auto-waiting feature. For instance, if you call page.goto() and then page.click() right after it without any explicit wait, Playwright automatically waits for the target element to be attached to the DOM, visible, and able to receive events (among other checks), making the networkidle option redundant.
Which event to target for your use case
Essentially, the fewer resources the site needs to load, the earlier on the timeline the event you can pick. Loading fewer resources also helps with issues like high CPU and memory usage, since loading network resources uses both significantly. The suggestions below are a rough guide, since the right choice depends on your website’s behavior.
Scraping and Data Extraction –
PDF, Screenshots, and Screencasting –
Sample of each waitUntil option in Puppeteer and Playwright
For demonstration purposes, we took a screenshot with each waitUntil option at “https://www.nytimes.com/”, since it’s a site with a lot of network activity. Please note that the times we measured for each event in this blog post may vary slightly depending on your location and worker capacity.
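Sampling each option can be scripted. The sketch below is a minimal, hypothetical version of such a loop, not the script behind our measurements: it navigates once per Puppeteer waitUntil value, records how long page.goto() took to resolve, and saves a screenshot. The `ScreenshotPage` interface, the file names, and the 60-second timeout are assumptions; the interface only covers the two Page methods used, so the example can also run against a stub.

```typescript
type PuppeteerWaitUntil = "domcontentloaded" | "load" | "networkidle2" | "networkidle0";

// Minimal structural stand-in (an assumption) for the two Page methods used here;
// a Puppeteer Page provides both goto() and screenshot().
interface ScreenshotPage {
  goto(url: string, options?: { waitUntil?: PuppeteerWaitUntil; timeout?: number }): Promise<unknown>;
  screenshot(options?: { path?: string; fullPage?: boolean }): Promise<unknown>;
}

interface Sample {
  waitUntil: PuppeteerWaitUntil;
  ms: number; // wall-clock time until page.goto() resolved for this event
  path: string; // where the screenshot was written
}

// Navigate once per waitUntil value, timing each navigation and saving a full-page screenshot.
async function sampleEachOption(
  page: ScreenshotPage,
  url: string,
  events: readonly PuppeteerWaitUntil[] = ["domcontentloaded", "load", "networkidle2", "networkidle0"],
  timeoutMs = 60_000, // assumption: raised from the 30 s default for the network-idle events
): Promise<Sample[]> {
  const samples: Sample[] = [];
  for (const waitUntil of events) {
    const start = Date.now();
    await page.goto(url, { waitUntil, timeout: timeoutMs });
    const ms = Date.now() - start;
    const path = `screenshot-${waitUntil}.png`;
    await page.screenshot({ path, fullPage: true });
    samples.push({ waitUntil, ms, path });
  }
  return samples;
}
```

With Puppeteer this could be driven by something like `const page = await (await puppeteer.launch()).newPage(); console.table(await sampleEachOption(page, "https://www.nytimes.com/"));`. Because one page navigates repeatedly, later runs may benefit from the browser cache, so a fresh browser context per option would give fairer timings.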
I hope this helped you understand the waitUntil options you can use in Puppeteer and Playwright.
Feel free to reach out to email@example.com if you have any questions, or refer to the official documentation of each library.