Why we pick Puppeteer over Selenium almost every time

One of the most common questions we get, by far, is whether to go with Selenium or a newer library like puppeteer. Because we support both, it's hard for us to make generalized recommendations: there are always outside factors that can sway your decision. Things like company culture, the tools and languages you already have, and even the nature of the business can affect this. All that aside, and even keeping some of it in mind, we almost always recommend not using Selenium. Here's why.

Selenium is an HTTP-based JSON API

Protocol has a lot to do with the difference between these two libraries, and with why we typically recommend puppeteer or really any CDP-based (Chrome DevTools Protocol) library. Why, you ask?

Let's take the example of loading a website, clicking a button, and getting the title. A rather simple task. In puppeteer this is all done over one socket connection that opens when we connect (or launch) and ends when we close. In Selenium, however, this takes at least 6 HTTP JSON payloads. Why is this so bad?

Each HTTP call has to go through the standard TCP handshake and ramp up, plus get routed to its end location. It can be fast with keepalive headers and more -- but you have to make sure that's all set up.
Load-balancing or round-robin routing gets a lot more difficult, because every request in a session has to land on the same backend. You'll need a thing called sticky sessions... or have to build it yourself.
Rate-limiting becomes much more challenging as well. Each "session" that Selenium manages can be hundreds of API calls, each with its own "bursty" pattern, which makes it hard to slap a rate-limiter on.
Finally, at the end of all this, there's a binary *somewhere* that's simply sending a CDP message into Chrome. So why do we need all of the HTTP stuff?
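To make the "at least 6" count concrete, here is a rough sketch of the HTTP requests a W3C WebDriver client sends for the load, click, and title example. The session and element ids are placeholders (a real server generates them), and a real deployment usually prefixes these paths with whatever base path the server is mounted on.

```javascript
// Lists the W3C WebDriver HTTP requests behind
// "load a page, click a button, read the title".
// SESSION_ID and ELEMENT_ID are placeholders, not real ids.
function webdriverRequests(sessionId = "SESSION_ID", elementId = "ELEMENT_ID") {
  const base = `/session/${sessionId}`;
  return [
    { method: "POST", path: "/session" }, // create the session
    { method: "POST", path: `${base}/url` }, // navigate to the page
    { method: "POST", path: `${base}/element` }, // find the button
    { method: "POST", path: `${base}/element/${elementId}/click` }, // click it
    { method: "GET", path: `${base}/title` }, // read the title
    { method: "DELETE", path: base }, // end the session
  ];
}

// Print each request the client would make, one per line.
for (const { method, path } of webdriverRequests()) {
  console.log(method, path);
}
```

Every one of those lines is a separate request/response pair, whereas the puppeteer version sends its messages over a single open socket.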

Now the obvious answer to all of this is to use something like Selenium Grid to manage it. While this is perfectly tenable, there are a lot of concepts you'll need to understand before feeling like you have control over the deployment. If you were to go with a technology like puppeteer, for instance, you can use just about any load-balancer out there (nginx, apache, envoy, etc.) and not have to re-learn how everything works again. To put it simply: all your prior architecture skills transfer more easily to a library like puppeteer or playwright than to Selenium, which requires somewhat specialized knowledge.
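Part of why ordinary load-balancers work here is that a puppeteer session is one long-lived connection, so the routing decision is made exactly once, at connect time. The sketch below only illustrates that idea with a hand-rolled round-robin picker; the endpoint URLs are made up, and in practice you'd likely let nginx or envoy do this for you.

```javascript
// Hypothetical pool of browser hosts; these URLs are placeholders.
const endpoints = [
  "ws://browser-1:3000",
  "ws://browser-2:3000",
  "ws://browser-3:3000",
];

// Returns a function that hands out entries of `list` in round-robin order.
function createRoundRobin(list) {
  let next = 0;
  return function pick() {
    const endpoint = list[next];
    next = (next + 1) % list.length; // wrap around at the end of the list
    return endpoint;
  };
}

const pickEndpoint = createRoundRobin(endpoints);

// One pick per session: everything after connect() rides the same socket,
// so there is nothing to keep "sticky".
// const browser = await puppeteer.connect({ browserWSEndpoint: pickEndpoint() });
console.log(pickEndpoint());
```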

Selenium requires more binaries to keep track of

Puppeteer and playwright both come "off the shelf" with the appropriate version of their respective browsers, and everything just works. With Selenium, you'll have to figure out what version of chromedriver corresponds to what version of Chrome ... which corresponds to what version of Selenium you're running. It's quite a lot to track if we're honest, and since these versions are all interlocked, updating any *one* of the three can easily break your integration.

Selenium Grid does help in this regard as well. However, it's another competency that you or your team will have to take ownership of, versus a more generalized tool that can run practically anywhere. More bins, more problems.

Basic things are more difficult in Selenium

Want to add an extra header to each network request? Want to use a proxy? Need to debug what's going wrong? Every one of these is easier in puppeteer and more difficult (if not impossible) in Selenium. Using a proxy with authentication requires additional drivers or plugins, often vendor-specific ones, in order to work. Puppeteer, playwright and others just have this built into their libraries, and it's a very simple method call to do so.

Why would you need to add headers to a browser, you might ask? Well, suppose you want to load-test your own site but don't want those tests to show up in analytics. A very easy way to do that is to set some kind of header so your analytics software can "ignore" those sessions. Another reason might be applying headers to certain network requests that need to be authenticated. Instead of writing a large body of code to automate a login, it's much easier to just apply a simple header to those calls.
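As a rough sketch of what those method calls look like in puppeteer: `page.setExtraHTTPHeaders` attaches headers to every request the page makes, and `page.authenticate` supplies credentials when a proxy asks for them. The helper name `preparePage`, the `X-Load-Test` header, and the proxy address in the usage comment are all made up for illustration.

```javascript
// Applies a marker header and (optionally) proxy credentials to a puppeteer Page.
// "X-Load-Test" is a made-up header name; use whatever your analytics can filter on.
async function preparePage(page, { proxyUsername, proxyPassword } = {}) {
  // Sent with every request the page makes from here on.
  await page.setExtraHTTPHeaders({ "X-Load-Test": "true" });

  // Only needed when the proxy requires authentication.
  if (proxyUsername && proxyPassword) {
    await page.authenticate({
      username: proxyUsername,
      password: proxyPassword,
    });
  }
  return page;
}

// Usage (assumes puppeteer is installed; the proxy address is a placeholder):
// const browser = await puppeteer.launch({
//   args: ["--proxy-server=http://proxy.example:8080"],
// });
// const page = await preparePage(await browser.newPage(), {
//   proxyUsername: "user",
//   proxyPassword: "pass",
// });
```

The helper takes the page as an argument rather than creating it, so it works the same whether the browser came from `launch` or `connect`.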

In any event, the ability to augment headers and interact with proxies is pretty much a table-stakes feature for a library like this. Not having these features in the basic module is sort of an issue in my mind.

There's a lot more to set up in Selenium

Because Selenium caters to numerous browsers out there, it's a rather complicated process to set up even a simple script. For instance, having Selenium get the title of example.com looks like this in NodeJS:

const { Builder, Capabilities } = require("selenium-webdriver");

(async function example() {
  const chromeCapabilities = Capabilities.chrome();
  chromeCapabilities.set("goog:chromeOptions", {
    prefs: {
      homepage: "about:blank",
    },
    args: ["--headless", "--no-sandbox"],
  });

  let driver = new Builder()
    .forBrowser("chrome")
    .withCapabilities(chromeCapabilities)
    .usingServer("http://localhost:3000/webdriver")
    .build();

  try {
    await driver.get("http://www.example.com/");
    console.log(await driver.getTitle());
  } catch (e) {
    console.log("Error", e.message);
  } finally {
    await driver.quit();
  }
})();

This is roughly 25 lines of code, which isn't too shabby until you compare it with puppeteer, which needs about a third fewer:

const puppeteer = require("puppeteer");

(async function example() {
  let browser;
  try {
    browser = await puppeteer.connect({
      browserWSEndpoint: "ws://localhost:3000",
    });
    const page = await browser.newPage();
    await page.goto("https://example.com");
    console.log(await page.title());
  } catch (e) {
    console.log("Error", e.message);
  } finally {
    if (browser) await browser.close(); // browser is undefined if connect() failed
  }
})();

New developers and folks just coding simple scripts will likely scratch their heads at what capabilities are and why we want them, and what a "builder" is in this context. These concepts make a lot more sense when we're talking about larger deployments with different browsers and *their* capabilities, but for really simple things it's just pure overhead that doesn't express much.

It's about the future

The more modern libraries are more encompassing, easier to set up, and feel less "stiff" when using them. They also run on newer protocols that are faster, easier to reason about, and require less specialized software to run and maintain... plus they work off the shelf with other technologies that you're probably already more familiar with.

Does that make Selenium the wrong choice all the time? Certainly not. There's a reason it's so popular: it did get a lot of things right! It abstracted away all the various browsers, their respective protocols and integration pains behind a very simple and straightforward API. Plus there's a good chance that you've Selenium-ed and didn't realize it! Many higher-level projects out there use it under the hood and try to mask those problems for you. However, the *fact* that there are so many higher-level libraries also speaks to the complexity and setup pains that Selenium is somewhat known for.