Advanced issues when managing Chrome on AWS

Joel Griffith
December 5, 2023

If you’re running Chrome on AWS, you’ll know the processes are a pain to manage. Chrome loves to chew up RAM and bandwidth, while not even fitting into a Lambda. That’s before you get into version updates and orphaned processes.

Before you know it, you're using a dozen CPUs and heaps of RAM just to run some basic automations.

There are plenty of guides that cover the basics of deploying Chrome, but not how to get it running efficiently. So, this article will look at those advanced issues, from Lambda’s limits and EC2 setup through to ongoing maintenance.

The good news is that although a tidy setup is tricky, it is possible. You’ll just need to roll up your sleeves and tackle tasks like building Chrome yourself!

After that, it’s mostly maintenance work, such as keeping an eye on updates and CVEs.

(Psst! You can skip all this hassle by using our fleet of managed headless browsers.)

Challenges of running Chrome on Lambda #

We typically recommend deploying Chrome using Lambda. However, there are limits to be aware of, as stated by AWS, such as file-size restrictions and the 15-minute execution limit.

Your Lambdas ideally need around 2GB of memory, which AWS offers for an extra cost. Chrome also needs around 2 CPU cores available for quick responses, which means keeping your function “warm” with some kind of poll or keep-alive functionality. You could also go with a multi-tenancy approach, but that’s too advanced for this guide.

Request sizes, response sizes, and concurrency limits can all impact your usage, since Chrome will happily make a mess of those metrics. Chrome will also eat a lot of bandwidth, so beware of ingress and egress costs.

When to consider EC2

AWS’ EC2 service does mitigate a lot of these woes. There are almost no technical restrictions; it’s just a lot more work to set up, deploy, and monitor.

You’ll still need to consider networking costs since, again, Chrome loves its network usage. Sure, things like a --user-data-dir can act as a quick cache, but we see sites come in at about 2.5MB on average, so tread lightly.
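If you do lean on that cache, here’s a rough sketch of what it looks like with Puppeteer; the executable and profile paths are placeholders for your own setup.

import puppeteer from 'puppeteer-core';

// Reuse a profile directory between sessions so repeat visits can hit
// Chrome's disk cache instead of re-downloading every asset.
const browser = await puppeteer.launch({
  executablePath: '/usr/bin/google-chrome',
  userDataDir: '/var/cache/chrome-profile',
});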

I’d also recommend prioritizing reproducibility with EC2. I’d caution against shelling into a new instance, running bash setup commands, and crossing your fingers that it "just works". After all, Chrome is notorious for shipping changes that require new packages.

Spend the time to get a proper build pipeline and a CI/CD setup, and ideally run some unit tests on it as well.
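Even a tiny smoke test in CI catches most "Chrome won’t launch on this image" surprises. Here’s a minimal sketch, assuming puppeteer-core is installed and a CHROME_PATH environment variable points at your built binary:

import puppeteer from 'puppeteer-core';

// Minimal CI smoke test: can this image's Chrome launch and render a page?
const browser = await puppeteer.launch({
  executablePath: process.env.CHROME_PATH || '/usr/bin/google-chrome',
  args: ['--no-sandbox', '--disable-dev-shm-usage'],
});

const page = await browser.newPage();
await page.goto('https://example.com', { waitUntil: 'networkidle0' });

if (!(await page.title()).includes('Example')) {
  throw new Error('Render check failed');
}

await browser.close();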

Side note: why should you trust me?

Hey, I’m Joel Griffith! I created browserless.io in 2017 and have run it ever since, offering managed browsers as a service. We run millions of sessions a day across every industry imaginable.

That means I eat, sleep and dream about headless automation, plus I’ve seen a lot of infrastructure from our users. So, I know a thing or two about a good Chrome deployment.

Preparing EC2 for Chrome

Running on EC2 requires careful handling of Chrome to keep things running smoothly. Let’s look at some key points.

Slaying zombies and preventing orphans #

If you’re using Docker, you’ll have hit the fun headache of Chrome exits leaving zombie processes.

To prevent those, you’ll need a lightweight init process called dumb-init, available on most distros or through GitHub. Used as your container’s entrypoint, it stops spawned processes from hanging around as zombies when their parent exits (yes, that’s the technical term). Run htop sometime and it’ll even label them (state Z); ideally there should be none.

You’ll also need to shut down Chrome with the force of a thousand sledgehammers. A simple browser.close() works most of the time, but we’ve seen many instances where a sub-process gets orphaned and mucks things up. That’s why we recommend something like:


import kill from 'tree-kill';

// Grab the PID while the process handle is still valid.
const pid = browser.process().pid;

// Ask Chrome to shut down gracefully first...
await browser.close();

// ...then make sure the whole process tree is gone, orphans included.
kill(pid, 'SIGKILL');

This should keep the process list clean and avoid entanglements.

There are more challenges depending on your use case, such as GPU support, fonts, and locales. For more detailed reading, I’d recommend this guide.

Load balancing decisions #

If you're using EC2 or another "bare-ish metal" service, then you'll need a load-balancing strategy.

We've found that nginx with a "least-connected" algorithm is a great place to start. Depending on what you're up to, you'll probably need to implement an internal health-check in your service to decide if the instance can handle another connection.

This is why Lambdas are appealing: they're isolated, load-balanced, and scale well. But, as we've outlined above, many use-cases require more than what Lambda can handle, so here's the nginx configuration we typically start with:


upstream chrome {
  least_conn;
  server 168.99.26.110:3000;
  server 168.99.26.111:3000;
  server 168.99.26.112:3000;
}

server {
  listen 80;
  proxy_next_upstream error timeout http_500 http_503 http_429 non_idempotent;

  location / {
    proxy_pass http://chrome;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
  }
}

This groups our Chrome-only servers into a chrome upstream block, forwards traffic to those instances, and handles WebSocket upgrades where needed. The big thing is the proxy_next_upstream directive: setting these arguments will retry the request on a new host in case one goes kaput...which can and will happen.
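As for the internal health-check mentioned earlier, here’s a minimal sketch of the idea: a plain Node endpoint that reports whether the instance has headroom for another session. The session counter and limit are placeholders for however your service tracks its own load.

import http from 'http';

const MAX_SESSIONS = 5; // tune to your instance's CPU and memory
let activeSessions = 0; // update as your service opens and closes browsers

http.createServer((req, res) => {
  if (req.url === '/health') {
    const ok = activeSessions < MAX_SESSIONS;
    res.writeHead(ok ? 200 : 503, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ ok, activeSessions }));
    return;
  }
  // ...everything else goes to your Chrome session handling.
  res.writeHead(404);
  res.end();
}).listen(3000);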

Final EC2 considerations

As some final thoughts, tools and suggestions:

Getting the most out of Lambdas

Lambdas solve lots of the EC2 challenges with neat hardware isolation. But, you trade runtime improvements for build time commitments.

Sadly, there’s no single best way to handle Chrome on Lambda. It varies wildly depending on your use-case, but I’ll go over some packages and recommendations.

Find a reputable source for Chromium #

Out of the box, Chrome is too big to fit into a Lambda. You'll need to compress it (Brotli is generally best) and decompress it when your Lambda runs.

This is a pretty standard way of operating, which packages like this one aim for. Sadly, many of these are deprecated or no longer maintained, so treat them with caution, since an outdated Chrome build will carry unpatched CVEs.
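At the time of writing, @sparticuz/chromium (a maintained successor to chrome-aws-lambda) is one reasonable option. Here’s a minimal handler sketch assuming that package plus puppeteer-core; do verify its current state and bundled Chromium version yourself.

import chromium from '@sparticuz/chromium';
import puppeteer from 'puppeteer-core';

export const handler = async () => {
  // The package ships a Brotli-compressed Chromium and decompresses it
  // into /tmp on first use.
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath(),
    headless: chromium.headless,
  });

  const page = await browser.newPage();
  await page.goto('https://example.com');
  const title = await page.title();

  await browser.close();
  return { statusCode: 200, body: title };
};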

You must also be careful about running any old binary inside Lambda. Web browsers are the wild west, and anyone can embed just about anything into them, so please do a lot of due diligence when sourcing.

Find a fix for fonts #

With Lambda’s file-size limits, you’ll want a solution for lazily loading fonts. Again, many packages out there aim to handle this, but they change rapidly over time.
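For the lazy-loading route, the same @sparticuz/chromium package mentioned above exposes a font() helper that pulls a font file into the Lambda’s writable /tmp at runtime. A rough sketch, with an example font URL:

import chromium from '@sparticuz/chromium';

// Download a font into /tmp before launching Chrome so pages that need it
// render correctly; swap in whichever fonts your sites actually use.
await chromium.font(
  'https://raw.githack.com/googlei18n/noto-emoji/master/fonts/NotoColorEmoji.ttf'
);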

If it's a site that you own that you're consuming with Chrome, then another alternative is using web-fonts. This is a better longer-term strategy overall as it'll mean consistent rendering of your site in just about any context... headless Chrome or not!

Also, be sure to check font licenses.

Cold starts and tradeoffs #

If speed is a priority, you’ll need to tackle the cold-start problem. A cold start means decompressing Chrome and loading fonts before running any of your code, so the Lambda takes longer to thaw prior to execution.

For a better picture of the problem, be sure to read this GitHub question. The workaround is to pay for a polling mechanism that keeps your function alive and warm. This trades away slow initialization for an increase in costs, which may or may not be worth it.
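One common pattern is to have a scheduled rule ping the function with a payload your handler short-circuits on. A minimal sketch; the warmup field is just a convention you’d define yourself:

export const handler = async (event) => {
  // Scheduled "warmer" invocations bail out early; they exist only to keep
  // the container (and its decompressed Chrome) alive.
  if (event && event.warmup) {
    return { statusCode: 200, body: 'warm' };
  }

  // ...decompress Chrome, load fonts, and run the real workload here.
};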

Keeping ingress to a minimum #

Chrome happily fetches any and all assets a site demands, so you’ll see big ingress and egress fees. To keep these from getting crazy, use the network interception that most libraries provide. It’ll help you cut down on fees while also improving session throughput.
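With Puppeteer, for example, request interception lets you drop resource types you don’t need. A short sketch, assuming images, media, and fonts aren’t required for your workload:

const page = await browser.newPage();
await page.setRequestInterception(true);

page.on('request', (request) => {
  // Drop heavy assets we don't need; let everything else through.
  const blocked = ['image', 'media', 'font'];
  if (blocked.includes(request.resourceType())) {
    request.abort();
  } else {
    request.continue();
  }
});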

Ongoing maintenance #

Whether you use EC2 or Lambda, maintaining Chrome deployments generally requires one or two engineers working full time.

Every Chrome update will add complexities into your build and deployment pipelines. Chrome is also famous for using yet more resources with each update, so be prepared for your instances to steadily grow.

It’s essential to keep a very close eye out for vulnerabilities in Chrome and its underlying packages. While most updates are trivial to integrate back into your deployment, certain details often cause breakages or issues in surprising places. This is why a thought-out build pipeline with unit tests and visual-diff tests is essential.
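A bare-bones visual-diff check can be as simple as comparing a fresh screenshot against a stored baseline. Here’s a sketch using the pngjs and pixelmatch packages; the file paths and threshold are assumptions to tune for your pipeline.

import fs from 'fs';
import { PNG } from 'pngjs';
import pixelmatch from 'pixelmatch';

// Compare a freshly captured screenshot against a checked-in baseline.
const baseline = PNG.sync.read(fs.readFileSync('baseline.png'));
const current = PNG.sync.read(fs.readFileSync('current.png'));
const { width, height } = baseline;
const diff = new PNG({ width, height });

const mismatched = pixelmatch(
  baseline.data, current.data, diff.data, width, height,
  { threshold: 0.1 }
);

if (mismatched > 0) {
  fs.writeFileSync('diff.png', PNG.sync.write(diff));
  throw new Error(`Visual diff failed: ${mismatched} pixels differ`);
}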

Finally, you’ll want to test things locally…and if local is an Apple product then be prepared for arm64 builds! I won't dive into those here, but be ready for lots of packages and even Chrome versions to not natively run on arm64.

Want to let us manage Chrome for you?

Thanks for sticking with me on this one! If you don't want to have to manage the browsers yourself, check out Browserless. We host headless browsers which you can connect to directly with Playwright or Puppeteer. Check out the options and grab a free trial.

We've helped thousands of companies use our container for hosting browsers, so you can be assured you're in good hands. Drop us a line if you want to discuss your deployment!

Take care out there, and best of luck!

