Deploying Playwright on AWS EC2 is a versatile way to automate browser tasks such as web scraping, end-to-end testing, and interacting with modern web applications. However, it needs some extra dependencies and configuration to run smoothly.
In this guide, we’ll cover everything from selecting the appropriate instance type to installing necessary dependencies and configuring the environment for optimal performance, followed by running an example to capture screenshots. We’ll also look at using separately hosted browsers to simplify things.
Selecting Instance Configurations and Operating Systems
When deploying Playwright on AWS EC2, selecting an appropriate instance type ensures smooth performance. A solid basic configuration is a t3.medium or t3.large instance with 4–8 GB of RAM. For storage, around 10 GB is recommended to handle browser binaries and temporary files.
Playwright does not support Amazon Linux natively, which means dependencies have to be downloaded separately. We’ve included the list of dependencies for Amazon Linux below.
That said, Playwright officially supports Ubuntu and can install its system dependencies there with a single command. The commands below are for Ubuntu.
Step 1: Launch and Connect to an EC2 Instance
To get started, launch an EC2 instance with sufficient storage and connect to it. You’ll install Node.js and Playwright using the following commands. Playwright downloads its own browser binaries during installation, so there’s no need to install browsers separately unless your use case requires a specific configuration.
Step 2: Install Node.js
Step 3: Install Playwright and Dependencies
This command prompts you to install the browsers and their dependencies. If you run into any issues, you can install them separately using the following command:
Dependencies for Amazon Linux (skip this if you're using Ubuntu)
Trying to install the dependencies with the Playwright tool above produces the following warning:
BEWARE: your OS is not officially supported by Playwright;
installing dependencies for ubuntu20.04-x64 as a fallback.
Therefore, we install all the dependencies using the following command, which has been tested with Node.js 16:
Please note that xcb and xkbcommon are not available under those names in the default Amazon Linux repositories. We’ve therefore installed them using the following package names, which cover the same dependencies:
- xcb is part of libxcb.
- xkbcommon is part of libxkbcommon.
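Whichever distribution you use, a quick way to confirm that the browser and its system libraries are in place is to launch headless Chromium once and print its version. This is only a minimal sketch: `smokeTest` and its injectable `launch` parameter are hypothetical names introduced here, and it assumes the `playwright` package is installed in the current project.

```javascript
// smoke-test.js (hypothetical file name)
// Launches a browser once and returns its version string.
// `launch` is injectable so the function can be exercised without a real browser.
async function smokeTest(
  launch = () => require('playwright').chromium.launch()
) {
  // launch() rejects if the browser cannot start on this machine.
  const browser = await launch();
  try {
    return browser.version();
  } finally {
    // Always close, so a failed check does not leave a process behind.
    await browser.close();
  }
}

module.exports = { smokeTest };
```

With the file saved next to your project, `node -e "require('./smoke-test').smokeTest().then(console.log, console.error)"` prints either the browser version or the launch error.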
Example Code: Save Screenshots to S3
Let’s test some code that captures a screenshot of a webpage and uploads it to an S3 bucket.
First, create a new JavaScript file that will contain the code. You can run the following command in your terminal to create the file and open it in a text editor:
Then, copy and paste the following code into the file:
This code takes the website we want to capture as a command line input.
Run the following command. After a successful run, check that the screenshot has been saved to your S3 bucket.
Maintenance tips and challenges
Playwright and EC2 are a great combination, but the setup requires careful maintenance.
One of the most important aspects is dependency management. Playwright, along with its browser binaries, frequently releases updates to stay in sync with modern web standards.
These updates often bring new features, optimizations, and security fixes, which makes it essential to regularly update your Playwright version to avoid potential compatibility issues or vulnerabilities.
You will also need to keep an eye out for memory leaks. Issues such as zombie processes and browsers that don't close properly can gradually increase the resources needed to keep your automations running smoothly.
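One way to guard against browsers that never close is to route every job through a small wrapper that closes the browser in a `finally` block, so the cleanup runs even when the job throws. This is a sketch rather than a cure for every leak; `withBrowser`, `pageTitle`, and the injectable `launch` parameter are hypothetical names introduced for illustration.

```javascript
// Run `task` with a fresh browser and always close it afterwards.
// `launch` defaults to a local Chromium but can be swapped out.
async function withBrowser(
  task,
  launch = () => require('playwright').chromium.launch()
) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Runs on success and on failure, so no browser is left behind.
    await browser.close();
  }
}

// Example job: read a page title without leaking the browser.
async function pageTitle(url) {
  return withBrowser(async (browser) => {
    const page = await browser.newPage();
    await page.goto(url);
    return page.title();
  });
}
```

Because the launcher is passed in, the same wrapper can be exercised with a stub in tests or pointed at a differently configured browser.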
Run Playwright with Browserless to Keep Things Simple
To take the hassle out of scaling your scraping, screenshotting or other automations, try Browserless.
It takes a quick connection change to use our thousands of concurrent Chrome browsers. Try it today with a free trial.
The Easy Option: Connect Playwright to Our Browser Pool
Hosting Playwright is easy; it's the browsers that cause the issues. To simplify your setup, use our pool of thousands of concurrent browsers with just a change of endpoint.
You can either host just playwright-core without the browsers, or use our REST APIs. There are also residential proxies, stealth options, HTML exports, and other commonly needed features.