How to use Puppeteer with Netlify


Web automation and testing have never been more accessible, thanks to tools like Puppeteer. When combined with cloud platforms like Netlify, the potential for scalable, efficient web applications becomes boundless.

This guide focuses on utilizing puppeteer-core, a lightweight version of Puppeteer. It lets you control a separately hosted Chrome instance, enabling more flexibility and efficiency in your automations.

Understanding puppeteer-core and Netlify

The default Puppeteer package comes bundled with Chromium. This might sound good at first glance, but can cause a range of challenges around scaling and resource management.

That's why we always recommend using puppeteer-core instead. It's essentially Puppeteer without its bundled Chromium, allowing developers to deploy the browsers separately. This approach is particularly useful for cloud environments where you might want to manage browsers on their own server for cost and performance reasons.

Netlify, on the other hand, offers an easy-to-use, serverless platform for deploying web applications and automating workflows. Its Functions feature supports serverless backend functions, perfect for automation scripts using Puppeteer.

By combining the two, developers can create serverless web automation solutions without the challenge of trying to host Chrome on Netlify.

Setting up your project

1. Initial setup:

  1. Create a Project Folder: Begin by creating a new folder for your project. This folder will contain all your project files, ensuring a clean and organized workspace.
  2. Initialize Node.js Project: Open a terminal, navigate to your project folder, and run npm init -y. This command creates a package.json file, which will track your project's dependencies and scripts. Think of it as the blueprint of your project.

2. Installing dependencies:

Install puppeteer-core and dotenv. The second library loads environment variables from a .env file, allowing you to configure the Chrome instance's WebSocket endpoint securely.


   npm install puppeteer-core dotenv

3. Host a Chrome instance

Decide how you'll host Chrome. For development, you might run Chrome on your own machine, but for production you'll want to either host it yourself on your chosen cloud platform or use Browserless's managed browsers.
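Whichever option you pick, your scripts only need a WebSocket URL. As a minimal sketch, the helper below reads CHROME_WS_ENDPOINT from an environment object and fails early if it is missing or is not a ws:// or wss:// URL. The name resolveChromeEndpoint is hypothetical (it is not part of Puppeteer), and ws://localhost:3000 in the usage comment is only a placeholder value.

```javascript
// Hypothetical helper: validate the WebSocket endpoint before connecting.
function resolveChromeEndpoint(env = process.env) {
    const endpoint = env.CHROME_WS_ENDPOINT;
    if (!endpoint) {
        throw new Error('CHROME_WS_ENDPOINT is not set');
    }

    // new URL() throws a TypeError if the value is not a valid URL
    const { protocol } = new URL(endpoint);
    if (protocol !== 'ws:' && protocol !== 'wss:') {
        throw new Error(`Expected a ws:// or wss:// URL, got ${protocol}`);
    }
    return endpoint;
}

// Usage sketch with a placeholder value:
// resolveChromeEndpoint({ CHROME_WS_ENDPOINT: 'ws://localhost:3000' });
```

Failing at startup with a clear message is usually easier to debug than a connection error from deep inside a function invocation.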

Writing your first puppeteer-core script

Let's craft a simple script that opens a web page and takes a screenshot. This example provides a hands-on introduction to Puppeteer's capabilities:


// screenshot.js
require('dotenv').config(); // Load environment variables
const puppeteer = require('puppeteer-core'); // Import Puppeteer-core

async function takeScreenshot() {
    // Connect to an external Chrome instance
    const browser = await puppeteer.connect({
        browserWSEndpoint: process.env.CHROME_WS_ENDPOINT,
    });

    // Open a new page
    const page = await browser.newPage();
    await page.goto('https://www.fashionnova.com'); // Navigate to the website
    await page.screenshot({ path: 'fashionnova.png' }); // Take a screenshot
    await browser.close(); // Close the browser
}

takeScreenshot();

The script starts by loading environment variables, which include your Chrome instance's WebSocket endpoint (CHROME_WS_ENDPOINT). This gives you a secure way to store sensitive information outside your code.

Your script then connects to Chrome so it can navigate to a webpage and take a screenshot.
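One thing the script above does not handle is a failed navigation: if page.goto throws, browser.close() is never reached and the session can stay open on the remote Chrome. Here is a minimal sketch of one way to guard against that. The withBrowser helper is hypothetical, and it takes the connect function as a parameter so the pattern does not depend on how you reach your browser.

```javascript
// Hypothetical helper: always close the browser, even if the task throws.
// `connect` is any async function that resolves to a Puppeteer Browser.
async function withBrowser(connect, task) {
    const browser = await connect();
    try {
        return await task(browser);
    } finally {
        await browser.close();
    }
}

// Usage sketch (not executed here):
// withBrowser(
//     () => require('puppeteer-core').connect({
//         browserWSEndpoint: process.env.CHROME_WS_ENDPOINT,
//     }),
//     async (browser) => {
//         const page = await browser.newPage();
//         await page.goto('https://www.fashionnova.com');
//         await page.screenshot({ path: 'screenshot.png' });
//     }
// );
```

If you share one Chrome instance between several scripts, you may prefer browser.disconnect() in the finally block, which detaches from the browser without shutting it down.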

If you would like an example of a more complex export, check out our guide on generating professional-looking PDF reports using Puppeteer.

Integrating Puppeteer-core with Netlify Functions

Netlify Functions allows you to run serverless backend code without provisioning or managing servers. Integrating puppeteer-core into a Netlify Function enables you to execute browser automation tasks triggered by HTTP requests or scheduled events.

  1. Prepare Netlify Functions:

Create a netlify directory at the root of your project with a functions directory inside it. Netlify looks for serverless functions in netlify/functions by default, so this structure lets it recognize and deploy them.

  2. Function Script:

Inside the functions directory, create a JavaScript file for your function, e.g., screenshot.js. This file will contain the code to execute Puppeteer-core tasks.


// netlify/functions/screenshot.js
const puppeteer = require('puppeteer-core');

exports.handler = async function(event, context) {
    // Environment variable for the Chrome endpoint
    const browser = await puppeteer.connect({
        browserWSEndpoint: process.env.CHROME_WS_ENDPOINT,
    });

    const page = await browser.newPage();
    await page.goto('https://wikipedia.com');
    const screenshotBuffer = await page.screenshot();
    await browser.close();

    return {
        statusCode: 200,
        body: JSON.stringify({ message: "Screenshot taken successfully" }),
        isBase64Encoded: false,
    };
};

The exports.handler is the entry point for your Netlify Function. It's triggered whenever the function is invoked, which can be through an HTTP request or a scheduled event.

Inside the handler, Puppeteer connects to the Chrome instance, navigates to a page, and takes a screenshot, similar to our standalone script.

Then, the function returns a success message, indicating the operation's outcome. In practical applications, you might return the screenshot itself or a link to where it's stored.
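As a sketch of the "return the screenshot itself" option: a function response can carry binary data as a base64 string when isBase64Encoded is set to true. The helper name toImageResponse is hypothetical, and the image/png content type assumes page.screenshot() was called with its default PNG format.

```javascript
// Hypothetical helper: turn a screenshot buffer into a function response.
function toImageResponse(screenshotBuffer) {
    return {
        statusCode: 200,
        headers: { 'Content-Type': 'image/png' },
        // Binary bodies are sent as base64 strings.
        // Buffer.from accepts both a Buffer and a Uint8Array.
        body: Buffer.from(screenshotBuffer).toString('base64'),
        isBase64Encoded: true,
    };
}

// Usage sketch inside the handler:
// return toImageResponse(screenshotBuffer);
```

Large full-page captures may exceed the platform's response size limit, in which case storing the file and returning a link is the safer route.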

Best practices for hosting Chrome yourself

Managing a Chrome deployment comes with challenges. Whether you're hosting Chrome on AWS, GCP, Azure or another platform, we would recommend giving thought to:

  • Health checks to regularly ensure your instances are responsive, and to restart or replace any that are unresponsive or have high memory usage.
  • Access control using firewalls or network access control lists to prevent unauthorized use and protect against attacks.
  • Function timeouts, to ensure tasks aren't terminated prematurely, using async processing where suitable.
  • Data storage, both to avoid hitting Netlify's memory limits and to make sure temp files don't fill up your servers.
  • Browser updates, making sure your Chrome instance and necessary libraries stay up to date with the regular version releases.
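The function timeout point can be sketched as a guard around any slow browser step, so your function returns a clear error instead of being cut off by the platform. withTimeout is a hypothetical helper, and the 8000 ms budget in the usage comment is an assumed value. Check the execution limit that applies to your Netlify plan and function type.

```javascript
// Hypothetical helper: reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(
            () => reject(new Error(`${label} timed out after ${ms} ms`)),
            ms
        );
    });
    // Whichever settles first wins; always clear the timer afterwards
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch inside a handler (assumed 8000 ms budget):
// await withTimeout(page.goto('https://wikipedia.com'), 8000, 'page.goto');
```

Note that this only stops waiting; it does not cancel the underlying browser operation, so you still need to close the browser afterwards.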

Or, you can use our browsers.

Using Puppeteer with our fleet of managed browsers

Browserless lets you avoid the hassle of managing your own Chrome instances. We host thousands of browsers that are ready for anyone to use with their Playwright or Puppeteer scripts.

To use Netlify with Browserless, all you need to do is change the endpoint.


const browser = await puppeteer.connect({
    browserWSEndpoint: `wss://chrome.browserless.io?token=${process.env.BLESS_TOKEN}`,
});

Your scripts will then run without you ever having to worry about memory leaks or issues from version updates.
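If BLESS_TOKEN is unset, the template string above quietly produces token=undefined and the connection fails later with a less obvious error. Here is a minimal sketch of a guard, using a hypothetical buildBrowserlessEndpoint helper. The host is the one from the snippet above, and the built-in URL class takes care of encoding the token.

```javascript
// Hypothetical helper: build the Browserless WebSocket URL, failing early without a token.
function buildBrowserlessEndpoint(token = process.env.BLESS_TOKEN) {
    if (!token) {
        throw new Error('BLESS_TOKEN is not set');
    }
    const url = new URL('wss://chrome.browserless.io');
    url.searchParams.set('token', token); // URL-encodes special characters
    return url.toString();
}

// Usage sketch:
// const browser = await puppeteer.connect({
//     browserWSEndpoint: buildBrowserlessEndpoint(),
// });
```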

For more info about using Puppeteer with Browserless, check out our docs.

Closing thoughts about Netlify and Puppeteer

Puppeteer and Netlify are a great combination, especially for tasks such as generating screenshots and PDFs. By using puppeteer-core, you can use browsers hosted elsewhere, either on your own servers or on ones managed by Browserless.

To get started with Browserless, just go ahead and grab a 7-day trial.
