How to screencast with just Puppeteer

Joel Griffith
June 9, 2020

From time to time we like to do a deep dive on how a piece of the browserless stack operates. Sharing helps us reinforce what we've learned, and gives us a chance to give back to the community as well! Having recently rewritten our screencasting implementation for Puppeteer, we decided it was interesting enough to go over in a blog post. We'll of course go over how we landed on this approach, and talk through the tradeoffs made for our particular API.

But before we get too far down the road, let's take a look at the ecosystem as it stands.

Current state

Puppeteer has a long-standing issue regarding screencasts. Whether you sort by oldest, most commented, or nearly any other filter, it's generally one of the top issues. It's fair to say that this, alongside downloads, is one of the most requested features of the library.

It's a feature we don't think should be taken lightly given how messy licensing can be with video and audio. MPEG-4, arguably the most popular format, is covered by patents held by several companies, making it a tough option. Even if we ignore formats for now, just getting a stream of video/audio out of chromium has its challenges:

  • The DevTools Protocol only offers an event-based screencastFrame of base64-encoded image data (JPEG or PNG), not a video stream.
  • While you can run an extension to do screencasts, you'll likely be plagued with permission modals asking to allow the recording.
  • The other option is to use technologies like XVFB and ffmpeg with pipes, but those have licensing and other production challenges as well.

Chrome Extension pitfalls

In our alpha implementation of screencasting we decided to go the route of writing a full-blown chromium extension. This is a well-understood use-case for an extension, so it made sense to go this route for the delivery of our alpha API. The initial implementation was easy enough to write, and you can still see our extension code here, but after running it for some time we noticed several issues that made it difficult to maintain.

The first problem was the manner in which this extension had to be authored. For it to work properly, a lot of the core logic had to run as a background script so that the screen could be recorded even during page navigation. This turned out to be much harder than originally anticipated, as chromium's debugger, and even primitive console methods, failed to work for this script. It's entirely possible we missed a more effective way of doing this; as it was, we had to use alert calls to debug references going through the system. As much as we enjoy nostalgia for the IE6 days, we don't miss that way of writing software.

The other deal-breaker we encountered was an unexplained shift in video height, as you can see below:

This unexplained resize is actually a hidden "info-bar" that can't be seen because, well, there's no user interface when headless! It was only during local debugging, and not in Docker, that we noticed the bar show up and disappear, which explained this behavior. Unfortunately there's no way to make this bar either never show up or stay permanently, meaning any screencast will have this issue.

Piping with XVFB

Long-time automation enthusiasts will no doubt be familiar with XVFB. Before chromium had a headless mode, you pretty much had to use a virtual frame-buffer in order to run chromium in places where there was no GUI. So it makes sense to use something like XVFB to pipe a video stream to a file.

The unfortunate thing is that the front-runner for turning XVFB output into a video is ffmpeg. ffmpeg, like MP4, has a lot of interesting licensing restrictions that make it hard to justify. Just look at that 18-point checklist!

XVFB also requires us not to use the --headless switch, meaning that certain workflows won't ever be possible (for instance, screencasting a PDF generation, or writing other debugging tools). In order for this to work for browserless, we needed a solution that didn't impose how chromium was launched, especially since many of our users rely on custom launch flags.

Now that you've gotten the whole picture of why this is hard, let's dive into how it works in browserless.

How browserless handles screencasts

Given all the history above, and knowing that it's unlikely Puppeteer will natively support a screencasting function, we recently decided to rewrite our screencasting implementation from the ground up so that it can be reused elsewhere later. To do this, we need to subscribe to the Page.screencastFrame event in order to receive screen data. This is an easy process; however, Puppeteer doesn't expose it in its public API, so you have to use its private client in order to do so.
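Grabbing that client is a one-liner: page.target().createCDPSession() returns a raw DevTools session (the private page._client property also works in older versions of Puppeteer). A minimal sketch, assuming a page you've already created:


// Get a raw DevTools session for the page, plus its viewport dimensions
const client = await page.target().createCDPSession();
const viewport = page.viewport(); // the { width, height } we'll record at

With the client and viewport in hand, we can start the screencast: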


// Subscribe to screencast data, using the page's width/height
await client.send('Page.startScreencast', {
  format: 'jpeg',
  maxWidth: viewport.width,
  maxHeight: viewport.height,
});

Once we've started the screencast, we attach a handler to deal with the incoming JPEG frames:


client.on('Page.screencastFrame', ({ data, sessionId }) => {
  // Do something with `data` and ack the frame if we can
  client.send('Page.screencastFrameAck', { sessionId }).catch(() => {});
});

Great! We've got our handler set up to receive data from the screencast event... but what do we do with it? You'll recall that this is event-based and not a stream of data. Generally speaking, a video is simply a series of images changing over time, and modeling that in software means an event-based architecture doesn't quite fit unless we can pipe the events somewhere. Because we also need to know how long the page stayed the same between changes, we really should be treating this as a stream of information.

We could take this binary event and use Node's piping capabilities to push the data into ffmpeg, but there are challenges there as well. For one, the licensing of ffmpeg is somewhat unclear, and we would also need a lot of handling code around that binary to ensure that edge-cases like crashes, timeouts, and other issues are handled. We also prefer using libraries and tools that already exist within our stack, and always jump at the chance to use new web technology.
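For the curious, here's a rough sketch of that rejected pipe-to-ffmpeg approach, assuming ffmpeg is available on the host (the flags and "recording.webm" output are illustrative, not from our codebase):


const { spawn } = require('child_process');

// Spawn ffmpeg to read a stream of JPEG frames from stdin
const ffmpeg = spawn('ffmpeg', [
  '-f', 'image2pipe', // input is a stream of images...
  '-c:v', 'mjpeg',    // ...each of which is a JPEG
  '-i', '-',          // read that stream from stdin
  '-c:v', 'libvpx',   // encode the result as webm
  'recording.webm',
]);

client.on('Page.screencastFrame', ({ data, sessionId }) => {
  // `data` arrives base64-encoded, so decode it before piping
  ffmpeg.stdin.write(Buffer.from(data, 'base64'));
  client.send('Page.screencastFrameAck', { sessionId }).catch(() => {});
});

Note that frames land whenever chromium feels like emitting one, so without extra bookkeeping the video's timing drifts: exactly the "how long did the page stay the same" problem described above.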

It's with all these restrictions in mind that we decided browserless should instead:

  • Instantiate a canvas element and push frame data into it.
  • Pass that canvas's stream into a MediaRecorder object to record.
  • Once the video is complete, trigger a download to get a webm file out of chromium.

For the canvas part, this is pretty straightforward to set up:


const canvas = document.createElement('canvas');
document.body.appendChild(canvas);
const stream = canvas.captureStream();
const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });

However, there are a few problems with this. For one, we need to add this canvas element to the DOM in order for it to work properly -- but if we add it to the page we're trying to record then we'll have an infinite screencast-recursion problem. Second, if our script navigates to multiple pages then we've lost all of our rendering objects!

The solution to this problem is to create another page, separate from the first, and use it for a rendering area:


// Startup our rendering page, keeping it blank
const renderer = await browser.newPage();

// Set up the screencast event handler
client.on('Page.screencastFrame', ({ data, sessionId }) => {
  // TODO: pass `data` into the rendering page somehow
  client.send('Page.screencastFrameAck', { sessionId }).catch(() => {});
});

Now that the page we're screencasting is separate from our rendering context, we need a way to pass data from the Node event to the chromium rendering page. For that we're going to use Puppeteer's excellent evaluateHandle method, which lets you create a reference to an object in chromium's runtime that can be accessed from our Node runtime. Since NodeJS and chromium each have their own JavaScript environment, we can't simply pass data back and forth; everything has to cross a bridge of sorts between the two.
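As a quick illustration of that bridge (separate from our screencast code), a handle created in the page can be handed back as an argument to later evaluations:


// Create a handle to an object living in the page's JavaScript runtime...
const counter = await page.evaluateHandle(() => ({ count: 0 }));

// ...then pass that reference back over the bridge in a later call
await page.evaluate((obj) => { obj.count += 1; }, counter);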

For browserless specifically, we create a small helper class that does most of the heavy lifting for this rendering work. Keep in mind this is abbreviated from what's in our actual API:


const screencaster = await renderer.evaluateHandle(() => {
  const screencastAPI = class {
    private canvas: HTMLCanvasElement;
    private ctx: CanvasRenderingContext2D;
    private recordingFinish: Promise<void>;
    private recorder: any;
    private chunks: any[];

    constructor() {
      this.canvas = document.createElement('canvas');

      document.body.appendChild(this.canvas);

      this.ctx = this.canvas.getContext('2d') as CanvasRenderingContext2D;
      this.chunks = [];
    }

    private async beginRecording(stream: any): Promise<void> {
      return new Promise((resolve, reject) => {
        // @ts-ignore No MediaRecorder
        this.recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });
        this.recorder.ondataavailable = (e: any) => this.chunks.push(e.data);
        this.recorder.onerror = reject;
        this.recorder.onstop = resolve;
        this.recorder.start();
      });
    }

    private async download() {
      // Download the final webm file
    }

    async start({ width, height }: { width: number; height: number }) {
      this.canvas.width = width;
      this.canvas.height = height;
      // @ts-ignore No captureStream API
      this.recordingFinish = this.beginRecording(this.canvas.captureStream());
    }

    async draw(base64Data: string) {
      // `base64Data` is the JPEG frame payload from Page.screencastFrame
      const data = await fetch(`data:image/jpeg;base64,${base64Data}`)
        .then(res => res.blob())
        .then(blob => createImageBitmap(blob));

      this.ctx.clearRect(0, 0, this.canvas.width, this.canvas.height);
      this.ctx.drawImage(data, 0, 0);

      return this;
    }

    stop() {
      this.recorder.stop();
      this.download();
      return this;
    }
  };

  return new screencastAPI();
});

Once this is done, we can pipe screencasting data easily by re-using this handle. Please note that we have to pass the screencaster reference into the evaluateHandle API so that our arguments can be sent through the bridge into chromium.


client.on('Page.screencastFrame', ({ data, sessionId }) => {
  renderer.evaluateHandle((screencastAPI, data) => screencastAPI.draw(data), screencaster, data);
  client.send('Page.screencastFrameAck', { sessionId }).catch(() => {});
});
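For completeness, starting and stopping the recorder from Node reuses the same handle. A hypothetical sketch of the surrounding calls:


// Size the canvas to the page's viewport and begin recording
await renderer.evaluate(
  (api, width, height) => api.start({ width, height }),
  screencaster,
  viewport.width,
  viewport.height,
);

// ...once the session is over, stop the recorder and kick off the download
await renderer.evaluate((api) => api.stop(), screencaster);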

Finally, putting it all together, we need a way to handle downloading the webm and piping it back through to our response handler in Node. We can make use of some anchor magic in the browser by assigning the webm file as a download. Here's a scoped version of what that looks like in our rendering page:


const downloadAnchor = document.createElement('a');
downloadAnchor.href = '#';
downloadAnchor.textContent = 'Download video';
downloadAnchor.id = 'download';
document.body.appendChild(downloadAnchor);

// Sometime later when we have our recording chunks
const blob = new Blob(chunks, { type: 'video/webm' });

downloadAnchor.onclick = () => {
  downloadAnchor.href = URL.createObjectURL(blob);
  downloadAnchor.download = downloadName; // filename decided elsewhere
};

downloadAnchor.click();

Almost there! Now that we have chromium downloading the file for us, the last thing we need to do is have Puppeteer tell chromium where to download the file, and then resolve our request with it:


await renderer._client.send('Page.setDownloadBehavior', {
  behavior: 'allow',
  downloadPath: '/some/good/file-path',
});

Since browserless already has a "hook" for dealing with downloads, we just reuse that hook:


// filePath is the same location we gave setDownloadBehavior above
res.sendFile(filePath, (err) => {
  const message = err ?
    `Error streaming file back ${err}` :
    `File sent successfully`;

  // Cleanup our file-system
  rimraf(filePath, noop);
});

We can now observe our new spinner webm file, with the earlier jank now gone.

Wrapping it up

As you can see, getting screencasting working with just Puppeteer is quite the chore. While our implementation works well for video, it still doesn't support streaming audio, though we're looking to add that in the future. If you want to learn more about our screencast API, take a look here.

If you want to see our full-blown implementation of this newly refashioned API, feel free to check out our repo.

As always, we're happy to hear your feedback and impressions. Happy hacking!
