Stream login windows during Puppeteer scripts, with our Hybrid Automations


We’ve just launched hybrid automations at Browserless. Let’s take a look at how the feature protects user data when an automation needs someone to log in.

No more asking for their login credentials

It’s very common to run scripts that involve someone else’s website. For example, maybe you automate a set of actions in an HR system.

That’s all fine in theory, but logging in presents a challenge.

Unfortunately, we’ve seen a trend of these scripts asking users for their username and password, or even a 2FA code, and then entering those details into the system on the user’s behalf. Doing so is hugely insecure and violates most site agreements. So how do we work around this?

Bringing users into the automation

We wanted to create an alternative to asking for sensitive data upfront. The new workflow could go:

  1. Launch an automation with Puppeteer
  2. Start a browser and navigate to the relevant login page
  3. Live-stream this page to the user
  4. Let the user enter their details, validate a 2FA or solve a captcha in an iframe
  5. Carry on with the automation now that it has access to the system

That would deliver all the benefits of automating a process without compromising the security of the system.

Introducing hybrid automations with Browserless

With our new hybrid automations, you can create the workflow described above.

It lets you write a normal Puppeteer script, with extra events and APIs you can use to “hook” into these workflows. Browserless interacts with the browser at the CDP layer to add the custom behavior, so there is no need to modify the Puppeteer library.

Calling Browserless.liveURL returns a fully-qualified URL that can be opened in any web browser. The URL doesn’t require a token, so you can share it with users, who can then click, type or perform other interactions on the page.

The livestream looks and feels like their normal browser, despite using a headless browser behind the scenes.


import puppeteer from 'puppeteer-core';

const login = async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint: 'ws://localhost:3000?token=YOUR-API-TOKEN',
  });
  const page = await browser.newPage();

  await page.goto('https://www.gmail.com/');
  const cdp = await page.createCDPSession();
  const { liveURL } = await cdp.send('Browserless.liveURL');

  // Send this one-time link to your end-users.
  // This URL doesn't contain an API token, so no
  // secrets are leaked by sharing it.
  console.log(`Shareable Public URL:`, liveURL);

  // This event is fired after a user closes the page.
  // Assuming the page is where it's supposed to be, we can
  // proceed with doing further automations
  await new Promise((r) => cdp.on('Browserless.liveComplete', r));

  // Implement your scraping, data collections or further automations here.

  // Don't forget to close!
  await browser.close();
};

login().catch((e) => console.error(e));
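The script above waits indefinitely for Browserless.liveComplete, which may be a problem if a user never finishes logging in. One possible way to bound the wait is sketched below. The `waitForEvent` helper and the five-minute limit are illustrative assumptions, not part of the Browserless API; the helper only assumes the emitter exposes `on` and `off`, as Puppeteer's CDP session does.

```javascript
// Hypothetical helper (not a Browserless API): resolves with the event
// payload when `eventName` fires on `emitter`, or rejects once
// `timeoutMs` milliseconds have passed, whichever happens first.
const waitForEvent = (emitter, eventName, timeoutMs) =>
  new Promise((resolve, reject) => {
    const onEvent = (payload) => {
      clearTimeout(timer);
      emitter.off(eventName, onEvent); // stop listening once we have a result
      resolve(payload);
    };

    const timer = setTimeout(() => {
      emitter.off(eventName, onEvent); // clean up the listener on timeout
      reject(new Error(`Timed out after ${timeoutMs} ms waiting for ${eventName}`));
    }, timeoutMs);

    emitter.on(eventName, onEvent);
  });

// Possible usage inside the login function, assuming a five-minute limit:
//   await waitForEvent(cdp, 'Browserless.liveComplete', 5 * 60 * 1000);
```

If the promise rejects, the script can close the browser and report the failure rather than holding the session open.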

User logins are the most popular use for this we’ve heard of, but they’re not the only option. You can use this approach to get the user involved with any type of page interaction, loaded as an iframe within your UI.

Whether you just want a captcha solved or need another task performed, you can set up a screencast to involve the user.
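As a rough sketch of the iframe approach, the snippet below builds the markup for embedding a live URL in a page. The `buildLiveFrame` and `escapeAttr` helpers, the default dimensions and the title text are all illustrative assumptions; the only input taken from the script above is the liveURL string.

```javascript
// Hypothetical helper: escapes a value for safe use inside a
// double-quoted HTML attribute.
const escapeAttr = (value) =>
  String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');

// Hypothetical helper: returns iframe markup that loads the live URL.
// The default width and height are arbitrary placeholders.
const buildLiveFrame = (liveURL, { width = 1280, height = 720 } = {}) =>
  `<iframe src="${escapeAttr(liveURL)}" width="${Number(width)}" ` +
  `height="${Number(height)}" title="Live browser session"></iframe>`;
```

Your backend could return this markup, or just the URL itself, to the front end once Browserless.liveURL has resolved.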

Side note: If this is your first time learning about Browserless, the tl;dr is that we offer pools of managed browsers, all hosted and ready to use at scale with Puppeteer or Playwright.

Reusing the session after login

If you wish to run multiple automations with the logged-in session, you can open multiple pages on the same browser instance so that they share its cookies.

To keep those cookies, sessions and cache between connections, specify a workspace directory on the initial connection by adding the &--user-data-dir flag with a unique identifier.


puppeteer.connect({
  browserWSEndpoint: 'wss://chrome.browserless.io/?token=YOUR_API_KEY&--user-data-dir=~/browserless-cache-123',
});

On subsequent browser connections, pass the same flag with the identifier of the session you want, and the browser will reuse the cookies, sessions and cache that were stored initially. This way, your clients won’t have to log in every time you need their logged-in session.
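One way to keep the identifier consistent across connections is to derive the endpoint from a per-user session ID. The following is a minimal sketch: the `buildEndpoint` helper and the idea of keying the directory on a session ID are assumptions for illustration, while the host and the ~/browserless-cache- path prefix simply mirror the example above.

```javascript
// Hypothetical helper: builds a connection URL whose workspace directory
// is derived from a stable session identifier, so that every connection
// for the same user points at the same cookies, sessions and cache.
const buildEndpoint = (token, sessionId) =>
  `wss://chrome.browserless.io/?token=${encodeURIComponent(token)}` +
  `&--user-data-dir=~/browserless-cache-${encodeURIComponent(sessionId)}`;

// Possible usage on both the first and any later connection:
//   puppeteer.connect({ browserWSEndpoint: buildEndpoint(API_TOKEN, '123') });
```

Because the same inputs always produce the same URL, the first and later connections for a given user end up on the same workspace directory.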

Keep in mind that the default workspace directory lifetime is 7 days.

Want to add hybrid functionality to your automations?

Hybrid automations are now available for users on our hosted enterprise plans. They’re the first to utilize this new CDP-based approach, and we’ll be adding more new features to this API to enhance the experience.

If you'd like to try it out for your own automations, then click below to contact our team.

Contact Browserless
