How @IrishEnergyBot used web scraping to help create awareness around Green Energy 🌱

May 6, 2022

@IrishEnergyBot used web scraping by Browserless

What was the goal of your automation?

We decided to scrape the Irish electrical grid's public real-time dashboard to help create awareness around how Ireland is a leading country in wind power generation.

What were the results?

Our Twitter account @IrishEnergyBot now has 2,000 followers receiving a daily report on how much wind generation there was on the Irish electric grid in the last 24 hours. Over the past ~18 months, wind has met ~33% of Irish electrical demand on average. On windy days it regularly goes as high as 75%! By share of electricity met by wind, we're #2 in the world; only Denmark ranks higher.

Why did you choose Browserless for automation?

@IrishEnergyBot scrapes its data from a free, public dashboard provided by Ireland's electrical grid operator. Because the dashboard loads data dynamically after the initial page load, a modern browser with JavaScript is required.

Thanks to Browserless I can keep my Puppeteer script in a simple, low-maintenance serverless environment. The connection is fast and reliable, and since I need just a few minutes of browser time each month, usage-based pricing works out great.

Browserless is an essential component of @IrishEnergyBot that I just never have to worry about.

import * as _ from "underscore";
import puppeteer = require("puppeteer");

const TIMEOUT_MS = 10000;

// or "roi" or "ni".
const REGION = "all";

(async () => {
  // connect to a remote browser hosted by Browserless instead of launching one locally.
  const browser = await puppeteer.connect({
    browserWSEndpoint: `wss://chrome.browserless.io?token=${process.env.BROWSERLESS_TOKEN}`,
  });

  try {
    const scrapedData = await scrape(await browser.newPage());
    console.log(JSON.stringify(scrapedData, undefined, 2));
  } finally {
    await browser.close();
  }
})();

async function scrape(page: puppeteer.Page) {
  // data frequently fails to load: empirically, if it hasn't loaded in the
  // first ~10s then we may as well fail.
  async function impatientGoto(url: string) {
    await page.goto(url, {
      waitUntil: "networkidle2",
      timeout: TIMEOUT_MS,
    });
  }
  async function impatientWaitForSelector(selector: string) {
    await page.waitForSelector(selector, {
      timeout: TIMEOUT_MS,
    });
  }

  // figures are contained in various divs, all with the class .stat-box. there
  // isn't a good way to find the ones we want without inspecting their text
  // content. this function extracts the number from the "stat box" under the
  // specified parent containing the specified phrase.
  async function extractStatBoxFigure(parent: string, keyPhrase: string) {
    const selector = `${parent} .stat-box`;
    await impatientWaitForSelector(selector);
    const statBoxesTextContents = await page.$$eval(selector, (elements) => {
      return elements.map((element) => {
        return element.textContent || "";
      });
    });

    const matchingStatBox = _.find(
      statBoxesTextContents,
      (s) => s.toLowerCase().indexOf(keyPhrase) >= 0
    );
    if (!matchingStatBox) {
      throw new Error(`no stat box found containing "${keyPhrase}"`);
    }
    return extractFirstNumber(matchingStatBox);
  }

  // await each navigation so that a timeout surfaces as an error here rather
  // than as an unhandled promise rejection.
  await impatientGoto(`https://www.smartgriddashboard.com/#${REGION}/demand`);
  const demand_mw = await extractStatBoxFigure("#demand", "system demand");

  await impatientGoto(`https://www.smartgriddashboard.com/#${REGION}/generation`);
  const gen_mw = await extractStatBoxFigure("#generation", "system generation");

  await impatientGoto(`https://www.smartgriddashboard.com/#${REGION}/wind`);
  const wind_mw = await extractStatBoxFigure("#wind", "wind generation");

  return { gen_mw, demand_mw, wind_mw };
}

// extracts the first integer from a (potentially messy) blob of text, e.g.:
//   "     LATEST SYSTEM GENERATION      4,994 MW   " -> 4994
function extractFirstNumber(s: string) {
  // remove commas, e.g. 4,800 -> 4800
  const withoutCommas = s.replace(/,/g, "");

  // https://stackoverflow.com/questions/8441915/tokenizing-strings-using-regular-expression-in-javascript
  const tokens = withoutCommas.match(/[^\s]+/g) || [];

  const firstNumber = _.find(
    tokens.map((t) => parseInt(t, 10)),
    (i) => !isNaN(i)
  );
  // compare against undefined so that a legitimate reading of 0 is not
  // mistaken for "no number found".
  if (firstNumber === undefined) {
    throw new Error("no number found");
  }
  return firstNumber;
}
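
Once `scrape` returns its three figures, turning them into a headline number is simple arithmetic. The following is a minimal sketch rather than the bot's actual reporting code, which this post does not show: `GridSnapshot`, `windSharePercent`, and `formatReport` are hypothetical names, and the sketch works on a single snapshot instead of a 24-hour average.

```typescript
// Shape of the object returned by scrape() above; the interface name is hypothetical.
interface GridSnapshot {
  gen_mw: number;
  demand_mw: number;
  wind_mw: number;
}

// Percentage of system demand met by wind for one snapshot.
function windSharePercent(snapshot: GridSnapshot): number {
  if (snapshot.demand_mw <= 0) {
    throw new RangeError("demand_mw must be positive");
  }
  return (snapshot.wind_mw / snapshot.demand_mw) * 100;
}

// One-line summary suitable for logging (the wording is an assumption).
function formatReport(snapshot: GridSnapshot): string {
  const share = windSharePercent(snapshot).toFixed(1);
  return `Wind met ${share}% of demand (${snapshot.wind_mw} MW of ${snapshot.demand_mw} MW)`;
}

// Made-up figures, chosen only to keep the arithmetic easy to follow.
const example: GridSnapshot = { gen_mw: 5200, demand_mw: 5000, wind_mw: 1250 };
console.log(formatReport(example)); // Wind met 25.0% of demand (1250 MW of 5000 MW)
```

Rejecting a non-positive demand figure up front keeps a bad scrape from producing an `Infinity` or `NaN` percentage further down the line.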

How to get started with Browserless

There are different ways to use our product.

Use our online debugger to try it out!

Sign up for a free account and get an API key. You get 6 hours of usage for free! After that, you pay as you go, and only for the seconds you use.

You can self-host for development purposes by using our open-source Browserless Docker image.

If you’ve already tested our service and want a dedicated machine for your requests, you might be interested in signing up for a dedicated account. This works best if you’re doing screencasting or have a heavy load of requests, since you won’t be sharing resources.

If you’re using one of our hosted services, whether usage-based or capacity-based, just connect to our WebSocket securely with your token to start web scraping!
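
Connecting with a token comes down to passing a WebSocket URL to `puppeteer.connect`. The sketch below only builds that URL. It reuses the endpoint from the script earlier in this post, while the `buildBrowserlessEndpoint` helper and its `baseUrl` parameter are hypothetical and not part of any Browserless API.

```typescript
// Endpoint taken from the script earlier in this post.
const DEFAULT_BASE_URL = "wss://chrome.browserless.io";

// Hypothetical helper: validates its inputs and returns the WebSocket URL.
function buildBrowserlessEndpoint(
  token: string,
  baseUrl: string = DEFAULT_BASE_URL
): string {
  if (token.trim() === "") {
    throw new Error("a Browserless token is required");
  }
  // Insist on wss:// so the token is never sent over an unencrypted socket.
  if (baseUrl.indexOf("wss://") !== 0) {
    throw new Error("expected a secure wss:// endpoint");
  }
  // Encode the token in case it contains characters that are unsafe in a query string.
  return `${baseUrl}?token=${encodeURIComponent(token)}`;
}

// "my-token" is a placeholder; a real token would usually come from an
// environment variable, as in the script earlier in this post.
const endpoint = buildBrowserlessEndpoint("my-token");
console.log(endpoint);
// The result would then be passed as: puppeteer.connect({ browserWSEndpoint: endpoint })
```

Failing fast on an empty token gives a clearer error than a rejected WebSocket handshake would.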
