We decided to scrape the Irish electrical grid's public real-time dashboard to raise awareness of Ireland's position as a leading country in wind power generation.
What were the results?
Our Twitter account @IrishEnergyBot now has 2,000 followers receiving a daily report on how much wind generation there was on the Irish electric grid in the last 24 hours. Over the past ~18 months wind has met ~33% of Irish electrical demand on average. On windy days it regularly goes as high as 75%! By that measure we're #2 in the world. Only Denmark gets a larger share of its electricity from wind.
Why did you choose Browserless for automation?
@IrishEnergyBot scrapes its data from a free, public dashboard provided by Ireland's electrical grid operator. Because the dashboard loads data dynamically after the initial page load, a modern browser with JavaScript is required.
Thanks to Browserless I can keep my Puppeteer script in a simple, low-maintenance serverless environment. The connection is fast and reliable, and since I need just a few minutes of browser time each month, usage-based pricing works out great.
Browserless is an essential component of @IrishEnergyBot that I just never have to worry about.
import * as _ from "underscore";
import puppeteer = require("puppeteer");

const TIMEOUT_MS = 10000;

// or "roi" or "ni".
const REGION = "all";

(async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint: `wss://chrome.browserless.io?token=${process.env.BROWSERLESS_TOKEN}`,
  });

  try {
    const scrapedData = await scrape(await browser.newPage());
    console.log(JSON.stringify(scrapedData, undefined, 2));
  } finally {
    // always release the remote browser, even if scraping failed
    await browser.close();
  }
})();

async function scrape(page: puppeteer.Page) {
  // data frequently fails to load: empirically, if it hasn't loaded in the
  // first ~10s then we may as well fail.
  async function impatientGoto(url: string) {
    await page.goto(url, {
      waitUntil: "networkidle2",
      timeout: TIMEOUT_MS,
    });
  }
  async function impatientWaitForSelector(selector: string) {
    await page.waitForSelector(selector, {
      timeout: TIMEOUT_MS,
    });
  }

  // figures are contained in various divs, all with the class .stat-box. there
  // isn't a good way to find the ones we want without inspecting their text
  // content. this function extracts the number from the "stat box" under the
  // specified parent containing the specified phrase.
  async function extractStatBoxFigure(parent: string, keyPhrase: string) {
    const selector = `${parent} .stat-box`;
    await impatientWaitForSelector(selector);
    const statBoxesTextContents = await page.$$eval(selector, (elements) => {
      return elements.map((element) => {
        return element.textContent || "";
      });
    });

    const matchingStatBox = _.find(
      statBoxesTextContents,
      (s) => s.toLowerCase().indexOf(keyPhrase) >= 0
    );
    if (!matchingStatBox) {
      throw new Error(`no stat box found containing "${keyPhrase}"`);
    }
    return extractFirstNumber(matchingStatBox);
  }

  // each navigation is awaited so that a timeout rejects here rather than
  // surfacing as an unhandled promise rejection.
  await impatientGoto(`https://www.smartgriddashboard.com/#${REGION}/demand`);
  const demand_mw = await extractStatBoxFigure("#demand", "system demand");

  await impatientGoto(`https://www.smartgriddashboard.com/#${REGION}/generation`);
  const gen_mw = await extractStatBoxFigure("#generation", "system generation");

  await impatientGoto(`https://www.smartgriddashboard.com/#${REGION}/wind`);
  const wind_mw = await extractStatBoxFigure("#wind", "wind generation");

  return { gen_mw, demand_mw, wind_mw };
}

// extracts the first integer from a (potentially messy) blob of text, e.g.:
// " LATEST SYSTEM GENERATION 4,994 MW " -> 4994
function extractFirstNumber(s: string) {
  // remove commas, e.g. 4,800 -> 4800
  const withoutCommas = s.replace(/,/g, "");

  // https://stackoverflow.com/questions/8441915/tokenizing-strings-using-regular-expression-in-javascript
  const tokens = withoutCommas.match(/[^\s]+/g) || [];

  const firstNumber = _.find(
    tokens.map((t) => parseInt(t, 10)),
    (i) => !isNaN(i)
  );
  // compare against undefined so that a legitimate reading of 0 is not rejected
  if (firstNumber === undefined) {
    throw new Error("no number found");
  }
  return firstNumber;
}
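The figures the script returns are enough to derive the kind of percentage quoted in the results above. The following is only a sketch of that arithmetic, not the bot's actual reporting code: the `GridSnapshot` interface and the `windShareOfDemand` helper are hypothetical names, and the sample numbers are made up for illustration.

```typescript
// Hypothetical shape mirroring the object returned by scrape().
interface GridSnapshot {
  gen_mw: number;
  demand_mw: number;
  wind_mw: number;
}

// Returns wind generation as a percentage of system demand,
// rounded to one decimal place.
function windShareOfDemand(snapshot: GridSnapshot): number {
  if (snapshot.demand_mw <= 0) {
    throw new Error("demand must be positive to compute a share");
  }
  const percent = (snapshot.wind_mw / snapshot.demand_mw) * 100;
  return Math.round(percent * 10) / 10;
}

// Illustrative figures only, not real grid data.
const exampleSnapshot: GridSnapshot = {
  gen_mw: 5000,
  demand_mw: 4000,
  wind_mw: 1320,
};
console.log(`${windShareOfDemand(exampleSnapshot)}% of demand met by wind`);
```

Dividing by demand rather than by generation matches how the results are phrased ("wind has met ~33% of Irish electrical demand"); a report based on share of generation would divide by `gen_mw` instead.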
Sign up for a free account and get an API key. You have 6 hours of usage for free! After that, you can pay as you go, and only pay per second that you use!
If you’ve already tested our service and want a dedicated machine for your requests, you might be interested in signing up for a dedicated account. This works best if you’re doing screencasting or have a heavy load of requests, since you won’t be sharing resources.
If you’re using one of our hosted services, be that usage-based or capacity-based, just connect to our WebSocket securely with your token to start web scraping!
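As a rough sketch of that connection step, the WebSocket URL can be built in one place, failing early if the token is missing. The host below is the one used in the script above; the `browserlessEndpoint` helper is a hypothetical name introduced here for illustration, and URL-encoding the token is an assumption made for safety rather than a documented requirement.

```typescript
// Builds the WebSocket URL to pass as browserWSEndpoint to puppeteer.connect().
// Hypothetical helper: only the host and the "token" query parameter come
// from the script above.
function browserlessEndpoint(token: string | undefined): string {
  if (!token) {
    // fail before attempting a connection that would be rejected anyway
    throw new Error("BROWSERLESS_TOKEN is not set");
  }
  return `wss://chrome.browserless.io?token=${encodeURIComponent(token)}`;
}

// Placeholder token, for illustration only.
console.log(browserlessEndpoint("example-token"));
```

In the script above this would be used as `puppeteer.connect({ browserWSEndpoint: browserlessEndpoint(process.env.BROWSERLESS_TOKEN) })`, so a missing environment variable produces a clear error instead of a connection attempt with the literal token `undefined`.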