Web Scraping Business Ideas for Everyone

Girish Patil
December 30, 2022

Our purpose with this article is education. We want you to know about different ways you can use web scraping in your business and life.

Areas that we will explore in this article

  • Software / IT
  • Sports
  • E-commerce
  • Travel
  • Finance
  • Agriculture
  • Marketing
  • Real estate
  • Media, Entertainment, and News
  • Jobs
  • Automate research and study
  • A surprise at the end

Before we get into different areas and their use cases, you should check out some of the articles listed below to learn more about browser automation.

  1. What is browser automation? A guide to getting started
  2. What is data scraping? Tutorial for beginners
  3. Quick start with Browserless.io

Let’s see how different businesses can be built around web scraping under the above-mentioned areas. Note that all the example scripts and reference guides will be using browserless.io.

1. Software / IT

Web scraping as a service

Many products don’t offer public APIs. You can build scrapers/APIs around these products and monetize them.
For example, APIs for amazon.com, alibaba.com, government websites, etc. can be built and provided as a service. Other developers building solutions can use such APIs directly, since they don’t have to worry about scraping the web for data or maintaining the infrastructure for it.

Example code to scrape the Today’s Deals page of amazon.com:


const puppeteer = require('puppeteer');

const API_TOKEN = process.env.API_TOKEN;

if (!API_TOKEN) {
  throw new Error('No API_TOKEN provided');
}

(async () => {
  try {
    const browser = await puppeteer.connect({
      browserWSEndpoint:
      'wss://chrome.browserless.io?token=' + API_TOKEN,
    });

    const page = await browser.newPage();

    await page.goto('https://www.amazon.com/gp/goldbox');

    // Note: this class name is generated by Amazon's build and may
    // change over time; update the selector if the script stops working
    await page.waitForSelector('.DealGridItem-module__dealItemDisplayGrid_e7RQVFWSOrwXBX4i24Tqg');

    const frontPageOffers = await page.evaluate(() => {
      const items = document.querySelectorAll('.DealGridItem-module__dealItemDisplayGrid_e7RQVFWSOrwXBX4i24Tqg');
      return Array.from(items).map(item => item.textContent);
    });

    console.log("amazon.com today's deals", frontPageOffers);

    await browser.close();
  } catch (error) {
    console.log(error);
  }
})();

The output of the above script will look something like this:


[
  'Up to 51% offEnds in 02:19:22Up to 51% offEnds in 02:19:22Vera Bradley Handbags and Accessories',
  'Up to 30% offDealUp to 30% offDealElectronics & Office Products from Amazon Basics',
  'Up to 61% offTop dealUp to 61% offTop dealPreschool Toys',
  "Up to 32% offDealUp to 32% offDealMen's Dress Shirts from Calvin Klein and Kenneth Cole",
  'Up to 53% offDealUp to 53% offDealG-Star Raw Apparel',
  "Up to 40% offTop dealUp to 40% offTop deal33,000ft Men' and Women's Outdoor clothing",
  'Up to 57% offTop dealUp to 57% offTop dealColgate Hum & whitening',
  'Up to 50% offTop dealUp to 50% offTop dealDAYBETTER Lighting Products',
  'Up to 50% offTop dealUp to 50% offTop dealVahdam India Gifts',
  'Up to 55% offTop dealUp to 55% offTop dealFIT KING Foot Massagers',
  ...
]
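The raw textContent strings above can be parsed into structured records. Here is a minimal sketch; the parsing logic is an assumption based on the sample output (each deal card exposes its badge text twice, so every string looks like prefix + prefix + title) and may need adjusting if Amazon changes its markup.

```javascript
// Parse a raw deal string such as
// 'Up to 51% offEnds in 02:19:22Up to 51% offEnds in 02:19:22Vera Bradley Handbags and Accessories'
// into { discount, title }. The badge text appears twice, so we find
// the longest duplicated prefix and treat the remainder as the title.
function parseDeal(raw) {
  const discountMatch = raw.match(/Up to (\d+)% off/);
  const discount = discountMatch ? Number(discountMatch[1]) : null;

  let title = raw;
  for (let i = Math.floor(raw.length / 2); i > 0; i--) {
    if (raw.slice(0, i) === raw.slice(i, 2 * i)) {
      title = raw.slice(2 * i);
      break;
    }
  }
  return { discount, title };
}

console.log(parseDeal(
  'Up to 51% offEnds in 02:19:22Up to 51% offEnds in 02:19:22Vera Bradley Handbags and Accessories'
));
// → { discount: 51, title: 'Vera Bradley Handbags and Accessories' }
```

A cleanup step like this is what turns a raw scraper into a sellable API: consumers get structured JSON instead of page text.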

You can make use of Browserless.io’s APIs as well; read more about them here. The scrape API in particular can be powerful in many places.


Website health monitoring service

Performance metrics are some of the critical numbers for a developer to gauge the real-world behavior of a website. These metrics include details on latency, time-to-interaction of a website, layout shifts, and many other vital areas.

Monitoring these metrics can have a massive impact on a website’s usability. It can also give you feedback on SEO, which can help you rank better in search engine results. Since this is hard to keep up with when you are deploying changes frequently, a service that tracks these metrics automatically at intervals, or on every new deployment, can have a considerable impact.

The /stats API by Browserless gives you important metrics about a website, including accessibility details, best practices, performance, layout-shift information, time-to-interaction, and more. Since it is powered by Google’s Lighthouse project, a widely accepted auditing tool, you can rely on its results. If you are working on a product around this, you can focus on the business logic and building the solution without worrying about infrastructure.

For example, you can try generating stats for google.com with a simple curl command. Make sure to replace YOUR_API_TOKEN with your actual token. You can also go through https://docs.browserless.io/v1/ to set up your Browserless.io account.


curl -X POST \
  https://chrome.browserless.io/stats?token=YOUR_API_TOKEN \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '
{
  "url": "https://google.com"
}'

Read more about the /stats API here: https://docs.browserless.io/v1/APIs/stats

2. Sports

Content curation for print magazines

Content curation plays a major role in any magazine. You can develop a flow that can scrape content for print magazines by staying up to date on athletes’ social profiles, news, etc.

Sports statistics website

By scraping data from different sports websites, you can create a website that provides detailed analytics of a particular match, information about the schedule, teams, etc. You can also share a detailed report after every match by automatically scraping different sports websites.

The example script below fetches matches from espncricinfo.com and nfl.com:


const puppeteer = require('puppeteer');

const API_TOKEN = process.env.API_TOKEN;

if (!API_TOKEN) {
  throw new Error('No API_TOKEN provided');
}

(async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint:
    'wss://chrome.browserless.io?token=' + API_TOKEN,
  });

  const page = await browser.newPage();

  await page.goto("https://www.espncricinfo.com");

  const cricketMatches = await page.evaluate(() => {
    const elements = document.querySelectorAll('.card.scorecard');
    const content = [];
    elements.forEach(ele => content.push(ele.textContent));
    return content;
  });

  // The outputs are in raw text format, you can format them for better structure
  console.log("Cricket matches from espncricinfo.com", cricketMatches);

  // Use a larger viewport: the match list is hidden on smaller screens,
  // and Puppeteer opens pages at a default viewport of 800x600
  await page.setViewport({ width: 1400, height: 1200 });
    
  await page.goto("https://www.nfl.com/");

  await page.waitForSelector('.score-strip-list-container .score-strip-game');

  const nflMatches = await page.evaluate(() => {
    const elements = document.querySelectorAll('.score-strip-list-container .score-strip-game');
    const content = [];
    elements.forEach(ele => content.push(ele.textContent));
    return content;
  });

  console.log("NFL matches from nfl.com", nflMatches);

  await browser.close();
})();

3. E-commerce

Product review aggregation
You can scrape product review websites, forums, and e-commerce websites to create a database of product reviews and ratings, which can be used to help consumers make informed purchasing decisions.

Product monitoring service
Product monitoring service can be developed as a tool for e-commerce website owners to keep track of their competitor’s inventory and pricing. This solution can help them adjust and optimize their product pricing according to their competition.
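As a sketch of the repricing step in such a service, the logic below takes your price and a list of scraped competitor prices and suggests an action. The function name and the 2% undercut margin are illustrative assumptions, not a real pricing strategy; the competitor prices would come from a scraper like the ones shown earlier.

```javascript
// Given our listing price and freshly scraped competitor prices,
// suggest a repricing action. The 2% undercut margin is an arbitrary
// example value a real service would make configurable.
function suggestPrice(ourPrice, competitorPrices) {
  const lowest = Math.min(...competitorPrices);
  if (ourPrice <= lowest) {
    return { action: 'keep', price: ourPrice };
  }
  // Undercut the cheapest competitor by 2%, rounded to cents
  const suggested = Math.round(lowest * 0.98 * 100) / 100;
  return { action: 'lower', price: suggested };
}

console.log(suggestPrice(24.99, [22.5, 26.0, 23.1]));
// → { action: 'lower', price: 22.05 }
```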

Automation for e-commerce owners
Business owners who maintain multiple stores or upload their products on numerous e-commerce platforms usually have to go through the repetitive task of creating and uploading content on various platforms. You can build a platform to manage all their listings in one place. For those e-commerce platforms which don’t have APIs, build automation that can upload/edit and manage listings of multiple e-commerce platforms from one place.


4. Travel

You can tap into one of the most booming industries, and tackle some of the significant problems of travelers.

  • Scraping data from travel websites, you can create a search engine that helps people find the best deals on flights, hotels, and rental cars.
  • A website that scrapes and compares all listings from Airbnb, Google Maps, and other hotel booking platforms for a particular destination. (P.S. If you want to know how to scrape Google Maps, you can check out this article)
  • Automate flight/hotel booking. This can help customers book the best price available anytime, even when asleep.
  • A helper tool that can generate a list of necessary details on a particular travel route, such as hospitals, gas stations, eateries, etc.


5. Finance

Stock market and trading

  • Let users hook their trading platforms/investment tools into your product and automate routine tasks, like generating daily trading reports and analyzing which trades made a profit or a loss.
  • Let them control their trades algorithmically on traditional trading platforms.
  • Generate a brief of the news that can help decide on a trade.
  • Send a morning brief of important news about a list of companies before trading hours start.

Venture capital

  • Generate deep insights about companies for investors. Knowing where a company’s name appears on the web can be crucial. Information on the founding team, the number of funding rounds, etc. can be additional deciding points for any investor.

Example script to log in to tradingview.com:


// tradingview-login.js
const puppeteer = require("puppeteer");

const API_TOKEN = process.env.API_TOKEN;

if (!API_TOKEN) {
  throw new Error("No API_TOKEN provided");
}

// It's better to store these in a .env file
const creds = {
  EMAIL: 'YOUR_EMAIL_HERE',
  PASSWORD: 'YOUR_PASSWORD_HERE',
};

(async () => {
  const browser = await puppeteer.connect({ browserWSEndpoint: 'wss://chrome.browserless.io?token=' + API_TOKEN });

  const page = await browser.newPage();

  await page.goto("https://in.tradingview.com/");
  await page.click(".tv-header__user-menu-button.tv-header__user-menu-button--anonymous.js-header-user-menu-button");
  await page.waitForSelector("[data-name=header-user-menu-sign-in]");
  await page.click("[data-name=header-user-menu-sign-in]");

  await page.waitForSelector(".tv-signin-dialog__area > div > .i-clearfix > div > .tv-signin-dialog__social");
  await page.click(".tv-signin-dialog__area > div > .i-clearfix > div > .tv-signin-dialog__social");

  await page.waitForSelector("[id^=email-signin__user-name-input]");

  await page.focus("[id^=email-signin__user-name-input]");

  await page.type("[id^=email-signin__user-name-input]",creds.EMAIL);

  await page.focus("[id^=email-signin__password-input]");

  await page.type("[id^=email-signin__password-input]",creds.PASSWORD);

  await page.click("[id^=email-signin__submit-button]");

  await page.waitForSelector("[data-symbol-short=USDINR]");

  const usdinr = await page.evaluate(() => {
    return document.querySelector('[data-symbol-short=USDINR]').textContent;
  });

  console.log("usd/inr", usdinr);

  await browser.close();
})();

If you run the above script

API_TOKEN=YOUR_TOKEN_HERE node tradingview-login.js

You should get something like this as output: the current USD/INR forex rate at the time you run the script.

usdinr UUSDINRRMarket Open82.1750−0.1060−0.13%0

Although this is a simple login script, it demonstrates how to log in to platforms via a headless browser. You can build on top of it to automate trades, scrape daily reports and feed them to Google Sheets, and create countless solutions like these.

6. Agriculture

  • Market research can be a massive boon for agriculture. Knowing what to grow in the upcoming season, knowledge of price trends, and alerts or predictions about climate conditions are all useful for farmers and investors.
  • Farmers are often unaware of trends in other parts of their country or the world where the soil is of the same type. Keep them informed about new tools, utilities, fertilizers, and crops that can be used or grown on their land.

7. Marketing

Social media is the go-to for any marketer to find users or people who can advertise their products/services.

  • Scrape platforms like Instagram, Twitter, TikTok, etc. to create a database with information about influencers, the areas they are active in, and various other factors. This can act as a lead-generation tool for marketing teams. By providing a way to collect and download a list of good leads from various platforms, you can help marketing teams know exactly whom to get in touch with.
  • Collect detailed insights on how a product/brand is doing on various social media platforms w.r.t. trending hashtags and followers, or on the influencers working with those brands.
  • Content production is also one of the significant problems for marketing teams. A solution that scrapes content from the web and uses AI to generate content for their social media promotions can be very interesting.
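The lead-generation idea above can be sketched as a ranking step over scraped profiles. The field names (`followers`, `avgLikes`, `avgComments`) and the engagement-rate formula are illustrative assumptions; the profile data would come from the scraping step.

```javascript
// Rank scraped influencer profiles by a simple engagement rate:
// (average likes + comments) per follower. Small accounts with very
// engaged audiences surface above large accounts with passive ones.
function rankByEngagement(profiles) {
  const rate = p => (p.avgLikes + p.avgComments) / p.followers;
  return [...profiles].sort((a, b) => rate(b) - rate(a));
}

const ranked = rankByEngagement([
  { handle: '@big', followers: 1000000, avgLikes: 5000, avgComments: 200 },
  { handle: '@niche', followers: 20000, avgLikes: 1800, avgComments: 150 },
]);
console.log(ranked.map(p => p.handle));
// → [ '@niche', '@big' ]
```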

8. Real estate

  • Scrape real estate listings from multiple websites and create a comprehensive database of available properties, complete with photos, descriptions, and pricing information.
  • Analyze current market trends and future hot spots for real estate.
  • Give insights on how competitors are doing around an area w.r.t. price, construction types, etc.
  • Scrape real estate listings for price, images, area, type, and various other factors to provide in-depth market analysis.
  • Adjust the pricing of a real estate owner’s listings automatically based on specific trends/market conditions to help them make higher profits.
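As a sketch of the market-analysis step, here is how scraped listings could be aggregated into an average price per area. The field names are illustrative assumptions; the listings themselves would come from the scraping step.

```javascript
// Aggregate scraped listings into an average price per area.
// Each listing is assumed to have { area, price } fields.
function averagePriceByArea(listings) {
  const totals = {};
  for (const { area, price } of listings) {
    totals[area] = totals[area] || { sum: 0, count: 0 };
    totals[area].sum += price;
    totals[area].count += 1;
  }
  return Object.fromEntries(
    Object.entries(totals).map(([area, t]) => [area, t.sum / t.count])
  );
}

console.log(averagePriceByArea([
  { area: 'Downtown', price: 500000 },
  { area: 'Downtown', price: 700000 },
  { area: 'Suburbs', price: 300000 },
]));
// → { Downtown: 600000, Suburbs: 300000 }
```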


9. Media, Entertainment and News

Scrape news websites and blogs to create a comprehensive news feed that covers a wide range of topics, including politics, business, sports, and entertainment. Use web scraping to gather data from online sources and conduct market research on a wide range of topics, including consumer trends, industry analysis, and competitor analysis.

  • For media companies, help them see what competitors are up to by scraping news websites and other media outlets. Provide ways to research; for example, understanding which genre could be a hit at what time of the year is good to know before launching a new show or planning new content. An in-depth analysis of past data from various websites can help an entertainment company decide when to announce or launch something.
  • Help companies with talent hunting by scraping social media, other media websites, and personal portfolios.
  • Finding the right content to watch without wasting time looking for it is a challenge. You can build solutions that show consumers exactly what they need based on their history. You can also automatically allot time and send daily notifications.

Developing a news aggregation platform
By scraping data from news websites, you can create a platform that curates and organizes the latest news and stories from around the world.


10. Jobs

Job listing website
Create a job listings aggregator website by automatically scraping the career pages of companies. Notify job seekers when there are openings in their areas of interest.

Newsletter service
You can also create a weekly newsletter service that can send a list of job openings.
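The notification part of such an aggregator can be sketched as a diff between two scrape runs. Using the job URL as a stable key is an assumption; real listings would need whatever identifier the career page provides.

```javascript
// Compare the previous scrape of a career page with the current one
// and return the newly added openings. Listings are assumed to carry
// a stable URL that can serve as a key.
function newOpenings(previous, current) {
  const seen = new Set(previous.map(job => job.url));
  return current.filter(job => !seen.has(job.url));
}

const yesterday = [{ url: '/jobs/1', title: 'Backend Engineer' }];
const today = [
  { url: '/jobs/1', title: 'Backend Engineer' },
  { url: '/jobs/2', title: 'Data Analyst' },
];
console.log(newOpenings(yesterday, today));
// → [ { url: '/jobs/2', title: 'Data Analyst' } ]
```

Run this after every scheduled scrape, and the returned jobs are exactly what goes into the alert emails or the weekly newsletter.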


11. Automate research and study

Web scraping is one of the most common ways to generate large datasets for machine learning models.

  • Provide automated flows to generate keyword-specific data, citations, and research material by scraping the web.
  • Help news agencies dig deeper into a keyword, detect fake news, look for trending and breaking news, etc.
  • Generate summaries of web pages relevant to keywords to ease the research further.


12. Micro SaaS ideas

  • E-commerce price notifier & aggregator
    • You can create a Twitter bot, Telegram bot, newsletter, or SMS broadcaster that sends a notification when the price of an item drops below a value specified by the customer. You can charge a nominal fee for the customers of this tool.
    • You can also make something like an “Amazon product of the day” bot, which tweets a particular product in some category at its lowest price.
  • Education
    • Help new college students/parents decide on an educational institution. Scrape the internet for positive/negative feedback, academic results, crime rates, facilities, and fees of an educational institution w.r.t. different streams. Let the students/parents make an informed decision.
  • Social media watcher
    • Most social media platforms (or any content platforms) do not provide intelligent tools to manage spam, hate, etc. You can build a profile watcher that monitors spam/hate content for a particular user, who can get notified automatically whenever something unwanted is found on their profiles, posts, or feed.
  • Automated reservation/booking platforms
    • For high-traffic websites like event platforms and sports ticketing platforms, you can develop tools to automate booking as soon as the tickets are launched.
  • Weather forecasting website
    • By scraping data from weather websites, you can create a website that provides accurate and up-to-date weather forecasts for any location.
  • Recipe search engine
    • Scrape recipe websites, and cooking show websites to create a searchable database of recipes, complete with ingredients, instructions, and nutritional information.
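The price notifier idea at the top of this list boils down to a simple trigger check after each scrape. Here is a minimal sketch with illustrative field names; the delivery channel (Twitter/Telegram/SMS) is left out.

```javascript
// Trigger logic for a price-drop notifier: given a customer's watch
// entry and a freshly scraped price, decide whether to send an alert
// and what the message should say.
function checkWatch(watch, scrapedPrice) {
  if (scrapedPrice > watch.targetPrice) {
    return { notify: false };
  }
  return {
    notify: true,
    message: `${watch.product} is now $${scrapedPrice.toFixed(2)} ` +
      `(target: $${watch.targetPrice.toFixed(2)})`,
  };
}

console.log(checkWatch({ product: 'Headphones', targetPrice: 50 }, 44.99));
// → { notify: true, message: 'Headphones is now $44.99 (target: $50.00)' }
```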

That’s all for today, and thank you for reading this article. I hope it has given you some insight into how you can use web scraping to build solutions and develop a business around them.

Browserless can be your best bet if you are building a web scraping business. You can focus on developing your business and let Browserless handle scaling, maintaining, and upgrading infrastructure, plus all the other backend tasks that might steer you away from growing your business. Give it a try today: visit https://www.browserless.io to know more.

