How To Build Your Own X (Twitter) Scraper Tool
Introduction
If you've tried to build anything serious on X (formerly Twitter) over the last couple of years, you've probably hit a wall: The old, generous Twitter API is effectively gone for most developers.
The platform has retired legacy tiers, removed free access, and pushed most meaningful access behind paid plans with aggressive rate limits and higher pricing.
That's why so many teams have shifted from the official Twitter API to scraping X's web UI directly, carefully and within legal and ethical boundaries, to collect public data. Instead of a traditional Twitter scraper API, you run a real browser, load a Twitter profile, and extract data from the rendered DOM.
In this guide, you'll build your own X scraper / Twitter scraper tool that:
- Connects to a managed Chrome instance in the cloud through Browserless
- Navigates to a public Twitter/X profile
- Scrapes profile info, follower counts, and engagement data from recent tweets
- Returns clean JSON you can export to CSV, send to a database, or feed into AI models
By the end, you'll have a reusable Twitter profile scraper you can adapt to track engagement metrics, do sentiment analysis on tweet text, monitor trends for specific hashtags, or watch influential users over time.
What You'll Need: Tools For a Production-Ready X Scraper
You don't need to run Chrome locally or fight with headless config on a server. Instead, you'll lean on Browserless to provide Browsers as a Service.
Here's the stack you'll use:
Browserless account and API key
Browserless runs headless (and "headed") browsers for you in the cloud. You connect over WebSocket via Puppeteer or Playwright, or call simple REST endpoints like /scrape when you just want structured data back.
- You'll use Browsers as a Service (BaaS) with Puppeteer for this tutorial
- You'll authenticate with an API key passed as a token in the connection URL
Node.js + puppeteer-core
puppeteer-core is the "bring your own Chrome" version of Puppeteer – perfect when Chrome is running on Browserless, not on your box. You get full browser automation without shipping Chrome binaries, because Browserless hosts the browser and Puppeteer Core simply connects to it.
Basic JavaScript familiarity
You're already shipping JS/TS, so we'll keep it straightforward: one script, a couple of functions, some DOM selectors.
Somewhere to put the scraped data
In this guide, you'll just console.log JSON. In production you'll usually:
- Write to a database (PostgreSQL, Mongo, etc.)
- Dump to CSV for Excel / BI tools
- Stream to an analytics pipeline or queue for downstream processing
You can think of this setup as a self-hosted alternative to other scraper options, but fully under your control and designed around Browserless rather than a black-box service.
The 5 Steps to Scrape X (Formerly Twitter) with Browserless
You're going to build a small but real X scraper tool that:
- Opens a Twitter/X profile
- Scrapes user info (profile name, username, followers, following)
- Scrapes recent tweets with engagement data and optional media URLs
- Returns structured JSON
Step 1: Create a Browserless Account and Grab Your API Token
- Sign up or log in to Browserless.
- Grab your API token from the dashboard.
- Keep it safe – you'll pass it as token=YOUR_API_TOKEN_HERE in your WebSocket URL.
You're essentially outsourcing all the heavy lifting around headless Chrome: scaling, rate limits, bot detection tuning, regional endpoints, and session management.
Your code never launches Chrome directly; it just connects to Browserless and starts issuing commands.
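Since the script you'll write in the next step reads the token from process.env.BROWSERLESS_API_KEY, export it in your shell before running (macOS/Linux shown; adapt for Windows):

export BROWSERLESS_API_KEY="YOUR_API_TOKEN_HERE"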
Step 2: Initialize Your Node Project and Install Puppeteer-core
Create a new folder and initialize a Node project:
mkdir x-scraper && cd x-scraper
npm init -y
npm install puppeteer-core
If you prefer ESM (recommended here), add this to your package.json:
{
"type": "module"
}
Now create x-scraper.mjs (or index.mjs):
import puppeteer from "puppeteer-core";
const BROWSERLESS_API_KEY = process.env.BROWSERLESS_API_KEY;
You've just laid the groundwork for a Twitter/X scraper that you can extend into your own internal X data pipelines.
Step 3: Connect to Browserless and Open a Twitter/X Profile
Next, connect to Browserless and navigate to a profile page.
At Browserless, we recommend connecting via a regional endpoint, such as production-sfo.browserless.io for US West.
Add this to your script:
import puppeteer from "puppeteer-core";
const BROWSERLESS_API_KEY = process.env.BROWSERLESS_API_KEY;
async function createBrowser() {
const browser = await puppeteer.connect({
browserWSEndpoint: `wss://production-sfo.browserless.io?token=${BROWSERLESS_API_KEY}`,
});
return browser;
}
async function getTwitterData(profileUrl) {
const browser = await createBrowser();
const page = await browser.newPage();
// X will often lazy-load content, so wait for the UI to settle
await page.goto(profileUrl, { waitUntil: "networkidle2" });
// Tweets are rendered as <article> elements
await page.waitForSelector("article");
// TODO: scrape profile + tweets
await browser.close();
}
Then:
const data = await getTwitterData("https://x.com/NASA");
// or: const data = await getTwitterData("https://twitter.com/NASA");
console.log(JSON.stringify(data, null, 2));
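With the environment variable set, run the script from your project folder:

node x-scraper.mjs

At this stage the script connects, loads the page, and closes the browser, but prints undefined – Steps 4 and 5 fill in the return value.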
This flow isn't Node-only, either. If you'd rather build a Python Twitter scraper, you can hit Browserless's REST /scrape API from Python instead of using Puppeteer (we'll touch on that in the FAQ).
Step 4: Scrape Profile Info and Follower Stats
Now you'll turn this into a real X profile scraper by extracting user info:
- profileName – human-readable display name
- username – the @username handle
- followers – label text (e.g., 58M Followers)
- following – label text (e.g., 182 Following)
Add a helper to run inside the page context:
async function getProfileInfo(page) {
return page.evaluate(() => {
const $ = (selector) => document.querySelector(selector);
// Selectors may change when X updates the UI – keep an eye on them.
const profileName = $('[data-testid="UserName"] div span')?.innerText ?? null;
const username = $('[data-testid="UserName"] div:nth-of-type(2) span')?.innerText ?? null;
const followers = $('a[href$="/followers"] span')?.innerText ?? null;
const following = $('a[href$="/following"] span')?.innerText ?? null;
return { profileName, username, followers, following };
});
}
The idea is simple:
- Let X render the SPA in the browser.
- Use DOM selectors to extract data from the final HTML instead of trying to reverse engineer internal JSON APIs that may be blocked or rate-limited.
Inside getTwitterData, call getProfileInfo after waitForSelector:
async function getTwitterData(profileUrl) {
const browser = await createBrowser();
const page = await browser.newPage();
await page.goto(profileUrl, { waitUntil: "networkidle2" });
await page.waitForSelector("article");
const profile = await getProfileInfo(page);
await browser.close();
return { ...profile, tweets: [] };
}
This gives you enough public data to build a simple X/Twitter follower scraper that monitors follower changes over time, using your own datastore and dashboards instead of relying on third-party tooling.
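One wrinkle: followers and following come back as display strings like 58M, not numbers. If you want to chart changes over time, a small helper can normalize them – a sketch only (parseCount is not from any library, and X's abbreviation format could change):

function parseCount(label) {
  if (!label) return null;
  // "58M Followers" -> value 58, suffix "M"; "1,234" -> value 1234, no suffix
  const match = label.replaceAll(",", "").match(/([\d.]+)\s*([KM]?)/i);
  if (!match) return null;
  const multiplier = { K: 1_000, M: 1_000_000 }[match[2].toUpperCase()] ?? 1;
  return Math.round(parseFloat(match[1]) * multiplier);
}

// parseCount("58M") -> 58000000; parseCount("1,234") -> 1234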
Step 5: Scrape Tweets, Engagement Metrics, and Media URLs
Now you'll turn this into a more complete X scraper that can:
- Scrape tweets from the visible timeline
- Pull engagement data (replies, retweets, likes)
- Optionally scrape tweet images and other media URLs
Each tweet is an article element, so you can iterate over them and grab what you need:
async function getTweetMetrics(page, maxTweets = 20) {
return page.evaluate((limit) => {
const articles = [...document.querySelectorAll("article")].slice(0, limit);
return articles.map((el) => {
const timeEl = el.querySelector("time");
const stats = {
submitted: timeEl?.dateTime ?? null,
tweetText: el.querySelector('[data-testid="tweetText"]')?.innerText ?? null,
replies: el.querySelector('[data-testid="reply"]')?.innerText ?? "0",
retweets: el.querySelector('[data-testid="retweet"]')?.innerText ?? "0",
likes: el.querySelector('[data-testid="like"]')?.innerText ?? "0",
};
// Simple twitter media scraper: collect visible image URLs
const imageEls = el.querySelectorAll('img[src*="twimg.com/media"]');
const mediaUrls = [...imageEls].map((img) => img.src);
return { ...stats, mediaUrls };
});
}, maxTweets);
}
Update getTwitterData to combine everything:
async function getTwitterData(profileUrl) {
const browser = await createBrowser();
const page = await browser.newPage();
await page.goto(profileUrl, { waitUntil: "networkidle2" });
await page.waitForSelector("article");
const profile = await getProfileInfo(page);
const tweets = await getTweetMetrics(page, 30);
await browser.close();
return {
...profile,
tweets,
scrapedAt: new Date().toISOString(),
};
}
const data = await getTwitterData("https://x.com/NASA");
console.log(JSON.stringify(data, null, 2));
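One caveat: X only renders a window of tweets at a time, so asking for 30 may return fewer if they haven't loaded yet. Scrolling before getTweetMetrics helps – a minimal sketch (the scroll count and delay are rough starting points, not tuned values):

async function loadMoreTweets(page, scrolls = 3) {
  for (let i = 0; i < scrolls; i++) {
    // Scroll to the bottom so X's infinite scroll fetches the next batch
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Give new <article> elements time to render; tune for your network
    await new Promise((resolve) => setTimeout(resolve, 1500));
  }
}

Call await loadMoreTweets(page); after waitForSelector("article") and before getTweetMetrics.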
You've now got a working X scraper that returns JSON with enough structure to:
- Export to CSV (for Excel or BI tools) – see the sketch after this list
- Feed into AI models for sentiment analysis or topic detection on tweet text
- Build dashboards showing trends, engagement, and conversation threads over time
- Pull profiles into a lead qualification tool
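To make the CSV option concrete, here's a minimal sketch. Hand-rolled quoting is fine for a quick export; reach for a real CSV library once commas and newlines in tweet text start to matter:

import { writeFileSync } from "node:fs";

// Quote every field and double embedded quotes (basic CSV escaping)
const csvField = (value) => `"${String(value ?? "").replace(/"/g, '""')}"`;

function tweetsToCsv(tweets) {
  const header = "submitted,replies,retweets,likes,tweetText";
  const rows = tweets.map((t) =>
    [t.submitted, t.replies, t.retweets, t.likes, t.tweetText].map(csvField).join(",")
  );
  return [header, ...rows].join("\n");
}

writeFileSync("tweets.csv", tweetsToCsv(data.tweets));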
In practice, you'll likely:
- Wrap this in a job that runs on a schedule (a minimal sketch follows this list).
- Add basic logging and error-handling for 403/429 responses, network issues, or UI changes.
- Store results in a database keyed by username plus timestamp.
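A minimal sketch of that wrapper is below. saveSnapshot is hypothetical – swap in whatever writes to your database or queue:

const PROFILES = ["https://x.com/NASA"];

async function runJob() {
  for (const url of PROFILES) {
    try {
      const data = await getTwitterData(url);
      await saveSnapshot(data); // hypothetical: persist keyed by username + scrapedAt
    } catch (err) {
      // Selector timeouts usually mean a UI change; connection errors may mean blocking
      console.error(`Scrape failed for ${url}:`, err.message);
    }
  }
}

await runJob();
setInterval(runJob, 60 * 60 * 1000); // hourly; use cron or a job queue in production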
From here, it's trivial to adapt this into an X/Twitter media scraper (collecting more media URLs and videos), an X/Twitter email scraper (only for emails users have chosen to make public), or a wider X/Twitter scraper API inside your own backend.
Conclusion
The official X API has become expensive, tightly controlled, and heavily rate-limited for most small teams and individual developers.
Instead of wrestling with that, you've just built a Browserless-powered X/Twitter scraper tool that:
- Uses a real browser to load Twitter/X like a normal user
- Scrapes user profiles and tweets directly from the DOM
- Captures engagement data and optional media URLs in structured JSON
You're not trying to be invisible – just to build a robust, maintainable scraping tool that respects rate limits, focuses on legitimate public data collection, and avoids fragile hacks.
If you outgrow a single Puppeteer script, Browserless gives you multiple ways to scale up:
- Switch from raw Puppeteer to BrowserQL or the /scrape REST API when you want structured JSON back without custom DOM logic (sketched below).
- Use regional endpoints and advanced launch options when you need better reliability on more challenging platforms like X.
- Keep sessions alive across runs to reduce login friction and improve performance.
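For instance, the same /scrape call shown in the Python FAQ below looks like this from Node (a sketch reusing that FAQ's endpoint and payload shape, and assuming Node 18+ for global fetch):

const resp = await fetch(
  `https://production-sfo.browserless.io/scrape?token=${process.env.BROWSERLESS_API_KEY}`,
  {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      url: "https://x.com/NASA",
      // Each selector's matched text comes back as structured JSON
      elements: [{ selector: '[data-testid="tweetText"]', name: "tweetText" }],
    }),
  }
);
console.log(await resp.json());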
From here, you can extend this into whatever you need: monitoring audience growth for a specific user, tracking hashtags and search results, pulling media from tweets you've posted for automated tracking, or building in-house analytics around posts, retweets, and replies – all without running your own Chrome fleet.
FAQs
1. What is a Twitter/X Scraper?
A Twitter/X scraper is any script or service that loads Twitter/X pages and extracts public data from them:
- Profile info (username, bio, followers, following)
- Tweets (text, timestamps, hashtags, URLs)
- Engagement (replies, retweets, likes)
- Optional media (images, videos, media URLs)
In the past, a "Twitter scraper" often meant calling the official Twitter API. Today, many teams use headless browsers instead – a Browserless-backed Twitter/X scraper setup is effectively an automated browser that navigates the site just like a real user and returns scraped data in JSON.
You can use this data to:
- Track engagement metrics for a specific user or brand
- Analyze trending topics, hashtags, and conversation threads
- Feed tweets into AI models for classification or sentiment analysis
Whatever stack you choose (Node, Python, etc.), the core idea is the same: use a scraping tool to gather public data responsibly.
2. How do I scrape data from Twitter/X?
High-level, scraping Twitter/X safely and sustainably looks like this:
- Decide what you're scraping: A single user timeline, multiple user profiles, search results, or specific hashtag pages. Different pages expose slightly different structures.
- Use a real browser, not raw HTTP: X is a modern SPA with plenty of dynamic content and anti-bot measures, so trying to scrape it with plain requests and static HTML is brittle. A Browserless-backed browser (Puppeteer, Playwright, or BrowserQL) handles all the JS and HTML rendering for you.
- Extract data with focused selectors: Target article elements for tweets, data-testid attributes for counters, and img tags for tweet images. Keep your selectors narrow so you don't break when non-critical parts of the UI change.
- Respect limits and ToS: X can change rate limits, block IPs, or require login for certain pages. Only scrape public data you're allowed to access, and avoid aggressive crawling patterns that look like abuse.
If you've used hosted tools before, this Browserless approach is similar, but you own the code, control the data collection, and can adapt quickly when X tweaks the UI.
3. How do I scrape Twitter data using Python?
You've just seen the Node.js version, but building an X/Twitter scraper with Python is straightforward with Browserless.
You have two main options:
- Use Browserless's /scrape REST API as a Python Twitter scraper:
import os
import json

import requests

TOKEN = os.environ["BROWSERLESS_API_KEY"]

payload = {
    "url": "https://x.com/NASA",
    "elements": [
        {"selector": '[data-testid="UserName"] div span', "name": "profileName"},
        {"selector": '[data-testid="UserName"] div:nth-of-type(2) span', "name": "username"},
        {"selector": 'a[href$="/followers"] span', "name": "followers"},
        {"selector": 'a[href$="/following"] span', "name": "following"},
    ],
}

resp = requests.post(
    f"https://production-sfo.browserless.io/scrape?token={TOKEN}",
    headers={"content-type": "application/json"},
    data=json.dumps(payload),
    timeout=60,
)
data = resp.json()
print(json.dumps(data, indent=2))
Browserless handles the browser session, applies the selectors, and returns structured JSON – perfect when you want a lightweight Twitter scraper API inside your Python backend.
- Use Puppeteer-style libraries for Python (e.g., pyppeteer or Playwright) and connect over WebSocket to Browserless, just like you did with Node:
  - Connect to wss://production-sfo.browserless.io?token=YOUR_API_TOKEN
  - Open the URL for the user timeline or search page
  - Use page.querySelector / querySelectorAll equivalents to scrape tweets and engagement metrics
In both cases, Python is just the control plane. Browserless runs the browser and deals with rate limits, bot detection, and the messy bits of headless Chrome. You focus on mapping search terms, usernames, and timelines to the scraped data you actually care about – and pushing it into your own CSV, JSON, or database outputs.
