Node.js Scraper

Google News Scraper

manticarodrigo Node.js


Google does not allow robots and scraper scripts to fetch content from its search engine. You can read the details here. Consequently, this tutorial is instructional and is not intended for use in a production application.


A scraper is an automated script that downloads a website’s template/HTML and then parses the content to extract data in a meaningful way. This is very similar to web crawling: Google’s search engine crawls sites to index the web and make it easy for us to find relevant content online.

This is a simple Node.js program that runs a Google News search and then extracts each article’s title, description, image, and URL. The same concept can be used for a news feed inside your application or website.

Building the Scraper

For this tutorial, you need Node.js installed on your machine and a text editor (Visual Studio Code recommended).

First, create a folder for the project and navigate to it in your terminal window (Command Prompt on a PC).

Run npm init.

This will create a new package.json inside of your project folder.

Next, run npm install request --save and npm install cheerio --save.

The request module allows us to make HTTP requests from the server side using Node. This is the module that will fetch the Google News website’s template/HTML.

The cheerio module is a library that uses jQuery-like syntax to interact with our HTML in Node.

From here, all you need to do is create a file named scraper.js and paste the following code:

You will notice that the searchUrl is constructed using a searchTerm variable. When you run a Google search, the text typed into the search box becomes part of the URL of the results page. We are applying the same concept here.
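To illustrate how a search term ends up inside the URL, here is a small sketch using Node’s built-in URL class. The function name buildSearchUrl and the tbm=nws parameter (assumed to select the News tab) are illustrative assumptions, not part of the original listing.

```javascript
// Build a search URL from a search term.
// Assumption: Google News results live at /search with tbm=nws.
function buildSearchUrl(searchTerm) {
  const url = new URL('https://www.google.com/search');
  // searchParams.set form-encodes the value: spaces become "+",
  // and characters such as "&" or "+" are percent-encoded.
  url.searchParams.set('q', searchTerm);
  url.searchParams.set('tbm', 'nws');
  return url.toString();
}

console.log(buildSearchUrl('node.js scraper'));
// prints "https://www.google.com/search?q=node.js+scraper&tbm=nws"
```

Letting URLSearchParams do the encoding avoids broken URLs when the search term contains spaces or reserved characters.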

Lastly, run node scraper.js and the console will print the first page’s results in the format we created.

That’s it! This is a simple example of a Node.js scraper. You can create your own to crawl thousands of sources and fetch valuable data for users.

You can check out my other tutorial on setting up a basic Node.js architecture with similar functionality to Google’s popular DBaaS, Firebase, here.
