Create a new project with NPM and Node.js, install the necessary libraries, and test that everything works before starting the next lessons.
When you open a website in a browser, the browser first downloads the page's initial HTML. To do the same thing with Node.js, we will install a program, called an NPM module, to help us. NPM modules are installed using npm, which is another program that is automatically installed with Node.js.
Before we can install NPM modules, we need to create an NPM project. To do that, you can create a new directory or use the one that you already have open in VSCode (you can delete the hello.js file now) and from that directory run this command in your terminal:
npm init -y
It will set up an empty NPM project for you and create a file called package.json. This is a very important file in Node.js programming as it contains information about the project.
Node.js and NPM support two types of projects. Let's call them legacy and modern. For backwards compatibility, the legacy type is used by default. To switch to the modern type, open your package.json and add this line to the end of the JSON object. Don't forget to add a comma to the end of the previous line 😉
"type": "module"
If you want to learn more about JSON and its syntax, we recommend this tutorial on MDN.
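To make the role of the type property more concrete, here is a minimal sketch of the decision it controls. The detectModuleSystem function and the my-scraper project name are made up for illustration and are not part of Node.js or NPM. The names CommonJS and ES modules are the official terms for what this lesson calls legacy and modern.

```javascript
// Sketch: report which module system a package.json selects for .js files.
// `detectModuleSystem` is a hypothetical helper, not a Node.js or NPM API.
function detectModuleSystem(packageJsonText) {
    const pkg = JSON.parse(packageJsonText);
    // With "type": "module", Node.js treats .js files as ES modules.
    // Without it, the legacy CommonJS system is the default.
    return pkg.type === 'module' ? 'modern (ES modules)' : 'legacy (CommonJS)';
}

// Two made-up package.json files, before and after adding the type property.
const legacy = '{ "name": "my-scraper", "version": "1.0.0" }';
const modern = '{ "name": "my-scraper", "version": "1.0.0", "type": "module" }';

console.log(detectModuleSystem(legacy)); // legacy (CommonJS)
console.log(detectModuleSystem(modern)); // modern (ES modules)
```

Note that the second example also shows the comma after "1.0.0" that the new line requires.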
Now that we have a project set up, we can install NPM modules into it. We will install two libraries that make it easy to download and process a website's HTML. In the project directory, run the following command, which installs both of them, got-scraping and Cheerio, into your project:
npm install got-scraping cheerio
got-scraping is a library made especially for scraping and downloading pages' HTML. It's based on the very popular got library, which means any features of got are also available in got-scraping. More precisely, got and got-scraping are HTTP clients. To learn more about HTTP, visit this MDN tutorial. Cheerio is a library for parsing and processing HTML.
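When npm install finishes, it records the installed libraries under the dependencies key of your package.json. As a rough sketch of how you could confirm that both libraries were recorded, the snippet below checks a package.json text for required names. The findMissingLibraries helper, the my-scraper name, and the version range are made up for illustration.

```javascript
// Sketch: list which required libraries are missing from a package.json.
// `findMissingLibraries` is a hypothetical helper, not an NPM feature.
function findMissingLibraries(packageJsonText, requiredLibraries) {
    const pkg = JSON.parse(packageJsonText);
    // npm install records installed libraries under "dependencies".
    const installed = pkg.dependencies ?? {};
    return requiredLibraries.filter((name) => !(name in installed));
}

// Made-up example: only cheerio was installed. The version range is illustrative.
const examplePackageJson = JSON.stringify({
    name: 'my-scraper',
    type: 'module',
    dependencies: { cheerio: '^1.0.0' },
});

const missing = findMissingLibraries(examplePackageJson, ['got-scraping', 'cheerio']);
console.log(missing); // [ 'got-scraping' ]
```

If a library shows up as missing, running npm install with its name again in the project directory should add it.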
With the libraries installed, create a new file in the project's folder called main.js. This is where we will put all our code. Before we start scraping, though, let's do a simple check that everything installed correctly. Inside main.js add this piece of code.
import { gotScraping } from 'got-scraping';
import * as cheerio from 'cheerio';

console.log('it works!');
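The import statements tell Node.js that it should give you access to the got-scraping library under the gotScraping variable and the Cheerio library under the cheerio variable. Now run this command in your terminal:
node main.js
If everything is set up correctly, you will see it works! printed in the terminal. If you see an error that says Cannot use import statement outside a module, you forgot to add the type property to your package.json. If you see a different error, try copying and pasting it into Google, and you'll most likely find a solution.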
With the project set up, the next lesson will show you how to use got-scraping to download the website's HTML and collect data from it with Cheerio.