Learn about browser DevTools, a valuable tool in the world of web scraping , and how you can use them to collect data from a website.
Even though DevTools stands for developer tools, everyone can use them to inspect a website. Each major browser has their own DevTools. We will use Chrome DevTools as an example, but the advice is applicable to any browser, as the tools are extremely similar. To open Chrome DevTools, press F12 or right-click anywhere in the page and choose Inspect.
When you first open Chrome DevTools, you will start on the Elements tab (In Firefox it's called the Inspector). You can use this tab to inspect the page's HTML on the left hand side, and its CSS on the right. The items in the HTML view are called elements or nodes. Each element is enclosed in an HTML tag. For example
<span> are all tags. When you add something inside of those tags, like
<p>Hello!</p> you create an element. You can also see elements inside other elements in the Elements tab. This is called nesting, and it gives the page its structure.
In the top left corner of DevTools, there's a little arrow icon with a square. When you click it and hover your mouse over some of the website's content, DevTools will show you information about the HTML element being hovered over. When you click the element, it will be selected in the Elements tab, which allows for further inspection of the node and its content.
After you select an element, you can right-click the highlighted element in the Elements tab to show a menu with available actions. For now, select Store as global variable (Use in Console in Firefox). You'll see that a new variable called
For example, if you wanted to scrape the text in the element, you could use the
textContent property to get it. The following command will get the text of your
temp1 element and print it to the console using the
You can also get the HTML of the element using this command:
And you could even change the text of the element, if you wanted to:
temp1.textContent = 'Hello!';
You can interact with the page in many more ways using the Console. If you want to dive deeper we recommend this tutorial on documents. A web page in a browser is called a document.
In this lesson, we learned the absolute basics of interaction with a page using the DevTools. In the next lesson, you will learn how to extract data from it. We will collect data about the on-sale products on this e-commerce website. This isn't a real website, just a demo site for the Academy which uses the latest technologies that a real-world modern e-commerce web-application would use.