WEB SCRAPING WITH PYTHON: https://campus.datacamp.com/courses/web-scraping-with-python
Consider the HTML code:
<html>
<body>
<div>
<p>Good Luck!</p>
<p>Not here...</p>
</div>
<div>
<p>Where am I?</p>
</div>
</body>
</html>
It's Time to P
In the lecture, we learned how to use double forward-slashes to navigate to all future generations (descendants) of an element. In this exercise, you will select all paragraph p elements within the HTML. Because the goal is to reach every paragraph element, you do not need to know what the HTML code looks like: the task can be accomplished with a simple XPath string using the double forward-slash notation you have learned.
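The idea can be tried out with a minimal sketch. It assumes the sample HTML shown earlier as a stand-in (the exercise's own HTML is not shown) and uses Python's standard-library xml.etree.ElementTree, which only supports a subset of XPath and is not necessarily the tool the course uses. In full XPath the string would be '//p'; ElementTree expects the relative form './/p' when searching from the root element.

```python
import xml.etree.ElementTree as ET

# Stand-in HTML: the sample from earlier in these notes.
# It is well-formed, so it can be parsed as XML.
html = """<html>
<body>
<div>
<p>Good Luck!</p>
<p>Not here...</p>
</div>
<div>
<p>Where am I?</p>
</div>
</body>
</html>"""

root = ET.fromstring(html)  # root is the <html> element

# './/p' is ElementTree's spelling of the XPath '//p':
# every p element at any depth below the root.
paragraphs = root.findall(".//p")
texts = [p.text for p in paragraphs]

print(texts)  # ['Good Luck!', 'Not here...', 'Where am I?']
```

The same string works regardless of how deeply the paragraphs are nested, which is why the structure of the HTML does not matter here.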
A classy span
Although we haven't yet gone deep into XPath, one thing we can do is select elements by their attributes. For example, if we want to navigate to the div element within the HTML document whose id attribute is "uid", then we could write the XPath string '//div[@id="uid"]'. The first part of this string, //div, looks at all div elements in the HTML document. Then, using the brackets, we specify that we want only the div element with a specific id attribute (in this case uid). Note that the phrase @id="uid" in the brackets would be read as "attribute id equals uid".
In this exercise, you will select all span elements whose class attribute equals "span-class". (Note: span is just another possible tag name.)
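A short sketch of attribute selection follows. The HTML snippet is hypothetical and invented purely for illustration (the exercise's own HTML is not shown), and the code assumes Python's standard-library xml.etree.ElementTree, which needs the relative form './/' in place of a leading '//'.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, made up for this example only.
html = """<html>
<body>
<div id="uid">
<span class="span-class">first</span>
<span class="other-class">second</span>
</div>
<p><span class="span-class">third</span></p>
</body>
</html>"""

root = ET.fromstring(html)

# Equivalent of '//div[@id="uid"]': the div whose id attribute equals "uid".
div = root.find(".//div[@id='uid']")

# Equivalent of '//span[@class="span-class"]': every span, at any depth,
# whose class attribute is exactly "span-class".
spans = root.findall(".//span[@class='span-class']")
span_texts = [s.text for s in spans]

print(div.get("id"))  # uid
print(span_texts)     # ['first', 'third']
```

The span with class "other-class" is skipped because the bracketed test compares the whole attribute value.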
Fundamental techniques in computational web scraping
Usage of Web Scraping
· Web scraping can be useful for looking through product reviews to gauge public opinion about a particular product.
· Web scraping can be useful for reading through social media posts between users in different areas to compare different language usage.
· Web scraping can be useful for going through online news publications to pick out articles discussing a particular topic.
Choose DataCamp!
In this exercise, we want to give you the opportunity to create your own XPath string to achieve a certain task; the task is to select the paragraph element containing the text "Choose DataCamp!".
Consider the following HTML:
<html>
<body>
<div>
<p>Hello World!</p>
<div>
<p>Choose DataCamp!</p>
</div>
</div>
<div>
<p>Thanks for Watching!</p>
</div>
</body>
</html>
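One possible XPath string for this task is '/html/body/div[1]/div/p'; it is only one of several that would work and is not necessarily the course's expected answer. The sketch below checks it with Python's standard-library xml.etree.ElementTree, an assumption of this example. Because ElementTree searches relative to the root html element, the same path is written './body/div[1]/div/p'.

```python
import xml.etree.ElementTree as ET

html = """<html>
<body>
<div>
<p>Hello World!</p>
<div>
<p>Choose DataCamp!</p>
</div>
</div>
<div>
<p>Thanks for Watching!</p>
</div>
</body>
</html>"""

root = ET.fromstring(html)  # root is the <html> element

# Full XPath: /html/body/div[1]/div/p
# body -> first div child -> its div child -> that div's p child.
matches = root.findall("./body/div[1]/div/p")
match_texts = [p.text for p in matches]

print(match_texts)  # ['Choose DataCamp!']
```

The index in div[1] counts from 1, so it picks the first div under body; only that div contains a nested div.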
Where it's @
In this exercise, you'll begin to write an XPath string using attributes to achieve a certain task; that task is to select the paragraph element containing the text "Thanks for Watching!". We've already created most of the XPath string for you.
Consider the following HTML:
<html>
<body>
<div id="div1" class="class-1">
<p class="class-1 class-2">Hello World!</p>
<div id="div2">
<p id="p2" class="class-2">Choose DataCamp!</p>
</div>
</div>
<div id="div3" class="class-2">
<p class="class-2">Thanks for Watching!</p>
</div>
</body>
</html>
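A sketch of one way to finish the task: the string '//*[@id="div3"]/p' selects the paragraph inside the element whose id is "div3". This is an illustrative choice, not necessarily the string the exercise supplies. The code assumes Python's standard-library xml.etree.ElementTree, where a leading '//' is written './/'.

```python
import xml.etree.ElementTree as ET

html = """<html>
<body>
<div id="div1" class="class-1">
<p class="class-1 class-2">Hello World!</p>
<div id="div2">
<p id="p2" class="class-2">Choose DataCamp!</p>
</div>
</div>
<div id="div3" class="class-2">
<p class="class-2">Thanks for Watching!</p>
</div>
</body>
</html>"""

root = ET.fromstring(html)

# Equivalent of '//*[@id="div3"]/p': any element with id "div3",
# then its direct p children.
by_id = [p.text for p in root.findall(".//*[@id='div3']/p")]

# For comparison, filtering on class alone is not specific enough:
# two paragraphs have a class attribute equal to "class-2".
by_class = [p.text for p in root.findall(".//p[@class='class-2']")]

print(by_id)     # ['Thanks for Watching!']
print(by_class)  # ['Choose DataCamp!', 'Thanks for Watching!']
```

The first paragraph is not matched by the class test because its attribute value is "class-1 class-2", and the comparison is against the whole string.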