WEB SCRAPING WITH PYTHON: https://campus.datacamp.com/courses/web-scraping-with-python

Consider the HTML code:

<html>
  <body>
    <div>
      <p>Good Luck!</p>
      <p>Not here...</p>
    </div>
    <div>
      <p>Where am I?</p>
    </div>
  </body>
</html>

It's Time to P

In the lecture, we learned how to use a double forward slash (//) to navigate to all future generations, i.e. every descendant of the current element. In this exercise, you will select all paragraph (p) elements within the HTML. Because you want every paragraph element, you don't need to know the structure of the HTML: a simple XPath string using the double forward-slash notation, '//p', does the job.
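
A minimal sketch of the '//p' idea outside the DataCamp exercise. It assumes Python's standard-library xml.etree.ElementTree as a stand-in for the course's tooling. ElementTree only understands a subset of XPath and spells '//p' as './/p', relative to the root element. The sample HTML is the snippet at the top of these notes, which is well-formed, so an XML parser can read it.

```python
import xml.etree.ElementTree as ET

# The sample HTML from the top of these notes (well-formed, so XML parsing works)
html = """
<html>
  <body>
    <div>
      <p>Good Luck!</p>
      <p>Not here...</p>
    </div>
    <div>
      <p>Where am I?</p>
    </div>
  </body>
</html>
"""

root = ET.fromstring(html.strip())  # root is the <html> element

# XPath '//p' is written './/p' in ElementTree: every p element below the root,
# however deeply it is nested
paragraphs = root.findall(".//p")
texts = [p.text for p in paragraphs]
print(texts)  # ['Good Luck!', 'Not here...', 'Where am I?']
```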

A classy span

Although we haven't yet gone deep into XPath, one thing we can already do is select elements by their attributes. For example, to select the div element in the HTML document whose id attribute is "uid", we could write the XPath string '//div[@id="uid"]'. The first part of this string, //div, looks at all div elements in the HTML document. Then, using the brackets, we specify that we want only the div element with a specific id attribute (in this case, uid). Note that the phrase @id="uid" in the brackets reads as "attribute id equals uid".

In this exercise, you will select all span elements whose class attribute equals "span-class". (Note: span is just another tag name.)
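
Here is a sketch of both attribute selections. It again assumes xml.etree.ElementTree as a stand-in for the course's tools. The HTML is hypothetical, since these notes don't include the exercise's page. ElementTree also needs a leading '.' where plain XPath starts with '//'.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML: the exercise's own page is not reproduced in these notes
html = """
<html>
  <body>
    <div id="uid">
      <span class="span-class">First match</span>
      <span class="other-class">Skipped</span>
    </div>
    <span class="span-class">Second match</span>
  </body>
</html>
"""

root = ET.fromstring(html.strip())

# XPath '//div[@id="uid"]': every div whose id attribute equals "uid"
uid_divs = root.findall('.//div[@id="uid"]')

# XPath '//span[@class="span-class"]': every span whose class equals "span-class"
spans = root.findall('.//span[@class="span-class"]')
print([s.text for s in spans])  # ['First match', 'Second match']
```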

Fundamental techniques in computational web scraping

Uses of Web Scraping

·  Web scraping can be useful for looking through product reviews to gauge public opinion about a particular product.

·  Web scraping can be useful for reading through social media posts by users in different areas to compare their language usage.

·  Web scraping can be useful for going through online news publications to pick out articles discussing a particular topic.

Choose DataCamp!

In this exercise, you create your own XPath string for a specific task: selecting the paragraph element that contains the text "Choose DataCamp!".

Consider the following HTML:

<html>
  <body>
    <div>
      <p>Hello World!</p>
      <div>
        <p>Choose DataCamp!</p>
      </div>
    </div>
    <div>
      <p>Thanks for Watching!</p>
    </div>
  </body>
</html>
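
Here is one possible answer, checked with xml.etree.ElementTree as an assumed stand-in for the exercise environment. The target paragraph is the only p inside a div that is itself nested in a div. Both the full path /html/body/div/div/p and the shorter //div/div/p therefore pick it out. ElementTree starts from the <html> element and does not accept a leading '/', so the paths below are written relative to it.

```python
import xml.etree.ElementTree as ET

html = """
<html>
  <body>
    <div>
      <p>Hello World!</p>
      <div>
        <p>Choose DataCamp!</p>
      </div>
    </div>
    <div>
      <p>Thanks for Watching!</p>
    </div>
  </body>
</html>
"""

root = ET.fromstring(html.strip())  # root is the <html> element

# XPath '/html/body/div/div/p', written relative to <html>
by_full_path = root.findall("./body/div/div/p")

# XPath '//div/div/p': any div with a div child, then that child's p elements
by_descendant = root.findall(".//div/div/p")

print([p.text for p in by_full_path])   # ['Choose DataCamp!']
print([p.text for p in by_descendant])  # ['Choose DataCamp!']
```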

Where it's @

In this exercise, you'll write XPath strings that use attributes. The first task is to select the paragraph element containing the text "Thanks for Watching!"; we've already created most of the XPath string for you. The second task selects the "Hello World!" paragraph by its class attribute.

Consider the following HTML:

<html>
  <body>
    <div id="div1" class="class-1">
      <p class="class-1 class-2">Hello World!</p>
      <div id="div2">
        <p id="p2" class="class-2">Choose DataCamp!</p>
      </div>
    </div>
    <div id="div3" class="class-2">
      <p class="class-2">Thanks for Watching!</p>
    </div>
  </body>
</html>

# Create an XPath string to select the desired p element
xpath = '//*[@id="div3"]/p'

# Print out the selected text (print_element_text is provided by the exercise environment)
print_element_text(xpath)

# Create an XPath string to select the p element by class
xpath = '//p[@class="class-1 class-2"]'

# Print out the selected text (print_element_text is provided by the exercise environment)
print_element_text(xpath)


    <script.py> output: Hello World!
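
The sketch below runs both selections without the exercise environment, again using xml.etree.ElementTree as a stand-in. element_texts() is a hypothetical helper in place of print_element_text(), and it returns a list instead of printing. The third query shows a point worth knowing: @class="..." compares the entire attribute string. A paragraph whose class is "class-1 class-2" therefore does not match @class="class-2".

```python
import xml.etree.ElementTree as ET

html = """
<html>
  <body>
    <div id="div1" class="class-1">
      <p class="class-1 class-2">Hello World!</p>
      <div id="div2">
        <p id="p2" class="class-2">Choose DataCamp!</p>
      </div>
    </div>
    <div id="div3" class="class-2">
      <p class="class-2">Thanks for Watching!</p>
    </div>
  </body>
</html>
"""

root = ET.fromstring(html.strip())


def element_texts(path):
    """Hypothetical stand-in for print_element_text(): text of every match."""
    return [el.text for el in root.findall(path)]


# XPath '//*[@id="div3"]/p': p children of whichever element has id "div3"
print(element_texts('.//*[@id="div3"]/p'))  # ['Thanks for Watching!']

# XPath '//p[@class="class-1 class-2"]': the whole class string must match
print(element_texts('.//p[@class="class-1 class-2"]'))  # ['Hello World!']

# XPath '//p[@class="class-2"]' skips "Hello World!" (its class is "class-1 class-2")
print(element_texts('.//p[@class="class-2"]'))  # ['Choose DataCamp!', 'Thanks for Watching!']
```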