Home > python, scraping > web scraping: BS4 supports CSS select

web scraping: BS4 supports CSS select

BeautifulSoup is an excellent tool for web scraping. The development of BeautifulSoup 3 stopped in 2012, its author concentrates on BeautifulSoup 4 since then.

In this post I want to show how to use CSS selectors. With CSS selectors you can select part of a webpage, which is what we need when we do web scraping. Another possibility is to use XPath. I find CSS selectors easier to use. You can read this post too for a comparison: Why CSS Locators are the way to go vs XPath.

Let’s go through a concrete example, that way it will be easier to understand.

The page http://developerexcuses.com/ prints a funny line that developers can use as an excuse. Let’s extract this line.

Visit the page, start Firebug, and click on the line (steps 1 and 2 on the figure below):


Right click on the orange line (“<a style=...“) and choose “Copy CSS Path”. Now the CSS path of the selected HTML element is on the clipboard, which is “html body div.wrapper center a” in this example.

Now let’s write a script that prints this part of the HTML source:

import requests
import bs4

def main():
    r = requests.get("http://developerexcuses.com/")
    soup = bs4.BeautifulSoup(r.text)
    print soup.select("html body div.wrapper center a")[0].text

if __name__ == "__main__":
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: