BeautifulSoup with CssSelect? Yes!

September 18, 2011

Update (2013-12-15): This post is out of date. BeautifulSoup 4 has built-in support for CSS selectors, which a newer post covers.
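For anyone landing here today, this is roughly what the built-in support looks like. It is only a minimal sketch: it assumes the `bs4` package is installed, and the markup is made up for illustration rather than taken from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
html = (
    '<div class="title"><h3>First post</h3></div>'
    '<div class="title"><h3>Second post</h3></div>'
    '<div class="body"><h3>Not a title</h3></div>'
)

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector and returns a list of matching tags.
headings = [h3.get_text() for h3 in soup.select("div.title h3")]
print(headings)  # ['First post', 'Second post']
```

The selector `div.title h3` is the same one used with soupselect further down, so the two approaches can be compared directly.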

A few days ago I started to explore lxml (it’s been on my list for a long time) and I really like its CSS selector support. As I used BeautifulSoup a lot in the past, I wondered whether it was possible to add this functionality to BS. A quick Google search turned up the soupselect module:

A single function, select(soup, selector), that can be used to select items from a BeautifulSoup instance using CSS selector syntax. Currently supports type selectors, class selectors, id selectors, attribute selectors and the descendant combinator.

Just what I needed :) You can also patch BS and integrate this new functionality:

>>> from BeautifulSoup import BeautifulSoup as Soup
>>> import soupselect; soupselect.monkeypatch()
>>> import urllib
>>> soup = Soup(urllib.urlopen(''))
>>> soup.findSelect('div.title h3')
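To get a feel for what a `select(soup, selector)`-style function has to do, here is a rough sketch written against the standard library only (Python 3). This is not soupselect's code: `Node`, `TreeBuilder`, `matches` and this `select` are hypothetical helpers made up for illustration, the markup is invented, and only type, class and id selectors plus the descendant combinator are handled.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag


class Node:
    """One element of a very small DOM-like tree (hypothetical helper)."""

    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes reasonably well-formed markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]", [], None)
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent  # climb back up

    def handle_data(self, data):
        self.current.text += data


def matches(node, simple):
    """Check one compound selector such as 'div', '.title' or 'div#main.title'."""
    for prefix, name in re.findall(r"([#.]?)([\w-]+)", simple):
        if prefix == "" and node.tag != name:
            return False
        if prefix == "." and name not in (node.attrs.get("class") or "").split():
            return False
        if prefix == "#" and node.attrs.get("id") != name:
            return False
    return True


def descendants(node):
    """Yield every element below node, in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def select(root, selector):
    """Return nodes matching a space-separated (descendant) selector chain."""
    *ancestors, last = selector.split()
    found = []
    for node in descendants(root):
        if not matches(node, last):
            continue
        # Walk up the tree, consuming ancestor selectors from right to left.
        remaining = list(ancestors)
        parent = node.parent
        while remaining and parent is not None:
            if matches(parent, remaining[-1]):
                remaining.pop()
            parent = parent.parent
        if not remaining:
            found.append(node)
    return found


# Hypothetical markup, invented for this example.
html = (
    '<div class="title"><h3>First</h3><p><h3>Second</h3></p></div>'
    '<div><h3>Other</h3></div>'
)
builder = TreeBuilder()
builder.feed(html)
builder.close()
texts = [node.text for node in select(builder.root, "div.title h3")]
print(texts)  # ['First', 'Second']
```

Matching the nearest ancestor first is enough here because only the descendant combinator is supported; child or sibling combinators would need a more careful search.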
Categories: python