Archive for September, 2012

BeautifulSoup: _detectEncoding error

September 30, 2012

Problem
While parsing an HTML page with BeautifulSoup, I got an error message like this:

File ".../BeautifulSoup.py", line 1915, in _detectEncoding
    '^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
TypeError: expected string or buffer

In the code I had this:

text = get_page(url)
soup = BeautifulSoup(text)

Solution

text = get_page(url)
text = str(text)    # here is the trick
soup = BeautifulSoup(text)

Tip from here.
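
Why the cast helps: the traceback shows the markup being handed to a compiled regular expression, and match() only accepts strings. The sketch below reproduces that step with the pattern copied from the traceback. The names sniff_encoding and to_text are hypothetical helpers of mine, not part of BeautifulSoup, and decoding bytes as UTF-8 is an assumption. Unlike a bare str(), the helper refuses None, because str(None) would silently produce the text 'None'.

```python
import re

# Pattern copied from the traceback: it looks for an XML encoding declaration.
XML_ENCODING_RE = re.compile(r'^<\?.*encoding=[\'"](.*?)[\'"].*\?>')


def sniff_encoding(markup):
    """Return the declared encoding, or None if there is no declaration.

    Raises TypeError when markup is not a string, like the original error.
    """
    match = XML_ENCODING_RE.match(markup)
    return match.group(1) if match else None


def to_text(page):
    """Hypothetical helper: turn whatever the downloader returned into text."""
    if page is None:
        raise ValueError("the page download returned nothing")
    if isinstance(page, bytes):
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        return page.decode("utf-8", "replace")
    return str(page)
```

With something like this, to_text(get_page(url)) could stand in for the plain str() call, so an empty download shows up as a clear error instead of a parse of the word 'None'.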

Categories: python

Check Gmail for new messages

September 8, 2012

Problem
I want to check my Gmail account for new messages periodically. When I get a message from a specific sender (with a specific subject), I want to trigger some action. How can I do that?

Solution
Fortunately, there is an Atom feed of unread Gmail messages at https://mail.google.com/mail/feed/atom. All you have to do is visit this page, send your login credentials, fetch the feed and process it.

import urllib2

FEED_URL = 'https://mail.google.com/mail/feed/atom'

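def get_unread_msgs(user, passwd):
    """Return the Atom feed of unread messages as raw XML."""
    # the feed is protected with HTTP Basic authentication
    auth_handler = urllib2.HTTPBasicAuthHandler()
    auth_handler.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='{user}@gmail.com'.format(user=user),
        passwd=passwd
    )
    # install the opener globally so that urlopen() sends the credentials
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)
    feed = urllib2.urlopen(FEED_URL)
    return feed.read()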

##########

if __name__ == "__main__":
    import getpass

    user = raw_input('Username: ')
    passwd = getpass.getpass('Password: ')
    print get_unread_msgs(user, passwd)

For reading XML I use the untangle module:

import untangle    # sudo pip install untangle

xml = get_unread_msgs(USER, PASSWORD)
o = untangle.parse(xml)
try:
    for e in o.feed.entry:
        title = e.title.cdata
        print title
except IndexError:
    pass    # no new mail
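
The snippet above only prints the titles. To react to one sender and subject, the entries have to be filtered. Here is a minimal sketch of that step, using the standard library's ElementTree instead of untangle so that it runs without extra packages. The sample feed, the addresses and the function names are made up for illustration, and the entry layout (a title plus an author with an email) is my assumption about what the feed contains.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the real XML would come from get_unread_msgs().
SAMPLE_FEED = """<feed xmlns="http://purl.org/atom/ns#" version="0.3">
  <title>Inbox</title>
  <entry>
    <title>Build failed</title>
    <author><name>CI</name><email>ci@example.com</email></author>
  </entry>
  <entry>
    <title>Lunch?</title>
    <author><name>Bob</name><email>bob@example.com</email></author>
  </entry>
</feed>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def child_text(elem, name):
    """Text of the first direct child called name, or '' if there is none."""
    for child in elem:
        if local_name(child.tag) == name:
            return (child.text or '').strip()
    return ''


def find_matching(xml_text, sender, subject):
    """Titles of unread messages sent by sender whose title contains subject."""
    hits = []
    for entry in ET.fromstring(xml_text):
        if local_name(entry.tag) != 'entry':
            continue
        title = child_text(entry, 'title')
        email = ''
        for child in entry:
            if local_name(child.tag) == 'author':
                email = child_text(child, 'email')
        if email.lower() == sender.lower() and subject in title:
            hits.append(title)
    return hits
```

Calling find_matching(SAMPLE_FEED, 'ci@example.com', 'Build') returns the one matching title. The action to trigger would go wherever that list turns out non-empty, for example in a loop that sleeps a few minutes between checks.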

Categories: python

Obfuscated Python

September 7, 2012
Categories: python

The best free Python resources

September 2, 2012
Categories: python

Print unicode text to the terminal

September 2, 2012

Problem
I wrote a script in Eclipse-PyDev that prints some text with accented characters to the standard output. It runs fine in the IDE but it breaks in the console:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 11: ordinal not in range(128)

This bugged me for a long time, but I have now found a working solution.

Solution
Insert the following in your source code:

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

I found this trick here. “This allows you to switch from the default ASCII to other encodings such as UTF-8, which the Python runtime will use whenever it has to decode a string buffer to unicode.”
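
The error itself is easy to reproduce: the character u'\xf3' has no ASCII representation, so encoding it with the 'ascii' codec fails, while UTF-8 handles it. As an alternative to changing the process-wide default, the text can be encoded explicitly before it is written out. The sketch below shows that idea. The helper name encode_for_output is hypothetical, and falling back to UTF-8 when no encoding is given is my assumption.

```python
def encode_for_output(text, encoding=None):
    """Encode unicode text to bytes, assuming UTF-8 if no encoding is given.

    Characters the target encoding cannot represent become '?'.
    """
    return text.encode(encoding or "utf-8", "replace")


sample = u"accented: \xf3"

# The 'ascii' codec cannot represent u'\xf3'; this is the failure
# reported in the error message.
try:
    sample.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8_bytes = encode_for_output(sample)            # two bytes for the accented letter
ascii_bytes = encode_for_output(sample, "ascii")  # accented letter replaced by '?'
```

In Python 2, a script could then call sys.stdout.write(encode_for_output(text, sys.stdout.encoding)), which also covers the case where the stream reports no encoding at all.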

Categories: python