Writing non-ASCII text to file

You have downloaded the source of an HTML page into a unicode string and want to save it to a file. However, writing it raises a UnicodeEncodeError :(


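The fix is to encode the unicode string explicitly before writing it:

foo = u'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
f = open('test', 'w')
f.write(foo.encode('utf8'))  # write UTF-8 bytes, not the unicode object
f.close()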

Here is how to read it back:

f = open('test', 'r')
print f.read().decode('utf8')

This tip is from here.
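
As an alternative to encoding and decoding by hand, here is a minimal sketch that lets the file object do the conversion. It assumes `io.open` (available since Python 2.6, and the same function as the built-in `open` in Python 3) and writes to a temporary directory; the `path` and `round_trip` names are only for illustration and are not part of the original tip.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

foo = u'Δ, Й, ק, م, ๗, あ, 叶, 葉, and 말.'

# Hypothetical scratch location so the sketch does not overwrite a real file.
path = os.path.join(tempfile.mkdtemp(), 'test')

# io.open encodes on write, so no explicit .encode('utf8') call is needed.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(foo)

# It also decodes on read, so the result is already a unicode string.
with io.open(path, 'r', encoding='utf8') as f:
    round_trip = f.read()

print(round_trip == foo)  # prints True
```

The `with` blocks also close the file automatically, which the plain `open` examples above have to do by hand.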

  1. fulibacsi
    December 2, 2012 at 22:16

    Don’t forget that concatenating two unicode strings gives you another unicode object, not UTF-8 bytes. If you write it as is, Python 2 falls back to the ascii codec and fails on any non-ASCII character. So encode the concatenated string:

    f.write((u'unicode string 1' + u'unicode string 2').encode('utf8'))

    I had to learn this lesson on my own. It took an hour to figure out what went wrong…

    • December 2, 2012 at 22:22

      Thanks Fuli! Yeah, Unicode kind of sucks in Python 2…
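
The concatenation point from the comment above can be sketched as follows. This is only an illustration under assumed inputs: `raw` stands in for bytes that arrived already UTF-8 encoded, `text` for text that is already decoded, and both names are hypothetical. The idea is to decode first, join text with text, and encode once at the end.

```python
# -*- coding: utf-8 -*-
raw = u'말'.encode('utf8')   # UTF-8 bytes, e.g. as read from a socket or file
text = u'叶 and '            # already-decoded text

# Decode the bytes first so that both operands are unicode text.
combined = text + raw.decode('utf8')

# Encode exactly once, right before the data is written out.
payload = combined.encode('utf8')
```

Mixing the two kinds directly (`text + raw`) makes Python 2 decode `raw` with the ascii codec behind the scenes, which fails on non-ASCII bytes; Python 3 refuses the mix outright with a TypeError.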
