Posts Tagged ‘unicode’

Reading (writing) unicode text from (to) files

August 6, 2015

Problem
You want to write some special characters to a file (e.g. f.write("voilá")), but you immediately get a Unicode error.

Solution
Instead of calling the encode and decode methods by hand, use the codecs module, which does the conversion for you when reading and writing.

import codecs

# read
with codecs.open(fname, "r", "utf-8") as f:
    text = f.read()

# write
with codecs.open(tmp, "w", "utf-8") as to:
    to.write(text)

As can be seen, its usage is very similar to the well-known open function.
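
For comparison, here is a minimal sketch of the same round trip using io.open, which accepts an encoding argument on both Python 2 and Python 3 (on Python 3 the built-in open behaves the same way). The helper names write_text and read_text and the file name demo.txt are made up for this illustration; they are not from the original tip.

```python
import io
import os
import tempfile


def write_text(path, text):
    # io.open encodes the text to UTF-8 while writing
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    # io.open decodes the UTF-8 bytes back to text while reading
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


# round trip through a temporary file (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
write_text(path, u"voil\u00e1")
print(read_text(path) == u"voil\u00e1")
```

The escape \u00e1 is just "á" spelled out, so the snippet does not depend on the encoding of the source file.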

Categories: python

monkeypatching the string type

January 8, 2014

Problem
“A monkey patch is a way to extend or modify the run-time code of dynamic languages without altering the original source code.” (via Wikipedia) That is, we have the standard library and we want to add new features to it. For instance, a string in the stdlib cannot tell whether it is a palindrome, but we would like to extend the string type to support this:

>>> s = "racecar"
>>> print(s.is_palindrome())    # Warning! It won't work.
True

Is it possible in Python?

Solution
As pointed out in this thread, built-in types are implemented in C and you cannot modify them at runtime. As far as I know, Ruby allows this, but it doesn’t work in Python.
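
A small sketch of what happens if you try it anyway, assuming CPython (other interpreters may behave differently). The flag name patched is only for illustration.

```python
def is_palindrome(self):
    return self == self[::-1]


try:
    # attempt to attach a new method to the built-in type
    str.is_palindrome = is_palindrome
    patched = True
except TypeError:
    # CPython refuses to set attributes on built-in types
    patched = False

print(patched)
```

On CPython this prints False, because the assignment raises a TypeError.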

However, there is a workaround if you really want to do something like this. You can make a subclass of the built-in type and then you can extend it as you want. Example:

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

class MyStr(unicode):
    """
    "monkeypatching" the unicode class

    It's not real monkeypatching, just a workaround.
    """ 
    def is_palindrome(self):
        return self == self[::-1]

def main():
    s = MyStr("radar")
    print(s.is_palindrome())

####################

if __name__ == "__main__":
    main()
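
The example above targets Python 2 (it subclasses unicode). A rough Python 3 equivalent, where str is already the Unicode type, might look like the sketch below. It also shows one limitation of the workaround: inherited methods return plain str objects, so the result has to be wrapped again before the extra method can be used. The variable name lowered is only for illustration.

```python
class MyStr(str):
    """A str subclass with one extra method (Python 3)."""

    def is_palindrome(self):
        return self == self[::-1]


s = MyStr("radar")
print(s.is_palindrome())

# Inherited methods such as upper() and lower() return plain str,
# so the extra method is gone unless the result is wrapped again.
lowered = s.upper().lower()
print(type(lowered) is str)
print(MyStr(lowered).is_palindrome())
```
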
Categories: python

Writing non-ASCII text to file

December 2, 2012

Problem
You download the source of an HTML page into a string and you want to save it to a file. However, you get a UnicodeDecodeError :(

Solution

foo = u'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
# encode to UTF-8 bytes explicitly and write them in binary mode
with open('test', 'wb') as f:
    f.write(foo.encode('utf8'))

Here is how to read it back:

# read the raw bytes and decode them back to a unicode string
with open('test', 'rb') as f:
    print(f.read().decode('utf8'))

‘ascii’ codec can’t encode character: ordinal not in range(128)

March 29, 2012

Problem

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)

Solution

def encode(text):
    """
    For printing unicode characters to the console.
    """
    return text.encode('utf-8')

Or:

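import sys

# Python 2 only: site.py removes setdefaultencoding at startup,
# and reload(sys) brings it back
reload(sys)
sys.setdefaultencoding("latin-1")

a = u'\xe1'
print str(a) # no exception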
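
To see where the error comes from, here is a minimal sketch that reproduces it with an explicit encode call and then shows a few ways around it. The variable names are only for illustration; the snippet should behave the same on Python 2 and Python 3.

```python
text = u'\xe1'  # the character from the error message above

try:
    # ASCII only covers code points 0..127, so this fails
    text.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

as_utf8 = text.encode('utf-8')              # two bytes: 0xC3 0xA1
replaced = text.encode('ascii', 'replace')  # unencodable character becomes "?"
dropped = text.encode('ascii', 'ignore')    # unencodable character is dropped

print(failed)
```

Encoding to UTF-8 keeps the character intact, while the 'replace' and 'ignore' error handlers trade information for an ASCII-only result.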

Categories: python

unicode to ascii

December 17, 2010

Problem

I had the following unicode string: “Kellemes Ünnepeket!” that I wanted to simplify to this: “Kellemes Unnepeket!”, that is, reduce “Ü” to “U”. Furthermore, most of the strings were plain ASCII; only some of them were unicode.

Solution

import unicodedata

title = ...   # get the string somehow
try:
    # if the title is a unicode string, normalize it
    title = unicodedata.normalize('NFKD', title).encode('ascii','ignore')
except TypeError:
    # if it was not a unicode string => OK, do nothing
    pass
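
The same idea can be wrapped in a small function that always takes and returns text. This is a sketch, and the name to_ascii is made up for the illustration. NFKD splits “Ü” into “U” plus a combining diaeresis, and the 'ignore' handler then drops the combining mark. Characters that have no decomposition, such as “ø”, are dropped entirely.

```python
import unicodedata


def to_ascii(text):
    # split accented characters into base letter + combining mark
    decomposed = unicodedata.normalize('NFKD', text)
    # drop everything that is not ASCII, then go back to text
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii(u'Kellemes \u00dcnnepeket!'))  # Kellemes Unnepeket!
print(to_ascii(u'S\u00f8ren'))                # Sren
```
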

Categories: python