unicode | Python Adventures

Reading (writing) unicode text from (to) files

August 6, 2015 Jabba Laci

Problem
You want to write some special characters to a file (e.g. f.write("voilá")), but you immediately get a unicode error thrown in your face.

Solution
Instead of messing with the encode and decode methods by hand, use the codecs module, which does the conversion for you when reading and writing.

import codecs

# read
with codecs.open(fname, "r", "utf-8") as f:
    text = f.read()

# write
with codecs.open(tmp, "w", "utf-8") as to:
    to.write(text)

As can be seen, its usage is very similar to the well-known open function.

This tip is from here.
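For what it's worth, here is a minimal round-trip sketch of the same idea. It swaps in io.open (available since Python 2.6, and the same function as the built-in open in Python 3) for codecs.open. The helper names write_text, read_text and roundtrip, and the temporary file, are made up for the illustration.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_text(path, text):
    """Write unicode text to `path`, encoded as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    """Read a UTF-8 encoded file and return its content as unicode text."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


def roundtrip(text):
    """Write `text` to a temporary file, read it back, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # we only need the path; io.open reopens the file
    try:
        write_text(path, text)
        return read_text(path)
    finally:
        os.remove(path)


print(roundtrip(u"voil\xe1") == u"voil\xe1")
```

The encoding is named once, when the file is opened, so the rest of the code only ever sees unicode text.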

Categories: python Tags: codecs, unicode, utf-8

Unicode box-drawing characters

March 20, 2014 Jabba Laci

I wanted to make a logo for my project PrimCom. Since it runs in the command line, I wanted to draw the logo with characters.

You can find a list of box-drawing characters here. I designed the logo on paper, translated the box characters to ordinary letters (they were easier to type that way), then translated the letters back with a script.

Python source:

#!/usr/bin/env python
# encoding: utf-8

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

import sys

chars = {
    'a': '┌',
    'b': '┐',
    'c': '┘',
    'd': '└',
    'e': '─',
    'f': '│',
    'g': '┴',
    'h': '├',
    'i': '┬',
    'j': '┤',
    'k': '╷',
    'l': '┼',
}

logo = """
aeeb     aeeb
fabf     faec
fdcheiieejf aeeieeb
faejaljkkff fabfkkf
ff fffffffdejdcffff
dc dcdggggeegeegggc
"""

def main():
    for c in logo:
        if c in chars:
            sys.stdout.write(chars[c])
        else:
            sys.stdout.write(c)

if __name__ == "__main__":
    main()

Output:

┌─────┐           ┌─────┐
│ ┌─┐ │           │ ┌───┘
│ └─┘ ├───┬─┬─────┤ │   ┌─────┬─────┐
│ ┌───┤ ┌─┼─┤ ╷ ╷ │ │   │ ┌─┐ │ ╷ ╷ │
│ │   │ │ │ │ │ │ │ └───┤ └─┘ │ │ │ │
└─┘   └─┘ └─┴─┴─┴─┴─────┴─────┴─┴─┴─┘
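The letter-to-box translation can also be written as a function that returns the translated text instead of printing it character by character, which makes it easier to reuse. This is only a sketch: the name to_box and the BOX_CHARS constant are mine, and the table is the one from the script above.

```python
# -*- coding: utf-8 -*-
# Same letter -> box-drawing table as in the script above.
BOX_CHARS = {
    "a": u"┌", "b": u"┐", "c": u"┘", "d": u"└",
    "e": u"─", "f": u"│", "g": u"┴", "h": u"├",
    "i": u"┬", "j": u"┤", "k": u"╷", "l": u"┼",
}


def to_box(text, mapping=BOX_CHARS):
    """Replace every mapped letter; leave spaces, newlines etc. untouched."""
    return u"".join(mapping.get(c, c) for c in text)


print(to_box("aeb\ndec"))
# ┌─┐
# └─┘
```

dict.get(c, c) falls back to the character itself, so the if/else from the original loop is not needed.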

Update (20140322)
If you want rainbow colors, check out the colout project.

Categories: python Tags: box-drawing, logo, rainbow, unicode

monkeypatching the string type

January 8, 2014 Jabba Laci

Problem
“A monkey patch is a way to extend or modify the run-time code of dynamic languages without altering the original source code.” (via Wikipedia) That is, we have the standard library, and we want to add new features to it. For instance, in the stdlib a string cannot tell whether it is a palindrome, but we would like to extend the string type to support this feature:

>>> s = "racecar"
>>> print(s.is_palindrome())    # Warning! It won't work.
True

Is it possible in Python?

Solution
As pointed out in this thread, built-in types are implemented in C and you cannot modify them at runtime. As far as I know, Ruby allows this, but it doesn’t work in Python.
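To see the refusal for yourself, here is a small sketch (tried with CPython in mind; the function name try_to_patch_str is made up). Assigning a new attribute to the built-in str type raises a TypeError:

```python
def try_to_patch_str():
    """Attempt to add a method to the built-in str type and report the result."""
    try:
        str.is_palindrome = lambda self: self == self[::-1]
    except TypeError as exc:
        # CPython does not allow setting attributes on built-in types
        return "TypeError: {}".format(exc)
    return "patched"


print(try_to_patch_str())
```

The exact wording of the message differs between Python versions, so the sketch only relies on the exception type.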

However, there is a workaround if you really want to do something like this. You can make a subclass of the built-in type and then you can extend it as you want. Example:

from __future__ import (absolute_import, division,
                        print_function, unicode_literals)

# Python 2: unicode is the text type (in Python 3 this would be str)
class MyStr(unicode):
    """
    "monkeypatching" the unicode class

    It's not real monkeypatching, just a workaround.
    """ 
    def is_palindrome(self):
        return self == self[::-1]

def main():
    s = MyStr("radar")
    print(s.is_palindrome())

####################

if __name__ == "__main__":
    main()
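The subclass workaround has a catch worth knowing about: the extra method exists only on objects you wrap explicitly, and the built-in string methods hand back plain strings again. A minimal sketch, written for Python 3 where str is the text type (PalindromeStr is a hypothetical name):

```python
class PalindromeStr(str):
    """A str subclass with one extra method."""

    def is_palindrome(self):
        return self == self[::-1]


word = PalindromeStr("racecar")
print(word.is_palindrome())                  # True

# Built-in methods return a plain str, so the extra method is lost.
shouted = word.upper()
print(type(shouted) is str)                  # True

# Ordinary string literals are unaffected by the subclass.
print(hasattr("racecar", "is_palindrome"))   # False
```

So every string has to be wrapped again after each operation, which is why this is a workaround and not real monkeypatching.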

Categories: python Tags: monkeypatch, palindrome, unicode

Writing non-ASCII text to file

December 2, 2012 Jabba Laci

Problem
You download the source of an HTML page into a string and you want to save it to a file. However, writing it fails with a UnicodeEncodeError or UnicodeDecodeError :(

Solution

foo = u'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
with open('test', 'w') as f:
    f.write(foo.encode('utf8'))

Here is how to read it back:

with open('test', 'r') as f:
    print f.read().decode('utf8')

This tip is from here.
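The same encode-on-write, decode-on-read idea can be sketched so that it behaves identically on Python 2 and Python 3: open the file in binary mode, so that write() receives bytes and read() returns bytes. The helper names save_utf8, load_utf8 and demo_roundtrip are made up for the illustration.

```python
# -*- coding: utf-8 -*-
import os
import tempfile


def save_utf8(path, text):
    """Encode unicode text to UTF-8 bytes and write them to `path`."""
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))


def load_utf8(path):
    """Read raw bytes from `path` and decode them from UTF-8."""
    with open(path, "rb") as f:
        return f.read().decode("utf-8")


def demo_roundtrip(text):
    """Save `text` to a temporary file and load it back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        save_utf8(path, text)
        return load_utf8(path)
    finally:
        os.remove(path)


# One character can take several bytes once encoded.
print(len(u"\u0394"), len(u"\u0394".encode("utf-8")))   # 1 2
```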

Categories: python Tags: read unicode text from file, unicode, write unicode text to file

‘ascii’ codec can’t encode character: ordinal not in range(128)

March 29, 2012 Jabba Laci

Problem

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)

Solution

def encode(text):
    """
    For printing unicode characters to the console.
    """
    return text.encode('utf-8')

Or:

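import sys

# Python 2 only: site.py deletes sys.setdefaultencoding() at startup and
# reload(sys) brings it back. The change affects the whole process.
reload(sys)
sys.setdefaultencoding("latin-1")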

a = u'\xe1'
print str(a) # no exception

This tip is from here.
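The error comes from encoding a character that the ASCII codec cannot represent. A short sketch of what the alternatives look like when you encode explicitly: choose a codec that knows the character, or pass an error handler as the second argument of encode(). The helper name encode_or_report is hypothetical.

```python
text = u"\xe1"   # the character from the error message: á


def encode_or_report(s, encoding, errors="strict"):
    """Return the encoded bytes, or a short message if encoding fails."""
    try:
        return s.encode(encoding, errors)
    except UnicodeEncodeError as exc:
        return "failed: " + exc.reason


print(encode_or_report(text, "ascii"))    # failed: ordinal not in range(128)
print(encode_or_report(text, "utf-8") == b"\xc3\xa1")                        # True
print(encode_or_report(text, "ascii", "replace") == b"?")                    # True
print(encode_or_report(text, "ascii", "backslashreplace") == b"\\xe1")       # True
```

The "replace" and "backslashreplace" handlers avoid the exception but lose or rewrite the character, so they suit log output better than data you need to read back.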

Categories: python Tags: ascii, unicode, UnicodeEncodeError

unicode to ascii

December 17, 2010 Jabba Laci

Problem

I had the following unicode string: “Kellemes Ünnepeket!” that I wanted to simplify to this: “Kellemes Unnepeket!”, that is, reduce “Ü” to “U”. Furthermore, most of the strings were plain ASCII byte strings; only some of them were unicode.

Solution

import unicodedata

title = ...   # get the string somehow
try:
    # if the title is a unicode string, normalize it
    title = unicodedata.normalize('NFKD', title).encode('ascii','ignore')
except TypeError:
    # if it was not a unicode string => OK, do nothing
    pass
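Here is the normalization step on its own, as a sketch that takes unicode text and returns text again (the function name to_ascii is made up). NFKD splits “Ü” into a plain “U” plus a combining diaeresis, and encoding to ASCII with 'ignore' then drops the combining mark:

```python
import unicodedata


def to_ascii(text):
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    decomposed = unicodedata.normalize("NFKD", text)
    # 'ignore' silently discards everything the ASCII codec cannot encode
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii(u"Kellemes \xdcnnepeket!"))   # Kellemes Unnepeket!

# Characters without a decomposition are dropped entirely, e.g. ø:
print(to_ascii(u"S\xf8ren"))                 # Sren
```

The second call shows the limit of this trick: it works for letters that are a base letter plus accents, not for letters such as “ø” or “ß”.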

Credits

I used the following resources:

Categories: python Tags: accent, ascii, convert, unicode
