Archive

Posts Tagged ‘utf-8’

Convert a file to UTF-8-encoded text

December 16, 2017

I wrote a simple script that takes an input file, changes its character encoding to UTF-8, and prints the result to the screen.

It’s actually a wrapper around the Unix commands “file” and “iconv”. The goal was to make its usage as simple as possible. The script is here: to_utf8.py.

Usage:

$ to_utf8.py input.txt

The program tries to detect the encoding of the input file.
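To make the idea concrete, here is a minimal sketch of such a wrapper. It is not the actual to_utf8.py; the function names (detect_encoding, iconv_command, to_utf8, main) are made up for illustration, and it assumes that "file -b --mime-encoding" reports an encoding name that iconv accepts, which will not hold for every file (for binary data, for example).

```python
import subprocess
import sys


def detect_encoding(path):
    """Ask `file` for the encoding of path, e.g. 'iso-8859-1' (hypothetical helper)."""
    out = subprocess.check_output(["file", "-b", "--mime-encoding", path])
    return out.decode("ascii").strip()


def iconv_command(encoding, path):
    """Build the iconv command line that converts path from encoding to UTF-8."""
    return ["iconv", "-f", encoding, "-t", "utf-8", path]


def to_utf8(path):
    """Return the content of path re-encoded as UTF-8 bytes."""
    return subprocess.check_output(iconv_command(detect_encoding(path), path))


def main(argv):
    """Print the converted file to the screen; returns a process exit code."""
    if len(argv) != 2:
        sys.stderr.write("Usage: to_utf8.py input.txt\n")
        return 1
    sys.stdout.write(to_utf8(argv[1]).decode("utf-8"))
    return 0
```

Calling main(sys.argv) at the bottom of the file would give the command-line behaviour shown above. Keeping the command construction in its own function makes it easy to check without running iconv.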

Categories: bash, python

Reading (writing) unicode text from (to) files

August 6, 2015

Problem
You want to write some special characters to a file (e.g. f.write("voilá")), but you immediately get a Unicode error.

Solution
Instead of messing with the encode and decode methods, use the codecs module.

import codecs

# read
with codecs.open(fname, "r", "utf-8") as f:
    text = f.read()

# write
with codecs.open(tmp, "w", "utf-8") as to:
    to.write(text)

As can be seen, its usage is very similar to the well-known open function.

This tip is from here.
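As a small addition, here is a round-trip sketch using io.open, which takes an encoding argument in the same way and is the built-in open in Python 3. The temporary file and the variable names (path, round_trip, raw) are only for demonstration; this is an alternative to the codecs approach, not a replacement for it.

```python
import io
import os
import tempfile

text = u"voil\u00e1"  # the same string as in the problem description

# Hypothetical scratch file, created only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

try:
    # write: the text is encoded to UTF-8 on the way out
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # read: the bytes are decoded from UTF-8 on the way in
    with io.open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

    # read the raw bytes to see what actually landed on disk
    with io.open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)
```

The text read back is identical to what was written, and the accented character takes two bytes on disk.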

Categories: python

Print unicode text to the terminal

September 2, 2012

Problem
I wrote a script in Eclipse-PyDev that prints some text with accented characters to the standard output. It runs fine in the IDE but it breaks in the console:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 11: ordinal not in range(128)

This bugged me for a long time, but I have now found a working solution.

Solution
Insert the following in your source code (this works in Python 2 only; Python 3 has no sys.setdefaultencoding):

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

I found this trick here. “This allows you to switch from the default ASCII to other encodings such as UTF-8, which the Python runtime will use whenever it has to decode a string buffer to unicode.”
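To see where the error comes from, and one way to avoid it without changing the default encoding, here is a short sketch. It is not part of the original trick: it wraps a byte stream in a UTF-8 writer from the codecs module. An io.BytesIO object stands in for the terminal so the result can be inspected, and the sample string is made up.

```python
import codecs
import io

# Sample text containing u'\xf3' (ó), the character named in the error above.
text = u"informaci\xf3n"

# Encoding to ASCII fails because ó has no ASCII representation.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# A UTF-8 writer encodes unicode text before it reaches the byte stream.
# io.BytesIO is a stand-in for the terminal's byte stream here.
raw = io.BytesIO()
out = codecs.getwriter("utf-8")(raw)
out.write(text + u"\n")
encoded = raw.getvalue()
```

In a real script, the byte stream would be sys.stdout in Python 2 or sys.stdout.buffer in Python 3 instead of the BytesIO object.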

Categories: python