Is a file binary?

I want to process all text files in a folder recursively (actually, I want to extract all URLs from them). However, their extensions are not necessarily .txt. How can I separate text files from binary files?

In this thread I found a solution. Here is my slightly modified version:

def is_binary(fname):
    """Return True if the given file appears to be binary.

    Found at http://stackoverflow.com/questions/898669
    """
    CHUNKSIZE = 1024
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(CHUNKSIZE)
            # The file is opened in binary mode, so compare against a bytes literal.
            if b'\0' in chunk:  # found null byte
                return True
            if len(chunk) < CHUNKSIZE:
                break  # done

    return False

If the file contains a null byte ('\0'), it is considered binary. Note that this also classifies UTF-16-encoded text files as “binary”, because UTF-16 stores every ASCII character with an accompanying zero byte.
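
One possible way around the UTF-16 caveat, and a way to apply the check to a whole folder, is sketched below. This is only a rough variant under a stated assumption: UTF-16/UTF-32 text files start with a byte-order mark (files without one are still reported as binary). The names BOMS, looks_binary and iter_text_files are made up for this example and do not come from the Stack Overflow thread.

```python
import codecs
import os

# Byte-order marks treated as a sign of UTF-16/UTF-32 text.
# Assumption: such files carry a BOM; BOM-less UTF-16 is not detected.
BOMS = (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE,
        codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_binary(fname, chunksize=1024):
    """Return True if the file contains a null byte and has no Unicode BOM."""
    with open(fname, 'rb') as f:
        chunk = f.read(chunksize)
        if chunk.startswith(BOMS):
            return False  # treat BOM-prefixed files as text
        while chunk:
            if b'\0' in chunk:  # found null byte
                return True
            chunk = f.read(chunksize)
    return False


def iter_text_files(root):
    """Walk root recursively and yield paths of files that look like text."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not looks_binary(path):
                yield path
```

With something like this, the URL extraction can loop over iter_text_files('some_folder') and open only the files it yields. The null-byte test remains a heuristic, so an occasional misclassification is still possible.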
