Is a file binary?
I want to process all text files in a folder recursively (actually, I want to extract all URLs from them). However, their extensions are not necessarily .txt. How can I separate text files from binary files?
In this thread I found a solution. Here is my slightly modified version:
def is_binary(fname):
    """
    Return True if the given file looks binary.
    Found at http://stackoverflow.com/questions/898669
    """
    CHUNKSIZE = 1024
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(CHUNKSIZE)
            # bytes literal: the file is opened in 'rb' mode, so chunk is bytes
            if b'\0' in chunk:  # found null byte
                return True
            if len(chunk) < CHUNKSIZE:
                break  # done
    return False
If it finds a null byte ('\0'), the file is considered to be binary. Note that it will also classify UTF-16-encoded text files as “binary”, because UTF-16 stores every ASCII character with a zero byte.
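For the original goal (walking a folder and pulling URLs out of the non-binary files), a rough sketch might look like the following. The names `iter_urls`, `has_null_byte` and `URL_RE` are my own and purely illustrative, and the regular expression is a deliberately simplistic assumption about what a URL looks like. The null-byte check is repeated only so that the snippet runs on its own.

```python
import os
import re

# Simplistic pattern (an assumption): http(s) URLs up to whitespace, a quote or an angle bracket.
URL_RE = re.compile(rb"https?://[^\s\"'<>]+")


def has_null_byte(path, chunk_size=1024):
    """Same null-byte heuristic as is_binary, repeated so this sketch is standalone."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if b"\0" in chunk:
                return True
            if len(chunk) < chunk_size:
                return False


def iter_urls(root):
    """Yield (path, url) for every URL found in non-binary files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if has_null_byte(path):
                    continue  # treated as binary, skip
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # unreadable file, skip
            for match in URL_RE.finditer(data):
                yield path, match.group().decode("ascii", errors="replace")
```

Calling `for path, url in iter_urls("some_folder"): print(path, url)` would then list the matches. Note that this reads each text file fully into memory, which is probably fine for ordinary text files but is a design shortcut rather than a requirement.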