text | Python Adventures

Automatic text summarization

November 2, 2015 Jabba Laci

See sumy (https://github.com/miso-belica/sumy), a Python library for summarizing text. Its README also lists alternative projects.

Categories: python Tags: summary, text

Is a file binary?

June 17, 2014 Jabba Laci

Problem
I want to process all text files in a folder recursively. (Actually, I want to extract all URLs from them.) However, their extensions are not necessarily .txt. How can I separate text files from binary files?

Solution
I found a solution in a Stack Overflow thread (http://stackoverflow.com/questions/898669). Here is my slightly modified version:

def is_binary(fname):
    """
    Return True if the given file is binary.

    found at http://stackoverflow.com/questions/898669
    """
    CHUNKSIZE = 1024
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(CHUNKSIZE)
            # the file is opened in binary mode, so compare against a bytes literal
            if b'\0' in chunk: # found null byte
                return True
            if len(chunk) < CHUNKSIZE:
                break # done (reached end of file)

    return False

If it finds a null byte ('\0'), the file is considered binary. Note that it will also classify UTF-16-encoded text files as “binary”, because UTF-16 encodes every ASCII character with a zero byte.
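To connect this back to the original problem, here is a minimal sketch of how the check could be combined with os.walk to visit only the text files below a folder. The helper name iter_text_files is my own invention for illustration, and is_binary is repeated in a compact form (with the chunk size as a parameter) so that the snippet runs on its own.

```python
import os


def is_binary(fname, chunksize=1024):
    """Return True if the file contains a null byte."""
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(chunksize)
            if not chunk:         # end of file, no null byte seen
                return False
            if b'\0' in chunk:    # found null byte
                return True


def iter_text_files(root):
    """Yield the path of every non-binary file below root.

    Hypothetical helper, not part of the Stack Overflow answer.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_binary(path):
                yield path
```

Looping over iter_text_files('.') would then give the candidate files for URL extraction, whatever their extensions are. Unreadable files are not handled here; a real script would probably want to catch OSError around the check.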

Categories: python Tags: binary, text

Reading and writing a file

December 17, 2010 Jabba Laci

Here is a mini cheat sheet for reading and writing a text file.

Read a text file line by line and write each line to another file (copy):

# the with statement closes both files, even if an error occurs
with open('./in.txt', 'r') as f1, open('./out.txt', 'w') as to:
    for line in f1:
        to.write(line)

Variations (f is a file object opened for reading):

text = f.read()             # read the entire file
line = f.readline()         # read one line at a time
lineList = f.readlines()    # read the entire file as a list of lines
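
To see how the three calls differ, here is a small sketch that writes a throwaway file and reads it back in each way. The file name sample.txt and its content are made up for the demonstration.

```python
import os
import tempfile

# Create a small sample file (hypothetical content) to read back.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('first\nsecond\nthird\n')

with open(path) as f:
    text = f.read()            # 'first\nsecond\nthird\n'

with open(path) as f:
    line = f.readline()        # 'first\n' (the newline is kept)

with open(path) as f:
    line_list = f.readlines()  # ['first\n', 'second\n', 'third\n']
```

For large files, iterating over the file object line by line (as in the copy example) avoids loading everything into memory at once.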

Categories: python Tags: file, line by line, read, text, write
