Posts Tagged ‘text’
automatic text summarization
November 2, 2015
See https://github.com/miso-belica/sumy for a Python library that does this. Its README also lists alternative projects.
Is a file binary?
June 17, 2014
Problem
I want to process all text files in a folder recursively. (Actually, I want to extract all URLs from them.) However, their extensions are not necessarily .txt. How can I separate text files from binary files?
Solution
In a Stack Overflow thread (http://stackoverflow.com/questions/898669) I found a solution. Here is my slightly modified version:
    def is_binary(fname):
        """
        Return True if the given file looks binary.
        found at http://stackoverflow.com/questions/898669
        """
        CHUNKSIZE = 1024
        with open(fname, 'rb') as f:
            while True:
                chunk = f.read(CHUNKSIZE)
                # The file is opened in binary mode, so compare against a
                # bytes literal (a plain '\0' raises TypeError on Python 3).
                if b'\0' in chunk:  # found null byte
                    return True
                if len(chunk) < CHUNKSIZE:
                    break  # done, reached end of file
        return False
If it finds a '\0' (null) character, the file is considered binary. Note that it will also classify UTF-16-encoded text files as “binary”, since UTF-16 stores every ASCII character with a null byte.
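To tie this back to the original problem, here is a minimal sketch, assuming Python 3, that walks a folder recursively, skips binary files using the null-byte check, and collects URLs from the rest. The names `extract_urls` and `URL_RE` are hypothetical, and the URL pattern is deliberately simplistic, so treat it as a starting point rather than a robust extractor.

```python
import os
import re

# Simplistic URL pattern (an assumption for illustration only).
URL_RE = re.compile(rb'https?://[^\s<>"]+')


def is_binary(fname, chunksize=1024):
    """Return True if the file contains a null byte."""
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(chunksize)
            if b'\0' in chunk:
                return True
            if len(chunk) < chunksize:
                return False  # reached end of file


def extract_urls(root):
    """Yield (path, url) pairs for every non-binary file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_binary(path):
                continue
            # Read as bytes so an unknown text encoding cannot raise here.
            with open(path, 'rb') as f:
                for match in URL_RE.finditer(f.read()):
                    yield path, match.group().decode('utf-8', errors='replace')
```

Calling `for path, url in extract_urls('.'): print(path, url)` would list the URLs found under the current folder. Reading each file whole keeps the sketch short; for very large files a line-by-line scan would probably be a better fit.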
Reading and writing a file
December 17, 2010
Here is a mini cheat sheet for reading and writing a text file.
Read a text file line by line and write each line to another file (copy):
    # 'with' closes both files automatically, even if an error occurs
    with open('./in.txt', 'r') as f1, open('./out.txt', 'w') as to:
        for line in f1:
            to.write(line)
Variations:
    text = f.read()            # read the entire file
    line = f.readline()        # read one line at a time
    lineList = f.readlines()   # read the entire file as a list of lines
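As a small self-contained sketch of what each variation returns, assuming Python 3 (the sample file and its contents are made up for illustration):

```python
import os
import tempfile

# Create a small sample file so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'in.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('first\nsecond\nthird\n')

# Each variation starts from a freshly opened file, because a file
# object remembers how far it has already been read.
with open(path, encoding='utf-8') as f:
    text = f.read()            # one string with the whole content

with open(path, encoding='utf-8') as f:
    line = f.readline()        # only the first line, newline included

with open(path, encoding='utf-8') as f:
    line_list = f.readlines()  # list of lines, newlines included
```

Here `text` holds the whole file as a single string, `line` holds just `'first\n'`, and `line_list` has three entries. Note that the trailing newline is kept in each line, so `line.rstrip('\n')` is often needed before further processing.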