Posts Tagged ‘zip’

working with zip files

August 21, 2015

Problem

In one of my projects I had to deal with folders that each contain several thousand small text files. I kept the project on Dropbox so that I could use it on all my machines. However, Dropbox is quite slow at synchronizing several thousand files, so I decided to pack the files of each folder into a single zip file.

So the question is: how do you work with zip files? How do you perform the basic operations: create an archive, delete from it, list its contents, add files to it, move files into it, extract from it, and so on?

Solution

In this project I used the external zip command as well as the zipfile module from the stdlib. Let’s look at both.

Manipulating zip files from the command-line
Let’s see some examples. Compress every .json file in the current directory except the desc.json file:

zip -9 files.zip *.json -x desc.json

The switch “-9” selects the highest (and slowest) compression level, files.zip is the output archive, and “-x” is short for “--exclude”. From Python you can run it as an external command, for instance with os.system().
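As an alternative to os.system(), here is a minimal sketch that runs the same command through subprocess. It assumes the Info-ZIP zip program is on the PATH. The helper names build_zip_command and run_zip are made up for this illustration. Since no shell is involved, the *.json wildcard is expanded in Python with glob, and the exclusion is applied there too instead of with “-x”.

```python
import glob
import subprocess


def build_zip_command(archive, pattern, exclude=(), move=False):
    """Return the argument list for a call to the external zip command.

    The wildcard is expanded here with glob because no shell is involved
    when subprocess receives a list of arguments.
    """
    excluded = set(exclude)
    files = sorted(f for f in glob.glob(pattern) if f not in excluded)
    cmd = ["zip", "-9"]
    if move:
        cmd.append("-m")  # delete the originals once they are archived
    cmd.append(archive)
    cmd.extend(files)
    return cmd


def run_zip(archive, pattern, exclude=(), move=False):
    """Run zip and raise CalledProcessError if it exits with an error."""
    cmd = build_zip_command(archive, pattern, exclude, move)
    subprocess.run(cmd, check=True)
```

For example, run_zip("files.zip", "*.json", exclude=["desc.json"]) corresponds to the command above. Passing a list instead of a single command string also avoids quoting problems with unusual file names.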

The previous example creates a zip file and leaves the original files in place. To move the files into the archive instead (the originals are deleted once they have been added successfully), add “-m”:

zip -9 -m files.zip *.json -x desc.json

Delete a file from an archive:

zip -d files.zip desc.json

It will delete desc.json from the zip file.

List the contents of a zip file:

zipinfo files.zip

Add a file to the archive:

zip -g files.zip new.json

Here “-g” stands for “grow”: zip appends to the existing archive instead of creating a new one.

Extract just one file from a zip file:

# basic:
unzip files.zip this.json

# extract to a specific folder:
unzip files.zip this.json -d /extract/here/

It will extract this.json from the archive.

Read the content of a zip file in Python
OK, say we have a zip file that contains some files. How do we get the file names? How do we read the files? I found some nice examples here.

List the file names in a zip file:

import zipfile

# the with block closes the archive when the loop is done
with zipfile.ZipFile("files.zip", "r") as zfile:
    for name in zfile.namelist():
        print(name)

Read files in a zip file:

import zipfile

with zipfile.ZipFile("files.zip", "r") as zfile:
    for name in zfile.namelist():
        # read() returns bytes in Python 3; call data.decode() to get text
        data = zfile.read(name)
        print(data)
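The other command-line operations have rough counterparts in the zipfile module as well. The following is only a sketch, not the code from my project. The function names are invented for this illustration. The delete helper assumes that zipfile has no call for removing a member in place, so it copies every other entry into a new archive and then replaces the old one.

```python
import os
import zipfile


def create_zip(archive, filenames):
    """Create (or overwrite) an archive, roughly like `zip archive files`."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in filenames:
            zf.write(name)


def add_to_zip(archive, filename):
    """Append one file to an existing archive, roughly like `zip -g`."""
    with zipfile.ZipFile(archive, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(filename)


def extract_one(archive, member, target_dir="."):
    """Extract a single member and return the path of the extracted file."""
    with zipfile.ZipFile(archive, "r") as zf:
        return zf.extract(member, path=target_dir)


def delete_from_zip(archive, member):
    """Drop one member by rewriting the archive without it."""
    tmp = archive + ".tmp"  # temporary name, chosen for this sketch
    with zipfile.ZipFile(archive, "r") as src, \
            zipfile.ZipFile(tmp, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename != member:
                # reuse the original entry metadata, copy the content
                dst.writestr(info, src.read(info.filename))
    os.replace(tmp, archive)
```

Rewriting the archive costs time proportional to its size, so for frequent deletions in large archives the external zip -d command may be the more practical choice.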

Categories: bash, python

Python equivalent of Java .jar files

January 5, 2014

Problem
In Java, you can distribute your project in JAR format. It is essentially a ZIP file with some metadata. The project can be launched easily:

$ java -jar project.jar

What is the Python equivalent? How can you distribute a Python project (with several modules and packages) as a single file?

Solution
The following is based on this post, written by bheklilr. Thanks for the tip.

Let’s see the following project structure:

MyApp/
    MyApp.py          <--- Main script
    alibrary/
        __init__.py
        alibrary.py
        errors.py
    anotherlib/
        __init__.py
        another.py
        errors.py
    configs/
        config.json
        logging.json

Rename the main script to __main__.py and compress the project into a zip file. Python does not care about the file extension; here it is .egg:

myapp.egg/             <--- technically, it's just a zip file
    __main__.py        <--- Renamed from MyApp.py
    alibrary/
        __init__.py
        alibrary.py
        errors.py
    anotherlib/
        __init__.py
        another.py
        errors.py
    configs/
        config.json
        logging.json

How do you zip it? Enter the project directory (MyApp/), so that __main__.py ends up at the top level of the archive, and run this command:

zip -r ../myapp.egg .

Now you can launch the .egg file just like you launch a Java .jar file:

$ python myapp.egg

Any command-line arguments you add after the archive name are passed on to __main__.py.
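The packing step can also be done from Python itself. The sketch below uses the stdlib zipapp module (available since Python 3.5) instead of the external zip command, so it is a variation on the steps above and not the method from the original post. The helper name build_and_run and the tiny stand-in __main__.py are made up for the demonstration.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run(project_dir, target, args=()):
    """Pack project_dir (it must contain __main__.py) and run the archive."""
    zipapp.create_archive(project_dir, target)
    result = subprocess.run(
        [sys.executable, str(target), *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = pathlib.Path(tmp) / "MyApp"
        project.mkdir()
        # stand-in for the real __main__.py: it only echoes its arguments
        (project / "__main__.py").write_text(
            "import sys\nprint('args:', sys.argv[1:])\n"
        )
        archive = pathlib.Path(tmp) / "myapp.egg"
        print(build_and_run(project, archive, ["--verbose", "input.txt"]))
```

The demo shows that the arguments given after the archive name arrive in sys.argv[1:] of __main__.py.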

Categories: python

Hamming distance

October 19, 2010

The Hamming distance is defined between two strings of equal length. It measures the number of positions with mismatching characters.

Example: the Hamming distance between “toned” and “roses” is 3.

#!/usr/bin/env python

def hamming_distance(s1, s2):
    # zip() would silently stop at the shorter string, so check the lengths
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__ == "__main__":
    a = 'toned'
    b = 'roses'
    print(hamming_distance(a, b))   # 3

If you need the number of matching character positions:

#!/usr/bin/env python

def similarity(s1, s2):
    # zip() would silently stop at the shorter string, so check the lengths
    assert len(s1) == len(s2)
    return sum(ch1 == ch2 for ch1, ch2 in zip(s1, s2))

if __name__ == "__main__":
    a = 'toned'
    b = 'roses'
    print(similarity(a, b))    # 2

Actually this is equal to len(s1) - hamming_distance(s1, s2). Remember, len(s1) == len(s2).

More info on zip() here.
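To make the role of zip() more concrete, here is a small sketch of what it yields for the two example strings. The helper name char_pairs is made up for this illustration.

```python
def char_pairs(s1, s2):
    """Return the position-wise character pairs that zip() produces."""
    return list(zip(s1, s2))


pairs = char_pairs("toned", "roses")
print(pairs)
# [('t', 'r'), ('o', 'o'), ('n', 's'), ('e', 'e'), ('d', 's')]

# True counts as 1 when summed, so this counts the mismatching pairs
print(sum(a != b for a, b in pairs))   # 3

# zip() stops at the shorter input, which is why the functions above
# check that both strings have the same length
print(char_pairs("abc", "abcdef"))
# [('a', 'a'), ('b', 'b'), ('c', 'c')]
```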

Categories: python