unzip: perform the opposite of zip
zip
>>> a = [1, 2, 3]
>>> b = ["one", "two", "three"]
>>> zip(a, b)
<zip object at 0x7fd30310b508>
>>> list(zip(a, b))
[(1, 'one'), (2, 'two'), (3, 'three')]
unzip
How do we perform the opposite of zip? That is, we have [(1, 'one'), (2, 'two'), (3, 'three')], and we want to get back [1, 2, 3] and ["one", "two", "three"].
>>> li
[(1, 'one'), (2, 'two'), (3, 'three')]
>>> a, b = zip(*li)
>>> a
(1, 2, 3)
>>> b
('one', 'two', 'three')
Notice that the results are tuples.
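If lists are needed rather than tuples, each tuple can be converted after unpacking. A minimal sketch, using the same list of pairs as above (the helper name unzip_to_lists is made up for illustration):

```python
def unzip_to_lists(pairs):
    """Split a list of tuples into one list per tuple position."""
    # zip(*pairs) yields one tuple per position; list() converts each of them.
    return [list(column) for column in zip(*pairs)]


li = [(1, 'one'), (2, 'two'), (3, 'three')]
a, b = unzip_to_lists(li)
print(a)  # [1, 2, 3]
print(b)  # ['one', 'two', 'three']
```

Note that for an empty input the helper returns an empty list, so unpacking the result into two names would fail in that case.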
More info here.
working with zip files
Problem
In one of my projects I had to deal with folders, each of which could contain several thousand small text files. I kept the project on Dropbox so that I could use it on all my machines. However, Dropbox is quite slow when synchronizing several thousand files, so I decided to pack the files of each folder into a single zip file.
So the question is: how do we deal with zip files? How do we perform the basic operations on them: create a zip, delete from it, list its contents, add to it, move files into it, extract from it, etc.?
Solution
In this project I used the external zip command as well as the zipfile module from the stdlib. Let’s see both of them.
Manipulating zip files from the command-line
Let’s see some examples. Compress every .json file in the current directory except the desc.json file:
zip -9 files.zip *.json -x desc.json
The switch “-9” gives the best compression, files.zip is the output, and “-x” is short for “--exclude”. From Python you can call it as an external command with os.system(), for instance.
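As a sketch of such a call, the snippet below uses subprocess.run() instead of os.system(), which is a swapped-in choice: it takes an argument list and raises an error when the command fails. Because no shell is involved, the *.json wildcard is expanded with glob, and the exclusion is done in Python rather than with -x. The helper names zip_command and run_zip are hypothetical, and the zip program is assumed to be installed and on the PATH.

```python
import glob
import subprocess


def zip_command(archive, pattern, exclude=()):
    """Build the argument list for the external zip command."""
    # Expand the wildcard ourselves and drop the excluded names.
    files = [f for f in sorted(glob.glob(pattern)) if f not in exclude]
    return ["zip", "-9", archive] + files


def run_zip(archive, pattern, exclude=()):
    """Run zip; raises CalledProcessError if zip exits with an error."""
    subprocess.run(zip_command(archive, pattern, exclude), check=True)


# Usage (assumes zip is installed):
# run_zip("files.zip", "*.json", exclude=("desc.json",))
```

Passing a list of arguments avoids quoting problems with unusual file names, which is the main reason for preferring it here over a single command string.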
The previous example creates a zip file and leaves the original files. Now let’s move files into a zip file (and delete the original files when they were added successfully to the archive):
zip -9 -m files.zip *.json -x desc.json
Delete a file from an archive:
zip -d files.zip desc.json
It will delete desc.json from the zip file.
List the content of a zip file:
zipinfo files.zip
Add a file to the archive:
zip -g files.zip new.json
Where “-g” means: grow.
Extract just one file from a zip file:
# basic:
unzip files.zip this.json
# extract to a specific folder:
unzip files.zip this.json -d /extract/here/
It will extract this.json from the archive.
Read the content of a zip file in Python
OK, say we have a zip file that contains some files. How to get the filenames? How to read them? I found some nice examples here.
List the file names in a zip file:
import zipfile

with zipfile.ZipFile("files.zip", "r") as zfile:
    for name in zfile.namelist():
        print(name)
Read files in a zip file:
import zipfile

with zipfile.ZipFile("files.zip", "r") as zfile:
    for name in zfile.namelist():
        data = zfile.read(name)  # bytes; use data.decode() to get text
        print(data)
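Several of the command-line operations shown earlier also have counterparts in the zipfile module. The following is a minimal sketch, not the code from the project: the helper names and the file names in the usage comment are made up for illustration. It covers creating an archive, adding a file to an existing one, and extracting a single member.

```python
import os
import zipfile


def create_zip(archive, filenames):
    """Create a new archive (overwriting any old one) from the given files."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in filenames:
            # arcname stores only the base name, not the full path
            zf.write(name, arcname=os.path.basename(name))


def add_to_zip(archive, filename):
    """Append one file to an existing archive (mode "a")."""
    with zipfile.ZipFile(archive, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(filename, arcname=os.path.basename(filename))


def extract_one(archive, member, dest_dir):
    """Extract a single member into dest_dir; return the extracted path."""
    with zipfile.ZipFile(archive, "r") as zf:
        return zf.extract(member, path=dest_dir)


# Usage (hypothetical file names):
# create_zip("files.zip", ["a.json", "b.json"])
# add_to_zip("files.zip", "new.json")
# extract_one("files.zip", "a.json", "/extract/here/")
```

Unlike zip -m, none of these helpers deletes the original files; that would be a separate os.remove() call after the archive has been written successfully.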
Links
- The zipfile module at effbot.org.
Python equivalent of Java .jar files
Problem
In Java, you can distribute your project in JAR format. It is essentially a ZIP file with some metadata. The project can be launched easily:
$ java -jar project.jar
What is its Python equivalent? How to distribute a Python project (with several modules and packages) in a single file?
Solution
The following is based on this post, written by bheklilr. Thanks for the tip.
Let’s see the following project structure:
MyApp/
    MyApp.py          <--- Main script
    alibrary/
        __init__.py
        alibrary.py
        errors.py
    anotherlib/
        __init__.py
        another.py
        errors.py
    configs/
        config.json
        logging.json
Rename the main script to __main__.py and compress the project to a zip file. The extension can be .egg:
myapp.egg/            <--- technically, it's just a zip file
    __main__.py       <--- Renamed from MyApp.py
    alibrary/
        __init__.py
        alibrary.py
        errors.py
    anotherlib/
        __init__.py
        another.py
        errors.py
    configs/
        config.json
        logging.json
How to zip it? Enter the project directory (MyApp/) and use this command:
zip -r ../myapp.egg .
Now you can launch the .egg file just like you launch a Java .jar file:
$ python myapp.egg
You can also use command-line arguments that are passed to __main__.py.
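The same kind of archive can also be built from Python. The sketch below swaps in the stdlib zipapp module (available since Python 3.5) for the zip command; it is not the method from the post mentioned above. It creates a throwaway MyApp directory whose __main__.py just prints its arguments, packs it into myapp.egg, and runs it. The function name build_and_run and the content of __main__.py are made up for illustration.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run(args):
    """Pack a tiny project into myapp.egg, run it, and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "MyApp"
        src.mkdir()
        # Stand-in for the real main script: echo the command-line arguments.
        (src / "__main__.py").write_text(
            "import sys\nprint('args:', sys.argv[1:])\n"
        )
        target = pathlib.Path(tmp) / "myapp.egg"
        zipapp.create_archive(src, target)  # zips the directory contents
        result = subprocess.run(
            [sys.executable, str(target), *args],
            capture_output=True, text=True, check=True,
        )
        return result.stdout


print(build_and_run(["--verbose", "x"]))
```

The archive is launched with the same interpreter that runs the script (sys.executable), which corresponds to the python myapp.egg call shown earlier.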
Hamming distance
The Hamming distance is defined between two strings of equal length. It measures the number of positions with mismatching characters.
Example: the Hamming distance between “toned” and “roses” is 3.
#!/usr/bin/env python

def hamming_distance(s1, s2):
    # the distance is only defined for strings of equal length
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))

if __name__ == "__main__":
    a = 'toned'
    b = 'roses'
    print(hamming_distance(a, b))    # 3
If you need the number of matching character positions:
#!/usr/bin/env python

def similarity(s1, s2):
    # only defined for strings of equal length
    assert len(s1) == len(s2)
    return sum(ch1 == ch2 for ch1, ch2 in zip(s1, s2))

if __name__ == "__main__":
    a = 'toned'
    b = 'roses'
    print(similarity(a, b))    # 2
Actually this is equal to len(s1) - hamming_distance(s1, s2). Remember, len(s1) == len(s2).
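That identity can be checked directly. The sketch below repeats both functions so that it is self-contained; the word pairs other than “toned” and “roses” are arbitrary test data.

```python
def hamming_distance(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))


def similarity(s1, s2):
    assert len(s1) == len(s2)
    return sum(ch1 == ch2 for ch1, ch2 in zip(s1, s2))


pairs = [("toned", "roses"), ("karolin", "kathrin"), ("abc", "abc")]
for s1, s2 in pairs:
    # Every position either matches or mismatches,
    # so the two counts always add up to the string length.
    assert similarity(s1, s2) == len(s1) - hamming_distance(s1, s2)
    print(s1, s2, hamming_distance(s1, s2), similarity(s1, s2))
```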
More info on zip() here.