Archive

Archive for January, 2014

MoviePy: script-based movie editing

January 25, 2014 Leave a comment

I haven’t tried it yet but it looks awesome.

MoviePy is a Python module for script-based movie editing, which enables basic operations (cuts, concatenations, title insertions) to be done in a few lines. It can also be used for advanced compositing and special effects.”

Example: putting some clips together:

import os
from moviepy.editor import *
files = sorted( os.listdir("clips/") )
clips = [ VideoFileClip('clips/%s'%f) for f in files]
video = concatenate(clips, transition = VideoFileClip("logo.avi"))
video.to_videofile("demos.avi",fps=25, codec="mpeg4")

The author of MoviePy shows how to manipulate GIF files with MoviePy: http://zulko.github.io/blog/2014/01/23/making-animated-gifs-from-video-files-with-python/.

Categories: python Tags: ,

jpegtran-cffi: fast JPEG transformations

January 25, 2014 Leave a comment

I haven’t tried it yet but it seems perfect for creating thumbnails for instance for a collection of images.

jpegtran-cffi has a very intuitive interface. Examples:

from jpegtran import JPEGImage
img = JPEGImage('image.jpg')

# Dimensions
print img.width, img.height  # "640 480"

# Transforming the image
img.scale(320, 240).save('scaled.jpg')
img.rotate(90).save('rotated.jpg')
img.crop(0, 0, 100, 100).save('cropped.jpg')

# Transformations can be chained
data = (img.scale(320, 240)
            .rotate(90)
            .flip('horizontal')
            .as_blob())

It looks nice, worth checking out.

Fabric

January 25, 2014 Leave a comment

I haven’t used Fabric yet, but I heard a lot about it. This post is a reminder for me to check it out once.

Fabric is a Python library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.

It provides a basic suite of operations for executing local or remote shell commands (normally or via sudo) and uploading/downloading files, as well as auxiliary functionality such as prompting the running user for input, or aborting execution.”

See this post for a concrete example: Deploying Python Apps with Fabric.

Another link with examples: Systems Administration with Fabric.

Related projects

Categories: python Tags: ,

pip-tools

January 25, 2014 Leave a comment

“A set of two command line tools (pip-review + pip-dump) to help you keep your pip-based packages fresh, even when you’ve pinned them.

pip-review checks PyPI and reports available updates. It uses the list of currently installed packages to check for updates, it does not use any requirements.txt.

pip-dump dumps the exact versions of installed packages in your active environment to your requirements.txt file.”

I haven’t used it yet, so this post a reminder for me. I think I will need it soon.

Categories: python Tags: ,

faker: generate fake data

January 25, 2014 Leave a comment

Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you.”

Discussion @reddit.

Faker is a hot project at the moment of writing with lots of new ideas and reported issues. If you want to contribute, do it now :)

Categories: python Tags: ,

PyCon 2014 talk schedule

January 25, 2014 Leave a comment

PyCon 2014 talk schedule is online: https://us.pycon.org/2014/schedule/talks/.

Each talk has an abstract, so you can peek into the future to see the hot topics in 2014.


PyCon 2014 will be held in Montreal this year where I spent 4 whole years between January 2008 and October 2011. I wish I could attend PyCon this year :)

Categories: python Tags: ,

News extraction

January 25, 2014 Leave a comment

With newspaper, you can do “news extraction, article extraction and content curation in python. Built with multithreading, 10+ languages, NLP, ML, and more!”

I haven’t tried it yet but if you need a corpus with news articles, this project can help.

Categories: python Tags: ,

PyLaTeX: Python + LaTeX

January 25, 2014 Leave a comment

PyLaTeX is a Python library for creating LaTeX files. The goal of this library is being an easy, but extensible interface between Python and LaTeX.”

I haven’t tried it yet but since I work with LaTeX a lot, it can be interesting in the future. If you need to generate a PDF report for instance, it can be a good way to go.

Categories: python Tags: ,

Generating pseudo random text using Markov chains

January 23, 2014 Leave a comment

The following entry is based on this post: Generating pseudo random text with Markov chains using Python (by Shabda Raaj).

Problem
I’ve been interested for a long time in generating “random” texts using a given corpus. A naive way is to take words randomly and drop them together but it would result in an unreadable text. The words in the generated text should come in an order that gives the impression that the text is more or less legit :)

Solution
We will use Markov chains to solve this problem. In short, a Markov chain is a stochastic process with the Markov property. By this property the changes of state of the system depend only on the current state of the system, and not additionally on the state of the system at previous steps.

The algorithm for generating pseudo random text is the following:

  1. Take two consecutive words from the corpus. We will build a chain of words and the last two words of the chain represent the current state of the Markov chain.
  2. Look up in the corpus all the occurrences of the last two words (current state). If they appear more than once, select one of them randomly and add the word that follows them to the end of the chain. Now the current state is updated: it consists of the 2nd word of the former tail of the chain and the new word.
  3. Repeat the previous step until you reach the desired length of the generated text.

When reading and splitting up a corpus to words, don’t remove commas, punctuations, etc. This way you can get a more realistic text.

Example
Let’s see this text:

A is the father of B.
C is the father of A.

From this we can build the following dictionary:

{('A', 'is'): ['the'],
 ('B.', 'C'): ['is'],
 ('C', 'is'): ['the'],
 ('father', 'of'): ['B.', 'A.'],
 ('is', 'the'): ['father', 'father'],
 ('of', 'B.'): ['C'],
 ('the', 'father'): ['of', 'of']}

The key is a tuple of two consecutive words. The value is a list of words that follow the two words in the key in the corpus. The value is a multiset, i.e. duplications are allowed. This way, if a word appears several times after the key, it will be selected with a higher probability.

Let’s start the generated sentence with “A is“. “A is” is followed by “the” (“A is the“). “is the” is followed by “father” (“A is the father“). “the father” is followed by “of” (“A is the father of“). At “father of” we have a choice: let’s pick “A” for instance. The end result is: “A is the father of A.“.

Python code
This is a basic version of the algorithm. Since the input corpus can be a UTF-8 file, I wrote it in Python 3 to suffer less with Unicode.

#!/usr/bin/env python3
# encoding: utf-8

import sys
from pprint import pprint
from random import choice

EOS = ['.', '?', '!']


def build_dict(words):
    """
    Build a dictionary from the words.

    (word1, word2) => [w1, w2, ...]  # key: tuple; value: list
    """
    d = {}
    for i, word in enumerate(words):
        try:
            first, second, third = words[i], words[i+1], words[i+2]
        except IndexError:
            break
        key = (first, second)
        if key not in d:
            d[key] = []
        #
        d[key].append(third)

    return d


def generate_sentence(d):
    li = [key for key in d.keys() if key[0][0].isupper()]
    key = choice(li)

    li = []
    first, second = key
    li.append(first)
    li.append(second)
    while True:
        try:
            third = choice(d[key])
        except KeyError:
            break
        li.append(third)
        if third[-1] in EOS:
            break
        # else
        key = (second, third)
        first, second = key

    return ' '.join(li)


def main():
    fname = sys.argv[1]
    with open(fname, "rt", encoding="utf-8") as f:
        text = f.read()

    words = text.split()
    d = build_dict(words)
    pprint(d)
    print()
    sent = generate_sentence(d)
    print(sent)
    if sent in text:
        print('# existing sentence :(')

####################

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("Error: provide an input corpus file.")
        sys.exit(1)
    # else
    main()

Tips
Try to choose a long corpus to work with.

In our version the current state consists of two words. If you decide to put more words (3 for instance) in the current state, then the text will look less random, but also, it will look less gibberish (see also this gist).

Links

How automation works in reality

January 20, 2014 Leave a comment

Python is an excellent choice if you want to automate a task. But how does automation actually work?


http://xkcd.com/1319/

Categories: fun Tags: