### Archive

Archive for January, 2014

## PyLaTeX: Python + LaTeX

“PyLaTeX is a Python library for creating LaTeX files. The goal of this library is being an easy, but extensible interface between Python and LaTeX.”

I haven’t tried it yet, but since I work with LaTeX a lot, it could be useful in the future. If you need to generate a PDF report, for instance, it may be a good way to go.

Categories: python

## Generating pseudo random text using Markov chains

The following entry is based on this post: Generating pseudo random text with Markov chains using Python (by Shabda Raaj).

Problem
I’ve long been interested in generating “random” text from a given corpus. A naive approach is to pick words at random and string them together, but the result would be unreadable. The words in the generated text should come in an order that gives the impression that the text is more or less legit :)

Solution
We will use Markov chains to solve this problem. In short, a Markov chain is a stochastic process with the Markov property: the next state of the system depends only on its current state, not on the states it passed through at earlier steps.

The algorithm for generating pseudo random text is the following:

1. Take two consecutive words from the corpus to start the chain. The last two words of the chain always represent the current state of the Markov chain.
2. Look up all occurrences of the last two words (the current state) in the corpus. Select one occurrence at random (if there is more than one) and append the word that follows it to the end of the chain. The current state is now updated: it consists of the second word of the former state and the new word.
3. Repeat the previous step until you reach the desired length of the generated text.
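The three steps can be sketched literally, without any precomputed table, by rescanning the word list for the current state at every step. This is a minimal, unoptimized sketch; the function name `markov_walk` and its `length` and `seed` parameters are hypothetical, not taken from the original post.

```python
import random


def markov_walk(words, length, seed=None):
    """Generate up to `length` words by following steps 1-3 literally.

    Assumes `words` contains at least two words.
    """
    rng = random.Random(seed)
    # Step 1: pick two consecutive words as the starting state.
    start = rng.randrange(len(words) - 1)
    chain = [words[start], words[start + 1]]
    # Step 3: repeat step 2 until the desired length is reached.
    while len(chain) < length:
        state = (chain[-2], chain[-1])
        # Step 2: collect the word after every occurrence of the state;
        # choosing from this list = choosing an occurrence at random.
        followers = [words[i + 2] for i in range(len(words) - 2)
                     if (words[i], words[i + 1]) == state]
        if not followers:
            break  # the state only occurs at the very end of the corpus
        chain.append(rng.choice(followers))
    return chain


words = "A is the father of B. C is the father of A.".split()
print(' '.join(markov_walk(words, 12, seed=1)))
```

Rescanning the whole corpus for every new word costs time proportional to the corpus length; the dictionary in the example below does that lookup work once, up front.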

When reading the corpus and splitting it into words, don’t remove commas and other punctuation. This way the generated text looks more realistic.

Example
Let’s see this text:

```
A is the father of B.
C is the father of A.
```

From this we can build the following dictionary:

```
{('A', 'is'): ['the'],
 ('B.', 'C'): ['is'],
 ('C', 'is'): ['the'],
 ('father', 'of'): ['B.', 'A.'],
 ('is', 'the'): ['father', 'father'],
 ('of', 'B.'): ['C'],
 ('the', 'father'): ['of', 'of']}
```

Each key is a tuple of two consecutive words, and its value is the list of words that follow those two words in the corpus. The value is a multiset, i.e. duplicates are allowed: if a word appears several times after the key, it will be selected with a proportionally higher probability.
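The dictionary above can be reproduced in a few lines. This sketch uses `collections.defaultdict` (instead of the explicit `if key not in d` check in the full program) and shows how duplicates in a value turn into selection probabilities via `collections.Counter`; the helper names `build_table` and `successor_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict


def build_table(words):
    """Map each pair of consecutive words to the list of words that follow it."""
    table = defaultdict(list)
    # zip stops at the shortest slice, so the last two words never start a key
    for first, second, third in zip(words, words[1:], words[2:]):
        table[(first, second)].append(third)
    return dict(table)


def successor_probabilities(table, key):
    """Probability of each successor of `key`: duplicates raise its weight."""
    counts = Counter(table[key])
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


text = """A is the father of B.
C is the father of A."""
table = build_table(text.split())
print(successor_probabilities(table, ('father', 'of')))
# {'B.': 0.5, 'A.': 0.5}
```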

Let’s start the generated sentence with “`A is`”. “`A is`” is followed by “`the`” (“`A is the`”). “`is the`” is followed by “`father`” (“`A is the father`”). “`the father`” is followed by “`of`” (“`A is the father of`”). At “`father of`” we have a choice between “`B.`” and “`A.`”: let’s pick “`A.`”, for instance. Since it ends with a period, the sentence is complete: “`A is the father of A.`”.
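Because the example corpus is tiny, we can list every sentence the walk could produce from “`A is`” instead of sampling just one. The hypothetical helper `all_sentences` below (not part of the original program) follows every branch until a word ends in `.`, `?` or `!`; the `max_len` guard is an assumption added so that a corpus with a punctuation-free cycle cannot recurse forever. `TABLE` is the dictionary from the example.

```python
EOS = ('.', '?', '!')

TABLE = {('A', 'is'): ['the'],
         ('B.', 'C'): ['is'],
         ('C', 'is'): ['the'],
         ('father', 'of'): ['B.', 'A.'],
         ('is', 'the'): ['father', 'father'],
         ('of', 'B.'): ['C'],
         ('the', 'father'): ['of', 'of']}


def all_sentences(table, key, max_len=50):
    """Return every sentence reachable from `key`, exploring each distinct branch once."""
    results = set()

    def walk(chain):
        if chain[-1][-1] in EOS:
            results.add(' '.join(chain))  # sentence complete
            return
        if len(chain) >= max_len:
            return  # give up on very long branches
        # set() skips duplicate successors: they lead to the same sentences
        for word in set(table.get((chain[-2], chain[-1]), [])):
            walk(chain + [word])

    walk(list(key))
    return results


print(sorted(all_sentences(TABLE, ('A', 'is'))))
# ['A is the father of A.', 'A is the father of B.']
```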

Python code
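This is a basic version of the algorithm. Since the input corpus can be a UTF-8 file, I wrote it in Python 3 to avoid Unicode headaches. The script prints the dictionary built from the corpus and then generates one sentence: it starts from a random key whose first word is capitalized and stops at a word ending in `.`, `?` or `!`.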

```python
#!/usr/bin/env python3
# encoding: utf-8

import sys
from pprint import pprint
from random import choice

EOS = ['.', '?', '!']  # characters that end a sentence


def build_dict(words):
    """
    Build a dictionary from the words.

    (word1, word2) => [w1, w2, ...]  # key: tuple; value: list
    """
    d = {}
    for i in range(len(words) - 2):
        first, second, third = words[i], words[i+1], words[i+2]
        key = (first, second)
        if key not in d:
            d[key] = []
        d[key].append(third)

    return d


def generate_sentence(d):
    # start from a key whose first word is capitalized (a likely sentence start)
    li = [key for key in d.keys() if key[0][0].isupper()]
    key = choice(li)

    li = []
    first, second = key
    li.append(first)
    li.append(second)
    while True:
        try:
            third = choice(d[key])
        except KeyError:
            break  # no known successor (end of the corpus)
        li.append(third)
        if third[-1] in EOS:
            break  # the sentence is complete
        # else: shift the current state by one word
        key = (second, third)
        first, second = key

    return ' '.join(li)


def main():
    fname = sys.argv[1]
    with open(fname, "rt", encoding="utf-8") as f:
        text = f.read()

    words = text.split()
    d = build_dict(words)
    pprint(d)
    print()
    sent = generate_sentence(d)
    print(sent)
    if sent in text:
        print('# existing sentence :(')

####################

if __name__ == "__main__":
    if len(sys.argv) == 1:
        print("Error: provide an input corpus file.")
        sys.exit(1)
    # else
    main()
```

Tips
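Use a long corpus: the more text you have, the more states have several possible successors, and the less often the output simply repeats a sentence of the corpus.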

In our version the current state consists of two words. If you put more words in the current state (three, for instance), the generated text will look less random, but also less gibberish (see also this gist).
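To experiment with longer states, the table construction can be parameterized by the state size. This is a hypothetical generalization (the names `build_table_n` and `generate_n` are ours, not from the post): keys become tuples of `n` words, and the state shifts by one word after each step.

```python
import random


def build_table_n(words, n=2):
    """Map each run of `n` consecutive words to the words that follow it."""
    table = {}
    for i in range(len(words) - n):
        key = tuple(words[i:i + n])
        table.setdefault(key, []).append(words[i + n])
    return table


def generate_n(table, length, seed=None):
    """Random walk over the table, producing up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(table))  # random starting state
    n = len(state)                   # state size is implied by the keys
    chain = list(state)
    while len(chain) < length and state in table:
        chain.append(rng.choice(table[state]))
        state = tuple(chain[-n:])    # drop the oldest word, keep the last n
    return ' '.join(chain)


words = "A is the father of B. C is the father of A.".split()
print(generate_n(build_table_n(words, 3), 15, seed=0))
```

With `n=3` on this tiny corpus almost every state has a single successor, which illustrates the trade-off: longer states copy the corpus more faithfully.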

Categories: python

## How automation works in reality

Python is an excellent choice if you want to automate a task. But how does automation actually work? `http://xkcd.com/1319/`

Categories: fun

## pythonium: a Python to JavaScript translator

Pythonium is a Python 3 to JavaScript translator written in Python that produces fast, portable JavaScript code.

Example:

```
$ echo "for i in range(10): print(i)" >> loop.py
$ pythonium -V loop.py
var iterator_i = range(10);
for (var i_iterator_index=0; i_iterator_index < iterator_i.length; i_iterator_index++) {
var i = iterator_i[i_iterator_index];
console.log(i);
}
```

I haven’t tried it yet, so this post is a reminder for me to check it out.

Categories: python

## What is a BDFL?

BDFL stands for Benevolent Dictator For Life. “A BDFL, a term originally used by Python creator Guido van Rossum, is basically a leader of an open-source project who resolves disputes and has final say on big decisions.” (source)

Categories: python

## Python news in French

I just came across http://news.humancoders.com, a French-language news aggregator where users can submit and discuss news. It has a subpage dedicated to Python.

“Human Coders News is a service for sharing the best resources found on the web about a specific topic. You can browse all the news on the home page, or click on a topic to filter.”

Categories: python

## A command-line progress bar for your loops

Problem
When I was working on Project Euler, I solved several problems with a brute-force approach, so their runtime was several hours. To see that the program was making progress and to estimate when it would finish, I added a simple counter to the main loop that showed the progress as a percentage.
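A hand-rolled counter of that kind might look like the sketch below. It is a hypothetical reconstruction (the original counter is not shown): a generator that wraps a sized iterable and, using a carriage return, rewrites the percentage on a single line, only when the whole percentage changes.

```python
import sys


def with_progress(items, out=sys.stderr):
    """Yield items from a sized iterable, printing the percentage done."""
    total = len(items)
    last = -1
    for done, item in enumerate(items, start=1):
        yield item  # the loop body runs here, before the counter updates
        percent = done * 100 // total
        if percent != last:  # avoid flooding the terminal
            out.write(f"\r{percent:3d}%")
            out.flush()
            last = percent
    out.write("\n")


for n in with_progress(range(1000)):
    pass  # replace with the slow brute-force work
```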

Is there a simpler way to add a progress bar to a loop?

Solution
The tqdm project addresses exactly this problem. There is a Reddit discussion about it here.

Usage example:

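```python
import time
from tqdm import tqdm

def main():
    # tqdm wraps an iterable and prints a progress bar as it is consumed
    for i in tqdm(range(10000)):
        time.sleep(.005)

if __name__ == "__main__":
    main()
```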

Notes:

• in Python 2 you can use `xrange` too: `tqdm(xrange(10000))`
• `trange(10000)` is a shortcut for `tqdm(range(10000))` (import it with `from tqdm import trange`)

The project’s GitHub page also has an animated GIF demo.

Categories: python