Posts Tagged ‘tab’

How to make a python, command-line program autocomplete arbitrary things

June 4, 2013 2 comments

This entry is based on this SO post.

You have an interactive command-line Python script and you want it to autocomplete the input when the user hits the TAB key.

Here is a working example (taken from here):

import readline

addrs = ['', '', '']

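def completer(text, state):
    # readline calls this with state = 0, 1, 2, ... until it returns None
    options = [x for x in addrs if x.startswith(text)]
    try:
        return options[state]
    except IndexError:
        return None

readline.set_completer(completer)
readline.parse_and_bind("tab: complete")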

while True:
    inp = raw_input("> ")
    print "You entered", inp
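
The completer protocol can also be tried without an interactive prompt. The following is a minimal sketch in Python 3 (the listing above is Python 2); the helper names `make_completer`, `all_completions` and `run_prompt` are made up for this illustration, and the candidate words are placeholders. It mimics how readline keeps calling the completer with an increasing `state` until it gets `None` back.

```python
def make_completer(candidates):
    """Return a readline-style completer over a fixed list of candidates."""
    def completer(text, state):
        # readline calls this repeatedly with state = 0, 1, 2, ...
        # and stops as soon as None is returned.
        matches = [c for c in candidates if c.startswith(text)]
        return matches[state] if state < len(matches) else None
    return completer


def all_completions(completer, text):
    """Collect matches the way readline would: raise state until None comes back."""
    results = []
    state = 0
    while True:
        match = completer(text, state)
        if match is None:
            return results
        results.append(match)
        state += 1


def run_prompt(candidates):
    """Interactive loop; not called here, because it blocks waiting for input."""
    import readline  # available on Unix-like systems
    readline.set_completer(make_completer(candidates))
    readline.parse_and_bind("tab: complete")
    while True:
        print("You entered", input("> "))
```

Calling `all_completions(make_completer(["alpha", "alps", "beta"]), "al")` collects the two candidates that start with "al"; `run_prompt([...])` would start the actual prompt.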

Categories: python

Extract all links from a web page

March 10, 2011 21 comments


You want to extract all the links from a web page. You need the links as absolute URLs, since you want to process them further.


Unix commands have a very nice philosophy: “do one thing and do it well”. Keeping that in mind, here is my link extractor:

#!/usr/bin/env python


import re
import sys
import urllib
import urlparse
from BeautifulSoup import BeautifulSoup

class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv: Gecko/20110303 Firefox/3.6.15'

def process(url):
    myopener = MyOpener()
    #page = urllib.urlopen(url)
    page = myopener.open(url)

    text = page.read()
    page.close()

    soup = BeautifulSoup(text)

    for tag in soup.findAll('a', href=True):
        tag['href'] = urlparse.urljoin(url, tag['href'])
        print tag['href']
# process(url)

def main():
    if len(sys.argv) == 1:
        print "Jabba's Link Extractor v0.1"
        print "Usage: %s URL [URL]..." % sys.argv[0]
        sys.exit(1)
    # else, if at least one parameter was passed
    for url in sys.argv[1:]:
        process(url)
# main()

if __name__ == "__main__":
    main()

You can find the up-to-date version of the script here.

The script will print the links to the standard output. The output can be refined with grep for instance.
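
For readers on Python 3, here is a minimal sketch of the same idea using only the standard library. It swaps BeautifulSoup for `html.parser`, so it is not the script above, and it works on an HTML string instead of downloading a page. The names `LinkCollector` and `extract_links` and the example.com URLs are assumptions made for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip <a> tags without an href (for example named anchors).
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all links found in `html` as absolute URLs, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

`urljoin` does the same job here as `urlparse.urljoin` in the Python 2 script: relative links such as `c.jpg` are resolved against the address of the page they were found on.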


The HTML parsing is done with the BeautifulSoup (BS) library. If you get an error, i.e. BeautifulSoup cannot parse a tricky page, download the latest version of BS and put it in the same directory where the script is located. I had a problem with the version that came with Ubuntu 10.10, but I could solve it by upgrading to the latest version of BeautifulSoup.
Update (20110414): To update BS, first remove the package python-beautifulsoup with Synaptic, then install the latest version from PyPI: sudo pip install beautifulsoup.


Basic usage: get all links on a given page.


Basic usage: get all links from an HTML file. Yes, it also works on local files.

./ index.html

Number of links.

./ | wc -l

Filter result and keep only those links that you are interested in.

./ | grep -i jpg

Eliminate duplicates.

./ | sort | uniq

Note: if the URL contains the special character “&”, then put the URL between quotes.

./ ""

Open (some) extracted links in your web browser. Here I use the script “open_in_tabs.py” that I introduced in this post. You can also download “open_in_tabs.py” here.

./ | grep -i jpg | sort | uniq | ./
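
If the links are already in a Python list, the grep, sort and uniq steps used above can be approximated in a few lines. This is only a rough sketch in Python 3: the function name `filter_links` is made up, the pattern is treated as a plain substring and not as a regular expression, and the sort order may differ from what `sort` gives under your locale.

```python
def filter_links(links, pattern):
    """Keep links containing `pattern` (case-insensitive), sorted, without duplicates.

    Roughly what `grep -i PATTERN | sort | uniq` does for plain-text patterns.
    """
    needle = pattern.lower()
    return sorted({link for link in links if needle in link.lower()})
```

For instance, `filter_links(links, "jpg")` keeps both `.jpg` and `.JPG` links and drops repeated entries.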

Update (20110507)

You might be interested in another script that extracts all image links from a webpage. Available here.

Categories: python

Check downloaded movies on IMDB

February 27, 2011 3 comments

Recently, I downloaded a nice pack of horror movies. The pack contained more than a hundred movies :) I wanted to see their IMDB ratings to decide which ones to watch, but typing their titles in the browser would be too much work. Could it be automated?


Each movie was located in a subdirectory. Here is an extract:


Fortunately, the directories were named in a consistent way: title of the movie (words separated with a dot), year, extra info. Thus, extracting titles was very easy. Idea: collect the titles in a list and open them in Firefox on IMDB, each in a new tab.
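
The title extraction can be sketched on its own, in Python 3, before looking at the full script. The directory name in the usage note, the function names `title_from_dirname` and `build_url`, and the example.com address are hypothetical and stand in for the real data and the real search URL.

```python
import re
from urllib.parse import quote

# Everything before ".YYYY." is taken as the title; the greedy (.*) means
# the last four-digit group followed by a dot is treated as the year.
TITLE_RE = re.compile(r'(.*)\.\d{4}\..*')


def title_from_dirname(dirname):
    """Return the movie title with dots turned into spaces, or None if no year is found."""
    result = TITLE_RE.search(dirname)
    if result is None:
        return None
    return result.group(1).replace('.', ' ')


def build_url(base, title):
    """Append the URL-quoted title to a search URL as the q parameter."""
    return "%s&q=%s" % (base, quote(title))
```

With a directory called `The.Shining.1980.DVDRip.XviD`, `title_from_dirname` gives `The Shining`, and `build_url("http://example.com/find?s=all", "The Shining")` gives `http://example.com/find?s=all&q=The%20Shining`.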

First, I redirected the directory list into a file. It was easier to work with a text file than doing globbing:

ls >a.txt

And finally, here is the script:

#!/usr/bin/env python

import re
import urllib
import webbrowser

base = ''
firefox = webbrowser.get('firefox')

f1 = open('a.txt', 'r')

for line in f1:
    line = line.rstrip('\n')
    if line.startswith('#'):
        continue    # skip commented-out lines

    # else
    result = re.search(r'(.*)\.\d{4}\..*', line)
    if result:
        address = result.group(1).replace('.', ' ')
        url = "%s&q=%s" % ( base, urllib.quote(address) )
        print url
        firefox.open_new_tab(url)
        #webbrowser.open_new_tab(url)    # try this if the line above doesn't work

f1.close()


Achtung! Don’t try it with a huge list, otherwise your system will die :) Firefox won’t handle too many open tabs… Try to open around ten titles at a time. In the input file (a.txt) you can comment out lines by adding a leading ‘#’ sign; those lines will be discarded by the script.
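
One way to stick to "around ten titles at a time" without editing a.txt by hand is to open the URLs in batches and wait for a key press in between. This is a sketch in Python 3, not part of the script above; the names `batches` and `open_in_batches` are made up, and the default batch size of 10 simply follows the advice given here.

```python
import webbrowser


def batches(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def open_in_batches(urls, size=10):
    """Open URLs one batch at a time; not called here, since it is interactive."""
    for group in batches(urls, size):
        for url in group:
            webbrowser.open_new_tab(url)
        input("Press Enter to open the next batch...")
```

With 25 URLs, `batches` yields groups of 10, 10 and 5.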

Categories: python Tags: , , , ,