Archive for April, 2011

PyCon Italia

April 12, 2011

I was looking for some info about scraping techniques when I found this presentation: http://www.pycon.it/media/stuff/slides/tecniche-di-scraping-in-python.pdf.

If you visit the main page at http://www.pycon.it/, you can get to the summary of all the presentations since 2007. In some cases slides and videos are also available.

More conferences:

See http://www.pycon.org/ for an up-to-date list.

Download genomes from Genbank

April 12, 2011

Problem

For a project, I had to download a bunch of records from the NCBI (National Center for Biotechnology Information) website. A record looks like this: CP002059.1 (almost 5 MB):

LOCUS       CP002059             5354700 bp    DNA ...
DEFINITION  'Nostoc azollae' 0708, complete genome.
ACCESSION   CP002059 ACIR01000000 ACIR01000001-ACIR01000216
VERSION     CP002059.1  GI:298231532
DBLINK      Project: 30807
...
ORIGIN
//

I needed this data in text format.
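A GenBank flat-file record like the one above begins with a LOCUS line and ends with a // terminator line, so a downloaded file can be sanity-checked before it is used. Here is a minimal sketch in plain Python; the function names looks_like_genbank and locus_length are made up for illustration and do not come from any library.

```python
def looks_like_genbank(text):
    """Return True if text starts with a LOCUS line and ends with '//'."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    return lines[0].startswith('LOCUS') and lines[-1] == '//'


def locus_length(text):
    """Return the sequence length (in bp) announced on the LOCUS line, or None."""
    for line in text.splitlines():
        if line.startswith('LOCUS'):
            fields = line.split()
            if 'bp' in fields:
                idx = fields.index('bp')
                # the length is the number right before the 'bp' token
                if idx > 0 and fields[idx - 1].isdigit():
                    return int(fields[idx - 1])
            return None
    return None


# shortened version of the record shown above
sample = (
    "LOCUS       CP002059             5354700 bp    DNA\n"
    "DEFINITION  'Nostoc azollae' 0708, complete genome.\n"
    "ORIGIN\n"
    "//\n"
)
print(looks_like_genbank(sample))   # True
print(locus_length(sample))         # 5354700
print(looks_like_genbank("<html><body>...</body></html>"))  # False
```

A check like this quickly tells a real record apart from an HTML page that was saved by mistake.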

Solution #1
My first idea was to download the page with wget. However, I was surprised to see that the downloaded file was less than 100 KB instead of almost 5 MB! When I looked at the source, it turned out to be full of AJAX calls. That is, the browser downloads this short HTML page and then expands it with the data fetched by those calls. If you save the page with File -> Save as…, you get the complete HTML, but how can you automate the download process? How do you get the post-AJAX version of a web page?

I will write about this problem and its general solution in another post.

Solution #2
Fortunately, there is a CGI program at NCBI that returns the required data directly. For instance, the record CP002059.1 can be retrieved via the following URL:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=CP002059.1&rettype=gb

A (very) short overview of the EFetch CGI is here.
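If you don't want any extra dependency, the same URL can be built and fetched with the Python 3 standard library. This is only a sketch: the helper names build_efetch_url and download_record are my own, and the base URL is copied from the example above (if the server insists on HTTPS, change EFETCH_URL accordingly).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# base URL taken from the example above
EFETCH_URL = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'


def build_efetch_url(accession, db='nucleotide', rettype='gb'):
    """Build the EFetch URL for one accession id."""
    query = urlencode([('db', db), ('id', accession), ('rettype', rettype)])
    return EFETCH_URL + '?' + query


def download_record(accession, filename):
    """Download one record and save it to filename (needs network access)."""
    with urlopen(build_efetch_url(accession)) as response:
        data = response.read()
    with open(filename, 'wb') as out:
        out.write(data)


# prints the same URL as the one shown above
print(build_efetch_url('CP002059.1'))
```

Calling download_record('CP002059.1', 'CP002059.1.gb') should then save the record locally, assuming the service is reachable.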

If you use Biopython, you can download this record like this:

from Bio import Entrez

# ref.: http://wilke.openwetware.org/Parsing_Genbank_files_with_Biopython.html

# replace with your real email (optional, but NCBI recommends setting it):
Entrez.email = 'whatever@mail.com'
# accession id works, returns GenBank format as plain text,
# looks in the 'nucleotide' database:
handle = Entrez.efetch(db='nucleotide', id='CP002059.1',
                       rettype='gb', retmode='text')
# store locally; the 'with' block closes the file even if writing fails:
with open('CP002059.1.gb', 'w') as local_file:
    local_file.write(handle.read())
handle.close()
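Since the goal was to download a bunch of records, the snippet above can be wrapped in a loop. The following is a rough sketch, not a tested production script: the names fetch_genbank and download_all, the out_dir and delay parameters, and the one-second pause are my own assumptions. The fetch function is passed in as a parameter so that the loop can be tried with a stub, without Biopython or network access.

```python
import os
import time


def fetch_genbank(accession):
    """Fetch one record as GenBank text via Biopython (needs network access)."""
    # Imported here so that download_all() can be tried with a stub fetcher
    # on a machine without Biopython.
    from Bio import Entrez
    Entrez.email = 'whatever@mail.com'  # placeholder, replace with your own
    handle = Entrez.efetch(db='nucleotide', id=accession,
                           rettype='gb', retmode='text')
    try:
        return handle.read()
    finally:
        handle.close()


def download_all(accessions, out_dir='.', fetch=fetch_genbank, delay=1.0):
    """Save each accession as <accession>.gb in out_dir; return the paths."""
    saved = []
    for accession in accessions:
        path = os.path.join(out_dir, accession + '.gb')
        with open(path, 'w') as local_file:
            local_file.write(fetch(accession))
        saved.append(path)
        time.sleep(delay)  # be polite to the server between requests
    return saved
```

For example, download_all(['CP002059.1']) would save CP002059.1.gb in the current directory.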

Solution #3 (in Perl)
Let’s see the same thing in Perl too, using the BioPerl package. Thanks to Alix for the Perl code.

#!/usr/bin/perl

use strict;
use warnings;

use Bio::Perl;
use Bio::DB::GenBank;

# client for the GenBank database at NCBI
my $gb = Bio::DB::GenBank->new;

my $id = 'CP002059.1';

# fetch the record by accession id and write it out in GenBank format
my $seq = $gb->get_Stream_by_acc($id);
while ( my $seq_elt = $seq->next_seq ) {
    write_sequence(">$id.gb", 'genbank', $seq_elt);
}

Update (20110706)
I forgot to mention how to install Biopython:

sudo pip install biopython

GUI for the output of PyLint

April 7, 2011

Problem

I discovered PyLint yesterday and after some tests I find it very useful. However, one thing bothered me in the workflow. PyLint tells you where (on which lines) you should improve your code, but as soon as you add or remove lines in the source, the reported line numbers no longer match. Thus, you need to relaunch pylint many times until you have resolved all the problems.

Solution

Idea: make a simple GUI that shows the output of PyLint. If necessary, refresh the content of this window.
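The core of such a tool is small: run the linter as a subprocess, capture what it prints, and run it again on every refresh. Here is a minimal sketch of that part only. It is not the actual Pylov source; the function name lint_output and its command parameter are invented for illustration.

```python
import subprocess


def lint_output(source_path, command=('pylint',)):
    """Run the linter on source_path and return everything it printed."""
    result = subprocess.run(
        list(command) + [source_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # Pylint exits with a non-zero status when it reports messages,
    # so the return code is deliberately not treated as a failure.
    return result.stdout
```

A GUI would then call lint_output('hello.py') once at startup and again on each refresh key press, replacing the content of its text widget with the new result.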

Download:

Visit https://github.com/jabbalaci/PyLint-Output-Visualizer. Source code is here.

Usage:

pylov.py <source_to_be_analyzed.py>

You can refresh the content by pressing ‘r’, ‘u’, or F5.

Update (20110408)
This morning I was notified that PyLint ships with a simple GUI of its own, called pylint-gui :)) Great! Why is it not mentioned anywhere on the project’s home page? I’ve read several reviews too, and nobody says it has a GUI… I have now searched the manual for the string “gui” and yes, they mention it in two lines, but there is no screenshot! Either you read it word by word or you miss it. To fill the gap, here is my screenshot of the mysterious pylint-gui:

Well, if you prefer minimal design, you can try Pylov :) Otherwise use the official GUI.

Update (20110426)
I made Pylov because the PyLint plugin of the Eric IDE didn’t have the refresh option. I contacted the author of Eric and he added this feature :) So if you use Eric, it is recommended to use the PyLint plugin.

Categories: python

Global Module Index for Python

April 7, 2011

List of available modules: http://docs.python.org/modindex.html.

Categories: python

Index of Python Enhancement Proposals (PEPs)

April 6, 2011

See http://www.python.org/dev/peps/ for the full list.

An extract:

 Meta-PEPs (PEPs about PEPs or Processes)

 P     1  PEP Purpose and Guidelines
 P     2  Procedure for Adding New Modules
 P     4  Deprecation of Standard Modules
 P     5  Guidelines for Language Evolution
 P     6  Bug Fix Releases
 P     7  Style Guide for C Code
 P     8  Style Guide for Python Code
 P     9  Sample Plaintext PEP Template
 P    10  Voting Guidelines
 P    11  Removing support for little used platforms
 P    12  Sample reStructuredText PEP Template
 P   374  Choosing a distributed VCS for the Python project
 P   385  Migrating from Subversion to Mercurial
 P   387  Backwards Compatibility Policy
 P  3000  Python 3000
 P  3002  Procedure for Backwards-Incompatible Changes
 P  3003  Python Language Moratorium
 P  3099  Things that will Not Change in Python 3000

Categories: python

Write better code with the help of Pylint

April 6, 2011

“Pylint analyzes Python source code looking for bugs and signs of poor quality. Pylint is a Python tool that checks if a module satisfies a coding standard. Pylint is similar to PyChecker but offers more features, like checking line-code’s length, checking if variable names are well-formed according to your coding standard, or checking if declared interfaces are truly implemented, and much more (see the complete check list). …
Pylint is shipped with Pyreverse which creates UML diagrams for python code.” (source)

With the help of Pylint, you can refactor your code so that it better satisfies coding standards. Its usage is dead simple:

pylint hello.py

For more details, see the official tutorial.
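To have something to try it on, here is a hypothetical hello.py with two deliberate problems (the file content is my own example, not taken from the tutorial). Pylint should report the unused import and the unused local variable; the exact wording and message codes depend on the Pylint version.

```python
"""A tiny module with a few deliberate lint problems."""
import os  # never used: Pylint reports this as unused-import


def greet(name):
    """Return a greeting for name."""
    unused = 42  # assigned but never read: reported as unused-variable
    return 'Hello, ' + name


if __name__ == '__main__':
    print(greet('world'))
```

The script still runs fine, which is the point: these are quality problems, not crashes.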

Installation:

sudo pip install pylint

Notes:

I’m using the Eric IDE for Python. I was very happy when I discovered that it has a PyLint plugin! You can install the plugin from Eric.

Related

  • lint (lint is the original static code analyzer for C)

Update (20120915)

If you want to see the errors only, call pylint like this:

pylint -E hello.py

Categories: python

Python on reddit

April 4, 2011

I had not used reddit (wikipedia article) before, but yesterday I visited its Python community and found some really interesting links! So from now on I will check reddit regularly :)

Python-related communities on reddit:

For a detailed list of subreddits, visit http://www.reddit.com/reddits/. Help, FAQ.

See this post for some notes about reddit.

Categories: python