Archive for April, 2011

Write XML to file

April 4, 2011

Problem

I wanted to create an XML file. The file was simple, but I wanted to avoid producing it with “print” statements. Which API should be used for this purpose? The resulting XML should be human-readable, i.e. pretty-printed (indented).

Solution

This post is based on the thread Best XML writing tool for Python.

(1) elementtree.SimpleXMLWriter (no indenting)

The SimpleXMLWriter module contains a simple helper class for applications that need to generate well-formed XML data. The interface is very simple:

#!/usr/bin/env python

from elementtree.SimpleXMLWriter import XMLWriter
import sys

w = XMLWriter(sys.stdout)
html = w.start("html")

w.start("head")
w.element("title", "my document")
w.element("meta", name="generator", value="my application 1.0")
w.end()

w.start("body")
w.element("h1", "this is a heading")
w.element("p", "this is a paragraph")

w.start("p")
w.data("this is ")
w.element("b", "bold")
w.data(" and ")
w.element("i", "italic")
w.data(".")
w.end("p")

w.close(html)

However, the output is not indented, and as far as I can tell the module has no option for it :( Here is the output of the code above:

<html><head><title>my document</title><meta name="generator" value="my application 1.0" /></head><body><h1>this is a heading</h1><p>this is a paragraph</p><p>this is <b>bold</b> and <i>italic</i>.</p></body></html>

If we prettify it, it will look like this:

<?xml version="1.0"?>
<html>
  <head>
    <title>my document</title>
    <meta name="generator" value="my application 1.0"/>
  </head>
  <body>
    <h1>this is a heading</h1>
    <p>this is a paragraph</p>
    <p>this is <b>bold</b> and <i>italic</i>.</p>
  </body>
</html>

You can install elementtree from PyPI.
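
As an aside that is not from the original thread: the standard library can produce similar indented output without extra packages. The sketch below assumes xml.etree.ElementTree for building the tree and xml.dom.minidom for the indentation. It leaves out the mixed-content paragraph on purpose, because minidom's toprettyxml() inserts whitespace around inline children such as <b>.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Build the tree in memory instead of streaming start/end calls.
root = ET.Element("html")
head = ET.SubElement(root, "head")
ET.SubElement(head, "title").text = "my document"
ET.SubElement(head, "meta", name="generator", value="my application 1.0")
body = ET.SubElement(root, "body")
ET.SubElement(body, "h1").text = "this is a heading"
ET.SubElement(body, "p").text = "this is a paragraph"

# Serialize compactly, then let minidom re-indent the result.
compact = ET.tostring(root)
pretty = minidom.parseString(compact).toprettyxml(indent="  ")
print(pretty)
```

This is fine for small, data-like documents. For anything with inline markup, the lxml approach in the next section is the safer choice.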

(2) lxml.etree (can do indenting)

This is what I chose for my project. This API is also very easy to use and it can do indenting. Documentation is here.

Example:

#!/usr/bin/env python

from lxml import etree as ET

root = ET.Element('background')
starttime = ET.SubElement(root, 'starttime')
hour = ET.SubElement(starttime, 'hour')
hour.text = '00'
minute = ET.SubElement(starttime, 'minute')
minute.text = '00'
second = ET.SubElement(starttime, 'second')
second.text = '01'

print ET.tostring(root, pretty_print=True, xml_declaration=True)
# write to file:
# tree = ET.ElementTree(root)
# tree.write('output.xml', pretty_print=True, xml_declaration=True)

Output:

<?xml version='1.0' encoding='ASCII'?>
<background>
  <starttime>
    <hour>00</hour>
    <minute>00</minute>
    <second>01</second>
  </starttime>
</background>

Installation:
On PyPI, you can find lxml here. However, you will have to install some additional packages too:

sudo apt-get install libxml2-dev
sudo apt-get install libxslt-dev
# until Ubuntu 10.10:
sudo apt-get install python2.6-dev
# from Ubuntu 11.04:
sudo apt-get install python2.7-dev
# under Ubuntu 14.04 I needed this too:
sudo apt-get install -y zlib1g-dev

Then, you can install the library with “sudo pip install lxml”.

New string formatting syntax

April 4, 2011

I’m still using Python 2.6 but I think it’d be a good idea to start using the new string formatting syntax that was introduced in Python 3. Since it was backported to the 2.6 version, we can start using it right away.

This post is mostly a reminder to myself that I should read more about this topic. I’ll add some examples later.

Update (20110704)

I asked a question about string formatting on python-list and got lots of useful answers. Here is a short summary.

Old style, but still supported:

"the %s is %s" % ('sky', 'blue')

New style #1:

"the {0} is {1}".format('sky', 'blue')

New style #2, available from Python 2.7 (and 3.1):

"the {} is {}".format('sky', 'blue')

New style #3, very useful for long format strings:

"the {what} is {color}".format(what='sky', color='blue')

In new code, I no longer use the old style. I use new styles #1 and #3.
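
To make the comparison concrete, here is a small sketch that puts the styles side by side and adds two things the summary above does not show: reusing an index and using a format spec after the colon. The variable names are only for illustration.

```python
what, color = 'sky', 'blue'

old_style = "the %s is %s" % (what, color)
by_index = "the {0} is {1}".format(what, color)
by_name = "the {what} is {color}".format(what=what, color=color)

# An index can be reused; a plain %s tuple would need the argument twice.
repeated = "{0}, {0} and {1}".format(what, color)

# Format specs go after a colon: zero-padded integer, two-decimal float.
padded = "{0:05d}".format(42)
rounded = "{0:.2f}".format(3.14159)

print(old_style)
print(repeated)
print(padded)
print(rounded)
```

All three of the first strings come out as "the sky is blue", so switching styles is a matter of readability, not of results.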

Categories: python

Some list comprehensions examples

April 4, 2011

Example #1:
List all files in a directory and keep only the *.jpg ones.

import os

some_dir = '/opt/example'
# match on '.jpg' (with the dot) so that a name like 'notajpg' is not picked up
images = [x for x in os.listdir(some_dir) if x.lower().endswith('.jpg')]

We could use globbing too, this is just an example.
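
For reference, the globbing variant could look like the sketch below. The helper name jpg_paths and the throwaway demo directory are made up for illustration. Note two differences from the list comprehension: glob returns paths that include the directory, and on Linux the pattern is case-sensitive, so IMG.JPG would not match.

```python
import glob
import os
import tempfile

def jpg_paths(directory):
    """Return the sorted paths of the *.jpg files in `directory` (hypothetical helper)."""
    return sorted(glob.glob(os.path.join(directory, '*.jpg')))

# Demo on a throwaway directory so the example runs anywhere.
demo_dir = tempfile.mkdtemp()
for name in ('a.jpg', 'b.txt', 'c.jpg'):
    open(os.path.join(demo_dir, name), 'w').close()

found = [os.path.basename(p) for p in jpg_paths(demo_dir)]
print(found)
```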

Example #2:
Strip whitespace characters from a list of words.

li = ['  apple ', '     banana  ', '  kiwi']
li = [e.strip() for e in li]
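
A comprehension can transform and filter in one pass. As a small variation on Example #2 (the whitespace-only entry is added here for illustration), this sketch strips each word and drops the ones that end up empty:

```python
li = ['  apple ', '     banana  ', '   ', '  kiwi']

# Keep the stripped word only if something is left after stripping.
cleaned = [e.strip() for e in li if e.strip()]
print(cleaned)
```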

Categories: python

Get the size (dimension) of an image

April 4, 2011

Problem
You want to get the size (dimension) of an image.

Solution
We will use the Pillow package here, which is the successor of PIL.

from PIL import Image  # uses pillow

image_file = "something.jpg"
im = Image.open(image_file)
print(im.size)   # size is a (width, height) tuple, e.g. (1200, 800)

Related
If the image is on the web and you don’t want to download it, you can get the size of the image in bytes (see this post and get ‘Content-Length’).

What IDE to use for Python?

April 4, 2011

For a comprehensive list of Python IDEs and comments, refer to this thread: http://stackoverflow.com/questions/81584/what-ide-to-use-for-python.

Results so far:

Legend (as given in that thread): W = Windows, CP = cross-platform, C = commercial, F = free, AC = auto-completion, MLS = multi-language support, PD = integrated Python debugging, EM = error markup, SC = source control integration, SI = smart indent, BM = bracket matching, LN = line numbering, UML = UML editing/viewing, CF = code folding, CT = code templates, UT = unit testing, DB = integrated database support, RAD = rapid application development support.

  1. PyDev with Eclipse (CP, F, AC, PD, EM, SI, MLS, UML, SC, UT, LN, CF, BM, CT)
  2. Komodo (CP, C/F, MLS, PD, AC, SC, SI, BM, LN, CF, CT, EM, UT, DB)
  3. Vim (CP, F, AC, MLS, SI, BM, LN, CF, UT, PD, EM, SC, CT)
  4. Emacs (CP, F, AC, MLS, PD, EM, SC, SI, BM, LN, CF, CT, UT, UML)
  5. TextMate (Mac, CT, CF, MLS, SI, BM, LN)
  6. Gedit (Linux/Windows, F, AC, MLS, SI, BM, LN, CT [sort of])
  7. Idle (CP, F, AC)
  8. PIDA (Linux/Windows, CP, F, AC, MLS, SI, BM, LN, CF)(VIM Based)
  9. NotePad++ (Windows, F, MLS, LN)
  10. BlueFish (Linux)
  11. JEdit (CP, F, BM, LN, CF, MLS)
  12. E-Texteditor (TextMate Clone for Windows)
  13. WingIde (CP, C, AC, MLS (support for C), PD, EM, SC, SI, BM, LN, CF, CT, UT)
  14. Eric Ide (CP, F, AC, PD, EM, SI, LN, CF, UT)
  15. Pyscripter (Windows, F, AC, PD, EM, SI, LN, CT, UT)
  16. ConTEXT (Windows, C)
  17. SPE (F, AC, UML)
  18. SciTE (CP, F, MLS, EM, BM, LN, CF, CT, SH)
  19. Zeus (W, C, BM, LN, CF, SI, SC, CT)
  20. NetBeans (CP, F, PD, UML, AC, MLS, SC, SI, BM, LN, CF, CT, UT, RAD)
  21. DABO (CP)
  22. BlackAdder (C, CP, CF, SI)
  23. PythonWin (W, F, AC, PD, SI, BM, CF)
  24. Geany (CP, F, very limited AC, MLS, SI, BM, LN, CF)
  25. UliPad (CP, F, AC, PD, MLS, SI, LI, CT, UT, BM)
  26. Boa Constructor (CP, F, AC, PD, EM, SI, BM, LN, UML, CF, CT)
  27. ScriptDev (W, C, AC, MLS, PD, EM, SI, BM, LN, CF, CT)
  28. Spyder (CP, F, AC, PD, EM, SI, BM, LN)
  29. Editra (CP, F, AC, MLS, SC, SI, BM, LN, CF)
  30. Pfaide (Windows, C, AC, MLS, SI, BM, LN, CF, CT)
  31. KDevelop (CP, F, MLS, SC, SI, BM, LN, CF)
  32. Dr.Python (F,EM)
  33. DreamPie (F)
  34. PyCharm (CP, C, AC, PD, EM, MLS (Javascript), SC, SI, BM, LN, CF, PD, UT)
  35. Sublime Text (CP, C, AC, MLS, SI, BM, LN, CT, extensible with Python)

Currently, I’m using Eric4. Works well for me.

Categories: python

Inspiration

April 3, 2011

“I hear and I forget. I see and I remember. I do and I understand.” – Confucius

Ref.: https://beginnertomaster.wordpress.com/2011/03/30/inspirational-quotes/.

Prettify HTML with BeautifulSoup

April 3, 2011

With the Python library BeautifulSoup (BS), you can extract information from HTML pages very easily. However, there is one thing you should keep in mind: HTML pages are usually malformed. BS tries to correct an HTML page, which means that BS’s internal representation of the page can differ slightly from the original source. Thus, when you want to locate a part of an HTML page, you should work with the internal representation.

The following script takes the URL of an HTML page and prints the page in a corrected form, i.e. it shows how BS stores the given page. You can also use it to prettify the source:

#!/usr/bin/env python

# prettify.py
# Usage: prettify <URL>

import sys
import urllib
from BeautifulSoup import BeautifulSoup

class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15'

def process(url):
    myopener = MyOpener()
    #page = urllib.urlopen(url)
    page = myopener.open(url)

    text = page.read()
    page.close()

    soup = BeautifulSoup(text)
    return soup.prettify()
# process(url)

def main():
    if len(sys.argv) == 1:
        print "Jabba's HTML Prettifier v0.1"
        print "Usage: %s <URL>" % sys.argv[0]
        sys.exit(-1)
    # else, if at least one parameter was passed
    print process(sys.argv[1])
# main()

if __name__ == "__main__":
    main()

You can find the latest version of the script at https://github.com/jabbalaci/Bash-Utils.
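
To see the point about the internal representation in isolation, here is a small sketch that needs no network access. It assumes the newer bs4 package (the successor of the BeautifulSoup module imported above) and Python's built-in html.parser. The malformed snippet is made up for illustration.

```python
from bs4 import BeautifulSoup

# Deliberately malformed: neither <p> nor <b> is closed.
broken = "<p>unclosed <b>bold"

soup = BeautifulSoup(broken, "html.parser")
pretty = soup.prettify()
print(pretty)
```

The printed tree contains closing </b> and </p> tags that the input never had, which is why searching should be done against what BS stores and not against the raw source.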

Categories: python