fold / unfold URLs

Problem
When you visit a gallery, very often the URLs follow a pattern. For instance:
http://www.website.com/001.jpg, http://www.website.com/002.jpg, …, http://www.website.com/030.jpg. There is a sequence: [001-030]. Thus, these URLs can be represented in a compact way: http://www.website.com/[001-030].jpg. I call it a sequence URL.

There are two challenges here:

  1. Given a sequence URL, restore all the URLs. We can call it unpacking / unfolding.
  2. The opposite: given a list of URLs that follow a pattern, compress them into a sequence URL. We can call it folding.

I ran into this challenge while working with URLs, but it can be generalized to any string.

Unfolding
I wrote my own implementation for this (see below), but later I found a module that does it better. I posted my question on Reddit and got a very good answer (see here): use the ClusterShell project. This project was made for administering Linux clusters. We have nothing to do with Linux clusters, but it contains an implementation of string folding / unfolding that we can reuse here.

Installation is trivial: “pip install clustershell”.

Then, I made a wrapper function for unfolding:

from ClusterShell.NodeSet import NodeSet

def unfold_sequence_url(text):
    """
    From a sequence URL restore all the URLs (unpack, unfold).

    Input: "node[1-3]"
    Output: ["node1", "node2", "node3"]
    """
    # Create a new nodeset from string
    nodeset = NodeSet(text)
    res = [str(node) for node in nodeset]
    return res

Folding

Here is another wrapper function for folding:

from ClusterShell.NodeSet import NodeSet

def fold_urls(lst):
    """
    Now the input is a list of URLs
    that we want to compress (fold) to a sequence URL.

    Example:
    Input: ["node1", "node2", "node3"]
    Output: "node[1-3]"
    """
    res = NodeSet.fromlist(lst)    # it's a ClusterShell.NodeSet.NodeSet object
    return str(res)

My own implementation (old)
Before I knew about ClusterShell, I implemented the unfolding myself. I include it here, but I suggest using ClusterShell (see above).

#!/usr/bin/env python3

"""
Unpack a sequence URL.

How it works:

First Gallery Image: http://www.website.com/001.jpg
Last Gallery Image: http://www.website.com/030.jpg
Sequence: [001-030]
Sequence URL: http://www.website.com/[001-030].jpg

From the sequence URL we restore the complete list of URLs.
"""

import re

import logging

# standard library logger in place of the project-specific jive.mylogging module
log = logging.getLogger(__name__)


def is_valid_sequence_url(url, verbose=True):
    # raw string, so that \[ and \] reach the regex engine unchanged
    lst = re.findall(r"\[(.+?)-(.+?)\]", url)
    if len(lst) == 0:
        if verbose: log.warning(f"no sequence was found in {url}")
        return False
    if len(lst) > 1:
        if verbose: log.warning(f"several sequences were found in {url} , which is not supported")
        return False
    # else, if len(lst) == 1
    return True
        

def get_urls_from_sequence_url(url, statusbar=None):
    res = []

    if not is_valid_sequence_url(url):
        return []

    m = re.search(r"\[(.+?)-(.+?)\]", url)
    if m:
        start = m.group(1)
        end = m.group(2)

        prefix = url[:url.find('[')]
        postfix = url[url.find(']')+1:]

        zfill = start.startswith('0') or end.startswith('0')

        if zfill and (len(start) != len(end)):
            log.warning(f"start and end sequences in {url} must have the same lengths if they are zero-filled")
            return []
        # else
        length = len(start)
        if start.isdigit() and end.isdigit():
            start = int(start)
            end = int(end)
            for i in range(start, end+1):
                middle = i
                if zfill:
                    middle = str(i).zfill(length)
                curr = f"{prefix}{middle}{postfix}"
                res.append(curr)
            # endfor
        # endif
    # endif

    return res

##############################################################################

if __name__ == "__main__":
    url = "http://www.website.com/[001-030].jpg"    # for testing

    urls = get_urls_from_sequence_url(url)
    for url in urls:
        print(url)
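The old implementation above only covers unfolding. For the simple case where the URLs differ in a single numeric field, folding can also be sketched with the standard library alone, for example where ClusterShell is not available (see the Update below). This is a minimal sketch, not ClusterShell's algorithm; the name fold_urls_simple is hypothetical, and it assumes a shared prefix and suffix around one run of digits that forms a contiguous range.

```python
import os
import string

DIGITS = set(string.digits)


def fold_urls_simple(urls):
    """
    Fold a list of URLs into a sequence URL, e.g.
    [".../001.jpg", ..., ".../030.jpg"] -> ".../[001-030].jpg".

    Assumption: the URLs share a prefix and a suffix and differ only in one
    run of ASCII digits that forms a contiguous range. Returns None otherwise.
    """
    if len(urls) < 2:
        return None
    prefix = os.path.commonprefix(urls)
    # longest common suffix = common prefix of the reversed strings, reversed back
    suffix = os.path.commonprefix([u[::-1] for u in urls])[::-1]
    # digits shared by all numbers (e.g. the leading "0" of 001..030) belong to the sequence
    prefix = prefix.rstrip(string.digits)
    suffix = suffix.lstrip(string.digits)

    middles = [u[len(prefix):len(u) - len(suffix)] for u in urls]
    if not all(m and set(m) <= DIGITS for m in middles):
        return None

    pairs = sorted((int(m), m) for m in middles)
    numbers = [n for n, _ in pairs]
    if numbers != list(range(numbers[0], numbers[-1] + 1)):
        return None  # gaps or duplicates

    # zero-filled numbers (e.g. 001) must all have the same width
    zero_filled = any(len(m) > 1 and m.startswith("0") for m in middles)
    if zero_filled and len({len(m) for m in middles}) != 1:
        return None

    start, end = pairs[0][1], pairs[-1][1]
    return f"{prefix}[{start}-{end}]{suffix}"


if __name__ == "__main__":
    urls = [f"http://www.website.com/{i:03}.jpg" for i in range(1, 31)]
    print(fold_urls_simple(urls))
```

The input order does not matter because the numbers are sorted before the range check; anything that does not fit the single-field pattern yields None instead of a wrong sequence URL.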


Update

It turned out that ClusterShell doesn’t install on Windows. However, I managed to extract the part of it that does the (un)folding. Read this ticket for more info. The extracted part works on Windows too.


pip install --user

June 29, 2018

Problem
When we install something with pip, we usually do a “sudo pip install pkg_name”. However, this approach has some problems. First, you need root privileges. Second, it installs the package globally, which can cause conflicts in the system. Is there a way to install something with pip locally?

Solution
The good news is that you can install a package with pip locally too. Under Linux the default destination folder is ~/.local (executables go to ~/.local/bin). Add the following line to the end of your ~/.bashrc:

export PATH=~/.local/bin:$PATH

Then install the package locally. For instance, let’s install pipenv:

$ pip install pipenv --user

Open a new terminal (so that ~/.bashrc is read again) and launch pipenv. It should be available. Let’s check where it is:

$ which pipenv
/home/jabba/.local/bin/pipenv
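To check from Python where user installs go and whether that folder is on your PATH, here is a small sketch using the standard site module. site.getuserbase() returns the user base directory (~/.local on Linux unless PYTHONUSERBASE is set). The function names are hypothetical, and the "bin" subfolder assumes the Linux layout (Windows uses a different one).

```python
import os
import site


def user_bin_dir():
    """Folder where `pip install --user` puts executables (Linux layout assumed)."""
    # site.getuserbase() honours PYTHONUSERBASE and defaults to ~/.local on Linux
    return os.path.join(site.getuserbase(), "bin")


def user_bin_on_path():
    """True if the user bin folder is listed in the PATH environment variable."""
    target = os.path.normpath(user_bin_dir())
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(os.path.normpath(os.path.expanduser(e)) == target for e in entries if e)


if __name__ == "__main__":
    print("user base:", site.getuserbase())
    print("user bin: ", user_bin_dir())
    print("on PATH?  ", user_bin_on_path())
```

From the shell, "python3 -m site --user-base" prints the same base directory.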

pynt: a lightweight build tool, written in Python

Problem
I mainly work under Linux and when I write a Python program, I don’t care if it runs on other platforms or not. Does it work for me? Good :) So far I haven’t really used any build tools. If I needed something, I solved it with a Bash script.

However, a few weeks ago I started to work on a larger side project (JiVE Image Viewer) and I wanted to make it portable from the beginning. Besides Linux, it must also work on Windows (I couldn’t test it on a Mac).

Now, if I want to automate some build task (e.g. creating a standalone executable from the project), a Bash script is not enough as it doesn’t run under Windows. Should I write the same thing in a .bat file? Hell, no! Should I install Cygwin on all my Windows machines? No way! It’s time to start using a build tool. The time has finally come.

Solution
There are tons of build tools. I wanted something very simple with which I can do some basic tasks: run an external command, create a directory, delete a directory, move a file, move a directory, etc. As I am most productive in Python, I wanted a build tool that I can program in pure Python. And I wanted something simple that I can start using right away without reading tons of docs.

And this is how I found pynt. Some of its features, quoted from its documentation (source):

  • easy to learn
  • build tasks are just python functions
  • manages dependencies between tasks
  • automatically generates a command line interface
  • supports python 2.7 and python 3.x

Just create a file called build.py in your project’s root folder and invoke the build tool with the command “pynt”.

My project lives in a virtual environment, so first I installed pynt there:

$ pip install pynt

Here you can find an example that I wrote for JiVE.
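To give an idea of the basic tasks mentioned above (run an external command, create or delete a directory, move a file or directory), here is a sketch of them in pure Python using only the standard library, so the same code runs on Linux and Windows. The helper names are hypothetical and not part of pynt. As far as I know from pynt's README, a task in build.py is a function decorated with @task() (imported from pynt), so a function like clean below could be exposed that way.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def make_dir(path):
    """Create a directory (and its parents); do nothing if it already exists."""
    Path(path).mkdir(parents=True, exist_ok=True)


def remove_dir(path):
    """Delete a directory tree; a missing directory is ignored."""
    shutil.rmtree(path, ignore_errors=True)


def move(src, dst):
    """Move a file or a directory."""
    shutil.move(str(src), str(dst))


def run(cmd):
    """Run an external command given as a list of arguments; raise on failure."""
    # no shell=True, so the same call works on Linux and Windows
    subprocess.run(cmd, check=True)


def clean(build_dir="build"):
    """Example task: start from an empty build directory."""
    remove_dir(build_dir)
    make_dir(build_dir)


if __name__ == "__main__":
    clean()
    run([sys.executable, "--version"])
```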

Update (20180628)
I made a small contribution to the project: https://github.com/rags/pynt/pull/17. If the name of a task starts with an underscore, then it’s a hidden task, so it won’t appear in the auto-generated docs. This way you can easily hide sub-tasks.

Categories: python

Convert a nested OrderedDict to normal dict

Problem
You have a nested OrderedDict object and you want to convert it to a normal dict.

Today I was playing with the configparser module. It reads an .ini file and builds a dict-like object. However, I prefer normal dict objects. Through a configparser object’s “._sections” attribute you can access the underlying dictionary object, but it’s a nested OrderedDict object. (This is the behaviour up to Python 3.6; since Python 3.7 the default dict_type of ConfigParser is a plain dict.)

Example:

; preferences.ini

[GENERAL]
onekey = "value in some words"

[SETTINGS]
resolution = '1024 x 768'

import configparser
from pprint import pprint

config = configparser.ConfigParser()
config.read("preferences.ini")
pprint(config._sections)

Sample output:

OrderedDict([('GENERAL', OrderedDict([('onekey', '"value in some words"')])),
             ('SETTINGS', OrderedDict([('resolution', "'1024 x 768'")]))])

Solution
JSON to the rescue! Serialize the nested OrderedDict to JSON, then parse the JSON back: json.loads builds plain dict objects at every level. Voilà, you have a plain dict object.

import json

def to_dict(config):
    """
    Nested OrderedDict to normal dict.
    """
    return json.loads(json.dumps(config))

pprint(to_dict(config._sections))    # config from the previous snippet

Output:

{'GENERAL': {'onekey': '"value in some words"'},
 'SETTINGS': {'resolution': "'1024 x 768'"}}
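The JSON round trip works well here because configparser values are all strings. For arbitrary data it has side effects: tuples come back as lists, non-string keys become strings, and types JSON cannot represent (e.g. sets) raise a TypeError. A recursive conversion avoids these; the sketch below is a different technique from the JSON trick, and the name to_plain_dict is hypothetical.

```python
from collections import OrderedDict
from collections.abc import Mapping


def to_plain_dict(obj):
    """Recursively convert any mapping (e.g. a nested OrderedDict) to plain dicts."""
    if isinstance(obj, Mapping):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj  # non-mapping values are returned unchanged


if __name__ == "__main__":
    nested = OrderedDict([
        ("GENERAL", OrderedDict([("onekey", '"value in some words"')])),
        ("SETTINGS", OrderedDict([("resolution", "'1024 x 768'")])),
    ])
    print(to_plain_dict(nested))
```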

As you can see, quotes around string values are kept by configparser. If you want to remove them, see my previous post.

I found this solution here on SO.

Using ConfigParser, read an .ini file to a dict and remove quotes around string values

Problem
In Python you can read .ini files easily with the configparser module.

An .ini file looks like this:

[OPTIONS]
name = Jabba

As you can see, string values are not quoted. However, to me this looks lame. IMO a string must be between quotes or apostrophes. With quotes you can also add whitespace characters to the beginning or the end of a string. So I prefer writing this:

[OPTIONS]
name = "Jabba"

But now quotes become part of the string. If you read it with configparser, the value of name is '"Jabba"' instead of 'Jabba'.

Solution
configparser builds a dict-like object, but I prefer to work with normal dictionaries. So first I read the .ini file, then convert the configparser object to a dict, and finally remove the quotes (or apostrophes) from around string values. Here is my solution:

preferences.ini

[GENERAL]
onekey = "value in some words"

[SETTINGS]
resolution = '1024 x 768'

example.py

#!/usr/bin/env python3

from pprint import pprint
import preferences

prefs = preferences.Preferences("preferences.ini")
d = prefs.as_dict()
pprint(d)

preferences.py

import configparser
import json


def remove_quotes(original):
    """
    Recursively strip one pair of matching quotes (or apostrophes)
    from around string values. Returns a new dict.
    """
    d = original.copy()
    for key, value in original.items():
        if isinstance(value, str):
            # strip only if the value starts and ends with the same quote character
            if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
                d[key] = value[1:-1]
        elif isinstance(value, dict):
            d[key] = remove_quotes(value)
    return d

class Preferences:
    def __init__(self, preferences_ini):
        self.preferences_ini = preferences_ini

        self.config = configparser.ConfigParser()
        self.config.read(preferences_ini)

        self.d = self.to_dict(self.config._sections)

    def as_dict(self):
        return self.d

    def to_dict(self, config):
        """
        Nested OrderedDict to normal dict.
        Also, remove the annoying quotes (apostrophes) from around string values.
        """
        d = json.loads(json.dumps(config))
        d = remove_quotes(d)
        return d

The line d = remove_quotes(d) is responsible for removing the quotes. Comment / uncomment this line to see the difference.

Output:

$ ./example.py

{'GENERAL': {'onekey': 'value in some words'},
 'SETTINGS': {'resolution': '1024 x 768'}}
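Since ._sections is a private attribute, here is a sketch that builds the same plain, unquoted dict through configparser's public API instead: config.sections() lists the sections, and config.items(section, raw=True) returns the (key, value) pairs without interpolation. Unlike ._sections, items() also includes values inherited from a [DEFAULT] section. The function names are hypothetical.

```python
import configparser


def strip_matching_quotes(s):
    """Remove one pair of surrounding quotes or apostrophes, if both ends match."""
    if len(s) >= 2 and s[0] == s[-1] and s[0] in ('"', "'"):
        return s[1:-1]
    return s


def config_to_dict(config):
    """Turn an already-read ConfigParser into a plain nested dict with unquoted values."""
    return {
        section: {key: strip_matching_quotes(value)
                  for key, value in config.items(section, raw=True)}
        for section in config.sections()
    }


if __name__ == "__main__":
    config = configparser.ConfigParser()
    config.read("preferences.ini")
    print(config_to_dict(config))
```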

I also posted this to SO (link here).

JiVE: A general purpose, cross-platform image viewer with some built-in NSFW support, written in Python 3.6 using PyQt5

In the past 2-3 weeks I’ve been working on a general purpose, cross-platform image viewer that has some built-in NSFW support. It’s called JiVE and it’s written in Python 3.6 using PyQt5. A unique feature of JiVE is that it allows you to browse online images just as if they were local images.

You can find it on GitHub: https://github.com/jabbalaci/JiVE-Image-Viewer. I also wrote detailed documentation.

Screenshots

In action:

jive.png

Selecting an NSFW subreddit:

nsfw.png

Read the docs for more info.

install PyQt5

The following is based on this YouTube video.

$ sudo apt install python3-pyqt5
...
$ python3
>>> import PyQt5
>>>
=======================================================================
$ sudo apt install python3-pyqt5.qtsql
...
$ python3
>>> from PyQt5 import QtSql
>>>
=======================================================================
$ sudo apt install qttools5-dev-tools
...
$ ls -al /usr/lib/x86_64-linux-gnu/qt5/bin/designer
lrwxrwxrwx 1 root root 25 Apr 14 09:38 /usr/lib/x86_64-linux-gnu/qt5/bin/designer -> ../../../qt5/bin/designer

I created a symbolic link to designer so that I can launch it easily.
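The checks above are done one by one in the interactive interpreter. A small script can run them at once; this is a sketch with a hypothetical helper name. importlib.util.find_spec looks a module up without fully importing it (for a dotted name it imports the parent package first), and shutil.which("designer") finds the designer executable only if it is on your PATH, e.g. via such a symbolic link.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # raised when the parent package (e.g. PyQt5 for PyQt5.QtSql) is missing
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing modules: ", missing_modules(["PyQt5", "PyQt5.QtSql"]))
    print("designer on PATH:", shutil.which("designer"))
```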

Categories: PyQt5, python