programming - handyfloss

Speed up PyGTK and Cairo by reusing images

March 18, 2010 at 14:15 pm · Filed under Free software and related beasts

As you might have read in this blog, I own a Neo FreeRunner since one year ago. I have used it far less than I should have, mostly because it’s a wonderful toy, but a lousy phone. The hardware is fine, although externally quite a bit less sexy than other smartphones such as the iPhone. The software, however, is not very mature. Being as open as it is, different Linux-centric distros have been developed for it, but I haven’t been able to find one that converts the Neo into an everyday use phone.

But let’s cut the rant, and stick to the issue: that the Neo is a nice playground for a computer geek. Following my desire to play, I installed Debian on it. Next, I decided to make some GUI programs for it, such a screen locker. I found Zedlock, a program written in Python, using GTK+ and Cairo. Basically, Zedlock paints a lock on the screen, and refuses to disappear until you paint a big “Z” on the screen with your finger. Well, that’s what it’s supposed to do, because the 0.1 version available at the Openmoko wiki is not functional. However, with Zedlock I found just what I wanted: a piece of software capable of doing really cool graphical things on the screen of my Neo, while being simple enough for me to understand.

Using Zedlock as a base, I am starting to have real fun programming GUIs, but a problem has quickly arisen: their response is slow. My programs, as all GUIs, draw an image on the screen, and react to tapping in certain places (that is, buttons) by doing things that require that the image on the screen be modified and repainted. This repainting, done as in Zedlock, is too slow. To speed things up, I googled the issue, and found a StackOverflow question that suggested the obvious route: to cache the images. Let’s see how I did it, and how it turned out.

Material

You can download the three Python scripts, plus two sample PNGs, from: http://isilanes.org/pub/blog/pygtk/.

Version 0

You can download this program here. Its main loop follows:

C = Canvas()

# Main window:
C.win = gtk.Window()
C.win.set_default_size(C.width, C.height)

# Drawing area:
C.canvas = gtk.DrawingArea()
C.win.add(C.canvas)
C.canvas.connect('expose_event', C.expose_win)

C.regenerate_base()

# Repeat drawing of bg:
try:
  C.times = int(sys.argv[1])
except:
  C.times = 1

gobject.idle_add(C.regenerate_base)
C.win.show_all()

# Main loop:
gtk.main()

As you can see, it generates a GTK+ window (line 04), with a DrawingArea inside (line 08), and then executes the regenerate_base() function every time the main loop is idle (line 20). Canvas() is a class whose structure is not relevant for the discussion here. It basically holds all variables and relevant functions. The regenerate_base() function follows:

def regenerate_base(self):
    
    # Base Cairo Destination surface:
    self.DestSurf = cairo.ImageSurface(cairo.FORMAT_ARGB32, self.width, self.height)
    self.target   = cairo.Context(self.DestSurf)
  
    # Background:
    if self.bg == 'bg1.png':
      self.bg = 'bg2.png'
    else:
      self.bg = 'bg1.png'

    self.i += 1

    image       = cairo.ImageSurface.create_from_png(self.bg)
    buffer_surf = cairo.ImageSurface(cairo.FORMAT_ARGB32, self.width, self.height)
    buffer      = cairo.Context(buffer_surf)
    buffer.set_source_surface(image, 0,0)
    buffer.paint()
  
    self.target.set_source_surface(buffer_surf, 0, 0)
    self.target.paint()
  
    # Redraw interface:
    self.win.queue_draw()

    if self.i > self.times:
      sys.exit()

    return True

As you can see, it paints the whole window with a PNG file (lines 15-25), choosing alternately bg1.png and bg2.png each time it is called (lines 07-11). Since the re-painting is done every time the main event loop is idle, it just means that images are painted to screen as fast as possible. After a given amount of re-paintings, the script exits.

You can run the code above by placing two suitable PNGs (480×640 pixels) in the same directory as the above code. If an integer argument is given to the script, it re-paints the window that many times, then exits (default, just once). You can time this script by executing, e.g.:

% /usr/bin/time -f %e ./p0.py 1000

Version 1

You can download this version here.

The first difference with p1.py is that the regenerate_base() function has been separated into the first part (generate_base()), which is executed only once at program startup (see below), and all the rest, which is executed every time the background is changed.

def generate_base(self):

    # Base Cairo Destination surface:
    self.DestSurf = cairo.ImageSurface(cairo.FORMAT_ARGB32, self.width, self.height)
    self.target   = cairo.Context(self.DestSurf)

The main difference, though, is that two new functions are introduced:

  def mk_iface(self):

    if not self.bg in self.buffers:
      self.buffers[self.bg] = self.generate_buffer(self.bg)

    self.target.set_source_surface(self.buffers[self.bg], 0, 0)
    self.target.paint()

  def generate_buffer(self, fn):

    image       = cairo.ImageSurface.create_from_png(fn)
    buffer_surf = cairo.ImageSurface(cairo.FORMAT_ARGB32, self.width, self.height)
    buffer      = cairo.Context(buffer_surf)
    buffer.set_source_surface(image, 0,0)
    buffer.paint()
  
    # Return buffer surface:
    return buffer_surf

The function mk_iface() is called within regenerate_base(), and draws the background. However, the actual generation of the background image (the Cairo surface) is done in the second function, generate_buffer(), and only happens once per each background (i.e., twice in total), because mk_iface() reuses previously generated (and cached) surfaces.

Version 2

You can download this version here.

The difference with Revision 1 is that I eliminated some apparently redundant procedures for creating surfaces upon surfaces. As a result, the generate_base() function disappears again. I get rid of the DestSurf and C.target variables, so the mk_iface() and expose_win() functions end up as follows:

  def mk_iface(self):

    if not self.bg in self.buffers:
      self.buffers[self.bg] = self.generate_buffer(self.bg)

    buffer = self.canvas.window.cairo_create()
    buffer.set_source_surface(self.buffers[self.bg],0,0)
    buffer.paint()

  def expose_win(self, drawing_area, event):

    nm = 'bg1.png'

    if not nm in self.buffers:
      self.buffers[nm] = self.generate_buffer(nm)

    ctx = drawing_area.window.cairo_create()
    ctx.set_source_surface(self.buffers[nm], 0, 0)
    ctx.paint()

A side effect is that I can get also rid of the forced redraws of self.win.queue_draw().

Results

I have run the three versions above, varying the C.times variable, i.e., making a varying number of reprints. The command used (actually inside a script) would be something like the one mentioned above:

% /usr/bin/time -f %e ./p0.py 1000

The following table sumarizes the results for Flanders and Maude (see my computers), a desktop P4 and my Neo FreeRunner, respectively. All times in seconds.

Flanders
Repaints	Version 0	Version 1	Version 2
1	0.26	0.43	0.33
4	0.48	0.40	0.42
16	0.99	0.43	0.40
64	2.77	0.76	0.56
256	9.09	1.75	1.15
1024	37.03	6.26	3.44

Maude
Repaints	Version 0	Version 1	Version 2
1	4.17	4.70	5.22
4	8.16	6.35	6.41
16	21.58	14.17	12.28
64	75.14	44.43	35.76
256	288.11	165.58	129.56
512	561.78	336.58	254.73

Data in the tables above has been fitted to a linear equation, of the form t = A + B n, where n is the number of repaints. In that equation, parameter A would represent a startup time, whereas B represents the time taken by each repaint. The linear fits are quite good, and the values for the parameters are given in the following tables (units are milliseconds, and milliseconds/repaint):

Flanders
Parameter	Version 0	Version 1	Version 2
A	291	366	366
B	36	6	3

Maude
Parameter	Version 0	Version 1	Version 2
A	453	3218	4530
B	1092	648	487

Darn it! I have mixed feelings for the results. In the desktop computer (Flanders), the gains are huge, but hardly noticeable. Cacheing the images (Version 1) makes for a 6x speedup, whereas Version 2 gives another twofold increase in speed (a total of 12x speedup!). However, from a user’s point of view, a 36 ms refresh is just as immediate as a 6 ms refresh.

On the other hand, on the Neo, the gains are less spectacular: the total gain in speed for Version 2 is a mere 2x. Anyway, half-a-second repaints instead of one-second ones are noticeable, so there’s that.

And at least I had fun and learned in the process! :^)

Permalink Comments (2)

ChopZip: a parallel implementation of arbitrary compression algorithms

December 20, 2009 at 18:59 pm · Filed under Free software and related beasts

Remember plzma.py? I made a wrapper script for running [[LZMA]] in parallel. The script could be readily generalized to use any compression algorithm, following the principle of breaking the file in parts (one per CPU), compressing the parts, then [[tar (file format)|tarring]] them together. In other words, chop the file, zip the parts. Hence the name of the program that evolved from plzma.py: ChopZip.

Introduction

Currently ChopZip supports [[LZMA|lzma]], [[XZ Utils|xz]], [[gzip]] and lzip. Of them, lzip deserves a brief comment. It was brought to my attention by ~~the~~ a reader of this blog. It is based on the LZMA algorithm, as are lzma and xz. Apparently unlike them, multiple files compressed with lzip can be concatenated to form a single valid lzip-compressed file. Uncompressing the latter generates a concatenation of the formers.

To illustrate the point, check the following shell action:

% echo hello > head

% echo bye > tail

% lzip head

% lzip tail

% cat head.lz tail.lz > all.lz

% lzip -d all.lz

% cat all

hello

bye

However, I just discovered that all gzip, bzip2 and xz do that already! It seems that lzma is advertised as capable of doing it, but it doesn’t work for me. Sometimes it will uncompress the concatenated file to the original file just fine, others it will decompress it to just the first chunk of the set, yet other times it will complain that the “data is corrupt” and refuse to uncompress. For that reason, chopzip will accept two working modes: simple concatenation (gzip, lzip, xz) and tarring (lzma). The relevant mode will be used transparently for the user.

Also, if you use Ubuntu, this bug will apply to you, making it impossible to have xz-utils, lzma and lzip installed at the same time.

The really nice thing about concatenability is that it allows for trivial parallelization of the compression, while maintaining compatibility with the serial compression tool, which can still uncompress the product of a parallel compression. Unfortunatelly, for non-concatenatable compression formats, the output of chopzip will be a tar file of the compressed chunks, making it imposible to uncompress with the original compressor alone (first an untar would be needed, then uncompressing, then concatenation of chunks. Or just use chopzip to decompress).

The rationale behind plzma/chopzip is simple: multi-core computers are commonplace nowadays, but still the most common compression programs do not take advantage of this fact. At least the ones that I know and use don’t. There are at least two initiatives that tackle the issue, but I still think ChopZip has a niche to exploit. The most consolidated one is pbzip2 (which I mention in my plzma post). pbzip2 is great, if you want to use bzip2. It scales really nicely (almost linearly), and pbzipped files are valid bzip2 files. The main drawback is that it uses bzip2 as compression method. bzip2 has always been the “extreme” bother of gzip: compresses more, but it’s so slow that you would only resort to it if compression size is vital. LZMA-based programs (lzma, xz, lzip) are both faster, and even compress more, so for me bzip2 is out of the equation.

A second contender in parallel compression is pxz. As its name suggests, it compresses in using xz. Drawbacks? it’s not in the official repositories yet, and I couldn’t manage to compile it, even if it comprises a single C file, and a Makefile. It also lacks ability to use different encoders (which is not necessarily bad), and it’s a compiled program, versus chopzip, which is a much more portable script.

Scalability benchmark

Anyway, let’s get into chopzip. I have run a simple test with a moderately large file (a 374MB tar file of the whole /usr/bin dir). A table follows with the speedup results for running chopzip on that file, using various numbers of chunks (and consequently, threads). The tests were conducted in a 4GB RAM Intel Core 2 Quad Q8200 computer. Speedups are calculated as how many times faster did #chunks perform with respect to just 1 chunk. It is noteworthy that in every case running chopzip with a single chunk is virtually identical in performance to running the orginal compressor directly. Also decompression times (not show) were identical, irrespective of number of chunks. ChopZip version vas r18.

#chunks	xz	gzip	lzma	lzip
1	1.000	1.000	1.000	1.000
2	1.862	1.771	1.907	1.906
4	3.265	1.910	3.262	3.430
8	3.321	1.680	3.247	3.373
16	3.248	1.764	3.312	3.451

Note how increasing the number of chunks beyond the amount of actual cores (4 in this case) can have a small benefit. This happens because N equal chunks of a file will not be compressed with equal speed, so the more chunks, the smaller overall effect of the slowest-compressing chunks.

Conclusion

ChopZip speeds up quite noticeably the compression of arbitrary files, and with arbitrary compressors. In the case of concatenatable compressors (see above), the resulting compressed file is an ordinary compressed file, apt to be decompressed with the regular compressor (xz, lzip, gzip), as well as with ChopZip. This makes ChopZip a valid alternative to them, with the parallelization advantage.

Permalink Comments (6)

plzma.py: a wrapper for parallel implementation of LZMA compression

July 23, 2009 at 14:39 pm · Filed under Free software and related beasts

Update: this script has been superseded by ChopZip

Introduction

I discovered the [[Lempel-Ziv-Markov chain algorithm|LZMA]] compression algorithm some time ago, and have been thrilled by its capacity since. It has higher compression ratios than even [[bzip2]], with a faster decompression time. However, although decompressing is fast, compressing is not: LZMA is even slower than bzip2. On the other hand, [[gzip]] remains blazing fast in comparison, while providing a decent level of compression.

More recently I have discovered the interesting pbzip2, which is a parallel implementation of bzip2. With the increasing popularity of multi-core processors (I have a quad-core at home myself), parallelizing the compression tools is a very good idea. pbzip2 performs really well, producing bzip2-compatible files with near-linear scaling with the number of CPUs.

LZMA being such a high performance compressor, I wondered if its speed could be boosted by using it in parallel. Although the [[Lempel-Ziv-Markov chain algorithm|Wikipedia article]] states that the algorithm can be parallelized, I found no such implementation in Ubuntu 9.04, where the utility provided by the lzma package is exclusively serial. Not finding one, I set myself to produce it.

About plzma.py

Any compression can be parallelized as follows:

Split the original file into as many pieces as CPU cores available
Compress (simultaneously) all the pieces
Create a single file by joining all the compressed pieces, and call the result “the compressed file”

In a Linux environment, these three tasks can be carried out easily by split, lzma itself, and tar, respectively. I just made a [[Python (programming language)|Python]] script to automate these tasks, called it plzma.py, and put it in my web site for anyone to download (it’s GPLed). Please notice that plzma.py has been superseded by chopzip, starting with revision 12, whereas latest plzma is revision 6.

I must remark that, while pbzip2 generates bzip2-compatible compressed files, that is not the case with plzma. The products of plzma compression must be decompressed with plzma as well. The actual format of a plzma file is just a TAR file containing as many LZMA-compressed chunks as CPUs used for compression. These chunks, once decompressed individually, can be concatenated (with the cat command) to form the original file.

Benchmarks

What review of compression tools lacks benchmarks? No matter how inaccurate or silly, none of them do. And neither does mine :^)

I used three (single) files as reference:

molekel.tar – a 108 MB tar file of the (GPL) [[Molekel]] 5.0 source code
usr.bin.tar – 309 MB tar file of the contens of my /usr/bin/ dir
hackable.tar – a 782 MB tar file of the hackable:1 [[Debian]]-based distro for the [[Neo FreeRunner]]

The second case is intended as an example of binary file compression, whereas the other two are more of a “real-life” example. I didn’t test text-only files… I might in the future, but don’t expect the conclusions to change much. The testbed was my Frink desktop PC (Intel Q8200 quad-core).

The options for each tool were:

gzip/bzip/pbzip2: compression level 6
lzma/plzma: compression level 3
pbzip2/plzma: 4 CPUs

Compressed size

The most important feature of a compressor is the size of the resulting file. After all, we used it in first place to save space. No matter how fast an algorithm is, if the resulting file is bigger than the original file I wouldn’t use it. Would you?

The graph below shows the compressed size ratio for compression of the three test files with each of the five tools considered. The compressed size ratio is defined as the compressed size divided by the original size for each file.

This test doesn’t surprise much: gzip is the least effective and LZMA the most one. The point to make here is that the parallel implementations perform as well or badly as their serial counterparts.

If you are unimpressed by the supposedly higher performance of bzip2 and LZMA over gzip, when in the picture all final sizes do not look very different, recall that gzip compressed molekel.tar ~ 3 times (to a 0.329 ratio), whereas LZMA compressed it ~ 4.3 times (to a 0.233 ratio). You could stuff 13 LZMAed files where only 9 gzipped ones fit (and just 3 uncompressed ones).

Compression time

However important the compressed size is, compression time is also an important subject. Actually, that’s the very issue I try to address parallelizing LZMA: to make it faster while keeping its high compression ratio.

The graph below shows the normalized times for compression of the three test files with each of the five tools considered. The normalized time is taken as the total time divided by the time it took gzip to finish (an arbitrary scale with t(gzip)=1.0).

Roughly speaking, we could say that in my setting pbzip2 makes bzip2 as fast as gzip, and plzma makes LZMA as fast as serial bzip2.

The speedups for bzip2/pbzip2 and LZMA/plzma are given in the following table:

File	pbzip2	plzma
molekel.tar	4.00	2.72
usr.bin.tar	3.61	3.38
hackable.tar	3.80	3.04

The performance of plzma is nowere near pbzip2, but I’d call it acceptable (wouldn’t I?, I’m the author!). There are two reasons I can think of to explain lower-than-linear scalability. The first one is the overhead imposed when cutting the file into pieces then assembling them back. The second one, maybe more important, is the disk performance. Maybe each core can compress each file independently, but the disk I/O for reading the chunks and writing them back compressed is done simultaneously on the same disk, which the four processes share.

Update: I think that a good deal of under-linearity comes from the fact that files of equal size will not be compressed in an equal time. Each chunk compression will take a slightly different time to complete, because some will be easier than others to compress. The program waits for the last compression to finish, so it’s as slow as the slowest one. It is also true that pieces of 1/N size might take more than 1/N time to complete, so the more chunks, the slower the compression in total (the opposite could also be true, though).

Decompression times

Usually we pay less attention to it, because it is much faster (and because we often compress things never to open them again, in which case we had better deleted them in first place… but I digress).

The following graph shows the decompression data equivalent to the compression times graph above.

The most noteworthy point is that pbzip2 decompresses pbzip2-compressed files faster than bzip2 does with bzip2-compressed files. That is, both compression and decompression benefit from the parallelization. However, for plzma that is not the case: decompression is slower than with the serial LZMA. This is due to two effects: first, the decompression part is still not parallelized in my script (it will soon be). This would lead to decompression speeds near to the serial LZMA. However, it is slower due to the second effect: the overhead caused by splitting and then joining.

Another result worth noting is that, although LZMA is much slower than even bzip2 to compress, the decompression is actually faster. This is not random. LZMA was designed with fast uncompression time in mind, so that it could be used in, e.g. software distribution, where a single person compresses the original data (however painstakingly), then the users can download the result (the smaller, the faster), and uncompress it to use it.

Conclusions

While there is room for improvement, plzma seems like a viable option to speed up general compression tasks where a high compression ratio (LZMA level) is desired.

I would like to stress the point that plzma files are not uncompressable with just LZMA. If you don’t use plzma to decompress, you can follow the these steps:

% tar -xf file.plz
% lzma -d file.0[1-4].lz
% cat file.0[1-4] > file
% rm file.0[1-4] file.plz

Permalink Comments (4)

Changing font style in PyGTK ComboBox

June 10, 2009 at 12:34 pm · Filed under Free software and related beasts

I am using the [[Glade Interface Designer]] to produce (very) small (and simple) graphical apps for my [[Neo FreeRunner]]. I produce the graphical layout in the form of an [[XML]] file (using Glade), then load this XML from a [[PyGTK]] program.

The thing is some defaults are not really usable for a device such as the NFR. For example, default fonts are in general too small for the tiny screen of the Neo, which favors apps with only a few, big and shinny buttons. In the case of Label widgets, you can use Pango markup format with the set_markup method, as follows:

mylabel  = self.glade.get_widget('label1')
txt  = '<span font_size="80000" color="red">%s</span>' % (text_string)
mylabel.set_markup(txt)

However, for other widgets it is not so evident. For example, in ComboBoxes (buttons with a drop-down list), you can’t put in the item list anything other than strings, which are displayed literally (markup is not interpreted). Moreover, CBs do not have a “set_font_style” method, or anything similar.

Searching the web did not provide immediate results, but I managed to find this FAQ item at eccentric.cx. I quote:

4.1.581 How do I change font properties on gtk.Labels and other widgets?
Easy:
 label = gtk.Label("MyLabel")
 label.modify_font(pango.FontDescription("sans 48"))
This method applies to all widgets that use text, so you can change the text of gtk.Entry and other widgets in the same manner.

Note that, some widgets are only containers for others, like gtk.Button. For those you’d have to get the child widget. For a gtk.Button do this:
  if button.get_use_stock():
     label = button.child.get_children()[1]
  elif isinstance(button.child, gtk.Label):
     label = button.child
  else:
     raise ValueError("button does not have a label")
Last changed on Thu Sep 1 14:46:30 2005 by Johan Dahlin (johan-at-gnome-org)

In the case of a CB, we have to pick its child (which is the list itself), and modify it thusly:

cbox = self.glade.get_widget("CBlist")
cblist  = cbox.child
cblist.modify_font(pango.FontDescription("sans 32"))

In my examples above, a class has been created in the script beforehand, and it binds to the Glade XML:

class whatever:

  def __init__(self):

    #Set the Glade file
    self.glade    = gtk.glade.XML(gladefile)
    self.glade.signal_autoconnect(self)

Of course, the CBlist and MyLabel mentioned in my code are the appropriate widget names defined in that XML.

Permalink Comments

Membership test: array versus dictionary

May 22, 2009 at 12:31 pm · Filed under Free software and related beasts

I guess this post is not going to reveal anything new: testing for an item’s membership in an array is slow, and dictionaries are much more CPU-efficient for that (albeit more RAM-hungry). I’m just restating the obvious here, plus showing some benchmarks.

Intro

Let’s define our problem first. We simply want to check whether some item (a string, number or whatever) is contained within some collection of items. For that, the simplest construct in [[Python (programming language)|Python]] would be:

if item in collection:
  do something

The above construct works regardless of “collection” being an array or a dictionary. However, the search for “item” in “collection” is different internally. In the case of a list, Python checks all its elements one by one, comparing them to “item”. If a match is found, True is returned, and the search aborted. For items not in the list, or appearing very late inside it, this search will take long.

However, in the case of dictionaries, the search is almost a one-step procedure: if collection[item] returns something other than an error, then item is in collection.

The tests

I’ve run two different test scripts, one for the array case, another for the dictionary case. In both cases I’ve searched for an item that was not in the collection, to maximize the searching efforts. The array script was as follows:

#!/usr/bin/python

import sys

nitems = int(sys.argv[1])

foo = []
bar = []

for i in range(nitems):
 foo.append(1)
 bar.append(2)

for i in foo:
  if i in bar:
    pass

Similarly, for dictionaries:

#!/usr/bin/python

import sys

nitems = int(sys.argv[1])

foo = {}
bar = {}

for i in range(nitems):
  j = i + nitems
  foo[i] = True
  bar[j] = True

for i in foo:
  if i in bar:
    pass

Both scripts accept (require) an integer number as argument, then build item collections of this size (initialization), then run the check loops. The loops are designed to look for every item of collection 1 in collection 2 (and all checks will fail, because no single item belongs to both sets).

Timing

The scripts were timed simply by measuring the execution [[wall clock time|walltime]] with the GNU time command, as follows:

% /usr/bin/time -f %e script nitems

Bear in mind that the computer was not otherwise idle during the tests. I was surfing the web with Firefox and listening to music with Amarok. Both programs are CPU- and (specially) memory-hungry, so take my results with a grain of salt. In any case, it was not my intention to get solid numbers, but just solid trends.

Memory profiling

I must confess my lack of knowledge around memory management of software, and how to profile it. I just used the [[Valgrind]] utility, with the massif tool, as follows:

% valgrind --tool=massif script nitems

Massif creates a log file (massif.out.pid) that contains “snapshots” of the process at different moments, and gives each of them a timestamp (the default timestamp being the number of instructions executed so far). The logged info that interests us is the [[dynamic memory allocation|heap]] size of the process. As far as I know (in my limited knowledge), this value corresponds to the RAM memory allotted to the process. This value can be digested out of the log file into a format suitable for printing heap size vs. execution time (instructions, really), by a Python script:

#!/usr/bin/python

import sys

try:
  fn = sys.argv[1]
except:
  sys.exit('Insert file name')

b2m = 1024*1024
e2m = 1000000

f = open(fn,'r')

for line in f:
  if 'time=' in line:
    aline = line.split('=')
    t     = aline[1].replace('\n','')
    t     = float(t)/e2m

  elif 'mem_heap_B' in line:
    aline = line.split('=')
    m     = aline[1].replace('\n','')
    m     = float(m)/b2m

    print t,m

f.close()

The above outputs heap MB vs million executions.

A much conciser form with [[AWK|awk]]:

% awk -F= '/time=/{t=$2/1000000};/mem_heap_B/{print t, $2/1048576}' massif.out.pid

Results

The execution times were so different, and the collection size (nitems) range so wide, I have used a [[logarithmic scale]] for both axes in the time vs collection size below:

At 64k items, the dictionary search is already 3 orders of magnitude faster, and the difference grows fast as the collection size increases.

With respect to memory use, we can see that in both cases increasing nitems increases the heap size, but in the case of the arrays, the increase is not so pronounced. Looking at the X axes in both following plots, you can see that the number of instructions executed during the run grows linearly with the number of items in the collection (recall that the array plot has a logarithmic X axis).

Finally, I compare memory usage of the array and dictionary case in the same plot, as you can see below, for the case of 64k items in the collection:

It wasn’t really an easy task, because I had to combine the biggest array case I could handle with the smallest dictionary the timing of which would be meaningful (smaller dictionaries would be equally “immediate”, according to time). Also notice how the X axis has a log scale. Otherwise the number of instructions in the array case would cross the right border of your monitor.

Permalink Comments

ogg2mp3 is out

October 21, 2008 at 19:09 pm · Filed under Free software and related beasts

The music loving community may rejoice, ogg2mp3 is out! OK, OK, that is too much to say, but nonetheless someone could find it useful.

Visit its site at: http://isilanes.org/soft/ogg2mp3

ogg2mp3 is a simple Python script I have made to make the task of converting OGG files to MP3 and the other way around easier. There might be other (better) tools out there for the same task, but I had some need, and this script fulfills it. ogg2mp3 can convert single files, lists of them, or even whole directory contents, and reads the [[ID3]] tags of the input OGG/MP3 files, saving them into the output MP3/OGG.

I basically convert bunches of OGG files to MP3 when I want to put them in portable players that don’t read OGG. I do the opposite when someone passes me an MP3 and I want to add it to my collection, which is in OGG format.

Enjoy!

Permalink Comments

Disabling autoscale in a Xmgrace agr file

October 1, 2008 at 12:38 pm · Filed under Free software and related beasts, howto

I am a heavy user of the [[Grace (plotting tool)|Xmgrace]] plotting program, and I love it. An operation very ofter used is to scale the X and Y axes to our liking, to show different parts of our data in the resulting plot. You can do that from the command line by setting the “world” of the graph, providing four numbers as X,Y boundaries:

% xmgrace -world xmin ymin xmax ymax file.dat

Apart from setting the maximum and minimum values for X and Y, we can make use of the autoscale option to selectively show some ranges. The four options to autoscale are:

none – show the X,Y ranges defined by the “world” variable (if not set, the default is “0 0 1 1”).
xy – forget about “world” data, make plot range in X and Y enough to plot all data in input.
x – autoscale X to show all data, but respect Y given by “world”. This means that if a point is not shown because it lies outside the Y range, then it doesn’t count to force X autoscale. This is a wee bit trickier than it sounds.
y – see previous point, with X and Y swapped.

But Xmgrace is not only about [[command-line interface|command line]], or even [[Graphical user interface|GUI]]. You can write a .agr file (for example by saving a plot from the Xmgrace GUI), and manipulate it so that the following command:

% xmgrace file.agr

will bring up a plot with all the data and formatting we have put into the .agr file. It’s really handy to save a file as-is.

Now, the syntax for inputting the world in the .agr is well known:

@ world xmin, ymin, xmax, ymax

where xmin etc. are floating point numbers.

The problem is how to hardcode the autoscale feature into the .agr. I had always been forced to do:

% xmgrace -autoscale none file.agr

from the command line, because I couldn’t find out how to include it in the .agr. Finally I did find it, and that’s the main reason of this post. The syntax is explained in the manual at the Xmrace site, but I found it after googling for agr files containing “autoscale” in them. The line to include seems to be:

@ autoscale onread none

A .agr containing the above line will produce, when called as follows:

% xmgrace file.agr

the same output as a file not containing it, when called as follows:

% xmgrace -autoscale none file.agr

Permalink Comments (3)

Making a PDF grayscale with ghostscript

September 30, 2008 at 18:56 pm · Filed under howto

A request from a friend made me face the problem of converting a color [[Portable Document Format|PDF]] into a [[grayscale]] one. Searching the web provided some ways of doing so with [[Adobe Acrobat]], via some obscure menu item somewhere.

However, the very same operation could be undertaken with free tools, such as [[ghostscript]]. I found a way to do it in the YANUB blog, and I will copy-paste it here, with a small modification.

Assuming we have a file called color.pdf, and we want to convert it into grayscale.pdf, we could run the following command (all in a single line, and omitting the “\” line continuation marks):

% gs -sOutputFile=grayscale.pdf -sDEVICE=pdfwrite \

  -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \

  -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH color.pdf

I prefer the above to YANUB’s version below (in red what he lacks, in blue what I lack), because a shell operation is substituted by some option(s) of the command we are running:

% gs -sOutputFile=grayscale.pdf -sDEVICE=pdfwrite \

  -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \

  -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH color.pdf < /dev/null

A sample [[Perl]] script to alleviate the tedious writing above:

#!/usr/bin/perl -w

use strict;

my $infile = $ARGV[0];

my $outfile = $infile;

$outfile =~ s/\.pdf$//;

$outfile = $outfile.”_gray.pdf”;

system “gs -sOutputFile=$outfile -sDEVICE=pdfwrite -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray   -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH $infile”

Assuming we call the Perl script “togray.pl”, and that we have a color file “input.pdf”, we could just issue the command:

% togray.pl input.pdf

and we would get a grayscale version of it, named “input_gray.pdf”.

Permalink Comments (27)

Gmail and browser discrimination

June 4, 2008 at 13:18 pm · Filed under This evil world

Due to [[Mozilla software rebranding|Iceweasel]] (Firefox) being so slow on my machine, I switched to [[Konqueror]], which is reasonably fast and full of features, but nowhere as good as Iceweasel, I must say. However, IW is unbearable, so I’m waiting for FF 3.0 to use IW again.

I use an [[e-mail client]] to read my e-mail over [[IMAP]], my main account being a [[Gmail]] one. However, I sometimes visit the Gmail site, for example to set it to fetch e-mail from some other accounts. I had always done it with IW, and everything worked fine, but now with Konqueror it doesn’t.

With Konqueror I get the message:

and some features are missing (specifically, the option set how to fetch e-mail from other accounts, and some others).

I could understand it if Konqueror were missing some functionality/plugin that IW has and Gmail requires. But it is not the case. I can tell Konqueror to identify itself as Firefox, and THEN the Gmail page shows up correctly, so obviously it’s not due to Konqueror’s limitations. It sounds like a case of sloppy programming from the guys at Google, with something like:

if browser is one of 'IE', 'Firefox', 'Safari':
  show this page
else:
  show dumbed down page

After years of discrimination to non-IE users, and a tremendous fight to make webmasters produce standards-compliant sites, instead of specific browser-compliant ones, we still have to suffer this shit. And from Google, the “don’t be evil” guys, supporters of free software and all that BS.

By the way, this issue is known, and mentioned, for example, in the Wikipedia page for Gmail.

Permalink Comments (5)

Project BHS

March 13, 2008 at 20:49 pm · Filed under Free software and related beasts

As outlined in some previous posts[1,2,3,4], I have been playing around with a piece of Python code to process some log files. The log files to process were actually host.gz files from some [[BOINC]] projects, and the data I want to extract from them is quite simple: the Windows, Linux and Mac shares in the number of computers contributing to them (and the [[BOINC Credit System|work they do]]). By logging this processed data myself, I can see the time evolution of this share, and hopefully show the slow but steady rise of GNU/Linux :^)

I figured out that the contribution to distributed computing projects could be a reasonable indicator of the Windows predominance status. There are many other indicators (for example the number of visits to a web site, e.g. this very one), and I don’t claim that this one is “better”. I just want to add it to the reference list for the reader.

There is a problem with “Windows vs. Linux” figures, and it is that they are not really “competing” products. When cars or soft drinks are the subject, one can figure out the [[market share]], looking at the number of items sold. Linux being [[free software]], one can hardly measure the amount of “sold copies”, and with Windows being pre-installed in most new computers, one can not really trust the “number of computers sold = number of Windows copies sold”, because some users even remove the Windows partition and install Linux on top of it.

Counting the visits to some sites is not without problems, either. Any web site will have a particular audience, and the result will be biased by that fact. When my blog was in WordPress.com, I had roughly as many visits from Windows users as from Linux users, and almost all of them used Firefox as a browser. Obviously this data is not an accurate reflection of the world at large. It so happened that free software users are more likely to surf to sites like mine, hence the bias.

So, without further ado, let me introduce the “BOINC Host Statistics” program (BHS). Here you are a link to its home page. You can find results I have harvested so far in the Screenshots section. For example, the SETI@home credit generation rate statistics follows:

What the plot tells us is that (at the time of writing this) 500 million [[BOINC Credit System|cobblestones]] are being granted to contributors each day. Of them, around 82% are being given to Windows computers, 9-10% to Mac, 8% to GNU/Linux, and the rest to computers running other OSs.

Permalink Comments

« Previous entries Next Page » Next Page »

Material

Version 0

Version 1

Version 2

Results

Meta