Blog Archives

Trivial use of md5sum
November 11th 2009

I just made use of the md5sum command in a rather simple situation that would have been more troublesome to handle by other means. The following scenario highlights, IMHO, how the command line greatly simplifies some tasks.

I have a file file.txt, and a collection of files file.txt.N, where N = 1, 2, 3... I know that the former is a copy of one of the latter, but I don't know which. I could have run diff on all the possible matches, but I would have had to run it for every N until a match was found. However, md5sum comes to the rescue. I can just run:

% md5sum file.txt*

And check which file.txt.N has an MD5 signature equal to that of file.txt; that one is the match. This solution is still a bit annoying, because I have to visually search for matches of a long string. Not to worry! Unix is our friend once again. Recover the above command with a single press of the "up" arrow, then extend the command a tiny bit:

% md5sum file.txt* | sort

Now, since the lines are sorted by MD5 signature, the match for our file.txt (if there is one) will appear right after the line for file.txt.
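
The same search can also be scripted. What follows is only a minimal Python sketch, assuming the candidates are named file.txt.N and live in one directory; the helper names md5_of and find_copies are mine, not part of any tool.

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def find_copies(reference, directory="."):
    """List the files named '<reference>.N' whose MD5 equals that of reference."""
    target = md5_of(Path(directory) / reference)
    candidates = sorted(Path(directory).glob(reference + ".*"))
    return [c.name for c in candidates if md5_of(c) == target]
```

Calling find_copies("file.txt") in the directory holding the files returns the names of the matching copies, or an empty list if there is none.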

I challenge the reader to accomplish the same task as readily, comfortably and successfully in Windows or Mac, or in Linux without the command line.



First impressions with Arch Linux
October 9th 2009

I have been considering for some time trying some Linux distro that would be a little faster than Ubuntu. I made the switch from Debian to Ubuntu some time ago, and I must say that I am very pleased with it, despite it being a bit bloated and slow. Ubuntu is really user-friendly. This term is often despised among geeks, but it does have a huge value. Oftentimes a distro will disguise poor dependency handling, lack of package tuning and absence of wise defaults as not having "fallen" for user-friendliness and "allowing the user to do whatever she feels like".

However comfortable Ubuntu might be, my inner geek wanted to get his hands a little bit dirtier with configurations, and obtain a more responsive OS in return. And that's where Arch Linux fits in. Arch Linux is regarded as one of the fastest Linux distros, at least among the ones based on binary packages, not source code. Is this fame deserved? Well, in my short experience, it seems to be.

First off, let us clarify what one means by a "faster" Linux distro. There are, as I see it, broadly speaking, three things that can be faster or slower in the user's interaction with a computer. The first one, and a very often cited one, is the boot (and shutdown) time. Any period of time between a user deciding to use the computer and being able to do so is wasted time (from the user's point of view). Many computers stay on for long periods of time, but for a home user, short booting times are a must. A second speed-related item would be the startup time of applications. Booting would be a sub-section of this, if we consider the OS/kernel as an "app", but I refer here to user apps such as an e-mail client or text editor. Granted, most start within seconds at most, many below one second or apparently "instantly", but some others are renowned for their sluggishness (, Firefox and Amarok come to mind). Even the not-very-slow apps that take a few seconds can become irritating if used with some frequency. The third speed-related item would be the execution of long-running CPU-intensive software, such as audio/video coding or scientific computation.

Of the three issues mentioned, it should be made clear that the third one (execution of CPU-intensive tasks) is seldom affected at all by the "speed" of the OS. Or it shouldn't be. Of course having the latest versions of the libraries used by the CPU-intensive software should make a difference, but I doubt that encoding a video with MEncoder is any faster in Gentoo than Ubuntu (for the same version of mencoder and libraries). However, the first two (booting and start up of apps) are different from OS to OS.


I did some timings in Ubuntu and Arch, both in the same (dual boot) machine. I measured the time from GRUB to GDM, and then the time from GDM to a working desktop environment (GNOME in both). The exact data might not be that meaningful, as some details could be different from one installation to the other (different choice of firewall, or (minimally) different autostarted apps in the DE). But the big numbers are significant: where Ubuntu takes slightly below 1 minute to GDM, and around half a minute to GNOME, Arch takes below 20 seconds and 10 seconds, respectively.

App start up

Of the three applications mentioned, and Firefox start faster in Arch than in Ubuntu. I wrote down the numbers, but don't have them now. Amarok, on the other hand, took equally long to start (some infamous 35 seconds) in both OSs. It is worth mentioning that all of them start up faster the second and successive times, and that the Ubuntu/Arch differences between second starts are correspondingly smaller (because both are fast). Still, Arch is a bit faster (except for Amarok).

ABS, or custom compilation

But the benefits of Arch don't end in a faster boot, or a more responsive desktop (which it has). Arch Linux makes it really easy to compile and install any custom package the user wants, and I decided to take advantage of it. With Debian/Ubuntu, you can download the source code of a package quite easily, but the compilation is more or less left to you, and the installation is different from that of an "official" package. With Arch, generating a package from the source is quite easy, and then installing it with Pacman is trivial. For more info, refer to the Arch wiki entry for ABS.

I first compiled MEncoder (inside the mplayer package), and found out that the compiled version made no difference with respect to the stock binary package. I should have known that, because I say so in this very post, don't I? However, one always thinks that he can compile a package "better", so I tried it (and failed to get any improvement).

On the other hand, when I recompiled Amarok, I did get a huge boost in speed. A simple custom compilation produced an Amarok that took only 15 seconds to start up, less than half of the vanilla binary distributed with Arch (I measured the 15 seconds even after rebooting, which rules out any "second time is faster" effect).

Is it hard to use?

Leaving the speed issue aside, one of the possible drawbacks of a geekier Linux distro is that it could be harder to use. Arch is, indeed, but not by much. A seasoned Linux user should have hardly any difficulty installing and configuring Arch. It is certainly not for beginners, but it is not super-hard either.

One of the few gripes I have with it regards the installation of a graphical environment. As it turns out, installing a DE such as GNOME does not trigger the installation of any X Window System implementation, such as the Xorg server, as dependencies are set only for really vital things. Well, that's not too bad: Arch is not assuming I want something until I tell it I do. Fine. Then, when I do install Xorg, the tools for configuring it are a bit lacking. I am so spoiled by the automagic configurations in Ubuntu, where you are presented with a full-fledged desktop with almost no decision on your side, that I miss a magic script that will make X "just work". Anyway, I can live with that. But something that made me feel like giving up was that, after following all the instructions in the really nice Arch Wiki, I was unable to start X (it would start as a black screen, then freeze, and I could only get out by rebooting the computer). The problem was that I have an Nvidia graphics card, and I needed the (proprietary) drivers. OK, of course I need them, but the default vesa driver should work as well!! In Ubuntu one can get a lower-resolution desktop without 3D effects using the default vesa driver. Then the proprietary Nvidia drivers allow for more eye-candy and fanciness. But not in Arch. When I decided to skip the test with vesa, and download the proprietary drivers, the X server started without any problem.


I am quite happy with Arch so far. Yes, one has to work around some rough edges, but it is a nice experience as well, because one learns more than with other too user-friendly distros. I think Arch Linux is a very nice distro that is worth using, and I recommend it to any Linux user willing to learn and "get hands dirty".


A wrapper for parallel implementation of LZMA compression
July 23rd 2009

Update: this script has been superseded by ChopZip


I discovered the LZMA compression algorithm some time ago, and have been thrilled by its capacity since. It has higher compression ratios than even bzip2, with a faster decompression time. However, although decompressing is fast, compressing is not: LZMA is even slower than bzip2. On the other hand, gzip remains blazing fast in comparison, while providing a decent level of compression.

More recently I have discovered the interesting pbzip2, which is a parallel implementation of bzip2. With the increasing popularity of multi-core processors (I have a quad-core at home myself), parallelizing the compression tools is a very good idea. pbzip2 performs really well, producing bzip2-compatible files with near-linear scaling with the number of CPUs.

LZMA being such a high-performance compressor, I wondered if its speed could be boosted by using it in parallel. Although the Wikipedia article states that the algorithm can be parallelized, I found no such implementation in Ubuntu 9.04, where the utility provided by the lzma package is exclusively serial. Not finding one, I set out to produce it.


Any compression can be parallelized as follows:

  1. Split the original file into as many pieces as CPU cores available
  2. Compress (simultaneously) all the pieces
  3. Create a single file by joining all the compressed pieces, and call the result "the compressed file"

In a Linux environment, these three tasks can be carried out easily by split, lzma itself, and tar, respectively. I just made a Python script to automate these tasks, called it plzma, and put it on my web site for anyone to download (it's GPLed). Please notice that plzma has been superseded by chopzip, starting with revision 12, whereas the latest plzma is revision 6.

I must remark that, while pbzip2 generates bzip2-compatible compressed files, that is not the case with plzma. The products of plzma compression must be decompressed with plzma as well. The actual format of a plzma file is just a TAR file containing as many LZMA-compressed chunks as CPUs used for compression. These chunks, once decompressed individually, can be concatenated (with the cat command) to form the original file.
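
The split/compress/join scheme and the resulting "tar of compressed chunks" format can be sketched in a few lines of Python. This is not the plzma script itself, only an illustration under several assumptions: it works in memory with the standard lzma and tarfile modules instead of calling split, lzma and tar; it writes .xz streams rather than legacy .lzma ones; it uses threads for brevity; and the function and chunk names are invented.

```python
import io
import lzma
import tarfile
from concurrent.futures import ThreadPoolExecutor


def split_bytes(data, pieces):
    """Cut data into at most `pieces` contiguous chunks of (almost) equal size."""
    size = -(-len(data) // pieces) or 1  # ceiling division, at least 1 byte
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def compress_chunks(data, pieces=4, level=3):
    """Compress every chunk concurrently and pack the results into a tar archive."""
    chunks = split_bytes(data, pieces)
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        packed = list(pool.map(lambda c: lzma.compress(c, preset=level), chunks))
    archive = io.BytesIO()
    with tarfile.open(fileobj=archive, mode="w") as tar:
        for number, blob in enumerate(packed, start=1):
            info = tarfile.TarInfo(name="chunk.%02d.xz" % number)
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))
    return archive.getvalue()


def decompress_chunks(archive_bytes):
    """Unpack the tar archive, decompress each chunk and concatenate them in order."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        members = sorted(tar.getmembers(), key=lambda m: m.name)
        return b"".join(lzma.decompress(tar.extractfile(m).read()) for m in members)
```

A round trip such as decompress_chunks(compress_chunks(data)) should give back the original bytes. As with plzma, the archive is not a valid single compressed stream, so it has to be unpacked chunk by chunk.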


What review of compression tools lacks benchmarks? No matter how inaccurate or silly, none of them do. And neither does mine :^)

I used three (single) files as reference:

  • molekel.tar - a 108 MB tar file of the (GPL) Molekel 5.0 source code
  • usr.bin.tar - a 309 MB tar file of the contents of my /usr/bin/ dir
  • hackable.tar - a 782 MB tar file of the hackable:1 Debian-based distro for the Neo FreeRunner

The second case is intended as an example of binary file compression, whereas the other two are more of a "real-life" example. I didn't test text-only files... I might in the future, but don't expect the conclusions to change much. The testbed was my Frink desktop PC (Intel Q8200 quad-core).

The options for each tool were:

  • gzip/bzip2/pbzip2: compression level 6
  • lzma/plzma: compression level 3
  • pbzip2/plzma: 4 CPUs

Compressed size

The most important feature of a compressor is the size of the resulting file. After all, we use it in the first place to save space. No matter how fast an algorithm is, if the resulting file is bigger than the original file I wouldn't use it. Would you?

The graph below shows the compressed size ratio for compression of the three test files with each of the five tools considered. The compressed size ratio is defined as the compressed size divided by the original size for each file.

This test holds no big surprises: gzip is the least effective and LZMA the most effective. The point to make here is that the parallel implementations perform as well or as badly as their serial counterparts.

If you are unimpressed by the supposedly higher performance of bzip2 and LZMA over gzip, when in the picture all final sizes do not look very different, recall that gzip compressed molekel.tar ~ 3 times (to a 0.329 ratio), whereas LZMA compressed it ~ 4.3 times (to a 0.233 ratio). You could stuff 13 LZMAed files where only 9 gzipped ones fit (and just 3 uncompressed ones).
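
The arithmetic behind those figures can be checked quickly. The two ratios are the ones quoted above; the function names are just labels for this sketch.

```python
def compression_factor(ratio):
    """How many times smaller the compressed file is (1 / size ratio)."""
    return 1.0 / ratio


def files_that_fit(capacity_in_originals, ratio):
    """Compressed files that fit where `capacity_in_originals` uncompressed ones do."""
    return capacity_in_originals / ratio


for tool, ratio in (("gzip", 0.329), ("lzma", 0.233)):
    print(tool, round(compression_factor(ratio), 2), round(files_that_fit(3, ratio), 1))
# prints:
# gzip 3.04 9.1
# lzma 4.29 12.9
```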

Compression time

However important the compressed size is, compression time also matters. Actually, that's the very issue I try to address by parallelizing LZMA: to make it faster while keeping its high compression ratio.

The graph below shows the normalized times for compression of the three test files with each of the five tools considered. The normalized time is taken as the total time divided by the time it took gzip to finish (an arbitrary scale with t(gzip)=1.0).

Roughly speaking, we could say that in my setting pbzip2 makes bzip2 as fast as gzip, and plzma makes LZMA as fast as serial bzip2.

The speedups for bzip2/pbzip2 and LZMA/plzma are given in the following table:

File           pbzip2   plzma
molekel.tar    4.00     2.72
usr.bin.tar    3.61     3.38
hackable.tar   3.80     3.04

The performance of plzma is nowhere near that of pbzip2, but I'd call it acceptable (wouldn't I? I'm the author!). There are two reasons I can think of to explain the lower-than-linear scalability. The first one is the overhead imposed when cutting the file into pieces and then assembling them back. The second one, maybe more important, is the disk performance. Maybe each core can compress each chunk independently, but the disk I/O for reading the chunks and writing them back compressed is done simultaneously on the same disk, which the four processes share.

Update: I think that a good deal of under-linearity comes from the fact that files of equal size will not be compressed in an equal time. Each chunk compression will take a slightly different time to complete, because some will be easier than others to compress. The program waits for the last compression to finish, so it's as slow as the slowest one. It is also true that pieces of 1/N size might take more than 1/N time to complete, so the more chunks, the slower the compression in total (the opposite could also be true, though).
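
A toy model of the "as slow as the slowest chunk" effect, ignoring disk I/O and the split/join overhead: the serial time is taken as the sum of the chunk times, and the parallel time as the longest one. The chunk times below are made up for illustration; they are not measurements.

```python
def ideal_speedup(chunk_times):
    """Speedup when all chunks run at once and we wait for the slowest one."""
    return sum(chunk_times) / max(chunk_times)


print(ideal_speedup([10.0, 10.0, 10.0, 10.0]))  # 4.0
print(ideal_speedup([8.0, 9.0, 10.0, 13.0]))    # about 3.08
```

Even a modest imbalance between four chunks is enough to drop the best possible speedup from 4 to about 3 in this model.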

Decompression times

Usually we pay less attention to decompression time, because decompressing is much faster (and because we often compress things never to open them again, in which case we had better have deleted them in the first place... but I digress).

The following graph shows the decompression data equivalent to the compression times graph above.

The most noteworthy point is that pbzip2 decompresses pbzip2-compressed files faster than bzip2 does with bzip2-compressed files. That is, both compression and decompression benefit from the parallelization. However, for plzma that is not the case: decompression is slower than with the serial LZMA. This is due to two effects: first, the decompression part is still not parallelized in my script (it will soon be). This would lead to decompression speeds near to the serial LZMA. However, it is slower due to the second effect: the overhead caused by splitting and then joining.

Another result worth noting is that, although LZMA is much slower than even bzip2 to compress, the decompression is actually faster. This is not random. LZMA was designed with fast uncompression time in mind, so that it could be used in, e.g. software distribution, where a single person compresses the original data (however painstakingly), then the users can download the result (the smaller, the faster), and uncompress it to use it.


While there is room for improvement, plzma seems like a viable option to speed up general compression tasks where a high compression ratio (LZMA level) is desired.

I would like to stress the point that plzma files cannot be decompressed with just LZMA. If you don't use plzma to decompress, you can follow these steps:

% tar -xf file.plz
% lzma -d file.0[1-4].lz
% cat file.0[1-4] > file
% rm file.0[1-4] file.plz


Accessing Linux ext2/ext3 partitions from MS Windows
July 2nd 2009

Accessing both Windows FAT and NTFS file systems from Linux is quite easy, with tools like NTFS-3G. However (following with the MS tradition of making itself incompatible with everything else, to thwart competition), doing the opposite (accessing Linux file systems from Windows) is more complicated. One would have to guess why (and how!) closed and proprietary and technically inferior file systems can be read by free software tools, whereas proprietary software with such a big corporation behind is incapable (or unwilling) to interact with superior and free software file systems. Why should Windows users be deprived of the choice over JFS, XFS or ReiserFS, when they are free? MS techs are too dumb to implement them? Or too evil to give their users the choice? Or, maybe, too scared that if choice is possible, their users will dump NTFS? Neither explanation makes one feel much love for MS, does it?

This stupid inability of Windows to read any of the many formats Linux can use gives rise to problems for not only Windows users, but also Linux users. For example, when I format my external hard disks or pendrives, I end up wondering if I should reserve some space for a FAT partition, so I could put there data to share with hypothetical Windows users I could lend the disk to. And, seriously, I abhor wasting my hardware with such lousy file systems, when I could use Linux ones.

Anyway, there are some third-party tools to help us with such a task. I found at least two:

  • Ext2IFS
  • ext2fsd

I have used the first one, but as some blogs point out (e.g. BloggUccio), ext2fsd is required if the inode size is bigger than 128 B (256 B in some modern Linux distros).

Getting Ext2IFS

It is a simple exe file you can download. Installing it consists of the typical Windows next-next-finish click-dance. In principle the defaults are OK. It will ask you about activating "read-only" (which I declined; it's less safe, but I would like to be able to write too), and something about large file support (which I accepted, because it's only an issue with Linux kernels older than 2.2... Middle Age stuff).

Formatting the hard drive

In principle, Ext2IFS can read ext2/ext3 partitions with no problem. In practice, if the partition was created with an inode size of more than 128 bytes, Ext2IFS won't read it. To create a "compatible" partition, you can mkfs it with the -I flag, as follows:

# mkfs.ext3 -I 128 /dev/whatever

I found out about the 128 B inode thing from this forum thread [es].
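
To check the inode size of an existing partition, one can read it from the output of tune2fs -l (part of e2fsprogs). A small sketch that parses that text follows; the function name and the sample text are mine, and running tune2fs itself normally requires root.

```python
import re


def inode_size(tune2fs_output):
    """Extract the 'Inode size' value (in bytes) from `tune2fs -l` output, or None."""
    match = re.search(r"^Inode size:\s*(\d+)\s*$", tune2fs_output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Abridged, hand-written sample of what the relevant lines look like.
sample = "Inode count:              122880\nInode size:               256\n"
print(inode_size(sample))  # 256
```

The text to parse could come, for example, from subprocess.run(["tune2fs", "-l", "/dev/whatever"], capture_output=True, text=True).stdout. A result above 128 means Ext2IFS will not read the partition.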

Practical use

What I have done, and tested, is what follows: I format my external drives with almost all of it as ext3, as described, leaving a couple of gigabytes (you could cut down to a couple of megabytes if you really want to) for a FAT partition. Then copy the Ext2IFS_1_11a.exe executable to that partition.

Whenever you want to use that drive, Linux will see two partitions (the ext3 and the FAT one), the second one of which you can ignore. From Windows, you will see only a 2 GB FAT partition. However, you will be able to open it, find the exe, double-click, and install Ext2IFS. After that, you can unplug the drive and plug it back in and, voilà, you will see the ext3 partition just fine.



Ubuntu error: the installer needs to remove operating system files
June 18th 2009

I started installing Ubuntu Netbook Remix 9.04 in my ASUS Eee PC, and after the partitioning step, I stumbled upon the following error:

The installer needs to remove operating system files from the install target, but was unable to do so. The install cannot continue

I was installing Ubuntu on top of a previous eeebuntu install, smashing the / partition, while reusing the /home. After minimal googling, I found this bug report at Launchpad, with the same problem (and one year old).

As it turns out, the problem was not with the root partition, as I assumed from the error message, but with the home one. Apparently, Ubuntu didn't like the idea that my home partition was JFS (maybe it couldn't mount it, because jfs_utils are not loaded by default). The solution: install the OS ignoring (not using) the home partition, and mount it afterwards.

Shame on you, Ubuntu, this solution is lame!



Changing font style in PyGTK ComboBox
June 10th 2009

I am using the Glade Interface Designer to produce (very) small (and simple) graphical apps for my Neo FreeRunner. I produce the graphical layout in the form of an XML file (using Glade), then load this XML from a PyGTK program.

The thing is, some defaults are not really usable for a device such as the NFR. For example, default fonts are in general too small for the tiny screen of the Neo, which favors apps with only a few big and shiny buttons. In the case of Label widgets, you can use Pango markup format with the set_markup method, as follows:

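# self.wTree: assumed name for the gtk.glade.XML object that loads the Glade file
mylabel = self.wTree.get_widget('label1')
txt = '<span font_size="80000" color="red">%s</span>' % (text_string)
mylabel.set_markup(txt)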

However, for other widgets it is not so evident. For example, in ComboBoxes (buttons with a drop-down list), you can't put in the item list anything other than strings, which are displayed literally (markup is not interpreted). Moreover, CBs do not have a "set_font_style" method, or anything similar.

Searching the web did not provide immediate results, but I managed to find this FAQ item, which I quote:

4.1.581 How do I change font properties on gtk.Labels and other widgets?

 label = gtk.Label("MyLabel")
 label.modify_font(pango.FontDescription("sans 48"))

This method applies to all widgets that use text, so you can change the text of gtk.Entry and other widgets in the same manner.

Note that, some widgets are only containers for others, like gtk.Button. For those you'd have to get the child widget. For a gtk.Button do this:

  if button.get_use_stock():
     label = button.child.get_children()[1]
  elif isinstance(button.child, gtk.Label):
     label = button.child
  else:
     raise ValueError("button does not have a label")

Last changed on Thu Sep 1 14:46:30 2005 by Johan Dahlin (johan-at-gnome-org)

In the case of a CB, we have to pick its child (which is the list itself), and modify it thusly:

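# self.wTree: assumed name for the gtk.glade.XML object that loads the Glade file
cbox = self.wTree.get_widget("CBlist")
cblist = cbox.child
cblist.modify_font(pango.FontDescription("sans 32"))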

In my examples above, a class has been created in the script beforehand, and it binds to the Glade XML:

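class whatever:

  def __init__(self):

    # Set the Glade file ("whatever.glade" is a placeholder name)
    self.gladefile = "whatever.glade"
    # wTree is an assumed attribute name for the loaded widget tree
    self.wTree = gtk.glade.XML(self.gladefile)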

Of course, the CBlist and MyLabel mentioned in my code are the appropriate widget names defined in that XML.


Poor Intel graphics performance in Ubuntu Jaunty Jackalope, and a fix for it
April 29th 2009

Update: read second comment

I recently upgraded to Ubuntu Jaunty Jackalope, and have experienced a much slower response of my desktop since. The problem seems to be with Intel GMA chips, such as the one in my computer. The reason for the poor performance is that Canonical Ltd. decided not to include the UXA acceleration in Jaunty, for stability reasons (read more at Phoronix).

The issue is discussed at the Ubuntu wiki, along with some solutions. For me, the fix involved just making use of UXA, by including the following in the xorg.conf file, as they recommend in the wiki:

Section "Device"
        Identifier    "Configured Video Device"
        # ...
        Option        "AccelMethod" "uxa"
EndSection


My Ubuntu Jaunty Jackalope upgrade plan
April 27th 2009

Well, not much of a "plan", but bear with me.

Ever since using Debian and Ubuntu, I have installed the OS just once per computer. All software upgrades, including full releases, have been done through upgrades, not re-installations. This means that I have never actually had the need to download any ISO besides the first one used when I bought the computer.

This is fine, but I always felt the compulsion to share my bandwidth with fellow Linux users, and relieve some load from the Canonical Ltd. servers. So for every new Ubuntu release, I have downloaded one or more (amd64, i386, desktop, alternate...) Ubuntu CD ISOs via BitTorrent, and kept them uploading for some time. However, the full BT download of the ISO is a waste of bandwidth, and unless my later upload share is greater than 1.0, I will have been overloading the servers, not relieving them.

Now, with Jaunty Jackalope, I have a way to fix this. I could have done similarly with previous releases, but I didn't. Here's the deal: download the ISO and share it with BitTorrent, but don't upgrade from the Internet as well. Upgrade from the ISO I just downloaded! In the past I would be reluctant to do this, among other things because I don't want to waste a physical CD for that. However, the Ubuntu upgrade instructions say how to mount the ISO (yes, mounting ISOs is not new. I've done it in the past), then upgrade from the mounted image. Once the upgrade is done, I can keep seeding the ISO with BitTorrent.

With this procedure I can use bandwidth more efficiently (I download the required software just once), and I can still share the ISO with other people. Moreover, there is another plus: the ISO is just 699 MB, whereas the upgrade manager in Ubuntu tells me that for the upgrade I will need to download more than 1 GB! The difference is due to the ISO being somehow compressed, I think. I will report on the size of the file system mounted from the ISO (which should be much more than 1 GB).

Update: Well, actually the internet upgrade involves more packages. If you upgrade from the CD, you are still required to download 800 more MB to complete the upgrade, so no magic there.


Brief MoinMoin howto
April 19th 2009

I recently started looking for some system/format to dump personal stuff on. I checked my own comparison of wiki software, and chose MoinMoin.

I have already installed some MediaWiki wikis for personal use, and I consider it a really nice wiki system. However, one of its strengths is also a drawback for me: the backend is a database. I want to be able to migrate the wiki painlessly, and with MediaWiki this is not possible. There is no end to the files and database dumps one has to move around, and then it is never clear if there is still something missing (like edit history or some setting). I want to have a single dir with all the data required to replicate the wiki, and I want to rsync just this dir to another computer to have an instant clone of the wiki elsewhere. MoinMoin provides just that (I think, I might have to change my mind when I use it more).

So here are the steps I took to have MM up and running on my Ubuntu 8.10 PC.


Ubuntu has packages for MM, so you can just install them:

% aptitude install python-moinmoin moinmoin-common


Create a dir to put your wiki. For example, if you want to build a wiki called wikiname:

% mkdir -p ~/MoinMoin/wikiname

We made it a subdir of a global dir "MoinMoin", so we can create a wiki farm in the future.

Next you have to copy some files over:

% cd ~/MoinMoin/wikiname
% cp -vr /usr/share/moin/data .
% cp -vr /usr/share/moin/underlay .
% cp /usr/share/moin/config/ .
% cp /usr/share/moin/server/ .

If installing a wiki farm, you could be interested in the contents of /usr/share/moin/config/wikifarm/, but this is out of the scope of this post.

The next step is to edit the wiki configuration file to our liking. The following lines could be of interest:

sitename = u'Untitled Wiki'
logo_string = u'MoinMoin Logo'
page_front_page = u"MyStartingPage"
data_dir = './data/'
data_underlay_dir = './underlay/'
superuser = [u"yourusername", ]
acl_rights_before = u"yourusername:read,write,delete,revert,admin"


You just need to run the server script; there is no need to have Apache running or anything (as with, e.g., MediaWiki):

% cd ~/MoinMoin/wikiname/
% python &

Then open your favourite browser and go to http://localhost:8080, and you will be greeted by the starting page.



Temperature and fan speed control on the Asus Eee PC
March 15th 2009

I noticed that after my second eeebuntu install (see a previous post for the reason behind this reinstall), my Eee PC was a wee bit noisier. Most probably it has always been like that, but I only noticed after the reinstall.

I put some sensor output in my Xfce panel, and noticed that the CPU temperature hovered around 55 degrees C, and the fan would continuously spin at around 1200 rpm. I searched the web about it, and found out that usually fans are stopped at computer boot, then start spinning when the temperature goes up. This is logical. The small catch is that when the temperature in the Eee PC goes down, the fan does not stop automatically. This means that the fan is almost always spinning in the long run.

I searched for methods to fix that, and I read a post about it. From there I took the idea of taking over the control of the fan, and making it spin according to the current temperature. For that, I wrote the following script:



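# File locations (TEMPFILE is an assumed path; the other two are described below):
TEMPFILE=/proc/eee/temperature
FANFILE=/proc/eee/fan_speed
MANFILE=/proc/eee/fan_manual

# Get temperature:
TEMP=$(cat $TEMPFILE)

# Choose fan speed (the percentages are illustrative assumptions):
if [ $TEMP -gt 65 ]; then SPEED=100
elif [ $TEMP -gt 60 ]; then SPEED=75
elif [ $TEMP -gt 55 ]; then SPEED=50
else SPEED=0
fi

# Impose fan speed:
echo 1 > $MANFILE
echo $SPEED > $FANFILE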

The file /proc/eee/fan_manual controls whether fans are under manual (file contains a "1") or automatic (file contains a "0") control. File /proc/eee/fan_speed must contain an integer number from 0 to 100 (a percent of max fan speed).

I am running this script every minute with cron, and thus far it works OK.




  • The contents of this blog are under a Creative Commons License.
