Archive for Free software and related beasts

LWD – June 2009

This is a continuation post for my Linux World Domination project, started in this May 2008 post. You can read the previous post in the series here.

In the following data, T2D means “time to domination” (the expected time for the Windows and Linux shares to cross, counting from the present date). DT2D means the difference (increase/decrease) in T2D with respect to the last report. CLP means “current Linux percent”, as given by the last logged data, and DD means domination day (in YYYY-MM-DD format).

For the first time, data for [[PrimeGrid]] is included.

Project          T2D         DT2D          DD           CLP             Confidence %
Einstein         4.5 months  +3.5 months   2009-10-14   44.51 (+2.42)   6.4
MalariaControl   >10 years   -             -            12.64 (+0.09)   -
POEM             >10 years   -             -            10.66 (+0.19)   -
PrimeGrid        75 months   -             2015-07-22    9.61           1.3
Rosetta          >10 years   -             -             8.37 (+0.28)   -
QMC              >10 years   -             -             7.92 (+0.05)   -
SETI             >10 years   -             -             8.00 (+0.06)   -
Spinhenge        >10 years   -             -             3.87 (+0.28)   -

Mmm, the numbers seem quite discouraging, but the data is what it is. On the bright side, all CLPs have gone up, some by almost 0.3% in 3 months. The Linux tide seems unstoppable; its forward speed, however, is not necessarily high.

As promised, today I’m showing the plots for PrimeGrid; in the next issue, QMC@home.

[Plot: number of hosts percent evolution for PrimeGrid]

[Plot: accumulated credit percent evolution for PrimeGrid]


Jamendo voted best music-related web in CNET contest

Remember the site I get all my free music from? Yes, Jamendo, a site for artists to share their music with their fans under [[Creative Commons licenses]].

Well, apparently (and, of course, with my vote), they won [[CNET Networks|CNET’s]] Webware 2009 competition in the “Music” category. They also made it into the “Top 100” web sites (which is quite a feat). You can read about it on the Jamendo blog and the CNET site.


Membership test: array versus dictionary

I guess this post is not going to reveal anything new: testing for an item’s membership in an array is slow, and dictionaries are much more CPU-efficient for that (albeit more RAM-hungry). I’m just restating the obvious here, plus showing some benchmarks.

Intro

Let’s define our problem first. We simply want to check whether some item (a string, number or whatever) is contained within some collection of items. For that, the simplest construct in [[Python (programming language)|Python]] would be:

if item in collection:
  do_something()  # or whatever needs doing

The above construct works regardless of whether “collection” is an array or a dictionary. However, the search for “item” in “collection” works differently internally. In the case of a list, Python checks its elements one by one, comparing each of them to “item”. If a match is found, True is returned and the search is aborted. For items that are not in the list, or that appear very late in it, this search takes a long time.

In the case of dictionaries, however, the search is almost a one-step procedure: a Python dictionary is a [[hash table]], so the hash of “item” leads (almost) directly to the slot where it would live. If collection[item] returns something other than an error, then item is in collection.
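Conceptually, the dictionary membership test amounts to something like this sketch (just the idea, not CPython’s actual implementation; dict_contains is a made-up name):

#!/usr/bin/python

# Sketch of what "item in dictionary" boils down to: hash the item and
# probe its slot, instead of scanning every element.
def dict_contains(collection, item):
  try:
    collection[item]  # one hash computation plus (almost) one probe
    return True
  except KeyError:
    return False

print dict_contains({'a': 1, 'b': 2}, 'c')  # prints False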

The tests

I’ve run two different test scripts, one for the array case and another for the dictionary case. In both cases I searched for items that were not in the collection, to maximize the search effort. The array script was as follows:

#!/usr/bin/python

import sys

nitems = int(sys.argv[1])

# Initialization: two lists with no common elements
foo = []
bar = []

for i in range(nitems):
  foo.append(1)
  bar.append(2)

# Check loop: every test fails, so each one scans all of bar
for i in foo:
  if i in bar:
    pass

Similarly, for dictionaries:

#!/usr/bin/python

import sys

nitems = int(sys.argv[1])

# Initialization: two dictionaries with no common keys
# (the keys of bar are offset by nitems)
foo = {}
bar = {}

for i in range(nitems):
  j = i + nitems
  foo[i] = True
  bar[j] = True

# Check loop: every test fails, but each one is a single hash lookup
for i in foo:
  if i in bar:
    pass

Both scripts require an integer argument, with which they build two collections of that size (initialization), and then run the check loop. The loop looks up every item of the first collection in the second one, and every single check fails, because the two collections share no items.

Timing

The scripts were timed simply by measuring the execution [[wall clock time|walltime]] with the GNU time command (the %e format prints elapsed wall-clock seconds), as follows:

% /usr/bin/time -f %e script nitems

Bear in mind that the computer was not otherwise idle during the tests: I was surfing the web with Firefox and listening to music with Amarok. Both programs are CPU- and (especially) memory-hungry, so take my results with a grain of salt. In any case, my intention was not to get solid numbers, just solid trends.
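By the way, the sweep over collection sizes can be automated from Python itself; a minimal sketch along these lines would work (the script file names are made up, and the timings include interpreter start-up, just like GNU time does):

#!/usr/bin/python

# Hypothetical driver: time both test scripts over a range of sizes
import subprocess
import time

for script in ('array_test.py', 'dict_test.py'):  # assumed file names
  for k in range(10, 17):
    nitems = 2**k
    t0 = time.time()
    subprocess.call(['python', script, str(nitems)])
    print script, nitems, time.time() - t0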

Memory profiling

I must confess my lack of knowledge about memory management in software, and how to profile it. I just used the [[Valgrind]] utility with its massif tool, as follows:

% valgrind --tool=massif script nitems

Massif creates a log file (massif.out.pid) containing “snapshots” of the process at different moments, each with a timestamp (by default, the number of instructions executed so far). The logged info that interests us is the [[dynamic memory allocation|heap]] size of the process, which, as far as I know (in my limited knowledge), corresponds to the RAM allotted to the process. A Python script can digest the log file into a format suitable for plotting heap size vs. execution time (instructions, really):

#!/usr/bin/python

import sys

try:
  fn = sys.argv[1]
except IndexError:
  sys.exit('Usage: %s massif.out.pid' % sys.argv[0])

b2m = 1024*1024  # bytes per MB
e2m = 1000000    # instructions per million instructions

f = open(fn, 'r')

for line in f:
  if 'time=' in line:
    # Snapshot timestamp: instructions executed so far, in millions
    t = float(line.split('=')[1]) / e2m

  elif 'mem_heap_B' in line:
    # Heap size of this snapshot, in MB
    m = float(line.split('=')[1]) / b2m
    print t, m

f.close()

The above prints heap size in MB versus millions of instructions executed.

A much more concise form with [[AWK|awk]]:

% awk -F= '/time=/{t=$2/1000000};/mem_heap_B/{print t, $2/1048576}' massif.out.pid
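To turn either two-column output into a plot like the ones below, a minimal [[matplotlib]] sketch suffices (assuming the output was redirected to a file; heap.dat is a made-up name):

#!/usr/bin/python

# Plot heap size (MB) vs. millions of instructions, as digested above
import pylab

t, m = [], []
for line in open('heap.dat'):
  cols = line.split()
  t.append(float(cols[0]))
  m.append(float(cols[1]))

pylab.plot(t, m)
pylab.xlabel('Million instructions executed')
pylab.ylabel('Heap size (MB)')
pylab.show()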

Results

The execution times were so different, and the collection size (nitems) range so wide, that I have used a [[logarithmic scale]] for both axes in the time vs. collection size plot below:

[Plot: execution time vs. collection size, log-log scale]

At 64k items, the dictionary search is already 3 orders of magnitude faster, and the difference grows fast as the collection size increases.

With respect to memory use, we can see that in both cases increasing nitems increases the heap size, but in the case of the arrays the increase is not so pronounced. Looking at the X axes of the two plots below, you can also see that the number of instructions executed during the run grows linearly with the number of items in the collection (recall that the array plot has a logarithmic X axis).

[Plot: heap size evolution for the array case]
[Plot: heap size evolution for the dictionary case]

Finally, I compare the memory usage of the array and dictionary cases in the same plot below, for 64k items in the collection:

[Plot: heap size evolution, array vs. dictionary, 64k items]

It wasn’t really an easy task: I had to combine the biggest array case I could handle with the smallest dictionary case whose timing would be meaningful (smaller dictionaries would be equally “immediate”, according to time). Also notice that the X axis has a log scale; otherwise the number of instructions in the array case would cross the right border of your monitor.


My music collection surpasses 10000 songs

Following the “report” series started with my first summary of info about the music collection I listen to, this post updates that info.

The data (in parentheses, the difference with respect to the last report, 8 months ago):

Files

Total files        10039 (+527)
  - Commercial     6533 (+372)
  - Jamendo        3381 (+155)
  - Other CC       71 (+0)
  - Other          54 (+0)
Total playtime     634h (+34h)
Disk usage         48GB (+3GB)
MP3 count          0 (+0)
OGG count          10039 (+527)

Last.fm

Playcount           56191 (+14657)

Most played artists Joaquín Sabina - 3233 (+522)
                    Ismael Serrano - 1820 (+1342)
                    The Beatles - 1632 (+286)
                    Extremoduro - 1611 (+917)
                    Silvio Rodríguez - 930 (+148)
                    David TMX - 891 (+38)
                    Siniestro Total - 847 (+197)
                    Bad Religion - 774 (+142)
                    Fito & Fitipaldis - 749 (+74)
                    La Polla Records - 710 (+145)
                    El Reno Renardo - 660
                    Joan Manuel Serrat - 635
                    La Fuga - 570
                    Platero y Tú - 554
                    Ska-P - 554 (+114)

Most played songs   Km. 0 (I. Serrano) - 82
                    Cuando aparezca el petróleo (E. Sánchez) - 74 (+8)
                    Salir (Extremoduro) - 68
                    Golfa (Extremoduro) - 66
                    Caperucita (I. Serrano) - 65
                    La extraña pareja (I. Serrano) - 61
                    Vértigo (I. Serrano) - 61
                    La del pirata cojo (J. Sabina) - 60 (+5)
                    Tirado en la calle (E. Sánchez) - 59 (+6)
                    Un muerto encierras (I. Serrano) - 58
                    Conductores suicidas (J. Sabina) - 57 (+6)
                    Medias Negras (J. Sabina) - 56
                    Y sin embargo (J. Sabina) - 55 (+6)
                    Tierna y dulce historia de amor (I. Serrano) - 53
                    You shook me all night long (AC/DC) - 52
                    So payaso (Extremoduro) - 52
                    Laztana (Latzen) - 50
                    Esperar (E. Sánchez) - 50
                    Pacto entre caballeros (J. Sabina) - 50 (+3)


Jon “maddog” Hall and OpenMoko at DebConf9 in Cáceres, Spain

The annual [[Debian]] developers meeting, DebConf, is being held this year in Cáceres (Spain), from July 23 to 30. Apart from just promoting the event, I am posting this to mention that the Spanish OpenMoko distributor Tuxbrain will participate and sell discounted [[Neo FreeRunner]] phones. As the masochistic yet proud owner of one such phone, I feel compelled to spread the word (and help infect other people with [[FLOSS]] virii).

You can read the announcement posted by Martin Krafft to the debconf-announce and debian-devel-announce lists. Also, David Samblas of Tuxbrain uploaded a video of maddog Hall promoting the event:


Poor Intel graphics performance in Ubuntu Jaunty Jackalope, and a fix for it

Update: read second comment

I recently upgraded to [[Ubuntu]] Jaunty Jackalope, and my desktop has felt much less responsive since. The problem seems to affect [[Intel GMA]] chips, such as the one in my computer. The reason for the poor performance is that Canonical Ltd. decided not to enable [[UXA]] acceleration in Jaunty, for stability reasons (read more at Phoronix).

The issue is discussed at the Ubuntu wiki, along with some solutions. For me, the fix involved just making [[X.Org Server|X.org]] use UXA, by including the following in the xorg.conf file, as recommended in the wiki:

Section "Device"
        Identifier    "Configured Video Device"
        # ...
        Option        "AccelMethod" "uxa"
EndSection
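The X server has to be restarted for the change to take effect. Logging out and back in is enough; alternatively, something like the following should work (assuming Jaunty’s default GDM setup, so adapt to your display manager):

% sudo /etc/init.d/gdm restart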


My Ubuntu Jaunty Jackalope upgrade plan

Well, not much of a “plan”, but bear with me.

Ever since I started using [[Debian]] and [[Ubuntu]], I have installed the OS just once per computer. All software updates, including full releases, have been done through upgrades, not re-installations. This means that I have never actually needed to download any ISO besides the first one, used when I bought the computer.

This is fine, but I have always felt the compulsion to share my bandwidth with fellow Linux users, and to relieve some load from the [[Canonical Ltd.]] servers. So for every new Ubuntu release, I have downloaded one or more Ubuntu CD ISOs (amd64, i386, desktop, alternate…) via BitTorrent and kept them uploading for some time. However, the full BT download of the ISO is itself a waste of bandwidth, and unless my eventual upload ratio is greater than 1.0, I will have been loading the servers, not relieving them.

Now, with Jaunty Jackalope, I have a way to fix this (I could have done the same with previous releases, but I didn’t). Here’s the deal: download the ISO and share it with BitTorrent, but don’t upgrade over the Internet as well; upgrade from the ISO I just downloaded! In the past I was reluctant to do this, among other things because I didn’t want to waste a physical CD on it. However, the Ubuntu upgrade instructions explain how to mount the ISO (yes, mounting ISOs is not new; I’ve done it in the past) and then upgrade from the mounted image. Once the upgrade is done, I can keep seeding the ISO with BitTorrent.
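For reference, mounting an ISO without burning it boils down to a loop mount, something like this (the ISO file name stands for whichever image was downloaded, and the mount point is arbitrary):

% sudo mkdir -p /media/jaunty
% sudo mount -o loop ubuntu-9.04-desktop-i386.iso /media/jaunty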

With this procedure I use bandwidth more efficiently (the required software is downloaded just once), and I can still share the ISO with other people. Moreover, there is another plus: the ISO is just 699 MB, whereas the upgrade manager in Ubuntu tells me the upgrade requires downloading more than 1 GB! The difference is due to the ISO containing a compressed file system, I think. I will report on the size of the file system mounted from the ISO (which should be much more than 1 GB).

Update: Well, actually the internet upgrade involves more packages. If you upgrade from the CD, you are still required to download 800 more MB to complete the upgrade, so no magic there.


Brief MoinMoin howto

I recently started looking for some system/format to dump personal stuff on. I checked my own comparison of wiki software, and chose [[MoinMoin]].

I have already installed some [[MediaWiki]] wikis for personal use, and I consider it a really nice wiki system. However, one of its strengths is also a drawback for me: the backend is a database. I want to be able to migrate the wiki painlessly, and with MediaWiki this is not possible. There is no end to the files and database dumps one has to move around, and it is never clear whether something is still missing (like the edit history or some setting). I want a single dir holding all the data required to replicate the wiki, so I can [[rsync]] just that dir to another computer and have an instant clone of the wiki elsewhere. MoinMoin provides just that (I think; I might have to change my mind once I use it more).
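If that holds, cloning the wiki should be as simple as something like the following (host name and paths invented for the example):

% rsync -av ~/MoinMoin/wikiname/ otherhost:MoinMoin/wikiname/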

So here are the steps I took to get MM up and running on my Ubuntu 8.10 PC.

Installation

Ubuntu has packages for MM, so you can just install them:

% aptitude install python-moinmoin moinmoin-common

Configuration

Create a dir to put your wiki. For example, if you want to build a wiki called wikiname:

% mkdir -p ~/MoinMoin/wikiname

We made it a subdir of a global dir “MoinMoin”, so we can create a wiki farm in the future.

Next you have to copy some files over:

% cd ~/MoinMoin/wikiname
% cp -vr /usr/share/moin/data .
% cp -vr /usr/share/moin/underlay .
% cp /usr/share/moin/config/wikiconfig.py .
% cp /usr/share/moin/server/wikiserver.py .

If you are installing a wiki farm, the contents of /usr/share/moin/config/wikifarm/ could interest you, but that is outside the scope of this post.

The next step is to edit wikiconfig.py to our liking. The following lines could be of interest:

sitename = u'Untitled Wiki'
logo_string = u'MoinMoin Logo'
page_front_page = u"MyStartingPage"
data_dir = './data/'
data_underlay_dir = './underlay/'
superuser = [u"yourusername", ]
acl_rights_before = u"yourusername:read,write,delete,revert,admin"

Using

You just need to run wikiserver.py; there is no need to have [[Apache HTTP Server|Apache]] running or anything (unlike with, e.g., MediaWiki):

% cd ~/MoinMoin/wikiname/
% python wikiserver.py &

Then open your favourite browser, go to http://localhost:8080, and you will be greeted by the starting page.


Pay for online radio? Don’t think so.

Apparently the online radio service provided by my (formerly?) beloved [[last.fm]] will no longer be free, according to a recent official blog entry. Due to marketing/commercial/licensing decisions, the service will remain free of charge only in the UK, Germany and the USA. Subscribers in the rest of the world will have to pay 3 euros per month.

In principle, I couldn’t care less about online music. I listen exclusively to my private collection, and only use last.fm to publish the list of tracks I listen to. However, I have a couple of thoughts about it.

The first one is that charging web users differently according to location should become obsolete. On the Internet, each person is just that: a person, an individual, a user. A site could ask me what my preferred language is, to interact better with me (and I could answer whatever I liked, true or false), but my nationality, religion or race should be irrelevant. So much talk about “globalization”, and yet they only invoke it when it suits them: the labor market is “globalizable”, but the Internet, apparently, is not.

My second thought is that they have been forced to charge their users because they must pay for the right to broadcast licensed music. My position? Fuck them. Yes, seriously: screw paying for broadcasting rights! I am seriously fed up with the morons in the music (and film) industry trying to control the uncontrollable. If I were last.fm, or any radio station, I would broadcast only [[Creative Commons]] music, such as that at [[Jamendo]]. If you are an artist and want me to broadcast your music, then you should pay me, not the other way around. However, if you provide me with your music for free, I might broadcast it for free, too. Quid pro quo.

I think that radio broadcasts of music, or internet sharing, or the CD market, should be completely free of charge (or, for physical formats like CDs, charge just the price of the physical medium). Musicians should see these forms of distribution as advertising. Their music should spread as widely as possible, making them as famous as possible, so that the revenue they get from doing actual work (like performing live) is maximized.

But, hey, that’s just my view. What can I do with an industry that asks me to either comply or fuck off? Well, I guess that we, the clients/users, should be the ones asking that of the industry, not the other way around. I certainly try to.


Don’t we love religion?

Religion ain’t bad. At least, it isn’t bad as long as the believer doesn’t try to impose her views on others. My faith is harmless if I keep it to myself. After all, when did, say, good old praying ever hurt anyone?

Apparently, the answer is August 6, 2005, when the pilots of Tuninter Flight 1153 decided to pray instead of following the emergency protocol in the event of fuel starvation. Sixteen people died.

Obviously it would be unfair to say that they died because of the praying. But it seems safe to assume that following the protocol, instead of dropping the controls to pray, could have made for a smoother landing, probably reducing the death toll.

