A brownie point for Arsys

Let me say up front that I know nothing about Arsys, and that (for now) I have nothing to do with them. I simply wanted to share that I visited their website (fantasizing about getting a domain of my own), and saw this:

arsys_ff.png

Nothing odd? Well, notice that, like any good Internet-related service, it has a little photo of a man with a web browser open… Internet Explorer? I don’t think so…

Comments (2)

Speeding up file processing with Unix commands

In my last post I described some changes I made to a Python script that processes a file, to reduce the memory overhead of reading the whole file into RAM.

The script still needed a lot of optimizing. After reading the link that a reader (Paddy3118) was kind enough to point me to, I realized I could save time by compiling my search expressions. Basically, my script opens a gzipped file, searches for lines containing some keywords, and uses the information read from those lines. The original script took 44 seconds to process a 6.9 MB file (49 MB uncompressed). Using re.compile() on the search expressions, this time went down to 29 s. I also tried match() instead of search(), and expressions like “if pattern in line:” instead of re.search(), but these didn’t make much of a difference.
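A minimal sketch of the compiled-expression idea, not the original script: the keywords (ERROR, WARN) and the function name matching_lines are hypothetical, chosen only for illustration.

```python
import gzip
import re

# Hypothetical keywords; the real search expressions are not shown in this post.
PATTERN = re.compile(r"ERROR|WARN")


def matching_lines(path, pattern=PATTERN):
    """Yield the lines of a gzipped text file that match the compiled pattern."""
    with gzip.open(path, "rt") as f:
        for line in f:
            # The pattern was compiled once, outside the loop.
            if pattern.search(line):
                yield line
```

The point is only that the expression is compiled once and reused for every line, instead of being passed as a string to re.search() on each iteration.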

Later I thought that Unix commands such as grep were especially well suited for the task, so I gave them a try. I modified my script to run in two steps: in the first one I used zcat and awk (called from within the script) to create a much smaller temporary file with only the lines containing the information I wanted. In the second step, I processed this file with standard Python code. This hybrid approach reduced the processing time to just 12 s. Sometimes using the best tool really makes a difference, and it seems that the Unix utilities are hard to beat in terms of performance.
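The two-step approach could look roughly like the sketch below. It assumes that zcat and awk are on the PATH (as on a typical Linux system); the awk program, the file names and the function names are hypothetical, and the second step here just counts lines as a stand-in for the real processing.

```python
import shutil
import subprocess


def prefilter(gz_path, out_path, awk_program="/ERROR|WARN/"):
    """Step 1: write the lines of gz_path selected by awk_program to out_path."""
    for tool in ("zcat", "awk"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    with open(out_path, "wb") as out:
        # Equivalent to: zcat gz_path | awk 'awk_program' > out_path
        zcat = subprocess.Popen(["zcat", gz_path], stdout=subprocess.PIPE)
        awk = subprocess.Popen(["awk", awk_program], stdin=zcat.stdout, stdout=out)
        zcat.stdout.close()  # let zcat notice if awk exits early
        awk.wait()
        zcat.wait()
    if zcat.returncode != 0 or awk.returncode != 0:
        raise RuntimeError("prefilter pipeline failed")


def process(filtered_path):
    """Step 2: ordinary Python over the much smaller temporary file."""
    with open(filtered_path, "r") as f:
        return sum(1 for _ in f)
```

A call such as prefilter("big.log.gz", "filtered.txt") followed by process("filtered.txt") would then run both steps.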

It is only after programming exercises like this one that one realizes how important writing good code is (something I will probably never do, but I try). For some reason I always think of Windows, and how Microsoft refuses to make efficient programs, relying on improvements in the hardware instead. It’s as if I tried to speed up my first script using a faster computer, instead of fixing the code to be more efficient.

Comments (3)

Python: speed vs. memory tradeoff reading files

I was writing a script to process a log file, and I basically wanted to go line by line, and act upon each line if some condition was met. For the task of reading files, I generally use readlines(), so my first try was:

# condition() and do_something() stand for the real test and action
f = open(filename, 'r')
for line in f.readlines():
    if condition(line):
        do_something(line)
f.close()

However, I realized that as the size of the file read increased, the memory footprint of my script increased too, to the point of almost halting my computer when the size of the file was comparable to the available RAM (1GB).

Of course, Python hackers will frown at me, and say that I was doing something stupid… Probably so. I decided to try a different thing to reduce the memory usage, and did the following:

# condition() and do_something() stand for the real test and action
f = open(filename, 'r')
for line in f:
    if condition(line):
        do_something(line)
f.close()

Both pieces of code look very similar, but pay a bit of attention and you’ll see the difference.

The problem with “f.readlines()” is that it reads the whole file and stores its lines as the elements of an (anonymous, in this case) list. The for statement then loops through that list, which is in memory. This leads to faster execution, because the file is read once and then forgotten, but requires more memory, because a list about the size of the file has to be created in RAM.

fileread_memory

Fig. 1: Memory vs file size for both methods of reading the file

When you do “for line in f:”, you are effectively reading the lines one by one, one per cycle of the loop. Hence, the memory use is effectively constant, and very low, although the disk is accessed more often, which usually leads to slower execution of the code.

fileread_time.png

Fig. 2: Execution time vs file size for both methods of reading the file
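The memory side of the comparison can be reproduced with a sketch like the following. It uses the standard tracemalloc module, which is not necessarily how the figures above were obtained; the function names and the keyword test are hypothetical.

```python
import tracemalloc


def count_readlines(path, keyword):
    """Count matching lines after loading the whole file with readlines()."""
    with open(path, "r") as f:
        return sum(1 for line in f.readlines() if keyword in line)


def count_iter(path, keyword):
    """Count matching lines while iterating over the file object."""
    with open(path, "r") as f:
        return sum(1 for line in f if keyword in line)


def peak_memory(func, *args):
    """Run func(*args) and return (result, peak bytes allocated by Python)."""
    tracemalloc.start()
    result = func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak
```

Calling peak_memory(count_readlines, path, keyword) and peak_memory(count_iter, path, keyword) on the same file gives the same count, with a peak that grows with the file size only in the first case.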

Comments (2)

Password cracking with John the Ripper

Following some security policy updates (not necessarily for the better) at my workplace, a colleague and I discussed how vulnerable the user passwords on our computers are. He asserted that an attack with a cracking program such as John the Ripper could break into someone’s account, provided the cracker had access to an initial user account.

I am by no means an expert on cryptography and computer security, but I would like to outline some ideas about the subject here, and explain why my colleague was partially wrong.

How authentication works

When we log in to an account in a computer, we enter a password. The computer checks it, and if it is the correct one, we are granted access. For the computer to check the password, we must have told it beforehand what the correct password is. Now, if the computer knows our password, anyone with access to the place where it is stored could retrieve our password.

We can avoid that by not telling the computer our password, but only an encrypted version of it (strictly speaking, a one-way hash). The encrypted version can be obtained from the password, but there is no practical operation to obtain the password from its encrypted form. When the computer asks for a password, it applies the encrypting algorithm to what we type, and compares the result with the stored encrypted form. If they are equal, it infers that the password was correct, since in practice only the correct password yields that encrypted form.

On the other hand, no-one can possibly obtain the original password, even by inspection of the contents of the host computer, because only the encrypted form is available there.
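The store-and-compare scheme described above can be sketched as follows. PBKDF2 with a random salt is used here only as a stand-in one-way function (the salt is an addition not discussed in the text); the algorithm a given Linux system actually uses may differ, and the function names are hypothetical.

```python
import hashlib
import hmac
import os


def make_record(password, iterations=100_000):
    """Return what the computer stores: salt, iteration count and digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def check_password(candidate, record):
    """Apply the same one-way function to the candidate and compare digests."""
    salt, iterations, digest = record
    attempt = hashlib.pbkdf2_hmac("sha256", candidate.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(attempt, digest)
```

Note that the plain password is never stored: the record alone is enough to check a login attempt.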

How password cracking works

I will only deal with brute force attacks, i.e., trying many passwords, until the correct one is found.

Despite the “romantic” idea that a cracker will try to log in to an account again and again until she gets access, this method is really lame, since such repeated login attempts can be detected and blocked.

The ideal approach is to somehow obtain the encrypted password that the computer stores, and then try (in the cracker’s computer) to obtain the plain password from it. To do so, the cracker will make a guess, encrypt it with the known encrypting method, and compare the result with the encrypted key, repeating the process until a match is found. This task is the one performed by tools such as John the Ripper.
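The guess, encrypt and compare loop can be sketched as below. This is an illustration only, not how John the Ripper is implemented: a salted SHA-256 stands in for the real encrypting method, and the names and guess list are hypothetical.

```python
import hashlib


def hash_password(password, salt):
    """Stand-in for the known encrypting method (salted SHA-256, hex-encoded)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def crack(stored_hash, salt, guesses):
    """Return the first guess whose hash matches stored_hash, or None."""
    for guess in guesses:
        if hash_password(guess, salt) == stored_hash:
            return guess
    return None
```

Everything happens on the cracker’s own computer, so the target machine never sees a failed login attempt.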

Why this shouldn’t work in a safe (Linux) system

The security of a password relies heavily on how difficult it is for the cracker to guess it. If our password is the same as our user name, this will be the cracker’s first guess, and she’ll find it immediately. If our password is a word that appears in a dictionary, she’ll find it quickly. If it is a string of 12 unrelated characters, including digits, dots or exclamation marks, then it will take ages for the cracking program to reach the point where it guesses it.
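To put a number on “ages”, here is a small back-of-the-envelope calculation. The alphabet of 95 printable ASCII characters and the rate of one billion guesses per second are assumptions chosen for illustration, not measurements.

```python
def keyspace(alphabet_size, length):
    """Number of distinct passwords of the given length."""
    return alphabet_size ** length


def years_to_exhaust(alphabet_size, length, guesses_per_second):
    """Years needed to try every password at a constant guessing rate."""
    seconds = keyspace(alphabet_size, length) / guesses_per_second
    return seconds / (365 * 24 * 3600)
```

With these assumptions, keyspace(95, 12) is about 5.4 × 10^23, and years_to_exhaust(95, 12, 1e9) comes out at roughly 17 million years. By contrast, keyspace(26, 8), i.e. eight lowercase letters, is about 2.1 × 10^11, which the same rate would exhaust in a few minutes.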

The second hurdle for the cracker is that, even if she gets access to a regular user account, the file where the encrypted passwords are stored is only readable by the root (administrator) user (in a Linux system). Information about users and their passwords is stored in /etc/passwd (that any user can read) and /etc/shadow (that only root can read). The encrypted password is stored only in the latter. In the past all info was in /etc/passwd, but later on it was split, to increase the security.
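A quick way to see the split from a regular account is to ask whether each file is readable by the current user. The sketch below uses os.access; the function names are hypothetical, and the result depends on the system and on who runs it.

```python
import os


def can_read(path):
    """Return True if the current user may open path for reading."""
    return os.access(path, os.R_OK)


def report(paths=("/etc/passwd", "/etc/shadow")):
    """Map each path to whether the current user can read it."""
    return {path: can_read(path) for path in paths}
```

Run as a regular user on a typical Linux system, one would expect report() to say that /etc/passwd is readable and /etc/shadow is not.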

In short: you need root access to start trying to crack passwords in a machine… but, if you have root access, why bother? You already have full access to all accounts!

Comments

Filelight makes my day

First of all: yes, this could have been done with du. Filelight is just more visual.

The thing is that yesterday I noticed that my root partition was a bit on the crowded side (90+%). I thought it could be because of /var/cache/apt/archives/, where the downloaded .deb files are kept, and started purging some unneeded installed packages (very few… I only install what I need). However, I decided to double check, and Filelight gave me the clue:

Filelight_root

Some utter disaster in a printing job filled the /var/spool/cups/tmp/ with 1.5GB of crap! After deleting it, my root partition is back to 69% full, which is normal (I partitioned my disk with 3 roots of 7.5GB (for three simultaneous OS installations, if need be), a /home of 55GB, and a secondary disk of 250GB).
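For those without Filelight, the same hunt can be sketched in a few lines of Python. This is only a rough stand-in for du: it sums apparent file sizes rather than disk blocks, and the function names are hypothetical.

```python
import os


def dir_size(path):
    """Total size in bytes of the regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is not accessible
    return total


def largest_subdirs(path, top=5):
    """Return up to `top` (size, path) pairs for the biggest direct subdirectories."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((dir_size(entry.path), entry.path))
    return sorted(sizes, reverse=True)[:top]
```

Something like largest_subdirs("/var") would have pointed at the spool directory, provided the user running it can read the directories involved.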

Simple problem, simple solution.

Comments

App of the week: digiKam

As digital cameras get more and more common, and personal photo collections grow bigger, solutions for managing all these images are more and more needed.

I bought my first digital camera (a Nikon CoolPix 2500) almost 4 years ago (now I see the model was 1 year old when I bought my unit), and now I own a Panasonic Lumix DMC FX10 that I’m very happy with. I obviously have the need outlined above, plus the desire to sometimes share some pictures over the web. I didn’t want to go for something like Picasa, so I wrote a lengthy Perl/Tk script to generate HTML albums from some information I would enter.

When I later discovered digiKam, I realized it had all the features I wanted. It is incredibly useful to tag your pictures, so that you can later on retrieve, say, “all the pictures in which my father appears”. It also has many other features, like easy access to image manipulation (of which I only use the rotation for photos requiring it), or ordering of the pictures by date, so you can see how many pictures were taken each month. The humble, but for me killer, feature is that you can automatically generate HTML albums from a list of pictures, which can be selected e.g. by their tags.

Give it a try, and you’ll love it.

Comments

IMAP access to Gmail with KMail

I recently discovered that Gmail offers IMAP access to the service. I must admit that I have never used IMAP, but it is a very good idea for simplifying the access to one’s account from anywhere, and having your e-mail always up to date in any number of computers. You can think of IMAP as all the good things of POP3 (custom UI, great flexibility) and web-mail (central repository of messages) together, without their drawbacks.

Although I think Google is an evil company that wants to take over the world, I have surrendered to their superb e-mail service, Gmail, with its huge inbox and fast and reliable access. I was happy with POP3, so imagine with IMAP…

Of course, I have had to configure my e-mail client, KMail, to use IMAP. For that, I have followed the instructions, e.g., in linux.wordpress.org.

First, you have to allow IMAP connection to Gmail. For that, you just need to go to Settings in your Gmail account, then Forwarding and POP/IMAP, and Enable IMAP (I think it’s on by default).

Second, create an IMAP account in KMail: Settings -> Configure KMail -> Accounts -> Add -> IMAP. You will be prompted for some info:

  • Account name: anything to let you identify it.
  • Login: your full Gmail address.
  • Host: imap.gmail.com
  • Port: 993

Small trick: the default Trash folder is “Local Folders/trash”. If you keep this, when you “delete” a message from the IMAP account, it will be moved to the “General” KMail trash. The problem is that it means moving the message outside the IMAP tree, and I have found that the IMAP mechanism (probably as a security measure) keeps a copy of the message in the original location (i.e., it is actually not erased). To avoid that, you can put something like “Gmail IMAP/[Gmail]/Trash” as Trash folder, and make the deleted message be moved to the Trash inside the IMAP folder. There, it is deleted exactly as if you access your Gmail account from the web and click on “Delete”.

Third, in the Security tab of the dialog window we have just filled in, choose “Use SSL for secure mail download” in Encryption and “Clear Text” in Authentication method.

That’s it, you’re done!
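The same settings (host imap.gmail.com, port 993, SSL, plain login) can be tried outside KMail with Python’s standard imaplib module. This is a minimal sketch: the function name is hypothetical, and it assumes that the account accepts a password login over IMAP, which depends on the account’s settings.

```python
import imaplib

IMAP_HOST = "imap.gmail.com"
IMAP_PORT = 993  # IMAP over SSL, as in the KMail settings above


def count_inbox(user, password, host=IMAP_HOST, port=IMAP_PORT):
    """Log in over SSL and return the number of messages in INBOX."""
    with imaplib.IMAP4_SSL(host, port) as conn:
        conn.login(user, password)
        # readonly=True: look at the mailbox without changing any flags
        status, data = conn.select("INBOX", readonly=True)
        if status != "OK":
            raise RuntimeError("could not select INBOX")
        return int(data[0])
```

Here user would be the full Gmail address, exactly as in the Login field above.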

So far I have only used IMAP at home (lousy 300 kb connection), and I think it is a bit on the slow side of the scale, but except for that, I am starting to love IMAP.

Comments (1)

A sudden, random reflection

There are very few problems that computers cannot solve. And almost none that they cannot create.

Comments (1)

Xau on Euskadi Gaztea

Euskadi Gaztea’s 17th demo contest is about to end and, as can be seen on the front page, Xau has sneaked a song into fifth place (at the time of writing, at least).

I had been wanting to post about Xau for a long time, since the band’s drummer, Julen, is a friend of mine. But Julen has not sent me the “press clipping” I asked him for (come on, man, take the hint!), so I am going to write this on my own.

You can find Xau’s music, which they define as WOFL style, on Jamendo: since the band supports free culture, they have released their songs under CC licenses. Apart from that, they have a MySpace page, and an official website too.

I don’t want to say whether their music is good or bad: I like it, but since I am Julen’s friend, perhaps my opinion is not objective. In any case, they have a fresh and cheerful sound, and besides, you don’t have to take my word for it: go to Jamendo and listen to them there!

Comments

Igalia on Telecinco

This morning, La Mirada Crítica on Telecinco covered work-life balance and “teleworking” (working from home).

As an example they mentioned Igalia, and interviewed a couple of its employees on site. Why do I bring this up? Because Igalia is a company devoted to free software (a fact the interviewees mentioned twice in the short interview), and because T5 said that Igalia has a turnover of one million euros a year (that is, it is doing well).

The description of the perks (flexible hours, childcare allowances, etc.) that Igalia gives its employees reminded me, allowing for the differences, of Google, which is once again the best US company to work for, according to Fortune.

The presenter closed by saying: “[…] of course, not every company works in a sector where this can be done”. He was referring to IT, obviously, but it extends, specifically, to free software. Work with free software: life is better!

Comments
