Archive for howto

Speeding up file processing with Unix commands

In my last post I commented some changes I made to a Python script to process a file reducing the memory overhead related to reading the file directly to RAM.

I realized that the script needed much optimizing, and resorted to reading the link a reader (Paddy3118) was kind enough to point me to, I realized I could save time by compiling my search expressions. Basically my script opens a gzipped file, searches for lines containing some keywords, and uses the info read from those lines. The original script would take 44 seconds to process a 6.9 MB file (49 MB uncompressed). Using compile on the search expressions, this time went down to 29 s. I tried using match instead of search, and expressions like “if pattern in line:“, instead of re.search(), but these didn’t make much of a difference.

Later I thought that Unix commands such as grep were specially suited for the task, so I gave them a try. I modified my script to run in two steps: in the first one I used zcat and awk (called from within the script) to create a much smaller temporary file with only the lines containing the information I wanted. In a second step, I would process this file with standard Python code. This hybrid approach reduced the processing time to just 12 s. Sometimes using the best tool really makes a difference, and it seems that the Unix utilities are hard to come close to in terms of performance.

It is only after programming exercises like this one that one realizes how important writing good code is (something I will probably never do, but I try). For some reason I always think of Windows, and how Microsoft refuses to make an efficient program, relying on improvementes on the hardware instead. It’s as if I tried to speed up my first script using a faster computer, instead of fixing the code to be more efficient.

Comments (3)

Python: speed vs. memory tradeoff reading files

I was making a script to process some log file, and I basically wanted to go line by line, and act upon each line if some condition was met. For the task of reading files, I generally use readlines(), so my first try was:

f = open(filename,'r')
for line in f.readlines():
  if condition:
    do something
f.close()

However, I realized that as the size of the file read increased, the memory footprint of my script increased too, to the point of almost halting my computer when the size of the file was comparable to the available RAM (1GB).

Of course, Python hackers will frown at me, and say that I was doing something stupid… Probably so. I decided to try a different thing to reduce the memory usage, and did the following:

f = open(filename,'r')
for line in f:
  if condition:
    do something
f.close()

Both pieces of code look very similar, but pay a bit of attention and you’ll see the difference.

The problem with “f.readlines()” is that it reads the whole file and assigns lines to the elements of an (anonymous, in this case) array. Then, the for loops through the array, which is in memory. This leads to faster execution, because the file is read once and then forgotten, but requires more memory, because an array of the size of the file has to be created in the RAM.

fileread_memory

Fig. 1: Memory vs file size for both methods of reading the file

When you do “for line in f:“, you are effectively reading the lines one by one when you do each cycle of the loop. Hence, the memory use is effectively constant, and very low, albeit the disk is accessed more often, and this usually leads to slower execution of the code.

fileread_time.png

Fig. 2: Execution time vs file size for both methods of reading the file

Comments (2)

Password cracking with John the Ripper

Following some security policy updates (not necessarily for better) in my workplace, a colleague and I discussed the vulnerability of user passwords in the accounts of our computers. He assured that an attack with a cracker program such as John the Ripper could potentially break into someone’s account, if only the cracker would have access to an initial user account.

I am by no means an expert on cryptography and computer security, but I would like to outline some ideas about the subject here, and explain why my colleague was partially wrong.

How authentication works

When we log in to an account in a computer, we enter a password. The computer checks it, and if it is the correct one, we are granted access. For the computer to check the password, we must have told it beforehand what the correct password is. Now, if the computer knows our password, anyone with access to the place where it is stored could retrieve our password.

We can avoid that by not telling the computer our password, but only an encrypted version. The encrypted version can be obtained from the password, but there is no operation to obtain the password from its encrypted form. When the computer asks for a password, it applies the encrypting algorithm, and compares the result with the stored encrypted form. If they are equal, it infers that the password was correct, since only from the correct password could one obtain the encrypted form.

On the other hand, no-one can possibly obtain the original password, even by inspection of the contents of the host computer, because only the encrypted form is available there.

How password cracking works

I will only deal with brute force attacks, i.e., trying many passwords, until the correct one is found.

Despite the “romantic” idea that a cracker will try to log in to an account once and again, until she gets access, this method is really lame, since such repeated access tries can be detected and blocked.

The ideal approach is to somehow obtain the encrypted password that the computer stores, and then try (in the cracker’s computer) to obtain the plain password from it. To do so, the cracker will make a guess, encrypt it with the known encrypting method, and compare the result with the encrypted key, repeating the process until a match is found. This task is the one performed by tools such as John the Ripper.

Why this shouldn’t work in a safe (Linux) system

The security of a password relies heavily on the difficulty of guessing it by the cracker. If our password is the same as our user name, this will be the first guess of the cracker, and she’ll find it immediately. If our password is a word that appears in a dictionary, they’ll find it quickly. If it is a string of 12 unrelated characters, plus digits, dots or exclamation marks, then it will take ages for the cracking program to reach the point where it guesses it.

The second hurdle for the cracker is that, even if she gets access to a regular user account, the file where the encrypted passwords are stored is only readable by the root (administrator) user (in a Linux system). Information about users and their passwords is stored in /etc/passwd (that any user can read) and /etc/shadow (that only root can read). The encrypted password is stored only in the latter. In the past all info was in /etc/passwd, but later on it was split, to increase the security.

In short: you need root access to start trying to crack passwords in a machine… but, if you have root access, why bother? You already have full access to all accounts!

Comments

Re-partitioning a disk infected with Vista to dual-boot with Linux

Some time ago I helped a friend to install Linux into a Vista laptop (incidentally, another friend asked me about the subject today). The only aspect I’m covering in this post is the re-partitioning of the disk, which is a wee bit trickier than with XP and previous Windows versions.

With my laptop (one with XP preinstalled), I just inserted my favorite Linux CD, rebooted, and used the built-in partition utility that all Linux installation CDs have to downsize the Windows partition, and then make the Linux partitions in the remaining disk space. With Vista this is not the case. You have to be very careful, because Linux can not resize the Vista partitions (at least at the time of writing these lines). The problem is that Vista uses a modified NTFS format, and Linux can not cope with it yet (read more at my source for this info: pronetworks.org).

You can also find at pronetworks.org a detailed HowTo for making the resizing of a partition. In summary (e.g., for shrinking a partition to make room for Linux):

  1. Go to Control Panel -> Administrative Tools -> Computer Management
  2. Click on Disk Management (under Storage in left hand panel)
  3. Locate partition to shrink, right click on it, and from the context menu choose Shrink Volume
  4. Fill in the self-explanatory dialog box. Basically, enter amount of MB you want the partition to be reduced by.

You will thus end up with a smaller Vista partition, and some empty space. Now, you can insert the Linux CD, reboot, and install Linux in that empty space, without touching the Vista partition.

Comments

Extracting audio from a YouTube video

This HowTo really has two parts:

1 – How to download a video from YouTube
2 – How to extract audio from any video

The second step is not limited to videos obtained in the first step, and the first step can obviously be made for the sake of it.

How to download a video from YouTube

When you play a video on YouTube, the contents of a FLV file are streamed to your screen. Downloading this FLV file is a bit more tricky than it should be, because there is no direct indication of the URL of this file in the code of the page of the video.

Apparently some guys got over it, and they made the software I use to do the job: a Firefox extension called DownloadHelper. Using it is so easy: a three-sphere icon appears to the right of the URL bar of Firefox. When a page contains material that can be downloaded with DownloadHelper (such as a YouTube video), the spheres in the icon are colorful and move (otherwise they are grayed-out, and still). You can then click on the icon to see a list of items to download, and choose the one you want (usually the .flv file).

How to extract audio from any video

It is so easy to do from the command line. First we use MPlayer to extract the audio in PCM/WAV format:

% mplayer filename.flv -vo null -ao pcm:fast:file=filename.wav

Then, we make use of oggenc to encode the WAV into Ogg Vorbis. For example, to encode with quality level 7 (a reasonable tradeoff between quality and size):

% oggenc filename.wav -q 7 -o filename.ogg

And that’s all to it!

Comments (2)

Flash: better without Flash

Remember my previous post about a problem with Flash in Firefox/Iceweasel? Now the second part.

After following my own instructions, I ended up with a Flash instalation that could play YouTube videos correctly, but some other Flash animations would not work. By chance, my computer at work would reproduce any Flash animation just fine, so… why would that be?

To find out the reason, I have compared what Flash-related packages I have installed in Homer (my computer at work) and Heracles (the one at home). The result is quite surprising:

Homer[~]: aptitude search flash
p   flashplayer-mozilla       - Macromedia Flash Player
p   flashrom                  - Universal flash programming utility
p   flashybrid                - automates use of a flash disk as the root filesystem
p   libflash-dev              - GPL Flash (SWF) Library - development files
p   libflash-mozplugin        - GPL Flash (SWF) Library - Mozilla-compatible plugin
p   libflash-swfplayer        - GPL Flash (SWF) Library - stand-alone player
p   libflash0c2               - GPL Flash (SWF) Library - shared library
p   libroxen-flash2           - Flash2 module for the Roxen Challenger web server
p   m16c-flash                - Flash programmer for Renesas M16C and R8C microcontrollers
p   vrflash                   - tool to flash kernels and romdisks to Agenda VR
Homer[~]: aptitude search swf
p   libflash-swfplayer        - GPL Flash (SWF) Library - stand-alone player
p   libswf-perl               - Ming (SWF) module for Perl
p   libswfdec-0.5-4           - SWF (Macromedia Flash) decoder library
p   libswfdec-0.5-4-dbg       - SWF (Macromedia Flash) decoder library
p   libswfdec-0.5-dev         - SWF (Macromedia Flash) decoder library
v   libswfdec-dev             -
p   pyvnc2swf                 - screen recording tool to SWF movie
v   swf-player                -
p   swfdec-mozilla            - Mozilla plugin for SWF files (Macromedia Flash)
p   swfmill                   - xml2swf and swf2xml processor

Yes, Flash works perfectly at Homer because it has no package installed with swf or flash in their name! And I don’t have any Gnash package installed, either. I removed all swf/flash-related packages on Heracles, and now Flash works perfectly in my home computer too.

Comments (11)

Ignoring some file in a Subversion working copy

Sometimes you have files littering a svn working copy, but you don’t want to put them under version management. Often times it is impossible, or plain painful, to delete them, and having them appear in all svn status calls is uncomfortable. This can happen, for example, if you keep a repository of Python files. When a Python script is called from another script (a module, for example), a .pyc file is generated (a “compiled” version of the called .py module, that is just faster to load next time).

Clearly, you don’t want to put the .pyc files in the repository, but deleting them everytime you manage the working copy is painful. For that, you can make svn ignore some files, like e.g. .pyc files. You can do that in at least two ways: specifically for a directory, or as a general preference.

Setting ignore property on a single directory

You can set a property on any directory of a working copy, so that the chosen files in that directory will be ignored by svn. For example:

% svn propset 'svn:ignore' '*.pyc' pythonfiles/

will make svn ignore all .pyc files in the directory “pythonfiles”, but not those in other directories.

Setting ignore patterns globally

You can edit the ~/.subversion/config file, and add “*.pyc” in the line starting with "global-ignores =". For example, my such line is:

global-ignores = *.o *.lo *.la #*# .*.rej *.rej .*~ *~ .#* .DS_Store *.pyc *.swp

It works immediately, and for all the calls to svn status you do in any working copy on that machine.

Comments

Flash player problem in Debian Lenny: “This SWF file is known to trigger bugs in the swfdec decoder.”

[Update: read a more recent post]

I have recently come accross this problem, and I am posting the solution I just found. The problem is the following: when trying to watch any online flash animation with Iceweasel on my Debian Lenny (e.g. in YouTube), instead of the video, I got a grey window, with the following black letters on it:

This SWF file is known to trigger bugs in the swfdec decoder. Playback is cancelled.

This bug can be followed in this August 2007 thread at donarmstrong.com. I am amazed why it didn’t affect me sooner than this week, but oh well…

The steps that fixed the issue for me where to install two packages. I am not 100% sure the first one was needed, though. I first installed the package flashplayer-mozilla, which uninstalled flashplugin-nonfree as a side effect. This step alone fixed nothing.

The second step was to actually read the donarmstrong.com bug report, and see that it says that “This is fixed in swfdec-0.4.2 […]”. Great! I did a aptitude search swf, and found out that Lenny has libswfdec-0.5-4, and a swfdec-mozilla that depends on it. I installed the latter, which obviously also installed the former, plus it removed swf-player and libswfdec0.3.

After that, I opened Iceweasel again, and now flash works!

Comments (2)

« Previous Page « Previous Page Next entries »