Archive for the 'howto' Category

Making iSight camera work in Ubuntu
July 4th 2008

As I said in a previous post, I bought a MacBook, and I am getting all its bits to work correctly. Out-of-the-box support from Ubuntu (the only GNU/Linux distribution I have tried on the MacBook so far) is excellent, but some things (camera, WiFi…) need proprietary drivers, and therefore some extra tweaking.

I followed the instructions on the Ubuntu community site, as I did for the procedures detailed in the previous post.

Basically, it all boils down to:

Fetch the Apple drivers for the camera

As root (if, unlike me, you like sudo, run the following as a normal user, prefixing each command with sudo), mount the Mac OSX partition (you didn’t delete it, right?) and copy the relevant file somewhere else:

# cd
# mkdir /mnt/macosx
# mount /dev/sda2 /mnt/macosx
# cp /mnt/macosx/System/Library/Extensions/IOUSBFamily.kext/Contents/PlugIns/AppleUSBVideoSupport.kext/Contents/MacOS/AppleUSBVideoSupport .
# umount /mnt/macosx

You might have noticed that the Mac OSX partition is not sda1, but sda2. That is simply how it turned out after following my own installation instructions. Apple must have decided to install the OS in the second partition for some reason (most likely because the first one is the small EFI system partition that GPT-formatted Macs reserve for the firmware).

Install the required packages

We need a package called isight-firmware-tools. Unfortunately, it is not present in the Hardy repositories at the moment (it was in the Gutsy ones, I think). You can add a Launchpad repository by editing /etc/apt/sources.list and adding:

deb http://ppa.launchpad.net/mactel-support/ubuntu hardy main
deb-src http://ppa.launchpad.net/mactel-support/ubuntu hardy main

Then, as root:

# aptitude update
# aptitude install isight-firmware-tools

You will be prompted for the path to the driver you copied before. You can just press Enter without paying much attention, and then run (assuming you copied the driver to root’s home directory):

# cd
# ift-extract -a ./AppleUSBVideoSupport

To activate the driver, restart HAL:

# /etc/init.d/hal restart

Test it with Ekiga

As explained on the Ubuntu community site, you can run Ekiga as a normal user (after installing the ekiga package). Choose V4L2 as the video plugin, and “Built-in iSight” should appear in the input device list. If it does, the process worked.
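If Ekiga does not list the camera, it may help to check first whether the kernel registered a video device at all. Below is a minimal Python sketch for that, assuming the standard sysfs layout (/sys/class/video4linux/videoN/name, which V4L drivers normally expose). The function name is my own invention, and the exact name reported for the iSight may not match the label Ekiga shows.

```python
import os

def list_video_devices(sysfs_root="/sys/class/video4linux"):
    """Return (device node, reported name) pairs for each V4L device in sysfs."""
    devices = []
    if not os.path.isdir(sysfs_root):
        return devices  # no V4L devices registered (or not a Linux system)
    for entry in sorted(os.listdir(sysfs_root)):
        name_file = os.path.join(sysfs_root, entry, "name")
        try:
            with open(name_file) as fh:
                name = fh.read().strip()
        except OSError:
            continue  # skip entries without a readable "name" attribute
        devices.append(("/dev/" + entry, name))
    return devices

if __name__ == "__main__":
    for node, name in list_video_devices():
        print(node, name)
```

Run it as a normal user after restarting HAL; an empty output suggests the firmware step did not work.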

Installing Ubuntu Hardy Heron on a MacBook
June 25th 2008

Yes, dear reader, I committed the heresy of purchasing an Apple MacBook. I obviously didn’t do it for MacOS X, which I couldn’t care less about, but for the hardware, which is quite good. I was looking for a laptop as small as possible, at a low price (it cost 799 EUR), and with a screen that was not too small (this one is 13″; maybe even 12″ would be acceptable, but 13″ certainly is).

You can see some pictures of it at my MacBook gallery.

If you, like me, are used to PCs, then there are a few things to note:

  • It has a different keyboard layout. Most prominently, some keys are missing: Del, PgUp, PgDn, Home, End. Some others (Win key, AltGr) have substitutes that can be mapped. Also, the equivalents of AltGr and right Ctrl are kind of swapped: the key closest to the space bar is right “cmd” (which could act as right Ctrl), and the farthest one is left “alt” (which could act as AltGr).
  • The touchpad has a single button, and tapping on it won’t click. There is no zone on it to use as a vertical scroll area, either. Luckily, these last two issues can be fixed via software, so that in Ubuntu the touchpad does behave correctly: you can tap-click, and you can scroll with a smooth movement of a finger. The single-button issue is not present with USB mice: they work “normally”.

I would like to outline here the process of installing Ubuntu (Hardy Heron) on this machine. For that, I recommend reading (as I did) the following links:

Repartition of the hard disk

My Mac came with a 120 GB hard disk (109 GB real), all of it devoted to OS X. Unfortunately, the Ubuntu installer cannot resize HFS+ partitions. Fortunately, OS X itself can, by means of Boot Camp: go to Go->Utilities->Boot Camp Assistant. There you can (and should) reduce the existing HFS+ partition to the bare minimum (on my machine it was 22 GB, because OS X already uses 17 GB, and it won’t accept less than 5 GB of free disk). Leave the rest unassigned, and quit.

Installation of multi-boot system

The first hurdle in our Linux installation is that Macs do not have a “normal” BIOS, which Linux/Windows installations usually rely on, so this is a drawback. Instead, Macs come with a thingie called the Extensible Firmware Interface (EFI). However, there is a nice little tool called rEFIt that can help us with it.

To install rEFIt, you can follow the instructions on its SourceForge site. I followed the “Automatic Installation with the Installer Package” instructions. Basically, I downloaded the Mac disk image from the download page, double-clicked it in the Mac OSX file browser to open it, then double-clicked the rEFIt.mpkg file inside, and followed the instructions.

This will make the rEFIt menu appear on the next reboot, but only if you hold some key while booting (I think it’s “C”). If you want the menu to always appear, do the following in a terminal, inside Mac OSX:

% cd /efi/refit
% ./enable-always.sh

Installation of Linux OS

After doing the above, you should reboot with an Ubuntu installation CD inserted. If the EFI installation was correct, you will be presented with the rEFIt menu, in which you will have two big icons (OSX and the Linux CD), and five small ones below (“Start EFI Shell”, “Start Partitioning Tool”, “About rEFIt”, “Shut down computer” and “Restart computer”).

Use the left-right arrow keys to select the Ubuntu CD, and press Enter. At that moment, or after installing Ubuntu (I don’t recall which), the computer could complain: “No bootable device -- insert boot disk and press any key”. If so, reboot and, in the aforementioned rEFIt menu, choose the second small icon, “Start Partitioning Tool”. This tool will prompt you to update the MBR. Accept, and let it do its magic.

When booting from the CD, you will have the option to do an absolutely normal Ubuntu installation. The Ubuntu MacBook page says that Boot Camp will complain if you make more than two partitions in total. It will, but to me this restriction is ridiculous, since OSX is already eating up one. There’s no way I will install any Linux in a single partition (without even swap!). If you do not care about ever opening Boot Camp again (I don’t), do a totally normal install. I created two 8.5 GB partitions for / (one for Ubuntu, another one left unused for the future), a 750 MB swap partition, and the rest (73 GB) as /home (potentially shared between the two Linux systems I could install).

After the installation, reboot and you will find the aforementioned rEFIt menu. Choosing the penguin icon on the right side will take you to the GRUB screen you probably are accustomed to. What this means is that you have to go through two boot menus when booting, but that’s a minor issue, I think. The first menu is an EFI menu, in which you choose OSX or GRUB. The second one is the GRUB menu that lets you choose among different installed kernels.

And I think that’s it…

I will keep on writing when I have time, at least about how to make WiFi work, and also how to configure Compiz Fusion. Yes, the X3100 graphics chip that MacBooks carry is blacklisted as not working with CF. But, believe me, it does work!

This blog is my OpenID provider
March 2nd 2008

I really like the idea behind OpenID, and I already have an account at Weblogs SL. Of course, my WordPress.com blog was also a valid OpenID provider. Moreover, my isilanes.org site (and before that my EHU page) was turned into an OpenID provider by adding the following lines to the <head> section of its HTML:

<link rel="openid.server" href="http://openid.blogs.es/index.php/serve" />
<link rel="openid.delegate" href="http://openid.blogs.es/isilanes" />
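For the curious, this is roughly how a site where you sign in finds out where to send you: it fetches the page at the URL you typed and looks for those two link tags. The following Python 3 sketch (standard library only; the class and function names are hypothetical) performs just that HTML-based discovery step on a string. Real OpenID libraries also try Yadis/XRDS discovery, which this sketch ignores.

```python
from html.parser import HTMLParser

class OpenIDLinkFinder(HTMLParser):
    """Collect the href of <link rel="openid.server"> and <link rel="openid.delegate">."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also end up here via handle_startendtag
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""
        # rel may hold several space-separated values
        for value in rel.split():
            if value in ("openid.server", "openid.delegate"):
                self.links[value] = attrs.get("href")

def find_openid_links(html):
    finder = OpenIDLinkFinder()
    finder.feed(html)
    return finder.links
```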

But I was not completely happy with that. When signing a comment in a blog (for example) with my WP blog URL, my nickname would appear as “handyfloss” (the name of the blog), not “isilanes” (my nick). If I used the Weblogs URL (or that of www.ehu.es/isilanes), my nick would be “isilanes”, but clicking on it would take the reader to that URL, instead of to my blog.

With this WordPress.org blog these issues are gone. I have installed the Yadis plugin, and now I can sign with the “isilanes” nick, and give a link to this blog.

The configuration of the plugin is really simple: go to Options->Yadis->Add New Service, and select “Other…”. You will be asked for two pieces of data: “OpenID Server” and “OpenID Delegate” (both provided by your OpenID account, with Weblogs SL or whoever). Fill them in, click “submit”, and you’re done!

Some more tweaks to my Python script
February 19th 2008

Update: you can find the outcome of all this in a later post: Project BHS

All the comments on my previous post have given me hints to further increase the efficiency of a script I am working on. Here I present the advice I have followed, and the speed gain it provided. I will speak of “speedup” instead of timing, because this second set of tests was run on a different computer. The “base” speed will be the last value of my previous test set (1.5 s on that computer, 1.66 s on this one). A speedup of “2” will thus mean half the execution time (0.83 s on this computer).

Version 6: Andrew Dalke suggested the substitution of:

line = re.sub('>','<',line)

with:

line = line.replace('>','<')

Avoiding the re module seems to speed things up when we are searching for fixed strings, since the additional features of the re module are not needed.

This is true, and I got a speedup of 1.37.

Version 7: Andrew Dalke also suggested substituting:

search_cre = re.compile(r'total_credit').search
if search_cre(line):

with:

if 'total_credit' in line:

This is more readable, more concise, and apparently faster. Doing it increases the speedup to 1.50.
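If you want to check Versions 6 and 7 on your own machine, a small timeit benchmark is enough. This is a sketch in Python 3 (the scripts in these posts are Python 2), using a made-up sample line in the host.gz style; it compares the regex-based test and substitution with the plain substring test and str.replace.

```python
import re
import timeit

# Made-up sample line in the style of a BOINC host.gz entry
LINE = "    <total_credit>1234.567890</total_credit>\n"

search_cre = re.compile(r"total_credit").search

def with_re(line=LINE):
    # Before Version 6: regex search, then regex substitution
    if search_cre(line):
        return re.sub(">", "<", line)
    return None

def with_str(line=LINE):
    # Versions 6 and 7: substring test, then str.replace
    if "total_credit" in line:
        return line.replace(">", "<")
    return None

if __name__ == "__main__":
    assert with_re() == with_str()  # both variants must give the same result
    for func in (with_re, with_str):
        seconds = timeit.timeit(func, number=200000)
        print("%-9s %.3f s" % (func.__name__, seconds))
```

Absolute numbers depend heavily on the computer and the Python version, so only the ratio between the two timings is meaningful.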

Version 8: Andrew Dalke also proposed flattening some variables, and specifically avoiding dictionary lookups inside loops. I went even further than his advice, and substituted:

stat['win'] = [0,0]

loop
  stat['win'][0] = something
  stat['win'][1] = somethingelse

with:

win_stat_0 = 0
win_stat_1 = 0

loop
  win_stat_0 = something
  win_stat_1 = somethingelse

This pushed the speedup further up, to 1.54.

Version 9: Justin proposed reducing the number of times some patterns were matched, and extracting some info more directly. I achieved that by substituting:

loop:
  if 'total_credit' in line:
    line   = line.replace('>','<')
    aline  = line.split('<')
    credit = float(aline[2])

with:

pattern    = r'total_credit>([^<]+)<'
search_cre = re.compile(pattern).search

loop:
  if 'total_credit' in line:
    cre    = search_cre(line)
    credit = float(cre.group(1))

This trick saved enough to increase the speedup to 1.62.
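The two ways of pulling the credit value out of a line can be compared side by side. A minimal Python 3 sketch, assuming lines shaped like <total_credit>1234.5</total_credit> (the function names are made up for illustration):

```python
import re

search_cre = re.compile(r"total_credit>([^<]+)<").search

def credit_by_split(line):
    # Version 8 approach: normalise the brackets, split, take the third field
    return float(line.replace(">", "<").split("<")[2])

def credit_by_group(line):
    # Version 9 approach: one regex match captures the number directly
    return float(search_cre(line).group(1))
```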

Version 10: The next tweak was an idea of mine. I was digesting a huge log file with zcat and grep, to produce a smaller intermediate file, which Python would process. This intermediate file consists of alternating lines: one with “total_credit”, then one with “os_name”, then “total_credit” again, and so on. When processing this file with Python, I was searching each line for “total_credit” to tell these two kinds of lines apart, like this:

for line in f:
  if 'total_credit' in line:
    do something
  else:
    do somethingelse

But the alternating structure of my input would allow me to do:

odd = True
for line in f:
  if odd:
    do something
    odd = False
  else:
    do somethingelse
    odd = True

Presumably, checking a boolean is faster than matching a pattern, although in this case the gain was not huge: the speedup went up to 1.63.
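Putting Versions 8 to 10 together, the inner loop looks roughly like the following Python 3 sketch. It is not the actual script: it assumes the input strictly alternates between a total_credit line and an os_name line, uses plain substring tests for the OS names, and the function name tally is hypothetical.

```python
import re

search_cre = re.compile(r"total_credit>([^<]+)<").search

def tally(lines):
    """Count hosts and sum credit per OS from alternating credit/os_name lines."""
    # Flat local accumulators instead of a dict of lists (Version 8 idea)
    win_n = lin_n = dar_n = oth_n = 0
    win_c = lin_c = dar_c = oth_c = 0.0
    credit = 0.0
    odd = True  # odd lines hold total_credit, even lines hold os_name (Version 10)
    for line in lines:
        if odd:
            credit = float(search_cre(line).group(1))  # Version 9 capture group
        elif "Windows" in line:
            win_n += 1
            win_c += credit
        elif "Linux" in line:
            lin_n += 1
            lin_c += credit
        elif "Darwin" in line:
            dar_n += 1
            dar_c += credit
        else:
            oth_n += 1
            oth_c += credit
        odd = not odd
    return {"win": (win_n, win_c), "lin": (lin_n, lin_c),
            "dar": (dar_n, dar_c), "oth": (oth_n, oth_c)}
```

Note that the odd/even trick silently goes out of step if a host ever lacks one of the two lines, so it only pays off when the grep output is known to alternate.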

Version 11: Another clever suggestion by Andrew Dalke was to avoid using the intermediate file, and use os.popen to connect to and read from the zcat/grep command directly. Thus, I substituted:

os.system('zcat host.gz | grep -F -e total_credit -e os_name > '+tmp)

f = open(tmp)
for line in f:
  do something

with:

f = os.popen('zcat host.gz | grep -F -e total_credit -e os_name')

for line in f:
  do something

This saves disk I/O time, and the performance is increased accordingly. The speedup goes up to 1.98.
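os.popen still exists in Python 3 (it is now a thin wrapper around the subprocess module), but subprocess is the documented way to do this. Here is a sketch of the Version 11 idea with subprocess.Popen, streaming a shell pipeline line by line without a temporary file; stream_lines is a made-up name.

```python
import subprocess

def stream_lines(command):
    """Yield the output of a shell pipeline line by line, without a temp file."""
    proc = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    try:
        for line in proc.stdout:
            yield line
    finally:
        # Close our end of the pipe and reap the child process
        proc.stdout.close()
        proc.wait()
```

For instance, iterating over stream_lines('zcat host.gz | grep -F -e total_credit -e os_name') would replace the os.popen call, with host.gz assumed to be in the current directory.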

All the values I have given are for a sample log (from MalariaControl.net) with 7 MB of gzipped info (49 MB uncompressed). I also tested my scripts with a 267 MB gzipped (1.8 GB uncompressed) log (from SETI@home), and a plot of speedups vs. versions follows:

Figure: Execution speedup vs. version

Notice how the last modification (avoiding the temporary file) is much more important for the bigger file than for the smaller one. Notice also that the odd/even modification (Version 10) is of very little importance for the small file, but quite effective for the big file (compare it with Version 9).

The plot doesn’t show it (it compares versions with the same input, not one input with the other), but my eleventh version of the script processes the 267 MB log faster than Version 1 processes the 7 MB one! For the 7 MB input, the overall speedup from Version 1 to Version 11 is above 50.

Summary of my Python optimization adventures
February 17th 2008

This is a follow-up to two previous posts. In the first one I spoke about saving memory by reading line by line, instead of all at once, and in the second one I recommended using Unix commands.

The script reads a host.gz log file from a given BOINC project (more precisely one I got from MalariaControl.net, because it is a small project, so its logs are also smaller), and extracts how many computers are running the project, and how much credit they are getting. The statistics are separated by operating system (Windows, Linux, MacOS and other).

Version 0

Here I read the whole file into RAM, then process it with Python alone. Running time: 34.1 s.

#!/usr/bin/python

import os
import re
import gzip

credit  = 0
os_list = ['win','lin','dar','oth']

stat = {}
for osy in os_list:
  stat[osy] = [0,0]

# Process file:
f = gzip.open('host.gz','r')
for line in f.readlines():
  if re.search('total_credit',line):
    credit = float(re.sub('</?total_credit>',' ',line.split()[0]))
  elif re.search('os_name',line):
    if re.search('Windows',line):
      stat['win'][0] += 1
      stat['win'][1] += credit
    elif re.search('Linux',line):
      stat['lin'][0] += 1
      stat['lin'][1] += credit
    elif re.search('Darwin',line):
      stat['dar'][0] += 1
      stat['dar'][1] += credit
    else:
      stat['oth'][0] += 1
      stat['oth'][1] += credit
f.close()

# Return output:
nstring = ''
cstring = ''
for osy in os_list:
  nstring +=   "%15.0f " % (stat[osy][0])
  try:
    cstring += "%15.0f " % (stat[osy][1])
  except:
    print osy,stat[osy]

print nstring
print cstring

Version 1

The only difference is “for line in f:” instead of “for line in f.readlines():”. This saves a LOT of memory, but is slower. Running time: 44.3 s.

Version 2

In this version, I use precompiled regular expressions, and the time saving is noticeable. Running time: 26.2 s.

#!/usr/bin/python

import os
import re
import gzip

credit  = 0
os_list = ['win','lin','dar','oth']

stat = {}
for osy in os_list:
  stat[osy] = [0,0]

pattern    = r'total_credit'
match_cre  = re.compile(pattern).match
pattern    = r'os_name'
match_os   = re.compile(pattern).match
pattern    = r'Windows'
search_win = re.compile(pattern).search
pattern    = r'Linux'
search_lin = re.compile(pattern).search
pattern    = r'Darwin'
search_dar = re.compile(pattern).search

# Process file:
f = gzip.open('host.gz','r')

for line in f:
  if match_cre(line,5):
    credit = float(re.sub('</?total_credit>',' ',line.split()[0]))
  elif match_os(line,5):
    if search_win(line):
      stat['win'][0] += 1
      stat['win'][1] += credit
    elif search_lin(line):
      stat['lin'][0] += 1
      stat['lin'][1] += credit
    elif search_dar(line):
      stat['dar'][0] += 1
      stat['dar'][1] += credit
    else:
      stat['oth'][0] += 1
      stat['oth'][1] += credit
f.close()

# etc.

Version 3

Later I decided to use AWK to perform the heaviest part: parsing the big file, to produce a second, smaller, file that Python will read. Running time: 14.8s.

#!/usr/bin/python

import os
import re

credit  = 0
os_list = ['win','lin','dar','oth']

stat = {}
for osy in os_list:
  stat[osy] = [0,0]

pattern    = r'Windows'
search_win = re.compile(pattern).search
pattern    = r'Linux'
search_lin = re.compile(pattern).search
pattern    = r'Darwin'
search_dar = re.compile(pattern).search

# Distill file with AWK:
tmp = 'bhs.tmp'
os.system('zcat host.gz | awk \'/total_credit/{printf $0}/os_name/{print}\' > '+tmp)

# Process tmp file:
f = open(tmp)
for line in f:
  line = re.sub('>','<',line)
  aline = line.split('<')
  credit = float(aline[2])
  os_str = aline[6]
  if search_win(os_str):
    stat['win'][0] += 1
    stat['win'][1] += credit
  elif search_lin(os_str):
    stat['lin'][0] += 1
    stat['lin'][1] += credit
  elif search_dar(os_str):
    stat['dar'][0] += 1
    stat['dar'][1] += credit
  else:
    stat['oth'][0] += 1
    stat['oth'][1] += credit
f.close()

# etc

Version 4

Instead of AWK, I decided to use grep, with the idea that nothing can beat this tool when it comes to pattern matching. I was not disappointed. Running time: 5.4 s.

#!/usr/bin/python

import os
import re

credit  = 0
os_list = ['win','lin','dar','oth']

stat = {}
for osy in os_list:
  stat[osy] = [0,0]

pattern    = r'total_credit'
search_cre = re.compile(pattern).search

pattern    = r'Windows'
search_win = re.compile(pattern).search
pattern    = r'Linux'
search_lin = re.compile(pattern).search
pattern    = r'Darwin'
search_dar = re.compile(pattern).search

# Distill file with grep:
tmp = 'bhs.tmp'
os.system('zcat host.gz | grep -e total_credit -e os_name > '+tmp)

# Process tmp file:
f = open(tmp)
for line in f:
  if search_cre(line):
    line = re.sub('>','<',line)
    aline = line.split('<')
    credit = float(aline[2])
  else:
    if search_win(line):
      stat['win'][0] += 1
      stat['win'][1] += credit
    elif search_lin(line):
      stat['lin'][0] += 1
      stat['lin'][1] += credit
    elif search_dar(line):
      stat['dar'][0] += 1
      stat['dar'][1] += credit
    else:
      stat['oth'][0] += 1
      stat['oth'][1] += credit

f.close()

# etc

Version 5

I was not completely happy yet. I discovered the -F flag for grep (in the man page), and decided to use it. This flag tells grep to interpret the patterns as fixed strings rather than regular expressions, so no pattern expansion has to be made. Using the -F flag I further reduced the running time to 1.5 s.

Figure: Running time vs. script version

Speeding up file processing with Unix commands
February 17th 2008

In my last post I described some changes I made to a Python script to reduce the memory overhead of reading a whole file into RAM at once.

I realized that the script needed a lot of optimizing. After reading the link a reader (Paddy3118) was kind enough to point me to, I realized I could save time by compiling my search expressions. Basically, my script opens a gzipped file, searches for lines containing some keywords, and uses the info read from those lines. The original script would take 44 seconds to process a 6.9 MB file (49 MB uncompressed). Using compile on the search expressions, this time went down to 29 s. I tried using match instead of search, and expressions like “if pattern in line:” instead of re.search(), but these didn’t make much of a difference.

Later I thought that Unix commands such as grep were especially suited for the task, so I gave them a try. I modified my script to run in two steps: in the first one I used zcat and awk (called from within the script) to create a much smaller temporary file containing only the lines with the information I wanted. In the second step, I processed this file with standard Python code. This hybrid approach reduced the processing time to just 12 s. Sometimes using the best tool really makes a difference, and it seems that the Unix utilities are hard to beat in terms of performance.

It is only after programming exercises like this one that one realizes how important writing good code is (something I will probably never achieve, but I try). For some reason I always think of Windows, and how Microsoft refuses to make efficient programs, relying on improvements in the hardware instead. It’s as if I tried to speed up my first script by using a faster computer, instead of fixing the code to be more efficient.

Python: speed vs. memory tradeoff reading files
February 15th 2008

I was writing a script to process a log file, and I basically wanted to go line by line, and act upon each line if some condition was met. For reading files, I generally use readlines(), so my first try was:

f = open(filename,'r')
for line in f.readlines():
  if condition:
    do something
f.close()

However, I realized that as the size of the file read increased, the memory footprint of my script increased too, to the point of almost halting my computer when the size of the file was comparable to the available RAM (1GB).

Of course, Python hackers will frown at me, and say that I was doing something stupid… Probably so. I decided to try a different thing to reduce the memory usage, and did the following:

f = open(filename,'r')
for line in f:
  if condition:
    do something
f.close()

Both pieces of code look very similar, but pay a bit of attention and you’ll see the difference.

The problem with “f.readlines()” is that it reads the whole file and assigns its lines to the elements of an (anonymous, in this case) list. Then, the for loop iterates over this list, which is in memory. This leads to faster execution, because the file is read once and then forgotten, but requires more memory, because a list as large as the file has to be created in RAM.

Fig. 1: Memory vs. file size for both methods of reading the file

When you do “for line in f:”, you are effectively reading the lines one by one, one per cycle of the loop. Hence, the memory use is effectively constant and very low, although the file has to be read in many small pieces, and this usually leads to slower execution of the code.

Fig. 2: Execution time vs. file size for both methods of reading the file
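The memory difference is easy to reproduce. This Python 3 sketch uses the standard tracemalloc module to record the peak memory allocated by Python while counting the lines of a made-up test file, once with readlines() and once iterating over the file object. Note that tracemalloc only sees Python’s own allocations, not the total memory of the process, so the numbers will not match Fig. 1; only the contrast between the two methods should carry over.

```python
import os
import tempfile
import tracemalloc

def peak_memory(read_all, path):
    """Count lines in path; return (line count, peak bytes traced by Python)."""
    tracemalloc.start()
    count = 0
    with open(path) as f:
        # readlines() builds the whole list first; iterating f reads lazily
        lines = f.readlines() if read_all else f
        for line in lines:
            count += 1
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return count, peak

def make_sample(n_lines=100000):
    """Write a throwaway test file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w") as f:
        for i in range(n_lines):
            f.write("line %d of a made-up log file\n" % i)
    return path

if __name__ == "__main__":
    path = make_sample()
    try:
        for read_all in (True, False):
            n, peak = peak_memory(read_all, path)
            label = "readlines()" if read_all else "for line in f"
            print("%-14s %d lines, peak %.1f KiB" % (label, n, peak / 1024.0))
    finally:
        os.remove(path)
```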

Password cracking with John the Ripper
February 10th 2008

Following some security policy updates (not necessarily for the better) at my workplace, a colleague and I discussed the vulnerability of user passwords in the accounts of our computers. He claimed that an attack with a cracker program such as John the Ripper could potentially break into someone’s account, provided the cracker had access to any user account to start with.

I am by no means an expert on cryptography and computer security, but I would like to outline some ideas about the subject here, and explain why my colleague was partially wrong.

How authentication works

When we log in to an account in a computer, we enter a password. The computer checks it, and if it is the correct one, we are granted access. For the computer to check the password, we must have told it beforehand what the correct password is. Now, if the computer knows our password, anyone with access to the place where it is stored could retrieve our password.

We can avoid that by not telling the computer our password, but only an encrypted version of it (strictly speaking, a one-way hash). The encrypted version can be obtained from the password, but there is no practical way to obtain the password from its encrypted form. When we enter a password, the computer applies the encrypting algorithm to it and compares the result with the stored encrypted form. If they are equal, it infers that the password was correct, since in practice only the correct password yields that encrypted form.

Moreover, no one can directly read the original password, even by inspecting the contents of the host computer, because only the encrypted form is stored there.
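On real Linux systems the “encrypted” password is a salted one-way hash produced by crypt(3) (MD5- or SHA-512-based, depending on the distribution and its age), stored in /etc/shadow. Python’s crypt module is deprecated (and removed in recent versions), so the sketch below uses hashlib.pbkdf2_hmac as a stand-in to show the idea: the system stores a random salt and a digest, and checking a password means hashing the attempt again with the same salt and comparing. The function names and the iteration count are illustrative assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100000  # illustrative work factor, not any system's default

def make_record(password, salt=None):
    """Return (salt, digest): what gets stored instead of the password."""
    if salt is None:
        salt = os.urandom(16)  # random salt, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def check_password(attempt, salt, stored_digest):
    """Hash the attempt with the stored salt and compare with the stored digest."""
    _, digest = make_record(attempt, salt)
    return hmac.compare_digest(digest, stored_digest)
```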

How password cracking works

I will only deal with brute force attacks, i.e., trying many passwords, until the correct one is found.

Despite the “romantic” idea that a cracker will try to log in to an account again and again until she gets access, this method is really lame, since such repeated login attempts can be detected and blocked.

The ideal approach is to somehow obtain the encrypted password that the computer stores, and then try (in the cracker’s computer) to obtain the plain password from it. To do so, the cracker will make a guess, encrypt it with the known encrypting method, and compare the result with the stored encrypted password, repeating the process until a match is found. This task is the one performed by tools such as John the Ripper.
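Stripped of John the Ripper’s optimisations and word-mangling rules, an offline dictionary attack is essentially the loop below. It is a Python 3 sketch; the single salted SHA-256 used in hash_guess is a hypothetical stand-in for whatever function the target system uses (and a deliberately fast one, which is exactly what real password hashes try to avoid).

```python
import hashlib

def hash_guess(guess, salt):
    # Hypothetical stand-in for the target system's one-way function
    return hashlib.sha256(salt + guess.encode("utf-8")).hexdigest()

def dictionary_attack(stored_hash, salt, candidates):
    """Try each candidate offline; return the one whose hash matches, or None."""
    for guess in candidates:
        if hash_guess(guess, salt) == stored_hash:
            return guess
    return None
```

A real cracker would try the user name and its variations first, then dictionary words.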

Why this shouldn’t work in a safe (Linux) system

The security of a password relies heavily on how difficult it is for the cracker to guess it. If our password is the same as our user name, this will be the cracker’s first guess, and she’ll find it immediately. If our password is a word that appears in a dictionary, she’ll find it quickly. If it is a string of 12 unrelated characters, including digits, dots or exclamation marks, then it will take ages for the cracking program to reach the point where it guesses it.
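To put rough numbers on this: the number of candidates grows as the alphabet size raised to the password length. The sketch below assumes a 70-symbol alphabet (letters in both cases, digits and 8 punctuation marks), a word list of 100,000 entries and a rate of one million guesses per second; all three figures are made-up assumptions for illustration.

```python
def search_space(alphabet_size, length):
    """Number of distinct strings of exactly `length` symbols."""
    return alphabet_size ** length

def years_to_exhaust(candidates, guesses_per_second):
    """Time to try every candidate, in years of 365.25 days."""
    return candidates / guesses_per_second / (365.25 * 24 * 3600)

if __name__ == "__main__":
    DICTIONARY_WORDS = 100000     # assumed size of a large word list
    ALPHABET = 26 + 26 + 10 + 8   # assumed symbol set: letters, digits, 8 symbols
    RATE = 1e6                    # hypothetical guesses per second
    full = search_space(ALPHABET, 12)
    print("dictionary word: %.2f s" % (DICTIONARY_WORDS / RATE))
    print("12 random chars: %.3g candidates, %.3g years"
          % (full, years_to_exhaust(full, RATE)))
```

With those assumptions, the dictionary is exhausted in a fraction of a second, while the 12-character space (about 1.4 × 10^22 candidates) would take hundreds of millions of years.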

The second hurdle for the cracker is that, even if she gets access to a regular user account, the file where the encrypted passwords are stored is only readable by the root (administrator) user (in a Linux system). Information about users and their passwords is stored in /etc/passwd (which any user can read) and /etc/shadow (which only root can read). The encrypted password is stored only in the latter. In the past all the info was in /etc/passwd, but it was later split, to increase security.

In short: you need root access to start trying to crack passwords in a machine… but, if you have root access, why bother? You already have full access to all accounts!

Re-partitioning a disk infected with Vista to dual-boot with Linux
January 9th 2008

Some time ago I helped a friend install Linux on a Vista laptop (incidentally, another friend asked me about the subject today). The only aspect I’m covering in this post is the re-partitioning of the disk, which is a wee bit trickier than with XP and previous Windows versions.

With my laptop (one with XP preinstalled), I just inserted my favorite Linux CD, rebooted, and used the built-in partitioning utility that all Linux installation CDs have to downsize the Windows partition, and then created the Linux partitions in the remaining disk space. With Vista this does not work. You have to be very careful, because Linux cannot resize Vista partitions (at least at the time of writing). The problem is that Vista uses a modified NTFS format, and Linux cannot cope with it yet (read more at my source for this info: pronetworks.org).

You can also find at pronetworks.org a detailed HowTo for resizing a partition. In summary (e.g., for shrinking a partition to make room for Linux):

  1. Go to Control Panel -> Administrative Tools -> Computer Management
  2. Click on Disk Management (under Storage in left hand panel)
  3. Locate the partition to shrink, right-click on it, and from the context menu choose Shrink Volume
  4. Fill in the self-explanatory dialog box. Basically, enter the amount (in MB) by which you want the partition to be reduced.

You will thus end up with a smaller Vista partition, and some empty space. Now, you can insert the Linux CD, reboot, and install Linux in that empty space, without touching the Vista partition.

Extracting audio from a YouTube video
January 7th 2008

This HowTo really has two parts:

1 - How to download a video from YouTube
2 - How to extract audio from any video

The second step is not limited to videos obtained in the first step, and the first step can obviously be done for its own sake.

How to download a video from YouTube

When you play a video on YouTube, the contents of an FLV file are streamed to your screen. Downloading this FLV file is trickier than it should be, because there is no direct indication of its URL in the source code of the video page.

Apparently some guys got around it, and they made the software I use to do the job: a Firefox extension called DownloadHelper. Using it is so easy: a three-sphere icon appears to the right of the Firefox URL bar. When a page contains material that can be downloaded with DownloadHelper (such as a YouTube video), the spheres in the icon are colorful and move (otherwise they are grayed out and still). You can then click on the icon to see a list of items to download, and choose the one you want (usually the .flv file).

How to extract audio from any video

It is so easy to do from the command line. First we use MPlayer to extract the audio in PCM/WAV format:

% mplayer filename.flv -vo null -ao pcm:fast:file=filename.wav

Then, we make use of oggenc to encode the WAV into Ogg Vorbis. For example, to encode with quality level 7 (a reasonable tradeoff between quality and size):

% oggenc filename.wav -q 7 -o filename.ogg

And that’s all there is to it!
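To convert several downloaded videos in one go, the two commands can be wrapped in a small Python 3 script. This sketch (the function names are hypothetical) only builds and runs exactly the mplayer and oggenc invocations shown above, so both programs must be installed.

```python
import os
import subprocess

def audio_commands(video, quality=7):
    """Return the mplayer and oggenc argument lists for one video file."""
    base = os.path.splitext(video)[0]
    wav, ogg = base + ".wav", base + ".ogg"
    return [
        ["mplayer", video, "-vo", "null", "-ao", "pcm:fast:file=" + wav],
        ["oggenc", wav, "-q", str(quality), "-o", ogg],
    ]

def extract_audio(video, quality=7):
    for cmd in audio_commands(video, quality):
        subprocess.check_call(cmd)  # raises CalledProcessError if a step fails
```

For example, extract_audio('filename.flv') produces filename.wav and then filename.ogg. Since mplayer separates -ao suboptions with colons, a file name containing a colon may need extra quoting, which the sketch does not attempt.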
