Avoiding time_increment_bits problem when encoding bad header MPEG4 videos to Ogg Theora

There is some debate going on lately about the migration of YouTube to [[HTML5]], and whether they (i.e. YouTube’s owner, Google) should support [[H.264]] or [[Theora]] as standard codecs for the upcoming <video> tag. See, for example, how the FSF asks for support for Theora.

The thing is, I discovered [[x264]] not so long ago, and I thought it was a “free version” of H.264. I began using it to reencode the medium-to-low quality videos I keep (e.g., movies and series), and the resulting quality/file size ratio stunned me. I could reencode most material downloaded from e.g. p2p sources to 2/3 of its size, keeping the copy indistinguishable from the original to the naked eye.

However, after realizing that x264 is just a free implementation of the proprietary H.264 codec, and in the wake of the H.264/Theora debate, I decided to give Ogg Theora a go. I expected a fair competitor to H.264, although still noticeably behind in quality/size ratio, and that is what I found. I for one do not care if I need a 10% larger file to attain the same quality, if it means using free formats, so I decided to adopt Theora for everyday reencoding.

After three paragraphs of introduction, let’s get to the point, which is that when reencoding some files with [[ffmpeg2theora]] I would get the following error:

% ffmpeg2theora -i example_video.avi -o output.ogg
[avi @ 0x22b7560]Something went wrong during header parsing, I will ignore it and try to continue anyway.
[NULL @ 0x22b87f0]hmm, seems the headers are not complete, trying to guess time_increment_bits
[NULL @ 0x22b87f0]my guess is 15 bits ;)
[NULL @ 0x22b87f0]looks like this file was encoded with (divx4/(old)xvid/opendivx) -> forcing low_delay flag
Input #0, avi, from 'example_video.avi':
  Metadata:
    Title           : example_video.avi
  Duration: 00:44:46.18, start: 0.000000, bitrate: 1093 kb/s
    Stream #0.0: Video: mpeg4, yuv420p, 624x464, 23.98 tbr, 23.98 tbn, 23.98 tbc
    Stream #0.1: Audio: mp3, 48000 Hz, 2 channels, s16, 32 kb/s

[mpeg4 @ 0x22b87f0]hmm, seems the headers are not complete, trying to guess time_increment_bits
[mpeg4 @ 0x22b87f0]my guess is 16 bits ;)
[mpeg4 @ 0x22b87f0]hmm, seems the headers are not complete, trying to guess time_increment_bits
[mpeg4 @ 0x22b87f0]my guess is 16 bits ;)
[mpeg4 @ 0x22b87f0]looks like this file was encoded with (divx4/(old)xvid/opendivx) -> forcing low_delay flag
    Last message repeated 1 times
[mpeg4 @ 0x22b87f0]warning: first frame is no keyframe

I searched the web for solutions, but to no avail. Usually pasting literal error messages into Google yields good results, but in this case I only found developer forums where this bug was discussed. What I could not find was simple instructions on how to avoid it in practice.

Well, here goes my simple solution: pass the file through [[MEncoder]] first. Where the following fails:

% ffmpeg2theora -i input.avi -o output.ogg

the following succeeds:

% mencoder input.avi -ovc copy -oac copy -o filtered.avi
% ffmpeg2theora -i filtered.avi -o output.ogg

I guess that what happens is basically that mencoder takes the “raw” video data in input.avi and copies it into filtered.avi (which ends up being exactly the same video), rebuilding sane headers in the process.
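
If you prefer to stay within the ffmpeg family of tools, a plain stream-copy remux should achieve the same effect. This is only a sketch I have not tested on the offending files, but the options are standard ffmpeg ones:

% ffmpeg -i input.avi -vcodec copy -acodec copy filtered.avi
% ffmpeg2theora -i filtered.avi -o output.ogg

The -vcodec copy and -acodec copy options copy the streams untouched, so, just as with mencoder, only the container (and its headers) gets rewritten.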


Amarok WTF

Warning: the following rant could be caused by my idiocy, more than by Amarok’s fault. See comments.

I have been using [[Amarok (software)|Amarok]] as my music player ever since I first came into contact with it. I was really delighted with its capabilities, and everything in its [[User Interface|UI]] was intuitive and useful. That held true up to version 1.4.x.

Version 2.0 was an almost complete rewrite of the code, and as such many things changed. The UI suffered a large redesign, in my opinion for the worse… but that’s just an opinion. There are, however, other issues that are facts, not opinion. Amarok 2.0 lacked many of the features of Amarok 1.x, as the developers themselves admitted (not much room to deny it). Fine, I have no problem with that. It is understandable: things will not settle down until later 2.x versions. The only problem is that Linux distros (at least Ubuntu) adopted Amarok 2.0 almost immediately, leaving us users with a broken toy. Not nice.

My latest gripe with Amarok? I run Ubuntu 9.10 at work (Amarok 2.2.0), and the latest Arch at home. On the latter, I updated Amarok 2.2.1 to 2.2.2 over the weekend (Arch is much more up to date than Ubuntu, since it is based on almost bleeding-edge rolling releases). Well, unlike Amarok 2.2.1 before it (or Amarok 2.2.0 at work), the new Amarok 2.2.2 does not have an option for random play. Yes, you read that correctly. There is no way I know of to avoid playing all the songs in the playlist in the exact order (in principle, alphabetical) they are laid out in. In older versions, you could play songs or albums randomly. With 2.2.2, that capability is gone. Amazing feature regression, if you ask me.


Reverse SSH to thwart over-zealous firewalls

I guess it is not very uncommon, since it has happened to me twice, at two sites where I have worked. “Over-cautious” sysadmins decide that the University, Institute, Corporation, or whatever, will be safer if connections to the [[LAN]] from outside it are banned, including [[Secure Shell|port 22]]. In an effort to avoid letting security trample service (how considerate!), the usual solution offered for remote connection is a [[VPN]].

While VPN might have some advantages over SSH, I prefer the latter by far, and I don’t think a proper SSH setup lacks any security, especially compared to poorly implemented VPNs. For example, I would never trust something as vital as VPN software to a private company, yet the most popular VPNs are proprietary (the University of the Basque Country, for one, uses the Cisco VPN). It is at least paradoxical that a free and open SSH implementation such as [[OpenSSH]], tested so thoroughly and for so long, is dumped, while a black-box solution developed by a profit-driven organization is used instead.

But I digress. I am not interested in justifying why I want SSH. What I want to show here is a trick I learned reading tuxradar.com. Essentially, it allows one to connect (with SSH) from machine A to machine B, even if machine B has all its ports closed (so SSH-ing through another port would not help either).

The idea (see below) is to connect from machine B to A, which is allowed (and is also the exact reverse of what we actually want to do), in a way that opens a channel for a “reverse” connection from A to B:

(In machine_B)
% ssh -R 1234:localhost:22 username_in_A@machine_A

Then we will be able to use port 1234 (or whatever port we specify in the ssh -R command above) in machine A to connect to machine B, as long as the original ssh -R holds:

(In machine_A)
% ssh username_in_B@localhost -p 1234

The picture shows it better:

SSHing from A to B (dashed red arrow) is disallowed, but the reverse (in black) is not. The ssh -R command line (see code above) opens up the link between ports 22 and 1234 (two-headed black arrow), so that an ssh -p to port 1234 on machine A will redirect us to machine B. If we are asked for a password (at the ssh -p stage), it is the password for machine B that is being requested, since we are being redirected to machine B.

Please recall that the above recipe is no less secure than a regular SSH from A to B (if it were allowed): anyone SSHing to port 1234 on machine A will be automatically redirected to machine B, but will undergo the same security checks as usual (password, public/private key…). Note also that I am talking about what is possible, not necessarily desirable or comfortable. It’s just another tool if you want to use it.
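
As a small extra, this is how I would leave such a reverse tunnel running unattended. It is only a sketch using standard OpenSSH options: -N skips running a remote shell, -f sends the connection to the background, and the keep-alive option helps the tunnel survive idle periods:

(In machine_B)
% ssh -N -f -o ServerAliveInterval=60 -R 1234:localhost:22 username_in_A@machine_A

Tools like autossh go further and restart the tunnel if it ever drops, but the plain command above already covers the basic case.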


USA is different

Yes, “Spain is different!”, as they like to say, mostly in Spain. But our friends across the ocean are different from most civilized countries in the we-know-what issue… I was taking a peek at the trailers for [[A Perfect Getaway]] (La escapada perfecta in Spain) on the FilmAffinity site (a site I really recommend), and look below at what I found:

Explanation: the above is a still image at 00:49 of the Spanish trailer (I guess it is the same for all of Europe, with or without voice translation, depending on the country; in Spain, the voice is dubbed into Spanish). The one below is a still image at 00:33 of the original (American) trailer. Notice any difference in Miss Jovovich’s costume? Exactly: the friggin’ film is rated R, and boasts plenty of murder and violence, but somehow the viewer is not allowed to take a look at a quite candid image of a butt. At least in the trailer.

Of course it doesn’t strike me as a big surprise. I was surprise-proofed the day I watched kid [[Son_Goku_(Dragon_Ball)|Son Goku]] have his genitalia covered by pious briefs in a scene of the American version of [[Dragon Ball]] where he is seen swimming naked (in the original Japanese version, and also in the versions aired in Spain).


Hardware compatibility is better with Windows… not

One of the (few, but legitimate) reasons given by some Windows users for not switching to Linux is that many pieces of hardware are not recognized by the latter. Sure enough, 99.9%, if not all, of the devices sold in shops are “Windows compatible”. The manufacturers of devices make damn sure their device, be it a pendrive or a printer, a computer screen or a keyboard, will work on any PC running Windows. They will even ship a CD with the drivers in the same package, so that installation of the device is as smooth as possible on Microsoft’s platform. Linux compatibility? Well, they usually just don’t care. Those hackers will make it work anyway, so why bother? And their market share is too small to take them into account.

Now, let’s pass to some personal experience with a webcam. I bought a webcam for my girlfriend’s laptop, which doesn’t have one integrated. The webcam was a cheap Logitech USB one, with “Designed for Skype” and “Windows compatible” written all around the box. It even came with a CD, marked prominently as “Windows drivers”. My girlfriend’s laptop runs Windows Vista, and I decided to give it a chance, and plugged the webcam in without further consideration. A message from our beloved OS informed me that a new device had been plugged in (brilliant!) but that Windows lacked the necessary drivers to make it work (bummer!). OK, no problem. We had the drivers, right? I unplugged the camera, inserted the CD, and followed the instructions to get the drivers installed. Everything went fine, except that the progress bar showing the installation percentage crawled on for more than 12 minutes (checked on my watch) before reaching 100%. After installation, Windows informed me that a system reboot was necessary, so I rebooted. After the reboot, the camera worked.

As I had my Asus Eee at hand, I decided to try the webcam on it. I plugged it in, and nothing seemed to happen; I just saw the green light on the camera turn on. Well, maybe it worked… I opened Cheese, a Linux program that shows the output of webcams. I was a bit wary, because the Eee has an integrated webcam, so maybe there would be some interference or something. Not so. Cheese immediately showed me the output of the webcam I had just plugged in, and offered me a menu with two entries (the USB webcam and the integrated one), so I could choose. That’s it. No CD with drivers, no 12-minute installation, no reboot, no nothing. Just plug and play.

Perhaps it is worth mentioning that the next time I tried to use the webcam on the Vista laptop, it would ask me for driver installation again! I don’t know why… I must have done something wrong in the first installation… With Windows, who knows?


ChopZip: a parallel implementation of arbitrary compression algorithms

Remember plzma.py? I made a wrapper script for running [[LZMA]] in parallel. The script could be readily generalized to use any compression algorithm, following the principle of breaking the file into parts (one per CPU), compressing the parts, and then [[tar (file format)|tarring]] them together. In other words: chop the file, zip the parts. Hence the name of the program that evolved from plzma.py: ChopZip.

Introduction

Currently ChopZip supports [[LZMA|lzma]], [[XZ Utils|xz]], [[gzip]] and lzip. Of them, lzip deserves a brief comment. It was brought to my attention by a reader of this blog. It is based on the LZMA algorithm, as are lzma and xz. Apparently, unlike them, multiple files compressed with lzip can be concatenated to form a single valid lzip-compressed file; uncompressing the latter yields the concatenation of the original files.

To illustrate the point, check the following shell action:

% echo hello > head
% echo bye > tail
% lzip head
% lzip tail
% cat head.lz tail.lz > all.lz
% lzip -d all.lz
% cat all
hello
bye

However, I just discovered that gzip, bzip2 and xz all do that already! It seems that lzma is advertised as being capable of it, but it doesn’t work for me. Sometimes it will uncompress the concatenated file to the original just fine, other times it will decompress only the first chunk of the set, and yet other times it will complain that the “data is corrupt” and refuse to uncompress. For that reason, chopzip supports two working modes: simple concatenation (gzip, lzip, xz) and tarring (lzma). The relevant mode is used transparently for the user.

Also, if you use Ubuntu, this bug will apply to you, making it impossible to have xz-utils, lzma and lzip installed at the same time.

The really nice thing about concatenability is that it allows for trivial parallelization of the compression, while maintaining compatibility with the serial compression tool, which can still uncompress the product of a parallel compression. Unfortunately, for non-concatenatable compression formats, the output of chopzip will be a tar file of the compressed chunks, making it impossible to uncompress with the original compressor alone (first an untar would be needed, then uncompressing each chunk, then concatenating the chunks; or just use chopzip to decompress).
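
To make the “trivial parallelization” concrete, here is a minimal hand-rolled sketch of the idea for a concatenatable format such as xz. This is only the principle that ChopZip automates, not its actual code; the chunk size and file names are arbitrary:

% split -b 100M bigfile chunk_               # chop the file into pieces
% for f in chunk_*; do xz "$f" & done; wait  # compress the pieces in parallel
% cat chunk_*.xz > bigfile.xz                # concatenate: still a valid .xz file
% xz -t bigfile.xz                           # plain serial xz can verify (or decompress) it

For the non-concatenatable lzma mode, the last step would instead be a tar cf of the compressed chunks, which is why only ChopZip itself (or a manual untar, decompress and cat) can undo it.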

The rationale behind plzma/chopzip is simple: multi-core computers are commonplace nowadays, but the most common compression programs still do not take advantage of this fact. At least the ones that I know and use don’t. There are at least two initiatives that tackle the issue, but I still think ChopZip has a niche to exploit. The most consolidated one is pbzip2 (which I mention in my plzma post). pbzip2 is great, if you want to use bzip2. It scales really nicely (almost linearly), and pbzipped files are valid bzip2 files. The main drawback is that it uses bzip2 as its compression method. bzip2 has always been the “extreme” brother of gzip: it compresses more, but it’s so slow that you would only resort to it if compression size is vital. LZMA-based programs (lzma, xz, lzip) are both faster and compress even more, so for me bzip2 is out of the equation.

A second contender in parallel compression is pxz. As its name suggests, it compresses using xz. Drawbacks? It’s not in the official repositories yet, and I couldn’t manage to compile it, even though it comprises a single C file and a Makefile. It also lacks the ability to use different encoders (which is not necessarily bad), and it’s a compiled program, versus ChopZip, which is a much more portable script.

Scalability benchmark

Anyway, let’s get into ChopZip. I have run a simple test with a moderately large file (a 374MB tar file of the whole /usr/bin dir). The table below shows the speedup results for running ChopZip on that file, using various numbers of chunks (and, consequently, threads). The tests were conducted on a 4GB RAM Intel Core 2 Quad Q8200 computer. Speedups are calculated as how many times faster a given number of chunks performed with respect to just 1 chunk. It is noteworthy that in every case running ChopZip with a single chunk is virtually identical in performance to running the original compressor directly. Also, decompression times (not shown) were identical, irrespective of the number of chunks. The ChopZip version was r18.

#chunks    xz       gzip     lzma     lzip
1          1.000    1.000    1.000    1.000
2          1.862    1.771    1.907    1.906
4          3.265    1.910    3.262    3.430
8          3.321    1.680    3.247    3.373
16         3.248    1.764    3.312    3.451

Note how increasing the number of chunks beyond the number of actual cores (4 in this case) can have a small benefit. This happens because N equal-sized chunks of a file will not all compress at the same speed, so the more chunks there are, the smaller the overall impact of the slowest-compressing ones.

Conclusion

ChopZip speeds up the compression of arbitrary files quite noticeably, and with arbitrary compressors. In the case of concatenatable compressors (see above), the resulting compressed file is an ordinary compressed file, apt to be decompressed with the regular compressor (xz, lzip, gzip) as well as with ChopZip. This makes ChopZip a valid alternative to them, with the added advantage of parallelization.


LWD – December 2009

This is a continuation post for my Linux World Domination project, started in this May 2008 post. You can read the previous post in the series here.

In the following data, T2D means “time to domination” (the expected time for the Windows/Linux shares to cross, counting from the present date), DT2D means the difference (increase/decrease) in T2D with respect to the last report, CLP means “current Linux percent” as given by the last logged data, DD means domination day (in YYYY-MM-DD format), and DCLP means the difference in CLP with respect to the last logged data. I have dropped the “Confidence” column, for it gave little or no info.

Project          T2D              DT2D   DD              CLP     DCLP
Einstein         already crossed  -      September 2009  51.35   +4.24
MalariaControl   >10 years        -      -               11.95   -0.32
POEM             83.4 months      -      2016-10-08      11.52   +0.69
PrimeGrid        >10 years        -      -               10.31   +0.46
Rosetta          >10 years        -      -                8.60   +0.10
QMC              >10 years        -      -                8.23   +0.15
SETI             >10 years        -      -                8.07   +0.05
Spinhenge        >10 years        -      -                4.37   +0.15

Except for the good news that Einstein@home has succumbed to the Linux hordes, the numbers (again) seem quite discouraging, but the data is what it is. All CLPs except MalariaControl’s have gone up (and that one has gone down less than in the previous report). The Linux tide seems unstoppable; its forward speed, however, is not necessarily high.

As promised, today I’m showing the plots for Rosetta@home; in the next issue, Spinhenge@home.

Number of hosts percent evolution for Rosetta@home

Accumulated credit percent evolution for Rosetta@home


Trivial use of md5sum

I just made use of the [[md5sum]] command in a rather simple situation that could have been more troublesome to handle by other means. The following scenario highlights, IMHO, how the [[command line]] greatly simplifies some tasks.

I have a file file.txt, and a collection of files file.txt.N, where N = 1, 2, 3… I know that the former is a copy of one of the latter, but I don’t know which. I could have run [[diff]] against all the possible matches, but I would have had to run it for every N until a match was found. However, md5sum comes to the rescue. I can just run:

% md5sum file.txt*

And check which file.txt.N has an MD5 signature equal to that of file.txt; that one is the match. This solution is still a bit annoying, because I have to visually search for matches among long strings. Not to worry! Unix is our friend once again. Recover the above command with a single press of the “up” arrow, then extend it a tiny bit:

% md5sum file.txt* | sort

Now, since the MD5 signatures are sorted, the match for our file.txt (if there is one) will appear right after the line for file.txt.
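
If even that visual check feels like too much work, the match can be fished out directly. The following one-liner is just a sketch along the same lines (grep and cut are standard tools; the exact incantation is only one of many possible):

% md5sum file.txt.* | grep "$(md5sum file.txt | cut -d' ' -f1)"

It computes the signature of file.txt and prints only the file.txt.N lines that share it.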

I challenge the reader to accomplish the same task as readily, comfortably and successfully in Windows or Mac, or in Linux without the command line.


First impressions with Arch Linux

I have been considering for some time trying a Linux distro that would be a little faster than [[Ubuntu (operating system)|Ubuntu]]. I made the switch from [[Debian]] to Ubuntu some time ago, and I must say that I am very pleased with it, despite it being a bit bloated and slow. Ubuntu is really user-friendly. This term is often despised among geeks, but it does have huge value. Oftentimes a distro will disguise poor dependency handling, lack of package tuning and an absence of wise defaults as not having “fallen” for user-friendliness and “letting the user do whatever she feels like”.

However comfortable Ubuntu might be, my inner geek wanted to get his hands a little bit dirtier with configurations, and obtain a more responsive OS in return. And that’s where [[Arch Linux]] fits in. Arch Linux is regarded as one of the fastest Linux distros, at least among the ones based on [[executable|binary]] [[Software package (installation)|packages]] rather than [[source code]]. Is this fame deserved? Well, in my short experience, it seems to be.

First off, let us clarify what one means by a “faster” Linux distro. There are, as I see it, broadly speaking, three things that can be faster or slower in a user’s interaction with a computer. The first, and a very often cited one, is the [[booting|boot]] (and shutdown) time. Any period of time between a user deciding to use the computer and being able to do so is wasted time (from the user’s point of view). Many computers stay on for long periods of time, but for a home user, short booting times are a must. A second speed-related item would be the startup time of applications. Booting would be a sub-section of this, if we considered the OS/[[kernel (computing)|kernel]] an “app”, but I refer here to user apps such as an e-mail client or text editor. Granted, most start within seconds at most, many below one second or apparently “instantly”, but some others are renowned for their sluggishness ([[OpenOffice.org]], [[Firefox]] and [[Amarok (software)|Amarok]] come to mind). Even the not-very-slow apps that take a few seconds can become irritating if used with some frequency. The third speed-related item would be the execution of long-running CPU-intensive software, such as audio/video coding or scientific computation.

Of the three issues mentioned, it should be made clear that the third one (execution of CPU-intensive tasks) is seldom affected at all by the “speed” of the OS. Or it shouldn’t be. Of course, having the latest versions of the libraries used by the CPU-intensive software should make a difference, but I doubt that encoding a video with [[MEncoder]] is any faster in [[Gentoo Linux|Gentoo]] than in Ubuntu (for the same version of mencoder and libraries). However, the first two (booting and app startup) do differ from OS to OS.

Booting

I did some timings in Ubuntu and Arch, both on the same (dual boot) machine. I measured the time from [[GRUB]] to [[GNOME Display Manager|GDM]], and then the time from GDM to a working [[desktop environment]] ([[GNOME]] in both). The exact figures might not be that meaningful, as some details could differ from one installation to the other (a different choice of firewall, or (minimally) different autostarted apps in the DE). But the big numbers are significant: where Ubuntu takes slightly under 1 minute to reach GDM, and around half a minute more to go from GDM to GNOME, Arch takes under 20 seconds and 10 seconds, respectively.

App start up

Of the three applications mentioned, OpenOffice.org and Firefox start faster in Arch than in Ubuntu. I wrote down the numbers, but don’t have them at hand now. Amarok, on the other hand, took equally long to start (some infamous 35 seconds) in both OSs. It is worth mentioning that all of them start up faster the second and successive times, and that the Ubuntu/Arch difference between second starts is correspondingly smaller (because both are fast). Still, Arch is a bit faster (except for Amarok).

ABS, or custom compilation

But the benefits of Arch don’t end with a faster boot, or a more responsive desktop (which it has). Arch Linux makes it really easy to compile and install any custom package the user wants, and I decided to take advantage of that. With Debian/Ubuntu, you can download the source code of a package quite easily, but the compilation is more or less left to you, and the installation is different from that of an “official” package. With Arch, generating a package from source is quite easy, and installing it with [[Pacman (package manager)|Pacman]] is trivial. For more info, refer to the Arch wiki entry for ABS.
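
Roughly, the workflow I followed looked like the sketch below. The paths and package name are just an example of the classic abs/makepkg procedure; the Arch wiki is the authoritative reference:

% abs                                    # sync the ABS tree (run as root)
% cp -r /var/abs/extra/mplayer ~/builds  # grab the build recipe for the package
% cd ~/builds/mplayer
% $EDITOR PKGBUILD                       # optionally tweak CFLAGS or configure options
% makepkg -s                             # build; -s pulls missing build deps via pacman
% pacman -U mplayer-*.pkg.tar.gz         # install the resulting package (run as root)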

I first compiled MEncoder (inside the mplayer package), and found out that the compiled version made no difference with respect to the stock binary package. I should have known that, because I say so in this very post, don’t I? However, one always thinks that he can compile a package “better”, so I tried it (and failed to get any improvement).

On the other hand, when I recompiled Amarok, I did get a huge boost in speed. A simple custom compilation produced an Amarok that took only 15 seconds to start up, less than half of the vanilla binary distributed with Arch (I measured the 15 seconds even after rebooting, which rules out any “second time is faster” effect).

Is it hard to use?

Leaving the speed issue aside, one of the possible drawbacks of a geekier Linux distro is that it can be harder to use. Arch is, indeed, but not by much. A seasoned Linux user should hardly find any difficulty installing and configuring Arch. It is certainly not for beginners, but it is not super-hard either.

One of the few gripes I have with it regards the installation of a graphical environment. As it turns out, installing a DE such as GNOME does not trigger the installation of any [[X Window System]], such as [[X.org Server]], as dependencies are set only for really vital things. Well, that’s not too bad; Arch is not assuming I want something until I tell it I do. Fine. Then, when I did install Xorg, the tools for configuring it were a bit lacking. I am so spoiled by the automagic configuration in Ubuntu, where you are presented with a full-fledged desktop with almost no decision on your side, that I miss a magic script that will make X “just work”. Anyway, I can live with that. But something that made me feel like giving up was that after following all the instructions in the really nice Arch Wiki, I was unable to start X (it would start as a black screen, then freeze, and I could only get out by rebooting the computer). The problem was that I have an [[Nvidia]] graphics card, and I needed the (proprietary) drivers. OK, of course I need them, but the default vesa driver should work as well!! In Ubuntu one can get a lower-resolution, non-3D desktop with the default vesa driver, and the proprietary Nvidia drivers then allow for more eye-candy and fanciness. But not in Arch. When I decided to skip the test with vesa and download the proprietary drivers, the X server started without any problem.
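
For the record, getting the proprietary driver in place boiled down to something like the following. This is just a sketch of how I remember it, with the package and tool names the Arch repos used at the time:

% pacman -S nvidia       # proprietary Nvidia driver and kernel module (run as root)
% nvidia-xconfig         # writes a Device section with Driver "nvidia" into xorg.conf

After that (and restarting X), the desktop came up normally.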

Conclusions

I am quite happy with Arch so far. Yes, one has to work around some rough edges, but that is a nice experience as well, because one learns more than with other, overly user-friendly distros. I think Arch Linux is a very nice distro that is worth using, and I recommend it to any Linux user willing to learn and get their hands dirty.


No market for Linux games? The Koonsolo case

I’ve read via Phoronix about the case of the indie PC game producer Koonsolo, which sells a game for Windows, Mac and Linux. The interesting thing is that, as you can read on Koonsolo’s blog, the Linux version is selling in larger numbers than the Windows one!

Apparently 40% of the visitors to the Koonsolo site use Windows, versus less than 23% for Linux. However, despite Windows users being the largest group of visitors (there are even more Mac visitors than Linux ones), the Linux version accounts for 34% of total sales, whereas Windows sales are only 23%. Visit the site for some more numbers and comments.

