Avoiding time_increment_bits problem when encoding bad header MPEG4 videos to Ogg Theora

There is some debate going on lately about the migration of YouTube to [[HTML5]], and whether they (i.e. YouTube’s owner, Google) should support [[H.264]] or [[Theora]] as standard codecs for the upcoming <video> tag. See, for example, how the FSF asks for support for Theora.

The thing is, I discovered [[x264]] not so long ago, and I thought it was a “free version” of H.264. I began using it to reencode the medium-to-low quality videos I keep (e.g., movies and series). The resulting quality/file size ratio stunned me. I could reencode most material downloaded from e.g. p2p sources to 2/3 of their size, keeping the copy indistinguishable from the original with the bare eye.

However, after realizing that x264 is just a free implementation of the proprietary H.264 codec, and in the wake of the H.264/Theora debate, I decided to give Ogg Theora a go. I expected a fair competitor to H.264, although still noticeably behind in quality/size ratio. And that I found. I for one do not care if I need a 10% larger file to attain the same quality, if it means using free formats, so I decided to adopt Theora for everyday reencoding.

After three paragraphs of introduction, let’s get to the point. Which is that reencoding some files with [[ffmpeg2theora]] I would get the following error:

% ffmpeg2theora -i example_video.avi -o output.ogg
[avi @ 0x22b7560]Something went wrong during header parsing, I will ignore it and try to continue anyway.
[NULL @ 0x22b87f0]hmm, seems the headers are not complete, trying to guess time_increment_bits
[NULL @ 0x22b87f0]my guess is 15 bits ;)
[NULL @ 0x22b87f0]looks like this file was encoded with (divx4/(old)xvid/opendivx) -> forcing low_delay flag
Input #0, avi, from 'example_video.avi':
  Metadata:
    Title           : example_video.avi
  Duration: 00:44:46.18, start: 0.000000, bitrate: 1093 kb/s
    Stream #0.0: Video: mpeg4, yuv420p, 624x464, 23.98 tbr, 23.98 tbn, 23.98 tbc
    Stream #0.1: Audio: mp3, 48000 Hz, 2 channels, s16, 32 kb/s
  .

[mpeg4 @ 0x22b87f0]hmm, seems the headers are not complete, trying to guess time_increment_bits
[mpeg4 @ 0x22b87f0]my guess is 16 bits ;)
[mpeg4 @ 0x22b87f0]hmm, seems the headers are not complete, trying to guess time_increment_bits
[mpeg4 @ 0x22b87f0]my guess is 16 bits ;)
[mpeg4 @ 0x22b87f0]looks like this file was encoded with (divx4/(old)xvid/opendivx) -> forcing low_delay flag
    Last message repeated 1 times
[mpeg4 @ 0x22b87f0]warning: first frame is no keyframe

I searched the web for solutions, but to no avail. Usually pasting literal errors in Google yields good results, but in this case I only found developer forums where this bug was discussed. What I haven’t found is simple instructions on how to avoid it in practice.

Well, here it goes my simple solution: pass it through [[MEncoder]] first. Where the following fails:

% ffmpeg2theora -i input.avi -o output.ogg

the following succeeds:

% mencoder input.avi -ovc copy -oac copy -o filtered.avi
% ffmpeg2theora -i filtered.avi -o output.ogg

I guess that what happens is basically that mencoder takes the “raw” video data in input.avi and makes a copy into filtered.avi (which ends up being exactly the same video), building sane headers in the process.

Comments (3)

MediaWiki: URL beautification HowTo

The default [[MediaWiki]] installation will leave you with [[URL]]s of the type:

http://mywiki.site.tld/wiki/wiki/index.php?title=Article_name

This is ugly! Following instructions at the MediaWiki.org site, you can make it simpler and nicer:

http://mywiki.site.tld/wiki/Article_name

To achieve that, add the following to /etc/apache2/httpd.conf:

AcceptPathInfo On
Alias /wiki /usr/share/mediawiki/index.php

Then add/modify the following at /var/lib/mediawiki/LocalSettings.php (again, Debian default path):

$wgScriptPath = '/w'; # Path to the actual files. This should already be there
$wgArticlePath = '/wiki/$1'; # This directory MUST be different from $wgScriptPath
$wgUsePathInfo = true;

Recall that you must have two “directories”, which in the example above are /w and /wiki. The former is “real” and the latter is “virtual”.

The real dir (the one used as value for $wgScriptPath) must contain the MediaWiki files, thus it must point to the /usr/share/mediawiki dir. To this end, it must either exist in the [[Apache HTTP Server|Apache]] root (usually /var/www/), or be an alias. If you follow the first route, you can make a link, like in the following example:

% ln -s /usr/share/mediawiki /var/www/w

The second route would imply adding this line to /etc/apache2/httpd.conf:

Alias /w /usr/share/mediawiki

The latter requires restarting the Apache daemon, but I personally prefer it.

The virtual dir (the one used as value for $wgArticlePath) will be our path to get rid of the URL ugliness, and point directly to an article’s title. As such, it must be aliased in /etc/apache2/httpd.conf adding the following line to it, as mentioned above:

Alias /wiki /usr/share/mediawiki/index.php

Finally, you shold enable the rewrite PHP module, if it’s not enable already, and reload Apache:

% cd /etc/apache2/mods-enable/
% ln -s ../mods-available/rewrite.load .
% /etc/init.d/apache2 reload

After that, pointing to website/wiki/somearticle should lead you to the wiki page for somearticle. For more information, refer to the MediaWiki.org site.

Comments (2)

SpyPig: another annoyance against your privacy

I’ve read in a post in Genbeta [es], about a “service” for e-mail senders called SpyPig. It basically boils down to sending a notification to the sender of an e-mail, when the recipient opens it. This way, the recipient can not say that she hasn’t read it.

I will deal with two issues: moral and technological. Morally, I think this kind of things suck. I have received these e-mails asking for confirmation of having been read, and I never found appealing to answer. But at least you were asked politely. What these pigs SpyPigs do is provide a sneaky way of doing it without the recipient knowing. Would you consider someone doing it on you a friend? Not me.

Now, technologically, the system is more than simple, and anyone with access to a web server could do it. The idea is that the sender writes the e-mail in HTML mode, and inserts a picture (can be a blank image) hosted at some SpyPig server. When the recipient opens the HTML message, the image is loaded from the server, and the logs of the server will reflect when the image was loaded, and hence the e-mail opened. When this happens, the server notifies the sender.

The bottom line of this story is that HTML IS BAD for e-mails. My e-mail readers never allow displaying HTML messages, and show me the source HTML code instead (of course, I can allow HTML, but why would I?). So this SpyPig thing will never work for against me. And this SpyPig story is just one more reason not to allow displaying HTML in the messages you read. Of course, for the e-mails you send, consider sending them in plain text. Your recipients will be a bit happier.

For more tips on what NOT to do on web/e-mail issues, check the e-mail/web tips section in this blog.

Comments (5)

Tips for creating Web content

From my own site at ehu.es/isilanes.

Comply with the standards

Much like in spoken languages, Internet information exchange requires a common language, understood by everyone. In this case, our browsers will be the ones making the translation from that language (HTML) into images, colors and human-readable text. Much like spoken languages, there is an “Academy” taking care of what is and what is not correct. In this case, the academy is the World Wide Web Consortium (W3C).

Much like in spoken languages, HTML evolves and changes, but when changes are not incorporated in the standards, misunderstandings happen. To assure a Web page is correctly displayed by any browser, first standards are encouraged (I wish they could be enforced), then standard-compliant browsers are made. The Web should not be designed for a specific browser, rather browsers should be made to comply with openly known standards. For a discussion on that, jump to the Viewable with any browser site.

If you want to make your site as widely viewable as possible, and keep your visitors happy, you should really follow the W3C standards, because this way you really know that any standard-complying browser will display the page correctly. Is making a standard-compliant page difficult? Not really. First, you could follow the Accessible design guide at the Viewable with any browser site. Then, learning some HTML programming could help. Finally, you are encouraged to put a “W3C correct HTML” button on the product page, as you can see I have done on my ehu.es/isilanes page (orange buttons on the left hand side above). This page, for example, has been correctly coded in HTML, and its CSS is also correct, as you can test clicking the aforementioned buttons.

The code for the HTML-correctness verification:

<a href="http://validator.w3.org/check?uri=referer"><img
    src="http://www.w3.org/Icons/valid-html401"
    alt="Valid HTML 4.01 Transitional" height="31" width="88"></a>

The CSS button:

<a href="http://jigsaw.w3.org/css-validator/">
  <img style="border:0;width:88px;height:31px"
         src="http://jigsaw.w3.org/css-validator/images/vcss"
         alt="Valid CSS!"></a>

Recall that you can put the above buttons in your pages at early stages of page creation (when they are still incorrect), and use them yourself to see if what you have done so far is W3C-compliant. The resulting validation page (saying “OK” or “Not OK”), usually explains the errors you might have done rather understandably, and help in fixing them.

Minimize the size

Every time a web page is visited, the client (the browser of the visiting person) has to download the contents of the page to be able to display them. The more fancy pictures there are in your page, the longer it’ll take to load. The more crap that the client has to download, the slower the visiting experience for that person. If a page takes too long to load in your browser, what do you do? Exactly, you quit and go somewhere else. Human attention time span is short, and more so in the Internet, so don’t ask your visitors for the patience you didn’t have when making the page in first place.

Avoid Java, JavaScript and Flash

The problem here is dual: size and accessibility. The problem of the size is summarized in the previous section.

The issue with the accessibility is related to the fact that you should ask your visitors for as few resources as possible to view your content. If they need to get and install some fancy software to access your functionality, then this fact might discourage them and make them go away.

Remember always that it is your task to make it easy to access your content, not the visitors’ to find the suitable tools for that. Use JS or Flash if you absolutely need to, but only if you absolutely need to. Usually a clever use of plain HTML resources will give satisfactory results, and will be much more visitor-friendly.

Avoid proprietary formats

Innocuous as they might look, formats like MP3 and GIF are patent-encumbered, which means that their use should/could be restricted by the patent holders. Also, using them forces the browser (in the case of GIFs) and MP3 player makers to comply with restrictive patent requirements (like e.g. paying royalties).

The best way to eliminate software patents is to dump patented material altogether. Use Ogg Vorbis format to encode your music, and substitute your GIFs for PNGs. Recall that patented software is illegal to modify, improve or have security holes/bugs patched by third parties, without permission of patent holders, which makes the openly developed formats evolve much faster, and eventually become better.

For specific reasons to dump MP3s and GIFs, see the Wikipedia pages for PNG and Ogg Vorbis. In short: CompuServe developed the GIF format without knowing that the LZW compression algorithm it used was patented (by Unisys). Later, after GIF became popular, Unisys announced that they’d start enforcing the patent (charge royalties) to commercial programs capable of displaying GIFs. Something similar happened to MP3 and the Fraunhofer Society.

With Open formats you’ll never have any such problem, and the visitors to your page will never have to pay royalties for programs capable of displaying the contents of your site.

Comments