Please, choose the right format to send me that text. Thanks.

I just received an e-mail with a very interesting text (recipes for [[Pincho|pintxos]]), and it prompted an experiment. The issue is that the text was inside a [[DOC (computing)|DOC]] file (of course!), which raises some questions and concerns on my side. The size of the file was 471 kB.

I thought that one could make the document more portable by exporting it to [[PDF]] (using [[OpenOffice.org]]). Doing so, the resulting file has a size of 364 kB (1.29 times smaller than the original DOC).

Furthermore, text formatting could be dropped altogether by using a [[plain text]] format. Copying and pasting the contents of the DOC into a TXT file yielded a 186 kB file (2.53x smaller).

Once in the mood, we can go one step further and compress the TXT file: with [[gzip]] we get a 51 kB file (9.24x), and with [[xz]] a 42 kB one (11.2x).
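To check such numbers on your own files, here is a minimal sketch. It assumes gzip and xz are installed; the helper name size_of_output and the file name recipes.txt are hypothetical, not something taken from the original experiment.

```shell
# Print how many bytes a command writes to its standard output.
# Usage: size_of_output COMMAND [ARGS...]
size_of_output() {
    # wc -c counts bytes; tr strips the padding some wc versions add.
    "$@" | wc -c | tr -d ' '
}
```

With that defined, the plain, gzipped and xz-compressed sizes can be compared without leaving any compressed files on disk:

% size_of_output cat recipes.txt
% size_of_output gzip -c recipes.txt
% size_of_output xz -c recipes.txt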

So far, so good. No surprise. The surprise came when, just for fun, I exported the DOC to [[OpenDocument|ODT]]. I obtained a document equivalent to the original one, but with a 75 kB size! (6.28x smaller than the DOC).
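The “times smaller” figures quoted above are simply the DOC size divided by the size of each alternative. A small sketch of that arithmetic, using the sizes reported in this post (the function names times_smaller and report are made up for the example, and awk is assumed to be available):

```shell
# How many times smaller a file of size $2 is than one of size $1
# (both in the same units), rounded to two decimals.
times_smaller() {
    awk -v orig="$1" -v new="$2" 'BEGIN { printf "%.2f\n", orig / new }'
}

# Sizes in kB as reported in the post; the 471 kB DOC is the baseline.
report() {
    for pair in PDF:364 TXT:186 TXT.gz:51 TXT.xz:42 ODT:75; do
        name=${pair%%:*}    # text before the colon
        size=${pair##*:}    # text after the colon
        printf '%-7s %sx\n' "$name" "$(times_smaller 471 "$size")"
    done
}
```

Calling report prints one line per format; the results should agree with the ratios given in the text, up to rounding.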

So, to summarize:

DOC

Pros

  • Editable.
  • Allows for text formatting.

Cons

  • Proprietary. In principle only MS Office can open it. OpenOffice.org can too, but only thanks to reverse engineering.
  • If opened with OpenOffice.org, or just a different version of MS Office, the reader cannot be sure of seeing the same formatting the writer intended.
  • Size. 6 times bigger than ODT. Even bigger than PDF.
  • MS invented and owns it. You need more reasons?

PDF

Pros

  • Portability. You can open it in any OS (Windows, Linux, Mac, BSD…), on account of there being so many free PDF readers.
  • Smaller than the DOC.
  • Allows for text formatting, and the formatting the reader sees will be exactly what the writer intended.

Cons

  • Not editable (I really don’t see the point in editing PDFs. For me the PDF is a product of an underlying format (e.g. LaTeX), as what you see on your browser is the product of some HTML/PHP, or an exe is the product of some source code. But I digress.)
  • Could be smaller

TXT

Pros

  • Portability. You can’t get much more portable than a plain text file. You can edit it anywhere, with your favorite text editor.
  • Size. You can’t get much smaller than a plain text file (as it contains the mere text content), and you can compress it further with ease.

Cons

  • Formatting. If you need text formatting, or want to include pictures or other non-text content, then plain text is not for you.

ODT

Pros

  • Portability. It can be edited with OpenOffice.org (and probably others), which is [[free software]], and has versions for Windows, Linux, and Mac.
  • Editability. Every bit as editable as DOC.
  • Size. 6 times smaller files than DOC.
  • It’s a free standard, not some proprietary rubbish.

Cons

  • None I can think of.

So please, if you send me some text, first consider whether plain text will suffice. If not, and no editing is intended on my side, PDF is fine. If editing is important (or size, because it’s smaller than PDF), then ODT is the way to go.


Avoiding time_increment_bits problem when encoding bad header MPEG4 videos to Ogg Theora

There is some debate going on lately about the migration of YouTube to [[HTML5]], and whether they (i.e. YouTube’s owner, Google) should support [[H.264]] or [[Theora]] as standard codecs for the upcoming <video> tag. See, for example, how the FSF asks for support for Theora.

The thing is, I discovered [[x264]] not so long ago, and I thought it was a “free version” of H.264. I began using it to reencode the medium-to-low quality videos I keep (e.g., movies and series). The resulting quality/file size ratio stunned me. I could reencode most material downloaded from e.g. p2p sources to 2/3 of its size, keeping the copy indistinguishable from the original to the naked eye.

However, after realizing that x264 is just a free implementation of the proprietary H.264 codec, and in the wake of the H.264/Theora debate, I decided to give Ogg Theora a go. I expected a fair competitor to H.264, although still noticeably behind in quality/size ratio. And that is what I found. I for one do not care if I need a 10% larger file to attain the same quality, if it means using free formats, so I decided to adopt Theora for everyday reencoding.

After three paragraphs of introduction, let’s get to the point: when reencoding some files with [[ffmpeg2theora]], I would get the following error:

% ffmpeg2theora -i example_video.avi -o output.ogg
[avi @ 0x22b7560]Something went wrong during header parsing, I will ignore it and try to continue anyway.
[NULL @ 0x22b87f0]hmm, seems the headers are not complete, trying to guess time_increment_bits
[NULL @ 0x22b87f0]my guess is 15 bits ;)
[NULL @ 0x22b87f0]looks like this file was encoded with (divx4/(old)xvid/opendivx) -> forcing low_delay flag
Input #0, avi, from 'example_video.avi':
  Metadata:
    Title           : example_video.avi
  Duration: 00:44:46.18, start: 0.000000, bitrate: 1093 kb/s
    Stream #0.0: Video: mpeg4, yuv420p, 624x464, 23.98 tbr, 23.98 tbn, 23.98 tbc
    Stream #0.1: Audio: mp3, 48000 Hz, 2 channels, s16, 32 kb/s
  .

[mpeg4 @ 0x22b87f0]hmm, seems the headers are not complete, trying to guess time_increment_bits
[mpeg4 @ 0x22b87f0]my guess is 16 bits ;)
[mpeg4 @ 0x22b87f0]hmm, seems the headers are not complete, trying to guess time_increment_bits
[mpeg4 @ 0x22b87f0]my guess is 16 bits ;)
[mpeg4 @ 0x22b87f0]looks like this file was encoded with (divx4/(old)xvid/opendivx) -> forcing low_delay flag
    Last message repeated 1 times
[mpeg4 @ 0x22b87f0]warning: first frame is no keyframe

I searched the web for solutions, but to no avail. Usually pasting literal errors into Google yields good results, but in this case I only found developer forums where this bug was discussed. What I did not find was simple instructions on how to avoid it in practice.

Well, here goes my simple solution: pass the file through [[MEncoder]] first. Where the following fails:

% ffmpeg2theora -i input.avi -o output.ogg

the following succeeds:

% mencoder input.avi -ovc copy -oac copy -o filtered.avi
% ffmpeg2theora -i filtered.avi -o output.ogg

I guess that what happens is basically that mencoder takes the “raw” video data in input.avi and makes a copy into filtered.avi (which ends up being exactly the same video), building sane headers in the process.
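The two-step workaround can be wrapped in a small shell function. This is only a sketch: the function names run and avi_to_theora, the DRY_RUN switch and the .filtered.avi naming are invented for illustration, while the mencoder and ffmpeg2theora invocations are the ones shown above.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy the streams of an AVI into a fresh container with MEncoder
# (no reencoding), then feed the copy to ffmpeg2theora.
# Usage: avi_to_theora INPUT.avi OUTPUT.ogg
avi_to_theora() {
    input=$1
    output=$2
    filtered="${input%.avi}.filtered.avi"   # intermediate file, name is arbitrary

    run mencoder "$input" -ovc copy -oac copy -o "$filtered" &&
    run ffmpeg2theora -i "$filtered" -o "$output"
}
```

Setting DRY_RUN=1 prints the two commands instead of running them, which is handy for checking what would be executed. Keep in mind that the copy step writes a second AVI of roughly the same size as the original, so it needs that much free disk space.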


Accessing Linux ext2/ext3 partitions from MS Windows

Accessing both Windows [[File Allocation Table|FAT]] and [[NTFS]] file systems from Linux is quite easy, with tools like [[NTFS-3G]]. However (in keeping with the [[shit|MS]] tradition of making itself incompatible with everything else, to thwart competition), doing the opposite (accessing Linux file systems from Windows) is more complicated. One has to wonder why (and how!) [[closed source software|closed]], [[proprietary software|proprietary]] and technically inferior file systems can be read by free software tools, whereas proprietary software with such a big corporation behind it is unable (or unwilling) to interact with superior, [[free software]] file systems. Why should Windows users be deprived of the choice of [[JFS (file system)|JFS]], [[XFS]] or [[ReiserFS]], when they are free? Are MS techs too dumb to implement them? Or too evil to give their users the choice? Or, maybe, too scared that if choice is possible, their users will dump NTFS? None of these explanations makes one feel much love for MS, does it?

This stupid inability of Windows to read any of the many formats Linux can use causes problems not only for Windows users, but also for Linux users. For example, when I format my external hard disks or pendrives, I end up wondering if I should reserve some space for a FAT partition, where I could put data to share with hypothetical Windows users I might lend the disk to. And, seriously, I abhor wasting my hardware on such lousy file systems, when I could use Linux ones.

Anyway, there are some third-party tools to help us with such a task. I found at least two:

  • Ext2IFS
  • ext2fsd

I have used the first one, but as some blogs point out (e.g. BloggUccio), ext2fsd is required if the [[inode]] size is bigger than 128 B (256 B in some modern Linux distros).

Getting Ext2IFS

It is a simple exe file you can download from fs-driver.org. Installing it consists of the typical Windows next-next-finish click-dance. In principle the defaults are OK. It will ask you about activating “read-only” mode (which I declined: it’s less safe, but I would like to be able to write too), and something about large file support (which I accepted, because it’s only an issue with Linux kernels older than 2.2… Middle Ages stuff).

Formatting the hard drive

In principle, Ext2IFS can read ext2/ext3 partitions with no problem. In practice, if the partition was created with an [[inode]] size of more than 128 bytes, Ext2IFS won’t read it. To create a “compatible” partition, you can mkfs it with the -I flag, as follows:

# mkfs.ext3 -I 128 /dev/whatever

I found out about the 128 B inode thing from this forum thread [es].
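To find out whether an existing partition already meets this requirement, one can look at the superblock information printed by tune2fs -l (part of e2fsprogs), which includes an “Inode size” line. A minimal sketch, where the function names inode_size and ext2ifs_compatible are hypothetical helpers, and the 128-byte limit is the one mentioned above:

```shell
# Read `tune2fs -l` output on standard input and print the inode size in bytes.
inode_size() {
    awk -F: '/^Inode size:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed only if the inode size read from standard input is 128 bytes,
# the value this post says Ext2IFS can handle.
ext2ifs_compatible() {
    [ "$(inode_size)" = "128" ]
}
```

Typical use, as root and with the device name as a placeholder:

# tune2fs -l /dev/whatever | inode_size
# tune2fs -l /dev/whatever | ext2ifs_compatible && echo "Ext2IFS should read it"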

Practical use

What I have done, and tested, is the following: I format my external drives with almost all of the space as ext3, as described, leaving a couple of gigabytes (you could cut down to a couple of megabytes if you really want to) for a FAT partition. Then I copy the Ext2IFS_1_11a.exe executable to that partition.

Whenever you want to use that drive, Linux will see two partitions (the ext3 and the FAT one), the second one of which you can ignore. From Windows, you will see only a 2GB FAT partition. However, you will be able to open it, find the exe, double-click, and install Ext2IFS. After that, you can unplug the drive and plug it again…et voilà, you will see the ext3 partition just fine.


Microsoft and MP3 patents

I read in the Diario Vasco newspaper (online article [es]) that Microsoft was recently sued by Alcatel-Lucent over infringement of some MP3 patents, and ordered to pay around 1,100 million euros. MS claims that they already paid the Fraunhofer Society $16M for these very rights.

All this rubbish is typical of Microsoft and their obsolete proprietary model. They are the dinosaurs of the 21st century. MS could include Ogg Vorbis support in their music player(s), and forget about patent issues, since Ogg Vorbis is open and free. However, all my friends who use MS Windows complain if I share music in Vorbis format with them, and I am forced to convert it to MP3 (actually MPEG Layer 2, i.e. MP2, not Layer 3, which is the patented one) if I want them to listen to it.

The choice of MP3 is an unfortunate one, because it traps MS in a legal nightmare of patents and licenses, yet they’d rather face it than switch to something that “stinks” of freedom. One more example of the absurd ways of the Redmond smartasses.


My public and open University II (es)

A copy-paste of an e-mail I received from the UPV/EHU, and the reply I sent. For additional information, read the previous post (en).

Dear XXXX,

It is not true that I am unable to submit the Teseo form in electronic format. In fact, I have already sent it to you in RTF, in DOC, in plain text and as a scanned PDF. The Regulations, if I am not mistaken, state that the form must be filled in and submitted, not that this must be done in one particular format (RTF).

The problem is that RTF (the only format you make available) is a PROPRIETARY format, which is read correctly only by NON-FREE programs such as Microsoft Word under Microsoft Windows. I have done everything I could to read that RTF correctly with free software (OpenOffice), and the (regrettable) result is what I sent to YYYY (doc17_Teseo.rtf) on December 19.

Although I am willing to show good will, I do not wish to use proprietary products, with abusive licenses and high prices, to access material from a PUBLIC University, funded with PUBLIC money from all taxpayers. For exchanging files like this one there is already an open and free standard (ISO/IEC 26300), the Open Document Format:

http://es.wikipedia.org/wiki/OpenDocument

You cause me tremendous inconvenience by preventing me from freely accessing the material of the University where I have been studying for years, and which my fellow citizens and I pay for with our taxes.

Giving in to your pressure, I got access to a computer running Windows and generated the RTF I am sending you. I note with dismay that not even under Windows is the file formatted correctly, and some items come out misplaced.

I regret this situation, and I reiterate my interest in making things as convenient as possible, for you as well. But I also reiterate that I am not willing to give up my rights, as you will understand.

Finally, I would like you to give me an e-mail address where I can complain about this regrettable attitude of the UPV/EHU, since if people put up with it and nobody complains (as you rightly point out), it seems there is no problem, and the problem will persist.

Sincerely,

Iñaki Silanes

On Monday 08 January 2007 10:03, you wrote:
> Dear Iñaki,
>
> This Master's and Doctorate Section has received an e-mail in which
> you state that you are unable to submit the corresponding Teseo form
> in an electronic format. I regret to inform you that this is
> required, as stated in article 51 of the Doctorate Management
> Regulations. So far, no complaints have been received about it being
> impossible to fill in the Teseo form, either from UPV/EHU staff or
> from students from outside the university, and the form has been
> received in every case. I therefore ask you to submit this
> document.
>
> Sincerely,
>
> XXXX
> Master's and Doctorate Section


French National Assembly Embraces Open Source

Finishing my Ph.D. Thesis kept me out of touch with the news, so here goes, with 2 months’ delay: The French National Assembly switched to GNU/Linux. I read about it on Menéame.net, which referred to Barrapunto, which referred to Slashdot, which referred to PC Advisor, which covers the news (the usual news cycle for the Spanish audience, which mostly reads Menéame.net).

The French Government also said that all Government documents should be available in ODF. This nicely relates to the shameful case of the UPV/EHU (my University).

The Spanish Congress, meanwhile, is still hooked to Windows XP.
