Please, choose the right format to send me that text. Thanks.

I just received an e-mail with a very interesting text (recipes for [[Pincho|pintxos]]), and it prompted some experimenting. The issue is that the text was inside a [[DOC (computing)|DOC]] file (of course!), which raises some questions and concerns on my side. The size of the file was 471 kB.

I thought that one could make the document more portable by exporting it to [[PDF]] (using [[OpenOffice.org]]). Doing so, the resulting file has a size of 364 kB (1.29 times smaller than the original DOC).

Furthermore, text formatting could be waived, by using a [[plain text]] format. A copy/paste of the contents of the DOC into a TXT file yielded a 186 kB file (2.53x smaller).

Once in the mood, we can go one step further and compress the TXT file: with [[gzip]] we get a 51 kB file (9.24x smaller), and with [[xz]] a 42 kB one (11.2x smaller).
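
The compression step can be sketched in a few lines. This is only an illustration, assuming Python's standard gzip and lzma modules instead of the command-line tools; the sample text is a made-up stand-in, so the sizes it prints will not match the figures quoted in this post.

```python
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of the data raw, gzip-compressed and xz-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        # lzma.compress() writes the .xz container format by default.
        "xz": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    # Hypothetical sample text, not the real recipe book. Highly repetitive
    # input like this compresses far better than real prose does.
    sample = ("Pintxo de tortilla: eggs, potatoes, onion, olive oil, salt.\n" * 2000).encode("utf-8")
    for name, size in compressed_sizes(sample).items():
        print(f"{name:>5}: {size} bytes")
```

To try it on a real file, read it with `open(path, "rb").read()` and pass the bytes to `compressed_sizes`.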

So far, so good. No surprise. The surprise came when, just for fun, I exported the DOC to [[OpenDocument|ODT]]. I obtained a document equivalent to the original one, but with a 75 kB size! (6.28x smaller than the DOC).
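
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the "times smaller" figures from the sizes reported in this post. The function and variable names are just illustrative choices.

```python
# File sizes in kB, as reported in this post.
SIZES_KB = {
    "DOC": 471,
    "PDF": 364,
    "TXT": 186,
    "TXT + gzip": 51,
    "TXT + xz": 42,
    "ODT": 75,
}


def ratios(sizes: dict, reference: str = "DOC") -> dict:
    """Return how many times smaller each file is than the reference file."""
    ref = sizes[reference]
    return {name: ref / size for name, size in sizes.items() if name != reference}


if __name__ == "__main__":
    for name, ratio in ratios(SIZES_KB).items():
        # Three significant figures, as in the text.
        print(f"{name:<11} {SIZES_KB[name]:>4} kB  {ratio:.3g}x smaller than DOC")
```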

So, to summarize:

DOC

Pros

  • Editable.
  • Allows for text formatting.

Cons

  • Proprietary. In principle only MS Office can open it. OpenOffice.org can too, but only thanks to reverse engineering.
  • If opened with OpenOffice.org, or just a different version of MS Office, the reader cannot be sure of seeing the same formatting the writer intended.
  • Size. 6 times bigger than ODT. Even bigger than PDF.
  • MS invented and owns it. You need more reasons?

PDF

Pros

  • Portability. You can open it in any OS (Windows, Linux, Mac, BSD…), because there are so many free PDF readers.
  • Smaller than the DOC.
  • Allows for text formatting, and the formatting the reader sees will be exactly what the writer intended.

Cons

  • Not editable (I really don’t see the point in editing PDFs. For me the PDF is the product of an underlying format (e.g. LaTeX), just as what you see in your browser is the product of some HTML/PHP, or an exe is the product of some source code. But I digress.)
  • Could be smaller.

TXT

Pros

  • Portability. You can’t get much more portable than a plain text file. You can edit it anywhere, with your favorite text editor.
  • Size. You can’t get much smaller than a plain text file (as it contains the mere text content), and you can compress it further with ease.

Cons

  • Formatting. If you need text formatting, or need to include pictures or other non-text content, then plain text is not for you.

ODT

Pros

  • Portability. It can be edited with OpenOffice.org (and probably others), which is [[free software]], and has versions for Windows, Linux, and Mac.
  • Editability. Every bit as editable as DOC.
  • Size. Files 6 times smaller than DOC.
  • It’s a free standard, not some proprietary rubbish.

Cons

  • None I can think of.

So please, if you send me some text, first consider whether plain text will suffice. If not, and I am not expected to edit it, PDF is fine. If editing is important (or size, because it’s smaller than PDF), then ODT is the way to go.

7 Comments »

  1. sylvainulg said,

    April 13, 2010 @ 11:35 am

    I haven’t launched MS Office recently, but one obvious drawback of the ODT format immediately comes to mind: it takes a heavy piece of software just to read a text. Besides that, would you find an ODT reader on, e.g., your pocket device? Can you safely forward the file to anyone and be sure (s)he will be able to read it properly as well?

    Any reason why you haven’t mentioned web content at all in the comparison?

  2. isilanes said,

    April 13, 2010 @ 12:23 pm

    Well, putting the text online (e.g. on a blog) and then sending just the link is a nice option as well. But not all content is suitable for that. The subject matter here is a 94-page recipe book, comprising just text, and being shared as a digital file. It is not particularly private, but other similar texts could be (and putting those online is not a good idea). Besides, putting the text online as HTML is hardly any different from sending a plain text file, except for the formatting. The same would go for LaTeX, which I had considered mentioning. But there is no way I can reasonably ask my friends to send me material in LaTeX or HTML (much less to have a web server where they can put the HTML), and besides, any reader of this post who would consider LaTeX or HTML (to keep the file size small and allow for text formatting) does not really need to be told that DOC is a bad idea.

    About ODT being overkill, recall that I recommend considering plain text first. ODT is recommended for when plain text will not do (e.g., formatting is important, or the document includes rich material such as tables and images). If that is the case, but extreme portability is compulsory (your reader will need to read it on her pocket device), then PDF might be your option (or HTML). In my example, I really don’t expect the recipients of the e-mail to resort to their pocket devices to read a 94-page recipe book. If they wanted to, they could convert it the way they see fit.

    My main point is that DOC is always a bad idea. There is no use case where DOC would be recommended for any reason. If you want a “direct” substitute, use ODT. If you are open to alternatives, there are others, depending on what you want.

  3. Greghor said,

    October 4, 2010 @ 7:42 am

    Hi, glad to visit you..

  4. Super Jamie said,

    November 7, 2011 @ 16:27 pm

    You forgot RTF! Everyone does these days. I think it’s a good medium between plaintext and a full-blown ODT/DOC file. It used to be a closed, proprietary format but Microsoft have published it for years now. It would be nice if they opened it up, it’s not like it’ll lose them any money. WordPad under Windows supports RTF by default but it would still require a “heavy” word processor in Linux so we may as well use ODT on there.

    @sylvainulg: You can use Google Docs to view and edit ODT files if you don’t wish to install OpenOffice.org/LibreOffice.

  5. Marcello said,

    June 4, 2012 @ 9:36 am

    I think it would be fair to list the need to install OpenOffice or one of its younger variants as a con of the ODT format.
    Of course it’s free, and it’s really just a ca. 150 MB download away, but still…

    Excellent post, btw.

  6. isilanes said,

    June 4, 2012 @ 9:52 am

    Thanks, Marcello. To be fair, I haven’t considered download size as a con of DOC or PDF either. It also depends on your OS of choice. I don’t think Windows has a built-in PDF reader, and most people will download Acrobat, which is quite fat, I believe. Windows doesn’t actually include MS Office either, so you must purchase it separately. On the other hand most Linux distros have quite light PDF readers included (e.g. most GNOMEs come with evince), and some even include LibreOffice in the installation CD (Ubuntu does, if I’m not mistaken).

  7. Marcello said,

    June 5, 2012 @ 16:26 pm

    Hmmm… I think you’re perfectly right. I must admit that when writing my previous comment I was thinking about all the times I had to download and install OOo on friends’ or co-workers’ computers… But I should have thought instead of all the times I had to re-install Windows XP, and the length of the usual checklist: AV, Firefox, OOo, Acrobat Reader, Gimp or similar… (*) Whereas with Linux (that’s Ubuntu for me, for a couple of years now) it’s essentially: install OS, done. Yes, Ubuntu does include LibO (they had OOo in previous releases).

    (*) and that’s because I’m a FLOSS fan, otherwise that list would’ve been way more “illegal” (how many people have MSO and Photoshop just because those are the only two “names” they (or their tech-friend) know?) – and what about the OS license?

    Thanks for making me think again about my comment.
