File compression: gzip vs. bzip2

I just found out that my regular backups on a couple of computers are filling up their disks (for the Spanish readers: ¡están petaos!, that is, completely stuffed), and I realized that it is because I was keeping a bunch of 200MB files uncompressed. Since the files are ASCII, full of numbers, most of which are zeros, they are perfect candidates for compression with tools like gzip or bzip2. Conventional wisdom says that bzip2 compresses better but is slower, so I made a small comparison:

Original file: 211MB
gzip: 4.5MB in 11 s (compress), 6.5 s (uncompress)
bzip2: 2.4MB in 1323 s (compress), 27 s (uncompress)

Yes, the compression with bzip2 is impressive: 88x, where gzip gets 47x (so the bzip2 ratio is almost 90% better). But the timing is poor: bzip2 is 120 times slower than gzip at compressing. For uncompression, bzip2 fares better: “only” 4 times slower than gzip. Where gzip uncompresses a file in a bit more than half the time it took to compress it, bzip2 uncompresses almost 50 times faster than it compresses (because compressing was soooo slow).
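
For anyone who wants to check the arithmetic, the ratios quoted above follow directly from the measured sizes and times. A minimal Python sketch (the variable names are only illustrative):

```python
# Measurements from the comparison above (sizes in MB, times in seconds).
original_mb = 211
gzip_mb, gzip_pack_s, gzip_unpack_s = 4.5, 11, 6.5
bzip2_mb, bzip2_pack_s, bzip2_unpack_s = 2.4, 1323, 27

# Compression ratios: original size divided by compressed size.
gzip_ratio = original_mb / gzip_mb        # about 47
bzip2_ratio = original_mb / bzip2_mb      # about 88

# How much better the bzip2 ratio is, as a percentage.
ratio_gain_pct = (bzip2_ratio / gzip_ratio - 1) * 100   # about 87.5

# Speed comparisons between the two tools.
pack_slowdown = bzip2_pack_s / gzip_pack_s        # about 120
unpack_slowdown = bzip2_unpack_s / gzip_unpack_s  # about 4.2

# Uncompressing versus compressing, for each tool.
gzip_unpack_fraction = gzip_unpack_s / gzip_pack_s    # about 0.59
bzip2_unpack_speedup = bzip2_pack_s / bzip2_unpack_s  # 49

print(f"gzip: {gzip_ratio:.0f}x, bzip2: {bzip2_ratio:.0f}x "
      f"(+{ratio_gain_pct:.0f}%)")
print(f"bzip2 is {pack_slowdown:.0f}x slower to compress, "
      f"{unpack_slowdown:.1f}x slower to uncompress")
```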

This case is anecdotal, but it nicely illustrates my experience in general.
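
To repeat the comparison on your own data, something like the following Python sketch could be a starting point. It is not what was used for the numbers above (those came from the command-line tools): it relies on Python's standard `gzip` and `bz2` modules, the `benchmark` function and the synthetic sample are made up for illustration, and both modules default to compression level 9 while command-line gzip defaults to level 6, so the timings will not match the ones reported here.

```python
import bz2
import gzip
import time


def benchmark(data: bytes) -> dict:
    """Compress and uncompress `data` with gzip and bzip2, timing each step."""
    results = {}
    for name, codec in (("gzip", gzip), ("bzip2", bz2)):
        start = time.perf_counter()
        packed = codec.compress(data)
        pack_s = time.perf_counter() - start

        start = time.perf_counter()
        unpacked = codec.decompress(packed)
        unpack_s = time.perf_counter() - start

        # Sanity check: the round trip must give back the original bytes.
        assert unpacked == data

        results[name] = {
            "size": len(packed),
            "ratio": len(data) / len(packed),
            "pack_s": pack_s,
            "unpack_s": unpack_s,
        }
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for the real files: ASCII numbers, mostly zeros.
    sample = (("0 " * 9 + "1.25\n") * 50000).encode("ascii")
    for name, r in benchmark(sample).items():
        print(f"{name}: {r['size']} bytes ({r['ratio']:.0f}x), "
              f"{r['pack_s']:.3f} s (compress), "
              f"{r['unpack_s']:.3f} s (uncompress)")
```

To test a real file, read it with `open(path, "rb").read()` and pass the bytes to `benchmark`. For files of several hundred MB this keeps everything in memory, so a streaming approach would be more appropriate there.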
