Save HD space by using compressed files directly

Maybe the constant increases in hard disk capacity provide us with more space we can waste with our files, but there is always a situation in which we would like to squeeze as much data in as little space as possible. Besides, it is always a good practice to keep disk usage as low as possible, just for tidiness.

The first and most important advice for saving space: for $GOD’s sake, delete the stuff you don’t need!

Now, assuming you want to keep all you presently have, the second tool is [[data compression]]. Linux users have long time friends in the [[gzip]] and [[bzip2]] commands. One would use the former for fast (and reasonably good) compression, and the latter for when saving space is really vital (although bzip2 is really slow). A more recent entry in the “perfect compression tool” contest would be [[Lempel-Ziv-Markov chain algorithm]] (LZMA). This one can compress even more than bzip2, being usually faster (although never as fast as gzip).

One problem with compression is that it is a good way of storing files, but they usually have to be uncompressed to modify, and then re-compressed, and this is very slow. However, we have some tools to interact with the compressed files directly (internally decompressing “on the fly” only the part that we need to edit). I would like to just mention them here:

Shell commands

We can use zcat, zgrep and zdiff as replacements for cat, grep and diff, but for gzipped files. These account for a huge fraction of all the interaction I do with text files from the command line. If you are like me, they can save you tons of time.

Vim

[[Vim (text editor)|Vim]] can be instructed to open some files making use of some decompression tool, to show the contents of the file and work on them transparently. Once we :wq out of the file, we will get the original compressed file. The speed to do this cycle is incredibly fast: almost as fast as opening the uncompressed file, and nowhere near as slow as gunzipping, viming and gzipping sequentially.

You can add the following to your .vimrc config file for the above:

" Only do this part when compiled with support for autocommands.
if has("autocmd")

 augroup gzip
  " Remove all gzip autocommands
  au!

  " Enable editing of gzipped files
  " set binary mode before reading the file
  autocmd BufReadPre,FileReadPre	*.gz,*.bz2,*.lz set bin

  autocmd BufReadPost,FileReadPost	*.gz call GZIP_read("gunzip")
  autocmd BufReadPost,FileReadPost	*.bz2 call GZIP_read("bunzip2")
  autocmd BufReadPost,FileReadPost	*.lz call GZIP_read("unlzma -S .lz")

  autocmd BufWritePost,FileWritePost	*.gz call GZIP_write("gzip")
  autocmd BufWritePost,FileWritePost	*.bz2 call GZIP_write("bzip2")
  autocmd BufWritePost,FileWritePost	*.lz call GZIP_write("lzma -S .lz")

  autocmd FileAppendPre			*.gz call GZIP_appre("gunzip")
  autocmd FileAppendPre			*.bz2 call GZIP_appre("bunzip2")
  autocmd FileAppendPre			*.lz call GZIP_appre("unlzma -S .lz")

  autocmd FileAppendPost		*.gz call GZIP_write("gzip")
  autocmd FileAppendPost		*.bz2 call GZIP_write("bzip2")
  autocmd FileAppendPost		*.lz call GZIP_write("lzma -S .lz")

  " After reading compressed file: Uncompress text in buffer with "cmd"
  fun! GZIP_read(cmd)
    let ch_save = &ch
    set ch=2
    execute "'[,']!" . a:cmd
    set nobin
    let &ch = ch_save
    execute ":doautocmd BufReadPost " . expand("%:r")
  endfun

  " After writing compressed file: Compress written file with "cmd"
  fun! GZIP_write(cmd)
    if rename(expand(""), expand(":r")) == 0
      execute "!" . a:cmd . " :r"
    endif
  endfun

  " Before appending to compressed file: Uncompress file with "cmd"
  fun! GZIP_appre(cmd)
    execute "!" . a:cmd . " "
    call rename(expand(":r"), expand(""))
  endfun

 augroup END
endif " has("autocmd")

I first found the above in my (default) .vimrc file, allowing gzipped and bzipped files to be edited. I added the “support” for LZMAed files quite trivially, as can be seen in the lines containign “lz” in the code above (I use .lz as termination for LZMAed files, instead of the default .lzma. See man lzma for more info).

Non-plaintext files

Other files that I have been able to successfully use in compressed form are [[PostScript]] and [[Portable Document Format|PDF]]. Granted, PDFs are already quite compact, but sometimes gzipping them saves space. In general, PS and EPS files save a lot of space by gzipping.

As far as I have tried, the [[Evince]] document viewer can read gzipped PS, EPS and PDF files with no problem (probably [[Device_independent_file_format|DVI]] files as well).

3 Comments »

  1. sylvainulg said,

    January 14, 2009 @ 18:56 pm

    yep. good point.
    and don’t forget : kdirstat (or windirstat depending on your OS) is your true friend for locating those fat files that eat up all your space.

  2. isilanes said,

    January 14, 2009 @ 21:34 pm

    Thanks, sylvanulg. I personally use Filelight when I want something graphical. More usually, I use just “du”, which is really really great. For GNOME, you also have Baobab (renamed “Disk Usage Analizer”).

  3. Super Coco said,

    January 14, 2009 @ 22:49 pm

    Great tip to edit gzipped files easily with vim! Thank you!

RSS feed for comments on this post · TrackBack URI

Leave a Comment