disk usage - handyfloss

Please, choose the right format to send me that text. Thanks.

April 13, 2010 at 10:03 am · Filed under Free software and related beasts

I just received an e-mail with a very interesting text (recipes for [[Pincho|pintxos]]), and it prompted some experimenting. The issue is that the text was inside a [[DOC (computing)|DOC]] file (of course!), which raises some questions and concerns on my side. The size of the file was 471 kB.

I thought that one could make the document more portable by exporting it to [[PDF]] (using [[OpenOffice.org]]). Doing so, the resulting file has a size of 364 kB (1.29 times smaller than the original DOC).

Furthermore, text formatting could be waived, by using a [[plain text]] format. A copy/paste of the contents of the DOC into a TXT file yielded a 186 kB file (2.53x smaller).

Once in the mood, we can go one step further and compress the TXT file: with [[gzip]] we get a 51 kB file (9.24x), and with [[xz]] a 42 kB one (11.2x).
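The compression step can be tried from Python as well. This is only a sketch using the standard library gzip and lzma modules (lzma writes the xz container); the sample text is a made-up stand-in, since the original recipe file is not available, so the sizes it prints will not match the figures above.

```python
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of data as-is, gzipped and xz-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "xz": len(lzma.compress(data, format=lzma.FORMAT_XZ)),
    }


# Hypothetical stand-in for the recipe text (the real file is not available).
sample = ("Pintxo de tortilla: eggs, potatoes, onion, olive oil, salt.\n" * 2000).encode("utf-8")

for name, size in compressed_sizes(sample).items():
    print(f"{name:5s} {size:8d} bytes  ({len(sample) / size:.2f}x smaller)")
```

Such a repetitive sample compresses far better than real prose would, so treat the output as a demonstration of the mechanics only.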

So far, so good. No surprise. The surprise came when, just for fun, I exported the DOC to [[OpenDocument|ODT]]. I obtained a document equivalent to the original one, but with a 75 kB size! (6.28x smaller than the DOC).
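The "times smaller" figures quoted so far are simply the DOC size divided by the size of each alternative. A short sketch that reproduces them from the sizes given in the text (the function and variable names are mine):

```python
ORIGINAL_KB = 471  # size of the DOC file, as stated above

# Sizes reported in the text, in kB.
sizes_kb = {"PDF": 364, "TXT": 186, "TXT + gzip": 51, "TXT + xz": 42, "ODT": 75}


def times_smaller(size_kb, original_kb=ORIGINAL_KB):
    """How many times smaller a file is than the original DOC."""
    return original_kb / size_kb


for fmt, size in sizes_kb.items():
    print(f"{fmt:11s} {size:4d} kB  {times_smaller(size):5.2f}x smaller")
```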

So, to summarize:

DOC

Pros

Editable.
Allows for text formatting.

Cons

Proprietary. In principle only MS Office can open it. OpenOffice.org can too, but only thanks to reverse engineering.
If opened with OpenOffice.org, or just a different version of MS Office, the reader cannot be sure of seeing the same formatting the writer intended.
Size. 6 times bigger than ODT. Even bigger than PDF.
MS invented and owns it. You need more reasons?

PDF

Pros

Portability. You can open it in any OS (Windows, Linux, Mac, BSD…), on account of there being so many free PDF readers.
Smaller than the DOC.
Allows for text formatting, and the format the reader sees will be exactly the one the writer intended.

Cons

Not editable (I really don’t see the point in editing PDFs. For me the PDF is a product of an underlying format (e.g. LaTeX), as what you see on your browser is the product of some HTML/PHP, or an exe is the product of some source code. But I digress.)
Could be smaller

TXT

Pros

Portability. You can’t get much more portable than a plain text file. You can edit it anywhere, with your favorite text editor.
Size. You can’t get much smaller than a plain text file (as it contains the mere text content), and you can compress it further with ease.

Cons

Formatting. If you need text formatting, or including pictures or content other than text, then plain text is not for you.

ODT

Pros

Portability. It can be edited with OpenOffice.org (and probably others), which is [[free software]], and has versions for Windows, Linux, and Mac.
Editability. Every bit as editable as DOC.
Size. 6 times smaller files than DOC.
It’s a free standard, not some proprietary rubbish.

Cons

None I can think of.

So please, if you send me some text, first consider whether plain text will suffice. If not, and no editing is intended on my side, PDF is fine. If editing is important (or size, because it’s smaller than PDF), then ODT is the way to go.


ChopZip: a parallel implementation of arbitrary compression algorithms

December 20, 2009 at 18:59 pm · Filed under Free software and related beasts

Remember plzma.py? I made a wrapper script for running [[LZMA]] in parallel. The script could be readily generalized to use any compression algorithm, following the principle of breaking the file in parts (one per CPU), compressing the parts, then [[tar (file format)|tarring]] them together. In other words, chop the file, zip the parts. Hence the name of the program that evolved from plzma.py: ChopZip.

Introduction

Currently ChopZip supports [[LZMA|lzma]], [[XZ Utils|xz]], [[gzip]] and lzip. Of them, lzip deserves a brief comment. It was brought to my attention by a reader of this blog. It is based on the LZMA algorithm, as are lzma and xz. Apparently unlike them, multiple files compressed with lzip can be concatenated to form a single valid lzip-compressed file. Uncompressing the latter generates a concatenation of the original files.

To illustrate the point, check the following shell action:

% echo hello > head
% echo bye > tail
% lzip head
% lzip tail
% cat head.lz tail.lz > all.lz
% lzip -d all.lz
% cat all
hello
bye

However, I just discovered that gzip, bzip2 and xz all do that already! It seems that lzma is advertised as capable of doing it, but it doesn’t work for me. Sometimes it will uncompress the concatenated file to the original file just fine, other times it will decompress it to just the first chunk of the set, and yet other times it will complain that the “data is corrupt” and refuse to uncompress. For that reason, chopzip will accept two working modes: simple concatenation (gzip, lzip, xz) and tarring (lzma). The relevant mode will be used transparently for the user.
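The same concatenation property can be checked from Python. A sketch with the standard library gzip, bz2 and lzma modules follows (lzma.compress writes the xz format by default). Note that it exercises the Python library bindings rather than the command-line tools, and that it does not cover lzip or the legacy .lzma format.

```python
import bz2
import gzip
import lzma

head, tail = b"hello\n", b"bye\n"

# One (compress, decompress) pair per format.
codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def survives_concatenation(compress, decompress, parts):
    """True if the joined compressed parts decompress to the joined originals."""
    joined = b"".join(compress(part) for part in parts)
    return decompress(joined) == b"".join(parts)


for name, (compress, decompress) in codecs.items():
    print(name, survives_concatenation(compress, decompress, [head, tail]))
```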

Also, if you use Ubuntu, this bug will apply to you, making it impossible to have xz-utils, lzma and lzip installed at the same time.

The really nice thing about concatenability is that it allows for trivial parallelization of the compression, while maintaining compatibility with the serial compression tool, which can still uncompress the product of a parallel compression. Unfortunately, for non-concatenable compression formats, the output of chopzip will be a tar file of the compressed chunks, making it impossible to uncompress with the original compressor alone (first an untar would be needed, then uncompressing, then concatenation of the chunks; or just use chopzip to decompress).
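To make the concatenation mode concrete, here is a minimal sketch of the chop-then-zip idea in Python. It is not the ChopZip code: the function names are hypothetical, it works on bytes in memory instead of files, and it uses threads with the standard library lzma module (xz format) where a real tool would more likely launch one compressor process per chunk.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def split(data: bytes, n_chunks: int) -> list:
    """Cut data into at most n_chunks pieces of (almost) equal size."""
    size = -(-len(data) // n_chunks) or 1  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def chop_and_zip(data: bytes, n_chunks: int = 4) -> bytes:
    """Compress each chunk as an independent xz stream and concatenate them."""
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # pool.map keeps the chunks in their original order.
        return b"".join(pool.map(lzma.compress, split(data, n_chunks)))


payload = b"some moderately repetitive payload\n" * 50_000
packed = chop_and_zip(payload, n_chunks=4)

# An ordinary, serial decompressor reads the result as a single file.
assert lzma.decompress(packed) == payload
```

The point of the design is in the last line: because the output is a plain concatenation of valid streams, the reader needs nothing beyond the regular decompressor.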

The rationale behind plzma/chopzip is simple: multi-core computers are commonplace nowadays, but still the most common compression programs do not take advantage of this fact. At least the ones that I know and use don’t. There are at least two initiatives that tackle the issue, but I still think ChopZip has a niche to exploit. The most consolidated one is pbzip2 (which I mention in my plzma post). pbzip2 is great, if you want to use bzip2. It scales really nicely (almost linearly), and pbzipped files are valid bzip2 files. The main drawback is that it uses bzip2 as compression method. bzip2 has always been the “extreme” brother of gzip: it compresses more, but it’s so slow that you would only resort to it if compressed size is vital. LZMA-based programs (lzma, xz, lzip) are both faster and compress more, so for me bzip2 is out of the equation.

A second contender in parallel compression is pxz. As its name suggests, it compresses using xz. Drawbacks? It’s not in the official repositories yet, and I couldn’t manage to compile it, even though it comprises a single C file and a Makefile. It also lacks the ability to use different encoders (which is not necessarily bad), and it’s a compiled program, versus chopzip, which is a much more portable script.

Scalability benchmark

Anyway, let’s get into chopzip. I have run a simple test with a moderately large file (a 374MB tar file of the whole /usr/bin dir). A table follows with the speedup results for running chopzip on that file, using various numbers of chunks (and consequently, threads). The tests were conducted on an Intel Core 2 Quad Q8200 computer with 4GB of RAM. Speedups are calculated as how many times faster a run with #chunks chunks was than a run with just 1 chunk. It is noteworthy that in every case running chopzip with a single chunk is virtually identical in performance to running the original compressor directly. Also, decompression times (not shown) were identical, irrespective of the number of chunks. The ChopZip version was r18.

#chunks	xz	gzip	lzma	lzip
1	1.000	1.000	1.000	1.000
2	1.862	1.771	1.907	1.906
4	3.265	1.910	3.262	3.430
8	3.321	1.680	3.247	3.373
16	3.248	1.764	3.312	3.451

Note how increasing the number of chunks beyond the amount of actual cores (4 in this case) can have a small benefit. This happens because N equal chunks of a file will not be compressed with equal speed, so the more chunks, the smaller overall effect of the slowest-compressing chunks.
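A toy model may help to see why. Assume, purely for illustration, that the threads share the cores perfectly, with no scheduling overhead and no disk contention. Then the run cannot finish sooner than the total work divided by the number of cores, nor sooner than the slowest single chunk. The chunk times below are made up, not measured.

```python
def ideal_speedup(chunk_times, cores=4):
    """Best-case speedup over a serial run, given per-chunk compression times.

    The run cannot end before total/cores (all cores busy all the time)
    nor before the slowest chunk (one chunk cannot use two cores).
    """
    total = sum(chunk_times)
    finish = max(total / cores, max(chunk_times))
    return total / finish


# Hypothetical file where the last quarter is three times slower to compress.
four_chunks = [1.0, 1.0, 1.0, 3.0]
# The same work cut into eight pieces: each old chunk becomes two halves.
eight_chunks = [0.5] * 6 + [1.5] * 2

print(ideal_speedup(four_chunks))
print(ideal_speedup(eight_chunks))
```

With four chunks the slow one dominates and three cores sit idle most of the time; with eight chunks the slow region is spread over two threads, and the bound set by the slowest chunk drops accordingly.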

Conclusion

ChopZip speeds up quite noticeably the compression of arbitrary files, and with arbitrary compressors. In the case of concatenatable compressors (see above), the resulting compressed file is an ordinary compressed file, apt to be decompressed with the regular compressor (xz, lzip, gzip), as well as with ChopZip. This makes ChopZip a valid alternative to them, with the parallelization advantage.


plzma.py: a wrapper for parallel implementation of LZMA compression

July 23, 2009 at 14:39 pm · Filed under Free software and related beasts

Update: this script has been superseded by ChopZip

Introduction

I discovered the [[Lempel-Ziv-Markov chain algorithm|LZMA]] compression algorithm some time ago, and have been thrilled by its capacity since. It has higher compression ratios than even [[bzip2]], with a faster decompression time. However, although decompressing is fast, compressing is not: LZMA is even slower than bzip2. On the other hand, [[gzip]] remains blazing fast in comparison, while providing a decent level of compression.

More recently I have discovered the interesting pbzip2, which is a parallel implementation of bzip2. With the increasing popularity of multi-core processors (I have a quad-core at home myself), parallelizing the compression tools is a very good idea. pbzip2 performs really well, producing bzip2-compatible files with near-linear scaling with the number of CPUs.

LZMA being such a high performance compressor, I wondered if its speed could be boosted by using it in parallel. Although the [[Lempel-Ziv-Markov chain algorithm|Wikipedia article]] states that the algorithm can be parallelized, I found no such implementation in Ubuntu 9.04, where the utility provided by the lzma package is exclusively serial. Not finding one, I set out to produce it.

About plzma.py

Any compression can be parallelized as follows:

Split the original file into as many pieces as CPU cores available
Compress (simultaneously) all the pieces
Create a single file by joining all the compressed pieces, and call the result “the compressed file”

In a Linux environment, these three tasks can be carried out easily by split, lzma itself, and tar, respectively. I just made a [[Python (programming language)|Python]] script to automate these tasks, called it plzma.py, and put it in my web site for anyone to download (it’s GPLed). Please notice that plzma.py has been superseded by chopzip, starting with revision 12, whereas latest plzma is revision 6.

I must remark that, while pbzip2 generates bzip2-compatible compressed files, that is not the case with plzma. The products of plzma compression must be decompressed with plzma as well. The actual format of a plzma file is just a TAR file containing as many LZMA-compressed chunks as CPUs used for compression. These chunks, once decompressed individually, can be concatenated (with the cat command) to form the original file.
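The container format just described is easy to mimic. The following is a sketch, not the plzma.py code: it builds the tar in memory with the standard library tarfile and lzma modules, compresses the chunks one after another (the parallel part is left out), and the function names are hypothetical. The chunk names follow the file.01.lz pattern used in the manual decompression steps at the end of this post.

```python
import io
import lzma
import tarfile


def plz_pack(data: bytes, n_chunks: int = 4) -> bytes:
    """Split data, LZMA-compress each chunk, and tar the compressed chunks."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)] or [b""]
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for index, chunk in enumerate(chunks, start=1):
            # FORMAT_ALONE is the legacy .lzma container.
            blob = lzma.compress(chunk, format=lzma.FORMAT_ALONE)
            info = tarfile.TarInfo(name=f"file.{index:02d}.lz")
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))
    return buffer.getvalue()


def plz_unpack(archive: bytes) -> bytes:
    """Untar, decompress each chunk, and concatenate them in name order."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        members = sorted(tar.getmembers(), key=lambda member: member.name)
        return b"".join(
            lzma.decompress(tar.extractfile(member).read()) for member in members
        )


original = bytes(range(256)) * 400
assert plz_unpack(plz_pack(original)) == original
```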

Benchmarks

What review of compression tools lacks benchmarks? No matter how inaccurate or silly, none of them do. And neither does mine :^)

I used three (single) files as reference:

molekel.tar – a 108 MB tar file of the (GPL) [[Molekel]] 5.0 source code
usr.bin.tar – a 309 MB tar file of the contents of my /usr/bin/ dir
hackable.tar – a 782 MB tar file of the hackable:1 [[Debian]]-based distro for the [[Neo FreeRunner]]

The second case is intended as an example of binary file compression, whereas the other two are more of a “real-life” example. I didn’t test text-only files… I might in the future, but don’t expect the conclusions to change much. The testbed was my Frink desktop PC (Intel Q8200 quad-core).

The options for each tool were:

gzip/bzip2/pbzip2: compression level 6
lzma/plzma: compression level 3
pbzip2/plzma: 4 CPUs

Compressed size

The most important feature of a compressor is the size of the resulting file. After all, we used it in the first place to save space. No matter how fast an algorithm is, if the resulting file is bigger than the original file I wouldn’t use it. Would you?

The graph below shows the compressed size ratio for compression of the three test files with each of the five tools considered. The compressed size ratio is defined as the compressed size divided by the original size for each file.

This test doesn’t surprise much: gzip is the least effective and LZMA the most. The point to make here is that the parallel implementations perform as well or as badly as their serial counterparts.

If you are unimpressed by the supposedly higher performance of bzip2 and LZMA over gzip, when in the picture all final sizes do not look very different, recall that gzip compressed molekel.tar ~ 3 times (to a 0.329 ratio), whereas LZMA compressed it ~ 4.3 times (to a 0.233 ratio). You could stuff 13 LZMAed files where only 9 gzipped ones fit (and just 3 uncompressed ones).
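The arithmetic behind that comparison, as a short sketch using the two ratios quoted for molekel.tar (the names are mine). The results come out as fractions, which the text rounds to 9 and 13.

```python
# Compressed size ratios quoted in the text for molekel.tar.
ratios = {"uncompressed": 1.0, "gzip": 0.329, "LZMA": 0.233}

BUDGET = 3.0  # disk space, measured in uncompressed copies of the file


def copies_that_fit(ratio, budget=BUDGET):
    """How many copies compressed to `ratio` fit in `budget` uncompressed copies."""
    return budget / ratio


for name, ratio in ratios.items():
    print(f"{name:12s} compresses {1 / ratio:4.2f} times, {copies_that_fit(ratio):5.2f} copies fit")
```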

Compression time

However important the compressed size is, compression time is also an important subject. Actually, that’s the very issue I try to address parallelizing LZMA: to make it faster while keeping its high compression ratio.

The graph below shows the normalized times for compression of the three test files with each of the five tools considered. The normalized time is taken as the total time divided by the time it took gzip to finish (an arbitrary scale with t(gzip)=1.0).

Roughly speaking, we could say that in my setting pbzip2 makes bzip2 as fast as gzip, and plzma makes LZMA as fast as serial bzip2.

The speedups for bzip2/pbzip2 and LZMA/plzma are given in the following table:

File	pbzip2	plzma
molekel.tar	4.00	2.72
usr.bin.tar	3.61	3.38
hackable.tar	3.80	3.04

The performance of plzma is nowhere near pbzip2, but I’d call it acceptable (wouldn’t I? I’m the author!). There are two reasons I can think of to explain lower-than-linear scalability. The first one is the overhead imposed when cutting the file into pieces then assembling them back. The second one, maybe more important, is the disk performance. Maybe each core can compress each file independently, but the disk I/O for reading the chunks and writing them back compressed is done simultaneously on the same disk, which the four processes share.
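One way to put numbers on "lower-than-linear" is the parallel efficiency, that is, the speedup divided by the number of CPUs (4 here). A short sketch over the speedups in the table above:

```python
CORES = 4  # both pbzip2 and plzma were run with 4 CPUs

# Speedups from the table above.
speedups = {
    "molekel.tar": {"pbzip2": 4.00, "plzma": 2.72},
    "usr.bin.tar": {"pbzip2": 3.61, "plzma": 3.38},
    "hackable.tar": {"pbzip2": 3.80, "plzma": 3.04},
}


def efficiency(speedup, cores=CORES):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup / cores


for name, row in speedups.items():
    cells = "  ".join(f"{tool}: {efficiency(s):.0%}" for tool, s in row.items())
    print(f"{name:13s} {cells}")
```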

Update: I think that a good deal of under-linearity comes from the fact that files of equal size will not be compressed in an equal time. Each chunk compression will take a slightly different time to complete, because some will be easier than others to compress. The program waits for the last compression to finish, so it’s as slow as the slowest one. It is also true that pieces of 1/N size might take more than 1/N time to complete, so the more chunks, the slower the compression in total (the opposite could also be true, though).

Decompression times

Usually we pay less attention to decompression time, because decompression is much faster (and because we often compress things never to open them again, in which case we had better have deleted them in the first place… but I digress).

The following graph shows the decompression data equivalent to the compression times graph above.

The most noteworthy point is that pbzip2 decompresses pbzip2-compressed files faster than bzip2 does with bzip2-compressed files. That is, both compression and decompression benefit from the parallelization. However, for plzma that is not the case: decompression is slower than with the serial LZMA. This is due to two effects: first, the decompression part is still not parallelized in my script (it will soon be). This would lead to decompression speeds near to the serial LZMA. However, it is slower due to the second effect: the overhead caused by splitting and then joining.

Another result worth noting is that, although LZMA is much slower than even bzip2 to compress, the decompression is actually faster. This is not random. LZMA was designed with fast uncompression time in mind, so that it could be used in, e.g. software distribution, where a single person compresses the original data (however painstakingly), then the users can download the result (the smaller, the faster), and uncompress it to use it.

Conclusions

While there is room for improvement, plzma seems like a viable option to speed up general compression tasks where a high compression ratio (LZMA level) is desired.

I would like to stress the point that plzma files cannot be uncompressed with just LZMA. If you don’t use plzma to decompress, you can follow these steps:

% tar -xf file.plz
% lzma -d file.0[1-4].lz
% cat file.0[1-4] > file
% rm file.0[1-4] file.plz


Save HD space by using compressed files directly

January 14, 2009 at 18:25 pm · Filed under Free software and related beasts

Maybe the constant increases in hard disk capacity provide us with more space we can waste with our files, but there is always a situation in which we would like to squeeze as much data in as little space as possible. Besides, it is always a good practice to keep disk usage as low as possible, just for tidiness.

The first and most important advice for saving space: for $GOD’s sake, delete the stuff you don’t need!

Now, assuming you want to keep all you presently have, the second tool is [[data compression]]. Linux users have long-time friends in the [[gzip]] and [[bzip2]] commands. One would use the former for fast (and reasonably good) compression, and the latter for when saving space is really vital (although bzip2 is really slow). A more recent entry in the “perfect compression tool” contest would be the [[Lempel-Ziv-Markov chain algorithm]] (LZMA). This one can compress even more than bzip2, while usually being faster (although never as fast as gzip).

One problem with compression is that it is a good way of storing files, but they usually have to be uncompressed to modify, and then re-compressed, and this is very slow. However, we have some tools to interact with the compressed files directly (internally decompressing “on the fly” only the part that we need to edit). I would like to just mention them here:

Shell commands

We can use zcat, zgrep and zdiff as replacements for cat, grep and diff, but for gzipped files. These account for a huge fraction of all the interaction I do with text files from the command line. If you are like me, they can save you tons of time.
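The same trick works from a script. Below is a rough Python counterpart of zcat and zgrep, built on the standard library gzip module, which decompresses on the fly as the file is read. The function names simply mirror the shell commands; it is a sketch, not a replacement for them.

```python
import gzip
import re


def zcat(path):
    """Yield the lines of a gzipped text file, decompressing on the fly."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle


def zgrep(pattern, path):
    """Return the lines of a gzipped text file that match a regular expression."""
    regex = re.compile(pattern)
    return [line.rstrip("\n") for line in zcat(path) if regex.search(line)]
```

For instance, zgrep("error", "messages.gz") would list the matching lines of a (hypothetical) messages.gz without ever writing an uncompressed copy to disk.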

Vim

[[Vim (text editor)|Vim]] can be instructed to open some files making use of some decompression tool, to show the contents of the file and work on them transparently. Once we :wq out of the file, we will get the original compressed file. The speed to do this cycle is incredibly fast: almost as fast as opening the uncompressed file, and nowhere near as slow as gunzipping, viming and gzipping sequentially.

You can add the following to your .vimrc config file for the above:

" Only do this part when compiled with support for autocommands.
if has("autocmd")

 augroup gzip
  " Remove all gzip autocommands
  au!

  " Enable editing of gzipped files
  " set binary mode before reading the file
  autocmd BufReadPre,FileReadPre	*.gz,*.bz2,*.lz set bin

  autocmd BufReadPost,FileReadPost	*.gz call GZIP_read("gunzip")
  autocmd BufReadPost,FileReadPost	*.bz2 call GZIP_read("bunzip2")
  autocmd BufReadPost,FileReadPost	*.lz call GZIP_read("unlzma -S .lz")

  autocmd BufWritePost,FileWritePost	*.gz call GZIP_write("gzip")
  autocmd BufWritePost,FileWritePost	*.bz2 call GZIP_write("bzip2")
  autocmd BufWritePost,FileWritePost	*.lz call GZIP_write("lzma -S .lz")

  autocmd FileAppendPre			*.gz call GZIP_appre("gunzip")
  autocmd FileAppendPre			*.bz2 call GZIP_appre("bunzip2")
  autocmd FileAppendPre			*.lz call GZIP_appre("unlzma -S .lz")

  autocmd FileAppendPost		*.gz call GZIP_write("gzip")
  autocmd FileAppendPost		*.bz2 call GZIP_write("bzip2")
  autocmd FileAppendPost		*.lz call GZIP_write("lzma -S .lz")

  " After reading compressed file: Uncompress text in buffer with "cmd"
  fun! GZIP_read(cmd)
    let ch_save = &ch
    set ch=2
    execute "'[,']!" . a:cmd
    set nobin
    let &ch = ch_save
    execute ":doautocmd BufReadPost " . expand("%:r")
  endfun

  " After writing compressed file: Compress written file with "cmd"
  fun! GZIP_write(cmd)
    if rename(expand("<afile>"), expand("<afile>:r")) == 0
      execute "!" . a:cmd . " <afile>:r"
    endif
  endfun

  " Before appending to compressed file: Uncompress file with "cmd"
  fun! GZIP_appre(cmd)
    execute "!" . a:cmd . " <afile>"
    call rename(expand("<afile>:r"), expand("<afile>"))
  endfun

 augroup END
endif " has("autocmd")

I first found the above in my (default) .vimrc file, allowing gzipped and bzipped files to be edited. I added the “support” for LZMAed files quite trivially, as can be seen in the lines containing “lz” in the code above (I use .lz as the extension for LZMAed files, instead of the default .lzma. See man lzma for more info).

Non-plaintext files

Other files that I have been able to successfully use in compressed form are [[PostScript]] and [[Portable Document Format|PDF]]. Granted, PDFs are already quite compact, but sometimes gzipping them saves space. In general, PS and EPS files save a lot of space by gzipping.

As far as I have tried, the [[Evince]] document viewer can read gzipped PS, EPS and PDF files with no problem (probably [[Device_independent_file_format|DVI]] files as well).


DreamHost makes me happy again: free backups

September 15, 2008 at 11:38 am · Filed under This evil world

Perhaps you are aware of my first (and last so far) gripe with [[DreamHost]]: as I wrote a couple of months ago, they wouldn’t let me use my account space for non-web content.

Well, it seems that they really work to make their users happy, and probably other people requested something like that. Read what the August DH newsletter says about it:

In keeping with my no-theme theme, uh oh, I think I just made a destroy-the-universe-LHC-style self-contradiction, here’s a new feature that pretty much has nothing to do with anything I said in the introduction!

https://panel.dreamhost.com/?tree=users.backup

Now, you know how we give out a LOT of disk space with our hosting? Well technically that space is only supposed to be used for your _actual_ web site (and email / database stuff) .. not as an online backup for your music, pictures, videos, other servers, etc!

Well, just like every other web host does, we’ve been sort of cracking down on that some lately, and it seems to catch some people by surprise! Nobody likes being surprised, especially in the shower, which is where we typically brought it up, and so now we offer a solution:

You CAN use 50GB of your disk space for backups now! The only caveat is, it’s a separate ftp (or sftp) user on a separate server and it can’t serve any web pages. There are also NO BACKUPS kept of THESE backups (they should already BE your backups, not your only copy), and if you go over 50GB, extra space is only 10 cents a GB a month (a.k.a. cheap)!

Thanks, DreamHost, for showing me that I made a good choice when I chose you!
Update: apparently only [[SSH file transfer protocol|SFTP]] works (or [[File Transfer Protocol|FTP]] if you are idiot enough to enable it), but not scp or any [[Secure Shell|SSH]]-related thing (rsync, …). I hope I find some workaround, because if not that would be a showstopper for me.


First DreamHost disappointment

July 11, 2008 at 12:39 pm · Filed under This evil world

I will simply copy&paste an e-mail interchange between [[DreamHost]] and me, with a few extra comments (some data substituted by “xxxxx”):

DreamHost:

Dear Iñaki,

Our system has noticed what seems to be a large amount of “backup/non-web” content on your account (#xxxxx), mostly on user “xxxxx” on the web server “xxxxx”.

Some of that content specifically is in /home/xxxxx (although there may be more in other locations as well.)

Unfortunately, our terms of service (http://www.dreamhost.com/tos.html) state:

The customer agrees to make use of DreamHost Web Hosting servers primarily for the purpose of hosting a website, and associated email functions. Data uploaded must be primarily for this purpose; DreamHost Web Hosting servers are not intended as a data backup or archiving service. DreamHost Web Hosting reserves the right to negotiate additional charges with the Customer and/or the discontinuation of the backups/archives at their discretion.

At this point, we must ask you to do one of three things:

* You can delete all backup/non-web files on your account.

* You can close your account from our panel at:
https://panel.dreamhost.com/?tree=billing.accounts
(We are willing to refund to you any pre-paid amount you have remaining, even if you’re past the 97 days. Just reply to this email after closing your account from the panel).

OR!

* You may now enable your account for backup/non-web use!

If you’d like to enable your account to be used for non-web files, please visit the link below. You will be given the option to be charged $0.20 a month per GB of usage (the monthly average, with daily readings) across your whole account.

We don’t think there exists another online storage service that has anything near the same features, flexibility, and redundancy for less than this, so we sincerely hope you take us up on this offer!

In the future, we plan to allow the creation of a single “storage” user on your account which will have no web sites (or email). For now though, if you choose to enable your account for backups, nothing will change (apart from the charges). If you want to enable backup/non-web use on this account, please go here:

https://panel.dreamhost.com/backups.cgi?xxxxxxxxxxx

If you choose not to enable this, you must delete all your non-web files by 2008-07-16 or your account will be suspended.

If you have any questions about this or anything at all, please don’t hesitate to contact us by replying to this email.

Thank you very much for your understanding,
The Happy DreamHost Backup/Non-Web Use Team

My answer:

Dear DreamHost Support Team,

I fully understand your point. Though apparently sensible, a detailed analysis shows that the policy you cite from the TOS makes little sense.

Right now I have a 5920 GB/month bandwidth limit, and a 540 GB disk quota in my account, both applied to web use. My current use in this regard is less than 4 GB disk space (0.7% of my quota), and my estimated bw use at the end of the present billing period will be around 0.2 GB (33 ppm (parts per million) of my current (and increasing) bw quota).

Now, on the other hand, I have some 50-100 GB of data (less than 20% of my disk quota!!) that I want to keep at the servers (for whatever private interest, that I do not need to disclose, but I will: backup and data sharing among my different PCs). Keeping this data up to date could cause between 1 MB and 1 GB worth of transfers per day (30 GB/month at most, or 0.5% of my bw quota).

All of the above raises some questions:

1) Why on Earth am I granted such a huge amount of resources that I will never conceivably use? Maybe just because of that: because I will never use them?

2) Why am I prevented of using my account in the only way that would allow me to take advantage of even a tiny part of those resources?

3) In what respect is the HD space and bw used up by a backup different from that used up by web content? Isn’t all data a collection of 0s and 1s? How can a Hosting Service, ISP, or any other provider of digital means DISCRIMINATE private data according to content?

4) Regarding the previous point, how is DH to tell if I simply move the backup dirs to the isilanes.org/ folder? I have to assume that if I make my backups visible through the web (which I can prevent with file permissions), then it makes them 100% kosher, since they become “web content” that I am allowed to host at DH?

It seems to me that you are renting me a truck to transport people, then frown at me if I take advantage of it to carry furniture. Moreover, you are advising me to keep the truck for people and rent small vans for the furniture.

[snip irrelevant part]

Believe me, I am willing to be a nice user. I just want to be able to use the resources I pay the way I need.

Iñaki

Their answer:

Hello Iñaki,

1) Why on Earth am I granted such a huge amount of resources that I will never conceivably use? Maybe just because of that: because I will never use them?

Some people will. Admittedly, very few do, but to be perfectly blunt, overselling is actually a vital part of our (and ANY) web host’s business model:

http://blog.dreamhost.com/2006/05/18/the-truth-about-overselling/

2) Why am I prevented of using my account in the only way that would allow me to take advantage of even a tiny part of those resources?

That’s an exaggeration, to be honest. Anyone can use up to the entire amount of their bandwidth and space, providing they use it for the purpose intended. If we ever open DreamStorage, you’d be welcome to use that space for backing up your data.

3) In what respect is the HD space and bw used up by a backup different from that used up by web content? Isn’t all data a collection of 0s and 1s? How can a Hosting Service, ISP, or any other provider of digital means DISCRIMINATE private data according to content?

Well, just as we have…there’s a ton of data in a non-web-accessible directory. That’s a pretty good tip that something’s up. By your argument, we couldn’t take down someone for copyright, or even child porn violations, as it’s just “a collection of 0s and 1s”, and who are we to “discriminate”? Our Terms of Service, which you agreed to 2008-02-22 at 3:39pm. If you didn’t agree, this simply wasn’t the service for you.

4) Regarding the previous point, how is DH to tell if I simply move the backup dirs to the isilanes.org/ folder? I have to assume that if I make my backups visible through the web (which I can prevent with file permissions), then it makes them 100% kosher, since they become “web content” that I am allowed to host at DH?

Honestly, we’re not going to let you off on some weak technicality. If you don’t wish to comply with the ToS, we’ve even allowed you the option of receiving a prorated refund, regardless of how far out from your 97 day guarantee you are. We have no desire to lose your business, but your truck analogy is almost there. We’re offering you trucks for transporting furniture…and we’re doing it at a nice low rate. But we do require you actually use them. We count on the fact that very few people are going to be moving furniture 24/7, but if someone wanted to use it to it’s fullest, they could. However, that doesn’t mean you get to rent the truck, park it somewhere, and use it as a free self-storage unit. We want the truck if you’re not using it for it’s intended purpose.

[snip irrelevant part]

Let me know if you have any other questions.

Thanks!

Jeff H

My final answer:

Hi Jeff,

Thanks for the kind answer! This kind of support is what gives DH an edge over other hosting providers. Keep it up.

What I say in my second point is not an exaggeration. It’s the plain truth: if not for backups, I will never use 1% of my quota. I mean *I* won’t. Don’t know about others, just me.

It seems a little unfair that some guy with 500 GB of HD use and 5800 GB/month of bw use is paying 8$/month as I am (I don’t recall the exact amount), while I am using 4 GB and 0.2 GB/month. Then I want to use 80 GB and 30 GB/month and I have to pay an extra 16$. That’s a total of TRIPLE that of the aforementioned guy, while I’m still using 6 times less HD and 200 times less bw.

I would love to pay for some resources, and administer them as I like, be it for web, backup, svn, or whatever. What I meant with my third point is that 100 MB of my backups “hurt” the system as much as sb else’s 100 MB of web content, so I can’t see the reason to make the user pay a separate bill for “backups”. Just make ftp traffic count against the disk/bw quotas and that’s it! You could then stop worrying about “fair” use.

But that’s pointless ranting on my side. Thanks for the attention. I will consider what to do in the light of the information you provided me.

Iñaki

I just want to point out how ridiculous their answer to my third point above is. DH tells me that they should be able to discriminate my data according to content (or use), because the opposite would supposedly allow me to break the law with copyright violations or child pornography. To follow with the truck metaphor, I am renting a truck from them, to carry furniture around. Since I don’t use up all the space in the truck, and I have a fridge I want to move, I put it into the truck. Now DH wants to patrol what I carry in the truck, and tell me that the fridge is not allowed, because it is not “furniture”. When I complain, and say that what I carry in the truck they rent me is none of their business, they answer that it is, because I could well be using the truck for drug smuggling. That’s really lousy reasoning. If I use the truck for carrying something illegal, then the police will sort it out, not the rental company. It is the general law that will tell me what I can use the truck for, not the rental company.


Filelight makes my day

February 7, 2008 at 11:52 am · Filed under my ego and me

First of all: yes, this could have been done with du. Filelight is just more visual.

The thing is that yesterday I noticed that my root partition was a bit on the crowded side (90+%). I thought it could be because of /var/cache/apt/archives/, where the downloaded .deb files are kept, and started purging some unneeded installed packages (very few… I only install what I need). However, I decided to double check, and Filelight has given me the clue:

[Filelight screenshot]

Some utter disaster in a printing job filled /var/spool/cups/tmp/ with 1.5GB of crap! After deleting it, my root partition is back to 69% full, which is normal. (I partitioned my disk with 3 roots of 7.5GB, for three simultaneous OS installations if need be, a /home of 55GB, and a secondary disk of 250GB.)
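For the record, the "how full is my partition" check can also be scripted. What follows is only a minimal sketch in Python, using the standard library's shutil.disk_usage; the function name percent_full is my own invention, and the figure may differ a little from what df reports, since df leaves the blocks reserved for root out of its percentage.

```python
import shutil


def percent_full(path="/"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # "/" is the root partition; pass any other mount point to check it instead.
    print(f"/ is {percent_full('/'):.0f}% full")
```

Something like this could run from cron and complain when the number goes above a threshold of your choosing.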

Simple problem, simple solution.


App of the week: Filelight

November 12, 2007 at 4:56 pm · Filed under Application of the Week

Actually it is two applications I want to highlight: Filelight and Baobab. Both are disk usage analyzers, the former for KDE (see Figure 1), and the latter for GNOME (see Figure 2).

Figure 1: Filelight

Figure 2: Baobab

A disk usage analyzer is a tool to conveniently find out how much hard disk space different directories and files are taking up. It combines the effectiveness of the Unix du (if you have never used it, stop here and run man du in your command line immediately. If you do not know what that “command line” thingie is, whip yourself in the back repeatedly), with the convenience of a visual cue of how large directories are compared to one another.

Of the two DUAs I mention, I much prefer Filelight, for several reasons:
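To make the idea concrete, here is a rough sketch in Python of what such a tool does under the hood: add up the size of everything below each entry of a directory, then draw the results to scale. This is not how Filelight or Baobab are implemented; the function names are made up for illustration, and it sums apparent file sizes rather than allocated disk blocks, so the numbers will not match du exactly.

```python
import os


def tree_size(path):
    """Total apparent size in bytes of all files below `path` (symlinks not followed)."""
    total = 0
    for root, dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def usage_report(path):
    """Return (name, bytes) for each entry directly under `path`, largest first."""
    report = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                size = tree_size(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
            report.append((entry.name, size))
    return sorted(report, key=lambda item: item[1], reverse=True)


def print_bars(report, width=40):
    """Print one text bar per entry, scaled to the largest entry."""
    biggest = max((size for _, size in report), default=0)
    for name, size in report:
        bar = "#" * (width * size // biggest if biggest else 0)
        print(f"{size / 1e6:10.1f} MB  {bar:<{width}}  {name}")


if __name__ == "__main__":
    print_bars(usage_report("."))
```

The text bars are the poor man's version of the visual part: one glance tells you which entry dominates, which is the whole point of a DUA.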

1 – When I want to open a terminal in a location chosen from the DUA window, with Baobab it’s two clicks away: “Open file manager here”, then “Open terminal here” in the file manager. With Filelight, it’s just one click: “open terminal here”. Plus Filelight has a handy locator bar at the top, showing the full path to the current location (useful to copy-and-paste with the mouse to an already open terminal).

2 – Filelight drills down to individual files. Baobab shows just dirs.

3 – With Filelight, navigation up and down (and back and forward) in the dir tree is a breeze (web browser-style). With Baobab, it’s a pain.

4 – The presentation is similar, but Filelight’s is slightly nicer, with more info when the mouse hovers over the graph.

Probably Baobab can be easily made to behave like Filelight. I just tried them both, and liked the latter better at first sight. I tried Baobab first, and I found some things lacking. When I tried Filelight, five minutes later, I just thought “These are the details Baobab was missing!”
