<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: ChopZip: a parallel implementation of arbitrary compression algorithms</title>
	<atom:link href="http://handyfloss.net/2009.12/chopzip-a-parallel-implementation-of-arbitrary-compression-algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://handyfloss.net/2009.12/chopzip-a-parallel-implementation-of-arbitrary-compression-algorithms/</link>
	<description>Because FLOSS is handy, isn&#039;t it?</description>
	<lastBuildDate>Mon, 21 May 2012 06:39:20 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: Nagilum</title>
		<link>http://handyfloss.net/2009.12/chopzip-a-parallel-implementation-of-arbitrary-compression-algorithms/comment-page-1/#comment-116107</link>
		<dc:creator>Nagilum</dc:creator>
		<pubDate>Sat, 14 Jan 2012 12:37:50 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=913#comment-116107</guid>
		<description>For gzip there is pigz.
To build pxz you need to install liblzma-dev.</description>
		<content:encoded><![CDATA[<p>For gzip there is pigz.<br />
To build pxz you need to install liblzma-dev.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.net/2009.12/chopzip-a-parallel-implementation-of-arbitrary-compression-algorithms/comment-page-1/#comment-95593</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Mon, 22 Aug 2011 12:44:48 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=913#comment-95593</guid>
		<description>Responding to my own comment above, I have found the reason for the dramatic speedup with parallel. Compressors work by finding redundancy in the data (if there is no redundancy, no compression is possible, because by definition the data cannot be conveyed by a smaller set of data; if it could, some of it would be redundant). Modern compressors do not search for such redundancy across the whole input stream, because that would make memory and CPU usage skyrocket. Instead, they compress the input in chunks: they take the first MB of data and compress it, then the second MB, and so on (or whatever chunk size they use).

The catch here is that parallel splits the input stream into 1 MB chunks by default, and each chunk is sent to its own process (xz, in our example). In a serial compression, apparently longer chunks are used (xz knows that the bigger the chunks, the more CPU and memory will be used, but also the better the compression ratio, so it uses as much as it can), so it&#039;s slower (but compresses more) than with parallel.

We can force parallel to split the input data into bigger chunks before sending each chunk to an xz process, as follows:

$ cat inputfile &#124; parallel --pipe -k --blocksize 2000k xz -3 &gt; inputfile.xz

Increasing the block size to around 3200 kB, we get an 8-second compression, which is a quarter of the serial compression time. Bigger chunks result in slower compression, and smaller ones in faster compression (up to a limit). Also, increasing the block size to 3200 kB does reduce the size of the compressed file, to the level obtained by ChopZip or serial xz.</description>
		<content:encoded><![CDATA[<p>Responding to my own comment above, I have found the reason for the dramatic speedup with parallel. Compressors work by finding redundancy in the data (if there is no redundancy, no compression is possible, because by definition the data cannot be conveyed by a smaller set of data; if it could, some of it would be redundant). Modern compressors do not search for such redundancy across the whole input stream, because that would make memory and CPU usage skyrocket. Instead, they compress the input in chunks: they take the first MB of data and compress it, then the second MB, and so on (or whatever chunk size they use).</p>
<p>The catch here is that parallel splits the input stream into 1 MB chunks by default, and each chunk is sent to its own process (xz, in our example). In a serial compression, apparently longer chunks are used (xz knows that the bigger the chunks, the more CPU and memory will be used, but also the better the compression ratio, so it uses as much as it can), so it&#8217;s slower (but compresses more) than with parallel.</p>
<p>We can force parallel to split the input data into bigger chunks before sending each chunk to an xz process, as follows:</p>
<p>$ cat inputfile | parallel --pipe -k --blocksize 2000k xz -3 &gt; inputfile.xz</p>
<p>Increasing the block size to around 3200 kB, we get an 8-second compression, which is a quarter of the serial compression time. Bigger chunks result in slower compression, and smaller ones in faster compression (up to a limit). Also, increasing the block size to 3200 kB does reduce the size of the compressed file, to the level obtained by ChopZip or serial xz.</p>
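<p>A minimal sketch of the chunk-size effect described above, assuming the standard Python lzma module as a stand-in for the xz command. The function names and the synthetic sample data are hypothetical and are not part of ChopZip or GNU Parallel. Each chunk is compressed independently, so redundancy that spans chunk boundaries is lost when the chunks are small:</p>

```python
import lzma
import random


def make_sample(block_size=64 * 1024, repeats=16, seed=0):
    """Build synthetic data: one pseudo-random block repeated several times.

    The only redundancy is between blocks, never inside a single block.
    """
    rng = random.Random(seed)
    block = rng.getrandbits(8 * block_size).to_bytes(block_size, "little")
    return block * repeats


def compress_in_chunks(data, chunk_size, preset=3):
    """Compress each chunk on its own and concatenate the resulting .xz streams."""
    streams = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        streams.append(lzma.compress(chunk, preset=preset))
    return b"".join(streams)


if __name__ == "__main__":
    data = make_sample()
    for chunk_size in (64 * 1024, 256 * 1024, len(data)):
        packed = compress_in_chunks(data, chunk_size)
        # lzma.decompress accepts concatenated streams, so the round trip works.
        assert lzma.decompress(packed) == data
        print(chunk_size, len(packed))
```

<p>This only illustrates the size side of the trade-off, and the sample is deliberately artificial: with real files the difference is usually much milder, like the few percent reported in this thread.</p>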
]]></content:encoded>
	</item>
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.net/2009.12/chopzip-a-parallel-implementation-of-arbitrary-compression-algorithms/comment-page-1/#comment-95441</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Fri, 19 Aug 2011 09:03:35 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=913#comment-95441</guid>
		<description>Thanks Ole,

It seems that it certainly could.

However, I am getting somewhat disturbing results with it. With each of xz, xz + parallel, and ChopZip, I compress a given file, then decompress it and check that the md5sum hasn&#039;t changed, so the compression and decompression were correct. The file sizes are:

Uncompressed: 94 MB (97922080 bytes)
xz -3: 14399948 bytes
xz -3 + parallel -j 4: 14767356 bytes
chopzip.py -n 4 -m xz -l 3: 14413616 bytes

So far, so good. Parallel compression is slightly less efficient than serial, which was expected. I still don&#039;t know why chopzip beats xz + parallel in size, if they both split the input into 4 equal pieces, but oh well...

However, the surprise comes when benchmarking the speed. The compression times are:

xz: 32.8 s
chopzip.py: 14.6 s
xz + parallel: 3.77 s

I am disappointed at how chopzip barely gets a 2x speedup, when 4x is expected. However, I am much more surprised at how xz + parallel got almost a 9x speedup!! I just don&#039;t get it...

I also tried gzip, instead of xz, and with it gzip+parallel was not much faster than serial gzip (1.33 vs 1.83 s). I&#039;m a bit puzzled.</description>
		<content:encoded><![CDATA[<p>Thanks Ole,</p>
<p>It seems that it certainly could.</p>
<p>However, I am getting somewhat disturbing results with it. With each of xz, xz + parallel, and ChopZip, I compress a given file, then decompress it and check that the md5sum hasn&#8217;t changed, so the compression and decompression were correct. The file sizes are:</p>
<p>Uncompressed: 94 MB (97922080 bytes)<br />
xz -3: 14399948 bytes<br />
xz -3 + parallel -j 4: 14767356 bytes<br />
chopzip.py -n 4 -m xz -l 3: 14413616 bytes</p>
<p>So far, so good. Parallel compression is slightly less efficient than serial, which was expected. I still don&#8217;t know why chopzip beats xz + parallel in size, if they both split the input into 4 equal pieces, but oh well&#8230;</p>
<p>However, the surprise comes when benchmarking the speed. The compression times are:</p>
<p>xz: 32.8 s<br />
chopzip.py: 14.6 s<br />
xz + parallel: 3.77 s</p>
<p>I am disappointed at how chopzip barely gets a 2x speedup, when 4x is expected. However, I am much more surprised at how xz + parallel got almost a 9x speedup!! I just don&#8217;t get it&#8230;</p>
<p>I also tried gzip, instead of xz, and with it gzip+parallel was not much faster than serial gzip (1.33 vs 1.83 s). I&#8217;m a bit puzzled.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ole Tange</title>
		<link>http://handyfloss.net/2009.12/chopzip-a-parallel-implementation-of-arbitrary-compression-algorithms/comment-page-1/#comment-95405</link>
		<dc:creator>Ole Tange</dc:creator>
		<pubDate>Thu, 18 Aug 2011 12:25:43 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=913#comment-95405</guid>
		<description>Could ChopZip be replaced by GNU Parallel:

    ZIPPRG=lzip
    cat bigfile &#124; parallel --pipe -k $ZIPPRG &gt; bigfile.lz</description>
		<content:encoded><![CDATA[<p>Could ChopZip be replaced by GNU Parallel:</p>
<p>    ZIPPRG=lzip<br />
    cat bigfile | parallel --pipe -k $ZIPPRG &gt; bigfile.lz</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.net/2009.12/chopzip-a-parallel-implementation-of-arbitrary-compression-algorithms/comment-page-1/#comment-48960</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Tue, 05 Jan 2010 11:10:01 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=913#comment-48960</guid>
		<description>Thanks, Iñigo! I also experimented with pexec. I even wrote a Python function to mimic pexec (and, I guess, Parallel Python), in the sense that it accepts a list of commands and runs N of them simultaneously, making sure that there are always N processes running (for N cores). When one finishes, a new one is started, until the list of commands is exhausted. In the case of ChopZip, I ended up going back to my original formulation, to avoid unnecessary code complexity (for little or no efficiency gain). pexec-like functionality allows (in my case) generating more file pieces than there are cores, and thus a better load balance (it compensates for chunks that get compressed faster, whose cores would otherwise sit idle). However, as I said, I saw no actual performance gain.</description>
		<content:encoded><![CDATA[<p>Thanks, Iñigo! I also experimented with <a href="http://en.wikipedia.org/wiki/pexec">pexec</a>. I even wrote a Python function to mimic pexec (and, I guess, Parallel Python), in the sense that it accepts a list of commands and runs N of them simultaneously, making sure that there are always N processes running (for N cores). When one finishes, a new one is started, until the list of commands is exhausted. In the case of ChopZip, I ended up going back to my original formulation, to avoid unnecessary code complexity (for little or no efficiency gain). pexec-like functionality allows (in my case) generating more file pieces than there are cores, and thus a better load balance (it compensates for chunks that get compressed faster, whose cores would otherwise sit idle). However, as I said, I saw no actual performance gain.</p>
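<p>For illustration, a pexec-like runner of the kind described above could look roughly like the sketch below. This is not the function from ChopZip; run_commands and its parameters are hypothetical names, and it assumes a thread pool is an acceptable way to keep N external processes busy:</p>

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_commands(commands, n_workers):
    """Run each command (an argv list), keeping at most n_workers alive at once.

    As soon as one command finishes, the pool starts the next pending one.
    Returns the exit codes in the same order as the input list.
    """
    def run_one(argv):
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Six dummy jobs on two workers; each job just exits with status 0.
    jobs = [[sys.executable, "-c", "pass"] for _ in range(6)]
    print(run_commands(jobs, n_workers=2))
```

<p>Passing more commands than workers is what gives the load balancing mentioned above: a worker that finishes a fast chunk immediately picks up the next one.</p>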
]]></content:encoded>
	</item>
	<item>
		<title>By: Inigo</title>
		<link>http://handyfloss.net/2009.12/chopzip-a-parallel-implementation-of-arbitrary-compression-algorithms/comment-page-1/#comment-47606</link>
		<dc:creator>Inigo</dc:creator>
		<pubDate>Thu, 24 Dec 2009 02:36:04 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=913#comment-47606</guid>
		<description>For trivial parallelization using Python, with some examples that may ring a bell for you, take a look at Parallel Python: http://www.parallelpython.com/

Iñigo</description>
		<content:encoded><![CDATA[<p>For trivial parallelization using Python, with some examples that may ring a bell for you, take a look at Parallel Python: <a href="http://www.parallelpython.com/" rel="nofollow">http://www.parallelpython.com/</a></p>
<p>Iñigo</p>
]]></content:encoded>
	</item>
</channel>
</rss>

