<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Summary of my Python optimization adventures</title>
	<atom:link href="http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/feed/" rel="self" type="application/rss+xml" />
	<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/</link>
	<description>Because FLOSS is handy, isn&#039;t it?</description>
	<lastBuildDate>Fri, 19 Mar 2010 09:03:02 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Summary of my Python optimization adventures &#171; handyfloss</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-5409</link>
		<dc:creator>Summary of my Python optimization adventures &#171; handyfloss</dc:creator>
		<pubDate>Thu, 18 Sep 2008 12:42:40 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-5409</guid>
		<description>[...] Entry available at: http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/ [...]</description>
		<content:encoded><![CDATA[<p>[...] Entry available at: <a href="http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/" rel="nofollow">http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Top Posts &#171; WordPress.com</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-123</link>
		<dc:creator>Top Posts &#171; WordPress.com</dc:creator>
		<pubDate>Mon, 18 Feb 2008 23:59:23 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-123</guid>
		<description>[...]  Summary of my Python optimization adventures This is a follow up to two previous posts. In the first one I spoke about saving memory by reading line-by-line, [&#8230;] [...]</description>
		<content:encoded><![CDATA[<p>[...]  Summary of my Python optimization adventures This is a follow up to two previous posts. In the first one I spoke about saving memory by reading line-by-line, [&#8230;] [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lorenzo E. Danielsson</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-122</link>
		<dc:creator>Lorenzo E. Danielsson</dc:creator>
		<pubDate>Mon, 18 Feb 2008 20:43:32 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-122</guid>
		<description>@James: what is wrong with calling an external tool to perform a job that it was designed to do? It doesn&#039;t make the program itself any less &quot;python&quot;. That&#039;s the UNIX way: let each tool do what it&#039;s best at.</description>
		<content:encoded><![CDATA[<p>@James: what is wrong with calling an external tool to perform a job that it was designed to do? It doesn&#8217;t make the program itself any less &#8220;python&#8221;. That&#8217;s the UNIX way: let each tool do what it&#8217;s best at.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-121</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Mon, 18 Feb 2008 17:39:07 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-121</guid>
		<description>Oops, missed the parens:  execfile(&quot;your_filename.py&quot;)</description>
		<content:encoded><![CDATA[<p>Oops, missed the parens:  execfile("your_filename.py")</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-120</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Mon, 18 Feb 2008 17:23:43 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-120</guid>
		<description>While I find it annoying to use, try the &quot;profile&quot; module to get an idea of where time is spent in your program.  Because your code isn&#039;t written as a module, the easiest way to do the profiling is using the command &quot;execfile &#039;filename&#039;&quot;.

This should tell you which line consumes the most time.</description>
		<content:encoded><![CDATA[<p>While I find it annoying to use, try the &#8220;profile&#8221; module to get an idea of where time is spent in your program.  Because your code isn&#8217;t written as a module, the easiest way to do the profiling is using the command &#8220;execfile &#8216;filename&#8217;&#8221;.</p>
<p>This should tell you which line consumes the most time.</p>
]]></content:encoded>
	</item>
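<!--
The profiling advice in the comment above dates from Python 2; in Python 3 `execfile` no longer exists. Below is a minimal sketch, assuming Python 3, of running code under `cProfile` and printing the most expensive calls. The function `parse_credits` and the sample data are hypothetical stand-ins for the script's main loop, not the original code. Note that `cProfile` reports time per function rather than per line.

```python
import cProfile
import io
import pstats


def parse_credits(lines):
    """Hypothetical stand-in for the script's main loop."""
    total = 0.0
    for line in lines:
        if "total_credit" in line:
            # Take the text between the first '>' and the next '<'.
            total += float(line.split(">")[1].split("<")[0])
    return total


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Show at most the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


# Made-up sample input, for illustration only.
sample = ["<total_credit>12.5</total_credit>", "<other>x</other>"] * 1000
total, report = profile_call(parse_credits, sample)
print(report)
```

For a script that is not written as a module, `python -m cProfile -s cumulative your_filename.py` should give a similar report from the command line.
-->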
	<item>
		<title>By: Justin</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-119</link>
		<dc:creator>Justin</dc:creator>
		<pubDate>Mon, 18 Feb 2008 15:07:52 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-119</guid>
		<description>If you are going to do this:
&lt;code&gt;
if search_cre(line):
    line = re.sub(&#039;&gt;&#039;,&#039;&lt;&#039;,line)
    aline = line.split(&#039;&lt;&#039;)
    credit = float(aline[2])
&lt;/code&gt;

you should just change the regular expression to be
&lt;code&gt;
&quot;total_credit &gt;(?P[^&lt;]+)&lt;&quot;
&lt;/code&gt;

or such, and then just pull out the credit if it matched.  The way you&#039;re doing it, you are processing the same line 3 times.  From the looks of it, you could change the program to use 1 regular expression for everything, instead of 4.

&lt;code&gt;
f = os.popen(&#039;zcat host.gz&#039;)
&lt;/code&gt;

Will be a lot faster than the gzip module though.</description>
		<content:encoded><![CDATA[<p>If you are going to do this:<br />
<code><br />
if search_cre(line):<br />
    line = re.sub('&gt;','&lt;',line)<br />
    aline = line.split('&lt;')<br />
    credit = float(aline[2])<br />
</code></p>
<p>you should just change the regular expression to be<br />
<code><br />
"total_credit &gt;(?P[^&lt;]+)&lt;"<br />
</code></p>
<p>or such, and then just pull out the credit if it matched.  The way you&#8217;re doing it, you are processing the same line 3 times.  From the looks of it, you could change the program to use 1 regular expression for everything, instead of 4.</p>
<p><code><br />
f = os.popen('zcat host.gz')<br />
</code></p>
<p>Will be a lot faster than the gzip module though.</p>
]]></content:encoded>
	</item>
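<!--
The suggestion in the comment above, pulling the value out with one regular expression instead of a search, a substitution and a split, might look like the sketch below. The group name `credit`, the exact `<total_credit>` tag layout and the helper `extract_credit` are assumptions made for illustration; they are not taken from the original script.

```python
import re

# Assumed pattern: the group name "credit" and the tag layout are guesses.
CREDIT_RE = re.compile(r"<total_credit>(?P<credit>[^<]+)<")


def extract_credit(line):
    """Return the credit as a float, or None if the line has no credit tag."""
    match = CREDIT_RE.search(line)
    if match is None:
        return None
    # One pass over the line: the match already holds the captured value.
    return float(match.group("credit"))


print(extract_credit("  <total_credit>1234.5</total_credit>"))
print(extract_credit("<p_ncpus>2</p_ncpus>"))
```

Compiling the pattern once outside the loop avoids the cache lookup that `re.search()` performs on every call.
-->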
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-118</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Mon, 18 Feb 2008 13:34:05 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-118</guid>
		<description>Andrew, I am not doubting that the &quot;in&quot; construct is faster (I repeated your test, and here it&#039;s 3 times faster, as well). The problem is that maybe that part of the script is not the bottleneck, and the uncertainty in measured time (I always measured walltime with /usr/bin/time -f %e &lt;i&gt;command&lt;/i&gt;) is of the order of the difference between using one or the other, so I can&#039;t differentiate. I&#039;ll keep testing... (the faster the script, the more noticeable the subtle differences).</description>
		<content:encoded><![CDATA[<p>Andrew, I am not doubting that the &#8220;in&#8221; construct is faster (I repeated your test, and here it&#8217;s 3 times faster, as well). The problem is that maybe that part of the script is not the bottleneck, and the uncertainty in measured time (I always measured walltime with /usr/bin/time -f %e <i>command</i>) is of the order of the difference between using one or the other, so I can&#8217;t differentiate. I&#8217;ll keep testing&#8230; (the faster the script, the more noticeable the subtle differences).</p>
]]></content:encoded>
	</item>
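<!--
The `in` versus compiled-regex comparison discussed in this thread can also be repeated from inside Python, which sidesteps some of the wall-clock noise mentioned above. This is a rough sketch, assuming Python 3; the helper `best_usec_per_loop` is a hypothetical name. It binds `.search` rather than `.match`, because `.match` only tries position 0 and so finds nothing in "Uses Linux". Absolute numbers will differ from machine to machine.

```python
import re
import timeit

TEXT = "Uses Linux"
# .search scans the whole string; .match would only try position 0.
regex_search = re.compile("Linux").search


def best_usec_per_loop(func, number=100_000, repeat=5):
    """Return the best per-call time in microseconds over several runs."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    # The minimum is the run least disturbed by other system activity.
    return min(timings) / number * 1e6


regex_usec = best_usec_per_loop(lambda: regex_search(TEXT))
in_usec = best_usec_per_loop(lambda: "Linux" in TEXT)
print(f"compiled regex: {regex_usec:.3f} usec per loop")
print(f"'in' operator:  {in_usec:.3f} usec per loop")
```

Both measurements include the cost of calling the lambda, so the ratio between them understates the real difference somewhat.
-->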
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-117</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Mon, 18 Feb 2008 13:21:41 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-117</guid>
		<description>When I benchmark with

python -m timeit -s &#039;import re&#039; -s &#039;search = re.compile(&quot;Linux&quot;).match&#039; &#039;search(&quot;Uses Linux&quot;)&#039;

I get &quot;0.413 usec per loop&quot;.  When I benchmark with

python -m timeit &#039;&quot;Linux&quot; in &quot;Uses Linux&quot;&#039;

I get &quot;0.141 usec per loop&quot;.  Not quite 3 times faster than using search.  (My 20-fold case was when I tested re.search(), which has the overhead of a couple of extra function calls and a cache check.)

Are you sure you timed what you think you timed?  Every time I&#039;ve done the comparison the &quot;in&quot; test is faster, and I know the underlying implementation well enough that I can&#039;t think of how it can be slower than the re code.

Also, instead of writing to a temporary file and reading from that, use either the subprocess module or the older and harder to use os.popen call.  (Harder to use because it&#039;s harder to deal with errors.)  That should also give you some performance increase because you aren&#039;t doing a full read/write through the disk.</description>
		<content:encoded><![CDATA[<p>When I benchmark with</p>
<p>python -m timeit -s 'import re' -s 'search = re.compile("Linux").match' 'search("Uses Linux")'</p>
<p>I get &#8220;0.413 usec per loop&#8221;.  When I benchmark with</p>
<p>python -m timeit '"Linux" in "Uses Linux"'</p>
<p>I get &#8220;0.141 usec per loop&#8221;.  Not quite 3 times faster than using search.  (My 20-fold case was when I tested re.search(), which has the overhead of a couple of extra function calls and a cache check.)</p>
<p>Are you sure you timed what you think you timed?  Every time I&#8217;ve done the comparison the &#8220;in&#8221; test is faster, and I know the underlying implementation well enough that I can&#8217;t think of how it can be slower than the re code.</p>
<p>Also, instead of writing to a temporary file and reading from that, use either the subprocess module or the older and harder to use os.popen call.  (Harder to use because it&#8217;s harder to deal with errors.)  That should also give you some performance increase because you aren&#8217;t doing a full read/write through the disk.</p>
]]></content:encoded>
	</item>
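<!--
Reading an external tool's output through a pipe, as the comment above recommends in place of a temporary file, could be sketched like this with the `subprocess` module. It assumes Python 3.7 or later (for the `text` argument); `stream_lines` is a hypothetical helper name, and the demo runs the Python interpreter itself as the child process so that it works without `zcat` or `grep` installed.

```python
import subprocess
import sys


def stream_lines(command):
    """Yield stdout lines of command one by one, without a temporary file."""
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
    # Leaving the with block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)


# Stand-in child process; a real call might pass ["zcat", "host.gz"] instead.
demo_command = [sys.executable, "-c", "print('a'); print('b')"]
demo_lines = list(stream_lines(demo_command))
print(demo_lines)
```

Checking the return code addresses the error-handling weakness of `os.popen` mentioned above: a failing child process raises an exception instead of silently yielding no lines.
-->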
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-116</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Mon, 18 Feb 2008 12:01:22 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-116</guid>
		<description>James, you&#039;re absolutely right. What I meant was &quot;optimizing a script&quot;, by using the best tools I could get access to (plus my limited knowledge).

Anyway, my first attempt was to make it all in Python, and as effectively as possible, and I give some hints of what to do and what not to do... so this qualifies as &quot;Python optimization&quot;? :^)</description>
		<content:encoded><![CDATA[<p>James, you&#8217;re absolutely right. What I meant was &#8220;optimizing a script&#8221;, by using the best tools I could get access to (plus my limited knowledge).</p>
<p>Anyway, my first attempt was to make it all in Python, and as effectively as possible, and I give some hints of what to do and what not to do&#8230; so this qualifies as &#8220;Python optimization&#8221;? :^)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/comment-page-1/#comment-115</link>
		<dc:creator>James</dc:creator>
		<pubDate>Mon, 18 Feb 2008 11:41:00 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-115</guid>
		<description>Is this really Python optimization? Haven&#039;t you just offloaded most of the program to grep, effectively &quot;rewriting&quot; it in C?</description>
		<content:encoded><![CDATA[<p>Is this really Python optimization? Haven&#8217;t you just offloaded most of the program to grep, effectively &#8220;rewriting&#8221; it in C?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
