<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Summary of my Python optimization adventures</title>
	<atom:link href="http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/feed/" rel="self" type="application/rss+xml" />
	<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/</link>
	<description>Because FLOSS is handy, isn't it?</description>
	<pubDate>Thu, 20 Nov 2008 14:37:56 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
		<item>
		<title>By: Summary of my Python optimization adventures &#171; handyfloss</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-5409</link>
		<dc:creator>Summary of my Python optimization adventures &#171; handyfloss</dc:creator>
		<pubDate>Thu, 18 Sep 2008 12:42:40 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-5409</guid>
		<description>[...] Entry available at: http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/ [...]</description>
		<content:encoded><![CDATA[<p>[...] Entry available at: <a href="http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/" rel="nofollow">http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Top Posts &#171; WordPress.com</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-123</link>
		<dc:creator>Top Posts &#171; WordPress.com</dc:creator>
		<pubDate>Mon, 18 Feb 2008 23:59:23 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-123</guid>
		<description>[...]  Summary of my Python optimization adventures This is a follow up to two previous posts. In the first one I spoke about saving memory by reading line-by-line, [&#8230;] [...]</description>
		<content:encoded><![CDATA[<p>[...]  Summary of my Python optimization adventures This is a follow up to two previous posts. In the first one I spoke about saving memory by reading line-by-line, [&#8230;] [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lorenzo E. Danielsson</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-122</link>
		<dc:creator>Lorenzo E. Danielsson</dc:creator>
		<pubDate>Mon, 18 Feb 2008 20:43:32 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-122</guid>
		<description>@James: what is wrong with calling an external tool to perform a job that it was designed to do? It doesn't make the program itself any less "python". That's the UNIX way: let each tool do what it's best at.</description>
		<content:encoded><![CDATA[<p>@James: what is wrong with calling an external tool to perform a job that it was designed to do? It doesn&#8217;t make the program itself any less &#8220;python&#8221;. That&#8217;s the UNIX way: let each tool do what it&#8217;s best at.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-121</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Mon, 18 Feb 2008 17:39:07 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-121</guid>
		<description>Oops, missed the parens:  execfile("your_filename.py")</description>
		<content:encoded><![CDATA[<p>Oops, missed the parens:  execfile("your_filename.py")</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-120</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Mon, 18 Feb 2008 17:23:43 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-120</guid>
		<description>While I find it annoying to use, try the "profile" module to get an idea of where time is spent in your program.  Because your code isn't written as a module, the easiest way to do the profiling is using the command "execfile 'filename'".

This should tell you which functions consume the most time.</description>
		<content:encoded><![CDATA[<p>While I find it annoying to use, try the &#8220;profile&#8221; module to get an idea of where time is spent in your program.  Because your code isn&#8217;t written as a module, the easiest way to do the profiling is using the command &#8220;execfile &#8216;filename&#8217;&#8221;.</p>
<p>This should tell you which functions consume the most time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-119</link>
		<dc:creator>Justin</dc:creator>
		<pubDate>Mon, 18 Feb 2008 15:07:52 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-119</guid>
		<description>If you are going to do this:
&lt;code&gt;
if search_cre(line):
    line = re.sub('&#62;','&#60;',line)
    aline = line.split('&#60;')
    credit = float(aline[2])
&lt;/code&gt;

you should just change the regular expression to be
&lt;code&gt;
"total_credit &#62;([^&#60;]+)&#60;"
&lt;/code&gt;

or such, and then just pull out the credit if it matched.  The way you're doing it, you are processing the same line 3 times.  From the looks of it, you could change the program to use 1 regular expression for everything, instead of 4.

&lt;code&gt;
f = os.popen('zcat host.gz')
&lt;/code&gt;

Will be a lot faster than the gzip module though.</description>
		<content:encoded><![CDATA[<p>If you are going to do this:<br />
<code><br />
if search_cre(line):<br />
    line = re.sub('&gt;','&lt;',line)<br />
    aline = line.split('&lt;')<br />
    credit = float(aline[2])<br />
</code></p>
<p>you should just change the regular expression to be<br />
<code><br />
"total_credit &gt;([^&lt;]+)&lt;"<br />
</code></p>
<p>or such, and then just pull out the credit if it matched.  The way you&#8217;re doing it, you are processing the same line 3 times.  From the looks of it, you could change the program to use 1 regular expression for everything, instead of 4.</p>
<p><code><br />
f = os.popen('zcat host.gz')<br />
</code></p>
<p>Will be a lot faster than the gzip module though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-118</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Mon, 18 Feb 2008 13:34:05 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-118</guid>
		<description>Andrew, I am not doubting that the "in" construct is faster (I repeated your test, and here it's 3 times faster, as well). The problem is that maybe that part of the script is not the bottleneck, and the uncertainty in the measured time (I always measured walltime with /usr/bin/time -f %e &lt;i&gt;command&lt;/i&gt;) is of the order of the difference between using one or the other, so I can't differentiate. I'll keep testing... (the faster the script, the more noticeable the subtle differences).</description>
		<content:encoded><![CDATA[<p>Andrew, I am not doubting that the &#8220;in&#8221; construct is faster (I repeated your test, and here it&#8217;s 3 times faster, as well). The problem is that maybe that part of the script is not the bottleneck, and the uncertainty in the measured time (I always measured walltime with /usr/bin/time -f %e <i>command</i>) is of the order of the difference between using one or the other, so I can&#8217;t differentiate. I&#8217;ll keep testing&#8230; (the faster the script, the more noticeable the subtle differences).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Dalke</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-117</link>
		<dc:creator>Andrew Dalke</dc:creator>
		<pubDate>Mon, 18 Feb 2008 13:21:41 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-117</guid>
		<description>When I benchmark with

python -m timeit -s 'import re' -s 'search = re.compile("Linux").match' 'search("Uses Linux")'

I get "0.413 usec per loop".  When I benchmark with

python -m timeit '"Linux" in "Uses Linux"'

I get "0.141 usec per loop".  Not quite 3 times faster than using search.  (My 20-fold case was when I tested re.search(), which has a couple extra function calls overhead and a cache check.)

Are you sure you timed what you think you timed?  Every time I've done the comparison the "in" test is faster, and I know the underlying implementation well enough that I can't think of how it can be slower than the re code.

Also, instead of writing to a temporary file and reading from that, use either the subprocess module or the older, harder-to-use os.popen call.  (Harder to use because it's harder to deal with errors.)  That should also give you some performance increase because you aren't doing a full read/write through the disk.</description>
		<content:encoded><![CDATA[<p>When I benchmark with</p>
<p>python -m timeit -s 'import re' -s 'search = re.compile("Linux").match' 'search("Uses Linux")'</p>
<p>I get &#8220;0.413 usec per loop&#8221;.  When I benchmark with</p>
<p>python -m timeit '"Linux" in "Uses Linux"'</p>
<p>I get &#8220;0.141 usec per loop&#8221;.  Not quite 3 times faster than using search.  (My 20-fold case was when I tested re.search(), which has a couple extra function calls overhead and a cache check.)</p>
<p>Are you sure you timed what you think you timed?  Every time I&#8217;ve done the comparison the &#8220;in&#8221; test is faster, and I know the underlying implementation well enough that I can&#8217;t think of how it can be slower than the re code.</p>
<p>Also, instead of writing to a temporary file and reading from that, use either the subprocess module or the older, harder-to-use os.popen call.  (Harder to use because it&#8217;s harder to deal with errors.)  That should also give you some performance increase because you aren&#8217;t doing a full read/write through the disk.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: isilanes</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-116</link>
		<dc:creator>isilanes</dc:creator>
		<pubDate>Mon, 18 Feb 2008 12:01:22 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-116</guid>
		<description>James, you're absolutely right. What I meant was "optimizing a script", by using the best tools I could get access to (plus my limited knowledge).

Anyway, my first attempt was to make it all in Python, and as effectively as possible, and I give some hints of what to do and what not to do... so this qualifies as "Python optimization"? :^)</description>
		<content:encoded><![CDATA[<p>James, you&#8217;re absolutely right. What I meant was &#8220;optimizing a script&#8221;, by using the best tools I could get access to (plus my limited knowledge).</p>
<p>Anyway, my first attempt was to make it all in Python, and as effectively as possible, and I give some hints of what to do and what not to do&#8230; so this qualifies as &#8220;Python optimization&#8221;? :^)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James</title>
		<link>http://handyfloss.net/2008.02/summary-of-my-python-optimization-adventures/#comment-115</link>
		<dc:creator>James</dc:creator>
		<pubDate>Mon, 18 Feb 2008 11:41:00 +0000</pubDate>
		<guid isPermaLink="false">http://handyfloss.net/?p=298#comment-115</guid>
		<description>Is this really Python optimization? Haven't you just offloaded most of the program to grep, effectively "rewriting" it in C?</description>
		<content:encoded><![CDATA[<p>Is this really Python optimization? Haven&#8217;t you just offloaded most of the program to grep, effectively &#8220;rewriting&#8221; it in C?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
