Wed, 13 Sep 2006
python list generation optimization
Menno Smits recently made some cleanups to yum's RPMDBPackageSack class. In a mail to the yum-devel list, Menno points out that python's list comprehension turns out to be 20% faster than using a for loop while appending items to a list.
A simple benchmark script shows the speed increase claim to be true. Menno's script test two different methods, the first being the standard for/list.append() loop.
l1 = []
for x in xrange(LOOPS):
if x % 2 == 0:
l1.append(x)
The second being list comprehension:
l2 = [x for x in xrange(LOOPS) if x % 2 == 0]
Both methods utilize python's xrange, which is a generator object written in pure C, and is faster than the plain old range. Out of curiosity, I went ahead and modified this script to test a few more methods of generating the same list as above.
Using a for loop, and an append function reference:
l3 = []
append = l3.append
for x in xrange(LOOPS):
if x % 2 == 0:
append(x)
Using the filter/lambda approach:
l4 = filter(lambda x: x % 2 == 0, xrange(LOOPS))
Using filter and a function reference to the lambda:
x = lambda x: x % 2 == 0
l5 = filter(x, xrange(LOOPS))
The results were actually quite surprising. Using a modified version of Menno's script, I came up with these results after 10 iterations. The percentages are compared to the standard for loop/list.append() method.
As it turns out, using the filter/lambda approach seems to shave off ~57% of the usual overhead from the for/list.append() approach.for loop/append function reference: 23% faster list comprehension: 36% faster filter/lambda: 40% faster filter/reference to lambda: 57% faster
Although extremely powerful, filter/map/lambdas tend to compact code, thus making it more difficult to read/maintain. This reason alone may cause people to completely stay away from them all together.
Avoiding powerful language features because of 'readability' can be a mistake in my opinion. Any competent programmer should have the ability to make their code understandable to anyone -- yes, it may actually require writing comments. Even the most complex regular expressions can be written so they can be understood easily (see Five Habits for Successful Regular Expressions).
I still tend to agree with Donald Knuth's statement "Premature optimization is the root of all evil". Jumping right into a complex algorithm with maps/filters/lambdas/etc may not be the best idea, and will most likely make it a nightmare to debug down the road.
posted at: 19:11 | link | | 6 comments








