Wed, 13 Sep 2006

python list generation optimization

Menno Smits recently made some cleanups to yum's RPMDBPackageSack class. In a mail to the yum-devel list, Menno points out that python's list comprehension turns out to be 20% faster than using a for loop while appending items to a list.

A simple benchmark script shows the speed increase claim to be true. Menno's script test two different methods, the first being the standard for/list.append() loop.

l1 = []
for x in xrange(LOOPS):
    if x % 2 == 0:
        l1.append(x)

The second being list comprehension:

l2 = [x for x in xrange(LOOPS) if x % 2 == 0]

Both methods utilize python's xrange, which is a generator object written in pure C, and is faster than the plain old range. Out of curiosity, I went ahead and modified this script to test a few more methods of generating the same list as above.

Using a for loop, and an append function reference:

l3 = []
append = l3.append
for x in xrange(LOOPS):
    if x % 2 == 0:
        append(x)

Using the filter/lambda approach:

l4 = filter(lambda x: x % 2 == 0, xrange(LOOPS))

Using filter and a function reference to the lambda:

x = lambda x: x % 2 == 0
l5 = filter(x, xrange(LOOPS))

The results were actually quite surprising. Using a modified version of Menno's script, I came up with these results after 10 iterations. The percentages are compared to the standard for loop/list.append() method.

for loop/append function reference: 23% faster list comprehension: 36% faster filter/lambda: 40% faster filter/reference to lambda: 57% faster

As it turns out, using the filter/lambda approach seems to shave off ~57% of the usual overhead from the for/list.append() approach.

Although extremely powerful, filter/map/lambdas tend to compact code, thus making it more difficult to read/maintain. This reason alone may cause people to completely stay away from them all together.

Avoiding powerful language features because of 'readability' can be a mistake in my opinion. Any competent programmer should have the ability to make their code understandable to anyone -- yes, it may actually require writing comments. Even the most complex regular expressions can be written so they can be understood easily (see Five Habits for Successful Regular Expressions).

I still tend to agree with Donald Knuth's statement "Premature optimization is the root of all evil". Jumping right into a complex algorithm with maps/filters/lambdas/etc may not be the best idea, and will most likely make it a nightmare to debug down the road.


posted at: 19:11 | link | Tags: , | 7 comments