Mon, 24 Mar 2008

PyCon 2008

I was in Chicago last week for PyCon 2008. It was my first time in the windy city, and I must say that I was thoroughly impressed. As expected in any city, we got a chance to see a lady get her purse snattched, and a mentally unstable gentleman on the train yelling profanities at god. Anyway, the conference itself was extremely well done, and tons of awesome innovation happened at the sprints afterwords.

Day 1: Tutorials
8+ hours of TurboGears/Pylons/WSGI tutorials. Awesome. I'm really excited with what is in the works for TurboGears2. By wielding Pylons, the TG2 team was able to completely re-write their framework with minimal amounts of code, while at the same time, gaining a *ton* of new features and some amazing middleware. Mark Ramm and Ben Bangert took turns walking us through the deep internals of their frameworks, while also giving some examples how to use them.

Sessions
During the 3-day conference portion of PyCon, there was a vast plethora of incredibly interesting sessions and conversations. You can find a schedule of the talks and some slides here. Everything was video taped as well, so the sessions should be making their way on to YouTube hopefully at some point soon.

Here are some things that caught my attention while I was there.

WSGI
Defined by Phillip J. Eby in PEP-333, the Web Server Gateway Interface is a simple interface between web servers, applications, and frameworks. Or, as explained by Ian Bicking: WSGI is a series of Tubes. The basic idea is that it lets you connect a bunch of different applications together into a functioning whole. Since TurboGears2 is based on Pylons, it will be a full blown WSGI application out the box, loaded with lots of useful middleware (WebError, Routes, Sessions, Caching, etc), and will allow you to use any WSGI server that you wish (Paste, CherryPy, orbited, mod_wsgi, etc). An example of a basic Hello World WSGI application:

def wsgi_app(environ, start_response):
    start_response('200 OK', [('content-type', 'text/html')])
    return ['Hello world!']

So, what is WSGI middleware? Well, it's essentially the WSGI equivalent of a python decorator, but instead of wrapping one function in another, you're wrapping one web-app in another. You can see a list of some existing WSGI middleware here.

virtualenv
With so many new shiny python programs to play with, I really tried to resist the urge to easy_install everything into my global Python site-packages so I could tinker with things. This is generally a Bad Thing in a distribution, as easy_install not only installs things behind your package managers back, but it also lacks the ability to uninstall anything with it, unless you want to take Zed's easy_fucking_uninstall approach ;) During the TurboGears tutorial, I was introduced to a tool call virtualenv, which will setup a virtual python environment in which you can easy_install as many eggs as you want without worrying about butchering your site-packages.

$ easy_install virtualenv
$ virtualenv --no-site-packages foo
$ cd foo; source bin/activate
$ easy_install <shiny python programs>

nose
I've been in love with nose since day one, but realized that I haven't been utilizing it to it's fullest abilities. I blogged in the past about nose's profiler plugin. Come to find out, nose offers a lot more plugins that can seriously help make your life easier:

$ nosetests --pdb --pdb-failures
.............................................................> /home/lmacken/tg1.1/turbogears/turbogears/identity/tests/test_visit.py(92)test_cookie_permanent()
-> assert abs(should_expire - expires) < 3
(Pdb) locals()
{'morsel': <Morsel: tg-visit='452c94de3900fc2adff2cd6b0b0f04c4533e3e9e'>, 'self': <turbogears.identity.tests.test_visit.TestVisit testMethod=test_cookie_permanent>, 'expires': 1206228604.0, 'should_expire': 1206232205.0, 'permanent': False}
(Pdb)

You can also measure code coverage during your unit test execution using the '--with-coverage' option, which utilizes coverage.py.

SQLAlchemy
Also known as "the greatest object-relational-mapper created for any language. ever.", 0.4 has seen vast improvements since 0.3. Among them, a new declarative API is now available that essentially lets you define your class, Table and mapper constructs "at once" under a single class declaration (giving you a similar ActiveMapper feel like SQLObject or Elixir).

from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite://')
Base = declarative_base(engine)

class SomeClass(Base):
    __tablename__ = 'some_table'
    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50))

Unicode, demystified.
By far, the most frustrating problems I've ever encountered in Python have been unicode related. I was fortunate enough to catch Kumar McMillan's presentation, "Unicode in Python, Completely Demystified". This presentation helped enlighten many on the concept of unicode, clear up many misconceptions, and explain how to handle it properly in Python. Check out his slides for more details, but the general idea here is to follow these three rules:

His solution to decoding to unicode turns out to be quite elegant compared to some nasty try/except UnicodeDecodeError blocks that I have written in the past ;)
def to_unicode_or_bust(obj, encoding='utf-8'):
    if isinstance(obj, basestring):
        if not isinstance(obj, unicode):
            obj = unicode(obj, encoding)
    return obj

Later that night I went and shined some light on some dark corners of certain projects that I've been working on to try and handle unicode the Right Way.

Grassyknoll
After the code sprints, I got a chance to see these guys show off their hard work. grassyknoll is a search engine written in Python. With the ability to handle multiple backends, frontends, and wire formats, grassyknoll has a ton of potential to revolutionize the open source search engine. There has been recent talk in Fedora land about what kind of search engine to use, and I think grassyknoll is definitley a viable option.

Packaging BOF
Toshio, Spot, and I attended a Packaging BOF where we discussed our experiences with distutils and setuptools with a bunch of people from various companies and distros. This then sparked discussions on python-dev and the distutils-sig mailing lists. You can also find the details of the BOF session on the Python wiki. There is definitely a lot of energy behind this, so hopefully we'll see some good changes in setuptools in the near future that will make our lives as distro packagers much easier :)

Orbited
Orbited is an HTTP daemon that is optimized for long-lasting comet connections. This allows you to write real-time web applications with ease. For example, embeding an irc channel anywhere:

You can also use orbited as a WSGI server! Toshio did some brief benchmarking of of CherryPy{2,3}, Paste, and Orbited WSGI servers, and orbited seemed to be the clear winner in all scenerios. There is a good chance that we will be using orbited to handle our comet widgets within MyFedora :)

Code Sprints
I stayed the entire time for the code sprints, and mainly focused on TurboGears hacking. This is what I ended up working on:

Want to read more blog posts about PyCon 2008? You can find links to lots of PyCon related posts here and on Planet Python.



posted at: 17:05 | link | Tags: , | 1 comments

Wed, 19 Dec 2007

TurboFlot 0.0.1

In an effort to clean up bodhi's metrics code a bit, I wrote a TurboFlot plugin that allows you to wield the jQuery plugin flot inside of TurboGears applications. The code is quite trivial -- it's essentially just a TurboGears JSON proxy to the jQuery flot plugin. Breaking this code out into it's own widget makes it really easy to generate shiny graphs in a Pythonic fashon, without having to write a line of javascript.

Check out the README to see the code for the example above.

To use TurboFlot in your own application, you just pass your data and graph options to the widget, and then throw it up to your template. Read the flot API documentation for details on all of the arguments. Here is a simple usage example:

flot = TurboFlot([
    {
        'data'  : [[0, 3], [4, 8], [8, 5], [9, 13]],
        'lines' : { 'show' : True, 'fill' : True }
    }],
    {
        'grid'  : { 'backgroundColor' : '#fffaff' },
        'yaxis' : { 'max' : '850' }
    }
)
Then, to display the widget in your template, you simply use:
${flot.display()}

The code for the widget itself is pretty simple. It just takes your data and graph options, encodes them as JSON and tosses them at flot.
class TurboFlot(Widget):
    """
        A TurboGears Flot Widget.
    """
    template = """
      <div xmlns:py="http://purl.org/kid/ns#" id="turboflot"
           style="width:${width};height:${height};">
        <script>
          $.plot($("#turboflot"), ${data}, ${options});
        </script>
      </div>
    """
    params = ["data", "options", "height", "width"]
    javascript = [JSLink('turboflot', 'excanvas.js'),
                  JSLink("turboflot", "jquery.js"),
                  JSLink("turboflot", "jquery.flot.js")]

    def __init__(self, data, options={}, height="300px", width="600px"):
        self.data = simplejson.dumps(data)
        self.options = simplejson.dumps(options)
        self.height = height
        self.width = width

You can download the latest releases from the Python Package Index:

http://pypi.python.org/pypi/TurboFlot
Or you can grab my latest development tree out of mercurial:
http://hg.lewk.org/TurboFlot
As always, patches are welcome :)



posted at: 14:21 | link | Tags: , , , | 1 comments

Sat, 08 Dec 2007

Fedora update metrics

Using flot, a plotting library for jQuery, I threw together some shiny metrics for bodhi. It's pretty amazing to see how a Fedora release evolves over time, with almost as many enhancements as bugfixes. This could arguably be a bad thing, as our "stable" bits seem to change so much; but it definitely shows how much innovation is happening in Fedora.

I should also note that the data on the graphs may look different than the numbers you see next to each category in the bodhi menu. This is due to the fact that updates may contain multiple builds, and the graphs account for all builds in the system.

When I get some free cycles I'd like to generate some metrics from the old updates system for FC4-FC6. I can imagine that the differences will be pretty drastic, considering how the old updates tool was internal to Red Hat, and that the majority of our top packagers are community folks.



posted at: 19:05 | link | Tags: , , , , , | 0 comments

Mon, 01 Oct 2007

Use your Nose!

Every programmer out there [hopefully] knows that unittests are an essential part of any growing body of code, especially in the open source world. However, most hackers out either never write test cases (let alone comments), or usually put them off until "later" (aka: never). Having to deal with Java and JUnit tests in college not only made me not want to write unit tests, but it made me want to kill myself and everyone around me. Thankfully, I learned Python.

So, I just happen to maintain a piece of software in Fedora called nose (which lives in the python-nose package). Nose is a discovery-based unittest extension for Python, and is also a part of the TurboGears stack. If you're hacking on a TurboGears project, the turbogears.testutil module provides some incredibly useful features that make writing tests powerfully trivial.

For example, in the code below (taken from bodhi), I create a test case that utilizes a fresh SQLite database in memory. Inheriting from the the testutil.DBTest parent class, this database will be created and torn down automagically before and after each test case is run -- ensuring that my tests are executed in complete isolation. With this example, I wrote a test case to ensure that unauthenticated people cannot create a new update.

import urllib, cherrypy
from turbogears import update_config, database, testutil, url

update_config(configfile='dev.cfg', modulename='bodhi.config')
database.set_db_uri("sqlite:///:memory:")

class TestControllers(testutil.DBTest):

    def test_unauthenticated_update(self):
        params = {
                'builds'  : 'TurboGears-1.0.2.2-2.fc7',
                'release' : 'Fedora 7',
                'type'    : 'enhancement',
                'bugs'    : '1234 5678',
                'cves'    : 'CVE-2020-0001',
                'notes'   : 'foobar'
        }
        path = url('/save?' + urllib.urlencode(params))
        testutil.createRequest(path, method='POST')
        assert "You must provide your credentials before accessing this resource." in cherrypy.response.body[0]
In the above example, the TestControllers class is automatically detected by nose, which then executes each method that begins with the word 'test'. To run your unittests, just type 'nosetests'.
[lmacken@tomservo bodhi]$ nosetests
.................................
----------------------------------------------------------------------
Ran 33 tests in 16.798s

OK
Now, for the fun part. Nose comes equipped with a profiling plugin that will profile your test cases using Python's hotshot module. So, I went ahead and added a 'profile' target to bodhi's Makefile:
profile:
    nosetests --with-profile --profile-stats-file=nose.prof
    python -c "import hotshot.stats ; stats = hotshot.stats.load('nose.prof') ; stats.sort_stats('time', 'calls') ; stats.print_stats(20)"
Now, typing 'make profile' will execute and profile all of our unit tests, and spit out the top 20 method calls -- ordered by internal time and call count.
[lmacken@tomservo bodhi]$ make profile
nosetests --with-profile --profile-stats-file=nose.prof
.................................
----------------------------------------------------------------------
Ran 33 tests in 42.878s

OK
python -c "import hotshot.stats ; stats = hotshot.stats.load('nose.prof') ; stats.sort_stats('time', 'calls') ; stats.print_stats(20)"
         800986 function calls (702850 primitive calls) in 42.878 CPU seconds

   Ordered by: internal time, call count
   List reduced from 3815 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       14   13.675    0.977   13.675    0.977 /usr/lib/python2.5/socket.py:71(ssl)
       31   10.683    0.345   10.683    0.345 /usr/lib/python2.5/httplib.py:994(_read)
2478/2429    9.297    0.004    9.677    0.004 :1()
        1    0.604    0.604    0.604    0.604 /usr/lib/python2.5/commands.py:50(getstatusoutput)
     2999    0.536    0.000    0.539    0.000 /usr/lib/python2.5/site-packages/sqlobject/sqlite/sqliteconnection.py:177(_executeRetry)
   105899    0.448    0.000    0.773    0.000 Modules/pyexpat.c:871(Default)
       60    0.327    0.005    1.102    0.018 /usr/lib/python2.5/site-packages/kid/parser.py:343(_buildForeign)
   105899    0.325    0.000    0.325    0.000 /usr/lib/python2.5/site-packages/kid/parser.py:452(_default)
     3396    0.280    0.000    0.420    0.000 /usr/lib/python2.5/site-packages/cherrypy/config.py:107(get)
     2965    0.263    0.000    0.263    0.000 /usr/lib/python2.5/logging/__init__.py:364(formatTime)
44964/6587    0.238    0.000    0.252    0.000 /usr/lib/python2.5/site-packages/kid/parser.py:156(_pull)
       60    0.116    0.002    0.116    0.002 /usr/lib/python2.5/site-packages/kid/compiler.py:38(py_compile)
     8127    0.114    0.000    0.114    0.000 /usr/lib/python2.5/site-packages/cherrypy/_cputil.py:311(lower_to_camel)
     8982    0.110    0.000    0.137    0.000 /usr/lib/python2.5/site-packages/sqlobject/dbconnection.py:902(__getattr__)
13740/4044    0.108    0.000    2.176    0.001 /usr/lib/python2.5/site-packages/kid/parser.py:209(_coalesce)
24353/4026    0.107    0.000    2.143    0.001 /usr/lib/python2.5/site-packages/kid/parser.py:174(_track)
     3170    0.093    0.000    0.398    0.000 /usr/lib/python2.5/logging/__init__.py:405(format)
        1    0.082    0.082    0.082    0.082 /usr/lib/python2.5/site-packages/rpm/__init__.py:5()
     4777    0.081    0.000    1.320    0.000 /usr/lib/python2.5/site-packages/kid/serialization.py:564(generate)
  759/176    0.074    0.000    0.210    0.001 /usr/lib/python2.5/sre_parse.py:385(_parse)



posted at: 09:40 | link | Tags: , , , , | 0 comments

Sat, 01 Sep 2007

Recovering a Pyblosxom blog using liferea's RSS cache

My buddy who used to host lewk.org didn't pay his bills, so his server got taken down last week. What sucks is I that never backed up my Pyblosxom data. What doesn't suck is that thankfully Liferea, my RSS reader, did for me.

Grepping through ~/.liferea_1.2/cache/feeds, I was able to find my blog cached in some XML format. Then I wrote a little bit of code to re-create my Pyblosxom entry structure with the proper filenames and timestamps.

#!/usr/bin/python -tt
"""
 Turns XML into pyblosxom blog entries.

 It parses BLOG_XML pulling out blog entires in the form of:

     <feed version="1.1">
       <item>
         <title></title>
         <description></description>
         <source>http://foo.com/blog/2007/08/20/bar.html</source>
         <time>1187621268</time>
       </item>
     </feed>

 The file '2007/08/20/bar.txt' will be created in pyblosxom format with
 the appropriate timestamp.  The #mdate is used by the pyblosxom.vim plugin.

     title
     #mdate Aug 20 10:47:48 2007
     <p>description</p>
"""

import os
import time

try: from xml.etree import cElementTree
except ImportError: import cElementTree
iterparse = cElementTree.iterparse

entries = {} # { 'title' : <Element> }

BLOG_XML = 'blog.xml'
BLOG_ROOT = 'http://foo.com/blog/'

def getField(elem, field):
    for child in elem:
        if child.tag == field:
            return child.text

## Pull out all feed items, removing older duplicates
for event, elem in iterparse(BLOG_XML):
    if elem.tag == 'feed':
        for child in elem:
            if child.tag == 'item':
                title = getField(child, 'title')
                if entries.has_key(title):
                    if int(getField(child, 'time')) > \
                       int(getField(entries[title], 'time')):
                        entries[title] = child
                else:
                    entries[title] = child

for title, entry in entries.items():
    source = getField(entry, 'source').replace(BLOG_ROOT, '')
    source = source.replace('.html', '.txt')
    if not os.path.isdir(os.path.dirname(source)):
        os.makedirs(os.path.dirname(source))
    output = file(source, 'w')
    output.write(title + '\n')
    mtime = time.localtime(int(getField(entry, 'time')))
    mdate = time.strftime("%b %e %H:%M:%S %Y", mtime)
    output.write("#mdate %s\n" % mdate)
    output.write("<p>%s</p>\n" % getField(entry, 'description'))
    output.close()
    timestamp = time.strftime("%y%m%d%H%M", mtime)
    os.system("touch -t %s %s" % (timestamp, source))

It also adds an #mdate tag into each entry, which read by the spiffy pyblosxom mdate vim hack that Jordan Sissel wrote to restore each entries original timestamp after editing. His code only works on FreeBSD at the moment, so I started a pyblosxom.vim plugin that works on Linux (hopefully it will eventually support both, along with a bunch of other handy functions). You can find all of this code in my mercurial repo: hg.lewk.org/xml2pyblosxom



posted at: 11:44 | link | Tags: , , | 0 comments

Sat, 19 May 2007

Security LiveCD

So last week I created an initial version of a potential Fedora Security LiveCD spin. The goal is to provide a fully functional livecd based on Fedora for use in security auditing, penetration testing, and forensics. I created it as a bonus project for my Security Auditing class (instead of following the 5-pages of instructions on how to create a Gentoo livecd that she handed out (mad props to davidz for creating an amazing LiveCD tool)), but it has the potential to be extremely useful and also help increase the number and quality of Fedora's security tools. I threw in all of the tools I could find that already exist in Fedora, but I'm sure I'm missing a bunch, so feel free to send patches or suggestions. I also added a Wishlist of packages that I would eventually like to see make their way in Fedora, after the core->extras merge reviews are done.

I would eventually like to see Fedora offer a LiveCD that puts all of the existing linux security livecds to shame. We have quite a ways to go, but this is a start. I'm taking a computer forensics class next quarter, so I will be expanding it to fit the needs of our class as well.



posted at: 14:15 | link | Tags: , , , , | 0 comments

Wed, 14 Feb 2007

break

So my Thanksgiving break was far from a break. I spent a couple of days last week at Red Hat's westford office before heading back up to RIT to start a new quarter. In my two days in the office I was able to touch base with a bunch of people, and get a bunch of stuff done as well. I had a long discussion with dmalcom about integrating the Fedora Updates System with Beaker/TableCloth. He also gave me a quick rundown on a bunch of the Red Hat QA infrastructure that is currently being used. Ideally we'd like to be able to crunch all package updates through an automated test system before pushing them out to the world. Involvement needed: FedoraTesting.

Later that day I met with jrb and jkeating about getting a package updating system in place for a new Red Hat product that is going out the door very soon. This means that much work will be going into the new UpdatesSystem in the near future, which means I get to dig deeper into the world of TurboGears :)

On thursday I cranked a bunch of code out, but was fairly distracted most of the time by the OLPC laptops that were lying around the office. I must say, it is an absolutely incredible machine. The screen is gorgeous, and it's camera is very impressive. I hung around later at the office for an OLPC hackfest that was going down.

I was busy working on the updates system most of the time, but then later on I started looking into some Python start-up issues, which can be seen by doing:

	strace python 2>&1 | grep ENOENT
You'll notice a ton of syscalls like the following, which try to open/stat modules in locations that do not exist:
stat64("/usr/lib/python24.zip/posixpath", 0xbfdb5094) = -1 ENOENT (No such file or directory)
PrivoxyWindowOpen("/usr/lib/python24.zip/posixpath.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
PrivoxyWindowOpen("/usr/lib/python24.zip/posixpathmodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
PrivoxyWindowOpen("/usr/lib/python24.zip/posixpath.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
PrivoxyWindowOpen("/usr/lib/python24.zip/posixpath.pyc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (N o such file or directory)
stat64("/usr/lib/python2.4/posixpath", 0xbfdb5094) = -1 ENOENT (No such file or directory)
PrivoxyWindowOpen("/usr/lib/python2.4/posixpath.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No su ch file or directory)
PrivoxyWindowOpen("/usr/lib/python2.4/posixpathmodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
PrivoxyWindowOpen("/usr/lib/python2.4/posixpath.py", O_RDONLY|O_LARGEFILE) = 5 

So it's obvious that modules could exist in multiple locations, but if you are repeatedly going to check a series of directories, such as /usr/lib/python24.zip, wouldn't it be a *bit* smarter to check if they exists first, and then avoid checking there in the future? Doing so would help cut down from the 233+ syscalls python makes while starting up looking for modules. I really don't have any free cycles to try and add some sense into Python, so I really hope someone can beat me to a patch.


TurboGears 1.0b2


I came back home to find the new TurboGears book in my mailbox, which has been extremely informative, aside from the fact that the project has awesome online docs as well. I pushed out the latest TurboGears release, 1.0b2, for FC6 and rawhide yesterday as well.



posted at: 21:12 | link | Tags: , , , , | 0 comments