Mon, 24 Mar 2008
PyCon 2008
I was in Chicago last week for PyCon 2008. It was my first time in the Windy City, and I must say that I was thoroughly impressed. As expected in any city, we got a chance to see a lady get her purse snatched, and a mentally unstable gentleman on the train yelling profanities at God. Anyway, the conference itself was extremely well done, and tons of awesome innovation happened at the sprints afterwards.
Day 1: Tutorials
8+ hours of TurboGears/Pylons/WSGI tutorials. Awesome. I'm really
excited about what is in the works for TurboGears2. By wielding Pylons, the
TG2 team was able to completely rewrite their framework with a minimal amount
of code, while at the same time gaining a *ton* of new features
and some amazing middleware. Mark Ramm and Ben Bangert took turns walking us through the
deep internals of their frameworks, while also giving some examples of how to use
them.
Sessions
During the 3-day conference portion of PyCon, there was a plethora of
incredibly interesting sessions and conversations. You can find a schedule of
the talks and some slides here. Everything was
videotaped as well, so the sessions should hopefully be making their way onto
YouTube soon.
Here are some things that caught my attention while I was there.
WSGI
Defined by Phillip J. Eby in PEP 333, the Web Server Gateway Interface is a simple interface between web servers, applications, and frameworks. Or, as explained by Ian Bicking: WSGI is a series of Tubes. The basic idea is that it lets you connect a bunch of different applications together into a functioning whole.
Since TurboGears2 is based on Pylons, it will be a full-blown WSGI application out of the box, loaded with lots of useful middleware (WebError, Routes, Sessions, Caching, etc.), and will allow you to use any WSGI server that you wish (Paste, CherryPy, orbited, mod_wsgi, etc.).
An example of a basic Hello World WSGI application:
def wsgi_app(environ, start_response):
    start_response('200 OK', [('content-type', 'text/html')])
    return ['Hello world!']
So, what is WSGI middleware? Well, it's essentially the WSGI equivalent of a python decorator, but instead of wrapping one function in another, you're wrapping one web-app in another. You can see a list of some existing WSGI middleware here.
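To make the decorator analogy concrete, here is a minimal sketch of a middleware wrapping an application. It is written for Python 3, where WSGI response bodies are bytes (PEP 3333). The names `hello_app`, `UppercaseMiddleware`, and `call_app` are made up for illustration and are not part of any framework mentioned here.

```python
def hello_app(environ, start_response):
    """A minimal WSGI application; on Python 3 the body must be bytes."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello world!']


class UppercaseMiddleware:
    """Hypothetical middleware: wraps any WSGI app and upper-cases its body."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Delegate to the wrapped app, then transform each chunk it returns.
        return [chunk.upper() for chunk in self.app(environ, start_response)]


def call_app(app):
    """Invoke a WSGI app without a server and return (status, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status

    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}
    body = b''.join(app(environ, start_response))
    return captured['status'], body


# The wrapped object is itself a WSGI app, so middleware can be stacked.
wrapped = UppercaseMiddleware(hello_app)
print(call_app(wrapped))
```

Because the middleware exposes the same `(environ, start_response)` interface as the app it wraps, a server cannot tell the two apart, which is what lets you chain several of them together.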
virtualenv
With so many new shiny python programs to play with, I really tried to resist
the urge to easy_install everything into my global Python site-packages so I
could tinker with things. This is generally a Bad Thing in a distribution, as
easy_install not only installs things behind your package manager's back,
but it also lacks the ability to uninstall anything, unless you want to take Zed's easy_fucking_uninstall
approach ;) During the TurboGears tutorial, I was introduced to a tool called
virtualenv, which will set up a virtual python environment in which you can
easy_install as many eggs as you want without worrying about butchering
your site-packages.
$ easy_install virtualenv
$ virtualenv --no-site-packages foo
$ cd foo; source bin/activate
$ easy_install <shiny python programs>
nose
I've been in love with nose since day
one, but realized that I haven't been utilizing it to its fullest abilities.
I blogged in the past about nose's
profiler plugin. Come to find out, nose offers a lot more plugins that can
seriously help make your life easier:
$ nosetests --pdb --pdb-failures
.............................................................> /home/lmacken/tg1.1/turbogears/turbogears/identity/tests/test_visit.py(92)test_cookie_permanent()
-> assert abs(should_expire - expires) < 3
(Pdb) locals()
{'morsel': <Morsel: tg-visit='452c94de3900fc2adff2cd6b0b0f04c4533e3e9e'>, 'self': <turbogears.identity.tests.test_visit.TestVisit testMethod=test_cookie_permanent>, 'expires': 1206228604.0, 'should_expire': 1206232205.0, 'permanent': False}
(Pdb)
You can also measure code coverage during your unit test execution using the '--with-coverage' option, which utilizes coverage.py.
SQLAlchemy
Also known as "the greatest object-relational-mapper created for any language. ever.", SQLAlchemy 0.4 has seen vast improvements since 0.3. Among them, a new declarative
API is now available that essentially lets you define your class, Table and
mapper constructs "at once" under a single class declaration (giving you an
ActiveMapper-like feel similar to SQLObject or Elixir).
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine('sqlite://')
Base = declarative_base(engine)
class SomeClass(Base):
    __tablename__ = 'some_table'
    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50))
Unicode, demystified.
By far, the most frustrating problems I've ever encountered in Python have been
unicode related. I was fortunate enough to catch Kumar McMillan's
presentation, "Unicode in Python, Completely Demystified". This presentation
helped enlighten many on the concept of unicode, clear up many misconceptions,
and explain how to handle it properly in Python. Check out his slides for more details, but
the general idea here is to follow these three rules:
- decode early
- unicode everywhere
- encode late
def to_unicode_or_bust(obj, encoding='utf-8'):
    if isinstance(obj, basestring):
        if not isinstance(obj, unicode):
            obj = unicode(obj, encoding)
    return obj
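For readers on Python 3, where `unicode` and `basestring` no longer exist, the same three rules can be sketched as follows. This is my own illustration rather than code from the slides; `to_text` is a hypothetical name, and the sample string is made up.

```python
def to_text(obj, encoding='utf-8'):
    """Python 3 analogue of to_unicode_or_bust: bytes become str, anything else passes through."""
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    return obj


raw = 'caf\u00e9'.encode('utf-8')     # bytes, as they would arrive from a file or socket
name = to_text(raw)                   # decode early
greeting = 'Hello, ' + name.upper()   # unicode everywhere: only str inside the program
wire = greeting.encode('utf-8')       # encode late, right at the output boundary

print(len(raw), len(name))
```

The byte string is one element longer than the text because the accented character takes two bytes in UTF-8, which is exactly the kind of mismatch that bites when bytes and text get mixed.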
Later that night I went and shined some light on some dark corners of certain projects that I've been working on to try and handle unicode the Right Way.
Grassyknoll
After the code sprints, I got a chance to see these guys show off their hard
work. grassyknoll is a
search engine written in Python. With the ability to handle multiple backends,
frontends, and wire formats, grassyknoll has a ton of potential to
revolutionize the open source search engine. There has been recent talk in
Fedora land about what kind of search engine to use, and I think grassyknoll is
definitely a viable option.
Packaging BOF
Toshio, Spot, and I attended a Packaging BOF where we discussed our
experiences with distutils and setuptools with a bunch of people from various
companies and distros. This then sparked discussions on python-dev and the
distutils-sig mailing lists. You can also find the details of the BOF session
on the Python wiki. There
is definitely a lot of energy behind this, so hopefully we'll see some good changes
in setuptools in the near future that will make our lives as distro packagers much easier :)
Orbited
Orbited is an HTTP daemon that is optimized for long-lasting comet
connections. This allows you to write real-time web applications with
ease. For example, you can embed an IRC channel anywhere.
You can also use orbited as a WSGI server! Toshio did some brief benchmarking of the CherryPy{2,3}, Paste, and Orbited WSGI servers, and orbited seemed to be the clear winner in all scenarios. There is a good chance that we will be using orbited to handle our comet widgets within MyFedora :)
Code Sprints
I stayed the entire time for the code sprints, and mainly focused on TurboGears hacking. This is what I ended up working on:
- Added SQLAlchemy support to turbogears.testutil.DBTest (Ticket #1764). When you inherit from this class, it will automatically set up and tear down your SQLObject or SQLAlchemy database before and after each of your unit tests.
- Added a FlotWidget using ToscaWidgets to twTools. This widget allows you to create attractive graphs with ease.
- Made the TurboGears2 templating engine configurable (Ticket #1680). Things were hardcoded to use genshi; this is no longer the case.
- WebTest integration for unit tests (Ticket #1762). I wrote some high-level unit testing classes that wrap a WebTest object around your WSGI app. This gives you an extremely powerful API to write "framework independent" unit tests. The WebTest.get/post methods simply return WebOb objects, which allow for drastic simplification of your unit tests. This also helped decouple the TG testutils from using CherryPy internals (one step closer to CherryPy3 support in TurboGears). As I mentioned on the TurboGears-trunk list, these changes will make writing unit tests a breeze:
class TestPages(testutil.DBWebTest):
    def test_forbidden(self):
        self.app.get('/hot_action', status=403)
    def test_webob_response(self):
        user = User(user_name=u"test", password=u"test")
        self.login_user(user)
        res = self.app.get('/hot_action')
        assert "Hot WSGI action" in res
        assert res.namespace['tg_flash'] == u'Hot WSGI action'
The WebTest integration is planned to hit in the TurboGears 1.1 release, deprecating testutils.{call,create_request}.
Want to read more blog posts about PyCon 2008? You can find links to lots of PyCon related posts here and on Planet Python.
posted at: 17:05 | link | | 1 comments
Wed, 19 Dec 2007
TurboFlot 0.0.1
In an effort to clean up bodhi's metrics code a bit, I wrote a TurboFlot plugin that allows you to wield the jQuery plugin flot inside of TurboGears applications. The code is quite trivial -- it's essentially just a TurboGears JSON proxy to the jQuery flot plugin. Breaking this code out into its own widget makes it really easy to generate shiny graphs in a Pythonic fashion, without having to write a line of javascript.

Check out the README to see the code for the example above.
To use TurboFlot in your own application, you just pass your data and graph options to the widget, and then throw it up to your template. Read the flot API documentation for details on all of the arguments. Here is a simple usage example:
flot = TurboFlot([
    {
        'data' : [[0, 3], [4, 8], [8, 5], [9, 13]],
        'lines' : { 'show' : True, 'fill' : True }
    }],
    {
        'grid' : { 'backgroundColor' : '#fffaff' },
        'yaxis' : { 'max' : '850' }
    }
)
Then, to display the widget in your template, you simply use:
${flot.display()}
The code for the widget itself is pretty simple. It just takes your data and graph options, encodes them as JSON and tosses them at flot.
import simplejson
from turbogears.widgets import Widget, JSLink
class TurboFlot(Widget):
    """
    A TurboGears Flot Widget.
    """
    template = """
      <div xmlns:py="http://purl.org/kid/ns#" id="turboflot"
           style="width:${width};height:${height};">
        <script>
          $.plot($("#turboflot"), ${data}, ${options});
        </script>
      </div>
    """
    params = ["data", "options", "height", "width"]
    javascript = [JSLink('turboflot', 'excanvas.js'),
                  JSLink("turboflot", "jquery.js"),
                  JSLink("turboflot", "jquery.flot.js")]
    def __init__(self, data, options={}, height="300px", width="600px"):
        self.data = simplejson.dumps(data)
        self.options = simplejson.dumps(options)
        self.height = height
        self.width = width
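To see what the JSON-proxy step boils down to without a TurboGears install, here is a small standalone sketch using the standard library's `json` module (the successor to `simplejson`). The helper `render_plot_call` is hypothetical and only mimics what the template above ends up emitting; it is not part of TurboFlot.

```python
import json


def render_plot_call(data, options=None, target='#turboflot'):
    """Build the $.plot(...) JavaScript call from plain Python data structures."""
    options = options or {}
    return '$.plot($(%s), %s, %s);' % (
        json.dumps(target), json.dumps(data), json.dumps(options))


series = [{'data': [[0, 3], [4, 8]], 'lines': {'show': True}}]
print(render_plot_call(series, {'yaxis': {'max': 850}}))
```

Note how Python's `True` comes out as JavaScript's `true`: the JSON encoder is doing all of the translation work, which is why the widget itself can stay so small.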
You can download the latest releases from the Python Package Index:
http://pypi.python.org/pypi/TurboFlot
Or you can grab my latest development tree out of mercurial:
http://hg.lewk.org/TurboFlot
As always, patches are welcome :)
posted at: 14:21 | link | | 1 comments
Sat, 08 Dec 2007
Fedora update metrics
Using flot, a plotting library for jQuery, I threw together some shiny metrics for bodhi. It's pretty amazing to see how a Fedora release evolves over time, with almost as many enhancements as bugfixes. This could arguably be a bad thing, as our "stable" bits seem to change so much; but it definitely shows how much innovation is happening in Fedora.
I should also note that the data on the graphs may look different than the numbers you see next to each category in the bodhi menu. This is due to the fact that updates may contain multiple builds, and the graphs account for all builds in the system.
When I get some free cycles I'd like to generate some metrics from the old updates system for FC4-FC6. I can imagine that the differences will be pretty drastic, considering how the old updates tool was internal to Red Hat, and that the majority of our top packagers are community folks.
posted at: 19:05 | link | | 0 comments
Mon, 01 Oct 2007
Use your Nose!
Every programmer out there [hopefully] knows that unittests are an essential part of any growing body of code, especially in the open source world. However, most hackers out there either never write test cases (let alone comments), or usually put them off until "later" (aka: never). Having to deal with Java and JUnit tests in college not only made me not want to write unit tests, but it made me want to kill myself and everyone around me. Thankfully, I learned Python.
So, I just happen to maintain a piece of software in Fedora called nose (which lives in the python-nose package). Nose is a discovery-based unittest extension for Python, and is also a part of the TurboGears stack. If you're hacking on a TurboGears project, the turbogears.testutil module provides some incredibly useful features that make writing tests powerfully trivial.
For example, in the code below (taken from bodhi), I create a test case that utilizes a fresh SQLite database in memory. Because the class inherits from testutil.DBTest, this database will be created and torn down automagically before and after each test case is run -- ensuring that my tests are executed in complete isolation. With this example, I wrote a test case to ensure that unauthenticated people cannot create a new update.
import urllib, cherrypy
from turbogears import update_config, database, testutil, url
update_config(configfile='dev.cfg', modulename='bodhi.config')
database.set_db_uri("sqlite:///:memory:")
class TestControllers(testutil.DBTest):
    def test_unauthenticated_update(self):
        params = {
            'builds' : 'TurboGears-1.0.2.2-2.fc7',
            'release' : 'Fedora 7',
            'type' : 'enhancement',
            'bugs' : '1234 5678',
            'cves' : 'CVE-2020-0001',
            'notes' : 'foobar'
        }
        path = url('/save?' + urllib.urlencode(params))
        testutil.createRequest(path, method='POST')
        assert "You must provide your credentials before accessing this resource." in cherrypy.response.body[0]
In the above example, the TestControllers class is automatically detected by nose, which then executes each method that begins with the word 'test'. To run your unittests, just type 'nosetests'.
[lmacken@tomservo bodhi]$ nosetests
.................................
----------------------------------------------------------------------
Ran 33 tests in 16.798s
OK
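The per-test isolation that testutil.DBTest provides can be illustrated with nothing but the standard library. The sketch below is not the TurboGears implementation; it just shows the same set-up/tear-down pattern with `unittest` and `sqlite3`, and the class, table, and helper names are made up.

```python
import sqlite3
import unittest


class IsolatedDBTest(unittest.TestCase):
    """Each test method gets its own fresh in-memory SQLite database."""

    def setUp(self):
        # Runs before every test: build a brand new database.
        self.db = sqlite3.connect(':memory:')
        self.db.execute('CREATE TABLE updates (title TEXT)')

    def tearDown(self):
        # Runs after every test: throw the database away.
        self.db.close()

    def count(self):
        return self.db.execute('SELECT COUNT(*) FROM updates').fetchone()[0]

    def test_insert(self):
        self.db.execute("INSERT INTO updates VALUES ('TurboGears-1.0.2.2-2.fc7')")
        self.assertEqual(self.count(), 1)

    def test_starts_empty(self):
        # Passes regardless of test order: no rows leak in from test_insert.
        self.assertEqual(self.count(), 0)


def run_suite():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(IsolatedDBTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == '__main__':
    run_suite()
```

A discovery-based runner would pick up `IsolatedDBTest` the same way it picks up TestControllers above, since both follow the `test`-prefix naming convention.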
Now, for the fun part. Nose comes equipped with a profiling plugin that will profile your test cases using Python's hotshot module.
So, I went ahead and added a 'profile' target to bodhi's Makefile:
profile:
	nosetests --with-profile --profile-stats-file=nose.prof
	python -c "import hotshot.stats ; stats = hotshot.stats.load('nose.prof') ; stats.sort_stats('time', 'calls') ; stats.print_stats(20)"
Now, typing 'make profile' will execute and profile all of our unit tests, and spit out the top 20 method calls -- ordered by internal time and call count.
[lmacken@tomservo bodhi]$ make profile
nosetests --with-profile --profile-stats-file=nose.prof
.................................
----------------------------------------------------------------------
Ran 33 tests in 42.878s
OK
python -c "import hotshot.stats ; stats = hotshot.stats.load('nose.prof') ; stats.sort_stats('time', 'calls') ; stats.print_stats(20)"
800986 function calls (702850 primitive calls) in 42.878 CPU seconds
Ordered by: internal time, call count
List reduced from 3815 to 20 due to restriction <20>
ncalls tottime percall cumtime percall filename:lineno(function)
14 13.675 0.977 13.675 0.977 /usr/lib/python2.5/socket.py:71(ssl)
31 10.683 0.345 10.683 0.345 /usr/lib/python2.5/httplib.py:994(_read)
2478/2429 9.297 0.004 9.677 0.004 :1()
1 0.604 0.604 0.604 0.604 /usr/lib/python2.5/commands.py:50(getstatusoutput)
2999 0.536 0.000 0.539 0.000 /usr/lib/python2.5/site-packages/sqlobject/sqlite/sqliteconnection.py:177(_executeRetry)
105899 0.448 0.000 0.773 0.000 Modules/pyexpat.c:871(Default)
60 0.327 0.005 1.102 0.018 /usr/lib/python2.5/site-packages/kid/parser.py:343(_buildForeign)
105899 0.325 0.000 0.325 0.000 /usr/lib/python2.5/site-packages/kid/parser.py:452(_default)
3396 0.280 0.000 0.420 0.000 /usr/lib/python2.5/site-packages/cherrypy/config.py:107(get)
2965 0.263 0.000 0.263 0.000 /usr/lib/python2.5/logging/__init__.py:364(formatTime)
44964/6587 0.238 0.000 0.252 0.000 /usr/lib/python2.5/site-packages/kid/parser.py:156(_pull)
60 0.116 0.002 0.116 0.002 /usr/lib/python2.5/site-packages/kid/compiler.py:38(py_compile)
8127 0.114 0.000 0.114 0.000 /usr/lib/python2.5/site-packages/cherrypy/_cputil.py:311(lower_to_camel)
8982 0.110 0.000 0.137 0.000 /usr/lib/python2.5/site-packages/sqlobject/dbconnection.py:902(__getattr__)
13740/4044 0.108 0.000 2.176 0.001 /usr/lib/python2.5/site-packages/kid/parser.py:209(_coalesce)
24353/4026 0.107 0.000 2.143 0.001 /usr/lib/python2.5/site-packages/kid/parser.py:174(_track)
3170 0.093 0.000 0.398 0.000 /usr/lib/python2.5/logging/__init__.py:405(format)
1 0.082 0.082 0.082 0.082 /usr/lib/python2.5/site-packages/rpm/__init__.py:5()
4777 0.081 0.000 1.320 0.000 /usr/lib/python2.5/site-packages/kid/serialization.py:564(generate)
759/176 0.074 0.000 0.210 0.001 /usr/lib/python2.5/sre_parse.py:385(_parse)
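The hotshot module is gone in Python 3, but the same "top 20 by internal time and call count" report can be produced with `cProfile` and `pstats` from the standard library. This is a rough sketch of the equivalent; `busy_work` is a hypothetical stand-in for a test suite and `profile_top` is a name I made up.

```python
import cProfile
import io
import pstats


def busy_work(n=20000):
    """Hypothetical workload standing in for a test run."""
    return sum(i * i for i in range(n))


def profile_top(func, limit=20):
    """Profile func() and return a text report of the top `limit` entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Same ordering as the Makefile target: internal time, then call count.
    stats.sort_stats('time', 'calls').print_stats(limit)
    return out.getvalue()


report = profile_top(busy_work)
print(report)
```

The column layout of the report (ncalls, tottime, percall, cumtime) is the same as in the hotshot output above, so it reads the same way.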
posted at: 09:40 | link | | 0 comments
Sat, 01 Sep 2007
Recovering a Pyblosxom blog using liferea's RSS cache
My buddy who used to host lewk.org didn't pay his bills, so his server got taken down last week. What sucks is that I never backed up my Pyblosxom data. What doesn't suck is that thankfully Liferea, my RSS reader, did for me.
Grepping through ~/.liferea_1.2/cache/feeds, I was able to find my blog cached in some XML format. Then I wrote a little bit of code to re-create my Pyblosxom entry structure with the proper filenames and timestamps.
#!/usr/bin/python -tt
"""
Turns XML into pyblosxom blog entries.
It parses BLOG_XML pulling out blog entries in the form of:
    <feed version="1.1">
        <item>
            <title></title>
            <description></description>
            <source>http://foo.com/blog/2007/08/20/bar.html</source>
            <time>1187621268</time>
        </item>
    </feed>
The file '2007/08/20/bar.txt' will be created in pyblosxom format with
the appropriate timestamp. The #mdate is used by the pyblosxom.vim plugin.
    title
    #mdate Aug 20 10:47:48 2007
    <p>description</p>
"""
import os
import time
try: from xml.etree import cElementTree
except ImportError: import cElementTree
iterparse = cElementTree.iterparse
entries = {} # { 'title' : <Element> }
BLOG_XML = 'blog.xml'
BLOG_ROOT = 'http://foo.com/blog/'
def getField(elem, field):
    for child in elem:
        if child.tag == field:
            return child.text
## Pull out all feed items, removing older duplicates
for event, elem in iterparse(BLOG_XML):
    if elem.tag == 'feed':
        for child in elem:
            if child.tag == 'item':
                title = getField(child, 'title')
                if entries.has_key(title):
                    if int(getField(child, 'time')) > \
                       int(getField(entries[title], 'time')):
                        entries[title] = child
                else:
                    entries[title] = child
for title, entry in entries.items():
    source = getField(entry, 'source').replace(BLOG_ROOT, '')
    source = source.replace('.html', '.txt')
    if not os.path.isdir(os.path.dirname(source)):
        os.makedirs(os.path.dirname(source))
    output = file(source, 'w')
    output.write(title + '\n')
    mtime = time.localtime(int(getField(entry, 'time')))
    mdate = time.strftime("%b %e %H:%M:%S %Y", mtime)
    output.write("#mdate %s\n" % mdate)
    output.write("<p>%s</p>\n" % getField(entry, 'description'))
    output.close()
    timestamp = time.strftime("%y%m%d%H%M", mtime)
    os.system("touch -t %s %s" % (timestamp, source))
It also adds an #mdate tag into each entry, which is read by the spiffy pyblosxom mdate vim hack that Jordan Sissel wrote to restore each entry's original timestamp after editing. His code only works on FreeBSD at the moment, so I started a pyblosxom.vim plugin that works on Linux (hopefully it will eventually support both, along with a bunch of other handy functions). You can find all of this code in my mercurial repo: hg.lewk.org/xml2pyblosxom
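The script above is Python 2 (`has_key`, `file()`, cElementTree). For anyone wanting to try the duplicate-removal step on Python 3, here is a hedged sketch of just that part using `xml.etree.ElementTree`. The `SAMPLE` feed and the `newest_items` helper are invented for illustration and assume the same item layout described in the docstring.

```python
import xml.etree.ElementTree as ET

# Made-up sample data in the same shape as the cached feed.
SAMPLE = """<feed version="1.1">
  <item><title>a</title><description>old</description><time>100</time></item>
  <item><title>a</title><description>new</description><time>200</time></item>
  <item><title>b</title><description>only</description><time>150</time></item>
</feed>"""


def newest_items(xml_text):
    """Return {title: <item Element>}, keeping only the newest item per title."""
    entries = {}
    for item in ET.fromstring(xml_text).iter('item'):
        title = item.findtext('title')
        current = entries.get(title)
        # Compare against None explicitly; do not rely on Element truthiness.
        if current is None or int(item.findtext('time')) > int(current.findtext('time')):
            entries[title] = item
    return entries


latest = newest_items(SAMPLE)
for title, item in sorted(latest.items()):
    print(title, item.findtext('description'))
```

Using `findtext` replaces the hand-written getField helper, and parsing from a string keeps the example self-contained; for a large cache file, `ET.iterparse` would be the closer match to the original.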
posted at: 11:44 | link | | 0 comments
Sat, 19 May 2007
Security LiveCD
So last week I created an initial version of a potential Fedora Security LiveCD spin. The goal is to provide a fully functional livecd based on Fedora for use in security auditing, penetration testing, and forensics. I created it as a bonus project for my Security Auditing class, instead of following the 5 pages of instructions on how to create a Gentoo livecd that she handed out (mad props to davidz for creating an amazing LiveCD tool). It has the potential to be extremely useful and also help increase the number and quality of Fedora's security tools. I threw in all of the tools I could find that already exist in Fedora, but I'm sure I'm missing a bunch, so feel free to send patches or suggestions. I also added a Wishlist of packages that I would eventually like to see make their way into Fedora, after the core->extras merge reviews are done.
I would eventually like to see Fedora offer a LiveCD that puts all of the existing Linux security livecds to shame. We have quite a ways to go, but this is a start. I'm taking a computer forensics class next quarter, so I will be expanding it to fit the needs of our class as well.
posted at: 14:15 | link | | 0 comments
Wed, 14 Feb 2007
break
So my Thanksgiving break was far from a break. I spent a couple of days last week at Red Hat's Westford office before heading back up to RIT to start a new quarter. In my two days in the office I was able to touch base with a bunch of people, and get a bunch of stuff done as well. I had a long discussion with dmalcolm about integrating the Fedora Updates System with Beaker/TableCloth. He also gave me a quick rundown on a bunch of the Red Hat QA infrastructure that is currently being used. Ideally we'd like to be able to crunch all package updates through an automated test system before pushing them out to the world. Involvement needed: FedoraTesting.
Later that day I met with jrb and jkeating about getting a package updating system in place for a new Red Hat product that is going out the door very soon. This means that much work will be going into the new UpdatesSystem in the near future, which means I get to dig deeper into the world of TurboGears :)
On Thursday I cranked a bunch of code out, but was fairly distracted most of the time by the OLPC laptops that were lying around the office. I must say, it is an absolutely incredible machine. The screen is gorgeous, and its camera is very impressive. I hung around later at the office for an OLPC hackfest that was going down.
I was busy working on the updates system most of the time, but then later on I started looking into some Python start-up issues, which can be seen by doing:
strace python 2>&1 | grep ENOENT
You'll notice a ton of syscalls like the following, which try to open/stat modules in locations that do not exist:
stat64("/usr/lib/python24.zip/posixpath", 0xbfdb5094) = -1 ENOENT (No such file or directory)
open("/usr/lib/python24.zip/posixpath.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python24.zip/posixpathmodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python24.zip/posixpath.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python24.zip/posixpath.pyc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/python2.4/posixpath", 0xbfdb5094) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/posixpath.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/posixpathmodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/posixpath.py", O_RDONLY|O_LARGEFILE) = 5
So it's obvious that modules could exist in multiple locations, but if you are repeatedly going to check a series of directories, such as /usr/lib/python24.zip, wouldn't it be a *bit* smarter to check if they exist first, and then avoid checking there in the future? Doing so would help cut down on the 233+ syscalls python makes while starting up looking for modules. I really don't have any free cycles to try and add some sense into Python, so I really hope someone can beat me to a patch.
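The "check once, then skip" idea can be sketched at the Python level. This is only an illustration of the suggestion, not a description of how the interpreter's import machinery works, and `split_existing` is a hypothetical helper.

```python
import os
import sys


def split_existing(paths):
    """Partition search-path entries into (existing, missing) with one check each."""
    existing, missing = [], []
    for entry in paths:
        # An empty string on sys.path means the current directory.
        if os.path.exists(entry or os.curdir):
            existing.append(entry)
        else:
            missing.append(entry)
    return existing, missing


existing, missing = split_existing(sys.path)
print('entries that can never satisfy an import:', missing)
```

With one existence check per entry up front, lookups for every later module could skip the `missing` entries entirely instead of paying several failed syscalls per entry per import.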
TurboGears 1.0b2

I came back home to find the new TurboGears book in my mailbox. It has been extremely informative, even though the project has awesome online docs as well. I also pushed out the latest TurboGears release, 1.0b2, for FC6 and rawhide yesterday.
posted at: 21:12 | link | | 0 comments



