Thu, 07 Mar 2013

Keeping your finger on the pulse of the Fedora community

For those who haven't been keeping up with all of the awesome code Ralph Bean has been churning out lately, be sure to check out fedmsg.com. Hop on #fedora-fedmsg on Freenode or load up busmon to see it in action. Not all of the Fedora Infrastructure services currently fire off fedmsgs, but we're getting very close.

This technology is built on top of Moksha, which I created many years ago while writing the first version of the fedoracommunity app. It's come a long way since then, and now can speak ØMQ over WebSockets, as well as AMQP and STOMP over Orbited. Now the time has finally come to bring Moksha to the desktop!

Introducing fedmsg-notify

fedmsg-notify lets you get realtime desktop notifications of activity within the Fedora community. It allows you to tap into the firehose of contributions as they happen and filter them to your liking. It works with any window manager that supports the notification-spec; however, I've only seen the gravatars show up under GNOME.

For GNOME Shell users, you can [optionally] install gnome-shell-extension-fedmsg, and then enable it with the gnome-tweak-tool or by running `gnome-shell-extension-tool -e fedmsg@lmacken-redhat.com` (and then hit alt+f2 and type 'r' to reload the shell). You will then be graced with the presence of The Bus:

For those who aren't running GNOME shell, you can simply yum install fedmsg-notify, and then run fedmsg-notify-config, or launch it from your Settings menu. Due to a dependency on Twisted's gtk3reactor, fedmsg-notify is currently only available on Fedora 18 and newer.

The first tab shows you all services that are currently hooked into fedmsg. As we add new ones, the GUI will automatically display them. These services are defined in the fedmsg_meta_fedora_infrastructure package.

The Advanced tab lets you further customize what messages you want to see. The "Bugs that you have encountered" option will display all messages that reference any Bugzilla numbers for crashes that you have hit locally with ABRT. The other filters involve querying your local yum database or the PackageDB.

Under the hood

The fedmsg-notify-daemon itself is fairly minimal (see daemon.py). At its core, it's just a Twisted reactor that consumes ØMQ messages. Moksha does all of the heavy lifting behind the scenes, so all we really have to do is specify a topic to subscribe to and define a consume method that gets called with each message. This is essentially just a basic Moksha Consumer with some fedmsg + DBus glue.

class FedmsgNotifyService(dbus.service.Object, fedmsg.consumers.FedmsgConsumer):
    topic = 'org.fedoraproject.*'

    def consume(self, msg):
        pass  # body omitted; see daemon.py for the full method

The daemon will automatically start up upon login, or will get activated by DBus when enabled via the GUI. When a message arrives, the daemon filters it accordingly, downloads & caches the icons, [optionally] relays the message over DBus, and then displays the notification on your desktop.
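
To make the subscribe-and-consume pattern concrete, here is a minimal, self-contained sketch of topic-based dispatch. It is not Moksha's or fedmsg's actual code: the Consumer, RecordingConsumer and dispatch names are made up for illustration, the topic strings are only examples, and glob-style matching via fnmatch is an assumption about how a pattern like 'org.fedoraproject.*' could be interpreted.

```python
import fnmatch


class Consumer(object):
    """Hypothetical stand-in for a Moksha-style consumer (not the real API)."""
    topic = '*'

    def consume(self, msg):
        raise NotImplementedError


class RecordingConsumer(Consumer):
    topic = 'org.fedoraproject.*'

    def __init__(self):
        self.seen = []

    def consume(self, msg):
        # A real daemon would filter the message and show a notification here.
        self.seen.append(msg['topic'])


def dispatch(consumers, msg):
    """Hand msg to every consumer whose topic pattern matches msg['topic']."""
    delivered = 0
    for consumer in consumers:
        if fnmatch.fnmatchcase(msg['topic'], consumer.topic):
            consumer.consume(msg)
            delivered += 1
    return delivered


consumer = RecordingConsumer()
# Example topics only; the first matches the pattern, the second does not.
dispatch([consumer], {'topic': 'org.fedoraproject.prod.bodhi.update.comment'})
dispatch([consumer], {'topic': 'com.example.unrelated'})
```

In the real daemon the hub and the ØMQ subscription are handled for you, so only the topic and the consume method need to be supplied.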

The API for writing custom filters is dead simple (see filters.py). Here is an example of one:

class MyPackageFilter(Filter):
    """ Matches messages regarding packages that a given user has ACLs on """
    __description__ = 'Packages that these users maintain'
    __user_entry__ = 'Usernames'

    def __init__(self, settings):
        self.usernames = settings.replace(',', ' ').split()
        self.packages = set()
        reactor.callInThread(self._query_pkgdb)

    def _query_pkgdb(self):
        for username in self.usernames:
            log.info("Querying the PackageDB for %s's packages" % username)
            for pkg in PackageDB().user_packages(username)['pkgs']:
                self.packages.add(pkg['name'])

    def match(self, msg, processor):
        packages = processor.packages(msg)
        for package in self.packages:
            if package in packages:
                return True

The fedmsg-notify-config interface (see gui.py) automatically introspects the filters and populates the Advanced tab with the appropriate labels, switches, and text entries.
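
The post doesn't show how gui.py discovers the filters, so the following is only a guess at the general idea: walk the Filter subclasses and read the two class attributes used in the example above. The Filter base class here is a hypothetical stand-in, UsernameFilter and EverythingFilter are invented for the demo, and using __subclasses__() is an assumption rather than the project's actual mechanism.

```python
class Filter(object):
    """Hypothetical base class; the real one lives in filters.py."""
    __description__ = None
    __user_entry__ = None   # label for an optional text entry, if any

    def match(self, msg, processor):
        raise NotImplementedError


class UsernameFilter(Filter):
    __description__ = 'Messages from these users'
    __user_entry__ = 'Usernames'


class EverythingFilter(Filter):
    __description__ = 'All messages'


def describe_filters(base=Filter):
    """Collect what a settings GUI would need to render one row per filter."""
    rows = []
    for cls in base.__subclasses__():
        rows.append({
            'name': cls.__name__,
            'label': cls.__description__,
            'entry_label': cls.__user_entry__,
            'needs_text_entry': cls.__user_entry__ is not None,
        })
    return sorted(rows, key=lambda row: row['name'])


rows = describe_filters()
```

Each row carries enough information to draw a label and a switch, plus a text entry when the filter asks for one.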

Consuming fedmsg over DBus

Let's say you want to write an application that listens to fedmsg, but you don't want to deal with spinning up your own connection, or you're not using Python, etc. For these cases, fedmsg-notify supports relaying messages over DBus. This functionality can be enabled by running `gsettings set org.fedoraproject.fedmsg.notify emit-dbus-signals true`. You can then easily listen for the MessageReceived DBus signal, like so:

import json, dbus

from gi.repository import GObject
from dbus.mainloop.glib import DBusGMainLoop

def consume(topic, body):
    print(topic)
    print(json.loads(body))

DBusGMainLoop(set_as_default=True)
bus = dbus.SessionBus()
bus.add_signal_receiver(consume, signal_name='MessageReceived',
                        dbus_interface='org.fedoraproject.fedmsg.notify',
                        path='/org/fedoraproject/fedmsg/notify')
loop = GObject.MainLoop()
loop.run()

Contributing

If you're interested in helping out with any layer of the fedmsg stack, hop in #fedora-apps, and fork it on GitHub:

Hop on the bus!


posted at: 17:30

Sat, 21 Apr 2012

Wielding the ANU Quantum Random Number Generator

Last week Science Daily published an article that caught my attention titled 'Sounds of Silence' Proving a Hit: World's Fastest Random Number Generator. The tl;dr is that researchers at the ANU ARC Centre of Excellence for Quantum Computation and Communication Technology created a blazing fast random number generator based on quantum fluctuations in a vacuum. Thankfully, these awesome scientists are giving their data away for free, and they even provide a JSON API.

In an effort to make it simple to leverage this data, I created a new project: quantumrandom. It provides a qrandom command-line tool, a Python API, and also a /dev/qrandom Linux character device.

Installing

$ virtualenv env
$ source env/bin/activate
$ pip install quantumrandom

Using the command-line tool

$ qrandom --int --min 5 --max 15
7
$ qrandom --binary
���I�%��e(�1��c��Ee�4�������j�Կ��=�^H�c�u
oq��G��Z�^���fK�0_��h��s�b��AE=�rR~���(�^
+a�a̙�IB�,S�!ꀔd�2H~�X�Z����R��.f
...
$ qrandom --hex
1dc59fde43b5045120453186d45653dd455bd8e6fc7d8c591f0018fa9261ab2835eb210e8
e267cf35a54c02ce2a93b3ec448c4c7aa84fdedb61c7b0d87c9e7acf8e9fdadc8d68bcaa5a
...

Creating /dev/qrandom

quantumrandom comes equipped with a multi-threaded character device in userspace. When read from, this device fires up a bunch of threads to fetch data. Not only can you utilize this as an RNG, but you can also feed this data back into your system's entropy pool.

In order to build its dependencies, you'll need the following packages installed: svn gcc-c++ fuse-devel gccxml libattr-devel. On Fedora 17 and newer, you'll also need the kernel-modules-extra package installed for the cuse module.

pip install ctypeslib hg+https://cusepy.googlecode.com/hg
sudo modprobe cuse
sudo chmod 666 /dev/cuse
qrandom-dev -v
sudo chmod 666 /dev/qrandom

By default it will use 3 threads, which can be changed by passing '-t #' to qrandom-dev.
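
Once qrandom-dev is running, the device can be read like any other file. Here is a small sketch, assuming only that /dev/qrandom behaves like a regular character device. read_exactly is a hypothetical helper, not part of quantumrandom; it loops because a single read from a device may return fewer bytes than requested.

```python
def read_exactly(path, count, chunk_size=4096):
    """Read count bytes from a device or file, looping over short reads."""
    data = bytearray()
    with open(path, 'rb', buffering=0) as device:
        while len(data) < count:
            chunk = device.read(min(chunk_size, count - len(data)))
            if not chunk:
                # The source ran dry before we got everything we asked for.
                raise EOFError('%s ended after %d bytes' % (path, len(data)))
            data.extend(chunk)
    return bytes(data)

# Example (requires qrandom-dev to be running):
#   key = read_exactly('/dev/qrandom', 32)
```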

Testing the randomness for FIPS 140-2 compliance

$ cat /dev/qrandom | rngtest --blockcount=1000
rngtest: bits received from input: 20000032
rngtest: FIPS 140-2 successes: 1000
rngtest: FIPS 140-2 failures: 0
rngtest: FIPS 140-2(2001-10-10) Monobit: 0
rngtest: FIPS 140-2(2001-10-10) Poker: 0
rngtest: FIPS 140-2(2001-10-10) Runs: 0
rngtest: FIPS 140-2(2001-10-10) Long run: 0
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=17.696; avg=386.711; max=4882812.500)Kibits/s
rngtest: FIPS tests speed: (min=10.949; avg=94.538; max=161.640)Mibits/s
rngtest: Program run time: 50708319 microseconds

Adding entropy to the Linux random number generator

sudo rngd --rng-device=/dev/qrandom --random-device=/dev/random --timeout=5 --foreground

Monitoring your available entropy levels

watch -n 1 cat /proc/sys/kernel/random/entropy_avail

Python API

The quantumrandom Python module contains a low-level get_data function, which is modelled after the ANU Quantum Random Number Generator's JSON API. It returns variable-length lists of either uint16 or hex16 data.

>>> quantumrandom.get_data(data_type='uint16', array_length=5)
[42796, 32457, 9242, 11316, 21078]
>>> quantumrandom.get_data(data_type='hex16', array_length=5, block_size=2)
['f1d5', '0eb3', '1119', '7cfd', '64ce']

Based on this get_data function, quantumrandom also provides a bunch of higher-level helper functions that make it easy to perform a variety of tasks.

>>> quantumrandom.randint(0, 20)
5
>>> quantumrandom.hex()[:10]
'8272613343'
>>> quantumrandom.binary()[0]
'\xa5'
>>> len(quantumrandom.binary())
10000
>>> quantumrandom.uint16()
numpy.array([24094, 13944, 22109, 22908, 34878, 33797, 47221, 21485, 37930, ...], dtype=numpy.uint16)
>>> quantumrandom.uint16().data[:10]
'\x87\x7fY.\xcc\xab\xea\r\x1c`'
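
For the curious, here is one way a randint-style helper could be layered on top of a stream of uint16 values. This is a sketch of plain rejection sampling, not necessarily how quantumrandom implements randint; randint_from_uint16 is a hypothetical name, and the half-open [low, high) range is my choice for the example.

```python
def randint_from_uint16(values, low, high):
    """Return an int in [low, high) drawn from an iterable of uint16 values.

    Values that land in the uneven tail of the 16-bit range are discarded
    (rejection sampling) so that every result is equally likely.
    """
    span = high - low
    if not 0 < span <= 65536:
        raise ValueError('need 0 < high - low <= 65536')
    bucket = 65536 // span   # how many source values map to each result
    limit = bucket * span    # values >= limit would bias the result
    for value in values:
        if value < limit:
            return low + value // bucket
    raise ValueError('ran out of random values')
```

The values argument could be fed from something like quantumrandom.get_data(data_type='uint16', array_length=5), as shown earlier.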

Follow quantumrandom on GitHub: https://github.com/lmacken/quantumrandom


posted at: 16:30

Sat, 07 May 2011

Red Hat OpenShift Express & The Leafy Miracle

Red Hat made a lot of awesome announcements this week at The Red Hat Summit, one of which was OpenShift.

I've had the opportunity to play with the internal beta for a little while now, and I must say that as a developer I am extremely impressed with the service. Just being able to git push my code into the cloud drastically simplifies large-scale software deployment, and makes it so I don't even have to leave my development environment.

I figured out a way to get TurboGears2 and Pyramid running on OpenShift Express, and documented it here and here. After that, I proceeded to write my very first Pyramid application.

[ The Leafy Miracle ]

In memory of the proposed [and rejected] Fedora 16 codename "Beefy Miracle", this little app is called "Leafy Miracle".

leafy-miracle.rhcloud.com


[ Features & Tech ]

[ Running ]

sudo yum -y install python-virtualenv
git clone git://fedorapeople.org/~lmacken/leafymiracle && cd leafymiracle
virtualenv env && source env/bin/activate
python setup.py develop
python leafymiracle/populate.py
paster serve development.ini

[ Code ]

git clone git://fedorapeople.org/~lmacken/leafymiracle

[ Props ]

Mad props go out to RJ Bean, who helped me write this app. He is responsible for writing a ton of amazing Python widgets for various JavaScript visualization libraries. You can see some demos of them here: tw2-demos.threebean.org.


posted at: 20:20

Thu, 24 Mar 2011

git clone all of your Fedora packages

After doing a fresh Fedora 15 install on my laptop last night, I wanted to quickly clone all of the packages that I maintain. Here is a single command that does the job:

python -c "import pyfedpkg; from fedora.client.pkgdb import PackageDB; [pyfedpkg.clone(pkg['name'], '$USER') for pkg in PackageDB().user_packages('$USER')['pkgs']]"
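
The one-liner leans on the shell to expand $USER inside the double quotes and uses a list comprehension purely for its side effects. If you'd rather keep it around as a script, the same idea can be spelled out as below. clone_user_packages is a hypothetical helper; the lookup and clone callables are passed in so that the real pyfedpkg and PackageDB calls from the one-liner can be plugged in (or swapped for fakes when trying it out).

```python
def clone_user_packages(username, list_packages, clone):
    """Clone every package that username maintains; return the names cloned.

    list_packages(username) must return a dict with a 'pkgs' list of dicts
    that each have a 'name' key, which is the shape the one-liner relies on.
    """
    cloned = []
    for pkg in list_packages(username)['pkgs']:
        clone(pkg['name'], username)
        cloned.append(pkg['name'])
    return cloned

# Wiring in the libraries used by the one-liner would look roughly like:
#   import os, pyfedpkg
#   from fedora.client.pkgdb import PackageDB
#   clone_user_packages(os.environ['USER'],
#                       PackageDB().user_packages, pyfedpkg.clone)
```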


posted at: 16:10

Sun, 13 Mar 2011

Fedora Photobooth @ SXSW

This is the first year that Fedora will have a booth at SXSW! Sadly, I am not going to be attending since it conflicts with PyCon. However, my code will be running at our booth. Usually the Fedora booth at conferences consists of a bunch of flyers, media, swag, and some people to help answer questions and tell the Fedora story. At SXSW, however, things are going to be a little different.

Aside from the amazing flyers that Máirín created, there will also be a Fedora Photobooth. Someone (probably Spot or Jared) will be dressed in a full Tux costume, and people can come and get their photo taken with them. Spot came to me the other day and asked if I could write some code to streamline the whole process.

An hour or so later, photobooth.py was born. There are definitely lots of improvements that can be made, but here is what it currently does in its initial incarnation:

In Action
See Mo's blog for photos of this code in action at the Fedora SXSW booth!
* SXSW Expo Day 1 from the show floor
* SXSW Expo Day 2
* A Beefy, Miraculous Day at SXSW (Expo Day 3)


The Code
I threw this in a git repo and tossed it up on GitHub:
github.com/lmacken/photobooth.py

#!/usr/bin/python
# photobooth.py - version 0.3
# Requires: python-imaging, qrencode, gphoto2, surl
# Author: Luke Macken <lmacken@redhat.com>
# License: GPLv3

import os
import surl
import Image
import subprocess

from uuid import uuid4
from os.path import join, basename, expanduser

# Where to spit out our qrcode, watermarked image, and local html
out = expanduser('~/Desktop/sxsw')

# The watermark to apply to all images
watermark_img = expanduser('~/Desktop/fedora.png')

# This assumes ssh-agent is running so we can do password-less scp
ssh_image_repo = 'fedorapeople.org:~/public_html/sxsw/'

# The public HTTP repository for uploaded images
http_image_repo = 'http://lmacken.fedorapeople.org/sxsw/'

# Size of the qrcode pixels
qrcode_size = 10

# Whether or not to delete the photo after uploading it to the remote server
delete_after_upload = True

# The camera configuration
# Use gphoto2 --list-config and --get-config for more information
gphoto_config = {
    '/main/imgsettings/imagesize': 3, # small
    '/main/imgsettings/imagequality': 0, # normal
    '/main/capturesettings/zoom': 70, # zoom factor
}

# The URL shortener to use
shortener = 'tinyurl.com'

class PhotoBooth(object):

    def initialize(self):
        """ Detect the camera and set the various settings """
        cfg = ['--set-config=%s=%s' % (k, v) for k, v in gphoto_config.items()]
        subprocess.call('gphoto2 --auto-detect ' +
                        ' '.join(cfg), shell=True)

    def capture_photo(self):
        """ Capture a photo and download it from the camera """
        filename = join(out, '%s.jpg' % str(uuid4()))
        subprocess.call('gphoto2 ' +
                        '--capture-image-and-download ' +
                        '--filename="%s" ' % filename,
                        shell=True)
        return filename

    def process_image(self, filename):
        print "Processing %s..." % filename
        print "Applying watermark..."
        image = self.watermark(filename)
        print "Uploading to remote server..."
        url = self.upload(image)
        print "Generating QRCode..."
        qrcode = self.qrencode(url)
        print "Shortening URL..."
        tiny = self.shorten(url)
        print "Generating HTML..."
        html = self.html_output(url, qrcode, tiny)
        subprocess.call('firefox "%s"' % html, shell=True)
        print "Done!"

    def watermark(self, image):
        """ Apply a watermark to an image """
        mark = Image.open(watermark_img)
        im = Image.open(image)
        if im.mode != 'RGBA':
            im = im.convert('RGBA')
        layer = Image.new('RGBA', im.size, (0,0,0,0))
        position = (im.size[0] - mark.size[0], im.size[1] - mark.size[1])
        layer.paste(mark, position)
        outfile = join(out, basename(image))
        Image.composite(layer, im, layer).save(outfile)
        return outfile

    def upload(self, image):
        """ Upload this image to a remote server """
        subprocess.call('scp "%s" %s' % (image, ssh_image_repo), shell=True)
        if delete_after_upload:
            os.unlink(image)
        return http_image_repo + basename(image)

    def qrencode(self, url):
        """ Generate a QRCode for a given URL """
        qrcode = join(out, 'qrcode.png')
        subprocess.call('qrencode -s %d -o "%s" %s' % (
            qrcode_size, qrcode, url), shell=True)
        return qrcode

    def shorten(self, url):
        """ Generate a shortened URL """
        return surl.services.supportedServices()[shortener].get({}, url)

    def html_output(self, image, qrcode, tinyurl):
        """ Output HTML with the image, qrcode, and tinyurl """
        html = """
            <html>
              <center>
                <table>
                  <tr>
                    <td colspan="2">
                        <b><a href="%(tinyurl)s">%(tinyurl)s</a></b>
                    </td>
                  </tr>
                  <tr>
                    <td><img src="%(image)s" border="0"/></td>
                    <td><img src="%(qrcode)s" border="0"/></td>
                  </tr>
                </table>
              </center>
          </html>
        """ % {'image': image, 'qrcode': qrcode, 'tinyurl': tinyurl}
        outfile = join(out, basename(image) + '.html')
        output = file(outfile, 'w')
        output.write(html)
        output.close()
        return outfile

if __name__ == "__main__":
    photobooth = PhotoBooth()
    try:
        photobooth.initialize()
        while True:
            raw_input("Press enter to capture photo.")
            filename = photobooth.capture_photo()
            photobooth.process_image(filename)
    except KeyboardInterrupt:
        print "\nExiting..."
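
The shell strings built in capture_photo work, but a filename containing quotes or spaces can trip them up. One alternative, sketched here, is to build an argument list and hand it to subprocess.call without shell=True. gphoto_capture_args is a hypothetical helper, not part of photobooth.py. It only uses the gphoto2 options that already appear in the script, and whether your camera accepts --set-config in the same invocation as the capture is an assumption to verify.

```python
def gphoto_capture_args(filename, config=None):
    """Build the gphoto2 capture command as a list for subprocess.call."""
    args = ['gphoto2']
    # Optional camera settings, in a stable order so the command is predictable.
    for key, value in sorted((config or {}).items()):
        args.append('--set-config=%s=%s' % (key, value))
    args += ['--capture-image-and-download', '--filename=%s' % filename]
    return args

# Example:
#   import subprocess
#   subprocess.call(gphoto_capture_args('/tmp/photo.jpg'))
```

Because each argument is passed separately, no quoting of the filename is needed.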


posted at: 02:57

Thu, 06 Jan 2011

liveusb-creator 3.9.3 windows release

I spent the majority of yesterday at a DOS prompt. Thankfully, it wasn't as painful as it sounds, as git, vim and Python make Windows development quite tolerable.

Anyway, I was finally able to track down and fix a couple of major bugs in the liveusb-creator on Windows XP and 7, and I pushed out a new build yesterday with the following changes:

Windows users, download it here: http://liveusb-creator.fedorahosted.org


posted at: 15:34

Mon, 18 Jan 2010

nose 0.11

I know nose 0.11 is old news, but I've only recently discovered its new multiprocess module.

lmacken@tomservo ~/bodhi $ nosetests
................................................................................................
----------------------------------------------------------------------
Ran 96 tests in 725.111s

OK

lmacken@tomservo ~/bodhi $ nosetests --processes=50
................................................................................................
----------------------------------------------------------------------
Ran 96 tests in 10.915s

OK

Nose 0.11 is already in rawhide, and will soon be in updates-testing.

Note to self (and others): Buy the nose developers beer at PyCon next month


posted at: 22:58

Thu, 10 Dec 2009

FUDCon Toronto 2009

Another FUDCon is in the books, this time in Toronto. It was great to catch up with many people, put faces to some names, and meet a bunch of new contributors. I gave a session on Moksha, which I'll talk about below, and was also on the Fedora Infrastructure panel discussion.

My goal this FUDCon wasn't to crank out a ton of code, but to focus on gathering and prioritizing requirements and to help others be productive. Here are some of the projects I focused on.

Moksha

Moksha is a project I created a little over a year ago, which is the base of a couple of other applications I've been working on as well: Fedora Community and CIVX. I'll be blogging about these in more detail later.

One of the main themes of FUDCon this year was Messaging (AMQP), and Moksha is a large part of this puzzle, as it allows you to wield AMQP within web applications. During my session the demo involved busting open a terminal, creating a consumer that reacts to all messages, creating a message producer, and then creating a live chat widget -- all of which hooked up to Fedora's AMQP broker.

I'll be turning my slides into an article, so expect a full blog post explaining the basics soon. In the meantime, I found Adam Miller's description to be extremely amusing:

"I walked into a session called "Moksha and Fedora Community -- Real-time web apps with Python and AMQP" which blew my mind. This is Web3.0 (not by definition, but that's what I'm calling it), Luke Macken and J5 completely just stepped over web2.0 and said "pffft, childs play" (well not really but in my mind I assume it went something like that). This session showed off technology that allows real time message passing in a web browser as well as "native" support for standard protocols. The project page is https://fedorahosted.org/moksha/ and I think everyone on the planet should take some time to go there and enjoy the demo, prepare to have your mind blown. Oh, and I also irc transcribed that one as well http://meetbot.fedoraproject.org/fudcon-room-3/2009-12-05/fudcon-room-3.2009-12-05-22.07.log.html ... presentation slides found: http://lmacken.fedorapeople.org/moksha-FUDConToronto-2009.odp"

Fedora Community

So after we released v1.0 of Fedora Community for F12, all of us went off in separate directions to hack on various things. J5 wrote AMQP javascript bindings, which I then integrated into Moksha. Máirín Duffy built a portable usability lab and has been doing great research on the usability of the project. And I dove back into Moksha to solidify the platform.

After we deploy our AMQP broker for Fedora, and once we start adding shims into our existing infrastructure, we'll then be able to start creating live widgets and message consumers that can react to events, allowing us to wield Fedora in real-time. This will let us keep our fingers on the pulse of Fedora, automate and facilitate tedious tasks, and gather metrics as things happen.

During the hackfests I also did some work on our current Fedora Community deployment. Over the past few weeks some of our widgets randomly died, and we haven't been receiving proper error messages. So, I successfully hooked up WebError and the team is now getting traceback emails, which will help us fix problems much faster (or at least nag the hell out of us about them).

I also worked with Ian Weller on the new Statistics section of the dashboard, which has yet to hit production. Ian and I wrote Wiki metrics, Seth Vidal wrote BitTorrent metrics, and I wrote Bodhi metrics. We've also got many more to come. My main concern was a blocker issue that we were hitting with our flot graphs when you quickly bounce between tabs. I ended up "fixing" the bug, so I'll be pushing what we have of the stats branch into production in the near future.

TurboGears2

TurboGears has definitely been our favorite web framework within Fedora's Infrastructure for many years now. TurboGears2, a complete re-invention of itself, has been released recently, and is catching on *very* quickly in the community. Tons of people are working on awesome new apps, and loving every minute of it. I was also able to convert a rails hacker over to it, after he was able to quickly dive into one of the tutorials with ease. See my previous blog post about getting up and running with TG2 in Fedora/EPEL.

python-fedora

One of my main tasks during the hackfests was to pull out the authentication layer in Fedora Community that authenticates against the Fedora Account System, and port it over to python-fedora, so we can use it in any TurboGears2 application. I committed the initial port to python-fedora-devel, and have started working on integrating it into a default TG2 quickstart and documenting the process. There are still a couple of minor things I want to fix/clean up before releasing it, so expect a blog about it soon.

Bodhi

It seems like yesterday that I was an intern at Red Hat working on an internal updates system for Fedora Core. Coming up on 5 years later, I am now working on my 3rd implementation of an updates system, Bodhi v2.0. What's wrong with the current Bodhi, you ask? Well, if you talk to any user of it, you'll probably get a pretty long list. Bodhi is the first TurboGears application written & deployed in Fedora Infrastructure, and uses the vanilla components (SQLObject, kid, CherryPy2). The TG1 stack has been holding up quite nicely over the years, and is still supported upstream, but bodhi's current implementation and design do not make it easy to grow.

Bodhi v2.0 will be implemented in TurboGears2, using SQLAlchemy for an ORM, Mako for templates, and ToscaWidgets2 for re-usable widgets. It will be hook-based and plugin-driven, and will be completely distribution agnostic. Another important goal will be AMQP message-bus integration, which will allow other services or users to react to various events inside of the system as they happen.

So far I've ported the old DB model from SQLObject to SQLAlchemy, and have begun porting the old unit tests, and writing new ones. Come the new year, I'll be giving this much more of my focus.

During the hackfests I got a chance to talk to Dennis Gilmore about various improvements that we need to make with regard to the update push process. It was also great to talk to many different users of bodhi, who expressed various concerns, some of which I've already fixed. I also got a chance to talk to Xavier Lamien about deploying Bodhi for rpmfusion. On the bus ride home I helped explain to Mel how Bodhi & Koji fit into the big picture of things.

During the BarCamp sessions I also attended a session about the Update Experience, where we discussed many important issues surrounding updates.

liveusb-creator

So I got a chance to finally meet Sebastian Dziallas, of Sugar on a Stick fame, and was able to fix a few liveusb-creator issues on his laptop. I ended up pushing out a new release a couple of days ago that contains some of those fixes, along with a new version of Sugar on a Stick.

The liveusb-creator has been catching a lot of press recently (see the front page for a list). Not only did it have a 2 page spread in Linux Format, but it was also featured in this week's Wired.com article New Sugar on a Stick Brings Much Needed Improvements. Rock.

Python

There was a lot of brainstorming done by Dave Malcolm, Colin Walters, Toshio Kuratomi, Bernie Innocenti, myself, and many others about various improvements that we could make to the Python interpreter, from speeding up startup time by doing some clever caching to potentially creating a new optimized compiled binary format. We also looked into how WebError/abrt gather tracebacks, and discussed ways of enabling interactive traceback debugging for vanilla processes, without requiring a layer of WSGI middleware.

There was also work done on adding SystemTap probes to Python, which is very exciting. There are many ideas for various probe points, including one that I blogged about previously.

Intel iMac8,1 support

My iMac sucks at Linux. This has been something that has been nagging me for a long time, and I've been slowly trying to chip away at the problems. First, I've been doing work on a Mac port of the liveusb-creator. I also started to work on a kernel patch for getting the EFI framebuffer working, and discussed how to do it with ajax and pjones. The screen doesn't display anything after grub, and since we don't know the base address of the framebuffer, it involves writing code to iterate over memory trying to find some common pixel patterns. I'm still trying to wrap my head around all of it, but I'll probably end up just buying them beer to fix it for me.

Thincrust

Thincrust is a project that I've been excited about for a while, and I actually have some appliances deployed in a production cloud. I was able to run some ideas for various virtual appliances by one of the authors over some beers. Some pre-baked virtual appliances that you can easily throw into a cloud that I would like to see:

dogtail

I'm glad to see that dogtail is still exciting people in the community. It still has a lot of potential to improve the way we test graphical software, and we also discussed ways of using it to teach people and automate various desktop tasks. What if you logged in after a fresh install and got the following popup bubble:

Hi, welcome to Fedora, what can I help you do today?

Each task would then allow Fedora to take the wheel and walk the user through various steps. I had this idea a while ago, when dogtail first came out, and I still think it would be totally awesome. Anyway, this was not a focus of the hackfests, but merely a conversation that I had while walking to lunch :)


posted at: 17:49

Thu, 19 Nov 2009

TurboGears2 in Fedora & EPEL

I'm excited to announce that the TurboGears2 web application stack is now available in Fedora 12, 11 and EPEL-5.

What is TurboGears2?

TurboGears 2 is built on top of the experience of several next generation web frameworks including TurboGears 1 (of course), Django, and Rails. All of these frameworks had limitations which were frustrating in various ways, and TG2 is an answer to that frustration. We wanted something that had:
  • Real multi-database support
  • Horizontal data partitioning (sharding)
  • Support for a variety of JavaScript toolkits, and a new widget system to make building Ajax-heavy apps easier
  • Support for multiple data-exchange formats
  • Built-in extensibility via standard WSGI components

Installing the TurboGears2 stack & development tools

Fedora 12
yum install TurboGears2 python-tg-devtools
Fedora 11
yum --enablerepo=updates-testing install TurboGears2 python-tg-devtools
Red Hat Enterprise Linux 5 (with EPEL)
yum --enablerepo=epel-testing install TurboGears2 python-tg-devtools

Creating your first TG2 app

paster quickstart

Run your test suite

nosetests

Run your application

paster serve development.ini

Read the documentation

http://www.turbogears.org/2.0/docs

Contribute

If you're interested in helping maintain and improve the TG2/Pylons stack within Fedora/EPEL, please let me know. We're always looking for new Python hackers to join the team. There are still a few more components that need to be packaged and reviewed (e.g. chameleon.genshi), so please take a look at the TurboGears2 page on the Fedora wiki for more details.


posted at: 06:00

Mon, 09 Nov 2009

New liveusb-creator release!

So I've gotten some pretty inspiring feedback from various users of the liveusb-creator recently, so I decided to put some cycles into it this weekend and crank out another release.

"As a non-Linux person, Live-USB Creator has improved the quality of my life measurably!" --Dr. Arthur B. Hunkins

Yesterday I released version 3.8.6 of the liveusb-creator. Changes in this release include:

Windows
https://fedorahosted.org/releases/l/i/liveusb-creator/liveusb-creator-3.8.6.zip

Fedora
https://admin.fedoraproject.org/updates/liveusb-creator-3.8.6-1.fc11
https://admin.fedoraproject.org/updates/liveusb-creator-3.8.6-1.fc12

Source
https://fedorahosted.org/releases/l/i/liveusb-creator/liveusb-creator-3.8.6.tar.bz2

Trac
http://liveusb-creator.fedorahosted.org


posted at: 02:39

Tue, 13 Oct 2009

Good Python Habits: vim + pyflakes

Here is a neat little hack for running pyflakes on Python files after you save them. I like using pyflakes for quickly catching dumb errors, but you could easily replace it with a more comprehensive tool like pychecker, or pylint for more strict PEP8 compliance.

All you have to do is throw this in your ~/.vimrc

au BufWritePost *.py !pyflakes %

This has saved me *tons* of time and frustration over the past few weeks, and I have no idea how I lived without it.


posted at: 13:32

Sun, 14 Dec 2008

>>> from fedora.client import Wiki

I created a simple Python API for interacting with Fedora's MediaWiki a while back, in an attempt to gather various metrics. I just went ahead and committed it to the python-fedora modules. Here is how to use it:

>>> from fedora.client import Wiki
>>> wiki = Wiki()
>>> wiki.print_recent_changes()
From 2008-12-07 20:59:01.187363 to 2008-12-14 20:59:01.187363
500 wiki changes in the past week

== Most active wiki users ==
 Bbbush............................................ 230
 Konradm........................................... 25
 Duffy............................................. 22
 Jreznik........................................... 21
 Ianweller......................................... 14
 Jjmcd............................................. 14
 Geroldka.......................................... 10
 Gdk............................................... 9
 Anouar............................................ 7
 Gomix............................................. 6

== Most edited pages ==
 Features/KDE42.................................... 21
 SIGs/SciTech/SAGE................................. 15
 FUDCon/FUDConF11.................................. 14
 Special:Log/upload................................ 13
 How to be a release notes beat writer............. 12
 Special:Log/move.................................. 11
 Design/SETroubleshootUsabilityImprovements........ 10
 PackageMaintainers/FEver.......................... 9
 User:Gomix........................................ 6
 Zh/主要配置文件..................................... 5

>>> for event in wiki.send_request('api.php', req_params={
...         'action': 'query',
...         'list': 'logevents',
...         'format': 'json',
...         })['query']['logevents']:
...     print '%-10s %-15s %s' % (event['action'], event['user'], event['title'])
...
patrol     Ianweller       User:Ianweller/How to create a contributor business card
move       Nippur          REvanderLuit
patrol     Ianweller       Project Leader
move       Ianweller       FPL
upload     Anouar          Image:AnouarAbtoy.JPG
move       Liangsuilong    ZH/Docs/FetionOnFedora
move       Liangsuilong    FetionOnFedora
patrol     Ianweller       User:Ianweller

It uses fedora.client.BaseClient, a class that simplifies interacting with arbitrary web services. Toshio and I created it a while back as the core client for talking with our various TurboGears-based Fedora Services (bodhi, pkgdb, fas, etc.), but it seems to have morphed into a much more flexible client for talking JSON with web applications.

from datetime import datetime, timedelta
from collections import defaultdict
from fedora.client import BaseClient

class Wiki(BaseClient):

    def __init__(self, base_url='http://fedoraproject.org/w/', *args, **kwargs):
        super(Wiki, self).__init__(base_url, *args, **kwargs)

    def get_recent_changes(self, now, then, limit=500):
        """ Get recent wiki changes from `now` until `then` """
        data = self.send_request('api.php', req_params={
                'list'    : 'recentchanges',
                'action'  : 'query',
                'format'  : 'json',
                'rcprop'  : 'user|title',
                'rcend'   : then.isoformat().split('.')[0] + 'Z',
                'rclimit' : limit,
                })
        if 'error' in data:
            raise Exception(data['error']['info'])
        return data['query']['recentchanges']

    def print_recent_changes(self, days=7, show=10):
        now = datetime.utcnow()
        then = now - timedelta(days=days)
        print "From %s to %s" % (then, now)
        changes = self.get_recent_changes(now=now, then=then)
        num_changes = len(changes)
        print "%d wiki changes in the past week" % num_changes

        users = defaultdict(list) # {username: [change,]}
        pages = defaultdict(int)  # {pagename: # of edits}

        for change in changes:
            users[change['user']].append(change['title'])
            pages[change['title']] += 1

        print '\n== Most active wiki users =='
        for user, changes in sorted(users.items(),
                                    cmp=lambda x, y: cmp(len(x[1]), len(y[1])),
                                    reverse=True)[:show]:
            print ' %-50s %d' % (('%s' % user).ljust(50, '.'), len(changes))

        print '\n== Most edited pages =='
        for page, num in sorted(pages.items(),
                                cmp=lambda x, y: cmp(x[1], y[1]),
                                reverse=True)[:show]:
            print ' %-50s %d' % (('%s' % page).ljust(50, '.'), num)
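
The tallying step in print_recent_changes can also be sketched on its own. The version below is a minimal Python 3 illustration, not the code that ships in python-fedora: summarize_changes is a hypothetical helper, the sample data is made up, and collections.Counter stands in for the defaultdict-plus-sorted approach used above.

```python
from collections import Counter

def summarize_changes(changes, show=10):
    """Return (top_users, top_pages) as lists of (name, count) pairs.

    `changes` is a list of dicts with 'user' and 'title' keys, the same
    shape that get_recent_changes() returns above.
    """
    users = Counter(change['user'] for change in changes)
    pages = Counter(change['title'] for change in changes)
    return users.most_common(show), pages.most_common(show)

# Made-up sample data, purely for illustration.
sample = [
    {'user': 'Alice', 'title': 'Main Page'},
    {'user': 'Bob', 'title': 'Main Page'},
    {'user': 'Alice', 'title': 'Sandbox'},
]
top_users, top_pages = summarize_changes(sample)
for name, count in top_users:
    print(' %s %d' % (name.ljust(50, '.'), count))
```

Counter.most_common does the sort-and-slice in one call, which avoids the cmp= argument that Python 3's sorted no longer accepts.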

I added a Wiki.login method to the latest version, but it isn't quite working yet: some minor limitations in the ProxyClient mean that we currently cannot handle authenticated requests. This shouldn't be very difficult to implement, and we need it so that we can run authenticated queries as a 'bot' account in order to get around the 500 entry API return limit.

This module makes it easy to talk to MediaWiki's API, so if you do anything cool with it feel free to send patches here. It's currently not being shipped in a python-fedora release, so you'll have to grab the code from Bazaar:

bzr branch bzr://bzr.fedorahosted.org/bzr/python-fedora/python-fedora-devel


posted at: 23:12 | link | Tags: , , , | 13 comments

Sat, 13 Dec 2008

Time spent in updates-testing purgatory

Will Woods asked me on IRC earlier today how easy it would be to determine the amount of time Fedora updates spend in testing within bodhi. It turned out to be fairly easy to calculate, so I thought I would share the code and results.

from datetime import timedelta
from bodhi.model import PackageUpdate

deltas = []
occurrences = {}
accumulative = timedelta()

for update in PackageUpdate.select():
    for comment in update.comments:
        if comment.text == 'This update has been pushed to testing':
            for othercomment in update.comments:
                if othercomment.text == 'This update has been pushed to stable':
                    delta = othercomment.timestamp - comment.timestamp
                    deltas.append(delta)
                    occurrences[delta.days] = occurrences.setdefault(delta.days, 0) + 1
                    accumulative += deltas[-1]
                    break
            break

deltas.sort()
all = PackageUpdate.select().count()
percentage = int(float(len(deltas)) / float(all) * 100)
mode = sorted(occurrences.items(), cmp=lambda x, y: cmp(x[1], y[1]))[-1][0]

print "%d out of %d updates went through testing (%d%%)" % (len(deltas), all, percentage)
print "mean = %d days" % (accumulative.days / len(deltas))
print "median = %d days" % deltas[len(deltas) / 2].days
print "mode = %d days" % mode


4878 out of 10829 updates went through testing (45%)
mean = 17 days
median = 11 days
mode = 6 days
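
For anyone who wants to reproduce the three summary numbers without the hand-rolled bookkeeping, here is a rough Python 3 sketch using the standard statistics module. It assumes you already have a list of timedelta objects like `deltas` above; summarize_days is a hypothetical helper, the sample durations are invented, and it only looks at whole days, so the mean can differ slightly from the accumulated-timedelta version.

```python
from datetime import timedelta
from statistics import mean, median, mode

def summarize_days(deltas):
    """Return (mean, median, mode) of the whole-day part of each timedelta."""
    days = [delta.days for delta in deltas]
    return mean(days), median(days), mode(days)

# Made-up durations, not real bodhi data.
sample = [timedelta(days=d) for d in (2, 6, 6, 11, 30)]
print("mean = %d days, median = %d days, mode = %d days" % summarize_days(sample))
```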

So, it seems that the most common stay in updates-testing is just under a week, even though half of the updates take 11 days or longer. This is interesting when taking into consideration the testing workflow mechanisms that bodhi employs. An update can go from testing to stable in two ways: 1) The update's karma can reach an optional stable threshold, and automatically get pushed to the stable repository based on positive community feedback. 2) The developer can request that the update be marked as stable. After an update sits in testing for two weeks, bodhi will send the developer nagmail, which seems to help mitigate stale updates. When initially deploying bodhi, I thought that we would get bogged down with a ton of stale testing updates and would have to implement a timeout to have them automatically get marked as stable. This is still a viable option (which would require FESCo rubberstamping), but I'm quite surprised to see how effective this community-driven workflow is already. Now we just need to encourage more people to use it :)

Due to the limitations of the current model I couldn't figure out an easy way to determine which updates were marked as stable by positive community feedback. This issue will be addressed with the long-awaited SQLAlchemy port that I will hopefully finish up at some point early next year.


posted at: 08:13 | link | Tags: , , , | 1 comments

Wed, 16 Jul 2008

Python dictionary optimizations

In my recent journey through the book Beautiful Code, I came across a chapter devoted to Python's dictionary implementation. I found the whole thing quite fascinating, due to the sheer simplicity and power of the design. The author mentions various special-case optimizations that the Python developers cater for in the CPython dictionary implementation, which I think are valuable to share.

Key lookups
In CPython, all PyDictObjects are optimized for dictionaries containing only string keys. This seems like a very common use case that is definitely worth catering for. The key lookup function pointer looks like this:

struct PyDictObject {
    PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash);
    ...

ma_lookup is initially set to the lookdict_string function (renamed to lookdict_unicode in 3.0), which assumes that both the keys in the dictionary and the key being searched for are standard PyStringObjects. It is then able to make a couple of optimizations, such as skipping various error checks, since string-to-string comparisons never raise exceptions. There is also no need for rich object comparisons, which means we avoid calling PyObject_RichCompareBool, and always use _PyString_Eq directly.

This string-optimized key lookup function is utilized until you search for a non-string key. When lookdict_string detects this, it permanently changes the ma_lookup function to a slower, more generic lookdict function. Here is an example of how to trigger this degradation:

>>> d = {'foo': 'bar'}
>>> d.get(1) # Congratulations, your dictionary is now slower...
>>> d.get(u'foo') # Yes, even unicode objects trigger this degradation as well

Jython does not contain this optimization; however, it does have a string-specialized map object, org.python.core.PyStringMap, which is used for the __dict__ underpinning of all class instances and modules. User code that creates a dictionary utilizes a different class, org.python.core.PyDictionary, which is a heavyweight object that uses the java.util.Hashtable along with some extra indirection, allowing it to be subclassed.

Small dictionaries
Python's dictionary makes an effort to never be more than 2/3rds full. Since the default size of dict is 8, this allows you to have 5 active entries in your dict while avoiding an additional malloc. Dictionaries used for keyword arguments are usually within this limit, and thus are fairly efficient (along with the fact that they most likely come from a pool of cached unused dicts). This also can help improve cache locality. For example, the PyDictObject structure uses 124 bytes of space (on x86 w/gcc) and therefore can fit into two 64-byte cache lines.
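
The "5 entries in a table of 8" figure is just the 2/3 rule worked through. The sketch below is a simplification, not CPython's code: it assumes the table grows as soon as the used slots reach two thirds of the table size, and max_entries_before_resize is a hypothetical helper name.

```python
def max_entries_before_resize(table_size):
    """Largest number of used slots that keeps the table under 2/3 full.

    Assumes the table grows as soon as used * 3 >= table_size * 2, which is
    a simplified reading of the rule described above.
    """
    return (2 * table_size - 1) // 3

for size in (8, 16, 32):
    print(size, max_entries_before_resize(size))
```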

So, the moral of the story: use dictionaries with string-only keys, and only look for string keys within them. If you can keep them small enough to avoid the extra malloc (<= 5), bonus. As expected, things get better in Python 3.0, as unicode keys will no longer slow your dictionary down.


posted at: 15:39 | link | Tags: , | 4 comments

Mon, 24 Mar 2008

PyCon 2008

I was in Chicago last week for PyCon 2008. It was my first time in the windy city, and I must say that I was thoroughly impressed. As expected in any city, we got a chance to see a lady get her purse snatched, and a mentally unstable gentleman on the train yelling profanities at god. Anyway, the conference itself was extremely well done, and tons of awesome innovation happened at the sprints afterwards.

Day 1: Tutorials
8+ hours of TurboGears/Pylons/WSGI tutorials. Awesome. I'm really excited with what is in the works for TurboGears2. By wielding Pylons, the TG2 team was able to completely re-write their framework with minimal amounts of code, while at the same time, gaining a *ton* of new features and some amazing middleware. Mark Ramm and Ben Bangert took turns walking us through the deep internals of their frameworks, while also giving some examples how to use them.

Sessions
During the 3-day conference portion of PyCon, there was a vast plethora of incredibly interesting sessions and conversations. You can find a schedule of the talks and some slides here. Everything was videotaped as well, so the sessions should hopefully make their way onto YouTube at some point soon.

Here are some things that caught my attention while I was there.

WSGI
Defined by Phillip J. Eby in PEP-333, the Web Server Gateway Interface is a simple interface between web servers, applications, and frameworks. Or, as explained by Ian Bicking: WSGI is a series of Tubes. The basic idea is that it lets you connect a bunch of different applications together into a functioning whole. Since TurboGears2 is based on Pylons, it will be a full blown WSGI application out of the box, loaded with lots of useful middleware (WebError, Routes, Sessions, Caching, etc), and will allow you to use any WSGI server that you wish (Paste, CherryPy, orbited, mod_wsgi, etc). An example of a basic Hello World WSGI application:

def wsgi_app(environ, start_response):
    start_response('200 OK', [('content-type', 'text/html')])
    return ['Hello world!']

So, what is WSGI middleware? Well, it's essentially the WSGI equivalent of a python decorator, but instead of wrapping one function in another, you're wrapping one web-app in another. You can see a list of some existing WSGI middleware here.
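
To make the decorator analogy concrete, here is a minimal sketch of a piece of middleware wrapping the Hello World app. HeaderMiddleware and the header it adds are invented for illustration, and the code is written for Python 3, where WSGI response bodies must be bytes.

```python
def hello_app(environ, start_response):
    """The Hello World app from above, returning bytes as Python 3 WSGI requires."""
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'Hello world!']

class HeaderMiddleware(object):
    """Hypothetical middleware that appends one header to every response."""

    def __init__(self, app, name, value):
        self.app = app
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Add our header, then hand off to the server's own callable.
            return start_response(status, headers + [self.header], exc_info)
        return self.app(environ, patched_start_response)

# The wrapped object is itself a WSGI app, so middleware can be stacked.
wrapped = HeaderMiddleware(hello_app, 'X-Powered-By', 'tubes')
```

Because `wrapped` has the same call signature as `hello_app`, any WSGI server can serve it; the standard library's wsgiref.simple_server is one easy way to try it out.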

virtualenv
With so many new shiny python programs to play with, I really tried to resist the urge to easy_install everything into my global Python site-packages so I could tinker with things. This is generally a Bad Thing in a distribution, as easy_install not only installs things behind your package manager's back, but it also lacks the ability to uninstall anything, unless you want to take Zed's easy_fucking_uninstall approach ;) During the TurboGears tutorial, I was introduced to a tool called virtualenv, which will set up a virtual python environment in which you can easy_install as many eggs as you want without worrying about butchering your site-packages.

$ easy_install virtualenv
$ virtualenv --no-site-packages foo
$ cd foo; source bin/activate
$ easy_install <shiny python programs>

nose
I've been in love with nose since day one, but realized that I haven't been utilizing it to its fullest abilities. I blogged in the past about nose's profiler plugin. Come to find out, nose offers a lot more plugins that can seriously help make your life easier:

$ nosetests --pdb --pdb-failures
.............................................................> /home/lmacken/tg1.1/turbogears/turbogears/identity/tests/test_visit.py(92)test_cookie_permanent()
-> assert abs(should_expire - expires) < 3
(Pdb) locals()
{'morsel': <Morsel: tg-visit='452c94de3900fc2adff2cd6b0b0f04c4533e3e9e'>, 'self': <turbogears.identity.tests.test_visit.TestVisit testMethod=test_cookie_permanent>, 'expires': 1206228604.0, 'should_expire': 1206232205.0, 'permanent': False}
(Pdb)

You can also measure code coverage during your unit test execution using the '--with-coverage' option, which utilizes coverage.py.

SQLAlchemy
Also known as "the greatest object-relational-mapper created for any language. ever.", SQLAlchemy has seen vast improvements in 0.4 since 0.3. Among them, a new declarative API is now available that essentially lets you define your class, Table and mapper constructs "at once" under a single class declaration (giving you an ActiveMapper-like feel, similar to SQLObject or Elixir).

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite://')
Base = declarative_base(engine)

class SomeClass(Base):
    __tablename__ = 'some_table'
    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50))

Unicode, demystified.
By far, the most frustrating problems I've ever encountered in Python have been unicode related. I was fortunate enough to catch Kumar McMillan's presentation, "Unicode in Python, Completely Demystified". This presentation helped enlighten many on the concept of unicode, clear up many misconceptions, and explain how to handle it properly in Python. Check out his slides for more details, but the general idea here is to follow these three rules:

His solution to decoding to unicode turns out to be quite elegant compared to some nasty try/except UnicodeDecodeError blocks that I have written in the past ;)
def to_unicode_or_bust(obj, encoding='utf-8'):
    if isinstance(obj, basestring):
        if not isinstance(obj, unicode):
            obj = unicode(obj, encoding)
    return obj
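
For readers on Python 3, where `unicode` and `basestring` no longer exist, a rough equivalent might look like the sketch below. This is an adaptation under that assumption, not something from the talk; to_text_or_bust is a hypothetical name, and the sample bytes are just the UTF-8 encoding of "café".

```python
def to_text_or_bust(obj, encoding='utf-8'):
    """Decode bytes to str; pass every other object through untouched."""
    if isinstance(obj, bytes):
        obj = obj.decode(encoding)
    return obj

text = to_text_or_bust(b'caf\xc3\xa9')   # decode on the way in
data = text.encode('utf-8')               # encode on the way out
```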

Later that night I went and shined some light on some dark corners of certain projects that I've been working on to try and handle unicode the Right Way.

Grassyknoll
After the code sprints, I got a chance to see these guys show off their hard work. grassyknoll is a search engine written in Python. With the ability to handle multiple backends, frontends, and wire formats, grassyknoll has a ton of potential to revolutionize the open source search engine. There has been recent talk in Fedora land about what kind of search engine to use, and I think grassyknoll is definitely a viable option.

Packaging BOF
Toshio, Spot, and I attended a Packaging BOF where we discussed our experiences with distutils and setuptools with a bunch of people from various companies and distros. This then sparked discussions on python-dev and the distutils-sig mailing lists. You can also find the details of the BOF session on the Python wiki. There is definitely a lot of energy behind this, so hopefully we'll see some good changes in setuptools in the near future that will make our lives as distro packagers much easier :)

Orbited
Orbited is an HTTP daemon that is optimized for long-lasting comet connections. This allows you to write real-time web applications with ease. For example, embedding an IRC channel anywhere:

You can also use orbited as a WSGI server! Toshio did some brief benchmarking of CherryPy{2,3}, Paste, and Orbited WSGI servers, and orbited seemed to be the clear winner in all scenarios. There is a good chance that we will be using orbited to handle our comet widgets within MyFedora :)

Code Sprints
I stayed the entire time for the code sprints, and mainly focused on TurboGears hacking. This is what I ended up working on:

Want to read more blog posts about PyCon 2008? You can find links to lots of PyCon related posts here and on Planet Python.


posted at: 22:05 | link | Tags: , | 1 comments

Wed, 19 Dec 2007

TurboFlot 0.0.1

In an effort to clean up bodhi's metrics code a bit, I wrote a TurboFlot plugin that allows you to wield the jQuery plugin flot inside of TurboGears applications. The code is quite trivial -- it's essentially just a TurboGears JSON proxy to the jQuery flot plugin. Breaking this code out into its own widget makes it really easy to generate shiny graphs in a Pythonic fashion, without having to write a line of javascript.

Check out the README to see the code for the example above.

To use TurboFlot in your own application, you just pass your data and graph options to the widget, and then throw it up to your template. Read the flot API documentation for details on all of the arguments. Here is a simple usage example:

flot = TurboFlot([
    {
        'data'  : [[0, 3], [4, 8], [8, 5], [9, 13]],
        'lines' : { 'show' : True, 'fill' : True }
    }],
    {
        'grid'  : { 'backgroundColor' : '#fffaff' },
        'yaxis' : { 'max' : '850' }
    }
)
Then, to display the widget in your template, you simply use:
${flot.display()}

The code for the widget itself is pretty simple. It just takes your data and graph options, encodes them as JSON and tosses them at flot.
import simplejson
from turbogears.widgets import Widget, JSLink

class TurboFlot(Widget):
    """
        A TurboGears Flot Widget.
    """
    template = """
      <div xmlns:py="http://purl.org/kid/ns#" id="turboflot"
           style="width:${width};height:${height};">
        <script>
          $.plot($("#turboflot"), ${data}, ${options});
        </script>
      </div>
    """
    params = ["data", "options", "height", "width"]
    javascript = [JSLink('turboflot', 'excanvas.js'),
                  JSLink("turboflot", "jquery.js"),
                  JSLink("turboflot", "jquery.flot.js")]

    def __init__(self, data, options={}, height="300px", width="600px"):
        self.data = simplejson.dumps(data)
        self.options = simplejson.dumps(options)
        self.height = height
        self.width = width
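
The "encode as JSON and toss at flot" step can be tried without TurboGears at all. The following is a minimal Python 3 sketch using the standard json module in place of simplejson; flot_call and element_id are hypothetical names for this illustration and are not part of TurboFlot.

```python
import json

def flot_call(data, options=None, element_id='turboflot'):
    """Build the $.plot(...) call for the given series and options.

    `flot_call` and `element_id` are hypothetical names for this sketch;
    they are not part of TurboFlot.
    """
    return '$.plot($("#%s"), %s, %s);' % (
        element_id, json.dumps(data), json.dumps(options or {}))

print(flot_call(
    [{'data': [[0, 3], [4, 8], [8, 5], [9, 13]],
      'lines': {'show': True, 'fill': True}}],
    {'grid': {'backgroundColor': '#fffaff'}},
))
```

Note how Python's True comes out as JavaScript's true, which is the main thing the JSON encoding buys you here.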

You can download the latest releases from the Python Package Index:

http://pypi.python.org/pypi/TurboFlot
Or you can grab my latest development tree out of mercurial:
http://hg.lewk.org/TurboFlot
As always, patches are welcome :)


posted at: 20:21 | link | Tags: , , , | 4 comments

Sun, 09 Dec 2007

Fedora update metrics

Using flot, a plotting library for jQuery, I threw together some shiny metrics for bodhi. It's pretty amazing to see how a Fedora release evolves over time, with almost as many enhancements as bugfixes. This could arguably be a bad thing, as our "stable" bits seem to change so much; but it definitely shows how much innovation is happening in Fedora.

I should also note that the data on the graphs may look different than the numbers you see next to each category in the bodhi menu. This is due to the fact that updates may contain multiple builds, and the graphs account for all builds in the system.

When I get some free cycles I'd like to generate some metrics from the old updates system for FC4-FC6. I can imagine that the differences will be pretty drastic, considering how the old updates tool was internal to Red Hat, and that the majority of our top packagers are community folks.


posted at: 01:05 | link | Tags: , , , , , | 2 comments

Mon, 01 Oct 2007

Use your Nose!

Every programmer out there [hopefully] knows that unittests are an essential part of any growing body of code, especially in the open source world. However, most hackers out there either never write test cases (let alone comments), or usually put them off until "later" (aka: never). Having to deal with Java and JUnit tests in college not only made me not want to write unit tests, but it made me want to kill myself and everyone around me. Thankfully, I learned Python.

So, I just happen to maintain a piece of software in Fedora called nose (which lives in the python-nose package). Nose is a discovery-based unittest extension for Python, and is also a part of the TurboGears stack. If you're hacking on a TurboGears project, the turbogears.testutil module provides some incredibly useful features that make writing tests powerfully trivial.

For example, in the code below (taken from bodhi), I create a test case that utilizes a fresh SQLite database in memory. Inheriting from the testutil.DBTest parent class, this database will be created and torn down automagically before and after each test case is run -- ensuring that my tests are executed in complete isolation. With this example, I wrote a test case to ensure that unauthenticated people cannot create a new update.

import urllib, cherrypy
from turbogears import update_config, database, testutil, url

update_config(configfile='dev.cfg', modulename='bodhi.config')
database.set_db_uri("sqlite:///:memory:")

class TestControllers(testutil.DBTest):

    def test_unauthenticated_update(self):
        params = {
                'builds'  : 'TurboGears-1.0.2.2-2.fc7',
                'release' : 'Fedora 7',
                'type'    : 'enhancement',
                'bugs'    : '1234 5678',
                'cves'    : 'CVE-2020-0001',
                'notes'   : 'foobar'
        }
        path = url('/save?' + urllib.urlencode(params))
        testutil.createRequest(path, method='POST')
        assert "You must provide your credentials before accessing this resource." in cherrypy.response.body[0]
In the above example, the TestControllers class is automatically detected by nose, which then executes each method that begins with the word 'test'. To run your unittests, just type 'nosetests'.
[lmacken@tomservo bodhi]$ nosetests
.................................
----------------------------------------------------------------------
Ran 33 tests in 16.798s

OK
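
The name-based collection described above can be illustrated with nothing but the standard library. This sketch uses unittest's own loader as a stand-in for nose (which discovers far more than this), runs on Python 3, and uses a made-up TestExample class.

```python
import unittest

class TestExample(unittest.TestCase):
    """Hypothetical test case used only to illustrate name-based discovery."""

    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_upper(self):
        self.assertEqual('nose'.upper(), 'NOSE')

    def helper(self):
        # Not collected: the name does not start with 'test'.
        raise AssertionError('never run')

loader = unittest.TestLoader()
names = loader.getTestCaseNames(TestExample)
result = unittest.TextTestRunner(verbosity=0).run(
    loader.loadTestsFromTestCase(TestExample))
```

Only the two test_ methods are picked up, so the deliberately failing helper never executes.
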
Now, for the fun part. Nose comes equipped with a profiling plugin that will profile your test cases using Python's hotshot module. So, I went ahead and added a 'profile' target to bodhi's Makefile:
profile:
    nosetests --with-profile --profile-stats-file=nose.prof
    python -c "import hotshot.stats ; stats = hotshot.stats.load('nose.prof') ; stats.sort_stats('time', 'calls') ; stats.print_stats(20)"
Now, typing 'make profile' will execute and profile all of our unit tests, and spit out the top 20 method calls -- ordered by internal time and call count.
[lmacken@tomservo bodhi]$ make profile
nosetests --with-profile --profile-stats-file=nose.prof
.................................
----------------------------------------------------------------------
Ran 33 tests in 42.878s

OK
python -c "import hotshot.stats ; stats = hotshot.stats.load('nose.prof') ; stats.sort_stats('time', 'calls') ; stats.print_stats(20)"
         800986 function calls (702850 primitive calls) in 42.878 CPU seconds

   Ordered by: internal time, call count
   List reduced from 3815 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       14   13.675    0.977   13.675    0.977 /usr/lib/python2.5/socket.py:71(ssl)
       31   10.683    0.345   10.683    0.345 /usr/lib/python2.5/httplib.py:994(_read)
2478/2429    9.297    0.004    9.677    0.004 :1()
        1    0.604    0.604    0.604    0.604 /usr/lib/python2.5/commands.py:50(getstatusoutput)
     2999    0.536    0.000    0.539    0.000 /usr/lib/python2.5/site-packages/sqlobject/sqlite/sqliteconnection.py:177(_executeRetry)
   105899    0.448    0.000    0.773    0.000 Modules/pyexpat.c:871(Default)
       60    0.327    0.005    1.102    0.018 /usr/lib/python2.5/site-packages/kid/parser.py:343(_buildForeign)
   105899    0.325    0.000    0.325    0.000 /usr/lib/python2.5/site-packages/kid/parser.py:452(_default)
     3396    0.280    0.000    0.420    0.000 /usr/lib/python2.5/site-packages/cherrypy/config.py:107(get)
     2965    0.263    0.000    0.263    0.000 /usr/lib/python2.5/logging/__init__.py:364(formatTime)
44964/6587    0.238    0.000    0.252    0.000 /usr/lib/python2.5/site-packages/kid/parser.py:156(_pull)
       60    0.116    0.002    0.116    0.002 /usr/lib/python2.5/site-packages/kid/compiler.py:38(py_compile)
     8127    0.114    0.000    0.114    0.000 /usr/lib/python2.5/site-packages/cherrypy/_cputil.py:311(lower_to_camel)
     8982    0.110    0.000    0.137    0.000 /usr/lib/python2.5/site-packages/sqlobject/dbconnection.py:902(__getattr__)
13740/4044    0.108    0.000    2.176    0.001 /usr/lib/python2.5/site-packages/kid/parser.py:209(_coalesce)
24353/4026    0.107    0.000    2.143    0.001 /usr/lib/python2.5/site-packages/kid/parser.py:174(_track)
     3170    0.093    0.000    0.398    0.000 /usr/lib/python2.5/logging/__init__.py:405(format)
        1    0.082    0.082    0.082    0.082 /usr/lib/python2.5/site-packages/rpm/__init__.py:5()
     4777    0.081    0.000    1.320    0.000 /usr/lib/python2.5/site-packages/kid/serialization.py:564(generate)
  759/176    0.074    0.000    0.210    0.001 /usr/lib/python2.5/sre_parse.py:385(_parse)
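
The hotshot module is not part of Python 3, so the one-liner in the Makefile target will not run there. As a rough modern equivalent, assuming you just want the same "top 20 by internal time and call count" report, the standard cProfile and pstats modules can be used as sketched below; slow_sum is a made-up workload standing in for a test suite.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Hypothetical workload standing in for a test suite."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('time', 'calls')   # same sort keys as the Makefile target
stats.print_stats(20)               # limit the report to the top 20 entries
report = stream.getvalue()
print(report)
```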


posted at: 14:40 | link | Tags: , , , , | 8 comments

Sat, 01 Sep 2007

Recovering a Pyblosxom blog using liferea's RSS cache

My buddy who used to host lewk.org didn't pay his bills, so his server got taken down last week. What sucks is that I never backed up my Pyblosxom data. What doesn't suck is that thankfully Liferea, my RSS reader, did for me.

Grepping through ~/.liferea_1.2/cache/feeds, I was able to find my blog cached in some XML format. Then I wrote a little bit of code to re-create my Pyblosxom entry structure with the proper filenames and timestamps.

#!/usr/bin/python -tt
"""
 Turns XML into pyblosxom blog entries.

 It parses BLOG_XML pulling out blog entries in the form of:

     <feed version="1.1">
       <item>
         <title></title>
         <description></description>
         <source>http://foo.com/blog/2007/08/20/bar.html</source>
         <time>1187621268</time>
       </item>
     </feed>

 The file '2007/08/20/bar.txt' will be created in pyblosxom format with
 the appropriate timestamp.  The #mdate is used by the pyblosxom.vim plugin.

     title
     #mdate Aug 20 10:47:48 2007
     <p>description</p>
"""

import os
import time

try: from xml.etree import cElementTree
except ImportError: import cElementTree
iterparse = cElementTree.iterparse

entries = {} # { 'title' : <Element> }

BLOG_XML = 'blog.xml'
BLOG_ROOT = 'http://foo.com/blog/'

def getField(elem, field):
    for child in elem:
        if child.tag == field:
            return child.text

## Pull out all feed items, removing older duplicates
for event, elem in iterparse(BLOG_XML):
    if elem.tag == 'feed':
        for child in elem:
            if child.tag == 'item':
                title = getField(child, 'title')
                if entries.has_key(title):
                    if int(getField(child, 'time')) > \
                       int(getField(entries[title], 'time')):
                        entries[title] = child
                else:
                    entries[title] = child

for title, entry in entries.items():
    source = getField(entry, 'source').replace(BLOG_ROOT, '')
    source = source.replace('.html', '.txt')
    if not os.path.isdir(os.path.dirname(source)):
        os.makedirs(os.path.dirname(source))
    output = file(source, 'w')
    output.write(title + '\n')
    mtime = time.localtime(int(getField(entry, 'time')))
    mdate = time.strftime("%b %e %H:%M:%S %Y", mtime)
    output.write("#mdate %s\n" % mdate)
    output.write("<p>%s</p>\n" % getField(entry, 'description'))
    output.close()
    timestamp = time.strftime("%y%m%d%H%M", mtime)
    os.system("touch -t %s %s" % (timestamp, source))
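
The final `touch -t` call could also be done without spawning a shell. Here is a small sketch of that idea using os.utime, written for Python 3; set_mtime is a hypothetical helper, and the demo works on a throwaway temporary file rather than real blog entries.

```python
import os
import tempfile

def set_mtime(path, timestamp):
    """Set both the access and modification time of `path` to `timestamp`."""
    os.utime(path, (timestamp, timestamp))

# Throwaway file so the sketch does not touch real blog entries.
with tempfile.NamedTemporaryFile(suffix='.txt', delete=False) as handle:
    handle.write(b'title\n')
    demo_path = handle.name
set_mtime(demo_path, 1187621268)   # the <time> value from the docstring above
demo_mtime = int(os.path.getmtime(demo_path))
os.remove(demo_path)
```

Working from the raw epoch value also sidesteps the strftime round trip, and it avoids quoting problems if a filename ever contains spaces.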

It also adds an #mdate tag into each entry, which is read by the spiffy pyblosxom mdate vim hack that Jordan Sissel wrote to restore each entry's original timestamp after editing. His code only works on FreeBSD at the moment, so I started a pyblosxom.vim plugin that works on Linux (hopefully it will eventually support both, along with a bunch of other handy functions). You can find all of this code in my mercurial repo: hg.lewk.org/xml2pyblosxom


posted at: 16:44 | link | Tags: , , | 31 comments

Sat, 19 May 2007

Security LiveCD

So last week I created an initial version of a potential Fedora Security LiveCD spin. The goal is to provide a fully functional livecd based on Fedora for use in security auditing, penetration testing, and forensics. I created it as a bonus project for my Security Auditing class (instead of following the 5 pages of instructions on how to create a Gentoo livecd that she handed out (mad props to davidz for creating an amazing LiveCD tool)), but it has the potential to be extremely useful and also help increase the number and quality of Fedora's security tools. I threw in all of the tools I could find that already exist in Fedora, but I'm sure I'm missing a bunch, so feel free to send patches or suggestions. I also added a Wishlist of packages that I would eventually like to see make their way in Fedora, after the core->extras merge reviews are done.

I would eventually like to see Fedora offer a LiveCD that puts all of the existing Linux security livecds to shame. We have quite a ways to go, but this is a start. I'm taking a computer forensics class next quarter, so I will be expanding it to fit the needs of our class as well.


posted at: 19:15 | link | Tags: , , , , | 0 comments

Thu, 15 Feb 2007

break

So my Thanksgiving break was far from a break. I spent a couple of days last week at Red Hat's Westford office before heading back up to RIT to start a new quarter. In my two days in the office I was able to touch base with a bunch of people, and get a bunch of stuff done as well. I had a long discussion with dmalcom about integrating the Fedora Updates System with Beaker/TableCloth. He also gave me a quick rundown on a bunch of the Red Hat QA infrastructure that is currently being used. Ideally we'd like to be able to crunch all package updates through an automated test system before pushing them out to the world. Involvement needed: FedoraTesting.

Later that day I met with jrb and jkeating about getting a package updating system in place for a new Red Hat product that is going out the door very soon. This means that much work will be going into the new UpdatesSystem in the near future, which means I get to dig deeper into the world of TurboGears :)

On Thursday I cranked a bunch of code out, but was fairly distracted most of the time by the OLPC laptops that were lying around the office. I must say, it is an absolutely incredible machine. The screen is gorgeous, and its camera is very impressive. I hung around later at the office for an OLPC hackfest that was going down.

I was busy working on the updates system most of the time, but then later on I started looking into some Python start-up issues, which can be seen by doing:

	strace python 2>&1 | grep ENOENT
You'll notice a ton of syscalls like the following, which try to open/stat modules in locations that do not exist:
stat64("/usr/lib/python24.zip/posixpath", 0xbfdb5094) = -1 ENOENT (No such file or directory)
open("/usr/lib/python24.zip/posixpath.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python24.zip/posixpathmodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python24.zip/posixpath.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python24.zip/posixpath.pyc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/python2.4/posixpath", 0xbfdb5094) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/posixpath.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/posixpathmodule.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/posixpath.py", O_RDONLY|O_LARGEFILE) = 5
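
If you want to see which directories account for most of the failed lookups, a few lines of Python can tally them up. This is just a rough sketch: it assumes the calls show up in the strace output as stat64 and open, and the function name count_misses and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches a failed stat64/open call and captures its path argument.
ENOENT_CALL = re.compile(r'^(?:stat64|open)\("([^"]+)".*ENOENT')


def count_misses(strace_lines):
    """Count ENOENT lookups per parent directory (hypothetical helper)."""
    misses = Counter()
    for line in strace_lines:
        match = ENOENT_CALL.match(line)
        if match:
            # Drop the final path component to get the directory searched.
            directory = match.group(1).rsplit("/", 1)[0]
            misses[directory] += 1
    return misses


# Illustrative sample lines in the same shape as strace output.
sample = [
    'stat64("/usr/lib/python24.zip/posixpath", 0xbfdb5094) = -1 ENOENT (No such file or directory)',
    'open("/usr/lib/python24.zip/posixpath.so", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)',
    'open("/usr/lib/python2.4/posixpath.py", O_RDONLY|O_LARGEFILE) = 5',
]

for directory, count in count_misses(sample).most_common():
    print(count, directory)
```

To run it against a real trace, you could save the strace output to a file and pass the open file object to count_misses, since it only needs an iterable of lines.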

So it's obvious that modules could exist in multiple locations, but if you are repeatedly going to check a series of directories, such as /usr/lib/python24.zip, wouldn't it be a *bit* smarter to check if they exist first, and then avoid checking there in the future? Doing so would help cut down on the 233+ syscalls python makes while starting up looking for modules. I really don't have any free cycles to try and add some sense into Python, so I really hope someone can beat me to a patch.
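
To make the idea concrete, here is a minimal sketch of "check each location once, then skip the dead ones". It is not how Python's import machinery is actually implemented; the names SUFFIXES, existing_entries, and find_module_file are hypothetical, and the suffix list is just the set of variants visible in the strace output.

```python
import os

# Filename variants tried for each module, as seen in the trace.
SUFFIXES = (".so", "module.so", ".py", ".pyc")


def existing_entries(search_path):
    """Probe each search-path entry once and keep only those that exist."""
    return [entry for entry in search_path if os.path.exists(entry)]


def find_module_file(name, live_entries):
    """Look for a module file, but only inside entries known to exist."""
    for entry in live_entries:
        for suffix in SUFFIXES:
            candidate = os.path.join(entry, name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    import sys

    # One existence check per entry up front, instead of several failed
    # lookups per entry for every module imported later.
    live = existing_entries(sys.path)
    print(find_module_file("posixpath", live))
```

With a missing /usr/lib/python24.zip filtered out up front, each later module lookup would skip the stat and four open attempts against it that the trace shows.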


TurboGears 1.0b2


I came back home to find the new TurboGears book in my mailbox, which has been extremely informative, even though the project already has awesome online docs. I also pushed out the latest TurboGears release, 1.0b2, for FC6 and rawhide yesterday.


posted at: 03:12 | link | Tags: , , , , | 1 comments