Sun, 11 Nov 2007

Another day, another business

Last night my girlfriend and I threw together a little web site for our old roommate, Joe. Not only is Joe a brilliant high-voltage engineer who plays with plasma in his free time, he also fabricates custom scope mounts for old Mosin Nagant sniper rifles. If you're interested in that sort of thing, feel free to check it out. I'll be throwing up some CAD diagrams and better photos of the mount in the near future.


AdvancedRifleParts.com
Over the summer, Joe took me out to the range and I got a chance to fire the Mosin pictured on the web site. Good times.


posted at: 19:45 | 0 comments

Sat, 01 Sep 2007

Recovering a Pyblosxom blog using liferea's RSS cache

My buddy who used to host lewk.org didn't pay his bills, so his server got taken down last week. What sucks is that I never backed up my Pyblosxom data. What doesn't suck is that Liferea, my RSS reader, thankfully did it for me.

Grepping through ~/.liferea_1.2/cache/feeds, I was able to find my blog cached in some XML format. Then I wrote a little bit of code to re-create my Pyblosxom entry structure with the proper filenames and timestamps.

#!/usr/bin/python -tt
"""
 Turns XML into pyblosxom blog entries.

 It parses BLOG_XML, pulling out blog entries in the form of:

     <feed version="1.1">
       <item>
         <title></title>
         <description></description>
         <source>http://foo.com/blog/2007/08/20/bar.html</source>
         <time>1187621268</time>
       </item>
     </feed>

 The file '2007/08/20/bar.txt' will be created in pyblosxom format with
 the appropriate timestamp.  The #mdate is used by the pyblosxom.vim plugin.

     title
     #mdate Aug 20 10:47:48 2007
     <p>description</p>
"""

import os
import time

try: from xml.etree import cElementTree
except ImportError: import cElementTree
iterparse = cElementTree.iterparse

entries = {} # { 'title' : <Element> }

BLOG_XML = 'blog.xml'
BLOG_ROOT = 'http://foo.com/blog/'

def getField(elem, field):
    for child in elem:
        if child.tag == field:
            return child.text

## Pull out all feed items, removing older duplicates
for event, elem in iterparse(BLOG_XML):
    if elem.tag == 'feed':
        for child in elem:
            if child.tag == 'item':
                title = getField(child, 'title')
                if title in entries:
                    if int(getField(child, 'time')) > \
                       int(getField(entries[title], 'time')):
                        entries[title] = child
                else:
                    entries[title] = child

for title, entry in entries.items():
    source = getField(entry, 'source').replace(BLOG_ROOT, '')
    source = source.replace('.html', '.txt')
    if not os.path.isdir(os.path.dirname(source)):
        os.makedirs(os.path.dirname(source))
    output = open(source, 'w')
    output.write(title + '\n')
    mtime = time.localtime(int(getField(entry, 'time')))
    mdate = time.strftime("%b %e %H:%M:%S %Y", mtime)
    output.write("#mdate %s\n" % mdate)
    output.write("<p>%s</p>\n" % getField(entry, 'description'))
    output.close()
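    # Set the file's atime/mtime directly instead of shelling out to touch,
    # which would break on filenames containing spaces or shell metacharacters.
    timestamp = int(getField(entry, 'time'))
    os.utime(source, (timestamp, timestamp))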

The script also adds an #mdate tag to each entry, which is read by the spiffy pyblosxom mdate vim hack that Jordan Sissel wrote to restore each entry's original timestamp after editing. His code only works on FreeBSD at the moment, so I started a pyblosxom.vim plugin that works on Linux (hopefully it will eventually support both, along with a bunch of other handy functions). You can find all of this code in my Mercurial repo: hg.lewk.org/xml2pyblosxom
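
For the curious, here is a rough sketch of the idea behind the #mdate trick, written in Python rather than vim script. This is not Jordan's code or my plugin; the names read_mdate and restore_mtime are made up for illustration, and it assumes the "#mdate Aug 20 10:47:48 2007" line format that the script above writes. It reads the #mdate line out of an entry and resets the file's timestamp to match.

```python
import os
import time

# Assumed format of the tag line, matching what the recovery script writes.
MDATE_PREFIX = "#mdate "
MDATE_FORMAT = "%b %d %H:%M:%S %Y"  # e.g. "Aug 20 10:47:48 2007"


def read_mdate(path):
    """Return the entry's #mdate as a Unix timestamp, or None if it has none."""
    with open(path) as entry:
        for line in entry:
            if line.startswith(MDATE_PREFIX):
                stamp = line[len(MDATE_PREFIX):].strip()
                # Interpret the date as local time, as the script wrote it.
                return time.mktime(time.strptime(stamp, MDATE_FORMAT))
    return None


def restore_mtime(path):
    """Reset the file's atime/mtime to its #mdate. Return True if it was changed."""
    timestamp = read_mdate(path)
    if timestamp is None:
        return False
    os.utime(path, (timestamp, timestamp))
    return True
```

Calling restore_mtime('2007/08/20/bar.txt') after saving an edit would put the entry back at its original date, so Pyblosxom keeps sorting it where it belongs. Entries without an #mdate line are left alone.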


posted at: 16:44 | 31 comments