Migrating from Radiant CMS to... *anything* else

I recently mentioned the painful transition of this website from the Ruby/Rails Radiant content management server to... well, anything that would actually work. Given its popularity, I have to assume that Ruby and Rails can be made to work well -- or that thousands of development teams are herd-following idiots, but that can't be true, right? -- but my experience was a nightmare.

Mysterious Rakefiles, UI-disaster server commands, awful integration with system packages, god-awful outdated Radiant documentation, and changes with every release. In the end, an update of the base Ubuntu OS completely broke Radiant. I tried using RubyGems in all the ways I could find, and updated every package to the latest that Radiant thought it wanted, but couldn't get it to run again. I tried making a new Radiant site and migrating the database via the advertised commands: it crashed. And in the end it seemed that Radiant's own declaration of package dependencies was inconsistent. This was just the final straw after several years of expecting a Rails epiphany, and dreading every time that I'd have to restart the server and somehow get the creaking mess up and running again.

Well, enough was enough. I've now moved to using the Nikola static site generator instead and couldn't be happier: it's got a great command-line UI, it's totally clear what's going on, I can hack and extend it if I want to, and my data is forever in a human-readable, editable (even when offline!) format.

Radiant's page data is categorically not available in a human-readable format, so a significant part of the effort to get this site back to life was the need to write a script to access its article database, and dump out the pages in a form I could use. Fortunately the db is just an sqlite single-file database, and the table structure was pretty simple, so the dump script was easy. Here it is for posterity:

radiant2txt

#! /usr/bin/env python

"Convert a RadiantCMS SQLite3 db file into separate page and header text files"

import optparse, os
op = optparse.OptionParser()
op.add_option("-o", "--out", dest="OUTDIR", default="out")
opts, args = op.parse_args()

import sqlite3
conn = sqlite3.connect(args[0])
conn.row_factory = sqlite3.Row
c = conn.cursor()

import unicodedata
def norm(s):
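    # Strip accents down to plain ASCII; decode back to text so the result
    # can be concatenated with ordinary strings under Python 3 as well
    return unicodedata.normalize("NFD", s).encode("ascii", "ignore").decode("ascii")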

import datetime
def date(s):
    return datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S").date().isoformat() if s else ""

import textwrap, re
class DocWrapper(textwrap.TextWrapper):
    """Wrap text in a document, processing each paragraph individually"""

    def __init__(self):
        self.tw = textwrap.TextWrapper(width=120, break_long_words=False)

    def wrap(self, text):
        """Override textwrap.TextWrapper to process 'text' properly when
        multiple paragraphs present"""
        para_edge = re.compile(r"(\n\s*\n)", re.MULTILINE)
        paragraphs = para_edge.split(text)
        wrapped_lines = []
        for para in paragraphs:
            if para.isspace():
                wrapped_lines.append('')
            else:
                wrapped_lines.extend(self.tw.wrap(para))
        return wrapped_lines

dw = DocWrapper()

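# Make sure the output directory exists before writing any pages into it
if not os.path.isdir(opts.OUTDIR):
    os.makedirs(opts.OUTDIR)

for page in conn.execute("SELECT * FROM pages"):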
    pagename = page["slug"] if page["slug"] != "/" else "index"
    outfile = os.path.join(opts.OUTDIR, "%s.md" % pagename)
    with open(outfile, "w") as f:
        f.write("<!-- \n")
        f.write(".. title: " + norm(page["title"]) + "\n")
        f.write(".. slug: " + pagename + "\n")
        if page["published_at"]:
            f.write(".. date: " + page["published_at"] + "\n")
        else:
            f.write(".. date: 2008-06-01 12:00:00\n")
        f.write(".. type: text\n")
        f.write(".. category: blog\n")
        f.write("-->")
        f.write("\n\n")
        for part in conn.execute("SELECT * FROM page_parts WHERE page_id = ? ORDER BY page_parts.name", (page["id"],)):
            text = dw.fill(norm(part["content"]))
            if text:
                f.write(text + "\n")

To get a bunch of pages out in the format I wanted (my site was using Markdown syntax, so the script writes out to a bunch of .md files), I ran this like:

./radiant2txt myradiantsite/db/radiant_live.sqlite.db -o out-nikola

A bit of manual hacking followed, but 95% of the job was done by the script above. Use if you like, but don't ask me for support; if you need something a bit different, hack it!
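
If you do need to hack it, the first job is probably finding out what tables and columns your own Radiant db has. Here's a minimal sketch of how that might be done with Python's sqlite3 module. The `list_tables` name is my own invention, and the in-memory stand-in database only mimics the columns that radiant2txt touches, so don't take it as the real Radiant schema.

```python
import sqlite3

def list_tables(conn):
    """Return {table name: [column names]} for an open SQLite connection."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for name in names:
        # PRAGMA table_info gives one row per column; field 1 is the column name
        info = conn.execute('PRAGMA table_info("%s")' % name).fetchall()
        schema[name] = [col[1] for col in info]
    return schema

# Stand-in database: only the columns radiant2txt reads, NOT the full Radiant schema
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE pages (id INTEGER, slug TEXT, title TEXT, published_at TEXT)")
demo.execute("CREATE TABLE page_parts (page_id INTEGER, name TEXT, content TEXT)")
for table, columns in sorted(list_tables(demo).items()):
    print(table + ": " + ", ".join(columns))
```

For a real site you'd pass in something like sqlite3.connect("myradiantsite/db/radiant_live.sqlite.db") instead of the stand-in.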

MP letter re. EDM 49 on Royal/commercial FoI

Well, I'm blogging again, and it seems to me that if I'm going to write a letter to my MP on a national issue, then I may as well wear my heart on my sleeve and make it an open letter. So here's the latest --- in fact the first I've written for a while, due to the replacement of my long-standing traditionalist/institutionalist MP with a hopefully more sympathetic model:

Attn: Owen Thompson MP Midlothian

Wednesday 10 June 2015

Dear Owen Thompson,

I'm writing to ask you to sign Parliamentary Early Day Motion 49, "Freedom of Information Legislation, publicly funded bodies and the royal family."

This EDM calls for two important things: 1) that commercial confidence not be a justification for secrecy on public sector contracts (after all, we all are the paying clients), and 2) that the Royal Family not be given special exemption from the freedom of information rules that govern all other publicly funded bodies (again, we are all paying for them and deserve to know what we get for our money).

The first point is, I hope, self-evident. On the second, I think it is worth noting that the recently published Prince Charles correspondence with ministers has shown how the heir to the throne, regardless of whether you agree with his comments, has abused his position of conventional neutrality on numerous political issues. He pressed ministers to favour his own interests and organisations, and the evidence is that most felt compelled to respond more substantially than they would to an "ordinary" citizen.

Extraordinarily, David Cameron claimed that there was an "important principle about the ability of senior members of the royal family to express their views to government confidentially" -- it's somehow democratically important that unelected aristocrats have special access to legislators despite that being constitutionally taboo?! And rather than respond constructively to the exposure, there is clearly a determination from the Conservative Government simply to hide the abuse from public view. This must be opposed, and indeed the existing exemption of the Royals from FoI requests (in response to the moves to publish Charles' letters) should be repealed. Please sign the EDM that calls for this.

Yours sincerely,

Dr Andy Buckley

And that's that.

Pygmentizing code for LaTeX

A couple of years ago, I realised that actually quite a few people were using my PySLHA library and plotter, and that I should write it up for them to cite, that being the tail-wags-dog way that the academic world rolls. So I knocked something together.

While writing this, using a LaTeX class file of my own devising, I decided I wanted to render my Python code examples better than the venerable listings package can do. And I found minted, a clever LaTeX package which automatically runs Pygments via the LaTeX shell escape mechanism. Problem is, the arXiv doesn't allow -shell-escape running of LaTeX; I had to beg a favour to get my original version of the paper uploaded.

Now that I'm coming up to a major new release of PySLHA, it seems worth updating that arXiv note, and maybe even trying to get it "properly" published for the usual ineffable reasons. And another minted special request isn't going to wash. But I still like its output. So I just figured out what it was doing, and fiddled together a teeny bash script that provides the same code snippets statically. I don't think this exists elsewhere, but it's not worth a proper code release, so here's the whole thing in case someone finds it useful:

pygtex

#! /usr/bin/env bash

## Write a .sty file defining the commands used in each Verbatim code block (bit hacky)
echo "" | pygmentize -l python -f latex -P full=True | head -n -10 | grep -E -v "documentclass|inputenc" > pygtex.sty

## Make a Verbatim code block for each input code file, transforming foo.ext to foo-ext.tex
for inname in "$@"; do
    outname=$(echo "$inname" | sed -e 's/[\ \.]/-/g').tex
    pygmentize -f latex -P verboptions='frame=leftline,framesep=1.5ex,framerule=0.8pt,fontsize=\smaller' "$inname" > "$outname"
done

I called it pygtex; you can call it whatever you like. It can be called like pygtex *snippet.py (if you've made code snippets with that name pattern) and will write out a pygtex.sty file, and a .tex file for each snippet. Then include them in your doc like this:

\usepackage{pygtex}
...
\input{foo-snippet-py}
...
\input{bar-snippet-py}
...
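
If the output filenames puzzle you, the renaming done by the sed line in pygtex can be tried out on its own. This is a rough Python sketch rather than part of the script: the `tex_name` function is hypothetical. Note that inside a sed bracket expression the backslashes are literal characters, so as far as I can tell backslashes get replaced along with spaces and dots, and the sketch copies that.

```python
import re

def tex_name(inname):
    # Same substitution as the sed expression in pygtex: every backslash,
    # space, or dot becomes a hyphen, then ".tex" is appended
    return re.sub(r"[\\ .]", "-", inname) + ".tex"

for name in ["foo-snippet.py", "bar snippet.py"]:
    print(name, "->", tex_name(name))
```

The name without its .tex extension is what goes inside the \input{...} braces.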

Enjoy.

38 Degrees and neonics

I've been very disappointed to see 38 Degrees, a people-power campaigning organisation whose petitions I've often signed, going down the data-blind anti-corporate route that blights the likes of Avaaz. Straying from their typical social justice agendas, 38 Degrees have decided to direct their ire at the government for considering a repeal of the EU-wide ban on neonicotinoid pesticides that's been in effect in Europe for the last year.

Read more…

Ahoy

Sorry for the 6 months that this site has been offline -- in particular for anyone who's been trying to read the night-climbers transcriptions. My RadiantCMS server refused to restart after an Ubuntu server upgrade, and because Radiant and Rails are a steaming pile of crap when it comes to package management and code quality, a good 10 or so hours of configuration fighting failed to resuscitate it. At which point I lost the will to live.

Read more…

A tense exchange

Time for another lazy excerpt from private correspondence! This time we visit that most viscerally thrilling and scientifically crucial of subjects: what tense(s) to use in your scientific paper. Daring, I know! But surprisingly controversial, and I'm motivated to write it after reading and reviewing umpteen notes, drafts, and published papers in which the tenses seem (to me) perverse. In particular I think there's a need to write such a thing after being told by one physicist "I think there's a convention in science writing that we always use present tense". Piffle!

Read more…

It just works... or does it? The dark side of Macs in HEP

If you attend a particle physics meeting these days (and most of us do, several times a day... this is not a good thing) it looks rather different to how it did 10+ years ago. Not that everyone paid attention then, but the type of laptop everyone's focusing on rather than the speaker has shifted, from the olden times array of various clunky black boxes to the situation now where 2/3 of the room seem to be wielding shiny silver Macbooks.

It seems like a no-brainer: Windows is pretty much 100% dysfunctional for computing-heavy science (unless you are either in a fully management role and never touch data, or for some reason love doing all your work through a virtual machine), but Linux is unfamiliar territory for most starting PhD students. Sure, it's a lot more user friendly than it used to be, with more helpful GUI ways to manage the system, and the wifi even works out of the box most of the time. But Macs are perfect: beautifully designed, friendly, but with Unix underneath ... and they only cost an extra 50%! Ideal for HEP users who need Unix computing but want it to just work out of the box... and who doesn't? As the Apple advertising used to say, "It just works". But does it?

Read more…

On unfolding

A while ago I was included in a discussion between an ATLAS experimentalist who had been told that some "unfolding" was needed in their analysis, and a theorist who had previously been a strong advocate of avoiding dangerous "unfoldings" of data. So it seemed that there was a conflict between the experimentalist position on what constitutes good data processing and the view from the theory/MC generator community (or at least the portion of it who care about getting the details correct). In fact the real issue was just one of nomenclature: the u-word having been used to represent both a good and a bad thing. So here are my two cents on this issue, since they seemed to help in that case. First, what the experimentalist was referring to as "unfolding" was almost certainly the "ok" kind: unfolding to hadrons, photons and leptons with lifetimes of at least ctau0 = 10 mm.

Read more…

Top mass measurements and MC definitions -- an inexpert precis

I was just recently notified that the world top mass combination uses "my" MCnet review paper on MC generators to justify stating that the definition of the top quark mass used in all (!) event generators is equivalent to the "pole" mass.

I've heard that statement very often, but not backed up by anything more concrete, so I was interested to read this section of the paper (Appendix C, starting on p184 of the PDF), which turns out to be rather good, interesting, and elegantly presented. Not to mention slightly embarrassing that I hadn't read it before, given that it has my name on the front! (In my defence, I did write some of this paper, just not that bit. I suspect most of the authors haven't read everything in it.)

Anyway, it definitely does not say that MC mass equals pole mass, so I thought it might be interesting to post my explanation of what it does say, at least as far as a dumb fence-sitting experimentalist/MC guy like myself can understand...

Read more…

Science TV is too nice

Well well well, another blog post, eh? So soon: it's only been... erm, two years. Oh. Well I never promised to be prolific. This one's come about because I grumbled briefly on Twitter about the nature of (British) pop-science TV and immediately hit the restrictions of that medium. Twitter is a wonderful way to share neat things that you find online, and to make pithy soundbites & jokes (and for describing what you're eating, the form of public transport that you happen to be on, listing film names with comic vegetable name substitutions...), but for exploring a non-trivial issue 140 characters is, to put it mildly, a limitation. I have difficulty fitting one of my normal sentences into 140 characters. So of course I came across as a whining idiot, prompting the reply "Yeah, we really should do more about how shit it all is. You don't see that tone ANYWHERE" from Dara O'Briain. Well, not remotely what I meant, but who can blame him?

So here's an attempt at a more coherent and nuanced version that hopefully doesn't make me come across as an anti-science, axe-grinding git. But perhaps as a slightly grumpy science nerd, which is fair enough.

There's been a welcome rise in the amount and profile of science programming on TV in recent years, a good whack of which is due to the influence of Brian Cox's three "Wonders" series, the LHC start-up & Higgs boson excitement, ... and who knows, maybe it's also due to The Big Bang Theory. Discounting the mostly-gawp-fest wildlife docs, there's been Dara's own Science Club, Astronomy Live, Bang Goes The Theory, at least a couple of series fronted by my old supervision partner Helen Czerski... and the long-running Horizon, of course. And that's just the stuff that I've noticed.

Read more…