Ramp Closed. Use Next Exit

(See what I did there? It’s because my site is a fill-in-the-blank on the Information Superhighway. Get it?)

The more eagle-eyed among you who visit this site on a regular basis (both of you) may have noticed some changes to the layout and whatnot. Or maybe something just went kerflooie in the RSS feed and your aggregator has just tossed the whole thing in the trash rather than try to deal with it.

Well, not that you asked (you could’ve asked, you know. I take an interest in your lives, you insensitive assholes[1]), but I’ve been messing with things behind the scenes, mainly to avoid having to update stuff all the time. So, in keeping with the vintage 1992 metaphor in the title, I’ve stopped leaning on my shovel, drained the last of my coffee, and actually gotten to work fixing the actual roadway underneath the twenty-times-patched potholes. And then knocking off early and asking someone to punch my time clock for me, because that’s the kind of tireless lazy fucker I am.

Actually, one thing y’all might like is the “Reply” button underneath comments, which allows you to reply to individual comments.

And now, if you’ll excuse me, I need to pop over to Geocities to download some animated “Under Construction” and flashing-light GIFs.


1: Not intended as a factual statement.

Keywords Seem to Be Working Now

Those links at the bottom of posts that say “Tags: foo, bar, baz” should now be working.

Back when I started this weblog, WordPress didn’t have keyword support, so I installed a plugin to implement it. Then at some point keywords became a core feature of WP. I think that having tags in the core and keywords in the plugin broke things.

So I finally consed up a quick and dirty kludge to transfer all of the old tags to new-style keywords, deactivated the plugin, and things started working again. Yay!

Character Encodings Are a PITA

Character encoding schemes (UTF-8, ASCII, ISO-8859-1/-15,
Windows-1252, etc.) are an incredible source of headaches. Stay away
from them.

(Oh, and if you tell me I mean “raw character encoding” or “codepoint
set” or some such, I’ll whack you upside the head with a thick Unicode
reference.)

In case you hadn’t noticed, I upgraded WordPress not too long ago.
Being the cautious sort, I did a dump of the back-end database before
doing so, as I’ve done every other time I upgraded. And, like every
other time, I noticed that some characters got mangled. This time
around, though, I decided to do something about it.

It turned out that when I originally set up the database, I told it to
use ISO-8859-1 as the default text encoding. But later, I told
WordPress to use UTF-8. And somewhere between dumping, restoring, and
WordPress’s upgrade of the schema, various characters got mangled. For
the most part, various ISO-8859-1 quotation marks got converted to
UTF-8, then interpreted as ISO-8859-1, and converted again. On top of
which, some commenters used retarded software to post comments, which
insisted on using cp1252 or cp1258 (and I even saw something which
might’ve been IBM-CP1133), which also got converted to and from UTF-8
and ISO-8859-1 or -15.
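
For the curious, here is a minimal Python sketch of that round trip.
It is an illustration only, not the script used on the actual
database: the function names mangle and unmangle are made up, and it
assumes the damage is exactly "UTF-8 bytes read back as ISO-8859-1",
which is only one of the failure modes described above.

```python
def mangle(text: str) -> str:
    """Simulate one round of damage: UTF-8 bytes misread as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def unmangle(text: str) -> str:
    """Undo one round of damage.

    Raises UnicodeError if the text was not mangled this way, which is
    a useful signal that some other encoding (cp1252, say) is involved.
    """
    return text.encode("iso-8859-1").decode("utf-8")


original = "\u00abbonjour\u00bb"   # «bonjour», using ISO-8859-1 quotation marks
once = mangle(original)            # each quotation mark becomes two characters
twice = mangle(once)               # and then four

print(once)
print(once.encode("utf-8"))        # the bytes that end up in the dump
print(unmangle(unmangle(twice)) == original)
```

Each round of mangling has to be peeled off separately, so text that
went through the cycle twice needs two calls to unmangle.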

Obviously, with 13 MB of data, I wasn’t going to correct it all by
hand; I needed to write a script. But that introduced additional
problems: a Perl script that’s basically “s/foo/bar/g” is
pretty simple, but when foo and bar are strings that
represent the same character using different encodings, things can get
hairy: what if bar is UTF-8, but Perl thinks that the file is
in ISO-8859-15?
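
One way to sidestep the "which encoding does the script think this
is?" question is to do the search-and-replace on raw bytes, so nothing
ever gets decoded. The sketch below shows the idea in Python rather
than Perl; the helper names (double_encoded, build_table, fix_bytes)
are hypothetical, and it assumes the only damage is one extra UTF-8 to
ISO-8859-1 to UTF-8 round.

```python
from typing import Dict, Iterable


def double_encoded(ch: str) -> bytes:
    """Bytes for ch after one UTF-8 -> ISO-8859-1 -> UTF-8 mangling round."""
    return ch.encode("utf-8").decode("iso-8859-1").encode("utf-8")


def build_table(chars: Iterable[str]) -> Dict[bytes, bytes]:
    """Map each damaged byte sequence to the correct UTF-8 bytes."""
    return {double_encoded(ch): ch.encode("utf-8") for ch in chars}


def fix_bytes(data: bytes, table: Dict[bytes, bytes]) -> bytes:
    """Byte-level equivalent of s/foo/bar/g for every entry in the table."""
    for bad, good in table.items():
        data = data.replace(bad, good)
    return data


# Characters written as escapes so the source file's own encoding
# cannot get in the way.
table = build_table("\u00ab\u00bb\u00e9")
damaged = b"\xc3\x82\xc2\xabcaf\xc3\x83\xc2\xa9\xc3\x82\xc2\xbb"
print(fix_bytes(damaged, table).decode("utf-8"))
```

For a real dump you would read and write the file in binary mode
("rb" and "wb"). The catch is that a blind byte replacement will also
rewrite any text that legitimately contains the damaged-looking
sequence, so it is worth spot-checking the result.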

On top of that, you have to keep track of which encoding Emacs is
using to show you any given file.

iconv turned out to be an invaluable forensic tool, but it has one
limitation: you can’t use it to simply decode UTF-8 (or if you can, I
wasn’t able to figure out how to do so). There were times when I
wanted to decode a snippet of text and look at it to see if I could
recognize the encoding. But iconv only allows you to convert from one
encoding to another; so if you try to convert from UTF-8 to
ISO-8859-1, and the resulting character isn’t defined in ISO-8859-1,
you get an error. Bleah.
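
A workaround that doesn't involve iconv at all is to decode the
snippet in a scripting language and print the code points. Here is a
small Python sketch; the describe function is a made-up helper, and
the encoding you pass in is still your own guess about the data.

```python
import unicodedata
from typing import List


def describe(raw: bytes, encoding: str = "utf-8") -> List[str]:
    """Decode raw bytes and list each character's code point and name.

    Undecodable bytes become U+FFFD instead of raising an error, so a
    wrong guess at the encoding shows up in the listing.
    """
    text = raw.decode(encoding, errors="replace")
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# A double-encoded left guillemet, as it might appear in a dump.
for line in describe(b"\xc3\x82\xc2\xab"):
    print(line)
```

Seeing LATIN CAPITAL LETTER A WITH CIRCUMFLEX in front of a quotation
mark is a strong hint that the text has been through one UTF-8 to
ISO-8859-1 round too many.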

The moral of the story is, use UTF-8 for everything. If the software
you’re using doesn’t give you UTF-8 as an option, ditch it and use
another package.