As many of you have no doubt noticed, the current set of postings to this blog comes from 5 years of my livejournal. I decided I finally wanted to run my own blog software (all the cool kids are doing it!) and now’s as good a time as any. WordPress 1.5 is hard to knock, and now that it has Pages support, it really has everything I’ve been needing in blog software. Since I’m running Ubuntu 5.04 (Hoary) now and can just dpkg -i the package from breezy, installing it is pretty straightforward (speaking of which, I should hop up to 1.5.1 now that it’s out).
I knew I didn’t want to start from scratch; I wanted to import all those (hopefully interesting) posts I had made with my livejournal account. A quick google for “livejournal export wordpress” led to a WordPress forum post describing the steps our friend Derek had tried and the problems he was having. I’m not sure what particular problem he was hitting (maybe he wasn’t making XMLFILE an absolute path, who knows), but his method of exporting your livejournal entries via ljArchive (a .NET 1.1 app, which is kind of cool) and then importing them via wp-admin/import-livejournal.php seemed like a decent approach.
After running the wp-admin/import-livejournal.php from a browser, I saw that I needed to edit it to point it at the LJ xml archive. No biggie, quick edit later, moved on to running it again with the new value.
First snag – running import-livejournal.php hit the 8MB default memory_limit value in my /etc/php4/apache2/php.ini – now, I know of at least a couple other things I’m going to be running in php-land that will need far more memory, so I jacked it up to 200MB as a swag and restarted apache2.
Second snag – the import ran but didn’t actually import anything. Turns out I had pointed it at the archive.lja that ljArchive created, which isn’t an XML file at all (bad assumption on my part). I went back to ljArchive and used File / Export / XML Writer to get an actual flavor.xml file created. Then came a quick edit (again) to import-livejournal.php to point it at the actual xml file this time.
Next snag – at least with 1.5, the wp-admin/import-livejournal.php page assumes that the livejournal export software does CDATA encoding on the “event” xml node. Unfortunately, at least for the ljArchive version 0.9.4 I was running, that wasn’t the case; it was good old URI escaping. Hence, the import actually ran just fine, but all my posts in WordPress still had the URI escaping in them, so all the links were busted and raw HTML markup was being displayed instead of the intended posts.
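For the curious, undoing that kind of escaping is not hard in principle. The following is only a minimal Python sketch, assuming that “URI escaping” here means percent-encoding (%3C for <, %20 for a space, and so on); the function name unescape_event and the sample text are made up for illustration and are not part of ljArchive or WordPress. If the export had used XML entities instead, html.unescape from the standard library would be the analogous call.

```python
from urllib.parse import unquote


def unescape_event(event_text):
    """Decode percent-encoded text (e.g. '%3C' -> '<') back to raw HTML.

    Hypothetical helper: assumes the exported "event" body was
    percent-encoded rather than wrapped in CDATA.
    """
    return unquote(event_text)


# Made-up sample of what an escaped post body might look like.
escaped = "%3Ca%20href%3D%22http%3A%2F%2Fexample.com%22%3Elink%3C%2Fa%3E"
decoded = unescape_event(escaped)
print(decoded)
```

Running something like this over each “event” node before the import would, under that assumption, give the importer the plain HTML it expects.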
Since the actual stored XML was fine, I figured the path of least resistance was to switch to another converter. Skipping over yet more boring parts, I ended up exporting to Movable Type’s format and using wp-admin/import-mt.php instead.
Next problem – since I made most of my livejournal entries without subjects, most of my WP posts now had a post_name of no-subject. This causes a problem in my Permalinks URLs since post_name is, in this case, not unique on a per-month basis, nor even a per-day basis. Since I’m using /%year%/%monthnum%/%postname%/ for my Permalinks format, I have to fix this up (OK, strictly speaking I didn’t have to, but I thought it would be nice) so that each of these old posts has its own nice pretty URL.
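To make the collision concrete, here is a small Python sketch of that permalink pattern. It is an illustration only: the helper names and the sample posts are invented, and it does not reflect how WordPress itself builds or resolves permalinks.

```python
from collections import Counter


def permalink(year, month, post_name):
    """Build a URL in the /%year%/%monthnum%/%postname%/ shape."""
    return f"/{year:04d}/{month:02d}/{post_name}/"


def colliding_permalinks(posts):
    """Return the URLs that more than one (year, month, post_name) maps to."""
    counts = Counter(permalink(y, m, name) for (y, m, name) in posts)
    return sorted(url for url, n in counts.items() if n > 1)


# Invented sample data: two subject-less posts in the same month.
sample_posts = [
    (2005, 5, "no-subject"),
    (2005, 5, "no-subject"),
    (2005, 6, "no-subject"),
    (2005, 5, "hello-world"),
]
print(colliding_permalinks(sample_posts))
```

Two posts from the same month that share a post_name end up at the same URL, which is why the no-subject posts needed distinct names.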
My first attempt at that was “sudo mysql -p” and “use wordpress;” and then a good old “update wp_posts set post_title='post-' || id, post_name='post-' || id where post_name='no-subject';” which ran fine and updated the 2142 rows I expected. Those of you who spend more time in MySQL than I do will notice that I used the standard || operator for SQL string concatenation (showing my Oracle/PostgreSQL background). In MySQL’s default mode, though, || is a logical “or” operator, so those columns got set to “1” instead of the intended strings – doh!
After a quick google to see that mysql uses concat(), I went with “update wp_posts set post_title=concat('post-', id), post_name=concat('post-', id) where post_name=1;” (notice that the where clause now matches the value left behind by the first run). That worked fine. Now the post_name and post_title values were unique (at least per-month).
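If it helps to see why the first update produced “1”, here is a rough Python model of the two behaviors. It is a simplification under stated assumptions: it only imitates MySQL’s default handling (strings coerced to their leading numeric prefix, then a logical OR), it ignores NULLs, and the function names are invented for this sketch rather than taken from any MySQL API.

```python
import re


def mysql_numeric_prefix(value):
    """Roughly imitate MySQL's string-to-number coercion:
    use the leading numeric prefix, or 0 if there is none."""
    if isinstance(value, (int, float)):
        return value
    match = re.match(r"\s*[-+]?\d+(\.\d+)?", value)
    return float(match.group(0)) if match else 0


def mysql_or(left, right):
    """Model of `left || right` when || means logical OR (NULLs ignored)."""
    return 1 if mysql_numeric_prefix(left) or mysql_numeric_prefix(right) else 0


def mysql_concat(*parts):
    """Model of concat(): join the string forms of all arguments."""
    return "".join(str(p) for p in parts)


post_id = 42  # made-up id for illustration
print(mysql_or("post-", post_id))      # what the first update stored
print(mysql_concat("post-", post_id))  # what was actually intended
```

In this model 'post-' has no numeric prefix and counts as 0, while any nonzero id counts as true, so the OR comes out as 1 for every row.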
Pulling up the blog in a browser, I noticed one thing that went badly with using MT as the intermediary format – there’s an “EXTENDED BODY” section written by the tool I had used, and it wasn’t supported by import-mt.php, so all my imported posts ended in these lines:
-----
EXTENDED BODY:
Another quick google to find the string replacement function for mysql, and “update wp_posts set post_content=replace(post_content, '-----\n\nEXTENDED BODY:', '');” also worked fine, removing the extraneous content from the posts.
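The same cleanup could just as easily be done on the export before importing it. A minimal Python sketch of the idea follows, assuming the trailer looks exactly like the string used in the SQL above; the constant and function names are made up for illustration.

```python
# Trailer text as used in the SQL replace() call above (an assumption
# about the exact bytes the export tool wrote).
MT_TRAILER = "-----\n\nEXTENDED BODY:"


def strip_extended_body(post_content):
    """Remove the leftover Movable Type trailer from a post body."""
    return post_content.replace(MT_TRAILER, "")


# Invented sample post body.
sample = "Hello world\n-----\n\nEXTENDED BODY:"
cleaned = strip_extended_body(sample)
print(repr(cleaned))
```

Like the SQL version, this removes every occurrence of the trailer and leaves posts without it untouched.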
And that’s the story so far. I’d imagine import-livejournal.php will get fixed soon-ish (although admittedly I haven’t hit the wordpress trac to file the bug), so most of this won’t have a lot of value for others, I hope :)