I’m trying to get the hang of the mod_rewrite module of my Apache web server. It lets me trap certain URLs, or patterns of URLs, and rewrite them to point somewhere different on the back end. This has two practical benefits:
- It makes migration from an old path or permalink scheme much easier to handle. You no longer have to maintain duplicate content at outdated addresses, or manually redirect people. Instead, the old address seamlessly transports the user to the new address.
- It frees you from the URL scheme automatically generated by your content management system, many of which are notoriously ugly or unhelpful.
For example, I’ve set up a site using the DeanSpace software, which is a slightly modified version of Drupal. Drupal uses a “node” vocabulary that I find far too geeky for ordinary users, and by default its URLs take the form
http://root.address/node/#### (that is, the base URL, a geek word, and an arbitrary number), sometimes with
?=variable (a bit of database or PHP query jargon) tacked on the end to transform the view of the underlying data.
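To make the shape of that default scheme concrete, here is a minimal Python sketch that recognizes it with a regular expression and pulls out the node number. The host root.address is the placeholder from above, the ?view=full query is a made-up example, and the function name node_id is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Matches a path of the form /node/<digits>, capturing the digits.
# Anchoring at both ends (^ and $) means /node/12/edit does NOT match.
NODE_PATH = re.compile(r"^/node/([0-9]+)$")

def node_id(url):
    """Return the node number from a default Drupal-style URL, or None."""
    parts = urlsplit(url)  # separates the path from any ?query on the end
    match = NODE_PATH.match(parts.path)
    return int(match.group(1)) if match else None

print(node_id("http://root.address/node/1234"))            # 1234
print(node_id("http://root.address/node/1234?view=full"))  # 1234
print(node_id("http://root.address/about"))                # None
```

The query string is split off before matching because it is not part of the path; mod_rewrite likewise treats the query string separately from the path it matches.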
This is a URL scheme only an engineer could love.
I’d much rather signal the nature of the content, the place in the hierarchy (or taxonomy, or ontology), or at least some key date related to the “node,” such as the date it was created or last revised.
With mod_rewrite, I’ll be able to invent any URL scheme I can imagine. More importantly, I can make database-generated pages appear to be static files in a stable directory hierarchy, so that Google and other search engines will feel comfortable indexing them.
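mod_rewrite itself is configured with Apache directives, but the idea behind a rewrite rule (a regular expression matched against the requested path, with parenthesized groups substituted into a target) can be sketched in Python. The public paths below (/articles/1234, /old-blog/...) are hypothetical examples, not DeanSpace or Drupal defaults, and the first-match-wins loop only approximates a list of rules that each stop processing once they match.

```python
import re

# Hypothetical rewrite table: (pattern, substitution) pairs, tried in order.
# \1 refers to the first parenthesized group, much like $1 in mod_rewrite.
REWRITE_RULES = [
    # Friendly public address -> internal Drupal-style path.
    (re.compile(r"^/articles/([0-9]+)/?$"), r"/node/\1"),
    # Migration: an old permalink scheme -> its new location.
    (re.compile(r"^/old-blog/(.+)$"), r"/blog/\1"),
]

def rewrite(path):
    """Return the internal path for a public one, stopping at the first match."""
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path  # no rule matched, so the path passes through unchanged

print(rewrite("/articles/1234"))        # /node/1234
print(rewrite("/articles/1234/"))       # /node/1234
print(rewrite("/old-blog/2003/hello"))  # /blog/2003/hello
print(rewrite("/about"))                # /about
```

The visitor only ever sees the friendly address; the translation to the node path happens on the server side, which is what lets database-generated pages look like a stable directory tree.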
Here’s my problem: savvy use of mod_rewrite involves learning something I’ve tried to avoid as long as possible: regular expressions. (Cue jwz’s famous remark about regular expressions.)
Can anyone point me to a good primer for moderately dense poseurs such as myself?