Automatic content inventories

Right now I’m working on two paying projects (amidst my more speculative and creative and fun stuffs): a book about the upcoming version of FrontPage, which I’m coauthoring, and an information architecture job that is part of a requirements-gathering process for a corporate portal for a Fortune 500 client. OK, I’m also supposedly writing a white paper for another client, but that project has been postponed indefinitely, thank gopod.
Anyway, one of the steps in developing this information architecture has been compiling a content inventory of the existing site. This is not a build from scratch. We’ll be migrating the current site to the new portal, so it’s essential to know what’s already out there. I’m classifying the content types, noting the current (soon-to-be-obsolete) navigational hierarchy, and determining what portlets and templates will be needed to render the content in the new structure. I also had to cut and paste a whole bunch of URLs into a spreadsheet and then turn off the automatic link formatting.
Oh, and I added numbering. You know, like 1.2.8.1.1, etc.
I wish most of these tasks could have been automated. I need this work product for some of the other parts of my job, and it’s always nice to show the client that you understand exactly what they’ve got right now, but I kept feeling like most of what I was doing was massaging text, and in a fairly brain-dead way.
What I’d like to have is some kind of tool that could crawl an existing site and capture for me the names of the navigational links and the related URLs. Oh, and generate the numbering for me with some gentle hints. I don’t mind doing the rest, which is analytical and requires understanding the nature of the content. That’s what I’m here for. But the other parts sound like a good job for a computer, not an overgrown monkeybrain like myself.
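For what it’s worth, the brain-dead part might look something like the rough Python sketch below. It is only a sketch under some big assumptions: that the navigation is marked up as nested lists (ul or ol) with the links inside, that list nesting depth is a good enough “gentle hint” for the numbering, and that the page HTML is already in hand (fetching pages and following links across the site is left out). The names NavLinkParser, number_outline, inventory, and SAMPLE_NAV are made up for illustration, and www.example.com is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NavLinkParser(HTMLParser):
    """Collect (depth, link text, absolute URL) from nested list markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0       # current <ul>/<ol> nesting level
        self.links = []      # (depth, text, url) tuples in document order
        self._href = None    # URL of the <a> currently being read, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self.depth = max(0, self.depth - 1)
        elif tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link name.
            text = " ".join("".join(self._text).split())
            # Links outside any list are treated as top level.
            self.links.append((max(1, self.depth), text, self._href))
            self._href = None


def number_outline(items):
    """Turn (depth, text, url) tuples into rows numbered like 1.2.1."""
    counters = []
    rows = []
    for depth, text, url in items:
        # Grow the counter list if we went deeper (a skipped level shows as 0),
        # then forget any counters deeper than the current item.
        while len(counters) < depth:
            counters.append(0)
        del counters[depth:]
        counters[depth - 1] += 1
        rows.append((".".join(str(n) for n in counters), text, url))
    return rows


def inventory(html, base_url):
    """Parse one page of HTML and return numbered (number, name, url) rows."""
    parser = NavLinkParser(base_url)
    parser.feed(html)
    return number_outline(parser.links)


# Made-up navigation markup, standing in for a real page.
SAMPLE_NAV = """
<ul>
  <li><a href="/products/">Products</a>
    <ul>
      <li><a href="/products/widgets.html">Widgets</a></li>
      <li><a href="/products/gadgets.html">Gadgets</a></li>
    </ul>
  </li>
  <li><a href="/about/">About Us</a></li>
</ul>
"""

if __name__ == "__main__":
    # Tab-separated rows paste into a spreadsheet as plain text columns.
    for row in inventory(SAMPLE_NAV, "http://www.example.com/"):
        print("\t".join(row))
```

On the sample markup this gives Products as 1, Widgets and Gadgets as 1.1 and 1.2, and About Us as 2, each with its full URL. A real site will rarely be this tidy, so the depth guess would need correcting by hand, which is still less work than typing the numbers.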