Email and Browser URL Extraction and Search via a Personal

I think Jeremy Zawodny’s onto something big here. I’ll have to add a link to my “why not tag everything” post too, because I think it’s part of the same picture (Email and Browser URL Extraction and Search via a Personal).

A few minutes ago, I needed to send a note to Russell about Yahoo Desktop Search. Specifically, I had to find a URL for an internal site that he wanted to see. But I couldn’t remember what the URL was or who sent it to me. All I knew was that it was in my e-mail inbox. Somewhere.

So I ran a quick grep (command-line search) for “http:” and got a big list of URLs and URL-like things from my inbox. I was able to further refine the search using the word “desktop” and found the URL in no time.
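For anyone who would rather script that than grep by hand, here is a minimal sketch of the same two-step search in Python. It assumes the message bodies are already loaded as plain strings; the names `extract_urls` and `search_urls` and the URL pattern are illustrative choices of mine, not anything from the original post, and the pattern is deliberately rough.

```python
import re

# Rough, assumed pattern: "http://" or "https://" followed by characters
# that are unlikely to be part of the surrounding prose.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def extract_urls(text):
    """Return every URL-like string in text, in order of appearance."""
    # Trailing sentence punctuation is almost never part of the URL itself.
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def search_urls(messages, keyword):
    """Return URLs found on lines that mention keyword (case-insensitive).

    This mirrors grepping for "http:" and then narrowing by a second word.
    """
    keyword = keyword.lower()
    hits = []
    for body in messages:
        for line in body.splitlines():
            if keyword in line.lower():
                hits.extend(extract_urls(line))
    return hits


if __name__ == "__main__":
    inbox = [
        "Here it is: http://example.com/desktop-search (internal only).",
        "Lunch menu: http://example.org/menu",
    ]
    print(search_urls(inbox, "desktop"))
```

Like grep, this filters line by line, so a URL is only found when the keyword sits on the same line or inside the URL itself.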

A moment later, a realization struck me:

I do this a lot!

In a sense, URLs are just another type of e-mail attachment. Someone can either send you the content directly or send you a URL to the content.

What I really need is a tool that acts like a personal that’s automatically fed from the combination of URLs embedded in e-mail messages as well as my browser history. It could keep a database of those URLs, count the frequency with which I visit them as well as how often they appear in e-mail that I send or receive. And if it provided the ability to tag and annotate the URLs, all the better.
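To make the database part concrete, here is a rough sketch of what such a local store might look like using Python's built-in `sqlite3` module. The schema, the table and column names, and the functions `open_store`, `record`, `annotate`, `tag`, and `find_by_tag` are all assumptions for illustration; nothing here comes from an existing tool.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical URL store."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS urls (
        url TEXT PRIMARY KEY,
        visits INTEGER NOT NULL DEFAULT 0,         -- sightings in browser history
        mail_mentions INTEGER NOT NULL DEFAULT 0,  -- sightings in sent/received mail
        note TEXT NOT NULL DEFAULT '')""")
    db.execute("""CREATE TABLE IF NOT EXISTS tags (
        url TEXT NOT NULL REFERENCES urls(url),
        tag TEXT NOT NULL,
        PRIMARY KEY (url, tag))""")
    return db


def record(db, url, source):
    """Count one sighting of url; source must be 'browser' or 'mail'."""
    # The column name comes from this fixed mapping, never from user input.
    column = {"browser": "visits", "mail": "mail_mentions"}[source]
    db.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    db.execute(f"UPDATE urls SET {column} = {column} + 1 WHERE url = ?", (url,))


def annotate(db, url, note):
    """Attach a free-text note to url, replacing any earlier note."""
    db.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    db.execute("UPDATE urls SET note = ? WHERE url = ?", (note, url))


def tag(db, url, *tags):
    """Attach one or more tags to url; duplicate tags are ignored."""
    db.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    db.executemany("INSERT OR IGNORE INTO tags (url, tag) VALUES (?, ?)",
                   [(url, t) for t in tags])


def find_by_tag(db, tag_name):
    """Return (url, visits, mail_mentions) rows, most frequently seen first."""
    rows = db.execute(
        """SELECT u.url, u.visits, u.mail_mentions
           FROM urls u JOIN tags t ON t.url = u.url
           WHERE t.tag = ?
           ORDER BY u.visits + u.mail_mentions DESC, u.url""",
        (tag_name,))
    return rows.fetchall()
```

Feeding it would mean calling `record(db, url, "mail")` for each URL pulled out of a message and `record(db, url, "browser")` for each history entry; how the browser history gets read is left open here.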

In fact, if it was like a private “satellite” version of that had the ability to check with the larger public that’d be even better. The idea being that for public URLs which end up in my local (private) database, I could still benefit from the collective tagging and annotation efforts of those in the outside world.

I can imagine a second generation of this system that goes a step further: fetching the web content that each of the URLs points to, storing a cached copy locally, and indexing it just like a traditional web search engine might. Bonus points for integration with something like the Slogger extension for Firefox, so that it doesn’t have to store duplicate data.
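A toy version of that second generation might look like the sketch below: fetch each page once, keep the text in a local cache, and build a simple inverted index over it. The `PageIndex` class and its method names are made up for illustration, the cache lives only in memory, and the tokenizer is naive (it indexes raw HTML, tag names and all), so treat it as an outline rather than a real search engine. No Slogger integration is attempted.

```python
import re
import urllib.request
from collections import defaultdict


def fetch(url):
    """Download a page and return its text (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PageIndex:
    """Hypothetical in-memory cache plus inverted index of fetched pages."""

    def __init__(self, fetcher=fetch):
        self.fetcher = fetcher             # swappable, e.g. for offline testing
        self.cache = {}                    # url -> cached page text
        self.postings = defaultdict(set)   # word -> set of urls containing it

    def add(self, url):
        """Fetch and index url unless a cached copy already exists."""
        if url in self.cache:
            return
        text = self.fetcher(url)
        self.cache[url] = text
        for word in set(tokenize(text)):
            self.postings[word].add(url)

    def search(self, query):
        """Return the sorted URLs whose cached text contains every query word."""
        words = tokenize(query)
        if not words:
            return []
        matches = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(matches)
```

Passing the fetcher in as a parameter keeps the indexing logic testable without a network connection, and would be the natural seam for reusing pages that some other tool has already saved.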

If I had a copy of the source code for handy, I could probably get the first cut of this going in a day’s time. That might be a day well spent.

In my mind I’m trying to tie all these things together into something I currently call by its codename, Mizzen. More about that when it gels or I get the slideshow put together or a venture started or a job implementing it for someone.