Wednesday, January 26, 2005

ontologies, tagging and categorizing, why?

During my last two weeks in Mexico, implementing Autonomy mainly for information retrieval and categorization, I had the opportunity to attend a couple of interesting lectures about ontologies and taxonomies. These presentations, I must admit, made me reconsider the value of building complex hierarchical information structures. Is it really worth the effort?

And today I came across an interesting rough-n-ready "article". I must thank Hugh Pyle for pointing me to this rather interesting post re: ontologies and tags (Tags != folksonomies && Tags != Flat name spaces). The more I learn about developing complex taxonomies, the more I'm against it.

...the suckiness of ontology is .....the need to declare today what contains what as a prediction about the future. Let's say I have a bunch of books on art and creativity, and no other books on creativity. Books about creativity are, for the moment, a subset of art books, which are a subset of all books. Then I get a book about creativity in engineering. Ruh roh. I either break my ontology, or I have to separate the books on creativity, because when I did the earlier nesting, I didn't know there would be books on creativity in engineering. A system that requires you to predict the future up front is guaranteed to get worse over time.

And the reason ontology has been even a moderately good idea for the last few hundred years is that the physical fact of books forces you to predict the future. You have to put a book somewhere when you get it, and as you get more books, you can neither reshelve constantly, nor buy enough copies of any given book to file it on all dimensions you might want to search for it on later.

Ontology is a good way to organize objects, in other words, but it is a terrible way to organize ideas. ....

The move here is from graph theory (arrange everything in a tree graph, so that graph traversal becomes the organizing principle) to set theory (sets have members, and the overlap or non-overlap of those memberships becomes the organizing principle.) This is analogous to the change in how we handle digital data. The file system started out as a tree graph. Then we added symlinks (aliases, shortcuts), which said "You can organize things differently than you store them, and you can provide more than one mode of access."

"Not only does it not matter where something is stored, it doesn't matter whether it's stored. A URI that generates the results on the fly is as valid as one that points to a disk." And once something is no longer dependent on tree graph traversals to find it, you can dispense with hierarchical assumptions about categorizing it too.
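To make the tree-versus-set point concrete, here is a minimal sketch of my own (it is not from the quoted post). The book titles, the tags and the helper names build_tag_index and with_all_tags are all made up for illustration. Each book simply carries a set of tags, so the "creativity in engineering" book joins the collection without breaking or re-nesting anything.

```python
from collections import defaultdict

# Hypothetical titles and tags, invented for this example.
books = {
    "Sketching Ideas": {"art", "creativity"},
    "Colour and Form": {"art"},
    "Inventive Engineering": {"engineering", "creativity"},
}


def build_tag_index(items):
    """Map each tag to the set of titles that carry it."""
    index = defaultdict(set)
    for title, tags in items.items():
        for tag in tags:
            index[tag].add(title)
    return index


def with_all_tags(index, *tags):
    """Return the titles carrying every requested tag (set intersection)."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


tag_index = build_tag_index(books)

# "creativity" is no longer a shelf inside "art"; it is just another set.
creative = with_all_tags(tag_index, "creativity")
creative_art = with_all_tags(tag_index, "creativity", "art")
```

With this layout, "creativity" overlaps with both "art" and "engineering", and a question like "creative art books" becomes an intersection of two sets rather than a walk down one branch of a tree.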

Let's translate this to the management of my personal information. Hey, on a smaller scale, we suffer the same information overload that corporations do.

Well, as a "good" and disciplined engineer, I believe in good organisation. Until recently I organised all my incoming emails in folders (admin, customer accounts, personal, etc.). It was a bit time-consuming, but it worked; it was effective. Not anymore: as I got involved in more and more jobs and my workload increased, I found it increasingly hard to maintain a consistent and logical hierarchy of folders. The great breakthrough, the moment I decided to stop using folders in Outlook, was Blinkx, a smart tool that indexes all my emails (even the webmail ones) and also all the local documents on my desktop, attachments, etc., and points me to them instantly. When I need to check a quote that I sent by email to one of my customers, I don't need to look under the customer's Inbox folder or the "Quotes" folder. No, I fire up Blinkx, type the customer name and "quote", and in less than a second I have a direct link to the document that was attached to an email.
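I don't know how Blinkx works internally, so take the following only as a toy illustration of the general idea of retrieval without folders: a tiny inverted index over a few messages. The message ids, the texts and the function names tokenize, build_index and search are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical messages, invented for this example.
documents = {
    "mail-001": "Quote for Acme attached, valid for 30 days",
    "mail-002": "Acme admin: updated contact details",
    "mail-003": "Lunch on Friday?",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: map each word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


index = build_index(documents)

# Customer name plus "quote", with no folder involved anywhere.
hits = search(index, "acme quote")
```

Nothing here depends on where a message was filed. The query words alone narrow the result down to the one message that mentions both the customer and the quote.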

The bottom line is: why do we want to categorise when we can rely on powerful retrieval tools? Don't get me wrong, categorising may be beneficial for companies that want to maintain a current taxonomy or thesaurus. But hey, before you act, stop and think twice: what is the true objective behind categorising? Is it just to let people easily access and find information? Then you may be surprised to learn that categorising and structuring your own information universe can be an arduous task that takes up too much of your valuable time and resources for far too little in return. In most cases, building up an ontology (i.e. a set of inter-related taxonomies) doesn't pay off! Let me know what you think, folks...

