Tuesday, July 01, 2008

Researching in a Search 2.0 world

A lot of my time is spent researching. A LOT of my time is spent researching. I research things around record linkage. I also research clustering, classification, natural language processing, and machine learning in general. Quite often, I have to get up to speed. I need to understand what Fellegi and Sunter did in the 1960s before I can understand what Winkler added to it in the 90s and what the general entity resolution research is all about now. Or, perhaps I just want to be a better programmer. Perhaps move from an O(N) to an O(log N) on the Programmer Competency Matrix.

For Search 2.0, much of the hullabaloo has been about Natural Language Processing (NLP). Companies such as Powerset have touted their products as being able to understand a human query. For instance, the Powerset engineers have given demos where they ask their engine "Which politicians died of disease?" and it gives back a list. This approach is great if I'm after general information, or if I'm helping my kid with her homework. However, it doesn't give me the perspective I need for research. Why am I asking about politicians and disease anyway? Am I trying to get a statistical view of politicians who die of disease vs. the health of the rest of the populace? Am I trying to understand the effects of an ill politician on society? I might wish to see the sites that others also searched for, much like the Amazon feature. Or, maybe I want additional statistics about that country during the time period. In other words, I need more than just the answer to my question. I need a path that others have followed that I can follow as well. Eventually, I'll have to get off the path, but I want to stay on it as long as possible.

In addition, I want to quickly understand an author's position. I want to know, with my search results, whether this author is an expert or a novice in the field. I want to know where his or her funding comes from. I want to know, based on statistical analysis of their previous posts, if they are conservative or liberal. Have they published papers? If so, in what journals? Are they top journals? It is this context that will make search valuable. Whether or not I can ask a specific question is irrelevant to me. I'll figure out a way to ask the question; however, I want more information back in an easy-to-understand manner. I want the site's PageRank. I want a general view of how other sites have posted about the site in question (positive or negative). I want to see complaints or compliments if it is a potential employer. I want CONTEXT. It seems to me that people get on the internet a lot for research. You research a good book to buy or what digital camera to get or where to go on vacation. All of these things could be enhanced by adding more context, more data mining, and better presentation of the information.

That will be Search 2.0.


Anonymous said...

So you are saying you need a search engine which provides you the information, for example on a camera or something, prioritizing the results...
Yeah, work is going on in that. I think it will be the next big product to be released by any company...

Tanton said...

Yeah, it seems like Microsoft's Live Search is close to having the product functionality I want. It is definitely ahead of the competition there. Now, I want the same analysis done on everything I search for.