Topic Models, Gaussian Integrals, and Getting Scooped

Today’s post is a brief overview of one of my research projects, one that has unfortunately already been done. It is going to be a little mathematical, but I’ll try to provide enough intuition that you don’t have to be at one with the mathematics.

Topic Models

So, these days, every Tom, Dick and Harry can crawl the web or some other incomprehensibly large source of news, views and… I’d rather not continue. What could you do with this kind of data? Well, you could use it to classify or cluster web pages for use in search engines like DuckDuckGo, or for recommending books, etc. When you analyse the text of these documents, a single word on its own, like “files”, could have several different connotations, depending on whether you’re talking about computers, bureaucracy or breaking out of a prison. However, you might be able to work out which “topic” is being talked about from the other words you saw, like “screen”, “touchpad” and “Steve Jobs”. ...

March 25, 2012 · 9 min · 1774 words · Arun Tejasvi Chaganty