Sunday, June 26, 2011

Yahoo Summer School on IR - 2011

Last week, Yahoo Research organized a summer school on IR at IISC, Bangalore campus. This is the first time I visited IISC, the campus is lush green all around and add to it the nice weather.

Most of the sessions on classical IR were fairly detailed(but fast paced). A lot of the discussion assumed upfront knowledge of basic probability theory and linear algebra, which is kind of ok if you are attending a session on any advanced computer science material.

I could easily comprehend the sessions on Introduction to IR, IR Models(Boolean Retrieval, Vector space model, BIRM & BM25, IR based on language model), IR system quality evaluation and also how to create test collections, index creation with a bit of index compression techniques. Along the way, I followed the first half(up to chapter - 12, Language models for IR) of IIR book which covers more or less the same material covered in the mentioned sessions.
Then the sessions on web retrieval did a good job of describing the challenges faced in the web retrieval. It covered a very high level shape of web graph, architecture of web IR and crawlers. This also gave details about how pagerank is typically calculated from the web graph.

Then there were sessions on multimedia retrieval and learning to rank which tried to cover too much ground and I could only get a very high level outline of the area.

Overall it was a good effort and provided a very good introduction to the field of IR. I wish they had covered more on dynamic indexing as in the real world documents get added, deleted and modified every moment.

And, Yahoo research definitely deserves much appreciation for organizing this summer school.

References:
Most of the slides from sessions are available with Yahoo labs.
Modern Information Retrieval is written by one of the presenters in the summer school. It was cited multiple times and sessions on web retrieval are directly from the book chapters.
Introduction to Information Retrieval is another well known book by the author of famous Foundations of statistical natural language processing.