| Location | Title | Presenters |
| November 2nd: 9:00am - 12:30pm | ||
| Room 202 | Tutorial 1: Resources and Methods for the Acquisition of Open-Domain Concepts and Conceptual Hierarchies from Text | Marius Pasca (Google Inc.) |
| Room 203 | Tutorial 2: Introduction to Computational Advertising | Andrei Broder, Vanja Josifovski, Evgeniy Gabrilovich (Yahoo! Research) |
| November 2nd: 2:00pm - 5:30pm | ||
| Room 202 | Tutorial 3: Parallel Algorithms for Mining Large-scale Datasets | Edward Y. Chang, Kaihua Zhu and Hongjie Bai (Google Research) |
| Room 203 | Tutorial 4: Statistical Models for Web Search Clicks Log Analysis | Fan Guo (Carnegie Mellon University) and Chao Liu (Microsoft Research) |
Tutorial 1
Resources and Methods for the Acquisition of Open-Domain Concepts and Conceptual Hierarchies from Text
Marius Pasca (Google Inc.)
Abstract:
Despite differences in the types of targeted information, as well as
underlying algorithms and tools, a common theme shared across recent
approaches to information extraction is an aggressive push towards
large-scale extraction. Documents spanning various genres are readily
available on the Web, providing significant amounts of textual content
towards the acquisition of instances, concepts and conceptual
hierarchies, as a step towards the far-reaching goal of automatically
constructing knowledge bases from unstructured text. This tutorial
provides an overview of extraction methods developed in the area of
Web-based open-domain information extraction, with the purpose of
acquiring sets of instances within unlabeled or labeled open-domain
concepts. The concepts are organized either as a flat set or
hierarchically. The extraction methods operate over unstructured or
semi-structured text available within collections of Web documents, or
over relatively more intriguing streams of anonymized search queries.
They take advantage of weak supervision provided in the form of seed
examples or small amounts of annotated data, or draw upon knowledge
already encoded within resources created strictly by experts or
collaboratively by users. The more ambitious methods, aiming at
acquiring millions of instances from text, need to be designed to scale
to Web collections – a restriction with significant consequences on
overall complexity and choice of underlying tools – in order to
ultimately aid information retrieval in general and Web search in
particular, by producing open-domain concepts, along with facts or
relations among instances or among concepts.
Tutorial 2
Introduction to Computational Advertising
Andrei Broder, Vanja Josifovski, Evgeniy Gabrilovich (Yahoo! Research)
Abstract:
Online advertising affects virtually every Web user, and in recent
years has grown into a $20 billion industry. As with the Web
corpus, the structure of online ads is substantially different from
any other previously studied text corpus. The queries used for
selecting online ads can also differ substantially from the commonly
explored short textual queries, as for example when selecting
advertisements for a given web page or specific context of a user.
These differences require reexamination of many conclusions of
traditional IR, such as document analysis, query expansion, scoring and
length normalization, and performance evaluation. In this tutorial we
will give an overview of the Ad Retrieval field of Computational
Advertising. Computational advertising is a new scientific discipline
that studies the process of advertising on the Internet and combines
methods from IR, machine learning, statistics, optimization and
economics to select the optimal ads for a given user in a given context
on the Web. We will demonstrate how to employ a relevance feedback
assumption and use Web search results retrieved by the query. This step
allows one to use the Web as a repository of relevant query-specific
knowledge. We will also describe techniques that go beyond the
conventional bag of words indexing, and construct additional features
using a large external taxonomy and a lexicon of named entities
obtained by analyzing the entire Web as a corpus.
Tutorial 3
Parallel Algorithms for Mining Large-scale Datasets
Edward Y. Chang, Kaihua Zhu and Hongjie Bai (Google Research)
Abstract:
The explosive growth of data such as text, photos, video, and biological
data requires scalable computational solutions. For instance, YouTube
receives more than 10 hours of video per minute. Photo sites such as Flickr
and PicasaWeb receive millions of uploads per week. And the coming of
personal genome data can be exceedingly demanding in storage and
computation. To organize, index, analyze, and retrieve these
large-scale data, a system must employ scalable algorithms. Therefore,
at the forefront, the research community ought to consider solving the
real, large-scale problems, rather than dealing with small toy
datasets, on which success does not translate to real-world, large
datasets. In this tutorial, we will present key models and parallel
algorithms for dealing with data at the gigascale. We will also provide
participants with a huge annotated dataset to conduct research.
Tutorial 4
Statistical Models for Web Search Clicks Log Analysis
Fan Guo (Carnegie Mellon University) and Chao Liu (Microsoft Research)
Abstract:
Every day billions of queries and clicks submitted to search engines
are automatically logged and aggregated. Such click data have become
one of the most important and extensive feedback signals from the World
Wide Web audience. They are valuable resources for both information
retrieval researchers, to better understand human interaction with
retrieval results and calibrate their hypotheses or models, and web
search practitioners, to measure, monitor and learn to improve search
engine performance. However, the interpretation of user clicks is a
non-trivial task because many elements come into play in the decision
process. For example, previous eye tracking studies indicated that
clicks are generally biased as a form of absolute relevance judgment,
and that the clicking decision on a web document depends on both the position
(rank) and the context (other documents) of the presentation.
Click models usually incorporate a statistical depiction of user interaction
with web search results in a query session, by specifying probabilities
of examination and clicks at different positions and how they depend on
each other. They provide principled, scalable solutions to inferring
user-perceived relevance of web documents, and modeling outputs could
be further leveraged in various search-related applications including
search engine quality evaluation and sponsored search auctions. In the
past year, quite a few click models have been presented at leading data
mining, web search, and information retrieval conferences such as
KDD, WWW, SIGIR and WSDM. They are very well appreciated by audiences
with both academic and industrial backgrounds and have stimulated much
in-depth discussion and investigation. The growing popularity and
impact of this topic are reflected in the fact that both the WWW’09 and
the SIGIR’09 conference programs have an individual session devoted to
click models. We believe that it is timely and in high demand to have a
well-organized tutorial on this emerging theme accessible to
researchers and developers from the database, information retrieval,
and knowledge management communities.
In this tutorial, we will present a comprehensive overview of these most recent
developments, examine and compare state-of-the-art models, explore
several application scenarios, and lay out challenges as well as future
directions of this area.