Keynote Speakers

DB-IR Integration and Its Application to a Massively-Parallel Search Engine

Kyu-Young Whang (KAIST, Korea)

Nowadays, as there is an increasing need to integrate the DBMS (for structured data) with Information Retrieval (IR) features (for unstructured data), DB-IR integration is becoming one of the major challenges in the database area. Extensible architectures provided by commercial object-relational DBMS (ORDBMS) vendors can be used for DB-IR integration. Here, extensions are implemented using a high-level (typically, SQL-level) interface. We call this architecture loose-coupling. The advantage of loose-coupling is ease of implementation. However, loose-coupling is not well suited to implementing new data types and operations in large databases when high performance is required. In this talk, we present a new DBMS architecture applicable to DB-IR integration, which we call tight-coupling. In tight-coupling, new data types and operations are integrated into the core of the DBMS engine in the extensible type layer. Thus, they are incorporated as “first-class citizens” within the DBMS architecture and are supported consistently and with high performance. This tight-coupling architecture is being used to incorporate IR features and spatial database features into the Odysseus ORDBMS, which has been under development at KAIST/AITrc for over 19 years. We introduce Odysseus and explain its tightly coupled IR features (U.S. patented in 2002). Then, we demonstrate the performance advantages of tight-coupling by presenting benchmark results. Using Odysseus, we have built a web search engine that can manage 100 million web pages per node in a non-parallel configuration. This engine has been successfully tested in many commercial environments. This work won the Best Demonstration Award at the IEEE ICDE conference held in Tokyo, Japan, in April 2005. Lastly, we present a design of a massively-parallel search engine using Odysseus. Recently, parallel search engines have been implemented based on scalable distributed file systems (e.g., GFS). Nevertheless, building a massively-parallel search engine on a DBMS can be an attractive alternative, since a DBMS offers a higher-level (i.e., SQL-level) interface than a distributed file system does while still providing scalability. The parallel search engine we designed is capable of indexing 30 billion web pages with performance comparable to or better than that of state-of-the-art search engines.

Biography: Kyu-Young Whang is a KAIST Distinguished Professor and Professor of Computer Science at KAIST. Previously, he was with the IBM T.J. Watson Research Center from 1983 to 1990. Since joining KAIST in 1990, he has been leading the Odysseus DBMS/Search Engine project, which features tight-coupling of the DBMS with information retrieval (IR) and spatial functions. An earlier version of this technology played a vital role in starting up NaverCom Co. (now NHN Co.), the number one portal in Korea, in 1997-2000. Dr. Whang is one of the pioneers of probabilistic counting, which is now widely used in approximate query answering, sampling, and data streaming. One of the algorithms he co-developed in 1981 at the IBM San Jose Research Lab (now IBM Almaden) has been made part of DB2. Dr. Whang developed the first main-memory relational query optimization model in 1985; it was reported in ACM TODS in 1990 in the context of Office-by-Example (OBE). This model influenced subsequent optimization models of commercial main-memory DBMSs. His research has covered a wide range of database issues, including physical database design, query optimization, DBMS engine technologies, and, more recently, IR, spatial databases, data mining, and XML. Dr. Whang was the Coordinating Editor-in-Chief of the prestigious VLDB Journal, having served the journal for 19 years from its inception as a founding editorial board member. He is a Trustee Emeritus of the VLDB Endowment and has served the international academic community as the General Chair of VLDB2006, DASFAA2004, and PAKDD2003, as a PC Co-Chair of VLDB2000, CoopIS1998, and ICDE2006, and as an editorial board member of journals such as IEEE TKDE, The WWW Journal, and IEEE Data Engineering Bulletin. He served as the Chair of the Steering Committee of the DASFAA International Conference and is a co-founder of the Korea-Japan Database Workshop (KJDB), held annually and alternating between Korea and Japan. He is a member of the ACM SIGMOD Dissertation Award Committee and has served on many 10-year Best or Influential Paper Award committees of VLDB and IEEE ICDE. He served as an IEEE Distinguished Visitor from 1989 to 1990 and was invited to ACM SIGMOD's Distinguished Profiles in Databases in 2007. He earned his Ph.D. from Stanford University in 1984. Dr. Whang is an IEEE Fellow and a member of the ACM and IFIP WG 2.6.


Confucius and "its" Intelligent Disciples

Edward Chang (Google Research China)

Confucius was a great teacher in ancient China. His theories and principles were effectively spread throughout China by his disciples. Confucius is also the product code name of Google’s Knowledge Search product, which my team built at Google's Beijing lab. In this talk, I present Knowledge Search’s key disciples: data management subroutines that generate labels for questions, match existing answers to a question, evaluate the quality of answers, rank users based on their contributions, distill high-quality answers for search engines to index, and so on. This talk presents the scalable algorithms we have developed to make these disciples effective on huge datasets. Our efforts to make these algorithms run even faster on thousands of machines, along with some open research problems, will also be presented.

Biography: Edward Chang has headed Google Research in China since March 2006. He joined the Department of Electrical & Computer Engineering at the University of California, Santa Barbara, in 1999 after receiving his PhD from Stanford University. Ed received tenure in 2003 and was promoted to full professor of Electrical Engineering in 2006. His recent research activities are in distributed data mining and its applications to rich-media data management and social-network collaborative filtering. His research group (which consists of members from Google, UC, MIT, Tsinghua, PKU, and Zheda) recently parallelized SVMs (NIPS 07), PLSA (KDD 08), Association Mining (ACM RS 08), Spectral Clustering (ECML 08), and LDA (WWW 09) (see MMDS/CIVR keynote slides for details) to run on thousands of machines for mining large-scale datasets. Ed has served on ACM (SIGMOD, KDD, MM, CIKM), VLDB, IEEE, WWW, and SIAM conference program committees, and has co-chaired several conferences, including MMM, ACM MM, ICDE, and WWW. Ed is a recipient of the IBM Faculty Partnership Award and the NSF Career Award.


Advanced Metasearch Engines

Clement Yu (University of Illinois at Chicago)

A metasearch engine is a system that is connected to multiple search engines. In response to a user query, it invokes the search engines suitable for the query, merges the information they return, and outputs the merged result. There are two types of metasearch engines: one for unstructured data (mostly text) and the other for structured data. Compared with a single text search engine, a text metasearch engine can achieve higher coverage of the Web and provide more timely information. A metasearch engine for structured data facilitates comparison shopping and comparison of services, and is convenient to use. In this talk, we discuss the problems involved in building such engines and their potential solutions. In addition, we sketch remaining challenges and unsolved problems.
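To make the merging step more concrete, here is a minimal Python sketch of one way a text metasearch engine could combine ranked lists. It assumes each engine returns (URL, score) pairs and uses CombSUM, a well-known score-fusion heuristic swapped in here for illustration: each engine's scores are min-max normalized to [0, 1] and summed per URL. The functions `normalize` and `merge` and the sample data are hypothetical; engine selection is not covered, and duplicates are detected only by exact URL match.

```python
from collections import defaultdict


def normalize(results):
    """Min-max normalize one engine's (url, score) list into [0, 1]."""
    if not results:
        return {}
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every score is equal, give each document full weight.
    return {url: (score - lo) / span if span else 1.0 for url, score in results}


def merge(result_lists):
    """CombSUM fusion (illustrative assumption): sum normalized scores per URL."""
    totals = defaultdict(float)
    for results in result_lists:
        for url, score in normalize(results).items():
            totals[url] += score
    # Highest combined score first; ties broken by URL for a deterministic order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical results from two engines whose raw scores use different scales.
engine_a = [("u1", 12.0), ("u2", 8.0), ("u3", 4.0)]
engine_b = [("u2", 9.0), ("u4", 5.0), ("u1", 1.0)]
print(merge([engine_a, engine_b]))
# → [('u2', 1.5), ('u1', 1.0), ('u4', 0.5), ('u3', 0.0)]
```

Normalizing before summing keeps one engine's larger raw scores from dominating the merged ranking; real metasearch engines may instead weight engines by estimated usefulness for the query.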

Biography: Clement Yu is a professor in the Department of Computer Science at the University of Illinois at Chicago. His areas of research are information retrieval, database management, and applications to health care. He has served as chair of ACM SIGIR, program committee chair of the ACM SIGIR conference, general chair of the ACM SIGMOD conference, and as an advisory committee member of the National Science Foundation. He has published more than 200 papers in journals such as JACM, TODS, TOIS, TKDE, and TSE and in conferences such as SIGIR, CIKM, SIGMOD, VLDB, WWW, and ICDE. He has served as an associate editor or editorial board member of several journals, such as TKDE.
