Discovery of Sequence Patterns
9 November 2009


Time : 10:30 am - 11:30 am
Venue : Room PQ703, 7/Floor, P core, Anita Chan Lai Ling Building, The Hong Kong Polytechnic University
Speaker : Prof. Andrew K.C. Wong, University of Waterloo, Waterloo, Ontario, Canada

Sequence data is an important type of data that takes many forms: event sequences, biological sequences, web click streams, customer purchase histories, etc. Today, a vast amount of such data has been acquired from genomics, proteomics, business and industry. Knowledge discovery from these data has important applications and great value. However, much more efficient and effective methods are badly needed, since the number of patterns mined from real-world data in sequence mining today is overwhelmingly large and contains considerable redundancy. This presentation reports a new method that discovers sequence patterns from single and/or multiple sequences, ensuring that every pattern discovered is closed and statistically significant, yet not statistically induced by the lower-order statistically significant patterns embedded in it. Hence, redundant patterns are pruned, greatly reducing the number of patterns and allowing more succinct and compact patterns to emerge and reveal their local relations. Our method adapts the generalized suffix tree and uses new statistical criteria to identify statistically significant patterns. Its time and space complexity are linear in the number of characters in the sequence. To validate its capability, synthetic data are used first. Then an English text taken from a novel, with all punctuation and spaces between words removed, is used to show that the discovered patterns are functionally meaningful in the linguistic and semantic sense. When the most statistically significant patterns discovered in an arbitrary subsequence of the text are extracted, almost all of them are found to conform to English words or short phrases. No prior knowledge is required at any point in the process. For genomic data from Saccharomyces cerevisiae (yeast), the discovered patterns that correspond to the consensus binding site sequences of the regulatory families are ranked at the top.
With this new method, succinct and compact sequence patterns are discovered and their local relations/structures in the sequence are revealed.
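
The idea of scoring a pattern by how far its observed count departs from what chance would predict can be illustrated with a small sketch. This is not the speaker's method: it counts substrings with a plain dictionary rather than a generalized suffix tree, so it is not linear in time or space, it assumes an independent-character null model and a z-score threshold chosen only for illustration, and it does not prune patterns induced by their sub-patterns. The function name `significant_patterns` and its parameters are hypothetical.

```python
from collections import Counter
from math import sqrt


def significant_patterns(seq, max_len=4, threshold=3.0):
    """Return (pattern, count, z) for substrings that occur far more often
    than an independent-character model predicts, highest z first."""
    n = len(seq)
    char_freq = Counter(seq)
    results = []
    for k in range(2, max_len + 1):
        positions = n - k + 1  # number of length-k windows in the sequence
        if positions <= 0:
            break
        counts = Counter(seq[i:i + k] for i in range(positions))
        for pattern, observed in counts.items():
            # Probability of the pattern if characters were independent
            # (an assumed null model, not the criterion from the talk).
            p = 1.0
            for ch in pattern:
                p *= char_freq[ch] / n
            if p >= 1.0:
                continue  # single-symbol sequence: nothing to test
            expected = positions * p
            # Standardized residual under a binomial approximation.
            z = (observed - expected) / sqrt(expected * (1.0 - p))
            if z >= threshold:
                results.append((pattern, observed, z))
    return sorted(results, key=lambda r: -r[2])


if __name__ == "__main__":
    text = "thecatsatonthematandthecatatethefish"  # spaces removed
    for pattern, count, z in significant_patterns(text, max_len=3, threshold=2.0):
        print(pattern, count, round(z, 2))
```

On text with the spaces removed, substrings that recur as parts of common words would be expected to score highly under this kind of criterion, which is the intuition behind the English-text validation described above.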

The Speaker:
Dr. Andrew Wong holds a Ph.D. from Carnegie Mellon University, and a B.Sc. (Hons) and M.Sc. from the Hong Kong University. He was elected an IEEE Fellow for his contributions to machine intelligence, computer vision, and intelligent robotics. He currently holds the title of Distinguished Professor Emeritus (Systems Design Engineering) at the University of Waterloo. He founded the renowned Pattern Analysis and Machine Intelligence Laboratory (PAMI) in 1980, served as its Director until 2001, and is now honored as PAMI’s Founding Director. He was a Distinguished Chair Professor at the Hong Kong Polytechnic University from 2000 to 2003, and has acted as a consultant to government agencies and industry in both North America and Hong Kong since 1969. Dr. Wong has earned international recognition as a researcher, educator, author and entrepreneur in the multi-faceted fields of machine intelligence, knowledge systems, bioinformatics, computer vision, and vision-based intelligent robotics. He was the leading researcher who pioneered the incorporation of the notions of information, entropy and statistical patterns into the study of biological sequences, text, image data, relational databases, sequence data, multiple time series and structural patterns.

Over the years, he has reached significant milestones in each of these fields, gaining international recognition in the academic world and making a strong impact on industry. To promote research collaboration and high-caliber training, he established the PAMI Research Group. He has made outstanding contributions to “knowledge transfer” through his highly cited published work and to “technology transfer” through consulting and research contracts and the founding of four successful high-tech companies: Virtek Vision International, Pattern Discovery Technologies, DossierView and D2K Endeavors.

* ALL ARE WELCOME
Contact : Prof. George Baciu
Email : csgeorge@comp.polyu.edu.hk
Tel : 2766-7295 or 2766-7272
