
Slides

Supervised learning    [ Office 2007 ]   [ Office 2003 ]   [ PDF ]

Semi-supervised learning    [ Office 2007 ]   [ Office 2003 ]   [ PDF ]


Course Description

A strong foundation in machine learning is becoming an essential skill for doing productive research in natural language processing. Of the 96 full papers at ACL 2008, 50 explicitly mention statistical modeling or a specific learning algorithm in their titles, and many of the others also use probabilistic models or solve problems related to machine learning in NLP.

This course is an introduction to two common paradigms in machine learning: supervised and semi-supervised learning. Supervised methods are the most widely used and best understood learning techniques. They model a prediction function from an input to an output and are used throughout natural language processing, in canonical problems such as tagging, parsing, speech recognition, and machine translation, as well as in newer problems such as web search and advertising. The first part of the tutorial describes generative modeling, focusing on the Naive Bayes model. We discuss how to model the input, the principle of maximum likelihood estimation, and the independence assumptions the Naive Bayes model makes.
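
To make these ideas concrete, here is a small sketch of a bag-of-words Naive Bayes classifier whose parameters are estimated by (smoothed) maximum likelihood, i.e. relative-frequency counts. It is my own illustration rather than material from the slides; the toy sentiment data and the add-one smoothing constant alpha are assumed for the example.

    from collections import Counter, defaultdict
    import math

    def train_naive_bayes(docs, alpha=1.0):
        """Estimate log P(label) and log P(word | label) by smoothed maximum likelihood."""
        label_counts = Counter()
        word_counts = defaultdict(Counter)   # label -> word -> count
        vocab = set()
        for words, label in docs:
            label_counts[label] += 1
            for w in words:
                word_counts[label][w] += 1
                vocab.add(w)
        n_docs = sum(label_counts.values())
        priors = {y: math.log(c / n_docs) for y, c in label_counts.items()}
        likelihoods = {}
        for y in label_counts:
            total = sum(word_counts[y].values())
            likelihoods[y] = {w: math.log((word_counts[y][w] + alpha) /
                                          (total + alpha * len(vocab)))
                              for w in vocab}
        return priors, likelihoods, vocab

    def predict(words, priors, likelihoods, vocab):
        """Decision rule: argmax_y log P(y) + sum_i log P(w_i | y).
        Summing over words is exactly the conditional-independence assumption."""
        scores = {y: priors[y] + sum(lik[w] for w in words if w in vocab)
                  for y, lik in likelihoods.items()}
        return max(scores, key=scores.get)

    # Toy sentiment data (illustrative only).
    train = [(["great", "fun", "film"], "pos"),
             (["dull", "boring", "plot"], "neg"),
             (["great", "plot"], "pos")]
    model = train_naive_bayes(train)
    print(predict(["fun", "plot"], *model))

The alpha term is add-one (Laplace) smoothing, which keeps a word unseen with some class from zeroing out that class; pure maximum likelihood estimation corresponds to alpha = 0.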

The next part of the tutorial focuses on discriminative learning. We will begin by describing the difference between generative and discriminative models. Then we will cover boosting and support vector machines in detail, focusing on loss functions and optimization techniques. The last part of the supervised section will give several examples of boosting and support vector machines applied to text processing problems.
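
The link between these learners and their loss functions can be illustrated with a short sketch, again my own rather than tutorial material: the hinge loss underlying support vector machines, the exponential loss minimized by boosting, and a linear classifier trained with stochastic subgradient steps on the regularized hinge loss. The learning rate, regularization constant, and toy data are assumed values.

    import math
    import random

    def hinge_loss(margin):
        """SVM surrogate loss: max(0, 1 - y * f(x))."""
        return max(0.0, 1.0 - margin)

    def exponential_loss(margin):
        """Boosting surrogate loss: exp(-y * f(x))."""
        return math.exp(-margin)

    def train_linear_svm(data, dim, epochs=100, lr=0.1, lam=0.01):
        """Minimize lam/2 * ||w||^2 + hinge loss by stochastic subgradient descent."""
        w = [0.0] * dim
        for _ in range(epochs):
            random.shuffle(data)
            for x, y in data:
                margin = y * sum(wi * xi for wi, xi in zip(w, x))
                for i in range(dim):
                    grad = lam * w[i]        # gradient of the regularizer
                    if margin < 1.0:         # subgradient of the hinge term
                        grad -= y * x[i]
                    w[i] -= lr * grad
        return w

    # Toy 2-D data with labels in {-1, +1} (illustrative only).
    data = [([2.0, 1.0], +1), ([1.5, 2.0], +1),
            ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
    w = train_linear_svm(data, dim=2)
    print(w, [hinge_loss(y * sum(wi * xi for wi, xi in zip(w, x))) for x, y in data])

Both losses are convex upper bounds on the 0-1 training error; they differ in how heavily they penalize points with large negative margins, which is one reason boosting and support vector machines behave differently on noisy data.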

Day two of the tutorial will reprise my ACL tutorial with Jerry Zhu on semi-supervised learning. For more information on this tutorial, please visit its website.


Other Materials

Dan Klein's tutorial on classification at NAACL 2007.

Dan Klein and Ben Taskar's tutorial on max-margin methods at ACL 2005.

Ryan McDonald's course on generalized linear classifiers in NLP.

Chris Bishop's book: Pattern Recognition and Machine Learning.