The Geometry of Constrained Structured Prediction: Applications to Inference and Learning of Natural Language Syntax
This thesis proposes new models and algorithms for structured output prediction, with an emphasis on natural language processing applications. We advance on two fronts: the inference problem, whose aim is to make a prediction given a model, and the learning problem, in which the model is trained from data.
For inference, we make a paradigm shift by considering rich models with global features and constraints, representable as constrained graphical models. We introduce a new approximate decoder that ignores the global effects caused by the cycles of the graph. This methodology is then applied to syntactic analysis of text, yielding a new framework, which we call “turbo parsing,” that achieves state-of-the-art results.
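As a minimal illustration of this kind of approximation (not the decoder developed in the thesis), loopy max-product message passing iterates tree-style updates on a cyclic model, ignoring the global effects of the cycles; all names below are illustrative:

```python
import numpy as np

def loopy_max_product(unary, pairwise, edges, iters=50):
    """Illustrative loopy max-product decoder on a small pairwise model.

    unary:    {node: 1-D array of log-potentials}
    pairwise: {edge: 2-D log-potential matrix}
    edges:    list of undirected (i, j) pairs (the graph may have cycles)
    """
    # One message per directed edge, initialized to zero (log domain).
    msgs = {}
    for (i, j) in edges:
        msgs[(i, j)] = np.zeros(len(unary[j]))
        msgs[(j, i)] = np.zeros(len(unary[i]))
    for _ in range(iters):
        for (i, j) in edges:
            for (s, t) in ((i, j), (j, i)):
                # Sum incoming messages to s from all neighbors except t,
                # exactly as on a tree -- the cycle is simply ignored.
                incoming = unary[s].copy()
                for (a, b) in msgs:
                    if b == s and a != t:
                        incoming += msgs[(a, b)]
                pot = pairwise[(s, t)] if (s, t) in pairwise else pairwise[(t, s)].T
                m = np.max(incoming[:, None] + pot, axis=0)
                msgs[(s, t)] = m - m.max()  # normalize for numerical stability
    # Decode each node from its belief (unary plus all incoming messages).
    decode = {}
    for v in unary:
        belief = unary[v].copy()
        for (a, b) in msgs:
            if b == v:
                belief += msgs[(a, b)]
        decode[v] = int(np.argmax(belief))
    return decode
```

On a cyclic graph these fixed-point updates are only approximate, but in practice they often recover high-quality assignments at a fraction of the cost of exact inference.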
For learning, we consider a family of loss functions encompassing conditional random fields, support vector machines, and the structured perceptron, for which we provide new online algorithms that dispense with learning-rate hyperparameters. We then turn to the regularizer, which we use to promote structured sparsity and to learn structured predictors with multiple kernels. We introduce online proximal-gradient algorithms that explore large feature spaces efficiently with minimal memory consumption. The result is a new framework for feature template selection that yields compact and accurate models.
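The structured-sparsity mechanism can be sketched with a single online proximal-gradient update using the group-lasso proximal operator, which zeroes out entire feature groups (e.g., whole feature templates) at once; this is an illustrative sketch under simple assumptions, not the thesis algorithms, and all names are hypothetical:

```python
import numpy as np

def prox_group_l1(w, groups, strength):
    """Proximal operator of the group-lasso regularizer.

    Shrinks each group of weights toward zero; any group whose
    Euclidean norm falls below `strength` is zeroed out entirely,
    discarding that feature group (structured sparsity).
    """
    w = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - strength / norm) if norm > 0 else 0.0
        w[g] = scale * w[g]
    return w

def online_proximal_step(w, grad, groups, eta, strength):
    """One online proximal-gradient update: a stochastic gradient step
    on the loss, followed by the proximal step on the regularizer."""
    return prox_group_l1(w - eta * grad, groups, eta * strength)
```

Because the proximal step only touches the groups it shrinks, updates remain cheap and memory-light even when the ambient feature space is very large, which is what makes online exploration of many feature templates practical.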