Uncovering Structure in High-Dimensions: Networks and Multi-task Learning Problems

Mladen Kolar

Thesis Committee: Eric P. Xing, Chair, Aarti Singh , Larry Wasserman, Francis Bach (INRIA)


Extracting knowledge and providing insights into complex mechanisms underlying noisy high-dimensional data sets is of utmost importance in many scientific domains. Statistical modeling has become ubiquitous in the analysis of high dimensional functional data in search of better understanding of cognition mechanisms, in the exploration of large-scale gene regulatory networks in hope of developing drugs for lethal diseases, and in prediction of volatility in stock market in hope of beating the market. Statistical analysis in these high-dimensional data sets is possible only if an estimation procedure exploits hidden structures underlying data.

This thesis develops flexible estimation procedures with provable theoretical guarantees for uncovering unknown hidden structures underlying data generating process. Of particular interest are procedures that can be used on high dimensional data sets where the number of samples n is much smaller than the ambient dimension p. Learning in high-dimensions is difficult due to the curse of dimensionality, however, the special problem structure makes inference possible. Due to its importance for scientific discovery, we put emphasis on consistent structure recovery throughout the thesis. Particular focus is given to two important problems, semi-parametric estimation of networks and feature selection in multi-task learning.