As large-scale techniques for measuring gene expression have been developed, automatically inferring gene interaction networks from expression data has emerged as a popular way to advance our understanding of cellular systems.

Accurate prediction of gene interactions, especially in multicellular organisms such as Drosophila or humans, requires temporal and spatial analysis of gene expression, which is not easily obtainable from microarray data. New image-based techniques using in-situ hybridization (ISH) have recently been developed to allow large-scale spatial-temporal profiling of whole-body mRNA expression. However, analyzing such data to discover new gene interactions remains an open challenge.

This project studies the question of predicting gene interaction networks from ISH data. We address this problem by subdividing it into four challenges:

  1. Extract informative features from the images.
  2. Learn a gene network from ISH images from a single time point by analyzing spatial similarity alone.
  3. Learn a gene network from multiple data sources, where data from each time point is considered a separate source.
  4. Extend the learning algorithm to also learn the importance of each data source, using a small number of known gene interactions as training data.

1. SPEX2: Automated Concise Extraction of Spatial Gene Expression Patterns from Fly Embryo ISH Images

We present SPEX2, an automatic system for embryonic ISH image processing, which can extract, transform, compare, classify and cluster spatial gene expression patterns in Drosophila embryos. Our pipeline for gene expression pattern extraction outputs the precise spatial locations and strengths of the gene expression. We performed experiments on the largest publicly available collection of Drosophila ISH images, and show that our method achieves excellent performance in automatic image annotation, and also finds clusters that are significantly enriched, both for gene ontology functional annotations, and for annotation terms from a controlled vocabulary used by human curators to describe these images.

Kriti Puniyani, Christos Faloutsos, Eric P. Xing, SPEX2: Automated Concise Extraction of Spatial Gene Expression Patterns from Fly Embryo ISH Images. Intelligent Systems for Molecular Biology (ISMB) 2010, Boston, USA.

Supplementary material, Code

The SPEX2 pipeline

2. Gin-IM: From ISH images to Gene Interaction Networks

Gin-IM learns gene interaction networks from embryonic ISH images, by extending recent work in learning sparse undirected graphical models to predict interactions between genes. By capturing the notion of spatial similarity of gene expression, while taking into account the presence of multiple images per gene via multi-instance kernels, Gin-IM predicts meaningful gene interaction networks. Using both synthetic data and a small manually curated data set, we demonstrate the effectiveness of our approach in network building. Further, results are reported on a large publicly available collection of Drosophila embryonic ISH images from the Berkeley Drosophila Genome Project, where Gin-IM makes novel and interesting predictions of gene interactions.
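The idea of comparing genes through "bags" of images can be illustrated with a generic multi-instance (set) kernel, which averages a base kernel over all pairs of images drawn from two bags. The sketch below is only an illustration under stated assumptions: the RBF base kernel, the function names, the `gamma` value, and the random feature vectors are all hypothetical and are not taken from the Gin-IM paper, which may use a different kernel.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Pairwise RBF kernel between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def bag_kernel(bag_a, bag_b, gamma=1.0):
    """Set kernel: mean base-kernel value over all image pairs of two bags."""
    return rbf_kernel(bag_a, bag_b, gamma).mean()


def gene_kernel_matrix(bags, gamma=1.0):
    """Gene-by-gene kernel matrix; bags[i] is an (images x features) array."""
    n = len(bags)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = bag_kernel(bags[i], bags[j], gamma)
    return K


# Hypothetical data: three genes with 3, 1, and 4 images of 5 features each.
rng = np.random.default_rng(0)
bags = [rng.normal(size=(m, 5)) for m in (3, 1, 4)]
K = gene_kernel_matrix(bags, gamma=0.1)
```

Because the averaged kernel is an inner product of the bags' mean feature embeddings, the resulting matrix is symmetric and positive semidefinite, so it could stand in for a sample covariance in a kernelized graphical model regardless of how many images each gene has.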

Kriti Puniyani, Eric P. Xing, Inferring gene interaction networks from ISH images via kernelized graphical models. European Conference on Computer Vision (ECCV) 2012, Firenze, Italy.

Supplementary material.

(a) Univariate measurements taken simultaneously for all genes simplify gene network inference from microarray data. (b) Gin-IM extends such analysis to inferring a network from bags of images per gene.

3. NP-MuScL: Unsupervised global prediction of interaction networks from multiple data sources

We propose NP-MuScL (nonparanormal multi-source learning) to estimate a gene interaction network that is consistent with multiple sources of data that share the same underlying relationships between the nodes (Puniyani and Xing, 2013). NP-MuScL casts the network estimation problem as estimating the structure of a sparse undirected graphical model. We use the semiparametric Gaussian copula to model the distribution of the different data sources, with the different copulas sharing the same covariance matrix, and show how to estimate such a model in the high-dimensional scenario.
Results are reported on synthetic data, where NP-MuScL outperforms baseline algorithms significantly, even in the presence of noisy data sources. Experiments are also run on two real-world scenarios: two yeast microarray data sets, and three Drosophila embryonic gene expression data sets, where NP-MuScL predicts a higher number of known gene interactions than existing techniques.
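The first two stages described above, Gaussianizing each source and combining the sources into one shared covariance estimate, can be sketched as follows. This is a minimal illustration, not the published implementation: the rank-based transform with `rank / (n + 1)` scaling, the sample-size weighting used to pool the sources, and all function names and synthetic data are assumptions made for the example. The final step, fitting a sparse inverse covariance to the pooled matrix (for instance with a graphical lasso solver), is not shown.

```python
from statistics import NormalDist

import numpy as np

# Element-wise inverse of the standard normal CDF.
_inv_cdf = np.vectorize(NormalDist().inv_cdf)


def nonparanormal_transform(X):
    """Map each column of X (samples x genes) to approximate Gaussian scores."""
    n = X.shape[0]
    # Rank of every sample within its column, from 1 to n.
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    # Scale ranks into (0, 1) so the inverse normal CDF stays finite.
    return _inv_cdf(ranks / (n + 1.0))


def pooled_correlation(sources):
    """Sample-size-weighted average of per-source correlation matrices."""
    total = sum(X.shape[0] for X in sources)
    p = sources[0].shape[1]
    S = np.zeros((p, p))
    for X in sources:
        Z = nonparanormal_transform(X)
        S += (X.shape[0] / total) * np.corrcoef(Z, rowvar=False)
    return S


# Hypothetical data: two sources generated from the same latent Gaussian,
# each observed through a different monotone distortion.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]])
latent_a = rng.multivariate_normal(np.zeros(3), cov, size=200)
latent_b = rng.multivariate_normal(np.zeros(3), cov, size=150)
sources = [np.exp(latent_a), latent_b**3]
S = pooled_correlation(sources)
```

Because the transform depends only on ranks, the monotone distortions applied to each source do not affect the estimate, which is what lets heterogeneous sources contribute to a single shared matrix.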

Kriti Puniyani, Eric P. Xing, NP-MuScL: Unsupervised global prediction of interaction networks from multiple data sources. The 17th Annual International Conference on Research in Computational Molecular Biology (RECOMB) 2013, Beijing, China.

Supplementary material.

The overall algorithm for NP-MuScL. Each data source is transformed to be Gaussian using a nonparanormal transformation, and the Gaussian data are then used to jointly estimate an inverse covariance matrix, which gives the structure of the Gaussian graphical model underlying the data.