Using Visualization and Automation to Accelerate Genetics Discovery

Ross E. Curtis

Thesis Committee: Eric P Xing, Gregory Cooper, chair, Kathryn Roeder, Daniel Weeks, Sally Wenzel


Abstract:

The ten years since the completion of the human genome sequencing project have seen huge advances in the understanding of the genetic basis of human disease. Identifying the genes and causal genomic polymorphisms involved in disease holds the promise of better treatment and prevention. Much of the recent progress has been made through genome-wide association studies (GWAS). However, despite the success of GWAS, their findings often fail to explain the full heritability of a disease, or they include SNPs that affect a disease through some unknown biological mechanism.

The incorporation of gene expression or clinical trait data into GWAS is one approach that can further elucidate the mechanisms behind SNP-disease associations. These so-called intermediate phenotypes have inherent structures, such as correlations and interactions, which can be leveraged to facilitate discovery. The promise of these data has motivated a new generation of GWAS algorithms, termed structured association mapping, which use cutting-edge machine learning techniques to fully leverage structures in the data to uncover associations between the genome, transcriptome, and phenome.

However, the increasing amounts of data used in GWAS, and the complexity of the methods used to analyze the data, demand a new integrative approach to genetics discovery. To fully capture the potential available in today’s genetic data, we must rely on the strengths of both machines and people. With this in mind, I have developed a visual analytics software system called GenAMap. GenAMap automates the execution of structured association mapping algorithms, making them available to genetics analysts. Through GenAMap, I introduce new visualizations that enable analysts to explore the structure of genetic data while considering genomic associations. Through the analysis of human asthma, yeast, and mouse datasets, I show that GenAMap, by integrating the strengths of the machine learning, visualization, and genetics fields, has the potential to facilitate and advance the progress of genetics discovery.

In this work I also demonstrate the application of visualization and machine learning to another domain in genetics research: the study of dynamic genetic networks. I present TVNViewer, an online visualization tool for exploring these networks, and use yeast and breast cancer datasets to show how the visualizations in TVNViewer enable the analysis and exploration of the networks as they change across time and space.

In a field where the amount of available data continues to grow, the integration of visualization and machine learning techniques has the potential to accelerate genetics discovery.
