Biologists' increasing ability to measure simultaneously the abundance, linear structure, or modification state of many members of a particular class of biomolecules (DNA, RNA, and protein) in cells or tissues has produced a tremendous increase in the resolution at which cellular activities can be viewed, and with it an intriguing set of new analytic problems. How can such information be used to discern which cellular processes are active in a given sample? Can differences in these survey data distinguish healthy tissues from diseased ones, or subdivide diseases such as cancer into molecularly defined subgroups for the purposes of prognosis or treatment decisions?
The first measurement technology to allow broad surveys of one of the cell's dynamically regulated systems was the microarray, which reports the relative abundance of the mRNA transcripts present in a cell. The laboratory has spent much effort over the last six years increasing the precision and accuracy of such measurements and establishing objective measures of the quality of each individual measurement in these large series. In parallel, many approaches to using these well-characterized measurements to gain insight into the molecular processes of healthy and diseased cells have been developed in collaboration with signal processing engineers, mathematicians, logicians, and statisticians worldwide. The earliest methods rested on the simplest mathematical approaches: correlation and distributional tests. These allowed tissue samples to be classified by similarity, and the genes most differential between the sample sets to be identified, leading to useful tumor classification methods and providing insight into how the regulation of the very diverse set of genes whose activity levels must change as a cancer cell becomes actively metastatic can be triggered.
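The two early analyses described above, grouping samples by correlation and ranking genes by a distributional test, can be sketched in a few lines. The sketch below is purely illustrative: the expression matrix is synthetic, the group sizes and effect size are invented, and a Welch t statistic stands in for whatever specific distributional test a given study used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expression matrix: 100 genes x 8 samples
# (4 "healthy" followed by 4 "tumor"); the first 10 genes
# are shifted upward in the tumor group.
expr = rng.normal(0.0, 1.0, size=(100, 8))
expr[:10, 4:] += 3.0
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Sample-to-sample similarity: Pearson correlation between
# whole expression profiles (rows of expr.T are samples).
corr = np.corrcoef(expr.T)  # 8 x 8 similarity matrix

# Per-gene Welch t statistic between the two groups.
a, b = expr[:, labels == 0], expr[:, labels == 1]
ma, mb = a.mean(axis=1), b.mean(axis=1)
va, vb = a.var(axis=1, ddof=1), b.var(axis=1, ddof=1)
t = (mb - ma) / np.sqrt(va / a.shape[1] + vb / b.shape[1])

# Genes ranked by |t| should largely recover the 10 shifted genes.
top = np.argsort(-np.abs(t))[:10]
```

In practice the correlation matrix would feed a clustering procedure, and the t statistics would be converted to p-values with multiple-testing correction; both steps are omitted here for brevity.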
In the course of these studies it became apparent that although such tools are quite good at detecting alterations that can largely be explained by a single simple cause, they provide little insight into the mechanics of gene regulation or into processes controlled by the interactions of many genes. A variety of analytic methods capable of recognizing multi-gene interactions that may contribute to phenotypic changes have been developed. Such methods typically demand far more computational power, which has led to an ongoing collaboration with the current members of TGen's High Performance Computing team. Two themes have emerged from this work. The first is that a given expression state in a cell or tissue should be treated as a context within which to interpret possible patterns of interaction. This is commonsensical, since many of the gene products present in a cell regulate the current transcriptional activity of the genome. The second is that deducing information about regulation in the complex control system operating in the cell requires analyses built on models of that system's operation. This too makes sense: if the regulatory role of a gene presents in ways that are both quantitatively and qualitatively different when that gene is present together with another gene, then methods that cannot account for this contingency will not detect this form of information. Developing analytic tools able to detect gene interactions that contribute to phenotypic regulation, and developing new measurement systems that allow rapid validation of the inferred regulatory relationships, will be the central areas of the lab's research.
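The point about contingent regulation can be made concrete with a toy model. In the sketch below, a hypothetical phenotype depends on the product of two regulators' expression levels, so neither gene alone is informative; comparing an additive linear fit against one that includes the interaction term exposes the relationship. The data, the multiplicative model, and the model-comparison F statistic are all illustrative assumptions, not the lab's actual method.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical expression of two regulators, and a phenotype that
# depends on their *product*: each gene alone predicts it poorly.
g1 = rng.normal(size=n)
g2 = rng.normal(size=n)
y = g1 * g2 + 0.3 * rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

ones = np.ones(n)
additive = np.column_stack([ones, g1, g2])            # y ~ g1 + g2
interacting = np.column_stack([ones, g1, g2, g1 * g2])  # + g1*g2 term

rss_add = rss(additive, y)
rss_int = rss(interacting, y)

# F statistic for the single added interaction term: a large value
# means the additive model misses structure the product term captures.
f_stat = (rss_add - rss_int) / (rss_int / (n - 4))
```

A correlation or single-gene test applied to `g1` or `g2` separately would see almost no signal here, which is exactly the contingency the text describes: the information lives in the pair, not in either gene on its own.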