Single Cell Transcriptomics

Modern RNA-seq protocols make it possible to analyze the transcriptomes of large collections of individual cells. This type of data exhibits a high degree of stochastic variability. Probabilistic alternatives to traditional clustering and principal component analysis are therefore needed to quantify and interpret cell-to-cell variability in large single-cell datasets. These probabilistic methods can then be used to cluster cells and identify subpopulations while capturing the uncertainty of each assignment. Potential applications include identifying driver subpopulations in tumors and discovering novel cell subpopulations.
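
As a rough illustration of what "capturing uncertainty" can mean here, the sketch below assigns one cell to subpopulations softly rather than with a hard label. It assumes a simple Poisson mixture in which the per-gene expected counts of each subpopulation are already known; the function names, the rates, and the mixing weights are all hypothetical, and real single-cell models are typically more elaborate than this.

```python
import math


def poisson_logpmf(k, rate):
    """Log probability of observing count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def subpopulation_posterior(counts, rates, weights):
    """Posterior probability that one cell belongs to each subpopulation.

    counts  -- observed read counts for one cell, one entry per gene
    rates   -- rates[s][g]: assumed expected count of gene g in subpopulation s
    weights -- assumed prior proportion of each subpopulation
    """
    log_joint = [
        math.log(w) + sum(poisson_logpmf(k, r) for k, r in zip(counts, gene_rates))
        for w, gene_rates in zip(weights, rates)
    ]
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    peak = max(log_joint)
    unnormalized = [math.exp(v - peak) for v in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical expected counts for three genes in two subpopulations.
RATES = [[0.5, 5.0, 1.0], [5.0, 0.5, 1.0]]
WEIGHTS = [0.5, 0.5]

clear_cell = subpopulation_posterior([0, 5, 1], RATES, WEIGHTS)
ambiguous_cell = subpopulation_posterior([2, 2, 1], RATES, WEIGHTS)
```

Under these assumed rates, the first cell is assigned almost entirely to the first subpopulation, while the second cell is split evenly between the two. A hard clustering would hide that difference; the posterior keeps it.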

Single Cell Variant Calling

Calling mutations and variants at the single-cell level is subject to a high degree of uncertainty due to uneven coverage and sequencing error. Again, probabilistic models are needed to assess and interpret observed variants and to account for unobserved or missing data. Such models can be used to reconstruct subpopulation architectures and, in conjunction with transcriptomic analyses, to connect phenotype with genotype at the single-cell level.
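
One way to make this concrete is a small Bayesian genotype model for a single site in a single cell. The sketch below is an assumption-laden illustration, not a description of any particular caller: it treats reads as binomial draws, uses a hypothetical per-base error rate, a hypothetical allelic dropout rate (the chance that one allele of a heterozygous site goes unobserved), and a hypothetical genotype prior.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def genotype_posterior(alt_reads, total_reads, error=0.01, dropout=0.2,
                       prior=(0.98, 0.01, 0.01)):
    """Posterior over (hom-ref, het, hom-alt) for one site in one cell.

    error, dropout, and prior are illustrative assumptions, not fitted values.
    """
    hom_ref = binom_pmf(alt_reads, total_reads, error)
    hom_alt = binom_pmf(alt_reads, total_reads, 1 - error)
    # A heterozygous site either shows both alleles, or loses one allele
    # (either one with equal chance) and then looks homozygous.
    het = ((1 - dropout) * binom_pmf(alt_reads, total_reads, 0.5)
           + 0.5 * dropout * (hom_ref + hom_alt))
    joint = [prior[0] * hom_ref, prior[1] * het, prior[2] * hom_alt]
    total = sum(joint)
    return [j / total for j in joint]


no_coverage = genotype_posterior(0, 0)      # no reads: nothing is learned
low_coverage = genotype_posterior(0, 2)     # two reference reads only
good_coverage = genotype_posterior(5, 10)   # balanced, well-covered site
```

With no reads the posterior simply returns the prior, which is how this kind of model represents missing data. With only two reference reads, the heterozygous genotype keeps a small but nonzero probability because dropout could explain the observation, whereas the balanced ten-read site is called heterozygous with high confidence.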

Functional Impact Prediction

Large-scale sequencing projects are rapidly identifying enormous numbers of novel mutations of undetermined functional impact. Computational models can help prioritize these mutations ahead of empirical functional studies by flagging those with putative deleterious impact.