Automated annotation of gene expression image sequences via non-parametric factor analysis and conditional random fields.
Journal Article
MOTIVATION: Computational approaches for annotating phenotypes from image data have shown promising results across many applications and provide rich, valuable information for studying gene function and interactions. Although data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently for individual time points. For developmental gene expression patterns in particular, it is biologically sensible to account for images across multiple time points jointly, so that spatial and temporal dependencies are captured simultaneously.
METHODS: We describe a discriminative undirected graphical model for labeling gene expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The approach builds on an effective feature selection technique consisting of a non-parametric sparse Bayesian factor analysis model. The result is a flexible framework that can handle large-scale data with noisy, incomplete samples; that is, it tolerates data missing from individual time points.
RESULTS: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that our method, by jointly annotating phenotype sequences, achieves higher accuracy than previous models that annotate each stage in isolation. Experiments with missing data indicate that our joint learning method successfully annotates genes for which no expression data are available for one or more stages.
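To make the idea of joint decoding across stages concrete, here is a minimal sketch of exact max-product decoding on a simple chain of time points, which is what junction tree inference reduces to when the graph is a chain. It is an illustration under stated assumptions, not the authors' implementation: the function name `decode_chain`, the log-score inputs, and the choice to treat a missing stage as a flat unary term are all hypothetical, and the graph structure actually used in the paper may differ.

```python
import numpy as np

def decode_chain(unary, pairwise):
    """Max-product (Viterbi) decoding on a chain of T stages with K labels.

    unary:    list of length T; each entry is a length-K sequence of
              log-scores, or None when the stage has no image (assumption:
              this is how missing data is represented here).
    pairwise: (K, K) array of log-scores for label pairs at consecutive stages.
    Returns the highest-scoring label sequence as a list of ints.
    """
    pairwise = np.asarray(pairwise, dtype=float)
    K = pairwise.shape[0]
    # A missing stage contributes a flat (all-zero) unary term, so its label
    # is decided only by its neighbours through the pairwise scores.
    scores = [np.zeros(K) if u is None else np.asarray(u, dtype=float)
              for u in unary]
    T = len(scores)
    best = scores[0].copy()              # best[j]: top score of a path ending in label j
    back = np.zeros((T, K), dtype=int)   # back[t, j]: best previous label for j at stage t
    for t in range(1, T):
        cand = best[:, None] + pairwise  # cand[i, j]: label i at t-1 followed by j at t
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0) + scores[t]
    # Trace back from the best final label.
    labels = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        labels.append(int(back[t][labels[-1]]))
    return labels[::-1]

# Three stages, two labels; the middle stage has no image.
persist = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical scores favouring label persistence
print(decode_chain([[2.0, 0.0], None, [2.0, 0.0]], persist))
```

In this toy setting the label for the unobserved middle stage is inferred from its neighbours, which mirrors the abstract's point that joint annotation can tolerate data missing from individual time points.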
Cited Authors
- Pruteanu-Malinici, I; Majoros, WH; Ohler, U
Published Date
- July 1, 2013
Published In
- Bioinformatics
Volume / Issue
- 29 / 13
Start / End Page
- i27 - i35
PubMed ID
- 23812993
PubMed Central ID
- PMC3694682
Electronic International Standard Serial Number (EISSN)
- 1367-4811
Digital Object Identifier (DOI)
- 10.1093/bioinformatics/btt206
Language
- eng
Conference Location
- England