Presentation Information
[SS15-01] Continual improvement of cis-regulatory models
*Carl de Boer¹ (1. University of British Columbia (Canada))
Keywords:
gene regulation, transcription factors, machine learning
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. A primary aim of my group is to decipher the “cis-regulatory code”: the rules that cells use to determine when, where, and how much genes should be expressed. While cis-regulation has proven to be exceedingly complex, recent advances in our ability to query the activity of DNA, combined with machine learning, have enabled significant progress towards deciphering this code. Here, I will focus on several of our recent efforts to improve cis-regulatory models. First, I will describe a recent DREAM Challenge in which competitors from across the globe competed to create the best sequence-to-expression models using a dataset of random yeast promoter sequences and their experimentally determined expression levels. The challenge yielded model architectures that are state of the art even on human cis-regulatory data. Next, I will describe an ongoing effort to make cis-regulatory models and evaluation tasks interoperable, streamlining model evaluation and enabling model comparison. Then, I will describe an alternative strategy for dividing the genome into training and test datasets, which substantially mitigates the homology-driven data leakage common in genome-trained models. Finally, I will give a perspective on where the field needs to go to crack the cis-regulatory code.