My name is Sean Whalen. Here's my curriculum vitae and LinkedIn profile. I'm a research scientist applying machine learning to problems in computational biology and computer security, with recent emphasis on the former. I work with Katie Pollard at UCSF's Gladstone Institutes on genome-wide prediction of enhancer gene targets across diverse cell lines in order to discover causal SNPs and de novo mutations located within distal regulatory regions. Using this knowledge, we hope to identify the genes underlying poorly understood diseases as well as better understand the regulatory regions driving diseases where the causal genes are already known.
I was previously a postdoctoral fellow in computational biology at the Mount Sinai Institute for Genomics and Multiscale Biology headed by Eric Schadt. I worked with Gaurav Pandey primarily on applications of ensemble learning, which combines multiple (often thousands of) machine learning classifiers, to build predictive models in various areas of genetics and genomics, including genetic interactions, synergistic drug interactions, and protein function. As late entrants into the contest, we were the 4th place team in the 2013 DREAM Toxicogenetics Challenge.
I was also part of a team working on enhanced genotyping of the human leukocyte antigen (HLA) region using long-read DNA sequencing technology in combination with error-correcting techniques utilizing short-read technologies that make independent systematic errors. This work has several applications, including improved compatibility between bone marrow donors and recipients. Our team recently pitched to a panel of venture capitalists and was awarded funding as part of a Sinai initiative to commercialize translational research.
I'm a recent transplant into the field, having finished a postdoctoral position during 2012 in the Intrusion Detection Systems lab at Columbia University with Salvatore Stolfo, where I worked on anomaly detection and cloud security for DARPA's Mission-oriented Resilient Clouds initiative. In 2011 I finished my I3P postdoctoral fellowship at Lawrence Berkeley National Lab in the Computational Research Division, where I developed several methods for anomaly detection in high-performance computing systems with Sean Peisert and David Bailey.
I enjoy interdisciplinary work involving machine learning, statistics, and network theory. I'm also interested in information visualization and virtual reality, having previously developed a complex networks visualization tool with head tracking and gesture recognition for the KeckCAVES project. I completed my Ph.D. at the University of California, Davis in 2010 and was fortunate to be jointly advised by Matt Bishop (Computer Security) and Jim Crutchfield (Physics).
My partner and frequent collaborator is Sophie Engle, currently an Assistant Professor of Computer Science at the University of San Francisco.
The following papers, book chapters, and extended abstracts have all been peer reviewed. Computer science publishes much of its research via competitive peer-reviewed conference and workshop proceedings in addition to journals, whereas biology typically reserves peer review for journals.
|Enhancer-Promoter Interactions are Encoded by Complex Genomic Signatures on Looping Chromatin||Nature Genetics||2016||news & views research highlight press release|
|Unboxing Cluster Heatmaps||Proceedings of the 6th Symposium on Biological Data Visualization (held in conjunction with IEEE VIS) (to appear)||2016|
|Prediction of Human Population Responses to Toxic Compounds by a Collaborative Competition||Nature Biotechnology||2015|
|Predicting Protein Function and Other Biomedical Characteristics with Heterogeneous Ensembles||Methods||2015|
|Model Aggregation for Distributed Content Anomaly Detection||Proceedings of the 7th ACM Workshop on Artificial Intelligence and Security (held in conjunction with the 21st ACM Conference on Computer and Communications Security)||2014|
|Enhancing the Functional Content of Eukaryotic Protein Interaction Networks||PLoS ONE||2014||bibtex|
|A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics||Proceedings of the 13th IEEE International Conference on Data Mining||2013||bibtex pdf|
|Multiclass Classification of Distributed Memory Parallel Computations||Pattern Recognition Letters||2013||bibtex pdf|
|Visualizing Distributed Memory Computations with Hive Plots||Proceedings of the 9th International Symposium on Visualization for Cyber Security (held in conjunction with the 14th Annual IEEE VIS Conference)||2012||bibtex|
|Structural Drift: The Population Dynamics of Sequential Learning||PLoS Computational Biology||2012||bibtex|
|Network-Theoretic Classification of Parallel Computation Patterns||International Journal of High Performance Computing Applications||2012||bibtex pdf|
|A Taxonomy of Buffer Overflow Characteristics||IEEE Transactions on Dependable and Secure Computing||2012||bibtex|
|This is the Remix: Structural Improvisation using Automated Pattern Discovery||Proceedings of the 4th International Workshop on Machine Learning and Music (held in conjunction with the 25th Annual Conference on Neural Information Processing Systems)||2011|
|Network-Theoretic Classification of Parallel Computation Patterns||Proceedings of the 1st International Workshop on Characterizing Applications for Heterogeneous Exascale Systems (held in conjunction with the 25th International Conference on Supercomputing)||2011||bibtex|
|Hidden Markov Models for Automated Protocol Learning||Proceedings of the 6th International ICST Conference on Security and Privacy in Communication Networks||2010||bibtex|
|A Risk Management Approach to the Insider Threat||Insider Threats in Cybersecurity — And Beyond, Springer Verlag||2010||bibtex|
|Case Studies of an Insider Framework||Proceedings of the 42nd Annual Hawaii International Conference on System Sciences||2009||bibtex|
|We Have Met the Enemy and He is Us||Proceedings of the 2008 New Security Paradigms Workshop||2008||bibtex|