Background

It is a fundamental goal of many scientific disciplines not only to predict future events, but also to learn how the world works and thereby to control future events. In modern modeling terminology, this means learning an accurate causal model. Randomized experiments have been considered the gold standard for learning causal relationships; however, they are often costly, infeasible, or unethical, especially in biomedicine.

In recent years, observational biomedical data have accumulated at an accelerating rate, and causal analytics have matured to become more usable, more reliable, and applicable to more complex data. Breakthroughs in high-throughput technologies, such as gene expression microarrays, mass spectrometry, SNP arrays, tissue arrays, and single-cell arrays, have allowed multi-modular molecular data to be measured at the genome scale. Similarly, the modernization of the health care system has resulted in the accumulation of a vast amount of structured and unstructured clinical data stored in electronic health records. Over the same period, causal analytics have also advanced, including methods capable of scaling to the size and complexity of these growing data sets.

Despite the availability of numerous computational causal discovery methods, there have been relatively few applications of causal analytics to observational biomedical data. Applying causal discovery methods to bioinformatics and biomedicine has at least two benefits. (1) The successful application of causal discovery methods to biomedical data could produce tangible results, such as discovering novel biological mechanisms, identifying novel treatment targets and strategies, and improving diagnosis, prognosis, and treatment assignment. (2) The analysis of data and learning problems in the bioinformatics and biomedical domains could help advance the theory of causal analytics and inspire the development of improved causal discovery algorithms. Data from these domains have distinct characteristics, and analyzing them may require tailored statistical methods and algorithmic strategies, which could expand the horizon of the current theoretical framework.