The rise of the biologists: The changing face of the bioinformatics industry

The FDA’s accelerated approval of AbbVie’s Venclexta (venetoclax) in CLL was the result of 20 years of work by “a dedicated group of scientists who really understood the biology and how to use the tools”, CEO Michael Severino told BioWorld, the daily biopharmaceutical news source, at the time.

Heraclitus famously said “The only thing constant in life is change”. Nowhere is that more true than in biological research and development. The fast pace of research makes the industry an exciting place to work, but it also makes it challenging to keep in-depth biological knowledge up to date, and AbbVie credited exactly that knowledge as critical to its recent success. Recent publications by both AstraZeneca and Pfizer also highlighted its importance, reporting direct correlations between clinical success and understanding of disease and target biology [1, 2]. AstraZeneca found that in 40% of clinical failures due to efficacy no target:disease linkage had been established, and that 82% of projects with efficacy biomarkers succeeded. Similarly, Pfizer found that 43% of its failures due to efficacy had not had their mechanisms sufficiently tested.

We’ve all seen the exponential growth graphs of the volumes of data produced over recent years. According to IBM, we are producing “2.5 quintillion bytes of data” each day, with 90% of all data in existence having been produced in the past 2 years alone [3]. Producing and storing such volumes of data is a significant challenge in itself, but analyzing and interpreting the data to turn it into knowledge presents a further hurdle.

To complicate things further, not all data is the same. The data is not all produced on the same machines, under the same conditions, or even measuring the same parameters, so integrating it presents a further hurdle to making decisions using this growing resource. However, we are beginning to find methods that allow us to do this efficiently. Implementing such data integration techniques takes knowledge and skill in processes such as ontology design, management and curation. As a result, skilled Data Scientists are becoming a much sought-after and valuable resource.

In a recent Forbes Tech article, Bernard Marr discusses a rather interesting solution that is naturally evolving to overcome the shortage of skilled Data Scientists [4]. Large companies, such as Sears, are empowering other staff to perform data analytics previously reserved for the Data Scientist. The rise of these “Citizen Data Scientists” is enabled by tools and software that automate complex integration and analytics and present the results back to the user in a form that is immediately interpretable. The concept isn’t to replace the Data Scientists entirely but to release their highly sought-after skill set for the more complex and custom analytics that are required above the day-to-day big data analysis.

Data Scientists are not the only skill set in short supply that is presenting a challenge in the face of the data deluge. In a recent Nature commentary, Jeffrey Chang from the University of Texas Health Science Center observed “biological data are accumulating faster than people’s capacity to analyze them…there are not enough bioinformaticians.” [5] Chang observes that there was surprisingly little application of a standardized workflow in the projects coming through his department, and he also notes that only 5% of his time was spent on “pure bioinformatics – that is, developing new algorithms that merit their own publications.” Such lack of time for innovation in our bioinformatics industry threatens our ability to catch up on the backlog of biological data awaiting analysis by inventing new analytical methods. It presents an even bigger threat to keeping data analysis techniques on a par with the changing data generation technologies.

Taking the lead from the Data Scientists’ scarcity, one solution is to empower biologists to perform more of the analytics themselves. Biological researchers typically generate the data and, where available, work with their bioinformatics teams to have the data processed and analyzed. The results are then returned to them so that they can interpret the biological meaning, and therefore the impact, of the experiment. With anywhere between 2 and 100+ biologists for every bioinformatician (in SME/Academia and Large Pharma respectively) [6], even processing the data and getting the results back from the teams can lead to a lag in project progression. If biologists are empowered to perform their own data analytics and understand the potential biological impact, then the bioinformaticians could be freed up to support the more complex, second-pass and pure bioinformatics processes that the industry needs to keep the wheels of progress and profit moving smoothly.

Much like the Citizen Data Scientists, such Citizen Bioinformaticians would require the right toolkit to guide them through the process. Tools would need to automatically analyze the data with meaningful algorithms, display results in a manner that is interpretable without being “black-box” and, most importantly, make the analysis easily reproducible without the user needing to write a series of small novels capturing the settings and processes that were required to produce it in the first pass.

At Clarivate Analytics we have been working with both biologists and bioinformaticians for many years, and this rising trend has concerned us. Our bioinformatician customers were increasingly overloaded and our biologists increasingly frustrated with long wait times. Indeed, many biologists have begun to learn to process and analyze their own omics data in an effort to bypass the bottleneck. This approach is to be applauded for its proactivity, but it also presents significant risk if an incorrect algorithmic approach is applied and decisions are then made on incorrect assumptions. Our solution is the first in a new suite of small applications and is called Key Pathway Advisor.



FIGURE 1: Key Pathway Advisor. Understand the biology behind your data. Quickly, easily & reproducibly.

Key Pathway Advisor is specifically designed with biologists in mind. The workflows apply state-of-the-art bioinformatic algorithms such as synergy pathway enrichments [7] and causal reasoning [8] whilst explaining the results in a manner that does not require a degree in bioinformatics. The focus of the app is to help the user identify the potential biological impact and potential root causes for their expression and gene variant results. The tool also automatically aligns the results with current putative biomarker and drug pipeline knowledge for their disease of interest. The outcome is an interactive report that can be discussed with colleagues (including bioinformaticians) to decide next steps and increase project efficiency, saving everyone time, money and hopefully a few wrinkles.

Find out more about Key Pathway Advisor from our recorded webinar, “An upstream and downstream synergistic approach to OMICs data analysis using the Key Pathway Advisor”.


[1] Cook, David; Brown, Dearg; Alexander, Robert; March, Ruth; Morgan, Paul; Satterthwaite, Gemma and Pangalos, Menelas N.; 2014; Lessons learned from the fate of AstraZeneca’s drug pipeline: a five-dimensional framework; Nature Reviews Drug Discovery 13: 419-431

[2] Nelson, Matthew R.; Johnson, Toby; Warren, Liling; Hughes, Arlene R.; Chissoe, Stephanie L.; Xu, Chun-Fang and Waterworth, Dawn M.; 2016; The genetics of drug efficacy: opportunities and challenges; Nature Reviews Genetics 17: 197-206 (Data correct as of 25th April 2016)

[4] Marr, B. How The Citizen Data Scientist Will Democratize Big Data. Forbes: Tech, Apr 1, 2016

[5] Chang, J. Comment: Reward Bioinformaticians. Nature 520: 151

[6] Thomson Reuters independent customer & market research; data not published.

[7] Riley BE, Gardai SJ, Emig-Agius D, Bessarabova M, Ivliev AE, Schüle B, et al. (2014) Systems-Based Analyses of Brain Regions Functionally Impacted in Parkinson’s Disease Reveals Underlying Causal Mechanisms. PLoS ONE 9(8): e102909.

[8] Chindelevitch L, Ziemek D, Enayetallah A, Randhawa R, Sidders B, et al. (2012) Causal reasoning on biological networks: interpreting transcriptional changes. Bioinformatics 28: 1114.