In a field as complex as cancer research, scientists sometimes struggle to sing from the same proverbial songbook. Cancer data are complex to analyze, and subtle differences in how they are handled can lead to seemingly discordant results.
But a new cloud-based tool designed and built by Princess Margaret Cancer Centre scientists brings data from cutting-edge global oncology research into harmony like the music of the best symphony orchestra.
The ORCHEStration Tool for Reproducible Analysis (ORCESTRA) processes, annotates, curates and releases some of the world's most widely used biomedical data. These data are the key building blocks scientists will use to make the next important breakthroughs in cancer research.
Currently, these data sets, and how they are created and processed, are not always transparent. The complex findings they reflect are then difficult to replicate even for the most sophisticated labs, either slowing research or putting it out of reach for some scientists. The musical scores are missing, resulting in a cacophony that is slowing down the pace of cancer research.
Breaking down those barriers is key to improving cancer science and making a difference for patients, says Princess Margaret Cancer Centre Senior Scientist Dr. Benjamin Haibe-Kains, whose lab created the tool.
"Science builds upon the work of others," he says. "If you don't share the code and the data, then I would say more than half of the impact that you could have is gone."
ORCESTRA, which can be found online at Orcestra.ca, is the subject of a new paper featured in the journal Nature Communications. Dr. Haibe-Kains is senior author of the study.
"It's not just what you write in a paper anymore," he says. "In 2021, research is not just a textual description; it's a series of research outputs or deliverables that are very valuable.
"And that includes data and the code used to process them."
Dr. Haibe-Kains says many of the most complex data sets, such as genomic sequencing data, often arrive in a very rough form, which researchers must then spend time organizing and annotating if they want to use the data in their own work. Making these complex data "analysis-ready" is often an error-prone and poorly documented process.
ORCESTRA attempts to address that problem by providing data sets along with detailed documentation and the computer code that shows researchers how the raw data were processed. The tool offers a variety of clinical, preclinical, genomic and perturbation profiles of cancer data for download. It is also customizable, offering multiple ways to process a data set to meet a researcher's needs.
"Who are we to say we know the best way to process the data?" Dr. Haibe-Kains says. "What we've done is, instead of giving researchers one way to process it, we've implemented multiple analysis pipelines for researchers to choose from and clearly documented the different versions of the data."
Dr. Haibe-Kains says the ultimate goal is to increase the quality of science to benefit patients.
"If we make a discovery using data from ORCESTRA, it's more likely to be reproducible and impactful because we are able to document every single step that led to this discovery," he says.
This project is supported by the Princess Margaret Cancer Foundation, the Canadian Institutes of Health Research and the Government of Ontario. The implementation of the ORCESTRA platform has been partially supported by Genome Canada and Ontario Genomics through a Bioinformatics and Computational Biology grant.
Competing Interests
Dr. Haibe-Kains is a shareholder and paid consultant for Code Ocean Inc. Code Ocean Inc. did not participate in the design or execution of the study.