The Cancer Moonshot Program was announced in the State of the Union address on January 12, 2016. It was funded through the 21st Century Cures Act, which was passed by the U.S. Senate on December 7, 2016 and signed by President Obama on December 13, 2016. The act provided NIH with an additional $1.8 billion over seven years in supplemental funding for Cancer Moonshot projects and initiatives. The project was led by President Biden, who was then Vice President of the United States.
Data and data sharing played an important role in the Cancer Moonshot strategy. As we approach the five-year anniversary of the passing of the 21st Century Cures Act, it is a good time to look back at the components of the underlying strategy from an analytic strategy point of view.
With five years of effort behind us, it should be easier to see what worked well, what worked less well, and how we might fine-tune the current and planned activities.
As described on the National Cancer Institute’s Cancer Moonshot website [1], “The Cancer Moonshot has three ambitious goals: to accelerate scientific discovery in cancer, foster greater collaboration, and improve the sharing of data.”
As described in the White House’s announcement [2]: “Here’s the ultimate goal: To make a decade’s worth of advances in cancer prevention, diagnosis, and treatment, in five years.” In other words, the goal was to get done in five years what would normally take ten. A critical element of the strategy was to share cancer-related data in an effort to accelerate research.
In analytics and AI, the biggest challenge that you face is usually not building a model, but instead collecting or acquiring the data that you need to build the model. I call this crossing the data chasm, and one of the levers the Cancer Moonshot used to cross it was to require those funded by Cancer Moonshot projects to share data. You can learn more about crossing the data chasm and its role in an analytic strategy in my Primer on Analytic Strategy.
Today, as we plan the next five years, it is important to look at how we can leverage data sharing to keep pursuing the goal of accomplishing in five years what would normally take ten.
The good news is that more and more data is being shared when the research that produces it is funded with federal dollars. As an example, the NCI has developed a data sharing policy for all Cancer Moonshot funded projects [3]. The not so good news is that data sharing is often not required when research is funded by private foundations and private philanthropy, and data is rarely shared when research is funded by industry. Given the size and complexity of the cancer industry, the amount of money at stake, the competitiveness of scientists and cancer centers, the challenges with deidentifying research data, and the legal risk of sharing human subject data that is collected by healthcare providers, most cancer data is not shared, and there is much less progress as a result.
An exception is pediatric cancer. Fortunately, pediatric cancer is relatively rare compared to adult cancers. With fewer cases, there is usually not enough pediatric cancer data at any one research center to provide the number of cases required for research projects, unless data is shared [4].
Some success stories
One of the success stories of the Cancer Moonshot is the BloodPAC Consortium, whose mission is to accelerate the development, validation and accessibility of liquid biopsy assays to improve the outcomes of patients with cancer. The BloodPAC Consortium is now an independent 501(c)(3) organization with over 50 consortium members organized into a number of different working groups. It operates the BloodPAC Data Commons to support data sharing among its members [5].
Other cancer data sharing success stories include the NCI Genomic Data Commons, AACR’s Project GENIE and ASCO’s CancerLinQ. Each of these three projects provides large scale data sharing that has accelerated cancer research and resulted in many significant publications.
Why is data sharing so hard?
The first question to ask is: “Why is data sharing so hard?” There are a number of reasons, but probably the most important are the following:
- It’s difficult and time consuming for researchers to prepare data and to submit data for data sharing. Often researchers have moved on to new experiments, new analyses, and writing new papers, and sharing data from their last project always remains on their “B-list” of items to do.
- Investigators must balance the potential public good and benefit to patients that results when data is shared against the loss of momentum in their careers when others publish results from their data that they could have published themselves, given more time before sharing.
- Data is often not collected with consents that make it easy to share.
- Data sharing is often not required, except with federally funded research; and, for federally funded research, data sharing is often not enforced [6].
- Sharing data carries risk: the shared data may contain sensitive data, or third party data sources may be used to re-identify some subjects in large shared datasets [7].
- Often, the computing infrastructure required for data sharing is not funded, so data sharing is an “unfunded mandate.”
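To make the re-identification risk in the list above a little more concrete, here is a minimal sketch of a k-anonymity style check. It is a much simpler technique than the generative-model approach in [7], and it is shown only as an illustration. The field names (zip3, birth_year, sex) and the helper functions are hypothetical and are not taken from any of the projects mentioned in this post.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A record is a flat mapping of field name to value. Field names used in the
# usage note (zip3, birth_year, sex) are hypothetical quasi-identifiers.
Record = Dict[str, str]


def _key(record: Record, quasi_identifiers: Sequence[str]) -> Tuple[str, ...]:
    """Project a record onto the quasi-identifier fields."""
    return tuple(record[q] for q in quasi_identifiers)


def group_sizes(records: List[Record], quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each quasi-identifier combination."""
    return Counter(_key(r, quasi_identifiers) for r in records)


def k_anonymity(records: List[Record], quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest group; k = 1 means someone is unique."""
    return min(group_sizes(records, quasi_identifiers).values())


def unique_fraction(records: List[Record], quasi_identifiers: Sequence[str]) -> float:
    """Return the fraction of records whose combination appears exactly once."""
    sizes = group_sizes(records, quasi_identifiers)
    unique = sum(1 for r in records if sizes[_key(r, quasi_identifiers)] == 1)
    return unique / len(records)
```

Calling k_anonymity(records, ["zip3", "birth_year", "sex"]) on a candidate dataset before release gives a rough first signal: a result of 1 means at least one subject is unique on those fields and could in principle be matched against a third party data source. A check like this does not replace a formal privacy review.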
What can change?
Perhaps the most important question we can ask is: “What can change?”
- Cancer research organizations can form data sharing coalitions to share data around specific cancers of interest in order to accelerate research. For example, several NCI Comprehensive Cancer Centers could self-organize and share data to achieve the critical mass needed to study cancers that would be harder for any one center to study with just its own patients. Although there are already several such projects, in practice just a small fraction of the potential data is being shared in this way.
- Research projects can be collaborative with patients, and data sharing technologies such as blue-button can be used so that patients themselves can directly share their data. This is sometimes called patient-partnered research, and for new research projects it is by far the preferred approach. For this to work, healthcare providers must make it easier for patients to share their data using blue-button and other data sharing technologies.
- We can improve the data ecosystems that link together local cancer information collected by different projects and support federated learning. This way cancer data that cannot be easily shared with others can stay within the security and compliance boundaries of the organization that provides healthcare services, but be shared effectively with the broader research community to accelerate research. This might be a good project for the proposed ARPA-Health (ARPA-H).
- We can reduce the liability and fines associated with the inadvertent disclosure of health information, and create insurance pools to pay the fines, in order to protect research organizations that follow best practices for protecting data but still experience an exposure of health information.
I’ll be returning to these four topics from time to time in future posts.
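As a rough illustration of the federated learning idea in the third point above, here is a minimal sketch of federated averaging for a one-feature linear model. It is a toy under stated assumptions, not a description of any system used by the projects in this post: the function names, the learning rate and the number of rounds are all hypothetical. Each site trains on its own records and sends back only model parameters and a record count, so the records themselves never leave the site.

```python
from typing import List, Tuple

# Each site holds (x, y) records that stay inside its own compliance boundary.
SiteData = List[Tuple[float, float]]
# A site reports only (weight, bias, number_of_records) to the coordinator.
Update = Tuple[float, float, int]


def local_update(w: float, b: float, data: SiteData,
                 lr: float = 0.05, epochs: int = 20) -> Update:
    """Run a few gradient descent steps on one site's private data."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, n


def federated_average(updates: List[Update]) -> Tuple[float, float]:
    """Combine site models, weighting each by its number of records."""
    total = sum(n for _, _, n in updates)
    w = sum(w_i * n for w_i, _, n in updates) / total
    b = sum(b_i * n for _, b_i, n in updates) / total
    return w, b


def train(sites: List[SiteData], rounds: int = 200) -> Tuple[float, float]:
    """Alternate local training at each site with central averaging."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [local_update(w, b, data) for data in sites]
        w, b = federated_average(updates)
    return w, b
```

In a real deployment the sites would be separate organizations communicating over a network, and the parameter updates themselves may need additional protection. The sketch only shows the basic pattern of moving the model to the data rather than the data to the model.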
Disclaimers
I’m actively involved with both the BloodPAC Consortium and the NCI Genomic Data Commons.
References
[1] National Cancer Institute, Cancer Moonshot, retrieved from: https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative on June 1, 2021.
[2] The White House, President Barack Obama, Join the Vice President’s Cancer Moonshot, retrieved from https://obamawhitehouse.archives.gov/cancermoonshot on June 1, 2021.
[3] NCI Cancer Moonshot Public Access and Data Sharing Policy, retrieved from https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/funding/public-access-policy on June 1, 2021.
[4] Samuel L. Volchenboum, Suzanne M. Cox, Allison Heath, Adam Resnick, Susan L. Cohn, and Robert Grossman. “Data Commons to Support Pediatric Cancer Research.” American Society of Clinical Oncology Educational Book 37 (2017): 746-752.
[5] Robert L. Grossman, Jonathan R. Dry, Sean E. Hanlon, Donald J. Johann, Anand Kolatkar, Jerry SH Lee, Christopher Meyer, Lea Salvatore, Walt Wells, and Lauren Leiman. “BloodPAC Data Commons for liquid biopsy data.” JCO Clinical Cancer Informatics 5 (2021): 479-486.
[6] Frisby, Tammy M., and Jorge L. Contreras. “The National Cancer Institute Cancer Moonshot Public Access and Data Sharing Policy—Initial assessment and implications.” Data & Policy 2 (2020).
[7] Luc Rocher, Julien M. Hendrickx, and Yves-Alexandre de Montjoye. “Estimating the success of re-identifications in incomplete datasets using generative models.” Nature Communications 10, no. 1 (2019): 1-9.