Analytic Strategy Partners

Crossing the Data Chasm for Cancer Research

June 15, 2021 by Robert Grossman

The Cancer Moonshot Program was announced in the State of the Union address on January 12, 2016. Its goal was to accomplish 10 years of research in 5 years, and one of its strategies was to use data sharing to help accomplish this.

The program was funded by the 21st Century Cures Act, which was passed by the U.S. Senate on December 7, 2016 and signed by President Obama on December 13, 2016. The act provided NIH an additional $1.8 billion over seven years in supplemental funding for Cancer Moonshot projects and initiatives. The project was led by President Biden, who was then the Vice President of the US.

Data and data sharing played an important role in the Cancer Moonshot strategy. As we approach the fifth anniversary of the passing of the 21st Century Cures Act, it might be a good time to look back at components of the underlying strategy from an analytic strategy point of view.

With five years of effort behind us, it should be easier to see what worked well, what worked less well, and how we might fine-tune the current and planned activities.

As described on the National Cancer Institute’s Cancer Moonshot website [1], “The Cancer Moonshot has three ambitious goals: to accelerate scientific discovery in cancer, foster greater collaboration, and improve the sharing of data.”

As described in the White House’s announcement [2]: “Here’s the ultimate goal: To make a decade’s worth of advances in cancer prevention, diagnosis, and treatment, in five years.” In other words, the goal was to get done in five years what would normally take ten. A critical element of the strategy was to share cancer-related data in an effort to accelerate research.

In analytics and AI, the biggest challenge that you face is usually not building a model, but instead collecting or acquiring the data that you need to build the model. I call this crossing the data chasm, and one of the levers used was to require those funded by Cancer Moonshot projects to share data. You can learn more about crossing the data chasm and its role in an analytic strategy in my Primer on Analytic Strategy.

Today, as we plan the next five years, it is important to look at how we can leverage data sharing to keep pursuing the goal of accomplishing in five years what would normally take ten years.

The good news is that more and more data is being shared when the research that produces it is funded with federal dollars. As an example, the NCI has developed a data sharing policy for all Cancer Moonshot funded projects [3]. The not so good news is that data sharing is often not required when research is funded by private foundations and private philanthropy, and data is rarely shared when research is funded by industry. Given the size and complexity of the cancer industry, the amount of money at stake, the competitiveness of scientists and cancer centers, the challenges with deidentifying research data, and the legal risk of sharing human subject data that is collected by healthcare providers, most cancer data is not shared, and there is much less progress as a result.

An exception is pediatric cancer. Fortunately, pediatric cancer is relatively rare compared to adult cancers. With fewer cases, there is usually not enough pediatric cancer data at any one research center to provide the number of cases required for research projects, unless data is shared [4].

Some success stories

One of the success stories of the Cancer Moonshot is the BloodPAC Consortium, whose mission is to accelerate the development, validation and accessibility of liquid biopsy assays to improve the outcomes of patients with cancer. The BloodPAC Consortium is now an independent 501(c)(3) organization with over 50 consortium members organized into a number of different working groups, and it operates the BloodPAC Data Commons to support data sharing among its members [5].

Other cancer data sharing success stories include the NCI Genomic Data Commons, AACR’s Project GENIE, and ASCO’s CancerLinQ. Each of these three projects provides large-scale data sharing that has accelerated cancer research and resulted in many significant publications.

Why is data sharing so hard?

The first question to ask is: “Why is data sharing so hard?” There are a number of reasons, but probably the most important are the following:

  1. It’s difficult and time-consuming for researchers to prepare data and to submit it for data sharing. Often researchers have moved on to new experiments, new analyses, and writing new papers, and sharing data from their last project always remains on their “B-list” of items to do.
  2. Investigators must balance the potential public good and benefit to patients that result when data is shared against the loss of momentum in their careers when others publish results from their data, results that they could have published themselves with more time before sharing.
  3. Data is often not collected with consents that make it easy to share.
  4. Data sharing is often not required, except with federally funded research; and, for federally funded research, data sharing is often not enforced [6].
  5. Sharing data carries risk: the shared data may contain sensitive information, or third party data sources may be used to re-identify some subjects in large shared datasets [7].
  6. Often, the computing infrastructure required for data sharing is not funded, so data sharing is an “unfunded mandate.”
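
To make the re-identification risk in item 5 concrete, here is a minimal sketch that counts how many records in a dataset are unique on a chosen set of quasi-identifiers (fields such as a ZIP prefix, birth year, and sex that could be matched against a third party source). This is a simple uniqueness count, not the generative-model method of [7]. The function name, the field names, and the records are made up for illustration.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination appears exactly once.

    A record that is unique on these fields is a candidate for re-identification
    if an outside dataset carries the same fields alongside a name.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0

# Hypothetical, made-up records (not real patient data).
records = [
    {"zip3": "606", "birth_year": 1950, "sex": "F", "diagnosis": "AML"},
    {"zip3": "606", "birth_year": 1950, "sex": "F", "diagnosis": "CLL"},
    {"zip3": "606", "birth_year": 1962, "sex": "M", "diagnosis": "AML"},
    {"zip3": "100", "birth_year": 1971, "sex": "F", "diagnosis": "NSCLC"},
]

print(unique_fraction(records, ["zip3", "birth_year", "sex"]))
print(unique_fraction(records, ["zip3"]))
```

Adding more quasi-identifiers can only keep or raise the number of unique records, which is one reason richer shared datasets are harder to protect.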

What can change?

Perhaps the most important question we can ask is: “What can change?”

  1. Cancer research organizations can form data sharing coalitions to share data around specific cancers of interest in order to accelerate research. For example, several NCI Comprehensive Cancer Centers could self-organize and share data to achieve the critical mass needed to study cancers of interest that would be harder for any one center to study with just its own patients. Although there are already several such projects, in practice just a small fraction of the potential data is being shared in this way.
  2. Research projects can be collaborative with patients, and data sharing technologies, such as Blue Button, can be used so that patients themselves can directly share their data. This is sometimes called patient-partnered research, and for new research projects it is by far the preferred approach. For this to work, healthcare providers must make it easier for patients to share their data using Blue Button and other data sharing technologies.
  3. We can improve the data ecosystems that link together local cancer information collected by different projects and support federated learning. This way cancer data that cannot be easily shared with others can stay within the security and compliance boundaries of the organization that provides healthcare services, but be shared effectively with the broader research community to accelerate research. This might be a good project for the proposed ARPA-Health (ARPA-H).
  4. We can reduce the liability and fines associated with the inadvertent disclosure of health information, and create insurance pools to pay the fines, in order to protect research organizations whose health information is exposed even though they followed best practices to protect data.
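
The federated learning idea in item 3 can be sketched in a few lines. In this toy example, each site runs a few gradient steps on its own data for a one-parameter model y ≈ w * x and sends back only the updated parameter and its record count; a coordinator averages the parameters weighted by record count (federated averaging). The function names, the model, and the data are assumptions for illustration, not a description of any particular cancer data ecosystem.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient steps on one site's private data for y ≈ w * x.

    Only the updated parameter and the record count leave the site.
    """
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w, len(data)

def federated_average(updates):
    """Average site parameters, weighting each by its number of records."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

def train(sites, rounds=20):
    """Alternate local training at each site with central averaging."""
    w = 0.0
    for _ in range(rounds):
        updates = [local_update(w, data) for data in sites]
        w = federated_average(updates)
    return w

# Hypothetical data held at two sites; every point satisfies y = 2 * x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0), (0.5, 1.0)],
]

print(train(sites))
```

The raw (x, y) records never leave the `local_update` call for their site, which is the property that lets data stay inside an organization's security and compliance boundary. A production system would also need to consider what the shared parameters themselves can reveal.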

I’ll be returning to these four topics from time to time in future posts.

Disclaimers

I’m actively involved with both the BloodPAC Consortium and the NCI Genomic Data Commons.

References

[1] National Cancer Institute, Cancer Moonshot, retrieved from: https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative on June 1, 2021.

[2] The White House, President Barack Obama, Join the Vice President’s Cancer Moonshot, retrieved from https://obamawhitehouse.archives.gov/cancermoonshot on June 1, 2021.

[3] NCI Cancer Moonshot Public Access and Data Sharing Policy, retrieved from https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/funding/public-access-policy on June 1, 2021.

[4] Samuel L. Volchenboum, Suzanne M. Cox, Allison Heath, Adam Resnick, Susan L. Cohn, and Robert Grossman, Data Commons to Support Pediatric Cancer Research, American Society of Clinical Oncology Educational Book 2017:37, 746-752.

[5] Robert L. Grossman, Jonathan R. Dry, Sean E. Hanlon, Donald J. Johann, Anand Kolatkar, Jerry SH Lee, Christopher Meyer, Lea Salvatore, Walt Wells, and Lauren Leiman. “BloodPAC Data Commons for liquid biopsy data.” JCO Clinical Cancer Informatics 5 (2021): 479-486.

[6] Frisby, Tammy M., and Jorge L. Contreras. “The National Cancer Institute Cancer Moonshot Public Access and Data Sharing Policy—Initial assessment and implications.” Data & Policy 2 (2020).

[7] Luc Rocher, Julien M. Hendrickx, and Yves-Alexandre De Montjoye. “Estimating the success of re-identifications in incomplete datasets using generative models.” Nature Communications 10, no. 1 (2019): 1-9.
