City of Hope’s Harmonized, Findable, Accessible Precision Medicine Data Platform
By Allison Proffitt
September 14, 2023 | INNOVATIVE PRACTICES AWARDS—When City of Hope wanted to better use its primary and secondary clinical research data and harmonize files on more than 700,000 patients so they were findable and accessible, the hospital built a highly scalable, compliant cloud-based platform. The goal was to empower investigators across the City of Hope network of more than 35 hospitals and treatment centers with access to multi-omic, precision medicine data. SciBite, one of City of Hope’s partners in this effort, nominated the platform for a 2023 Bio-IT World Innovative Practices Award. Bio-IT World’s panel of peer judges concurred.
City of Hope has captured a wealth of data through its precision medicine program, including patient data captured during diagnosis, comprehensive molecular characterization of the disease and ongoing treatment, data from disease-specific research registries, and other sources. City of Hope wanted to aggregate and harmonize those data and built POSEIDON (Precision Oncology Software Environment Interoperable Data Ontologies Network) to serve as the conduit between the dataset and its varied users.
“POSEIDON is an oncology learning platform that supports our efforts in research as well as clinical decision support, genomic data analysis, data visualization,” explained Samir Courdy, Senior Vice President, City of Hope, in his award address at the Bio-IT World Conference & Expo.
He described the platform as having two components: one captures retrospective data for training AI and machine learning models. “One part of this oncology learning platform is to basically do the data engineering, data cleansing, and getting that data ready for deriving insights from,” he said. The second component of POSEIDON is real time, he added, “where we are doing predictive models to intervene at the right time to predict adverse events [and] surgical complications.”
POSEIDON’s data lake is built on the DNAnexus technology stack and Amazon Web Services with custom features and functionality created by the City of Hope Research Informatics group. “We knew all along we needed an ontology, interoperability layer on top of the data lake. That’s where SciBite came in,” Courdy explained. The ontology layer supports data harmonization. Data standards are managed with CENtree, SciBite’s ontology management platform, and normalization is provided by SciBite’s named entity recognition engine, TERMite.
“You hear an awful lot here about FAIR data and different applications and systems that are providing FAIR environments,” said Neal Dunkinson, VP Solutions & Professional Services at SciBite, in the joint presentation. “At SciBite, we start with the data; we have a data-centric view. We believe in generating better-quality and better-described data that then will perform and deliver more value to our customers.”
SciBite has an ontology-management focus, Dunkinson said. “You’ve got retrospective data that exists. Often it’s static datasets, but it’s unusable if it’s not described or articulated or annotated properly.” SciBite helps transform legacy EHR data, clinical trials data, and data from external sources like publications and databases for utility now and in the future, he explained.
The award entry outlined how administrators use SciBite’s CENtree to maintain a single source of truth for data standards across data domains, such as medications. TERMite operationalizes these standards by normalizing the various data sources to the vocabularies managed within CENtree.
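The general idea of normalizing free text to a managed vocabulary can be illustrated with a minimal Python sketch. It is not SciBite's implementation and does not use the CENtree or TERMite APIs. The vocabulary entries, the identifiers (`MED:0001`, `MED:0002`), and the `normalize` function are hypothetical and chosen only for illustration. The sketch assumes simple case-insensitive dictionary matching, which is far cruder than a production named entity recognition engine.

```python
import re

# Hypothetical controlled vocabulary: concept ID -> preferred label and synonyms.
# Illustrative entries only; not taken from CENtree or any real ontology release.
VOCAB = {
    "MED:0001": {"label": "tamoxifen", "synonyms": ["nolvadex", "tamoxifen citrate"]},
    "MED:0002": {"label": "trastuzumab", "synonyms": ["herceptin"]},
}


def build_lookup(vocab):
    """Map every lowercased label and synonym to its concept ID."""
    lookup = {}
    for concept_id, entry in vocab.items():
        for term in [entry["label"], *entry["synonyms"]]:
            lookup[term.lower()] = concept_id
    return lookup


def normalize(text, vocab):
    """Return (matched text, concept ID, preferred label) for each term found."""
    lookup = build_lookup(vocab)
    # Try longer terms first so "tamoxifen citrate" wins over "tamoxifen".
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    results = []
    for match in pattern.finditer(text):
        concept_id = lookup[match.group(0).lower()]
        results.append((match.group(0), concept_id, vocab[concept_id]["label"]))
    return results


if __name__ == "__main__":
    note = "Patient started Herceptin, then Tamoxifen Citrate."
    for found, concept_id, label in normalize(note, VOCAB):
        print(f"{found!r} -> {concept_id} ({label})")
```

Because every source is mapped to the same concept identifiers, records that mention a brand name and records that mention the generic name end up under one entry, which is the property a single source of truth is meant to provide.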
On top of this data harmonization, a multi-step process was created to capture and structure multiple data types, including imaging metadata, into the POSEIDON Common Data Model (PCDM). Natural language processing (NLP) tools (Courdy mentioned tools from Linguamatics specifically) are deployed to automatically extract and structure valuable data elements from unstructured documents, including pathology reports and clinical notes. NLP-augmented software tools were developed to help manual data abstractors capture more complex terms and disease-specific data elements, which can include disease progression, progression-free survival, and other outcomes, the authors write in their entry.
The de-identified and harmonized clinical and multi-omic data can be analyzed and visualized by researchers, supporting cohort discovery and exploration as well as preliminary feasibility testing to derive patient-specific insights from real-world data (RWD) and real-world evidence (RWE).
POSEIDON enables investigators to access and visualize data from clinical and multi-omics sources and provides an engine that can be used for cohort discovery and exploration, preliminary feasibility testing, and deriving patient-specific insights based on RWD and RWE, delivering real-world insights (RWI). Patients are consented through an IRB-approved protocol with active, opt-in participation. Once an IRB protocol is approved, access to identifiable patient data can be granted for further studies, Courdy explained.
Since POSEIDON launched, the platform has helped place patients into clinical trials, supported researchers and clinicians in the generation of patient-specific treatment plans, and underpinned ongoing translational research projects at City of Hope. Courdy also noted that City of Hope plans to offer pharma and other users licensed access to the platform to explore de-identified patient data.
The types of analyses users can perform on the platform include RNAseq workflow pipelines, various ML and natural language processing workflows, patient stratification, clinical trial recruitment, genotype and phenotype cohort exploration, and more. “This demonstrates the capabilities that are available on the platform. If you have the right talent and the right skillset within your lab, you can bring that to bear,” he said.