
In December 2023, the United States Food and Drug Administration (FDA) published guidance entitled “Data Standards for Drug and Biological Product Submissions Containing Real-World Data:
Guidance for Industry”. The guidance provides recommendations to sponsors for complying with section 745A(a) of the FD&C Act (21 U.S.C. 379k-1(a)) when submitting RWD as study data in
applicable drug submissions. This document outlines Guardian Research Network’s (GRN) approach to implementing the non-binding recommendations set forth in the guidance.
Section III.A outlines several challenges of using Real-World Data (RWD) sources for inclusion in drug submissions. GRN’s approach to each of these challenges is described below:
Challenge | GRN Approach |
---|---|
The variety of RWD sources and their inconsistent formats (e.g., EHR, registry) | GRN addresses the variability in Real-World Data (RWD) sources by employing a comprehensive data harmonization strategy. This involves standardizing data from multiple Electronic Health Record (EHR) systems and registries into a unified data model, ensuring consistency across different formats. To achieve this, GRN integrates standardized ontologies and widely recognized coding systems such as Intelligent Medical Objects (IMO), International Classification of Diseases (ICD), RxNorm, and Logical Observation Identifiers Names and Codes (LOINC). These standardized frameworks allow for accurate categorization and alignment of data elements across sources. Furthermore, GRN implements rigorous source validation processes to ensure that the harmonized data is accurate, reliable, and ready for comprehensive analysis, supporting high-quality research outcomes. |
The potential use of more than one type of RWD source in a study (such as combining EHR and claims data) | GRN’s standard data can be tokenized to allow for the combination of EHR and claims data and to allow for deduplication of records across multiple data sources. |
The differences in source data captured regionally and globally using different standards, terminologies, and exchange formats for the representation of the same or similar data elements | GRN’s database primarily consists of EHR data from US-based health systems, which reduces the challenges posed by regional and global differences in standards, terminologies, and exchange formats. Despite this, GRN remains committed to ensuring data consistency and reliability through robust data harmonization efforts. By utilizing standardized ontologies and common coding practices, GRN aligns data elements across sources to create a cohesive and integrated dataset. Additionally, our rigorous source validation processes ensure that the data is accurate and dependable, enabling high-quality research outcomes and facilitating future integration with other datasets, both domestically and internationally. |
Certain information only existing in non-structured documentation (such as text in physician notes) | GRN addresses the challenge of unstructured data, such as text in physician notes, by employing trained clinician curators who specialize in extracting relevant information from these sources. To ensure consistency and accuracy, each Real-World Data (RWD) study is guided by custom curation protocols that outline specific instructions and rules for curating each data element. This structured approach allows GRN to systematically convert unstructured documentation into valuable, standardized data that can be reliably used in research and analysis. |
A wide range of methods and algorithms which could be used to create datasets intended to aggregate data | GRN recognizes the complexity introduced by the wide range of methods and algorithms available for aggregating data into comprehensive datasets. To address this challenge, GRN adopts a standardized yet flexible approach to data aggregation. By leveraging best practices and proven methodologies, GRN ensures that data is consistently processed and integrated, regardless of the specific methods or algorithms employed. If a sponsor intends for GRN to use a specific algorithm to create a dataset, GRN will collaborate closely with the sponsor to identify potential risks associated with the algorithm and develop tailored plans to mitigate these risks. This approach allows for the selection of the most appropriate techniques for each unique dataset while maintaining consistency, reliability, and safety across the aggregated data. Additionally, GRN implements rigorous validation processes to verify the integrity and accuracy of the aggregated datasets, ensuring they meet the highest standards for research and analysis. |
The many aspects of health care data that can affect the overall quality of the data, including business processes and database structure, inconsistent vocabularies and coding systems, and de-identification methodologies used to protect patient data when shared | GRN standardizes its databases to consistent ontologies, ensuring uniformity and accuracy across all data elements. Specifically, GRN standardizes data to widely accepted coding systems such as ICD-10-CM, LOINC, and RxNorm, which helps maintain consistency and reliability in the data. Additionally, GRN takes rigorous measures to protect patient privacy by de-identifying datasets before they are submitted to the sponsor. This approach not only enhances the quality and integrity of the data but also ensures that it meets the highest standards for compliance and confidentiality. |
The various levels of access a sponsor has to any data sources used in the study | The sponsor has access only to the de-identified dataset, the data dictionary, and any study-specific documentation that outlines data preparation methodologies. |
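
The table above mentions tokenizing records so that EHR and claims data can be combined and deduplicated. The following is a minimal sketch of that general idea, not GRN's actual method, which this document does not describe. The keyed-hash token, the `SECRET_KEY` value, the record field names, and the helper functions are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice tokenization is usually handled by a
# dedicated service, and the key is never stored alongside the data.
SECRET_KEY = b"example-key-not-for-production"


def make_token(first_name: str, last_name: str, dob: str) -> str:
    """Derive a deterministic token from normalized identifiers (illustrative)."""
    normalized = "|".join(p.strip().lower() for p in (first_name, last_name, dob))
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_by_token(ehr_records, claims_records):
    """Group EHR and claims records under one entry per token (one per patient)."""
    linked = {}
    for source, records in (("ehr", ehr_records), ("claims", claims_records)):
        for rec in records:
            token = make_token(rec["first_name"], rec["last_name"], rec["dob"])
            entry = linked.setdefault(token, {"ehr": [], "claims": []})
            entry[source].append(rec["payload"])
    return linked


# Made-up example records; the same patient appears with inconsistent formatting.
ehr = [
    {"first_name": "Ana", "last_name": "Diaz", "dob": "1970-01-02", "payload": "ehr note 1"},
    {"first_name": " ana", "last_name": "DIAZ ", "dob": "1970-01-02", "payload": "ehr note 2"},
]
claims = [
    {"first_name": "Ana", "last_name": "Diaz", "dob": "1970-01-02", "payload": "claim 001"},
    {"first_name": "Bo", "last_name": "Lee", "dob": "1985-06-30", "payload": "claim 002"},
]
linked = link_by_token(ehr, claims)
```

Because the token is computed from normalized identifiers, the two differently formatted EHR entries for the same person fall under a single token, which is what makes cross-source deduplication possible without sharing the identifiers themselves.
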
Section III.B indicates that documentation of data curation and transformation processes should be in place. This documentation may include, but is not limited to, electronic documentation (e.g., audit trails and quality control procedures) of data additions, deletions, or alterations from the source data system to the final study analytic data set(s). For data curation, GRN uses a proprietary data collection system, compliant with FDA 21 CFR Part 11, to gather unstructured data from patient progress notes, pathology reports, and various Electronic Health Record (EHR) documents. The data collection system maintains an audit trail that records the user(s) who entered data as well as any deletions and alterations of curated data. Data transformations are performed to produce a de-identified data set, for example by anonymizing dates to maintain patient privacy. Data transformations are documented and auditable.
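
As an illustration of a documented, auditable date transformation, the sketch below applies a per-patient date shift and writes one audit entry per change. Date shifting is one common de-identification technique; this document does not state which method GRN uses. The function names, the salt, the record fields, and the audit entry layout are hypothetical.

```python
import hashlib
from datetime import date, timedelta


def patient_offset_days(patient_id: str, salt: str, max_days: int = 30) -> int:
    """Deterministic per-patient offset in the range [-max_days, max_days]."""
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % (2 * max_days + 1) - max_days


def shift_dates(records, salt, audit_log):
    """Shift each record's event_date by its patient's offset and log the change."""
    shifted = []
    for rec in records:
        offset = patient_offset_days(rec["patient_id"], salt)
        new_rec = dict(rec)  # copy so the source record is left untouched
        new_rec["event_date"] = rec["event_date"] + timedelta(days=offset)
        shifted.append(new_rec)
        # The audit entry records what was done, not the offset itself.
        audit_log.append(
            {"patient_id": rec["patient_id"], "field": "event_date", "action": "date_shift"}
        )
    return shifted


# Made-up example: two events for the same patient, 30 days apart.
records = [
    {"patient_id": "P1", "event_date": date(2024, 1, 10)},
    {"patient_id": "P1", "event_date": date(2024, 2, 9)},
]
audit_log = []
shifted = shift_dates(records, "study-salt", audit_log)
```

Using the same offset for every date belonging to a patient preserves the intervals between that patient's events, which is usually what matters for analysis.
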
Section III.C indicates that absent a waiver, sponsors submitting clinical and nonclinical study data (including those derived from RWD sources) in submissions subject to section 745A(a) of the FD&C Act are required to use the formats described in the Study Data Guidance and the supported study data standards listed in the Data Standards Catalog. GRN supports the conformance of RWD to Clinical Data Interchange Standards Consortium (CDISC) standards. Once a sponsor indicates that a dataset will be submitted to the FDA that may require conformance to FDA Data Standards, GRN will either transform the RWD dataset to CDISC standards independently or will assist the sponsor in transforming the data. GRN considers the sponsor responsible for selecting the appropriate FDA Data Standard(s) for the submission. In addition to transforming the data, GRN implements rigorous Quality Control (QC) and validation processes to ensure that the conformed datasets meet the required standards. These QC measures involve systematic checks to verify the accuracy, completeness, and consistency of the data.
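
The kind of systematic check described above can be sketched as follows. This is a simplified illustration and not GRN's QC procedure: the variable names follow the CDISC SDTM demographics (DM) domain, while the set of allowed `SEX` values is an illustrative subset and the `qc_check` function is hypothetical. Real conformance checks should be driven by the applicable standard and its controlled terminology.

```python
REQUIRED_VARIABLES = ("STUDYID", "DOMAIN", "USUBJID", "SEX")
ALLOWED_SEX = {"M", "F", "U"}  # illustrative subset, not the full codelist


def qc_check(rows):
    """Return a list of (row_index, variable, issue) findings."""
    findings = []
    seen_subjects = set()
    for i, row in enumerate(rows):
        # Completeness: every required variable must have a non-empty value.
        for var in REQUIRED_VARIABLES:
            if not row.get(var):
                findings.append((i, var, "missing value"))
        # Accuracy: coded values must come from the allowed terms.
        sex = row.get("SEX")
        if sex and sex not in ALLOWED_SEX:
            findings.append((i, "SEX", "value not in allowed terms"))
        # Consistency: each subject should appear only once in this domain.
        usubjid = row.get("USUBJID")
        if usubjid:
            if usubjid in seen_subjects:
                findings.append((i, "USUBJID", "duplicate subject"))
            seen_subjects.add(usubjid)
    return findings


# Made-up rows containing one bad code, one duplicate, and one missing value.
rows = [
    {"STUDYID": "S1", "DOMAIN": "DM", "USUBJID": "S1-001", "SEX": "F"},
    {"STUDYID": "S1", "DOMAIN": "DM", "USUBJID": "S1-001", "SEX": "X"},
    {"STUDYID": "S1", "DOMAIN": "DM", "USUBJID": "", "SEX": "M"},
]
findings = qc_check(rows)
```
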
Section III.D presents the consideration that terminologies in RWD may have differences in the definition between RWD sources and FDA-supported data standards. This section indicates that the sponsor should select an appropriate mapping approach that best fits the characteristics of the data and the nature of the study. When a sponsor requires a GRN dataset to conform to FDA data standards, the data dictionary for the non-conformed dataset is updated to include columns that describe the mapping. The data dictionary updates include the following details on conformance. A sponsor may request additional details be included based on internal preference or feedback from FDA.
Section III.E outlines challenges of transforming RWD into data consistent with FDA-supported data standards. When a challenge arises during transformation of RWD to FDA-supported standards, GRN will first inform the sponsor of the challenge and discuss options for resolution. Once GRN and the sponsor have agreed on a resolution, GRN will document the challenge and provide a justification of the chosen approach. Documentation of transformations will be shared with the sponsor using Define-XML and/or as a narrative description that the sponsor can incorporate into its Study Data Reviewer’s Guide.
© 2025 Guardian Research Network. All Rights Reserved.