The Evolution of the CDISC Open Rules Engine (CORE) and the CDISC Open Source Alliance (COSA)

How Open-Source Innovation Is Reshaping Standards Compliance in Clinical Research

Introduction

For more than two decades, the clinical research industry has relied on CDISC standards to bring consistency, quality, and regulatory compliance to clinical trial data. A persistent challenge has remained, however: the interpretation and enforcement of conformance rules have historically been fragmented, with each organization building its own executable implementation from human-readable rule specifications. This inconsistency has led to discrepancies at the point of regulatory submission and has complicated data exchange between sponsors, CROs, and regulatory agencies.

In response to this longstanding industry need, CDISC launched two interconnected initiatives that are now fundamentally reshaping how the clinical programming community approaches standards compliance: the CDISC Open Rules Engine (CORE) and the CDISC Open Source Alliance (COSA). Together, these initiatives represent a paradigm shift from proprietary, siloed validation approaches toward a transparent, community-driven, open-source ecosystem.

This article examines the evolution of CORE, the structure and mission of COSA, and the growing portfolio of open-source tools emerging from this collaboration. For statistical programmers, these developments carry significant practical implications for day-to-day work in SDTM/ADaM validation, regulatory submission preparation, and end-to-end clinical data pipeline automation.

The CDISC Open Rules Engine (CORE)

The Problem CORE Was Built to Solve

Before CORE, CDISC Foundational Standards teams authored conformance rule specifications in human-readable form. Implementers across the industry — CROs, pharmaceutical companies, software vendors, and regulatory agencies — were each responsible for transforming these specifications into their own executable rules. The inherent ambiguity in this translation process meant that different organizations often interpreted the same rule differently, producing conflicting validation results against the same datasets.

This created a significant operational burden. Sponsors and CROs spent considerable time reconciling validation discrepancies, and the lack of a single, authoritative executable rule set made it difficult to establish a consistent threshold for standards conformance across the industry. The problem was particularly acute during regulatory submissions to FDA, EMA, and PMDA, where inconsistent rule interpretation could trigger review questions or delays.

CORE was conceived to address this gap head-on by delivering two key components: a governed set of unambiguous, machine-executable Conformance Rules for each CDISC Foundational Standard, and an open-source reference implementation of a rules execution engine that would serve as the authoritative benchmark.

CORE Architecture and Components

The CORE ecosystem comprises two primary software components and a critical metadata layer:

The Conformance Rules Engine is the execution engine that runs machine-executable rules against clinical datasets and produces validation reports. Written in Python, it builds on the open-source business-rules library originally developed at Venmo and provides a command-line interface (CLI) that runs on Windows, macOS, and Linux. The Engine retrieves conformance rules from the CDISC Library via its API and supports multiple input formats, including SAS v5 XPT, Dataset-JSON, and CSV. As of early 2025, it is also available as a Python package on PyPI (cdisc-rules-engine), making it straightforward to integrate into custom Python-based pipelines.

The Conformance Rules Editor is a web-based authoring tool built in TypeScript on the VS Code framework. It provides an interactive development environment for creating, testing, and publishing conformance rules in YAML format. Features include code completion, real-time syntax checking against the Conformance Rule Schema, and integrated testing with sample datasets. The Rule Editor is deployed on CDISC's Azure platform for volunteer rule authors and is also available as open source on GitHub.

The Conformance Rules themselves are metadata stored in the CDISC Library alongside other standards metadata. They exist in two expressions: the human-readable specification and the machine-executable YAML format. Once authored and tested, they are published in the CDISC Library and accessible via the Library API for any rule engine to consume. This design ensures that CORE rules are not locked into any single execution platform.

Timeline of Key Milestones

Period	Milestone
Late 2021	CORE Program announced at CDISC EU Interchange; Call for Volunteers webinar launched
Early 2022	Microsoft engaged as key collaborator; Sprint 0 of agile-scrum Engine development begins; YAML schema established for executable rules
April 2022	MVP Engine deployed to CDISC Azure cloud; initial SDTMIG 3.4 conformance rules published in CDISC Library
June 2022	Azure Marketplace deployment; Volunteer Onboarding Training webinar held
Summer 2022	Transition from traditional CDISC project to open-source: Engine released on GitHub under MIT license and registered with COSA; CORE Roadmap Board and Technical Committee established
2023	Iterative CLI releases; Formedix releases first free desktop deployment with custom UI; rule authoring continues for SDTM, SEND, and Define cross-checks
2024	Versions 0.8.0 and 0.8.1 released; expanding rule coverage across foundational standards; growing vendor adoption
2025	Rapid release cadence (v0.9.0 through v0.14.1); USDM JSON Schema validation support added; TIG support; Dask processing for large datasets; available as PyPI package for direct Python integration; SAS integration via PROC FCMP demonstrated

Running CORE: Practical Considerations for Statistical Programmers

For the statistical programming community, CORE offers multiple deployment paths. The simplest approach is downloading the pre-compiled CLI from the GitHub releases page for the target operating system. Users need a CDISC Library API key (available free of charge through a Library account) to retrieve rules and metadata, which the engine caches locally as Python pickle files. Once the cache is populated, validation can run against local datasets with a single command specifying the standard and version.
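As a rough sketch of that single-command workflow, the Python snippet below assembles a validate call for the CLI, which could then be handed to subprocess.run. The executable path, dataset folder, and report name are placeholders, and the flag spellings (-s, -v, -d, -of, -o) are assumptions based on the engine's README at the time of writing; confirm them against the --help output of the release you downloaded.

```python
import subprocess


def build_validate_command(core_exe, standard, version, data_dir,
                           output_stem="core_report", output_format="XLSX"):
    """Assemble the argument list for a CORE CLI validation run.

    Flag names are assumptions taken from the project README; confirm them
    with the --help output of the release in use.
    """
    return [
        str(core_exe), "validate",
        "-s", standard,        # standard, e.g. "sdtmig"
        "-v", version,         # version, e.g. "3-4"
        "-d", str(data_dir),   # folder holding the datasets
        "-of", output_format,  # report format
        "-o", output_stem,     # report file name without extension
    ]


def run_validation(command):
    """Run the CLI and return its exit code (needs a local CORE install)."""
    return subprocess.run(command, check=False).returncode


# Placeholder paths; nothing is executed until run_validation(cmd) is called.
cmd = build_validate_command("./core", "sdtmig", "3-4", "data/sdtm")
print(" ".join(cmd))
```

Keeping command construction separate from execution makes the call easy to log and review before it is wired into a scheduled job.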

For organizations that require deeper integration, the engine is available as a Python package on PyPI. This allows programmers to import the rules engine library directly into custom Python environments and run rules against data programmatically, including against pandas DataFrames, without requiring data to be in XPT format. This flexibility opens up possibilities for embedding CORE validation directly into data processing pipelines.

SAS programmers are not left behind. As demonstrated at PharmaSUG 2025, the CORE engine can be invoked from within Base SAS using PROC FCMP Python objects, which embed and execute Python functions within SAS programs. While this approach requires some infrastructure setup (a compatible Python installation accessible to SAS, plus environment variable configuration), it offers a path for SAS-centric organizations to adopt CORE without abandoning their existing toolsets.

The engine supports parallel processing through a configurable pool size parameter, and for very large datasets, it can leverage Dask for distributed computation. The DATASET_SIZE_THRESHOLD environment variable controls when Dask kicks in, defaulting to one-quarter of available RAM. Validation output is available in both JSON and Excel formats, with the Excel report providing a familiar interface for review and remediation workflows.
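The threshold behaviour described above can be pictured with a small sketch. This is not the engine's source code: the function name, the backend labels, and the reading of DATASET_SIZE_THRESHOLD as a byte count are assumptions made for illustration only.

```python
import os


def choose_backend(dataset_bytes, available_ram_bytes, env=None):
    """Return "dask" when a dataset exceeds the size threshold, else "in-memory".

    The threshold is DATASET_SIZE_THRESHOLD when set (treated here as bytes,
    which is an assumption), otherwise one-quarter of available RAM.
    """
    env = os.environ if env is None else env
    raw = env.get("DATASET_SIZE_THRESHOLD")
    threshold = int(raw) if raw else available_ram_bytes // 4
    return "dask" if dataset_bytes > threshold else "in-memory"


GIB = 1024 ** 3
# With 16 GiB of free RAM the default threshold works out to 4 GiB.
small = choose_backend(1 * GIB, 16 * GIB, env={})
large = choose_backend(6 * GIB, 16 * GIB, env={})
# Setting the variable to 0 pushes every non-empty dataset onto Dask.
forced = choose_backend(1 * GIB, 16 * GIB, env={"DATASET_SIZE_THRESHOLD": "0"})
print(small, large, forced)
```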

Custom Rules and Extensibility

One of CORE's most powerful capabilities is support for custom, sponsor-defined rules. While the CDISC-governed rule set covers standard conformance checks, organizations routinely need additional validation logic for internal data cleaning, vendor-specific data transfers, or study-specific requirements that go beyond the published standards.

Using the Rule Editor, programmers can author custom rules in the same YAML format used by CDISC-governed rules, test them against sample data, and maintain a local rule library. The CORE engine is designed to execute both CDISC Library rules and locally maintained custom rules, providing a unified validation framework. This extensibility has been demonstrated for use cases including validation of non-CDISC data transfers from external vendors and creation of data listing-style outputs.
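To make the idea of rules as data concrete, the sketch below expresses a sponsor-defined check as a plain Python dictionary, much as a YAML file would be loaded, and evaluates it with a tiny interpreter. The rule layout, operator names, rule ID, and sample records are all hypothetical; the real CORE schema defines its own keys and operators, which should be taken from the Conformance Rule Schema.

```python
# Hypothetical rule layout for illustration only; it is not the CORE schema.
custom_rule = {
    "id": "SPONSOR.AE.001",
    "domain": "AE",
    "message": "AESEV is missing for a serious adverse event",
    "all": [
        {"variable": "AESER", "operator": "equal_to", "value": "Y"},
        {"variable": "AESEV", "operator": "empty"},
    ],
}

# Each operator receives the cell value and the rule's comparison value.
OPERATORS = {
    "equal_to": lambda cell, value: cell == value,
    "empty": lambda cell, value: cell in (None, ""),
}


def find_issues(rule, records):
    """Return one issue per record for which every condition holds."""
    issues = []
    for row_number, record in enumerate(records, start=1):
        if all(
            OPERATORS[c["operator"]](record.get(c["variable"]), c.get("value"))
            for c in rule["all"]
        ):
            issues.append({"rule": rule["id"], "row": row_number,
                           "message": rule["message"]})
    return issues


# Made-up AE records, not real study data.
ae = [
    {"USUBJID": "001", "AESER": "Y", "AESEV": "SEVERE"},
    {"USUBJID": "002", "AESER": "Y", "AESEV": ""},
    {"USUBJID": "003", "AESER": "N", "AESEV": ""},
]
issues = find_issues(custom_rule, ae)
print(issues)
```

Because the rule is data rather than code, the same interpreter can run governed and locally maintained rules side by side, which is the design property the unified framework relies on.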

The Certification Program and Vendor Ecosystem

CORE is the reference implementation: its interpretation and execution of each rule have been confirmed to match the intent established by the CDISC standards development teams. CDISC is developing a certification program that will allow proprietary rule engines to verify they produce the same results as CORE for the governed rule set. This approach opens the conformance rule engine market to alternative solutions while maintaining a common standard for correctness. Several vendors, including Formedix and SGS, have already developed desktop applications and workflow integrations around the CORE engine.
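One way to picture what "the same results" could mean in practice is a set comparison of findings from two engines run on the same data. The sketch below is purely illustrative: the finding fields, rule IDs, and comparison logic are assumptions, and nothing here describes how the CDISC certification program actually evaluates an engine.

```python
def normalize(findings):
    """Reduce findings to an order-independent set of comparable keys."""
    return {(f["rule"], f["dataset"], f["row"], f["variable"]) for f in findings}


def compare_engines(reference, candidate):
    """Return findings the candidate missed and findings it added."""
    ref, cand = normalize(reference), normalize(candidate)
    return {"missing": sorted(ref - cand), "extra": sorted(cand - ref)}


# Hypothetical findings from a reference run and a candidate engine.
reference_run = [
    {"rule": "RULE-A", "dataset": "AE", "row": 2, "variable": "AESEV"},
    {"rule": "RULE-B", "dataset": "DM", "row": 5, "variable": "RFSTDTC"},
]
candidate_run = [
    {"rule": "RULE-B", "dataset": "DM", "row": 5, "variable": "RFSTDTC"},
    {"rule": "RULE-C", "dataset": "VS", "row": 1, "variable": "VSORRES"},
]
diff = compare_engines(reference_run, candidate_run)
equivalent = not diff["missing"] and not diff["extra"]
print(diff, equivalent)
```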

The CDISC Open Source Alliance (COSA)

Mission and Governance

The CDISC Open Source Alliance (COSA) was established by CDISC to drive innovative approaches to standards-based automation through open-source software. COSA supports, promotes, and in some cases sponsors open-source projects that create tools for implementing or developing CDISC standards. It serves as both a quality seal and a community hub, providing visibility, structured governance, and cross-project collaboration opportunities for open-source developers in the clinical research space.

COSA is directed by a Governance Board that evaluates projects for inclusion in the COSA Repository Directory, sets project inclusion criteria, and determines what committees are needed to lead COSA activities. The Governance Board ensures that listed projects meet defined standards for openness, documentation, and relevance to the CDISC ecosystem.

COSA activities extend well beyond directory management. The alliance organizes regular communications, quarterly spotlight webinars, conference sessions, and hackathons — all designed to foster community engagement and accelerate tool development. COSA also actively seeks collaboration with parallel open-source initiatives in the pharmaceutical space, including pharmaverse, openpharma, and the PHUSE Data Visualization and Open Source Technology working group.

The COSA Repository Directory

The COSA Repository Directory is the central registry of officially recognized open-source projects focused on implementing or developing CDISC standards. Projects must meet specific inclusion criteria to be listed, and the directory serves as a trusted resource for organizations evaluating open-source tools for their clinical data operations. The directory is accessible online at cosa.cdisc.org and also maintained as a GitHub repository under the cdisc-org organization.

The breadth of projects in the COSA directory reflects the full lifecycle of clinical data management, from study design and data collection through dataset creation, validation, analysis, and regulatory submission. Below is an overview of the major COSA-registered projects and their relevance to statistical programmers.

Key COSA Projects and Tools

Project	Language	Description & Relevance
CORE Engine	Python	The reference implementation for executing CDISC Conformance Rules. Validates SDTM, ADaM, SEND, and Define-XML against governed rule sets. The foundational COSA project.
Conformance Rules Editor	TypeScript	Web-based IDE for authoring, testing, and publishing conformance rules in YAML format. Used by CDISC volunteers and available for custom rule development.
{admiral}	R	Modularized toolbox for building ADaM datasets collaboratively in R. Started by Roche and GSK, now with broad industry contribution. A cornerstone of the pharmaverse ecosystem.
{sdtm.oak}	R	EDC-agnostic, modular framework for SDTM programming in R. Developed at Roche/Genentech, now used across all their studies. Enables standards-based SDTM automation.
Dataset-JSON Tools	Multiple	Collection of tools for reading and writing CDISC Dataset-JSON, the modern data exchange format designed for regulatory submissions and API-based exchange. Includes R packages and viewer tools.
TFL Designer	Web/SaaS	Ingests CDISC Library content to design study-specific TFL shells while producing machine-readable metadata aligned with CDISC Analysis Results Standards (ARS). Supports metadata-driven automation of ADaM and TFL generation.
OpenStudyBuilder	Multiple	Open-source metadata and study definition repository with a graph database, UI, and APIs. Developed by Novo Nordisk. Drives end-to-end consistency from protocol through submission.
Visual Define-XML Editor	Desktop	WYSIWYG editor for creating and reviewing Define-XML documents. Supports development of dataset specifications as Define-XML from the start of the dataset development process.
Smart Submission Dataset Viewer	Java	Viewer for inspecting SDTM, SEND, and ADaM submission files in Dataset-XML format. Can generate Dataset-XML from SAS XPT on the fly.
Tplyr	R	A grammar of data format and summary for building clinical safety summary tables. Provides a layered approach to constructing outputs ready for clinical reports.
R4DSXML / datasetjson	R	R packages for importing CDISC Dataset-XML, Define-XML, and Dataset-JSON as R data frames. Essential for R-based workflows dealing with CDISC data exchange formats.

COSA Hackathons: Accelerating Innovation

A distinctive feature of the COSA model is its use of hackathons to rapidly advance tool development and community engagement. The Dataset-JSON Hackathon series is the most prominent example. The first Dataset-JSON Hackathon in 2022 produced 21 open-source solutions for working with the new data exchange format. A second hackathon in 2023 focused on creating a draft REST API specification for Dataset-JSON, while a third hackathon addressed Dataset-JSON v1.1 viewer development.

These hackathons serve multiple purposes: they generate practical tools that immediately benefit the community, they stress-test emerging standards against real implementation scenarios, they build a contributor community around new standards, and they surface edge cases and improvement opportunities that feed back into the standards development process. For statistical programmers, hackathon outputs often represent the fastest path to working implementations of new CDISC standards in their preferred languages and environments.

The Quarterly Spotlight Series

COSA maintains community momentum through its quarterly Spotlight webinar series, where open-source developers present their tools, share implementation experiences, and demonstrate new capabilities. Recent spotlights have featured presentations on custom rule authoring with the CORE engine, SDTM automation using {sdtm.oak}, Dataset-JSON hackathon results, TFL Designer updates aligned with CDISC Analysis Results Standards, and OpenStudyBuilder integration with EDC systems. These webinars provide statistical programmers with direct access to the tool developers and practical demonstrations of open-source solutions they can adopt immediately.

Impact on the Statistical Programming Community

Standardized Validation Across the Industry

The most immediate impact of CORE is the establishment of a single, authoritative source of truth for conformance rule execution. When a sponsor, CRO, and regulatory agency all run the same governed rules through the same (or certified-equivalent) engine, the results are consistent and reproducible. This removes much of the reconciliation burden that has historically consumed significant programming hours during submission preparation.

For statistical programmers, this means validation results from CORE carry a level of authority that proprietary tool outputs have not previously had. The rules are developed and tested by the same CDISC standards teams that author the foundational standards, closing the gap between rule intent and rule execution.

Lowered Barriers to Quality Tools

The open-source nature of COSA projects means that even small organizations, academic research groups, and individual programmers can access enterprise-grade validation and data management tools at no cost. Prior to COSA, comprehensive CDISC validation software was typically available only through commercial vendors at significant license costs. The availability of CORE as a free, open-source tool democratizes access to standards-compliant validation.

Similarly, tools like {admiral} and {sdtm.oak} provide production-quality frameworks for ADaM and SDTM creation in R that would previously have required either significant internal development or commercial licenses. This levels the playing field and accelerates the adoption of R as a viable language for regulatory-grade clinical programming.

Integration into Modern Development Workflows

CORE's design as a CLI tool with PyPI package availability makes it naturally suited for integration into continuous integration and continuous deployment (CI/CD) pipelines. Organizations can configure automated nightly validation runs against their study datasets, catching conformance issues early and throughout the study lifecycle rather than only during submission preparation. This shift-left approach to quality is a significant evolution from traditional batch validation workflows.
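A nightly pipeline step of this kind usually ends with a gate that turns the validation report into a pass or fail signal. The sketch below shows one possible shape for that gate. The JSON layout (a top-level "issues" list) and the rule ID are hypothetical stand-ins rather than the engine's actual report structure, so the key names would need to be adapted to the real output.

```python
import json


def count_issues(report_text, key="issues"):
    """Count entries under the given key of a JSON report.

    The key name is a placeholder; adapt it to the real report layout.
    """
    report = json.loads(report_text)
    return len(report.get(key, []))


def ci_exit_code(issue_count, max_allowed=0):
    """Return 0 (pass) when issues are within tolerance, else 1 (fail)."""
    return 0 if issue_count <= max_allowed else 1


# Hypothetical report content standing in for a file written by the validator.
sample_report = json.dumps({
    "issues": [
        {"rule": "RULE-A", "dataset": "AE", "row": 2},
        {"rule": "RULE-A", "dataset": "AE", "row": 7},
    ]
})

n_issues = count_issues(sample_report)
strict = ci_exit_code(n_issues)                  # fails the build
lenient = ci_exit_code(n_issues, max_allowed=5)  # tolerated during early study phases
print(n_issues, strict, lenient)
# In a real CI script the chosen code would be passed to sys.exit().
```

A tolerance parameter lets teams tighten the gate gradually as a study approaches database lock instead of blocking every early build.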

The combination of CORE with other COSA tools creates the foundation for end-to-end metadata-driven pipelines: OpenStudyBuilder for study definition, {sdtm.oak} for SDTM creation, {admiral} for ADaM derivation, TFL Designer for analysis metadata, and CORE for continuous validation throughout. While no single organization has fully realized this vision yet, the building blocks are now available as open-source components that can be assembled and customized.

Looking Ahead

The CORE and COSA ecosystem continues to evolve rapidly. The CORE engine release cadence accelerated significantly in 2025, with version numbers climbing from 0.9.0 in December 2024 to 0.14.1 by December 2025, reflecting active development and expanding standards coverage. Key areas of near-term development include continued expansion of conformance rule coverage across the foundational standards (including ADaM and SEND), the CORE certification program for proprietary engines, broader validation support for the Unified Study Definitions Model (USDM), and AI-assisted rule authoring tools that allow non-programmers to create custom rules using natural language.

For the COSA ecosystem broadly, the integration between projects is deepening. The interplay between Dataset-JSON tools, OpenStudyBuilder, and standards like Biomedical Concepts and the Analysis Results Standard is creating an increasingly connected web of open-source capabilities that support the entire clinical data lifecycle.

Conclusion

The CDISC Open Rules Engine and the CDISC Open Source Alliance represent a fundamental shift in how the clinical research industry approaches standards compliance and tooling. By making conformance rule execution transparent, consistent, and freely available, CORE addresses a decades-old source of friction in the regulatory submission process. By providing a governed community and directory for open-source clinical data tools, COSA accelerates the availability of production-quality solutions that any organization can adopt.

For statistical programmers, these developments are not abstract infrastructure changes — they are practical tools that can be downloaded, installed, and integrated into daily workflows today. Whether you are running CORE validation from the command line, embedding it into a Python pipeline, calling it from SAS via PROC FCMP, or building ADaM datasets with {admiral}, the COSA ecosystem is delivering real, usable capabilities that reduce manual effort, improve data quality, and strengthen regulatory submissions.

The message for the statistical programming community is clear: open-source is no longer a fringe consideration for clinical data operations. It is rapidly becoming foundational infrastructure, and CORE and COSA are leading that transformation.

Key Resources

CDISC CORE: https://www.cdisc.org/core
CORE Engine GitHub: https://github.com/cdisc-org/cdisc-rules-engine
COSA Directory: https://cosa.cdisc.org
CDISC Library: https://www.cdisc.org/cdisc-library
Conformance Rules Editor: https://github.com/cdisc-org/conformance-rules-editor
PyPI Package: https://pypi.org/project/cdisc-rules-engine/