WHITEPAPER
ARS-CORE Workbench
Structured clinical analysis intent for AI-assisted ADaM derivations, TLF code generation, CORE validation, and review packaging
| Central thesis | AI-assisted clinical programming becomes more reliable when the model receives structured analysis intent instead of being asked to infer datasets, populations, outputs, and validation expectations from prose. |
The ARS-CORE Workbench is a browser-based pipeline for moving from clinical reporting intent to reviewable programming artifacts. It captures TLF shell design, ADaM metadata, ARS-aligned analysis concepts, AI compiler controls, validation findings, and audit package context as structured artifacts that can be inspected before and after generation.
The product is not positioned as an autonomous production system. Its purpose is to make the handoff into AI explicit, reproducible, and reviewable, so statisticians and clinical programmers can see exactly what the model was asked to build and where validation findings should be routed.
Clinical reporting is full of information that looks simple in prose but becomes brittle in code: population definitions, denominator rules, treatment-level values, parameter-specific derivations, AE row logic, time-to-event endpoints, and output-level footnotes. A language model can write plausible programs from prose, but plausible is not enough when a programmer needs traceability, repeatability, and a clear review trail.
The common failure mode is not that the model cannot write SAS or R. It is that the model is forced to guess the contract. Generated code can look fluent while encoding the wrong assumption if, for example, the treatment column label says Placebo but the data-level value is different, an AE summary row depends on a where clause that is only implied, or a display ID is not tied to an analysis object.
ARS-CORE addresses that gap by making intent first-class. It separates display intent from data intent, merges them into a checked bundle, and asks the AI compiler to operate only inside that bundle.
The workbench is organized as a six-step pipeline. Each step produces or consumes a structured artifact, and the artifacts are preserved in browser draft state and, when a work package is active, in package context. The pipeline supports a full ARS path and an ADaM-derivation-only path for teams that want to focus the AI context on dataset programming.
Figure 1. The six-stage ARS-CORE workflow moves structured design artifacts forward and routes review findings back to the source metadata.
| Step | Workbench Area | Primary Role |
| 01 | TLF Mock-Shell Designer | Captures display identity, treatment columns, row hierarchy, filters, denominator rules, footnotes, and output format. |
| 02 | ADaM Spec Designer | Defines datasets, variables, VLM, codelists, derivations, analysis sets, methods, analyses, outputs, and shell links. |
| 03 | Handoff Bundle | Checks both JSON inputs, validates shell variables and output mapping, filters scope, and creates the Stage 1 bundle. |
| 04 | AI Code Generator | Builds the compiler request and generates ADaM, analysis, and TLF renderer programs as review-gated artifacts. |
| 05 | CORE Validation Workbench | Runs browser subset checks or localhost CORE validation, then normalizes findings for triage. |
| 06 | Review Package + Audit Trail | Packages specs, compiler context, generated code, validation findings, provenance, and reviewer notes. |
The central architecture divides the workflow into two authoring contracts. The TLF shell owns display intent. The ADaM specification owns data intent. The Stage 1 bundle is the point where those contracts meet and where the system can detect missing variables, incomplete output mapping, and scope problems before AI generation starts.
Figure 2. The structured handoff architecture narrows model discretion by making display intent, data intent, compiler scope, validation evidence, and review context explicit.
The TLF Designer creates one or more DisplaySpec JSON objects. It supports safety-oriented templates such as AE SOC/PT and AE summary tables, as well as oncology efficacy templates for response, ORR, PFS, DOR, and OS. Users configure display identity, treatment columns, row sections, population filters, denominator overrides, formatting, footnotes, and abbreviations.
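As an illustration, a DisplaySpec could be modeled roughly as follows. This is a minimal sketch, not the workbench's actual schema: every field name (`display_id`, `treatment_columns`, `data_value`, `population_filter`, `denominator`, and so on) and every example value is an assumption made for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class TreatmentColumn:
    label: str       # header text shown in the table
    data_value: str  # value expected in the treatment variable


@dataclass
class RowSpec:
    label: str
    variable: str                # ADaM variable the row depends on
    where: Optional[str] = None  # explicit row filter, never implied


@dataclass
class DisplaySpec:
    # Hypothetical shape; the real DisplaySpec fields may differ.
    display_id: str
    title: str
    dataset: str
    population_filter: str
    denominator: str
    treatment_columns: List[TreatmentColumn] = field(default_factory=list)
    rows: List[RowSpec] = field(default_factory=list)
    footnotes: List[str] = field(default_factory=list)

    def referenced_variables(self) -> set:
        """Variables the row definitions expect to find in the ADaM metadata."""
        return {row.variable for row in self.rows}


shell = DisplaySpec(
    display_id="T-14-3-1",
    title="Summary of Adverse Events",
    dataset="ADAE",
    population_filter="SAFFL = 'Y'",
    denominator="ADSL where SAFFL = 'Y'",
    treatment_columns=[
        TreatmentColumn(label="Placebo", data_value="PBO"),
        TreatmentColumn(label="Drug X 10 mg", data_value="DRUGX10"),
    ],
    rows=[
        RowSpec("Any TEAE", "TRTEMFL", "TRTEMFL = 'Y'"),
        RowSpec("Any serious TEAE", "AESER", "TRTEMFL = 'Y' and AESER = 'Y'"),
    ],
    footnotes=["TEAE = treatment-emergent adverse event."],
)

# Serialize to JSON so the shell can be inspected before any generation step.
shell_json = json.dumps(asdict(shell), indent=2)
```

Keeping the column label separate from the data value is the point of the sketch: the display text and the value the code must filter on are recorded side by side instead of being inferred from one another.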
This matters because a shell is more than presentation. It is a compact statement of the analysis the table expects: what rows should exist, which variables those rows depend on, which population defines the denominator, and how the output should be named and reviewed.
The ADaM Spec Designer defines the dataset contract that the compiler should use. It includes dataset-level metadata, variable metadata, BDS parameters, value-level metadata, codelists, derivation hints, SDTM source context, and ADaM-to-ADaM dependencies. It also includes ARS-aligned bridge objects such as analysis sets, data subsets, grouping factors, methods, analyses, and outputs.
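A simplified picture of such a specification, and of one thing explicit ADaM-to-ADaM dependencies make possible, is sketched below. The dictionary layout, key names, and identifiers are hypothetical, and deriving a build order from the dependencies is an illustrative use of the metadata, not a documented workbench feature.

```python
from graphlib import TopologicalSorter

# Hypothetical, heavily trimmed ADaM specification.
adam_spec = {
    "datasets": {
        "ADSL": {
            "depends_on": [],
            "variables": ["USUBJID", "TRT01A", "SAFFL", "ITTFL"],
        },
        "ADRS": {
            "depends_on": ["ADSL"],
            "variables": ["USUBJID", "PARAMCD", "AVALC", "ADT"],
        },
        "ADTTE": {
            "depends_on": ["ADSL", "ADRS"],
            "variables": ["USUBJID", "PARAMCD", "AVAL", "CNSR"],
        },
    },
    # ARS-aligned bridge objects, reduced to the bare minimum.
    "analysis_sets": [{"id": "AS_ITT", "condition": "ITTFL = 'Y'"}],
    "analyses": [
        {"id": "AN_PFS", "dataset": "ADTTE", "analysis_set": "AS_ITT",
         "method": "M_KM"}
    ],
    "outputs": [
        {"id": "OUT_PFS", "display_id": "T-14-2-1", "analyses": ["AN_PFS"]}
    ],
}


def derivation_order(spec: dict) -> list:
    """Order datasets so that each one follows the datasets it depends on."""
    graph = {
        name: set(dataset["depends_on"])
        for name, dataset in spec["datasets"].items()
    }
    return list(TopologicalSorter(graph).static_order())


order = derivation_order(adam_spec)
```

With dependencies declared in the metadata, the order in which derivation programs should run is computed from the specification and does not have to be guessed from prose.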
The workbench uses CDISC ARS as a conceptual vocabulary rather than treating schema compliance as the immediate product goal. The quality bar is pragmatic: can the Stage 1 bundle give an AI compiler enough precise information to generate useful, reviewable SAS or R code without follow-up questions?
The handoff layer is deliberately conservative. It parses shell JSON and ADaM JSON, detects selected shells and datasets, verifies that referenced shell variables exist in the selected ADaM metadata, warns when shell display IDs are not mapped to analyses or outputs, preserves SDTM source hints, and emits a compiler-ready Stage 1 bundle.
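The two central checks described above, variable existence and output mapping, can be sketched as follows. The input shapes, the function name `check_handoff`, and the returned keys are assumptions for illustration; the workbench's real bundle format is not shown here.

```python
def check_handoff(shells: list, adam_spec: dict) -> dict:
    """Cross-check shell references against ADaM metadata before generation."""
    errors, warnings = [], []
    mapped_ids = {out["display_id"] for out in adam_spec.get("outputs", [])}

    for shell in shells:
        display_id = shell["display_id"]

        # An unmapped display is a warning: the bundle is still usable.
        if display_id not in mapped_ids:
            warnings.append(f"{display_id}: not mapped to an analysis output")

        dataset = adam_spec["datasets"].get(shell["dataset"])
        if dataset is None:
            errors.append(
                f"{display_id}: dataset {shell['dataset']} is not in the ADaM spec"
            )
            continue

        # A missing variable is an error: the compiler would have to guess.
        known = set(dataset["variables"])
        for variable in shell["variables"]:
            if variable not in known:
                errors.append(
                    f"{display_id}: variable {variable} is not defined in "
                    f"{shell['dataset']}"
                )

    return {"ready": not errors, "errors": errors, "warnings": warnings}


# Hypothetical inputs with one missing variable and one missing dataset.
shells = [
    {"display_id": "T-14-3-1", "dataset": "ADAE",
     "variables": ["TRTEMFL", "AESER", "AEREL"]},
    {"display_id": "T-14-2-1", "dataset": "ADTTE", "variables": ["AVAL"]},
]
adam_spec = {
    "datasets": {"ADAE": {"variables": ["USUBJID", "TRTEMFL", "AESER"]}},
    "outputs": [{"id": "OUT_AE_SUM", "display_id": "T-14-3-1"}],
}

result = check_handoff(shells, adam_spec)
```

In this sketch, errors block the bundle while warnings only annotate it, which mirrors the distinction in the text between verifying variables and warning about unmapped display IDs.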
| Boundary rule | The compiler request instructs the AI to use only the validated handoff bundle as its source of truth. It explicitly tells the model not to invent SDTM structure or generate SDTM mapping programs. |
The AI Code Generator then creates an artifact plan and returns an ARSCoreGeneratedCodeBundle with generated files, review notes, warnings, dependencies, usage metadata, and a validation checklist. The generator can operate in full ARS mode or in a lean ADaM derivation mode that excludes analyses, methods, grouping factors, TLF shells, and outputs when they are not needed.
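One way to picture the difference between the two modes is as a filter applied to the bundle before the request is built. The sketch below is an assumption-laden illustration: the mode names `full_ars` and `adam_only`, the bundle keys, and the request layout are all hypothetical, and the instruction text paraphrases the boundary rule instead of quoting the real prompt.

```python
import copy

# Bundle sections dropped in the lean ADaM derivation mode (hypothetical keys).
LEAN_EXCLUDED_KEYS = ("analyses", "methods", "grouping_factors", "shells", "outputs")

BOUNDARY_INSTRUCTION = (
    "Use only the validated handoff bundle as the source of truth. "
    "Do not invent SDTM structure or generate SDTM mapping programs."
)


def build_compiler_request(bundle: dict, mode: str = "full_ars",
                           language: str = "SAS") -> dict:
    """Build a request whose context is limited to the checked bundle."""
    if mode not in ("full_ars", "adam_only"):
        raise ValueError(f"unknown mode: {mode}")
    if language not in ("SAS", "R"):
        raise ValueError(f"unsupported language: {language}")

    # Copy first so the stored bundle is never modified by scoping.
    context = copy.deepcopy(bundle)
    if mode == "adam_only":
        for key in LEAN_EXCLUDED_KEYS:
            context.pop(key, None)

    return {
        "mode": mode,
        "language": language,
        "instructions": [BOUNDARY_INSTRUCTION],
        "context": context,
    }


bundle = {
    "datasets": {"ADSL": {"variables": ["USUBJID", "SAFFL"]}},
    "analyses": [{"id": "AN_AE_SUM"}],
    "methods": [{"id": "M_COUNT_PCT"}],
    "grouping_factors": [{"id": "GF_TRT"}],
    "shells": [{"display_id": "T-14-3-1"}],
    "outputs": [{"id": "OUT_AE_SUM"}],
}

full_request = build_compiler_request(bundle, mode="full_ars", language="SAS")
lean_request = build_compiler_request(bundle, mode="adam_only", language="R")
```

Because the request is an ordinary data structure, a reviewer can inspect exactly which sections reached the model in each mode.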
Validation is not treated as a final checkbox. It is a feedback loop. The CORE Validation Workbench supports a browser-only subset path for lightweight checks and a localhost Python bridge for the CDISC CORE rules engine. The hosted app does not need to receive clinical datasets for the local path; files are posted to 127.0.0.1, processed by the bridge, and normalized into findings for review.
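The local boundary and the normalization step can be sketched as two small helpers. Everything specific here is hypothetical: the port and path in the example URL, the raw finding keys, and the normalized record layout are assumptions, and the sketch does not reproduce the CDISC CORE engine's actual report format.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True only when the URL points at this machine."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is some other hostname.
        return False


def normalize_finding(raw: dict, source: str) -> dict:
    """Map a raw finding onto one record shape shared by both validation paths."""
    return {
        "source": source,  # e.g. "browser_subset" or "core_bridge"
        "rule_id": raw.get("rule_id", "UNKNOWN"),
        "dataset": raw.get("dataset", "").upper(),
        "variable": raw.get("variable"),
        "message": raw.get("message", "").strip(),
        "route_to": "untriaged",  # set by a reviewer during triage
    }


# Hypothetical bridge endpoint and raw finding.
bridge_url = "http://127.0.0.1:8000/validate"
raw_finding = {
    "rule_id": "RULE-0001",
    "dataset": "adae",
    "variable": "AESER",
    "message": "  Value is not in the expected codelist. ",
}

if is_loopback_url(bridge_url):
    finding = normalize_finding(raw_finding, source="core_bridge")
```

Checking the target address before posting is one possible guard for the local data boundary; the source describes the boundary itself, not how it is enforced.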
The package-audit step assembles a review package containing the available design, compiler, validation, and provenance artifacts. Dataset outputs and generated clinical data are intentionally outside the package boundary. That separation helps make the review package a traceability aid, not a substitute for controlled execution, statistical review, or regulated validation.
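The package boundary can be expressed as an allow-list, as in the sketch below. The artifact kinds, the blocked file suffixes, and the manifest layout are assumptions chosen for illustration, and the SHA-256 digest is one possible form of provenance, not necessarily the one the workbench records.

```python
import hashlib

# Hypothetical artifact kinds allowed inside a review package.
ALLOWED_KINDS = {
    "display_spec",
    "adam_spec",
    "compiler_request",
    "generated_code",
    "validation_findings",
    "reviewer_notes",
}

# File types treated as clinical data in this sketch.
DATA_SUFFIXES = (".xpt", ".sas7bdat", ".parquet")


def assemble_review_package(artifacts: list) -> dict:
    """Build a manifest of review artifacts and refuse dataset files."""
    manifest = []
    for item in artifacts:
        name = item["name"]
        if item["kind"] not in ALLOWED_KINDS or name.lower().endswith(DATA_SUFFIXES):
            raise ValueError(f"{name} is outside the review package boundary")
        digest = hashlib.sha256(item["content"].encode("utf-8")).hexdigest()
        manifest.append({"name": name, "kind": item["kind"], "sha256": digest})
    return {"manifest": manifest}


package = assemble_review_package([
    {"name": "t-14-3-1.shell.json", "kind": "display_spec",
     "content": '{"display_id": "T-14-3-1"}'},
    {"name": "adae.sas", "kind": "generated_code",
     "content": "/* draft, pending review */"},
])
```

Passing a dataset file such as `adae.xpt` to this function raises `ValueError`, so the exclusion is enforced when the package is assembled, not left to convention.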
| Risk | Workbench Control | Reviewer Payoff |
| Model invents missing context | Compiler boundary uses the validated bundle as source of truth. | Reviewers can inspect exactly what the model was allowed to use. |
| Shell and ADaM drift apart | Handoff checks referenced shell variables and output mapping. | Gaps are caught before code generation instead of after output review. |
| Validation findings become disconnected | CORE findings are normalized into the same review package context. | Teams can route issues back to metadata, derivations, or generated code. |
| Preview output is mistaken for production | Preview warnings and review-gated artifacts remain visible. | Human approval stays explicit. |
| Principle | Meaning in ARS-CORE |
| Structured first | Output and data intent become JSON before a model is asked to generate code. |
| Standards-aware, not schema-bound | ARS terms provide the analysis vocabulary; the preview is judged by how clearly the handoff reads to the AI compiler, not by formal schema compliance. |
| Review-gated generation | Generated code remains a proposed artifact with warnings, dependencies, and review notes. |
| Local data boundary | The local CORE bridge validates through localhost without sending clinical datasets to the hosted app. |
| Traceability over magic | Compiler requests and review packages stay visible for human audit. |
The most important next improvements are those that make the compiler bundle more complete and less ambiguous: controlled vocabulary for analysis reasons and purposes, richer operation definitions, structured where conditions, explicit dependencies, more stable bundle extraction paths, and usage tracking that can attribute AI generation calls to signed-in users. These investments improve generation quality without putting formal ARS schema validation ahead of the AI handoff problem it depends on.
ARS-CORE is a practical answer to a near-term clinical programming problem: AI can help write code, but only when the workbench around it makes intent explicit, inspectable, and reviewable. By combining TLF shell design, ADaM metadata, ARS-aligned analysis concepts, compiler request controls, CORE validation, and review packaging, the workbench turns AI generation from a prose prompt into a governed workflow.
The result is not autonomy for its own sake. It is a better interface between human statistical judgment and machine-generated programming drafts.
