WHITEPAPER

ARS-CORE Workbench

Structured clinical analysis intent for AI-assisted ADaM derivations, TLF code generation, CORE validation, and review packaging

Central thesis

AI-assisted clinical programming becomes more reliable when the model receives structured analysis intent instead of being asked to infer datasets, populations, outputs, and validation expectations from prose.

Executive Summary

The ARS-CORE Workbench is a browser-based pipeline for moving from clinical reporting intent to reviewable programming artifacts. It captures TLF shell design, ADaM metadata, ARS-aligned analysis concepts, AI compiler controls, validation findings, and audit package context as structured artifacts that can be inspected before and after generation.

The product is not positioned as an autonomous production system. Its purpose is to make the handoff into AI explicit, reproducible, and reviewable, so statisticians and clinical programmers can see exactly what the model was asked to build and where validation findings should be routed.

The workbench starts with structured output intent: display IDs, titles, treatment columns, row hierarchy, denominator logic, filters, and footnotes.
It pairs that display intent with ADaM metadata: datasets, variables, VLM, codelists, derivation snippets, analysis sets, subsets, groupings, methods, and outputs.
A handoff layer validates the connection between shells and ADaM before generation, including missing variables and output mapping warnings.
The AI Code Generator creates review-gated artifacts for ADaM derivations, analysis code, and TLF rendering, while preserving the compiler request envelope.
CORE validation and the review package close the loop by turning generated outputs, findings, and reviewer notes into traceable evidence.

The Problem: AI Needs Contracts, Not Hints

Clinical reporting is full of information that looks simple in prose but becomes brittle in code: population definitions, denominator rules, treatment-level values, parameter-specific derivations, AE row logic, time-to-event endpoints, and output-level footnotes. A language model can write plausible programs from prose, but plausible is not enough when a programmer needs traceability, repeatability, and a clear review trail.

The common failure mode is not that the model cannot write SAS or R. The failure mode is that the model is forced to guess the contract. If the treatment column label says Placebo but the data-level value is different, if an AE summary row depends on a where clause that is only implied, or if a display ID is not tied to an analysis object, the generated code may look fluent while encoding the wrong assumption.

ARS-CORE addresses that gap by making intent first-class. It separates display intent from data intent, merges them into a checked bundle, and asks the AI compiler to operate only inside that bundle.
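The treatment-label case can be made concrete with a small sketch. It assumes a hypothetical shell column record that stores the display label and the data-level value separately, so a mismatch is detectable instead of guessed. The field names (label, variable, data_value), the variable TRT01A, and the sample values are illustrative assumptions, not the workbench's actual schema.

```python
# Hypothetical shell columns: display label and data-level value are kept apart.
columns = [
    {"label": "Placebo", "variable": "TRT01A", "data_value": "PBO"},
    {"label": "Drug X 10 mg", "variable": "TRT01A", "data_value": "DRUGX10"},
]

# Hypothetical set of values declared for TRT01A in the ADaM codelist.
declared_values = {"PBO", "DRUGX10", "DRUGX20"}


def unresolved_columns(columns, declared_values):
    """Return labels whose explicit data-level value is not declared in the metadata."""
    return [c["label"] for c in columns if c["data_value"] not in declared_values]


def guessed_from_label(columns, declared_values):
    """Return labels that would fail if the label itself were used as the data value."""
    return [c["label"] for c in columns if c["label"] not in declared_values]


explicit_gaps = unresolved_columns(columns, declared_values)
guessed_gaps = guessed_from_label(columns, declared_values)
```

With the explicit mapping every column resolves, whereas guessing from the label alone would miss both columns in this sample.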

Workbench Overview

The workbench is organized as a six-step pipeline. Each step produces or consumes a structured artifact, and the artifacts are preserved in browser draft state and, when a work package is active, in package context. The pipeline supports a full ARS path and an ADaM-derivation-only path for teams that want to focus the AI context on dataset programming.

Figure 1. The six-stage ARS-CORE workflow moves structured design artifacts forward and routes review findings back to the source metadata.

How the Pipeline Works

Step	Workbench Area	Primary Role
01	TLF Mock-Shell Designer	Captures display identity, treatment columns, row hierarchy, filters, denominator rules, footnotes, and output format.
02	ADaM Spec Designer	Defines datasets, variables, VLM, codelists, derivations, analysis sets, methods, analyses, outputs, and shell links.
03	Handoff Bundle	Checks both JSON inputs, validates shell variables and output mapping, filters scope, and creates the Stage 1 bundle.
04	AI Code Generator	Builds the compiler request and generates ADaM, analysis, and TLF renderer programs as review-gated artifacts.
05	CORE Validation Workbench	Runs browser subset checks or localhost CORE validation, then normalizes findings for triage.
06	Review Package + Audit Trail	Packages specs, compiler context, generated code, validation findings, provenance, and reviewer notes.

Structured Intent Model

The central architecture divides the workflow into two authoring contracts. The TLF shell owns display intent. The ADaM specification owns data intent. The Stage 1 bundle is the point where those contracts meet and where the system can detect missing variables, incomplete output mapping, and scope problems before AI generation starts.

Figure 2. The structured handoff architecture narrows model discretion by making display intent, data intent, compiler scope, validation evidence, and review context explicit.

Display Intent: TLF Mock-Shell Designer

The TLF Designer creates one or more DisplaySpec JSON objects. It supports safety-oriented templates such as AE SOC/PT and AE summary tables, as well as oncology efficacy templates for response, ORR, PFS, DOR, and OS. Users configure display identity, treatment columns, row sections, population filters, denominator overrides, formatting, footnotes, and abbreviations.

This matters because a shell is more than presentation. It is a compact statement of the analysis the table expects: what rows should exist, which variables those rows depend on, which population defines the denominator, and how the output should be named and reviewed.
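One way to picture a shell as a statement of analysis is a small typed record. The sketch below is an assumption about what a DisplaySpec-like object could look like; the class names, fields, display ID, and sample variables are hypothetical and are not taken from the workbench.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RowSection:
    label: str
    variables: list[str]          # ADaM variables this row depends on
    where: str | None = None      # optional row-level filter, kept as text here


@dataclass
class DisplaySpec:
    display_id: str
    title: str
    population_flag: str          # variable that defines the denominator
    treatment_variable: str
    rows: list[RowSection] = field(default_factory=list)
    footnotes: list[str] = field(default_factory=list)

    def required_variables(self) -> set[str]:
        """Collect every variable the shell names for denominators, columns, and rows."""
        needed = {self.population_flag, self.treatment_variable}
        for row in self.rows:
            needed.update(row.variables)
        return needed


# Hypothetical AE SOC/PT shell.
shell = DisplaySpec(
    display_id="T-14-3-1",
    title="Adverse Events by System Organ Class and Preferred Term",
    population_flag="SAFFL",
    treatment_variable="TRT01A",
    rows=[
        RowSection("System organ class", ["AEBODSYS"]),
        RowSection("Preferred term", ["AEDECOD", "TRTEMFL"], where="TRTEMFL = 'Y'"),
    ],
)
```

Calling shell.required_variables() returns the set of variables that a downstream check would need to find in the ADaM metadata.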

Data Intent: ADaM Spec Designer

The ADaM Spec Designer defines the dataset contract that the compiler should use. It includes dataset-level metadata, variable metadata, BDS parameters, value-level metadata, codelists, derivation hints, SDTM source context, and ADaM-to-ADaM dependencies. It also includes ARS-aligned bridge objects such as analysis sets, data subsets, grouping factors, methods, analyses, and outputs.
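A minimal sketch of such a dataset contract follows. It assumes a plain dictionary layout with hypothetical key names (datasets, analysis_sets, condition); the real specification is richer, and this fragment only shows how dataset metadata and an ARS-aligned analysis set might sit side by side and be checked against each other.

```python
# Hypothetical ADaM spec fragment; key names and values are illustrative only.
adam_spec = {
    "datasets": [
        {"name": "ADSL", "variables": ["USUBJID", "SAFFL", "TRT01A"]},
        {"name": "ADAE",
         "variables": ["USUBJID", "AEBODSYS", "AEDECOD", "TRTEMFL"],
         "depends_on": ["ADSL"]},
    ],
    "analysis_sets": [
        {"id": "AS_SAF", "label": "Safety Population", "dataset": "ADSL",
         "condition": {"variable": "SAFFL", "op": "EQ", "value": "Y"}},
    ],
}


def variable_index(spec):
    """Map each variable name to the list of datasets that declare it."""
    index = {}
    for ds in spec["datasets"]:
        for var in ds["variables"]:
            index.setdefault(var, []).append(ds["name"])
    return index


def undeclared_analysis_set_variables(spec):
    """List analysis sets whose condition variable is absent from their dataset."""
    by_name = {ds["name"]: set(ds["variables"]) for ds in spec["datasets"]}
    return [a["id"] for a in spec["analysis_sets"]
            if a["condition"]["variable"] not in by_name.get(a["dataset"], set())]
```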

The workbench uses CDISC ARS as a conceptual vocabulary rather than treating schema compliance as the immediate product goal. The quality bar is pragmatic: can the Stage 1 bundle give an AI compiler enough precise information to generate useful, reviewable SAS or R code without follow-up questions?

Handoff and Compiler Boundary

The handoff layer is deliberately conservative. It parses shell JSON and ADaM JSON, detects selected shells and datasets, verifies that referenced shell variables exist in the selected ADaM metadata, warns when shell display IDs are not mapped to analyses or outputs, preserves SDTM source hints, and emits a compiler-ready Stage 1 bundle.
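The two core checks described above, variable existence and output mapping, can be sketched in a few lines. The input shapes, the function name check_handoff, and the choice to report missing variables as errors and unmapped display IDs as warnings are assumptions made for illustration.

```python
def check_handoff(shells, adam_spec):
    """Return (errors, warnings) for a shell/ADaM pairing before generation."""
    # Every variable declared by any selected ADaM dataset.
    declared = {v for ds in adam_spec["datasets"] for v in ds["variables"]}
    # Display IDs that the ADaM spec ties to an output.
    mapped_ids = {o["display_id"] for o in adam_spec.get("outputs", [])}

    errors, warnings = [], []
    for shell in shells:
        missing = sorted(set(shell["variables"]) - declared)
        if missing:
            errors.append(f"{shell['display_id']}: missing variables {missing}")
        if shell["display_id"] not in mapped_ids:
            warnings.append(f"{shell['display_id']}: not mapped to an output")
    return errors, warnings


# Hypothetical inputs.
shells = [
    {"display_id": "T-14-3-1", "variables": ["AEBODSYS", "AEDECOD", "SAFFL"]},
    {"display_id": "T-14-3-2", "variables": ["AESER", "SAFFL"]},
]
adam_spec = {
    "datasets": [
        {"name": "ADSL", "variables": ["USUBJID", "SAFFL"]},
        {"name": "ADAE", "variables": ["USUBJID", "AEBODSYS", "AEDECOD"]},
    ],
    "outputs": [{"display_id": "T-14-3-1", "analysis_id": "AN_AE_SOC_PT"}],
}

errors, warnings = check_handoff(shells, adam_spec)
```

In this sample the second shell references a variable that no dataset declares and has no output mapping, so it yields one error and one warning while the first shell passes cleanly.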

Boundary rule

The compiler request instructs the AI to use only the validated handoff bundle as its source of truth. It also directs the AI not to invent SDTM structure or generate SDTM mapping programs.
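A request envelope that carries this boundary rule might look like the sketch below. The envelope fields, the instruction wording, the default target language, and the idea of attaching a SHA-256 digest of the bundle are all assumptions for illustration, not a description of the actual compiler request.

```python
import hashlib
import json


def build_compiler_request(bundle, target_language="R"):
    """Wrap a validated bundle in a request envelope with explicit boundary rules."""
    # Canonical JSON (sorted keys, fixed separators) makes the hash order-independent.
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return {
        "instructions": [
            "Use only the attached bundle as the source of truth.",
            "Do not invent SDTM structure.",
            "Do not generate SDTM mapping programs.",
        ],
        "target_language": target_language,
        "bundle": bundle,
        # A digest lets a reviewer confirm which bundle the model was given.
        "bundle_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


request = build_compiler_request({"datasets": ["ADSL", "ADAE"], "shells": ["T-14-3-1"]})
```

Keeping the instructions and the bundle digest in the same object means the preserved envelope records both what the model was told and what it was allowed to see.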

The AI Code Generator then creates an artifact plan and returns an ARSCoreGeneratedCodeBundle with generated files, review notes, warnings, dependencies, usage metadata, and a validation checklist. The generator can operate in full ARS mode or in a lean ADaM derivation mode that excludes analyses, methods, grouping factors, TLF shells, and outputs when they are not needed.
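The difference between the two modes can be sketched as a scoping function over the bundle. The key names, mode labels, and function name below are hypothetical; the only behavior taken from the description above is that the lean mode drops analyses, methods, grouping factors, TLF shells, and outputs.

```python
# Hypothetical bundle keys that only the full ARS path needs.
FULL_ONLY_KEYS = {"analyses", "methods", "grouping_factors", "shells", "outputs"}


def scope_bundle(bundle, mode, selected_datasets=None):
    """Return a shallow copy of the bundle narrowed to the requested generation mode."""
    if mode == "full":
        return dict(bundle)
    if mode != "adam_only":
        raise ValueError(f"unknown mode: {mode}")
    # Drop analysis- and display-level context for ADaM derivation work.
    lean = {k: v for k, v in bundle.items() if k not in FULL_ONLY_KEYS}
    if selected_datasets is not None:
        lean["datasets"] = [d for d in bundle.get("datasets", [])
                            if d["name"] in selected_datasets]
    return lean


bundle = {
    "datasets": [{"name": "ADSL"}, {"name": "ADAE"}],
    "codelists": [{"id": "CL_YN"}],
    "analyses": [{"id": "AN_AE_SOC_PT"}],
    "methods": [{"id": "M_COUNT_PCT"}],
    "shells": [{"display_id": "T-14-3-1"}],
    "outputs": [{"display_id": "T-14-3-1"}],
}

lean = scope_bundle(bundle, "adam_only", selected_datasets={"ADAE"})
```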

Validation, Review, and Auditability

Validation is not treated as a final checkbox. It is a feedback loop. The CORE Validation Workbench supports a browser-only subset path for lightweight checks and a localhost Python bridge for the CDISC CORE rules engine. The hosted app does not need to receive clinical datasets for the local path; files are posted to 127.0.0.1, processed by the bridge, and normalized into findings for review.
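Normalization is the step that lets findings from both paths be triaged together. The sketch below assumes two hypothetical raw finding shapes and maps them onto one record; the field names, source labels, and rule IDs are invented for illustration and do not represent the CORE engine's report format.

```python
SEVERITY_ORDER = {"error": 0, "warning": 1, "info": 2}


def normalize_finding(raw, source):
    """Map one raw finding (hypothetical shape) onto a common review record."""
    return {
        "source": source,                                   # e.g. "browser" or "local_core"
        "rule_id": raw.get("rule_id") or raw.get("id", "UNKNOWN"),
        "dataset": (raw.get("dataset") or "").upper(),
        "variable": raw.get("variable"),
        "severity": str(raw.get("severity", "info")).lower(),
        "message": (raw.get("message") or "").strip(),
    }


def triage(findings):
    """Sort normalized findings so the most severe come first."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_ORDER.get(f["severity"], 99), f["dataset"], f["rule_id"]),
    )


# Hypothetical raw findings from the two validation paths.
browser_raw = [{"id": "RULE-B1", "dataset": "adae", "severity": "Warning",
                "message": " AESER missing "}]
core_raw = [{"rule_id": "RULE-C7", "dataset": "ADSL", "variable": "SAFFL",
             "severity": "ERROR", "message": "Invalid value"}]

findings = triage(
    [normalize_finding(r, "browser") for r in browser_raw]
    + [normalize_finding(r, "local_core") for r in core_raw]
)
```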

The package-audit step assembles a review package containing the available design, compiler, validation, and provenance artifacts. Dataset outputs and generated clinical data are intentionally outside the package boundary. That separation helps make the review package a traceability aid, not a substitute for controlled execution, statistical review, or regulated validation.
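The package boundary can be illustrated with a manifest builder that hashes text artifacts and refuses dataset files. This is a sketch under stated assumptions: the extension list, the manifest layout, and the use of SHA-256 digests are hypothetical choices, not the workbench's packaging format.

```python
import hashlib

# Hypothetical: file extensions treated as clinical data and kept out of the package.
DATA_EXTENSIONS = (".xpt", ".sas7bdat", ".csv", ".parquet")


def build_manifest(artifacts):
    """Build a manifest of {name: sha256} for text artifacts, skipping data files."""
    included, excluded = {}, []
    for name, content in sorted(artifacts.items()):
        if name.lower().endswith(DATA_EXTENSIONS):
            excluded.append(name)   # recorded by name only, never hashed or stored
            continue
        included[name] = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return {"artifacts": included, "excluded_data_files": excluded}


manifest = build_manifest({
    "shell.json": '{"display_id": "T-14-3-1"}',
    "adam.json": '{"datasets": []}',
    "generated/adae.R": "# proposed derivation code",
    "adae.xpt": "not packaged",
})
```

Recording excluded names without their content keeps the manifest honest about what was left out while respecting the data boundary.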

Risk	Workbench Control	Reviewer Payoff
Model invents missing context	Compiler boundary uses the validated bundle as source of truth.	Reviewers can inspect exactly what the model was allowed to use.
Shell and ADaM drift apart	Handoff checks referenced shell variables and output mapping.	Gaps are caught before code generation instead of after output review.
Validation findings become disconnected	CORE findings are normalized into the same review package context.	Teams can route issues back to metadata, derivations, or generated code.
Preview output is mistaken for production	Preview warnings and review-gated artifacts remain visible.	Human approval stays explicit.

Practical Use Cases

Design an AE table shell, map ADAE variables and analysis sets, generate SAS or R analysis code, and validate resulting dataset packages through CORE before review.
Import an ADaM metadata workbook, refine derivation snippets and VLM, and generate ADaM derivation programs without carrying TLF context into the model.
Create a review package for a workbench run that preserves shell JSON, ADaM JSON, compiler request metadata, generated code bundle, validation findings, and reviewer notes.
Use cross-spec validation as an early quality gate before involving the AI compiler, reducing rework caused by missing variables or mismatched output IDs.

Design Principles

Principle	Meaning in ARS-CORE
Structured first	Output and data intent become JSON before a model is asked to generate code.
Standards-aware, not schema-bound	ARS terms provide the analysis vocabulary; in the preview, quality is judged by how clearly the handoff reads to the AI compiler.
Review-gated generation	Generated code remains a proposed artifact with warnings, dependencies, and review notes.
Local data boundary	The local CORE bridge validates through localhost without sending clinical datasets to the hosted app.
Traceability over magic	Compiler requests and review packages stay visible for human audit.

Roadmap Considerations

The most important next improvements are those that make the compiler bundle more complete and less ambiguous: controlled vocabulary for analysis reasons and purposes, richer operation definitions, structured where conditions, explicit dependencies, more stable bundle extraction paths, and usage tracking that can attribute AI generation calls to signed-in users. These investments improve generation quality without over-rotating toward formal ARS schema validation before the AI handoff problem is solved.

Replace free-text analysis fields with controlled ARS-aligned choices where the user intent is categorical.
Represent multi-operation methods explicitly so the compiler can generate analysis code in the intended sequence.
Create a clearer Stage 1 bundle shape that reduces fragile nested path extraction in the code generator.
Keep ADaM-only mode lean so the model receives only selected datasets, codelists, dependencies, and derivation snippets.
Improve per-user usage logging for ARS generation calls so administrators can understand adoption and cost patterns.
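To show what a structured where condition could buy, the sketch below renders a nested condition into a SAS-style where expression. The condition shape (all, any, variable, op, value) and the operator codes are hypothetical; they stand in for whatever structure the roadmap item eventually adopts.

```python
# Hypothetical comparison operator codes mapped to SAS-style operators.
OPERATORS = {"EQ": "=", "NE": "ne", "GT": ">", "GE": ">=", "LT": "<", "LE": "<="}


def render_value(value):
    """Quote strings (doubling embedded single quotes); leave numbers bare."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def render_condition(cond):
    """Render a nested condition dict as a SAS-style where expression."""
    if "all" in cond:
        return "(" + " and ".join(render_condition(c) for c in cond["all"]) + ")"
    if "any" in cond:
        return "(" + " or ".join(render_condition(c) for c in cond["any"]) + ")"
    if cond["op"] == "IN":
        values = ", ".join(render_value(v) for v in cond["value"])
        return f"{cond['variable']} in ({values})"
    return f"{cond['variable']} {OPERATORS[cond['op']]} {render_value(cond['value'])}"


condition = {"all": [
    {"variable": "SAFFL", "op": "EQ", "value": "Y"},
    {"variable": "AESEV", "op": "IN", "value": ["MODERATE", "SEVERE"]},
]}

expression = render_condition(condition)
# expression == "(SAFFL = 'Y' and AESEV in ('MODERATE', 'SEVERE'))"
```

Because the condition is data and not prose, the same structure could be rendered for another target language or checked against declared variables before any code is generated.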

Conclusion

ARS-CORE is a practical answer to a near-term clinical programming problem: AI can help write code, but only when the workbench around it makes intent explicit, inspectable, and reviewable. By combining TLF shell design, ADaM metadata, ARS-aligned analysis concepts, compiler request controls, CORE validation, and review packaging, the workbench turns AI generation from a prose prompt into a governed workflow.

The result is not autonomy for its own sake. It is a better interface between human statistical judgment and machine-generated programming drafts.
