{admiral}: A Deep Dive into Open-Source ADaM Dataset Generation in R

Introduction

The Analysis Data Model (ADaM) is a foundational CDISC standard that defines the structure and content of analysis-ready datasets for regulatory submissions to the FDA and other global health authorities. Under the FDA's current study data standards requirements, analysis data in electronic submissions must follow ADaM. Historically, creating these datasets has been almost exclusively the domain of SAS, a language entrenched in the pharmaceutical industry for decades.

That paradigm is changing. The {admiral} package — short for ADaM in R Asset Library — represents a landmark open-source effort to bring ADaM dataset generation into the R ecosystem. Developed collaboratively by Roche and GSK under the pharmaverse umbrella, {admiral} provides a modular, well-documented, and comprehensively tested toolbox that enables statistical programmers to create CDISC-compliant ADaM datasets entirely in R.

This article provides a comprehensive, technical deep dive into {admiral}: its architecture, its function taxonomy, practical code examples for common ADaM datasets, the broader ecosystem of extension packages, and strategic considerations for organizations evaluating adoption.

Background and Origin

The {admiral} project was born from a recognized gap in the R ecosystem. While the R/Pharma community had developed numerous packages for tables, listings, and figures (TLFs), no comprehensive solution existed for the upstream challenge of creating the ADaM datasets that feed those outputs.

Roche and GSK — two of the world's largest pharmaceutical companies — joined forces to address this gap. Their collaboration, formalized under the pharmaverse initiative, produced {admiral} as an open-source, cross-company R package licensed under Apache License 2.0. The project has since expanded to include contributors from Cytel, Johnson & Johnson, Bayer, and numerous other organizations.

R has been used in regulatory submissions to the FDA and EMA, reportedly including production workflows built with {admiral}. This marks a significant milestone in the industry's acceptance of R for regulatory work.

Current Version

As of early 2026, the latest stable release of {admiral} can be found on CRAN. The package follows a regular release cycle, with core updates and ongoing expansion into therapeutic area extensions.

Installation and Setup

Installing from CRAN

The simplest way to install {admiral} is from CRAN:


install.packages("admiral")

Installing the Development Version

For the latest features and bug fixes, install directly from GitHub:


# install.packages("pak")
pak::pkg_install("pharmaverse/admiral", dependencies = TRUE)

Essential Companion Packages

A typical {admiral} workflow relies on several companion packages:


# Core dependencies and companions
library(admiral)            # ADaM derivation functions
library(pharmaversesdtm)    # Test SDTM datasets (CDISC Pilot)
library(dplyr)              # Data manipulation (tidyverse)
library(lubridate)          # Date/time handling
library(stringr)            # String manipulation
library(tibble)             # Enhanced data frames

# For metadata and transport files
library(metacore)           # Dataset metadata management
library(metatools)          # Metadata utilities
library(xportr)             # XPT file generation for submission

Architecture and Design Philosophy

Modular Derivation Approach

The central design principle of {admiral} is that an ADaM dataset is built through a sequence of derivations. Each derivation is a function call that adds one or more variables or records to the dataset being constructed. This modular approach provides several advantages: derivations can be easily added, removed, or reordered; each step is independently testable; and the resulting code reads as a clear, linear pipeline of transformations.

This stands in deliberate contrast to a "black-box" approach where a single function call generates an entire dataset. The {admiral} team has explicitly stated that their goal is not to automate ADaM creation with a single command, but rather to provide a toolbox of reusable, composable functions that programmers can assemble according to their study-specific requirements.

Function Taxonomy

{admiral} organizes its functions into four primary categories:

1. Derivation Functions — The workhorses of the package. These functions add variables or records to a dataset. They follow a consistent naming convention:

derive_vars_*() — Add one or more variables to the input dataset
derive_var_*() — Add a single variable
derive_param_*() — Add new parameter records (rows)
derive_extreme_*() — Derive extreme (first/last) records or events

2. Computation Functions — Vector-in, vector-out functions for calculations:

compute_bmi() — Calculate Body Mass Index
compute_bsa() — Calculate Body Surface Area
compute_map() — Calculate Mean Arterial Pressure
compute_qtc() — Calculate corrected QT interval
compute_egfr() — Calculate estimated Glomerular Filtration Rate
compute_age_years() — Convert age to years
compute_duration() — Calculate time durations

3. Higher Order Functions — Advanced functions that take other functions as input:

call_derivation() — Call a derivation multiple times with varying arguments
restrict_derivation() — Execute a derivation on a subset of the dataset
slice_derivation() — Execute different derivations on different subsets

4. Utility Functions — Supporting functions for common tasks:

convert_blanks_to_na() — Handle SAS-to-R missing value conversion
convert_dtc_to_dt() — Convert character dates to Date objects
convert_dtc_to_dtm() — Convert character dates to datetime objects
exprs() — Create lists of expressions (used extensively in arguments)
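
As a small illustration of the computation and utility functions listed above, the sketch below calls a few of them on made-up values. The numbers are illustrative assumptions rather than study data, and the units (height in cm, weight in kg, blood pressure in mmHg) follow the package documentation as understood here.

```r
library(admiral)

# Body Mass Index: weight (kg) / height (m)^2, with height supplied in cm
bmi <- compute_bmi(height = 170, weight = 75)

# Mean Arterial Pressure without heart rate: (2 * DBP + SBP) / 3
map <- compute_map(diabp = 80, sysbp = 120)

# Complete ISO 8601 character date converted to a Date object
dt <- convert_dtc_to_dt("2019-07-18")

# Computation functions are vectorised, so they can be used inside mutate()
bmi_vec <- compute_bmi(height = c(170, 160), weight = c(75, 60))
```

Because these functions take and return plain vectors, they can be unit-tested or reused outside an {admiral} pipeline without any dataset scaffolding.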

Argument Conventions

{admiral} uses a consistent convention for function arguments built on top of R's non-standard evaluation (NSE):

by_vars — Expects a list of symbols: by_vars = exprs(STUDYID, USUBJID)
filter — Expects a single expression: filter = PARAMCD == "TEMP"
order — Expects a list of expressions: order = exprs(AVISIT, desc(AESEV))
new_vars — Expects expressions for new variable definitions: new_vars = exprs(TRTSDTM = EXSTDTM)

The exprs() function, which {admiral} re-exports from {rlang}, is used extensively. It captures its arguments as unevaluated expressions, which {admiral} functions later evaluate in the context of the dataset. This is why variable names can be passed without quotes.
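
The conventions above can be sketched on a toy dataset. The data values and the flag name LASTFL below are hypothetical and chosen only for illustration; derive_var_extreme_flag() is the same function used later in this article.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# Toy BDS-style data (hypothetical values)
vs_toy <- tibble::tribble(
  ~USUBJID, ~PARAMCD, ~AVISITN, ~AVAL,
  "01",     "TEMP",   1,        36.8,
  "01",     "TEMP",   2,        37.1,
  "02",     "TEMP",   1,        36.5
)

# exprs() captures the variable names without evaluating them
by_vars <- exprs(USUBJID, PARAMCD)

# Flag the last record per subject and parameter, ordered by visit number
flagged <- derive_var_extreme_flag(
  vs_toy,
  by_vars = by_vars,
  order   = exprs(AVISITN),
  new_var = LASTFL,
  mode    = "last"
)
```

Storing the result of exprs() in a variable, as with by_vars here, is a common way to reuse the same grouping across several derivation calls.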

ADaM Templates: Your Starting Point

One of the most practical features of {admiral} is its built-in template system. Templates are pre-built R scripts that serve as starting points for common ADaM datasets. They demonstrate the recommended derivation sequence and can be customized to meet study-specific requirements.

Listing Available Templates


library(admiral)
list_all_templates()

#> Existing ADaM templates in package 'admiral':
#>
#> • ADAE     (Adverse Event Analysis Dataset)
#> • ADCM     (Concomitant Medication Analysis Dataset)
#> • ADEG     (ECG Analysis Dataset)
#> • ADEX     (Exposure Analysis Dataset)
#> • ADLB     (Laboratory Analysis Dataset)
#> • ADLBHY   (Hy's Law Analysis Dataset)
#> • ADMH     (Medical History Analysis Dataset)
#> • ADPC     (Pharmacokinetic Concentration Dataset)
#> • ADPP     (Pharmacokinetic Parameter Dataset)
#> • ADPPK    (Population PK Analysis Dataset)
#> • ADSL     (Subject Level Analysis Dataset)
#> • ADVS     (Vital Signs Analysis Dataset)

Generating a Template


# Generate the ADSL template script in your working directory
use_ad_template(
  adam_name = "adsl",
  save_path = "./ad_adsl.R"
)

# Generate ADAE template
use_ad_template(
  adam_name = "adae",
  save_path = "./ad_adae.R"
)

# Generate from extension packages
use_ad_template(
  adam_name = "adrs",
  save_path = "./ad_adrs.R",
  package = "admiralonco"
)

These templates are not meant to be used as-is. They are intentionally designed as starting points that programmers customize for their specific study designs, protocol requirements, and company standards.

Practical Examples

Example 1: Creating ADSL (Subject-Level Analysis Dataset)

ADSL is the most fundamental ADaM dataset — a one-record-per-subject dataset containing demographic information, treatment assignments, disposition dates, and population flags. Below is a condensed but representative example.


library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
library(lubridate)
library(stringr)

# ── Load Source SDTM Datasets ──
dm <- pharmaversesdtm::dm %>% convert_blanks_to_na()
ds <- pharmaversesdtm::ds %>% convert_blanks_to_na()
ex <- pharmaversesdtm::ex %>% convert_blanks_to_na()
ae <- pharmaversesdtm::ae %>% convert_blanks_to_na()
lb <- pharmaversesdtm::lb %>% convert_blanks_to_na()

# ── Pre-process Exposure Dates ──
ex_ext <- ex %>%
  derive_vars_dtm(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dtm(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN"
  )

# ── Build ADSL Step-by-Step ──
adsl <- dm %>%

  # Step 1: Derive Treatment Variables
  mutate(
    TRT01P = ARM,
    TRT01A = ACTARM
  ) %>%

  # Step 2: Derive Treatment Start Date (TRTSDTM)
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO"))) &
      !is.na(EXSTDTM),
    new_vars = exprs(TRTSDTM = EXSTDTM, TRTSTMF = EXSTTMF),
    order = exprs(EXSTDTM, EXSEQ),
    mode = "first",
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%

  # Step 3: Derive Treatment End Date (TRTEDTM)
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO"))) &
      !is.na(EXENDTM),
    new_vars = exprs(TRTEDTM = EXENDTM, TRTETMF = EXENTMF),
    order = exprs(EXENDTM, EXSEQ),
    mode = "last",
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%

  # Step 4: Convert Datetimes to Dates
  derive_vars_dtm_to_dt(
    source_vars = exprs(TRTSDTM, TRTEDTM)
  ) %>%

  # Step 5: Derive Treatment Duration
  derive_var_trtdurd() %>%

  # Step 6: Derive Disposition Date
  derive_vars_merged(
    dataset_add = ds %>%
      derive_vars_dt(dtc = DSSTDTC, new_vars_prefix = "DSST"),
    by_vars = exprs(STUDYID, USUBJID),
    new_vars = exprs(EOSDT = DSSTDT),
    filter_add = DSCAT == "DISPOSITION EVENT" &
      DSDECOD != "SCREEN FAILURE"
  ) %>%

  # Step 7: Derive Age Groups
  derive_vars_cat(
    # definition uses tribble-style syntax: one row per category,
    # evaluated top to bottom
    definition = exprs(
      ~condition,           ~AGEGR1,
      AGE < 18,             "<18",
      between(AGE, 18, 64), "18-64",
      AGE > 64,             ">64"
    )
  ) %>%

  # Step 8: Derive Population Flags
  derive_var_merged_exist_flag(
    dataset_add = ex,
    by_vars = exprs(STUDYID, USUBJID),
    new_var = SAFFL,
    condition = (EXDOSE > 0 |
      (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO")))
  ) %>%

  mutate(
    ITTFL = if_else(!is.na(ARM), "Y", "N")
  )

Key Takeaway: Each derive_* call adds specific variables to the dataset in a clear, sequential pipeline. The code is self-documenting: you can read it top-to-bottom and understand exactly what each step contributes to the final ADSL.
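
Since ADSL must contain exactly one record per subject, a quick structural check after the pipeline is often worthwhile. The following is a minimal sketch that runs against the example dataset admiral_adsl shipped with the package; the object names adsl_check and dup_subjects are illustrative, and in practice the check would run on the adsl object built above.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# Example ADSL shipped with {admiral}; substitute your own adsl object
adsl_check <- admiral::admiral_adsl

# Count records per subject; any count above 1 violates the ADSL structure
dup_subjects <- adsl_check %>%
  count(STUDYID, USUBJID, name = "n_records") %>%
  filter(n_records > 1)

if (nrow(dup_subjects) > 0) {
  warning("ADSL contains more than one record for some subjects")
}
```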

Example 2: Creating ADVS (Vital Signs Analysis Dataset)

ADVS follows the Basic Data Structure (BDS) pattern — one record per subject per parameter per analysis timepoint.


# ── Load source data ──
vs <- pharmaversesdtm::vs %>% convert_blanks_to_na()
# admiral_adsl is an example dataset included in the package for demonstration
adsl <- admiral::admiral_adsl

# ── Define ADSL variables needed in ADVS ──
adsl_vars <- exprs(TRTSDT, TRTEDT, TRT01P, TRT01A)

# ── Begin ADVS construction ──
advs <- vs %>%

  # Step 1: Merge ADSL variables
  derive_vars_merged(
    dataset_add = adsl,
    new_vars = adsl_vars,
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%

  # Step 2: Map SDTM to ADaM variable names
  mutate(
    PARAMCD = VSTESTCD,
    PARAM   = VSTEST,
    AVAL    = VSSTRESN,
    AVALU   = VSSTRESU
  ) %>%

  # Step 3: Derive Analysis Dates
  derive_vars_dt(
    dtc = VSDTC,
    new_vars_prefix = "A",
    highest_imputation = "D"
  ) %>%

  # Step 4: Derive Analysis Day
  derive_vars_dy(
    reference_date = TRTSDT,
    source_vars = exprs(ADT)
  ) %>%

  # Step 5: Derive Visit Information
  mutate(
    AVISIT = case_when(
      str_detect(VISIT, "SCREEN|UNSCHED|RETRIEVAL|AMBUL") ~
        NA_character_,
      !is.na(VISIT) ~ str_to_title(VISIT),
      TRUE ~ NA_character_
    ),
    AVISITN = as.numeric(case_when(
      VISIT == "BASELINE" ~ "0",
      str_detect(VISIT, "WEEK") ~
        str_trim(str_replace(VISIT, "WEEK", "")),
      TRUE ~ NA_character_
    ))
  ) %>%

  # Step 6: Derive Baseline Flag and Value
  restrict_derivation(
    derivation = derive_var_extreme_flag,
    args = params(
      by_vars  = exprs(STUDYID, USUBJID, PARAMCD),
      order    = exprs(ADT, VSSEQ),
      new_var  = ABLFL,
      mode     = "last"
    ),
    filter = !is.na(AVAL) & ADT <= TRTSDT
  ) %>%

  derive_var_base(
    by_vars = exprs(STUDYID, USUBJID, PARAMCD),
    source_var = AVAL,
    new_var = BASE
  ) %>%

  # Step 7: Derive Change from Baseline
  derive_var_chg() %>%
  derive_var_pchg() %>%

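  # Step 8: Derive BMI (computed parameter)
  # Note: because this step runs after Steps 6-7, the new BMI and MAP
  # records have no ABLFL, BASE, CHG, or PCHG. Derive computed parameters
  # before the baseline steps if those variables are needed for them.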
  derive_param_bmi(
    by_vars = exprs(STUDYID, USUBJID, !!!adsl_vars,
                    VISIT, VISITNUM, ADT, ADY),
    set_values_to = exprs(PARAMCD = "BMI"),
    get_unit_expr = VSSTRESU,
    constant_by_vars = exprs(USUBJID)
  ) %>%

  # Step 9: Derive MAP (Mean Arterial Pressure)
  derive_param_map(
    by_vars = exprs(STUDYID, USUBJID, !!!adsl_vars,
                    VISIT, VISITNUM, ADT, ADY),
    set_values_to = exprs(PARAMCD = "MAP"),
    get_unit_expr = VSSTRESU
  )
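
To make the baseline and change-from-baseline steps concrete, the sketch below applies them to a four-record toy dataset. The values are hypothetical, and ABLFL is supplied directly rather than derived so that the arithmetic is easy to follow.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# Toy data: two subjects, a baseline record and one post-baseline record each
advs_toy <- tibble::tribble(
  ~STUDYID, ~USUBJID, ~PARAMCD, ~AVISITN, ~AVAL, ~ABLFL,
  "S1",     "01",     "WEIGHT", 0,        80,    "Y",
  "S1",     "01",     "WEIGHT", 2,        84,    NA_character_,
  "S1",     "02",     "WEIGHT", 0,        50,    "Y",
  "S1",     "02",     "WEIGHT", 2,        45,    NA_character_
)

advs_chg <- advs_toy %>%
  # BASE: AVAL from the record flagged ABLFL == "Y", copied to all records
  derive_var_base(
    by_vars    = exprs(STUDYID, USUBJID, PARAMCD),
    source_var = AVAL,
    new_var    = BASE
  ) %>%
  # CHG = AVAL - BASE
  derive_var_chg() %>%
  # PCHG = 100 * (AVAL - BASE) / |BASE|
  derive_var_pchg()
```

For subject 01 the post-baseline record gets CHG = 4 and PCHG = 5; for subject 02 it gets CHG = -5 and PCHG = -10.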

Example 3: Creating ADAE (Adverse Events Analysis Dataset)

ADAE follows the Occurrence Data Structure (OCCDS) pattern.


# ── Load source data ──
ae   <- pharmaversesdtm::ae %>% convert_blanks_to_na()
# admiral_adsl is an example dataset included in the package for demonstration
adsl <- admiral::admiral_adsl

# ── ADSL variables required for ADAE ──
adsl_vars <- exprs(TRTSDT, TRTEDT, DTHDT, EOSDT)

# ── Build ADAE ──
adae <- ae %>%

  # Step 1: Merge ADSL variables
  derive_vars_merged(
    dataset_add = adsl,
    new_vars = adsl_vars,
    by_vars = exprs(STUDYID, USUBJID)
  ) %>%

  # Step 2: Derive Analysis Start Datetime
  derive_vars_dtm(
    dtc = AESTDTC,
    new_vars_prefix = "AST",
    highest_imputation = "M",
    min_dates = exprs(TRTSDT)
  ) %>%

  # Step 3: Derive Analysis End Datetime
  derive_vars_dtm(
    dtc = AEENDTC,
    new_vars_prefix = "AEN",
    highest_imputation = "M",
    date_imputation = "last",
    time_imputation = "last",
    max_dates = exprs(DTHDT, EOSDT)
  ) %>%

  # Step 4: Convert to Dates
  derive_vars_dtm_to_dt(source_vars = exprs(ASTDTM, AENDTM)) %>%

  # Step 5: Derive Relative Days
  derive_vars_dy(
    reference_date = TRTSDT,
    source_vars = exprs(ASTDT, AENDT)
  ) %>%

  # Step 6: Derive Treatment-Emergent Flag
  derive_var_trtemfl(
    trt_start_date = TRTSDT,
    trt_end_date   = TRTEDT,
    end_window     = 30
  ) %>%

  # Step 7: Derive Analysis Severity and Causality
  # (copied directly from SDTM; toxicity grading with derive_var_atoxgr()
  # applies to laboratory data in ADLB, not to adverse events)
  mutate(
    ASEV = AESEV,
    AREL = AEREL
  ) %>%

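  # Step 8: Derive Occurrence Flags
  # AOCCIFL flags each subject's first occurrence of maximum severity.
  # desc(AESEV) works here only because MILD < MODERATE < SEVERE happens
  # to sort alphabetically in severity order.
  derive_var_extreme_flag(
    by_vars  = exprs(STUDYID, USUBJID),
    order    = exprs(desc(AESEV), ASTDT, AESEQ),
    new_var  = AOCCIFL,
    mode     = "first"
  )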
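
The relative-day derivation in Step 5 follows the CDISC convention that there is no day 0: the reference date itself is day 1, and the day before it is day -1. A minimal sketch on hypothetical dates shows the effect.

```r
library(admiral)
library(dplyr, warn.conflicts = FALSE)

# Toy data: one subject, treatment start on 2020-01-10 (hypothetical dates)
ae_days <- tibble::tribble(
  ~USUBJID, ~TRTSDT,      ~ASTDT,
  "01",     "2020-01-10", "2020-01-08",
  "01",     "2020-01-10", "2020-01-10",
  "01",     "2020-01-10", "2020-01-15"
) %>%
  mutate(across(c(TRTSDT, ASTDT), as.Date)) %>%
  # Creates ASTDY from ASTDT relative to TRTSDT
  derive_vars_dy(
    reference_date = TRTSDT,
    source_vars    = exprs(ASTDT)
  )
```

The three records receive ASTDY values of -2, 1, and 6 respectively.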

The {admiral} Ecosystem: Extension Packages

Beyond the core package, {admiral} supports a family of extension packages organized by therapeutic area and company-specific needs:

Therapeutic Area Extensions

Package	Therapeutic Area	Key Datasets	Notable Functions
{admiralonco}	Oncology	ADRS, ADTR, ADTTE	Best overall response (BOR), RECIST 1.1 and iRECIST criteria, tumor response derivations
{admiralophtha}	Ophthalmology	ADOE	Affected eye derivation, BCVA to logMAR conversion, Snellen category mapping
{admiralvaccine}	Vaccines	ADIS, ADCE	Immunogenicity specimen derivations, fever record detection, criteria evaluation flags
{admiralpeds}	Pediatrics	Pediatric ADaMs	Anthropometric indicators, child growth/development chart parameters
{admiralmetabolic}	Metabolic Disorders	Metabolic ADaMs	Specialized metabolic disease derivations

Infrastructure Packages

Package	Purpose
{admiraldev}	Developer utilities, common functions shared across all {admiral} packages
{pharmaversesdtm}	Test SDTM datasets from the CDISC Pilot Project
{pharmaverseadam}	Test ADaM datasets generated from {admiral} templates

Company Extension Packages

Companies can create their own extension packages (e.g., {admiralroche}, {admiralgsk}) that plug into the {admiral} framework with company-specific metadata access, naming conventions, and custom derivation logic.

Key Advantages of {admiral}

1. Open Source and Transparent

Every line of {admiral} code is publicly available on GitHub. Functions are comprehensively documented with real-data examples, and the entire test suite is visible. This transparency is critical for regulatory submissions where code reviewability is paramount.

2. Cross-Industry Collaboration

{admiral} is not a single-company effort. Contributors span Roche, GSK, Cytel, Johnson & Johnson, Bayer, and independent contributors. This cross-industry development ensures the package addresses common needs rather than company-specific quirks, and it distributes the maintenance burden across organizations.

3. Designed to Align with CDISC Standards

Derivation functions are designed to align with the CDISC ADaM Implementation Guide, which helps reduce the risk of non-compliance during regulatory review. Compliance still ultimately depends on how the functions are applied within each study context.

4. Comprehensive Testing

{admiral} employs extensive unit testing — a practice less common in traditional SAS macro libraries. Each function includes automated tests that verify correct behavior across edge cases, missing values, and boundary conditions. This level of testing provides confidence in the correctness of derivations.

5. Reproducibility

R scripts built with {admiral} are inherently reproducible. Combined with version pinning (via renv or similar tools), an {admiral}-based pipeline produces identical output given the same inputs and package versions — a property that is essential for regulatory submissions and QC workflows.

6. Template-Driven Workflow

The built-in template system reduces the cognitive overhead of starting a new ADaM dataset. Rather than building from scratch, programmers start with a vetted template and customize it, reducing both development time and the risk of missing standard derivations.

7. Regulatory Acceptance

Multiple pharmaceutical regulatory submissions to the FDA and EMA have reportedly used R and {admiral} in production workflows. These successful submissions establish precedent and reduce the regulatory risk perceived by organizations considering adoption.

Complementary Pharmaverse Packages for End-to-End Workflows

{admiral} operates within the broader pharmaverse ecosystem. A complete end-to-end clinical reporting workflow in R typically involves:

Stage	Package(s)	Purpose
SDTM Creation	{sdtm.oak}, {sdtmchecks}	Transform raw data to SDTM; check SDTM data for common issues
ADaM Creation	{admiral} + extensions	Transform SDTM to ADaM
Metadata Management	{metacore}, {metatools}	Dataset specifications and metadata
Transport Files	{xportr}	Generate XPT files for submission
Tables & Listings	{rtables}, {Tplyr}, {pharmaRTF}	Create regulatory tables and listings
Figures	{tern}, {ggplot2}	Create clinical trial figures

Getting Started: A Practical Roadmap

For organizations and individuals looking to adopt {admiral}, the following roadmap is recommended:

Step 1: Install and Explore

Install {admiral} and its companion packages, then run the built-in ADSL template against the included CDISC Pilot data:


install.packages(c("admiral", "pharmaversesdtm", "dplyr",
                    "lubridate", "stringr"))
library(admiral)
use_ad_template("adsl", save_path = "./ad_adsl.R")

Run the template script and examine the resulting dataset. Compare it against the ADSL specification from the CDISC Pilot Project.

Step 2: Customize for Your Study

Take the template output and begin modifying it for your study's specific requirements. Add custom derivations, adjust imputation rules, and integrate your company's metadata conventions.

Step 3: Integrate Metadata and Transport

Use {metacore}, {metatools}, and {xportr} to apply variable labels, formats, and lengths from your dataset specification, then generate submission-ready XPT files.
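
A minimal sketch of this step using {xportr} alone is shown below. The toy dataset and the var_spec specification are hypothetical; the column names dataset, variable, and label are assumed to match {xportr}'s default option settings, and real specifications would usually come from a metadata spreadsheet or a {metacore} object.

```r
library(xportr)

# Toy dataset and variable-level specification (hypothetical)
adsl_toy <- data.frame(
  USUBJID = c("01", "02"),
  AGE     = c(63, 35),
  SEX     = c("M", "F")
)
var_spec <- data.frame(
  dataset  = "adsl",
  variable = c("USUBJID", "AGE", "SEX"),
  label    = c("Unique Subject Identifier", "Age", "Sex")
)

# Attach variable labels from the specification
adsl_toy <- xportr_label(adsl_toy, var_spec, domain = "adsl")

# Write a SAS transport file to a temporary location
xpt_path <- file.path(tempdir(), "adsl.xpt")
xportr_write(adsl_toy, path = xpt_path)
```

A fuller pipeline would typically also apply types, lengths, and formats from the specification before writing.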

Step 4: Validate and QC

Establish a double-programming or independent QC process. The modular, function-based structure of {admiral} code makes it straightforward to review and verify individual derivation steps.
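
As one possible shape for the comparison step in double programming, the sketch below compares a production dataset with an independently programmed QC version using base R only. The datasets and the helper sort_by_key() are hypothetical; many teams use a dedicated comparison package instead.

```r
# Hypothetical production and QC versions of the same small dataset
prod <- data.frame(
  USUBJID = c("01", "02"),
  AGE     = c(63, 35),
  AGEGR1  = c("18-64", "18-64")
)
qc <- data.frame(
  USUBJID = c("02", "01"),
  AGE     = c(35, 63),
  AGEGR1  = c("18-64", "18-64")
)

# Sort by the key so that row order alone does not cause false differences
sort_by_key <- function(df, key) {
  out <- df[order(df[[key]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# all.equal() returns TRUE or a character vector describing the differences
comparison <- all.equal(sort_by_key(prod, "USUBJID"), sort_by_key(qc, "USUBJID"))
qc_passed <- isTRUE(comparison)
```

When the datasets differ, comparison holds a description of the mismatches, which can be written to the QC log.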

Step 5: Engage with the Community

Join the pharmaverse Slack workspace for support, contribute to GitHub issues, and consider contributing functions or improvements back to the package.

Considerations and Limitations

While {admiral} represents a significant advancement, there are practical considerations to keep in mind:

Learning Curve: Programmers transitioning from SAS will need to invest time learning R, the tidyverse idiom, and {admiral}'s NSE-based argument conventions.
Not a Black Box: {admiral} deliberately does not generate complete ADaM datasets with a single function call. Each dataset requires assembling a sequence of derivation calls, which gives full control but demands understanding of both ADaM requirements and {admiral}'s function library.
Coverage is Not 100%: The {admiral} team has explicitly acknowledged that complete coverage of all possible ADaM derivations is neither achievable nor the goal. Study-specific and company-specific derivations will still require custom R code.
Validation Infrastructure: Organizations using {admiral} for regulatory submissions should establish a formal validation framework, including documentation of package qualification, testing procedures, and version control.

Conclusion

The {admiral} package is not merely an R package — it represents a philosophical shift in how the pharmaceutical industry approaches clinical data programming. By providing an open-source, collaboratively developed, and rigorously tested toolbox for ADaM dataset generation, {admiral} is enabling a transition that many thought was years away: the viable use of R for end-to-end regulatory submissions.

For statistical programmers, the message is clear: {admiral} has been used in regulatory submissions, is backed by an active cross-industry community, and continues to mature with each release. Whether your organization is beginning to explore R or is already deep into its R transition, {admiral} provides the foundation for building CDISC-aligned ADaM datasets with confidence.

Resources

Resource	URL
{admiral} CRAN Page	https://cran.r-project.org/package=admiral
{admiral} Documentation	https://pharmaverse.github.io/admiral/
GitHub Repository	https://github.com/pharmaverse/admiral
Pharmaverse Blog	https://pharmaverse.github.io/blog/
Pharmaverse YouTube Channel	https://www.youtube.com/@pharmaverse
Pharmaverse Slack	https://pharmaverse.slack.com
CDISC ADaM Standard	https://www.cdisc.org/standards/foundational/adam

This article is published on clinstandards.org — a technical publication serving the statistical programming community in pharmaceutical research.