CLINSTANDARDS.ORG Deep Dive Article
PROC R | PROC PYTHON | PROC IML | PROC FCMP
SASPy | {pharmaverse} | R Consortium FDA Submissions Pilots
For decades, SAS has been the dominant programming language in clinical statistical programming. The entire regulatory submission ecosystem, from SDTM and ADaM dataset creation to TFL generation, was built on SAS infrastructure. Pharma companies, CROs, and regulatory agencies like the FDA, EMA, and PMDA developed deep institutional expertise in SAS, and the software's deterministic behavior and validated environment became the industry gold standard.
However, the landscape is shifting. R and Python have matured into powerful platforms with rich ecosystems for statistical computing, machine learning, data visualization, and automation. The R Consortium's successful FDA submission pilots, the emergence of the {pharmaverse} ecosystem, and SAS's own investment in cross-language interoperability all signal a fundamental transition: the future of clinical programming is multi-language.
A notable development in this evolution arrived with the SAS Viya 2026.03 release: PROC R, a standalone procedure listed in the Base SAS Procedures Guide for running R code within SAS, paralleling PROC PYTHON. Its placement in the Base SAS Procedures Guide suggests a more accessible pathway to R than the PROC IML route, though exact licensing and configuration requirements should be confirmed with SAS documentation or your SAS administrator.
This article provides a deep dive into the technical mechanisms that enable SAS, R, and Python to interoperate within clinical trial workflows. We examine the new PROC R, PROC PYTHON, the legacy PROC IML pathway, PROC FCMP with Python objects, the reverse pathways (SASPy, haven, and the {procs} package), and the regulatory landscape that is making multi-language submissions a reality.
| Why This Matters for Statistical Programmers | The FDA's Statistical Software Clarifying Statement confirms that the agency does not require any specific software for statistical analyses. Combined with the R Consortium pilots (Pilots 1 through 3 have completed FDA review; Pilots 4 and 5 are submitted or in progress), this means that R-based submission packages are a demonstrated pathway, and Python-based packages a potential one. PROC R's appearance alongside PROC PYTHON in the Base SAS Procedures Guide indicates SAS's continued investment in open-source language interoperability. Statistical programmers who can bridge multiple languages will be well positioned as the industry evolves. |
With the SAS Viya 2026.03 release (April 2026), SAS Institute introduced PROC R as an official procedure listed in the Base SAS Procedures Guide. This is a significant development for the clinical programming community. PROC R provides a standalone procedure for executing R code within a SAS session, following the same design pattern as PROC PYTHON. Its placement in the Base SAS Procedures Guide (rather than the SAS/IML documentation) suggests it may offer a more accessible pathway to R integration, though organizations should verify the exact licensing terms with SAS for their specific deployment.
The PROC R documentation references several R packages: R6 (object-oriented programming), arrow (high-performance columnar data exchange), haven (SAS/SPSS/Stata file I/O), plotly (interactive visualizations), and svglite (lightweight SVG graphics output). The exact nature of this association (pre-installed, recommended, or dependency) should be confirmed with the full SAS documentation for your deployment. If available out of the box, the inclusion of {arrow} would be particularly significant for clinical programming, enabling high-performance data transfer between SAS and R.
proc r;
submit;
# R code executes here
library(haven)
library(dplyr)
# Access SAS dataset as R data frame
adsl <- sas.sd2df("ADAM.ADSL")
# Perform R analysis
summary_stats <- adsl %>%
  group_by(TRT01A) %>%
  summarise(
    N = n(),
    Mean_Age = mean(AGE, na.rm = TRUE),
    SD_Age = sd(AGE, na.rm = TRUE)
  )
# Push results back to SAS
sas.df2sd(summary_stats, "WORK.TRT_SUMMARY")
endsubmit;
run;
/* Continue in SAS */
proc print data=work.trt_summary noobs; run;
proc r;
submit;
library(survival)
library(survminer)
# Pull ADTTE from SAS
adtte <- sas.sd2df("ADAM.ADTTE")
# Fit KM model (CNSR == 0 marks an event, per the ADaM convention)
fit <- survfit(Surv(AVAL, CNSR == 0) ~ TRT01A, data = adtte)
# Generate publication-quality KM plot
# Displays directly in SAS Studio results
ggsurvplot(fit,
           data = adtte, risk.table = TRUE, pval = TRUE,
           palette = c("#2E86C1", "#E74C3C"),
           xlab = "Time (Months)", ylab = "Survival Probability",
           title = "Overall Survival by Treatment Arm",
           legend.labs = c("Placebo", "Treatment"),
           risk.table.height = 0.25)
endsubmit;
run;
proc r;
submit;
library(ggplot2)
library(dplyr)
adsl <- sas.sd2df("ADAM.ADSL")
# Subgroup analysis results (pre-computed)
subgroups <- data.frame(
  Subgroup = c("Overall", "Age < 65", "Age >= 65",
               "Male", "Female", "White", "Non-White"),
  HR  = c(0.72, 0.68, 0.78, 0.70, 0.75, 0.73, 0.69),
  LCL = c(0.58, 0.49, 0.59, 0.52, 0.55, 0.57, 0.44),
  UCL = c(0.89, 0.94, 1.03, 0.94, 1.02, 0.94, 1.08),
  N   = c(500, 280, 220, 260, 240, 380, 120)
)
ggplot(subgroups, aes(x = HR, y = reorder(Subgroup, -HR))) +
  geom_point(size = 3) +
  geom_errorbarh(aes(xmin = LCL, xmax = UCL), height = 0.2) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  labs(x = "Hazard Ratio (95% CI)", y = "",
       title = "Forest Plot: Overall Survival by Subgroup") +
  theme_minimal()
endsubmit;
run;
| PROC R vs. PROC IML: What Changed? | PROC R appears in the Base SAS Procedures Guide for SAS Viya, separate from the SAS/IML documentation. The syntax is simpler (proc r; submit; ... endsubmit; run;) compared to the PROC IML approach (proc iml; submit / R; ... endsubmit; run;) with its ExportDataSetToR/ImportDataSetFromR subroutines. According to the documentation, PROC R natively displays R graphics in SAS Studio, including ggplot2 output. For SAS Viya deployments, PROC R appears to be the intended pathway going forward. PROC IML remains available for SAS 9.4 environments and for users who need the IML matrix language capabilities alongside R. Confirm licensing details with your SAS administrator. |
Since SAS/IML 9.22 (released in 2010), SAS has supported a formal interface for calling R from the SAS/IML matrix language. This interface requires the RLANG system option to be enabled at SAS startup and the R_HOME environment variable to point to a compatible R installation. The RLANG option status can be verified with PROC OPTIONS.
/* Verify RLANG status */
proc options option=rlang; run;
/* Call R via PROC IML */
proc iml;
  /* Copy the SAS dataset into an R data frame named adsl */
  call ExportDataSetToR("work.adsl", "adsl");
  submit / R;
    model <- glm(TRT01PN ~ AGE + factor(SEX),
                 data = adsl, family = "binomial")
    preds <- data.frame(
      USUBJID = adsl$USUBJID,
      PRED_PROB = predict(model, type = "response"))
  endsubmit;
  /* Copy the R data frame back into a SAS dataset */
  call ImportDataSetFromR("work.predictions", "preds");
quit;
For organizations running SAS 9.4 on-premises, PROC IML remains the only SAS-supplied route to R. It requires a SAS/IML license (a paid add-on), uses ExportDataSetToR and ImportDataSetFromR for data transfer, and can display R graphics only when SAS runs locally (not on a remote SAS server). For SAS 9.4 environments without an IML license, the community-developed %PROC_R macro by Xin Wei offers an alternative that uses Base SAS file I/O to exchange data with R.
| SAS Type | R Type | Notes | Clinical Example |
| Numeric | numeric (double) | Direct mapping | AGE, AVAL, CHG |
| Character | character / factor | R version dependent | USUBJID, PARAMCD |
| SAS Date | numeric | Different epoch; manual conversion needed | TRTSDT, ADT |
| SAS DateTime | numeric | SAS epoch: 1960; R epoch: 1970 | ADTM |
| Missing (.) | NA | Special missing (.A-.Z) become NA | BASE, CHG |
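The date rows in the table above can be made concrete with a small conversion sketch. It is written in Python with only the standard library and assumes the values arrive as raw SAS numerics (days since 1960-01-01 for dates, seconds since 1960-01-01 for datetimes) with missing values represented as None or NaN. The helper names are illustrative and are not part of any SAS, R, or pandas API.

```python
import math
from datetime import date, datetime, timedelta

# SAS counts dates in days and datetimes in seconds from 1960-01-01.
SAS_EPOCH_DATE = date(1960, 1, 1)
SAS_EPOCH_DATETIME = datetime(1960, 1, 1)


def _is_missing(value):
    """Treat None and NaN as missing (assumed representation of SAS '.')."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sas_date_to_date(value):
    """Convert a SAS date value (days since 1960-01-01) to datetime.date."""
    if _is_missing(value):
        return None
    return SAS_EPOCH_DATE + timedelta(days=int(value))


def sas_datetime_to_datetime(value):
    """Convert a SAS datetime value (seconds since 1960-01-01) to datetime."""
    if _is_missing(value):
        return None
    return SAS_EPOCH_DATETIME + timedelta(seconds=float(value))


# Offset between the SAS epoch (1960) and the 1970 epoch, in days.
EPOCH_OFFSET_DAYS = (date(1970, 1, 1) - SAS_EPOCH_DATE).days

print(EPOCH_OFFSET_DAYS)               # 3653
print(sas_date_to_date(23376))         # 2024-01-01
print(sas_date_to_date(float("nan")))  # None
```

Subtracting the 3,653-day offset (or 3,653 × 86,400 seconds for datetimes) is what converts a SAS value into a 1970-based value; forgetting it shifts every TRTSDT or ADT by about ten years.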
PROC PYTHON, introduced in SAS Viya and available in SAS 9.4M8+, provides a direct interface for embedding Python code within SAS programs. Like PROC R, it uses a SUBMIT/ENDSUBMIT block syntax and provides methods for bidirectional data transfer between SAS datasets and pandas DataFrames.
proc python;
submit;
import pandas as pd
# SAS.sd2df() - SAS dataset to pandas DataFrame
adlb = SAS.sd2df("ADAM.ADLB")
# Build a small result to send back (mean AVAL per parameter)
result_df = adlb.groupby("PARAMCD", as_index=False)["AVAL"].mean()
# SAS.df2sd() - pandas DataFrame to SAS dataset
SAS.df2sd(result_df, "WORK.RESULT")
# SAS.submit() - execute SAS code from Python
SAS.submit("proc means data=work.result; run;")
# SAS.symget() / SAS.symput() - macro variable exchange
study = SAS.symget("STUDYID")
SAS.symput("NOBS", str(len(adlb)))
endsubmit;
run;
proc python;
submit;
import pandas as pd
adlb = SAS.sd2df("ADAM.ADLB")
# Keep safety-population records flagged for analysis
adlb = adlb[(adlb["SAFFL"] == "Y") & (adlb["ANL01FL"] == "Y")]
def categorize(val, lo, hi):
    """Classify a lab value against its normal range."""
    if pd.isna(val):
        return 'Missing'
    if val < lo:
        return 'Low'
    if val > hi:
        return 'High'
    return 'Normal'
adlb["BASE_CAT"] = adlb.apply(
    lambda r: categorize(r["BASE"], r["A1LO"], r["A1HI"]), axis=1)
adlb["POST_CAT"] = adlb.apply(
    lambda r: categorize(r["AVAL"], r["A1LO"], r["A1HI"]), axis=1)
# Baseline-by-post-baseline shift table per parameter
shift = pd.crosstab(
    [adlb["PARAMCD"], adlb["BASE_CAT"]],
    adlb["POST_CAT"], margins=True).reset_index()
SAS.df2sd(shift, "WORK.SHIFT_TABLE")
endsubmit;
run;
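As a cross-check on the shift-table logic above, the same categorization can be sketched in plain Python without pandas or a SAS session. This is an independent illustration rather than part of the PROC PYTHON workflow: the records are made-up values, missing values are assumed to be represented as None, and `shift_counts` is a hypothetical helper name.

```python
from collections import Counter


def categorize(val, lo, hi):
    """Classify a lab value against its normal range (None means missing)."""
    if val is None:
        return "Missing"
    if lo is not None and val < lo:
        return "Low"
    if hi is not None and val > hi:
        return "High"
    return "Normal"


def shift_counts(records):
    """Count (PARAMCD, baseline category, post-baseline category) triples."""
    counts = Counter()
    for rec in records:
        base_cat = categorize(rec["BASE"], rec["A1LO"], rec["A1HI"])
        post_cat = categorize(rec["AVAL"], rec["A1LO"], rec["A1HI"])
        counts[(rec["PARAMCD"], base_cat, post_cat)] += 1
    return counts


# Made-up example records, for illustration only.
records = [
    {"PARAMCD": "ALT", "BASE": 30.0, "AVAL": 80.0, "A1LO": 7.0, "A1HI": 56.0},
    {"PARAMCD": "ALT", "BASE": 60.0, "AVAL": 40.0, "A1LO": 7.0, "A1HI": 56.0},
    {"PARAMCD": "ALT", "BASE": None, "AVAL": 5.0, "A1LO": 7.0, "A1HI": 56.0},
    {"PARAMCD": "ALT", "BASE": 25.0, "AVAL": 90.0, "A1LO": 7.0, "A1HI": 56.0},
]

for key, n in sorted(shift_counts(records).items()):
    print(key, n)
```

Comparing these counts with the cells of WORK.SHIFT_TABLE on a small test extract is one inexpensive way to confirm that the pandas version behaves as intended before it is run on full study data.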
While PROC PYTHON enables block-level Python execution, PROC FCMP (Function Compiler) allows individual Python functions to be compiled and called directly from SAS DATA steps. This is valuable for embedding reusable Python logic (API calls, complex string parsing, ML scoring) into standard SAS data processing pipelines without switching to a PROC PYTHON block.
proc fcmp outlib=work.funcs.pyfuncs;
function clean_term(raw_term $) $ 200;
length result $ 200;
declare object py(python);
submit into py;
def clean_term(raw_term):
    "Output: clean_value"
    import re
    # Collapse runs of whitespace and upper-case the term
    return re.sub(r'\s+', ' ', raw_term.strip().upper())
endsubmit;
rc = py.publish();
rc = py.call("clean_term", raw_term);
/* The key must match the name declared in the "Output:" string */
result = py.results["clean_value"];
return(result);
endsub;
run;
options cmplib=work.funcs;
data adae_clean;
  set adam.adae;
  AEDECOD_CLEAN = clean_term(AEDECOD);
run;
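Because the body of an FCMP Python object is ordinary Python, the function can be exercised on its own before it is embedded in SAS. The sketch below assumes the intended behavior is to trim the term, upper-case it, and collapse any run of whitespace to a single space, which means the pattern inside the raw string should be `\s+`. With a doubled backslash inside a raw string, the pattern would instead match a literal backslash followed by the letter s.

```python
import re


def clean_term(raw_term):
    """Trim, upper-case, and collapse internal whitespace to single spaces."""
    return re.sub(r"\s+", " ", raw_term.strip().upper())


print(clean_term("  nausea \t and   vomiting "))  # NAUSEA AND VOMITING

# The doubled-backslash pattern leaves whitespace untouched:
print(re.sub(r"\\s+", " ", "NAUSEA   AND"))        # NAUSEA   AND
```

Running a few such checks outside SAS keeps regular-expression mistakes from surfacing later as silently uncleaned AEDECOD values.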
Interoperability is not one-directional. The open-source community has built robust tools for accessing SAS data and functionality from R and Python environments.
import saspy
sas = saspy.SASsession(cfgname='winlocal')
# Submit SAS code from Python
sas.submitLST('proc means data=sashelp.class; run;')
# Convert SAS dataset to pandas
adsl = sas.sd2df('ADSL', libref='ADAM')
# Derive a small summary in pandas (mean age per treatment arm)
results_df = adsl.groupby('TRT01A', as_index=False)['AGE'].mean()
# Push pandas DataFrame to SAS
sas.df2sd(results_df, table='STATS', libref='WORK')
library(haven)
# Read XPT (submission transport files)
adsl <- read_xpt("path/to/adsl.xpt")
# Read native SAS dataset
adlb <- read_sas("path/to/adlb.sas7bdat")
# Write back to XPT
write_xpt(adsl, path = "adsl.xpt", version = 5)
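Writing version 5 transport files brings the format's own limits with it: variable names of at most 8 characters, labels of at most 40 characters, and character values of at most 200 bytes. A pre-flight check along the following lines can catch violations before the file is written. It is a Python sketch under stated assumptions: the metadata layout (a list of dicts with `name`, `label`, and `char_length` keys) and the function name are hypothetical, and labels are assumed to be ASCII so that character counts equal byte counts.

```python
import re

# XPT version 5 limits on variable metadata.
XPT_V5_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")
MAX_LABEL_LEN = 40
MAX_CHAR_LEN = 200


def check_xpt_v5(variables):
    """Return a list of human-readable problems; empty means no issues found."""
    problems = []
    for var in variables:
        name = var["name"]
        if not XPT_V5_NAME.fullmatch(name):
            problems.append(f"{name}: name is not a valid 8-character SAS name")
        if len(var.get("label", "")) > MAX_LABEL_LEN:
            problems.append(f"{name}: label exceeds {MAX_LABEL_LEN} characters")
        if var.get("char_length", 0) > MAX_CHAR_LEN:
            problems.append(f"{name}: character length exceeds {MAX_CHAR_LEN}")
    return problems


# Hypothetical metadata for illustration.
adsl_meta = [
    {"name": "USUBJID", "label": "Unique Subject Identifier", "char_length": 30},
    {"name": "TRT01A", "label": "Actual Treatment for Period 01", "char_length": 40},
    {"name": "BASELINE_BMI", "label": "Baseline BMI"},
]

for problem in check_xpt_v5(adsl_meta):
    print(problem)
```

In this example only BASELINE_BMI is flagged, because its name is 12 characters long.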
The {procs} R package recreates PROC FREQ, PROC MEANS, PROC TTEST, and PROC REG with output matching SAS results, reducing validation burden during SAS-to-R migration.
library(procs)
freq_result <- proc_freq(adsl, tables = "TRT01A * SEX")
means_result <- proc_means(adsl,
                           var = c("AGE", "BMIBL", "HEIGHTBL", "WEIGHTBL"),
                           stats = v(n, mean, std, median, min, max),
                           class = "TRT01A")
| Method | Direction | SAS License | Use Case | Complexity | Clinical Value | Era |
| PROC R | SAS → R | Base SAS Proc Guide (Viya) | Full R integration | Low | High | 2026+ |
| PROC PYTHON | SAS → Python | Base SAS (Viya/9.4M8) | ML, automation | Low | Very High | 2020+ |
| PROC IML + R | SAS → R | SAS/IML (paid) | Advanced stats, matrix ops | Medium | High | 2010+ |
| PROC FCMP + Py | SAS → Python | Base SAS | DATA step functions | Med-High | Medium | 2019+ |
| SASPy | Python → SAS | SAS session needed | Legacy code reuse | Low | High | 2017+ |
| {haven} | R ↔ SAS data | None | XPT / SAS7BDAT I/O | Low | Essential | 2015+ |
| {procs} | SAS procs in R | None | SAS-to-R migration | Low | High | 2023+ |
| %PROC_R macro | SAS → R | Base SAS only | R without IML (SAS 9.4) | Medium | Medium | 2013+ |
The R Consortium Submissions Working Group has conducted a series of landmark pilot submissions to the FDA that progressively demonstrate R's viability for regulatory submissions. These are cross-company, FDA-industry collaborations with all materials publicly available.
| Pilot | Year | Scope | Technology | Status |
| Pilot 1 | 2021 | TLFs generated entirely in R, submitted via eCTD | R, renv | FDA Accepted |
| Pilot 2 | 2022 | Interactive R Shiny app delivering TLFs | R, Shiny | FDA Reviewed |
| Pilot 3 | 2023-24 | Full ADaM + TLF pipeline in R; first R-based ADaM submission | R, {admiral} | FDA Approved Aug 2024 |
| Pilot 4 | 2024-25 | Shiny app via WebAssembly and Docker containers | WebAssembly, Docker | WebAssembly submitted; Container in progress |
| Pilot 5 | 2025 | R submission with Dataset-JSON replacing XPT files | R, {datasetjson} | Submitted Fall 2025 |
| Pilots 6-7 | 2026+ | Planned future pilots (details forthcoming) | TBD | In Planning |
| Key FDA Feedback Themes from Pilot 3 | The FDA emphasized three themes: (1) clear ADRG documentation on computing environment, package dependencies, and expected warnings; (2) thorough documentation of data processing rules and statistical method implementation; and (3) adherence to good statistical practice in confirmatory trials. Themes 2 and 3 are language-agnostic and apply regardless of whether SAS, R, or Python is used. |
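The first feedback theme, documenting the computing environment and package dependencies, lends itself to automation. For the Python side of a multi-language workflow, a snapshot like the one below could be captured at run time and pasted into or attached to the ADRG. This is a minimal sketch using only the standard library. The function name and the output layout are illustrative assumptions, not a regulatory template.

```python
import json
import platform
from importlib import metadata


def environment_snapshot():
    """Collect interpreter, platform, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with unreadable metadata
            packages[name] = dist.version
    ordered = dict(sorted(packages.items(), key=lambda kv: kv[0].lower()))
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": ordered,
    }


snapshot = environment_snapshot()
print(json.dumps(snapshot, indent=2)[:300])  # preview of the first few entries
```

On the R side, `sessionInfo()` and an {renv} lockfile serve a comparable purpose.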
The established SAS infrastructure handles SDTM mapping, ADaM derivation, and standard TFL generation. R is called (via PROC R on Viya or PROC IML on SAS 9.4) specifically for advanced visualizations: Kaplan-Meier curves with risk tables, forest plots, waterfall plots for tumor response, and swimmer plots for oncology timelines. This preserves the validated SAS pipeline while leveraging R's {ggplot2} and {survminer} ecosystems.
PROC PYTHON enables automation of repetitive tasks: metadata extraction from SAP documents using NLP, dynamic define.xml generation via Python's lxml library, batch PDF parsing for safety narratives, API-driven data ingestion from LIMS systems, and ML model scoring for patient stratification.
For organizations moving toward all-R submissions, the {pharmaverse} ecosystem provides a complete pipeline: {admiral} for ADaM creation, {teal} for interactive exploration, {rtables} and {gt} for clinical TFLs, and {haven} for SAS data I/O. The R Consortium pilots have validated this approach through the FDA's eCTD gateway.
| Pattern | Data Pipeline | TFL Generation | Visualization | Best For |
| SAS Only | SAS | SAS | SAS/GRAPH, ODS | Legacy systems |
| SAS + R (PROC R / IML) | SAS | SAS | R via PROC R or IML | Enhanced graphics |
| SAS + Python | SAS | SAS | Python via PROC PYTHON | Automation, ML |
| Full R ({pharmaverse}) | R ({admiral}) | R ({rtables}) | R ({ggplot2}) | R-first orgs |
| Tri-language | SAS or R | SAS or R | R or Python | Maximum flexibility |
Multi-language workflows introduce validation complexity: R package versioning (managed via {renv} or {pak}), Python environment management (via conda, venv, or Docker), and cross-language numerical precision differences all require careful governance. Organizations must develop IQ/OQ protocols for each language environment.
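Some cross-language differences come from differing default definitions rather than floating-point precision. Percentiles are a common case: SAS procedures default to PCTLDEF=5 (empirical distribution function with averaging), while R's `quantile()` defaults to type 7 (linear interpolation), as does NumPy. The Python sketch below implements both definitions from their textbook descriptions to show how the same data can yield different quartiles. The function names are illustrative, and the code is a teaching aid rather than a validated replacement for either system.

```python
import math


def quantile_linear(values, p):
    """Linear interpolation between order statistics (R type 7 style)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lower = math.floor(h)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (h - lower) * (xs[upper] - xs[lower])


def quantile_averaged_ecdf(values, p):
    """Empirical distribution function with averaging (SAS PCTLDEF=5 style)."""
    xs = sorted(values)
    n = len(xs)
    np_ = n * p              # note: n * p is subject to floating-point rounding
    j = math.floor(np_)
    g = np_ - j
    if g > 0:
        return xs[min(j, n - 1)]       # the (j+1)-th order statistic
    if j == 0:
        return xs[0]
    if j >= n:
        return xs[n - 1]
    return (xs[j - 1] + xs[j]) / 2     # average of the j-th and (j+1)-th


data = [10, 20, 30, 40]
print(quantile_linear(data, 0.25))         # 17.5
print(quantile_averaged_ecdf(data, 0.25))  # 15.0
print(quantile_linear(data, 0.5), quantile_averaged_ecdf(data, 0.5))  # 25.0 25.0
```

The medians agree but the first quartiles do not, which is why validation plans for multi-language workflows typically pin the percentile definition explicitly (for example, `type = 2` in R's `quantile()` to mirror the SAS default) instead of relying on defaults.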
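Two data-level differences cause most cross-language surprises. First, SAS and R use different date epochs (1960 vs. 1970), so raw numeric dates must be offset when they cross the boundary. Second, character encodings differ: SAS 9.4 sessions on Windows typically run with WLATIN1 (Windows-1252), while SAS Viya, Python, and current R releases default to UTF-8. Always specify the encoding explicitly when moving data between environments.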
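The encoding point can be illustrated with a few lines of Python. WLATIN1 corresponds to Windows code page 1252, which Python exposes as the `cp1252` codec. The sample text is made up for the illustration.

```python
# Text containing non-ASCII characters, as might appear in a verbatim term.
text = "Sjögren's syndrome, café-au-lait"

# Bytes as a WLATIN1 (Windows-1252) SAS session would store them.
wlatin1_bytes = text.encode("cp1252")

# Reading those bytes as UTF-8 fails outright for this input.
try:
    wlatin1_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# Decoding with the correct source encoding recovers the text.
roundtrip = wlatin1_bytes.decode("cp1252")
utf8_bytes = roundtrip.encode("utf-8")

print(decoded_as_utf8)                      # False
print(roundtrip == text)                    # True
print(len(utf8_bytes) - len(wlatin1_bytes)) # 2
```

The two extra bytes in the UTF-8 form matter as well: a value that exactly fills a fixed-length character variable under WLATIN1 can be truncated after conversion, so variable lengths should be reviewed whenever the session encoding changes.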
Adopting multi-language workflows requires training, updated SOPs, modified code review processes, and potentially restructured teams. The transition is typically incremental, starting with visualization-only R usage via PROC R and gradually expanding to full R-based pipelines.
The clinical statistical programming landscape is undergoing a significant transformation. The appearance of PROC R in the SAS Viya 2026.03 Base SAS Procedures Guide is notable: SAS now lists dedicated procedures for both R and Python integration, suggesting a continued investment in open-source interoperability. Organizations should verify the exact licensing and configuration requirements for their specific SAS deployment before making infrastructure decisions.
Combined with the R Consortium's FDA submission pilots (now through Pilot 5 with Dataset-JSON), the {pharmaverse} ecosystem, and tools like SASPy, {haven}, and {procs}, the technical and regulatory foundations for multi-language clinical programming continue to solidify. For statistical programmers, building skills across SAS, R, and Python is increasingly valuable for career growth and organizational contribution.
1. SAS Institute. PROC R Documentation. Base SAS Procedures Guide, v2026.03. documentation.sas.com.
2. SAS Institute. PROC PYTHON Documentation. SAS 9.4 / SAS Viya. documentation.sas.com.
3. SAS Institute. SAS/IML Interface to R. SAS 9.4 Programming Documentation. documentation.sas.com.
4. R Consortium Submissions Working Group. Pilot 1-5 Materials. r-consortium.org.
5. FDA. Statistical Software Clarifying Statement. fda.gov.
6. Bosak, D. {procs}: Recreates Some SAS Procedures in R. procs.r-sassy.org.
7. Wickham, H. & Miller, E. {haven}: Import and Export SAS, SPSS, and Stata Files. CRAN.
8. SAS Institute. SASPy. github.com/sassoftware/saspy.
9. PharmaSUG 2023. QT-165: Running Python Code inside a SAS Program. pharmasug.org.
10. Appsilon (2025). R in FDA Submissions: Lessons Learned from 5 FDA Pilots. appsilon.com.