A Statistical Programmer's Guide to RECIST 1.1, BOR, and CDISC Implementation
Objective Response Rate (ORR) is the proportion of patients in a trial whose tumor burden shrinks by a pre-specified minimum amount—and stays shrunk long enough to be confirmed. It is the most widely used response-based endpoint in oncology and is defined by the US FDA as “the proportion of patients with tumor size reduction of a predefined amount and for a minimum time period.” Response duration is generally measured from the time of initial response to documented tumor progression. Under the FDA definition, ORR excludes stable disease and is a direct measure of antitumor activity attributable to the drug.
Operationally, ORR aggregates per-patient Best Overall Response (BOR) categories derived from serial tumor assessments, most commonly using Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1, the reference criteria set by Eisenhauer et al. in the European Journal of Cancer (2009). Analogous criteria exist for specific tumor types (e.g., Lugano criteria for lymphoma, IWG for leukemia, RANO for CNS tumors), but RECIST 1.1 remains the default for solid-tumor submissions.
For the statistical programmer, ORR is rarely “just a proportion.” It sits on top of a multi-layered derivation chain: TR (tumor measurements) and TU (tumor identification) feed RS (overall response per visit), which feeds ADRS (BOR and confirmed BOR), which finally yields the ORR responder flag and its exact binomial confidence interval. Each layer has specific CDISC controlled terminology, and each has confirmation and censoring rules that are easy to get wrong.
ORR has properties that make it attractive as a primary endpoint, particularly for early-phase oncology trials and for single-arm registrational studies seeking accelerated or conditional approval.
The FDA's 2018 guidance Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics lists ORR as an acceptable basis for both regular and accelerated approval, explicitly noting that a durable ORR can support approval when the magnitude and duration of response are clinically meaningful and when existing therapies are inadequate. Recent examples include oncogene-targeted agents in NSCLC, cholangiocarcinoma, and other rare indications where randomized survival trials were not feasible at the time of submission.
The trade-off is that ORR is a surrogate—it measures a biological effect, not a patient-centered outcome. Regulators therefore pair it with duration of response (DoR), time to response (TTR), and usually a confirmatory OS or PFS study post-approval. Statistical programmers supporting these programs need to deliver not just the ORR number but the full response narrative that lets regulators judge durability.
RECIST 1.1 classifies each post-baseline tumor assessment into one of five overall response categories. The per-visit overall response combines that visit's target, non-target, and new-lesion findings according to the RECIST 1.1 integration table; progression in any component, including the appearance of a new lesion, yields PD.
| Code | RECIST 1.1 definition |
|------|-----------------------|
| CR | Complete Response. Disappearance of all target lesions. Any pathological lymph node must have reduction in short axis to < 10 mm. |
| PR | Partial Response. At least a 30% decrease in the sum of diameters of target lesions, taking as reference the baseline sum. |
| SD | Stable Disease. Neither sufficient shrinkage to qualify for PR nor sufficient increase to qualify for PD, taking as reference the smallest sum on study (nadir). |
| PD | Progressive Disease. At least a 20% increase in the sum of diameters over the nadir, plus an absolute increase of at least 5 mm; or unequivocal progression of non-target disease; or the appearance of one or more new lesions. |
| NE | Not Evaluable. Assessment cannot be completed (e.g., missed or non-diagnostic imaging) and the response cannot be determined. |
Table 1. RECIST 1.1 overall response categories. Definitions summarized from Eisenhauer et al., European Journal of Cancer, 2009.
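The threshold arithmetic and the component-combination logic in Table 1 are easy to sketch. The Python fragment below is illustrative only: the full RECIST 1.1 integration table has additional rows for missing and NE components, and lymph-node handling is reduced to a single flag.

```python
def target_response(sum_diam, baseline_sum, nadir_sum, all_nodes_lt10=True):
    """Classify target-lesion response at one visit (simplified RECIST 1.1).
    Sums are in mm; nadir_sum is the smallest sum on study so far."""
    if sum_diam == 0 and all_nodes_lt10:
        return "CR"
    # PD: >= 20% increase over the nadir AND >= 5 mm absolute increase
    if sum_diam - nadir_sum >= 5 and sum_diam >= 1.2 * nadir_sum:
        return "PD"
    # PR: >= 30% decrease from the baseline sum
    if sum_diam <= 0.7 * baseline_sum:
        return "PR"
    return "SD"

def overall_response(target, nontarget, new_lesion):
    """Combine component findings into an overall visit response (simplified)."""
    if new_lesion or "PD" in (target, nontarget):
        return "PD"
    if target == "CR" and nontarget == "CR":
        return "CR"
    if target == "CR" and nontarget == "NON-CR/NON-PD":
        return "PR"
    if target == "PR":
        return "PR"
    if target == "SD":
        return "SD"
    return "NE"
```

For example, a sum of 63 mm against a 50 mm nadir is PD (13 mm and 26% growth), while 56 mm against an 80 mm baseline is exactly the 30% boundary and qualifies as PR.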
For non-randomized trials where ORR is the primary endpoint, RECIST 1.1 requires confirmation of a CR or PR at a subsequent assessment performed at least 4 weeks after the initial response; an unconfirmed CR or PR is demoted, typically to SD. Confirmation is not required in randomized trials, where the control arm provides the comparison, nor where response is not the primary objective (e.g., PFS-primary trials), but every sponsor using single-arm ORR for registration should plan to confirm.
BOR is the best confirmed response recorded from the start of study treatment until disease progression or the analysis cut-off. The categories are ranked in order of precedence: CR (best), then PR, SD, PD, and NE (worst). BOR is the best per-visit overall response achieved over that window, subject to the confirmation rule for CR and PR and to any protocol-defined minimum duration for SD.
Responder definition: a subject is an ORR responder if BOR ∈ {CR, PR}. ORR is then the count of responders divided by the analysis population (typically the Response-Evaluable or ITT population), expressed as a proportion with an exact binomial (Clopper–Pearson) two-sided 95% confidence interval.
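For validation it can be useful to compute the Clopper–Pearson limits from first principles rather than trusting a packaged routine. The sketch below, a plain-Python bisection on the exact binomial tail probabilities, implements the textbook definition: the lower limit solves P(X >= x | p) = alpha/2 and the upper limit solves P(X <= x | p) = alpha/2.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided CI for a binomial proportion x/n, via bisection."""
    def solve(f):
        # f is monotone decreasing in p with a sign change on (0, 1)
        lo, hi = 0.0, 1.0
        for _ in range(200):
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower limit: P(X >= x | p) = alpha/2, i.e. 1 - cdf(x-1) = alpha/2
    lower = 0.0 if x == 0 else solve(lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # upper limit: P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper
```

With 13 responders out of 40, this reproduces the familiar interval of roughly (0.186, 0.491).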
The CDISC model for oncology response splits tumor data across three SDTM domains, with the ADaM ADRS dataset built on top. The key controlled terminology comes from the CDISC Oncology Response Assessment and Tumor/Lesion Results codelists.
| Domain | Purpose | Key variables |
|--------|---------|---------------|
| TU | Tumor Identification. Catalogue of target, non-target, and new lesions. | TULNKID (lesion link), TULOC, TUMETHOD, TUEVAL (Investigator / Independent Central Review), TUDTC |
| TR | Tumor Results. Per-lesion measurements at each assessment. | TRLNKID, TRTESTCD (LDIAM, SAXIS, SUMDIAM), TRSTRESN, TREVAL, TRDTC, VISITNUM |
| RS | Disease Response. Per-assessment overall response. | RSTESTCD (OVRLRESP, TRGRESP, NTRGRESP, NEWLIND), RSSTRESC (CR/PR/SD/PD/NE), RSEVAL, RSDTC |
| ADRS | ADaM response dataset. One record per subject per parameter per assessment. | PARAMCD (OVRLRESP, BOR, CBOR, OBJRSPN), AVALC, AVAL, ADT, ANL01FL, AVISIT, AVISITN |
Table 2. CDISC oncology response footprint. Mappings per SDTMIG and CDISC Therapeutic Area User Guide for Oncology.
The ADaM ADRS dataset is structured as one record per subject per analysis parameter per analysis visit. For an ORR analysis, the minimum parameter set is OVRLRESP (overall response at each assessment), BOR (best overall response, unconfirmed), CBOR (confirmed best overall response), and OBJRSPN (objective response indicator).
ANL01FL. The analysis flag ANL01FL='Y' is traditionally set on the one record per subject per parameter that enters the primary summary. For BOR/CBOR/OBJRSPN there is only one record per subject per parameter anyway, but the flag documents the selection explicitly and is expected by FDA reviewers. AVISIT is typically "End of Treatment" or a protocol-defined analysis visit. The evaluator should be preserved via parameter naming (e.g., separate OBJRSPN parameters for Investigator vs Independent Central Review) or via PARCAT1, per program conventions.
The code below shows a compact, review-ready derivation of BOR, confirmed BOR, and ORR in SAS. It assumes an ADRS-style input with PARAMCD='OVRLRESP' records per assessment, and demonstrates the Clopper–Pearson exact 95% CI using PROC FREQ. Confirmation logic is simplified for illustration—production code should enforce the ≥ 28-day interval from the protocol.
/*---------------------------------------------------------------
Step 1. Rank per-visit response so min() picks the best.
CR=1 is "best", NE=5 is "worst".
----------------------------------------------------------------*/
data ovr;
  set adrs (where=(paramcd='OVRLRESP' and anl01fl='Y'));
  length respn 3;
  select (avalc);
    when ('CR') respn = 1;
    when ('PR') respn = 2;
    when ('SD') respn = 3;
    when ('PD') respn = 4;
    when ('NE') respn = 5;
    otherwise   respn = .;   /* protect against bad CT */
  end;
run;
/*---------------------------------------------------------------
  Step 2. Unconfirmed BOR = best respn per subject.
----------------------------------------------------------------*/
proc sql;
  create table bor as
  select usubjid,
         min(respn) as bor_n,
         case min(respn)
           when 1 then 'CR'
           when 2 then 'PR'
           when 3 then 'SD'
           when 4 then 'PD'
           when 5 then 'NE'
           else 'MISSING'
         end as bor length=8
  from ovr
  group by usubjid;
quit;
/*---------------------------------------------------------------
  Step 3. Confirmation: for each CR or PR, require a subsequent
  assessment >= 28 days later with response at least as good
  (PR confirmed by CR or PR; CR confirmed only by CR).
  A SQL self-join keeps the look-ahead transparent for
  reviewers; a hash lookup is an option in production.
----------------------------------------------------------------*/
proc sql;
  create table conf as
  select distinct a.usubjid, 'Y' as confflg length=1
  from ovr as a, ovr as b
  where a.usubjid = b.usubjid
    and b.adt >= a.adt + 28
    and ((a.avalc = 'PR' and b.avalc in ('CR','PR')) or
         (a.avalc = 'CR' and b.avalc = 'CR'))
  order by usubjid;
quit;
/*---------------------------------------------------------------
  Step 4. Confirmed BOR: demote unconfirmed CR/PR to SD.
----------------------------------------------------------------*/
data cbor;
  merge bor (in=b)
        conf (keep=usubjid confflg
              rename=(confflg=_cfl)
              where=(_cfl='Y'));
  by usubjid;
  if b;
  length cbor $8;
  if bor in ('CR','PR') and _cfl ne 'Y' then cbor = 'SD';
  else cbor = bor;
  objrspfl = (cbor in ('CR','PR'));   /* 1 = responder, 0 = non */
run;
/*---------------------------------------------------------------
  Step 5. ORR with Clopper-Pearson exact 95% CI.
----------------------------------------------------------------*/
proc freq data=cbor;
  tables objrspfl / binomial(level='1' cl=exact) alpha=0.05;
  ods output BinomialCLs = orr_ci;
run;
The BinomialCLs output dataset carries the point estimate and the exact lower and upper limits in the same row, ready for merging into the t_orr summary. If the program compares arms, replace BinomialCLs with RiskDiffCol1 and specify riskdiff(cl=exact).
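An independent re-derivation in another language is a common double-programming check. The Python sketch below is illustrative: it applies the same simplified demote-to-SD rule as the SAS code and, like it, ignores SD minimum-duration and NE edge cases; the function name and input shape are assumptions for this example.

```python
from datetime import date

RANK = {"CR": 1, "PR": 2, "SD": 3, "PD": 4, "NE": 5}
INV = {v: k for k, v in RANK.items()}

def confirmed_bor(visits):
    """Confirmed best overall response for one subject.
    visits: list of (assessment date, overall response) tuples.
    A CR/PR counts as confirmed if a qualifying response occurs
    >= 28 days later (PR confirmed by CR or PR; CR only by CR)."""
    visits = sorted(visits)
    confirmed = []
    for i, (d1, r1) in enumerate(visits):
        if r1 not in ("CR", "PR"):
            continue
        for d2, r2 in visits[i + 1:]:
            ok = r2 == "CR" if r1 == "CR" else r2 in ("CR", "PR")
            if (d2 - d1).days >= 28 and ok:
                confirmed.append(r1)
                break
    if confirmed:
        # best (lowest-ranked) confirmed response wins
        return INV[min(RANK[r] for r in confirmed)]
    bor = INV[min(RANK[r] for _, r in visits)]
    # unconfirmed CR/PR demotes to SD, per the simplified rule in the text
    return "SD" if bor in ("CR", "PR") else bor
```

For example, PR on 01Jan2024 followed by PR on 15Feb2024 (45 days) returns "PR", while PR followed only by PD two weeks later returns "SD".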
The same derivation in R, using the tidyverse for data shaping and the binom package for exact confidence intervals. In a pharmaverse workflow the same result can be achieved with admiral::derive_param_extreme_record() for BOR and tern::stat_propdiff_ci() for inference, but the base implementation below is useful for validation and teaching.
library(dplyr)
library(tidyr)
library(binom)
# Assumes adrs has columns: USUBJID, PARAMCD, AVALC, ADT, ANL01FL
ovr <- adrs %>%
filter(PARAMCD == "OVRLRESP", ANL01FL == "Y") %>%
mutate(RESPN = recode(AVALC,
"CR" = 1L, "PR" = 2L, "SD" = 3L,
"PD" = 4L, "NE" = 5L, .default = NA_integer_))
# Unconfirmed BOR: best per subject
bor <- ovr %>%
group_by(USUBJID) %>%
summarise(BOR_N = min(RESPN, na.rm = TRUE), .groups = "drop") %>%
mutate(BOR = recode(BOR_N, `1` = "CR", `2` = "PR", `3` = "SD",
`4` = "PD", `5` = "NE"))
# Confirmation: any later CR/PR >= 28 days after an index CR/PR
conf <- ovr %>%
filter(AVALC %in% c("CR", "PR")) %>%
  inner_join(ovr, by = "USUBJID",
             relationship = "many-to-many",
             suffix = c("", "_nxt")) %>%
filter(ADT_nxt >= ADT + 28,
(AVALC == "PR" & AVALC_nxt %in% c("CR","PR")) |
(AVALC == "CR" & AVALC_nxt == "CR")) %>%
distinct(USUBJID) %>%
mutate(CONFFLG = "Y")
# Confirmed BOR and responder flag
cbor <- bor %>%
left_join(conf, by = "USUBJID") %>%
  mutate(CONFFLG = tidyr::replace_na(CONFFLG, "N"),
         CBOR = if_else(BOR %in% c("CR","PR") & CONFFLG != "Y", "SD", BOR),
         RESP = as.integer(CBOR %in% c("CR","PR")))
# ORR with Clopper-Pearson exact 95% CI
orr <- cbor %>%
summarise(N = n(), x = sum(RESP)) %>%
  bind_cols(binom.confint(x = .$x, n = .$N, conf.level = 0.95,
                          methods = "exact") %>%
select(mean, lower, upper))
The ORR presentation table follows the FDA-preferred layout: n/N, percent, exact 95% CI, and a counts row for each BOR category. Subjects contribute to exactly one BOR row, and percent denominators are fixed at the Response-Evaluable population.
| Response category | n | % | 95% CI (exact) |
|-------------------|---|---|----------------|
| Response-evaluable population | 40 | | |
| Complete response (CR) | 3 | 7.5 | (1.6, 20.4) |
| Partial response (PR) | 10 | 25.0 | (12.7, 41.2) |
| Stable disease (SD) | 20 | 50.0 | — |
| Progressive disease (PD) | 7 | 17.5 | — |
| Not evaluable (NE) | 0 | 0.0 | — |
| Objective response (CR + PR) | 13 | 32.5 | (18.6, 49.1) |
Table 3. Objective Response Rate summary (simulated N=40). ORR and the CR/PR row each report a two-sided exact Clopper–Pearson 95% CI; SD/PD/NE show counts only. Percent denominators are the response-evaluable population.
The waterfall plot is the single most recognizable figure in oncology. Each bar is one subject's best percent change from baseline in the sum of target lesion diameters, ordered left-to-right from worst (largest increase) to best (largest decrease). Two reference lines mark the RECIST 1.1 thresholds: −30%, the boundary for PR-level shrinkage, and +20%, the boundary for PD-level growth.
Figure 1. Waterfall plot of best percent change in the sum of target lesion diameters (N=40, simulated). Bars are coloured by confirmed BOR; the dashed lines mark the RECIST 1.1 −30% PR and +20% PD thresholds. Bars at −100% indicate complete disappearance of target lesions.
A waterfall is not sufficient to read ORR directly: a bar crossing −30% is only a candidate PR because overall response also depends on non-target lesions and the appearance of new lesions, and because the response must be confirmed. Programmers should colour bars by confirmed BOR, not by target-lesion change alone, to avoid misleading readers. The figure is typically produced from a per-subject dataset containing USUBJID, PCHG (best percent change), CBOR, and an on-treatment flag.
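The data-preparation step is the part programmers own: one row per subject carrying best percent change and confirmed BOR, ordered for plotting. A minimal Python sketch, with hypothetical input structures (the actual rendering is left to the graphics layer):

```python
def waterfall_data(tr, cbor):
    """tr: {usubjid: {visit_number: sum_of_diameters}} with visit 0 = baseline.
    cbor: {usubjid: confirmed BOR}. Assumes every subject has at least one
    post-baseline assessment. Returns (usubjid, best_pchg, cbor) tuples
    ordered worst (largest increase) to best (largest decrease)."""
    bars = []
    for subj, sums in tr.items():
        base = sums[0]
        post = [v for k, v in sums.items() if k > 0]
        best_pchg = min(100 * (s - base) / base for s in post)
        bars.append((subj, best_pchg, cbor.get(subj, "NE")))
    # descending percent change: growers on the left, shrinkers on the right
    return sorted(bars, key=lambda b: -b[1])
```

Colouring each bar by the third tuple element (confirmed BOR) rather than by the percent change alone follows the guidance above.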
Where the waterfall captures magnitude of response, the swimmer plot captures time course. Each horizontal bar is one subject’s time on treatment; markers show time to first response, time to best response, and whether the patient remains on treatment at the data cut. For accelerated-approval submissions, durability is as important as the ORR point estimate, and the swimmer is the figure regulators expect to see alongside the DoR Kaplan–Meier.
Figure 2. Swimmer plot (N=20, simulated subset). Bars are coloured by confirmed BOR, stars mark time to first response, and arrows indicate subjects still on treatment at the data cut. Built from ADTTE (duration) and ADRS (BOR, TTR).
For single-arm ORR primary trials, sample size is driven by a one-sample exact binomial test against a historical-control ORR. Two common designs are Simon's two-stage design (optimal or minimax), which allows early stopping for futility after the first stage, and a single-stage exact design (A'Hern), which compares the observed responder count against a pre-specified critical value.
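The single-stage exact design reduces to a small search: find the smallest n and critical responder count r such that rejecting the historical-control rate whenever responders >= r keeps the exact type I error at or below alpha while preserving power at the target rate. A Python sketch (illustrative only; the p0 and p1 values used below are assumed inputs, and production designs should come from validated software or published tables):

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def single_stage_design(p0, p1, alpha=0.05, power=0.80, n_max=200):
    """Smallest n (with critical value r): reject H0: ORR <= p0 when
    responders >= r, with exact size <= alpha and power >= power at p1."""
    for n in range(1, n_max + 1):
        for r in range(1, n + 1):
            if binom_tail(r, n, p0) <= alpha and binom_tail(r, n, p1) >= power:
                return n, r
    return None  # no design within n_max
```

The same `binom_tail` helper doubles as a check on the one-sample exact binomial test used at analysis time.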
For randomized trials with ORR as primary endpoint, comparison across arms typically uses a stratified Cochran–Mantel–Haenszel test or the stratified Miettinen–Nurminen risk-difference CI. The stratification factors are the randomization strata, and programmers should document stratum definitions explicitly in the SAP and ADaM metadata.
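The CMH statistic itself is simple enough to sketch for validation. The Python fragment below is illustrative (production inference would come from PROC FREQ's CMH option or a validated R routine, and the Miettinen–Nurminen CI is not shown); it computes the 1-df CMH chi-square across strata without continuity correction:

```python
def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel chi-square (1 df) for K 2x2 tables.
    Each stratum is (a, b, c, d):
        a = responders arm 1, b = non-responders arm 1,
        c = responders arm 2, d = non-responders arm 2."""
    num = 0.0
    var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d          # arm totals
        col1, col2 = a + c, b + d          # responder / non-responder totals
        num += a - row1 * col1 / n         # observed minus expected in cell a
        var += row1 * row2 * col1 * col2 / (n**2 * (n - 1))
    return num**2 / var
```

For a single stratum this equals (n-1)/n times the Pearson chi-square, a handy spot check against standard output.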
ORR looks simple—a proportion with a confidence interval—but almost every submission that has ever stumbled on an ORR analysis has stumbled on the derivation layer, not the inference layer. The programming discipline around ORR is fundamentally about getting BOR right: applying confirmation consistently, handling NEs the way the SAP says, keeping evaluators separate, and presenting the waterfall and swimmer on the same population as the table. Do that well, and the Clopper–Pearson CI becomes the easy part.
References. Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). European Journal of Cancer 2009;45(2):228–247. • US FDA. Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics, December 2018. • US FDA, 21 CFR 314 Subpart H (Accelerated Approval of New Drugs for Serious or Life-Threatening Illnesses). • CDISC SDTM Implementation Guide v3.4. • CDISC Therapeutic Area User Guide: Oncology. • CDISC ADaM Implementation Guide v1.3. • Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 1934;26(4):404–413.