A Statistical Programmer's Guide to RECIST 1.1, BOR, and CDISC Implementation
Objective Response Rate (ORR) is the proportion of patients in a trial whose tumor burden shrinks by a pre-specified minimum amount—and stays shrunk long enough to be confirmed. It is the most widely used response-based endpoint in oncology and is defined by the US FDA as “the proportion of patients with tumor size reduction of a predefined amount and for a minimum time period.” Response duration is generally measured from the time of initial response to documented tumor progression. Under the FDA definition, ORR excludes stable disease and is a direct measure of antitumor activity attributable to the drug.
Operationally, ORR aggregates per-patient Best Overall Response (BOR) categories derived from serial tumor assessments, most commonly using Response Evaluation Criteria in Solid Tumors (RECIST) version 1.1, the reference criteria set by Eisenhauer et al. in the European Journal of Cancer (2009). Analogous criteria exist for specific tumor types (e.g., Lugano criteria for lymphoma, IWG for leukemia, RANO for CNS tumors), but RECIST 1.1 remains the default for solid-tumor submissions.
For the statistical programmer, ORR is rarely “just a proportion.” It sits on top of a multi-layered derivation chain: TR (tumor measurements) and TU (tumor identification) feed RS (overall response per visit), which feeds ADRS (BOR and confirmed BOR), which finally yields the ORR responder flag and its exact binomial confidence interval. Each layer has specific CDISC controlled terminology, and each has confirmation and censoring rules that are easy to get wrong.
ORR has properties that make it attractive as a primary endpoint, particularly for early-phase oncology trials and for single-arm registrational studies seeking accelerated or conditional approval.
The FDA's 2018 guidance Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics lists ORR as an acceptable basis for both regular and accelerated approval, explicitly noting that a durable ORR can support approval when the magnitude and duration of response are clinically meaningful and when existing therapies are inadequate. Recent examples include oncogene-targeted agents in NSCLC, cholangiocarcinoma, and other rare indications where randomized survival trials were not feasible at the time of submission.
The trade-off is that ORR is a surrogate—it measures a biological effect, not a patient-centered outcome. Regulators therefore pair it with duration of response (DoR), time to response (TTR), and usually a confirmatory OS or PFS study post-approval. Statistical programmers supporting these programs need to deliver not just the ORR number but the full response narrative that lets regulators judge durability.
RECIST 1.1 classifies each post-baseline tumor assessment into one of five overall response categories. The per-visit overall response combines that visit's target, non-target, and new-lesion findings according to the RECIST 1.1 integration table; progression in any component, including the appearance of a new lesion, yields PD.
| Code | RECIST 1.1 definition |
|------|-----------------------|
| CR | Complete Response. Disappearance of all target lesions. Any pathological lymph node must have reduction in short axis to < 10 mm. |
| PR | Partial Response. At least a 30% decrease in the sum of diameters of target lesions, taking as reference the baseline sum. |
| SD | Stable Disease. Neither sufficient shrinkage to qualify for PR nor sufficient increase to qualify for PD, taking as reference the smallest sum on study (nadir). |
| PD | Progressive Disease. At least a 20% increase in the sum of diameters over the nadir, plus an absolute increase of at least 5 mm; or unequivocal progression of non-target disease; or the appearance of one or more new lesions. |
| NE | Not Evaluable. Assessment cannot be completed (e.g., missed or non-diagnostic imaging) and the response cannot be determined. |
Table 1. RECIST 1.1 overall response categories. Definitions summarized from Eisenhauer et al., European Journal of Cancer, 2009.
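The threshold arithmetic and the component-combination logic in Table 1 are easy to sketch. The Python fragment below is illustrative only: the full RECIST 1.1 integration table has additional rows for missing and NE components, and lymph-node handling is reduced to a single flag.

```python
def target_response(sum_diam, baseline_sum, nadir_sum, all_nodes_lt10=True):
    """Classify target-lesion response at one visit (simplified RECIST 1.1).
    Sums are in mm; nadir_sum is the smallest sum on study so far."""
    if sum_diam == 0 and all_nodes_lt10:
        return "CR"
    # PD: >= 20% increase over the nadir AND >= 5 mm absolute increase
    if sum_diam - nadir_sum >= 5 and sum_diam >= 1.2 * nadir_sum:
        return "PD"
    # PR: >= 30% decrease from the baseline sum
    if sum_diam <= 0.7 * baseline_sum:
        return "PR"
    return "SD"

def overall_response(target, nontarget, new_lesion):
    """Combine component findings into an overall visit response (simplified)."""
    if new_lesion or "PD" in (target, nontarget):
        return "PD"
    if target == "CR" and nontarget == "CR":
        return "CR"
    if target == "CR" and nontarget == "NON-CR/NON-PD":
        return "PR"
    if target == "PR":
        return "PR"
    if target == "SD":
        return "SD"
    return "NE"
```

For example, a sum of 63 mm against a 50 mm nadir is PD (13 mm and 26% growth), while 56 mm against an 80 mm baseline is exactly the 30% boundary and qualifies as PR.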
For non-randomized trials where ORR is the primary endpoint, RECIST 1.1 requires confirmation of a CR or PR at a subsequent assessment performed at least 4 weeks after the initial response; an unconfirmed CR or PR is demoted, typically to SD. Confirmation is not required in randomized trials, where the control arm provides the comparison, nor where response is not the primary objective (e.g., PFS-primary trials), but every sponsor using single-arm ORR for registration should plan to confirm.
BOR is the best confirmed response recorded from the start of study treatment until disease progression or the analysis cut-off. The categories are ranked in order of precedence: CR (best), then PR, SD, PD, and NE (worst). BOR is the best per-visit overall response achieved over that window, subject to the confirmation rule for CR and PR and to any protocol-defined minimum duration for SD.
Responder definition: a subject is an ORR responder if BOR ∈ {CR, PR}. ORR is then the count of responders divided by the analysis population (typically the Response-Evaluable or ITT population), expressed as a proportion with an exact binomial (Clopper–Pearson) two-sided 95% confidence interval.
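For validation it can be useful to compute the Clopper–Pearson limits from first principles rather than trusting a packaged routine. The sketch below, a plain-Python bisection on the exact binomial tail probabilities, implements the textbook definition: the lower limit solves P(X >= x | p) = alpha/2 and the upper limit solves P(X <= x | p) = alpha/2.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided CI for a binomial proportion x/n, via bisection."""
    def solve(f):
        # f is monotone decreasing in p with a sign change on (0, 1)
        lo, hi = 0.0, 1.0
        for _ in range(200):
            mid = (lo + hi) / 2
            if f(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower limit: P(X >= x | p) = alpha/2, i.e. 1 - cdf(x-1) = alpha/2
    lower = 0.0 if x == 0 else solve(lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # upper limit: P(X <= x | p) = alpha/2
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper
```

With 13 responders out of 40, this reproduces the familiar interval of roughly (0.186, 0.491).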
The CDISC model for oncology response splits tumor data across three SDTM domains, with the ADaM ADRS dataset built on top. The key controlled terminology comes from the CDISC Oncology Response Assessment and Tumor/Lesion Results codelists.
| Domain | Purpose | Key variables |
|--------|---------|---------------|
| TU | Tumor Identification. Catalogue of target, non-target, and new lesions. | TULNKID (lesion link), TULOC, TUMETHOD, TUEVAL (Investigator / Independent Central Review), TUDTC |
| TR | Tumor Results. Per-lesion measurements at each assessment. | TRLNKID, TRTESTCD (LDIAM, SAXIS, SUMDIAM), TRSTRESN, TREVAL, TRDTC, VISITNUM |
| RS | Disease Response. Per-assessment overall response. | RSTESTCD (OVRLRESP, TRGRESP, NTRGRESP, NEWLIND), RSSTRESC (CR/PR/SD/PD/NE), RSEVAL, RSDTC |
| ADRS | ADaM response dataset. One record per subject per parameter per assessment. | PARAMCD (OVRLRESP, BOR, CBOR, OBJRSPN), AVALC, AVAL, ADT, ANL01FL, AVISIT, AVISITN |
Table 2. CDISC oncology response footprint. Mappings per SDTMIG and CDISC Therapeutic Area User Guide for Oncology.
The ADaM ADRS dataset is structured as one record per subject per analysis parameter per analysis visit. For an ORR analysis, the minimum parameter set is OVRLRESP (overall response at each assessment), BOR (best overall response, unconfirmed), CBOR (confirmed best overall response), and OBJRSPN (objective response indicator).
ANL01FL. The analysis flag ANL01FL='Y' is traditionally set on the one record per subject per parameter that enters the primary summary. For BOR/CBOR/OBJRSPN there is only one record per subject per parameter anyway, but the flag documents the selection explicitly and is expected by FDA reviewers. AVISIT is typically "End of Treatment" or a protocol-defined analysis visit. The evaluator should be preserved via parameter naming (e.g., separate OBJRSPN parameters for Investigator vs Independent Central Review) or via PARCAT1, per program conventions.
The code below shows a compact, review-ready derivation of BOR, confirmed BOR, and ORR in SAS. It assumes an ADRS-style input with PARAMCD='OVRLRESP' records per assessment, and demonstrates the Clopper–Pearson exact 95% CI using PROC FREQ. Confirmation logic is simplified for illustration—production code should enforce the ≥ 28-day interval from the protocol.
/*---------------------------------------------------------------
Step 1. Rank per-visit response so min() picks the best.
CR=1 is "best", NE=5 is "worst".
----------------------------------------------------------------*/
data ovr;
  set adrs (where=(paramcd='OVRLRESP' and anl01fl='Y'));
  length respn 3;
  select (avalc);
    when ('CR') respn = 1;
    when ('PR') respn = 2;
    when ('SD') respn = 3;
    when ('PD') respn = 4;
    when ('NE') respn = 5;
    otherwise   respn = .;   /* protect against bad CT */
  end;
run;
/*---------------------------------------------------------------
  Step 2. Unconfirmed BOR = best respn per subject.
----------------------------------------------------------------*/
proc sql;
  create table bor as
  select usubjid,
         min(respn) as bor_n,
         case min(respn)
           when 1 then 'CR'
           when 2 then 'PR'
           when 3 then 'SD'
           when 4 then 'PD'
           when 5 then 'NE'
           else 'MISSING'
         end as bor length=8
  from ovr
  group by usubjid;
quit;
/*---------------------------------------------------------------
  Step 3. Confirmation: for each CR or PR, require a subsequent
  assessment >= 28 days later with response at least as good
  (PR confirmed by CR or PR; CR confirmed only by CR).
  A SQL self-join keeps the look-ahead transparent for
  reviewers; a hash lookup is an option in production.
----------------------------------------------------------------*/
proc sql;
  create table conf as
  select distinct a.usubjid, 'Y' as confflg length=1
  from ovr as a, ovr as b
  where a.usubjid = b.usubjid
    and b.adt >= a.adt + 28
    and ((a.avalc = 'PR' and b.avalc in ('CR','PR')) or
         (a.avalc = 'CR' and b.avalc = 'CR'))
  order by usubjid;
quit;
/*---------------------------------------------------------------
  Step 4. Confirmed BOR: demote unconfirmed CR/PR to SD.
----------------------------------------------------------------*/
data cbor;
  merge bor (in=b)
        conf (keep=usubjid confflg
              rename=(confflg=_cfl)
              where=(_cfl='Y'));
  by usubjid;
  if b;
  length cbor $8;
  if bor in ('CR','PR') and _cfl ne 'Y' then cbor = 'SD';
  else cbor = bor;
  objrspfl = (cbor in ('CR','PR'));   /* 1 = responder, 0 = non */
run;
/*---------------------------------------------------------------
  Step 5. ORR with Clopper-Pearson exact 95% CI.
----------------------------------------------------------------*/
proc freq data=cbor;
  tables objrspfl / binomial(level='1' cl=exact) alpha=0.05;
  ods output BinomialCLs = orr_ci;
run;
The BinomialCLs output dataset carries the point estimate and the exact lower and upper limits in the same row, ready for merging into the t_orr summary. If the program compares arms, replace BinomialCLs with RiskDiffCol1 and specify riskdiff(cl=exact).
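An independent re-derivation in another language is a common double-programming check. The Python sketch below is illustrative: it applies the same simplified demote-to-SD rule as the SAS code and, like it, ignores SD minimum-duration and NE edge cases; the function name and input shape are assumptions for this example.

```python
from datetime import date

RANK = {"CR": 1, "PR": 2, "SD": 3, "PD": 4, "NE": 5}
INV = {v: k for k, v in RANK.items()}

def confirmed_bor(visits):
    """Confirmed best overall response for one subject.
    visits: list of (assessment date, overall response) tuples.
    A CR/PR counts as confirmed if a qualifying response occurs
    >= 28 days later (PR confirmed by CR or PR; CR only by CR)."""
    visits = sorted(visits)
    confirmed = []
    for i, (d1, r1) in enumerate(visits):
        if r1 not in ("CR", "PR"):
            continue
        for d2, r2 in visits[i + 1:]:
            ok = r2 == "CR" if r1 == "CR" else r2 in ("CR", "PR")
            if (d2 - d1).days >= 28 and ok:
                confirmed.append(r1)
                break
    if confirmed:
        # best (lowest-ranked) confirmed response wins
        return INV[min(RANK[r] for r in confirmed)]
    bor = INV[min(RANK[r] for _, r in visits)]
    # unconfirmed CR/PR demotes to SD, per the simplified rule in the text
    return "SD" if bor in ("CR", "PR") else bor
```

For example, PR on 01Jan2024 followed by PR on 15Feb2024 (45 days) returns "PR", while PR followed only by PD two weeks later returns "SD".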
The same derivation in R, using the tidyverse for data shaping and the binom package for exact confidence intervals. In a pharmaverse workflow the same result can be achieved with admiral::derive_param_extreme_record() for BOR and tern::stat_propdiff_ci() for inference, but the base implementation below is useful for validation and teaching.
library(dplyr)
library(tidyr)
library(binom)
# Assumes adrs has columns: USUBJID, PARAMCD, AVALC, ADT, ANL01FL
ovr <- adrs %>%
filter(PARAMCD == "OVRLRESP", ANL01FL == "Y") %>%
mutate(RESPN = recode(AVALC,
"CR" = 1L, "PR" = 2L, "SD" = 3L,
"PD" = 4L, "NE" = 5L, .default = NA_integer_))
# Unconfirmed BOR: best per subject
bor <- ovr %>%
group_by(USUBJID) %>%
summarise(BOR_N = min(RESPN, na.rm = TRUE), .groups = "drop") %>%
mutate(BOR = recode(BOR_N, `1` = "CR", `2` = "PR", `3` = "SD",
`4` = "PD", `5` = "NE"))
# Confirmation: any later CR/PR >= 28 days after an index CR/PR
conf <- ovr %>%
filter(AVALC %in% c("CR", "PR")) %>%
  inner_join(ovr, by = "USUBJID",
             relationship = "many-to-many",
             suffix = c("", "_nxt")) %>%
filter(ADT_nxt >= ADT + 28,
(AVALC == "PR" & AVALC_nxt %in% c("CR","PR")) |
(AVALC == "CR" & AVALC_nxt == "CR")) %>%
distinct(USUBJID) %>%
mutate(CONFFLG = "Y")
# Confirmed BOR and responder flag
cbor <- bor %>%
left_join(conf, by = "USUBJID") %>%
  mutate(CONFFLG = tidyr::replace_na(CONFFLG, "N"),
         CBOR = if_else(BOR %in% c("CR","PR") & CONFFLG != "Y", "SD", BOR),
         RESP = as.integer(CBOR %in% c("CR","PR")))
# ORR with Clopper-Pearson exact 95% CI
orr <- cbor %>%
summarise(N = n(), x = sum(RESP)) %>%
  bind_cols(binom.confint(x = .$x, n = .$N, conf.level = 0.95,
                          methods = "exact") %>%
select(mean, lower, upper))
The ORR presentation table follows the FDA-preferred layout: n/N, percent, exact 95% CI, and a counts row for each BOR category. Subjects contribute to exactly one BOR row, and percent denominators are fixed at the Response-Evaluable population.
| Response category | n | % | 95% CI (exact) |
|-------------------|---|---|----------------|
| Response-evaluable population | 40 | | |
| Complete response (CR) | 3 | 7.5 | (1.6, 20.4) |
| Partial response (PR) | 10 | 25.0 | (12.7, 41.2) |
| Stable disease (SD) | 20 | 50.0 | — |
| Progressive disease (PD) | 7 | 17.5 | — |
| Not evaluable (NE) | 0 | 0.0 | — |
| Objective response (CR + PR) | 13 | 32.5 | (18.6, 49.1) |
Table 3. Objective Response Rate summary (simulated N=40). ORR and the CR/PR row each report a two-sided exact Clopper–Pearson 95% CI; SD/PD/NE show counts only. Percent denominators are the response-evaluable population.
The waterfall plot is the single most recognizable figure in oncology. Each bar is one subject's best percent change from baseline in the sum of target lesion diameters, ordered left-to-right from worst (largest increase) to best (largest decrease). Two reference lines mark the RECIST 1.1 thresholds: −30%, the boundary for PR-level shrinkage, and +20%, the boundary for PD-level growth.
Figure 1. Waterfall plot of best percent change in the sum of target lesion diameters (N=40, simulated). Bars are coloured by confirmed BOR; the dashed lines mark the RECIST 1.1 −30% PR and +20% PD thresholds. Bars at −100% indicate complete disappearance of target lesions.
A waterfall is not sufficient to read ORR directly: a bar crossing −30% is only a candidate PR because overall response also depends on non-target lesions and the appearance of new lesions, and because the response must be confirmed. Programmers should colour bars by confirmed BOR, not by target-lesion change alone, to avoid misleading readers. The figure is typically produced from a per-subject dataset containing USUBJID, PCHG (best percent change), CBOR, and an on-treatment flag.
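The data-preparation step is the part programmers own: one row per subject carrying best percent change and confirmed BOR, ordered for plotting. A minimal Python sketch, with hypothetical input structures (the actual rendering is left to the graphics layer):

```python
def waterfall_data(tr, cbor):
    """tr: {usubjid: {visit_number: sum_of_diameters}} with visit 0 = baseline.
    cbor: {usubjid: confirmed BOR}. Assumes every subject has at least one
    post-baseline assessment. Returns (usubjid, best_pchg, cbor) tuples
    ordered worst (largest increase) to best (largest decrease)."""
    bars = []
    for subj, sums in tr.items():
        base = sums[0]
        post = [v for k, v in sums.items() if k > 0]
        best_pchg = min(100 * (s - base) / base for s in post)
        bars.append((subj, best_pchg, cbor.get(subj, "NE")))
    # descending percent change: growers on the left, shrinkers on the right
    return sorted(bars, key=lambda b: -b[1])
```

Colouring each bar by the third tuple element (confirmed BOR) rather than by the percent change alone follows the guidance above.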
Where the waterfall captures magnitude of response, the swimmer plot captures time course. Each horizontal bar is one subject’s time on treatment; markers show time to first response, time to best response, and whether the patient remains on treatment at the data cut. For accelerated-approval submissions, durability is as important as the ORR point estimate, and the swimmer is the figure regulators expect to see alongside the DoR Kaplan–Meier.
Figure 2. Swimmer plot (N=20, simulated subset). Bars are coloured by confirmed BOR, stars mark time to first response, and arrows indicate subjects still on treatment at the data cut. Built from ADTTE (duration) and ADRS (BOR, TTR).
For single-arm ORR primary trials, sample size is driven by a one-sample exact binomial test against a historical-control ORR. Two common designs are Simon's two-stage design (optimal or minimax), which allows early stopping for futility after the first stage, and a single-stage exact design (A'Hern), which compares the observed responder count against a pre-specified critical value.
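The single-stage exact design reduces to a small search: find the smallest n and critical responder count r such that rejecting the historical-control rate whenever responders >= r keeps the exact type I error at or below alpha while preserving power at the target rate. A Python sketch (illustrative only; the p0 and p1 values used below are assumed inputs, and production designs should come from validated software or published tables):

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def single_stage_design(p0, p1, alpha=0.05, power=0.80, n_max=200):
    """Smallest n (with critical value r): reject H0: ORR <= p0 when
    responders >= r, with exact size <= alpha and power >= power at p1."""
    for n in range(1, n_max + 1):
        for r in range(1, n + 1):
            if binom_tail(r, n, p0) <= alpha and binom_tail(r, n, p1) >= power:
                return n, r
    return None  # no design within n_max
```

The same `binom_tail` helper doubles as a check on the one-sample exact binomial test used at analysis time.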
For randomized trials with ORR as primary endpoint, comparison across arms typically uses a stratified Cochran–Mantel–Haenszel test or the stratified Miettinen–Nurminen risk-difference CI. The stratification factors are the randomization strata, and programmers should document stratum definitions explicitly in the SAP and ADaM metadata.
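The CMH statistic itself is simple enough to sketch for validation. The Python fragment below is illustrative (production inference would come from PROC FREQ's CMH option or a validated R routine, and the Miettinen–Nurminen CI is not shown); it computes the 1-df CMH chi-square across strata without continuity correction:

```python
def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel chi-square (1 df) for K 2x2 tables.
    Each stratum is (a, b, c, d):
        a = responders arm 1, b = non-responders arm 1,
        c = responders arm 2, d = non-responders arm 2."""
    num = 0.0
    var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d          # arm totals
        col1, col2 = a + c, b + d          # responder / non-responder totals
        num += a - row1 * col1 / n         # observed minus expected in cell a
        var += row1 * row2 * col1 * col2 / (n**2 * (n - 1))
    return num**2 / var
```

For a single stratum this equals (n-1)/n times the Pearson chi-square, a handy spot check against standard output.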
ORR looks simple—a proportion with a confidence interval—but almost every submission that has ever stumbled on an ORR analysis has stumbled on the derivation layer, not the inference layer. The programming discipline around ORR is fundamentally about getting BOR right: applying confirmation consistently, handling NEs the way the SAP says, keeping evaluators separate, and presenting the waterfall and swimmer on the same population as the table. Do that well, and the Clopper–Pearson CI becomes the easy part.
References. Eisenhauer EA, Therasse P, Bogaerts J, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). European Journal of Cancer 2009;45(2):228–247. • US FDA. Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics, December 2018. • US FDA, 21 CFR 314 Subpart H (Accelerated Approval of New Drugs for Serious or Life-Threatening Illnesses). • CDISC SDTM Implementation Guide v3.4. • CDISC Therapeutic Area User Guide: Oncology. • CDISC ADaM Implementation Guide v1.3. • Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 1934;26(4):404–413.