Clinical Data Standards Hub
Regulatory · April 28, 2026 · 11 min read

GxP Guidelines in Pharma and Biotech

A working reference for biostatisticians and statistical programmers

GxP is shorthand for the family of regulations and quality guidelines that apply to any work that touches a regulated medicinal product. The G stands for Good, the x is a placeholder for the discipline (Clinical, Laboratory, Manufacturing, Pharmacovigilance, Distribution), and the P stands for Practice. For a biostatistician or statistical programmer, GxP is the framework that decides whether the analysis you ran on Tuesday can be trusted by an FDA reviewer two years later.

This piece keeps the survey of GLP, GMP, and 21 CFR Part 11 short, and spends most of its weight on Good Clinical Practice, which is the GxP that statistical programming actually lives inside. Examples are given in SAS, R, and Python because submission shops increasingly run on all three.

The GxP family

| GxP | Domain | Primary regulator references | Programmer relevance |
|---|---|---|---|
| GLP | Non-clinical / preclinical labs | FDA 21 CFR Part 58, OECD Principles | Low to medium (tox, PK datasets) |
| GCP | Clinical trials in humans | ICH E6(R3), FDA 21 CFR 312, EMA Annex I | High (SDTM, ADaM, TLF, eCTD m5) |
| GMP | Drug substance and product manufacturing | FDA 21 CFR 210/211, EU GMP Vol. 4 | Low (CMC stability tables, batch data) |
| GVP | Post-market safety and pharmacovigilance | EMA GVP Modules, FDA 21 CFR 314.80 | Medium (PSUR, DSUR, signal datasets) |
| GDP / GSP | Distribution and storage | EU GDP 2013/C 343/01, WHO TRS 957 | Low |

Table 1. The GxP family at a glance, with the regulator references a programmer is most likely to encounter on a global submission.

Good Laboratory Practice

GLP governs how non-clinical safety studies (tox, carcinogenicity, reproductive, safety pharmacology) are planned, performed, monitored, recorded, archived and reported. It is the oldest formal GxP, born from the 1976 Industrial BioTest Labs scandal and codified in 21 CFR Part 58 in 1979. The OECD Principles of GLP are the international counterpart and are mutually accepted across most ICH regions.

For a programmer the GLP touchpoints are narrow but real. Datasets coming out of a GLP-compliant tox lab carry a Quality Assurance Unit (QAU) signature. If your team does PK/PD or tox-table programming on top of those data, the source data, derivation specifications, and final outputs need to be traceable to that QAU-released source. Mixing GLP source with non-GLP data in the same dataset without flagging it is a finding waiting to happen.
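The pooling step is where that flag gets lost in practice, so it is worth making it explicit in code. A minimal Python sketch, using plain dicts for records and a hypothetical GLPFL flag (a team convention for illustration, not a CDISC variable):

```python
# Sketch: keep GLP provenance visible when pooling QAU-released GLP data
# with non-GLP exploratory data. GLPFL is a hypothetical flag, not a
# CDISC standard; records are plain dicts for illustration.

def pool_with_provenance(glp_records, non_glp_records):
    """Pool two sources, tagging every record with its GLP status."""
    pooled = [{**rec, "GLPFL": "Y"} for rec in glp_records]       # QAU-released
    pooled += [{**rec, "GLPFL": "N"} for rec in non_glp_records]  # non-GLP
    return pooled

glp = [{"USUBJID": "TOX-001", "AVAL": 1.2}]
exploratory = [{"USUBJID": "EXP-001", "AVAL": 0.9}]
pooled = pool_with_provenance(glp, exploratory)
```

The same idea scales to a pandas merge; the point is that the flag is assigned at pooling time, not reconstructed later from memory.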

Good Manufacturing Practice

GMP applies to the drug substance and drug product itself, not the analyses around them. FDA codifies it in 21 CFR 210 and 211; the EU codifies it in EudraLex Volume 4. A statistical programmer rarely writes GMP-relevant code, but stability analysis, content uniformity, and batch-release decisions are statistical activities, and the data systems that hold them are GMP-validated. If you are pulled into CMC, expect tighter change control, a formal validated environment, and a different SOP set than the one you use for clinical work.

21 CFR Part 11 and EU Annex 11

Part 11 is the FDA rule that defines when an electronic record or electronic signature is acceptable as the equivalent of paper. EU Annex 11 covers the same ground for computerised systems used in any GxP activity in the EU. Part 11 is the rule that makes a SAS or R script production-grade, because it is what forces validated environments, audit trails, role-based access, and qualified backups.

| Part 11 / Annex 11 area | What the rule wants | What it looks like in a programming shop |
|---|---|---|
| Validation | Documented evidence the system does what it should | IQ/OQ/PQ for the SAS, R, Python build; URS, FS, test cases under change control |
| Audit trail | Independent, computer-generated, time-stamped record of changes | Versioned repository (Git, SVN), JOBSCAN logs, RStudio Workbench audit logs |
| Access control | Authority checks; unique user IDs | AD groups for SDTM/ADaM/TLF folders; locked production after database lock |
| E-signatures | Identification of signer, meaning, link to record | Sign-off in the eTMF or DocuSign on SAP, ADaM specs, validation reports |
| Copies of records | Accurate, complete copies for inspection | Reproducible builds; archived program + log + lst + dataset for every run |

Table 2. How Part 11 / Annex 11 expectations land on the day-to-day work of a statistical programming team.
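The copies-of-records row is the one most teams leave implicit. One way to make it concrete is a checksum manifest written at archive time and re-verified at inspection time. A stdlib-only Python sketch; the directory layout and manifest format are assumptions, not anything Part 11 prescribes:

```python
# Sketch: SHA-256 manifest for an archived production run, one way to
# support the "accurate and complete copies" expectation. The run_dir
# layout is illustrative.
import hashlib
import pathlib

def build_manifest(run_dir):
    """Map every file under run_dir to its SHA-256 digest."""
    root = pathlib.Path(run_dir)
    return {str(f.relative_to(root)): hashlib.sha256(f.read_bytes()).hexdigest()
            for f in sorted(root.rglob("*")) if f.is_file()}

def verify_manifest(run_dir, manifest):
    """Return the files whose current digest no longer matches the manifest."""
    current = build_manifest(run_dir)
    return sorted(f for f, h in manifest.items() if current.get(f) != h)
```

Writing the manifest next to the archive and re-running `verify_manifest` during an inspection gives a cheap, demonstrable answer to "is this copy complete and unaltered".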

Good Clinical Practice — the programmer's core competency

GCP is an international ethical and scientific quality standard for designing, conducting, recording and reporting trials that involve human subjects. The current global anchor is ICH E6(R3), adopted by ICH in January 2025. FDA implemented the previous revision (R2) through 21 CFR 312 and the 2018 guidance, and is moving toward harmonisation with R3. EMA implements GCP through the Clinical Trials Regulation (EU) No 536/2014, which replaced Directive 2001/20/EC, with the EMA/INS/GCP inspection procedures supplying enforcement detail.

ICH E6 is not the only guideline that matters for a programmer. The bundle that actually drives day-to-day work is the table below.

| Guideline | Subject | Why a programmer cares |
|---|---|---|
| ICH E6(R3) | Good Clinical Practice | Defines the quality system, sponsor responsibilities, computerised system requirements (Annex 1) |
| ICH E8(R1) | General Considerations for Clinical Studies | Quality by Design, critical-to-quality factors that flow into the SAP |
| ICH E9 | Statistical Principles for Clinical Trials | Defines analysis populations, multiplicity, missing data, interim analyses |
| ICH E9(R1) | Estimands and Sensitivity Analyses | Five-attribute estimand framework; intercurrent-event strategies in the SAP and ADaM |
| ICH E3 | Structure and Content of Clinical Study Reports | Section 14 tables and listings; Section 16.2 patient data listings |
| ICH E2A / E2B(R3) | Safety reporting and ICSR | SAE/SUSAR datasets; CIOMS forms, E2B(R3) XML |
| CDISC SDTM, ADaM, Define-XML | Data standards required by FDA, PMDA; recommended by EMA | Conformance is the mechanical face of GCP for the programmer |

Table 3. ICH guidelines and CDISC standards a stat programmer touches on a typical NDA, BLA, or MAA.

What changed with ICH E6(R3)

R3 is the first GCP revision built around fit-for-purpose quality and computerised systems rather than the paper-trial model E6(R2) inherited. Three shifts matter for programmers. First, Annex 1 (interventional trials) carries an explicit set of expectations for computerised systems used to generate or hold trial data, formalising what most sponsors had already pulled from EMA's Notice to Sponsors and FDA's 2007 Computerised Systems guidance. Second, the principle of proportionate quality means a SAP risk assessment is no longer an internal nicety, it is the basis for what gets validated and how heavily. Third, sponsor oversight of vendors (CRO, EDC, central lab, IRT, ePRO) is sharpened, which directly affects the audit trails and dataset provenance a programmer has to be able to defend.

The programmer's role in the GCP lifecycle

The flow below is the practical version of what E6(R3) calls the data lifecycle. Every box has a GCP control around it. If you can name the SOP, the validated tool, the approver, and the audit trail for each box on your own study, you are GCP-ready.

| Stage | Programmer activity | GCP / regulatory hook |
|---|---|---|
| Protocol & SAP | Review estimands, analysis populations, planned tables | ICH E8, E9, E9(R1); SAP signed before database lock |
| CRF / EDC build | Review of CRF annotations, edit checks, controlled terminology | E6(R3) Annex 1; CDASH; SDTM IG |
| Source data → SDTM | SDTM mapping, define-xml, reviewer's guide (cSDRG) | FDA Study Data Technical Conformance Guide; PMDA notification |
| SDTM → ADaM | ADaM build, ADaM IG conformance, validation logs | ADaM IG v1.3+; CDISC ADaM validation checks |
| ADaM → TLF | Production + QC of Section 14 tables, listings, figures | ICH E3; SAP traceability |
| DBL → Lock | Final run, freeze, archival; signatures on outputs | 21 CFR 11; Annex 11; E6(R3) record retention |
| Submission (eCTD m5) | Transport datasets (XPT/Dataset-JSON), ADRG, define-xml | FDA TCG, EMA eCTD EU M1, PMDA validator rules |

Figure 1. The clinical data lifecycle viewed from the programmer's chair, with the GCP and regional regulatory hook for each stage.
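The transport stage at the bottom of that flow is mechanically checkable long before the package is built. A stdlib sketch of a pre-submission lint against the SAS V5 transport (XPT) limits that the FDA conformance rules inherit, 8-character variable names and 40-character labels; the metadata dict is illustrative, and in a real pipeline it would be read from the spec or define-xml:

```python
# Sketch: lint variable metadata against SAS V5 transport (XPT) limits
# before the eCTD m5 package is assembled. The metadata dict is illustrative.

XPT_NAME_MAX = 8    # V5 transport variable name limit
XPT_LABEL_MAX = 40  # V5 transport variable label limit

def xpt_findings(variables):
    """variables: {name: label}. Return human-readable findings."""
    findings = []
    for name, label in variables.items():
        if len(name) > XPT_NAME_MAX:
            findings.append(f"{name}: name longer than {XPT_NAME_MAX} characters")
        if len(label) > XPT_LABEL_MAX:
            findings.append(f"{name}: label longer than {XPT_LABEL_MAX} characters")
    return findings
```

A check like this belongs in the build, not in a reviewer's head; the regional validators (Pinnacle 21 and the PMDA rules) will catch the same issues, but far later in the timeline.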

Practical compliance: what GCP looks like in code

Three habits separate a GCP-ready programming team from a team that will fail a sponsor audit. The first is independent verification, usually called double programming. The second is full traceability, source data through to displayed result. The third is a clean, reviewable log for every production run.

Independent verification (double programming)

ICH E6 does not literally say "double programme your tables", but it requires that data are recorded, handled and stored in a way that allows accurate reporting, interpretation and verification. Sponsors operationalise this through a producer/QC workflow on every key derivation and every primary efficacy / safety output. In SAS the comparison is conventionally PROC COMPARE; in R it is most often diffdf or arsenal::compare; in Python it is pandas.testing.assert_frame_equal.

/* SAS: production vs QC ADaM compare for ADSL */
proc compare base = prod.adsl
             compare = qc.adsl
             out = work.adsl_diff outnoequal outbase outcomp
             listall criterion = 1e-8;
  id usubjid;
run;

/* Read the return code; fail the job if not equal */
%if &sysinfo ne 0 %then %do;
  %put ERROR: ADSL prod vs QC mismatch, sysinfo=&sysinfo;
  endsas;
%end;

R and Python equivalents

# R: diffdf for ADaM QC
library(haven); library(diffdf)
prod <- read_xpt("prod/adsl.xpt")
qc   <- read_xpt("qc/adsl.xpt")
diff <- diffdf(prod, qc, keys = "USUBJID", strict_numeric = TRUE)
if (diffdf::diffdf_has_issues(diff)) stop("ADSL QC failed")

# Python: pandas-based ADaM compare
import pandas as pd, pyreadstat, sys
prod, _ = pyreadstat.read_xport("prod/adsl.xpt")
qc, _ = pyreadstat.read_xport("qc/adsl.xpt")
prod = prod.sort_values("USUBJID").reset_index(drop=True)
qc = qc.sort_values("USUBJID").reset_index(drop=True)
try:
    pd.testing.assert_frame_equal(prod, qc, check_exact=False, atol=1e-8)
except AssertionError as e:
    sys.exit(f"ADSL QC failed: {e}")

Traceability and the audit trail

Define-XML is the artefact that carries traceability into the submission. Inside the team, the same idea has to live in the program headers, the spec, and the reviewer's guide. A compliant header tells an inspector who wrote the program, when, against which spec version, and what input data it consumed. Anything the inspector cannot reconstruct from your archive is, for GCP purposes, not reproducible.

/*--------------------------------------------------------------------
Program : t_ae_soc_pt.sas
Purpose : Table 14.3.1.1 AEs by SOC and PT, Safety population
Spec    : SAP v3.0 dated 2026-02-14, Table shell T-14.3.1.1 v2
Inputs  : ADaM.ADSL, ADaM.ADAE (locked snapshot 2026-04-10)
Outputs : t_ae_soc_pt.rtf, t_ae_soc_pt.lst
Author  : V. Doth (programmer), A. Reviewer (QC)
History : 1.0 2026-02-20 First production run
          1.1 2026-04-12 Updated PT MedDRA v27.0 dictionary
--------------------------------------------------------------------*/
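A header convention is only as good as its enforcement. A small lint that fails any production program missing a traceability field is cheap to run in the batch; the required-field list below mirrors the header above and is a team convention, not a regulatory requirement:

```python
# Sketch: fail any production program whose leading comment block is
# missing a traceability field. The field list follows the header
# convention shown above; adjust it to the team's SOP.
import re

REQUIRED_FIELDS = ("Program", "Purpose", "Spec", "Inputs",
                   "Outputs", "Author", "History")

def missing_header_fields(program_text):
    """Check only the first comment block, up to the first '*/'."""
    header = program_text.split("*/", 1)[0]
    return [f for f in REQUIRED_FIELDS
            if not re.search(rf"^\s*{f}\s*:", header, re.M)]
```

Run it over the production folder in the same job as the log scrubber; a non-empty return fails the batch.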

Log review and the "clean log" rule

An auditor will not read your code first, they will read your log. A production run that emits WARNING, ERROR, uninitialized, or NOTE: MERGE statement has more than one... is, by sponsor SOP, a failed run. Most shops automate a log scrubber that fails the job if forbidden strings are found, and stores the scrubbed log alongside the dataset and the lst output.

# Python log scrubber, runs at the end of every batch
import re, sys, pathlib

FORBIDDEN = [r"^ERROR", r"^WARNING", r"uninitialized",
             r"more than one", r"converted from char",
             r"Invalid (data|numeric|argument)"]
rx = re.compile("|".join(FORBIDDEN), re.I | re.M)

fails = []
for log in pathlib.Path("prod/logs").glob("*.log"):
    text = log.read_text(errors="ignore")
    if rx.search(text):
        fails.append(log.name)
if fails:
    sys.exit(f"Log scrubber failed for: {fails}")

Inspection readiness for the programming team

An FDA BIMO inspection or an EMA GCP inspection rarely starts in the statistics function, but if the inspector has questions about how a primary efficacy result was derived, statistics is where they end. Three artefacts are normally requested: the locked SAP, the reviewer's guides (cSDRG and ADRG), and the program-plus-log archive that produced the result. If those three line up, the conversation is short.
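The third artefact, the program-plus-log archive, can be checked for completeness before anyone declares inspection readiness. A stdlib sketch, assuming a flat archive where a program and its log and lst share a file stem; the layout is a common convention, not a rule:

```python
# Sketch: report outputs whose program/log/lst trio is incomplete in the
# archive. A flat directory with shared stems is an assumed convention.
import pathlib

def incomplete_runs(archive_dir, exts=(".sas", ".log", ".lst")):
    """Return the stems missing at least one of the expected companions."""
    root = pathlib.Path(archive_dir)
    stems = {p.stem for p in root.iterdir() if p.suffix in exts}
    return sorted(s for s in stems
                  if not all((root / (s + e)).exists() for e in exts))
```

An empty return is the answer you want to give the inspector; a non-empty one is Table 4's "missing QC" or "unresolved log" finding waiting to be written up.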

Common findings the programming team owns

| Finding category | What it usually looks like in practice |
|---|---|
| SAP / ADaM divergence | ADaM derives a parameter (e.g. baseline definition) differently from the SAP without a documented amendment |
| Untraceable derivation | ADaM variable has no source mapping in the spec or define-xml; reviewer cannot trace it back to SDTM |
| Unresolved log warnings | Production run archived with WARNING messages; no documented rationale |
| Late or missing QC | Primary table not independently programmed, or QC done after sign-off |
| Version drift | Submission dataset built with a different SAS / R / Python version than was qualified |
| Open production environment | Programmers retain write access to the production folder after database lock |
| Missing 21 CFR 11 controls | No audit trail on the validated environment; shared user accounts |

Table 4. The findings that, in the author's submission experience across NDA, BLA, and MAA filings, recur most often in the statistical programming function.

Closing the loop

GxP is not paperwork bolted onto the analysis after the fact. It is the set of habits that make the analysis defensible: a signed SAP before the lock, an ADaM that traces cleanly to SDTM, a production run with a clean log, an independent QC that compared equal, and an archive an auditor can open three years from now and re-run. Build the habits into the project plan and the inspection looks after itself.

A pre-lock checklist for the programming team

The questions below are the ones a sponsor lead programmer should be able to answer yes to on the morning of database lock. They are stitched together from the GCP, Part 11, and CDISC requirements covered above, and from the recurring findings in Table 4.

| # | Question | Anchor |
|---|---|---|
| 1 | Is the SAP signed and dated, with all amendments captured before lock? | ICH E9, E6(R3) §5 |
| 2 | Does every ADaM variable trace cleanly to SDTM via the spec and define-xml? | ADaM IG, Define-XML 2.1 |
| 3 | Has every primary and key secondary table been independently QC'd to a PROC COMPARE / diffdf / assert_frame_equal pass? | Sponsor SOP, E6(R3) Annex 1 |
| 4 | Is the production environment locked to read-only, with named approvers on file? | 21 CFR 11, EU Annex 11 |
| 5 | Are program, log, lst, and dataset archived together with the validated tool versions? | 21 CFR 11.10(c), GCP record retention |
| 6 | Is the cSDRG / ADRG aligned with the as-built submission package? | FDA TCG, PMDA validator |

Table 5. Six questions to clear before any GCP-regulated database lock.
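Question 4 is automatable on a POSIX file system. A sketch that lists anything still writable under the production folder on the morning of lock; the path convention and the permission policy (no write bit for anyone) are assumptions to adapt to the site's setup:

```python
# Sketch: flag files under the production folder that any class of user
# can still write to after lock. POSIX permission bits; the policy that
# locked means "no write bit at all" is an assumption.
import pathlib
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def still_writable(prod_dir):
    """Return relative paths of files with any write permission bit set."""
    root = pathlib.Path(prod_dir)
    return sorted(str(f.relative_to(root))
                  for f in root.rglob("*")
                  if f.is_file() and f.stat().st_mode & WRITE_BITS)
```

An empty list is the evidence to attach to the lock memo; a non-empty one is Table 4's "open production environment" finding caught a day early.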

References:

ICH E6(R3) Good Clinical Practice (Step 4, January 2025); ICH E8(R1) General Considerations for Clinical Studies (2021); ICH E9 Statistical Principles for Clinical Trials and ICH E9(R1) Estimands; FDA 21 CFR Parts 11, 50, 54, 56, 312, 314; FDA Study Data Technical Conformance Guide (current version); EU Clinical Trials Regulation (EU) No 536/2014; EMA Annex 11 to EU GMP; CDISC SDTM IG, ADaM IG, Define-XML 2.1, Dataset-JSON 1.1; PMDA Notification 0427001 on Electronic Study Data Submission. For working examples and CDISC tooling, see clinstandards.org.

