# Spec-to-Code skills.md Sample

Use this file to describe study-specific source data conventions, raw/CDASH formats, sponsor macros, and programming assumptions. Paste the relevant content into the Spec-to-Code Generator "skills.md raw/CDASH guidance" box before generating code.

## Source Libraries

- RAW contains source CRF/raw datasets.
- SDTM contains previously built SDTM domains.
- ADAM contains previously built ADaM datasets.
- Use placeholder LIBNAME paths if physical paths are unknown.

## Subject Identifiers

- RAW subject identifier is `SUBJID`.
- SDTM `USUBJID` should be derived as `catx("-", STUDYID, SUBJID)`.
- Site identifier is `SITEID`.

## Date and Time Formats

- RAW date variables ending in `DAT` are character dates in `DD-MMM-YYYY` format.
- RAW datetime variables ending in `DTC` are ISO 8601 character strings when already standardized.
- Partial dates may appear as `YYYY`, `YYYY-MM`, or `YYYY-MM-DD`.
- For SDTM character timing variables, output ISO 8601 strings.
- For ADaM numeric date variables, convert to SAS dates and apply `DATE9.` format.
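
The conversion rules above can be sketched as follows. This is a minimal illustration, not the sponsor's standard macro: the dataset `work.date_demo` and the inline sample values are made up for the example, and `AESTDTC`/`ASTDT` are assumed to be the target variable names. Only complete `DD-MMM-YYYY` values are handled; partial dates need separate logic.

```sas
/* Sketch: convert a complete DD-MMM-YYYY character date to         */
/* (a) an ISO 8601 character string and (b) a numeric SAS date.     */
/* Dataset name and sample values are illustrative assumptions.     */
data work.date_demo;
  length AESTDAT $11 AESTDTC $10;
  format ASTDT date9.;
  input AESTDAT $;

  /* ?? suppresses log errors when the value is not a complete date */
  ASTDT = input(AESTDAT, ?? date11.);

  /* YYMMDD10. writes YYYY-MM-DD, which is valid ISO 8601 */
  if not missing(ASTDT) then AESTDTC = put(ASTDT, yymmdd10.);

  datalines;
05-JAN-2023
28-FEB-2024
;
run;
```

In a real program the SDTM character variable and the ADaM numeric variable would normally be created in separate steps; they are combined here only to show both conversions side by side.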

## Common RAW/CDASH Date Examples

- `RAW.DM.BRTHDAT`: character, may be partial.
- `RAW.AE.AESTDAT`: character date in `DD-MMM-YYYY`.
- `RAW.AE.AEENDAT`: character date in `DD-MMM-YYYY`, may be missing for ongoing events.
- `RAW.EX.EXSTDAT`: character date in `DD-MMM-YYYY`.
- `RAW.EX.EXENDAT`: character date in `DD-MMM-YYYY`.

## Controlled Terminology

- Sex values should map to CDISC CT: `M`, `F`, `U`, `UNDIFFERENTIATED`.
- Race and ethnicity should map to CDISC controlled terminology.
- AE severity should map to `MILD`, `MODERATE`, `SEVERE`.
- Seriousness flags should be `Y` or null.
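
One possible way to standardize AE severity and surface unmapped values is sketched below. It assumes the raw severity is collected in a character variable `RAW.AE.AESEV` and that case and padding are the only differences from CDISC CT; `AESEV_STD` and `work.ae_ct` are hypothetical names, and the LIBNAME path is a placeholder.

```sas
/* Placeholder path: replace with the study's RAW location. */
libname raw "/path/to/raw" access=readonly;

data work.ae_ct;
  set raw.ae;
  length AESEV_STD $8;

  /* Assumption: raw values differ from CT only by case or padding */
  AESEV_STD = upcase(strip(AESEV));

  /* Report anything outside the expected CT values for review */
  if not missing(AESEV_STD)
     and AESEV_STD not in ("MILD", "MODERATE", "SEVERE") then
    putlog "NOTE: Unmapped AE severity for " SUBJID= AESEV=;
run;
```

If the CRF collects coded values instead of text, a `PROC FORMAT` lookup built from the study's code list would be the more natural choice.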

## SAS Programming Preferences

- Include explicit `LIBNAME` statements for every source/output library.
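- Do not reference variables with three-level names such as `RAW.DM.VARIABLE` in DATA step expressions; the DATA step does not support that syntax.
- Read source datasets with `SET raw.dm;` or `MERGE raw.dm raw.ex; BY SUBJID;` (inputs sorted by the `BY` variable), then refer to variables by name.
- Use `PROC SQL` joins for multi-source derivations when explicit join keys make the logic clearer than a `MERGE`.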
- Use `ATTRIB` or `LABEL` statements for all target labels.
- Declare `LENGTH` for each variable before its first assignment so values are not truncated.
- Use comments for major derivation blocks.
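
A minimal skeleton that follows these preferences might look like the sketch below. The library paths are placeholders, the lengths and labels are illustrative rather than taken from a spec, and `RAW.DM` is assumed to carry character `STUDYID`, `SUBJID`, and `SITEID`.

```sas
/* Placeholder paths: replace with the study's physical locations. */
libname raw  "/path/to/raw"  access=readonly;
libname sdtm "/path/to/sdtm";

data sdtm.dm;
  /* Declare lengths and labels before any assignment */
  attrib
    STUDYID length=$20 label="Study Identifier"
    USUBJID length=$40 label="Unique Subject Identifier"
    SUBJID  length=$20 label="Subject Identifier for the Study"
    SITEID  length=$10 label="Study Site Identifier"
  ;

  /* Read the source with SET, then refer to variables by name */
  set raw.dm;

  /* Subject identifier derivation */
  USUBJID = catx("-", STUDYID, SUBJID);

  keep STUDYID USUBJID SUBJID SITEID;
run;
```

Declaring attributes ahead of `SET` fixes the output lengths up front; if a raw variable is longer than the declared length, SAS warns about possible truncation, so the lengths should come from the spec.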

## Study Day Rules

- Study day is derived relative to treatment start date.
- There is no day 0.
- If analysis date is on or after treatment start: `ADY = ADT - TRTSDT + 1`.
- If analysis date is before treatment start: `ADY = ADT - TRTSDT`.
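
The two branches above can be sketched in a single DATA step. The dataset name `work.ady_demo` and the inline sample dates are assumptions made for illustration; `ADY` is left missing when either date is missing.

```sas
/* Sketch: study day relative to treatment start, with no day 0. */
data work.ady_demo;
  input ADT :date9. TRTSDT :date9.;
  format ADT TRTSDT date9.;

  if not missing(ADT) and not missing(TRTSDT) then do;
    if ADT >= TRTSDT then ADY = ADT - TRTSDT + 1;  /* on or after start */
    else ADY = ADT - TRTSDT;                       /* before start      */
  end;

  datalines;
14JAN2023 15JAN2023
15JAN2023 15JAN2023
16JAN2023 15JAN2023
;
run;
```

For the three sample rows this gives `ADY` values of -1, 1, and 2, so the day before treatment start is -1 and the start day itself is 1.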

## Baseline Rules

- Baseline is the last non-missing value before first treatment date unless SAP says otherwise.
- If multiple records occur on the same baseline date, use the latest timepoint if time is available.
- Set `BASETYPE = "LAST NON-MISSING PRE-DOSE"` for this convention.
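
One way to implement this convention is sketched below. It assumes a hypothetical input `work.adlb` with `USUBJID`, `PARAMCD`, `ADT`, `ATM`, `AVAL`, and `TRTSDT`, and it uses the ADaM variables `ABLFL` and `BASE`, which this file does not otherwise define. "Before first treatment date" is read strictly as `ADT < TRTSDT`; adjust if the SAP allows same-day pre-dose records.

```sas
/* Step 1: keep non-missing pre-treatment candidates */
data work.pre;
  set work.adlb;  /* hypothetical input dataset */
  where not missing(AVAL) and not missing(ADT) and ADT < TRTSDT;
run;

/* Step 2: order so the latest date (and time, if present) sorts last */
proc sort data=work.pre;
  by USUBJID PARAMCD ADT ATM;
run;

/* Step 3: the last candidate per subject and parameter is baseline */
data work.base;
  set work.pre;
  by USUBJID PARAMCD ADT ATM;
  if last.PARAMCD;

  length ABLFL $1 BASETYPE $30;
  ABLFL    = "Y";
  BASETYPE = "LAST NON-MISSING PRE-DOSE";
  BASE     = AVAL;

  keep USUBJID PARAMCD ADT ATM BASE ABLFL BASETYPE;
run;
```

Because missing values sort first, a record with a collected time is preferred over a same-day record without one. Merging `BASE` back onto all records for the parameter is omitted here.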

## Visit Rules

- Use collected visit when it maps directly to an analysis visit.
- For unscheduled visits, retain the collected visit name unless SAP-defined windowing assigns it to a scheduled analysis visit.

## Example Custom Validation Rules

Paste these into the "Custom validation checks" box if you want deterministic checks:

```txt
contains: libname raw
contains: attrib
not_contains: raw.dm.subjid
not_contains: raw.ae.aeterm
regex: \bdata\s+sdtm\.dm\b
```
