Annotation Versioning & Entity Model Design

Purpose: Captures design refinements from the annotation versioning brainstorming session (Feb 2026). This document refines design-decisions.md — where they conflict, this document takes precedence and design-decisions.md must be updated to align.

Parent: Annotation Management & Reconciliation


What This Document Covers

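Four areas where brainstorming refined the prior design:

  1. Entity lifecycle patterns: DraftAQ vs AQ, property classification, pending buffers
  2. Annotation and session versioning: auto-save, AV/ASV mechanics, cross-stage behavior
  3. Impact assessment on commit: graduated impact for AQ and SQS changes, PA workflow
  4. MongoDB aggregate design: collection boundaries, consistency model

Sections 5 and 6 summarise the contradictions resolved and the updated decision log.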

1. Entity Lifecycle Patterns

1.1 DraftAQ and AQ Are Separate Types

design-decisions.md treats AQ as a single entity with append-only AQVersions. This conflates two domain operations with different invariants:

| Operation | What's mutable | Impact assessment needed? |
|---|---|---|
| Initial creation (building from scratch) | Everything: dataType, parent, groupAsSingle, text, options | No: nothing depends on this entity |
| Content change (modifying an activated AQ) | Only content properties: text, options, helpText, answerFilters | Yes: annotations may exist |

This distinction justifies separate types:

DraftAQ (factory — lives on Project aggregate)
  draftId: GUID (temporary)
  dataType: string (mutable)
  parentId: GUID? (mutable)
  groupAsSingle: boolean (mutable)
  text: string (mutable)
  options: [] (mutable)
  helpText: string? (mutable)
  answerFilters: [] (mutable)
  Everything is mutable. No versions. No impact.

     ──── PA clicks "Publish" / activates stage ────>

AQ (committed entity — lives on Project aggregate, or pmAnnotationQuestion collection)
  aqId: GUID (permanent, assigned at publish — aggregate root ID)
  dataType: string (FROZEN)
  parentId: GUID? (FROZEN)
  groupAsSingle: boolean (FROZEN)
  currentVersionNumber: int (pointer to latest committed AQV)
  pendingChanges: {text, options, helpText, answerFilters} | null
  aqvs: AQV[] (immutable version history)

AQV (first version created automatically at publish time)
  versionNumber: int (sequential: 1, 2, 3, ... Identity is (aqId, versionNumber))
  content: {text, options, helpText, answerFilters}
  createdAt: timestamp
  createdBy: GUID
  reason: string?

The DraftAQ ceases to exist once published. The AQ is born with its first AQV.
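The publish transition above can be sketched in TypeScript. This is a minimal illustration, not the project's implementation: the `publish` function and its parameters are hypothetical, `options` and `answerFilters` are assumed to be string arrays (the outline only says `[]`), and the new `aqId` is passed in so the sketch does not depend on any particular ID generator.

```typescript
// Hypothetical shapes mirroring the DraftAQ / AQ / AQV outline above.
interface AqContent {
  text: string;
  options: string[]; // element type is an assumption
  helpText: string | null;
  answerFilters: string[]; // element type is an assumption
}

// Draft: every property is mutable, and there is no version history.
interface DraftAQ extends AqContent {
  draftId: string;
  dataType: string;
  parentId: string | null;
  groupAsSingle: boolean;
}

interface AQV {
  readonly versionNumber: number;
  readonly content: AqContent;
  readonly createdAt: string;
  readonly createdBy: string;
}

// Committed entity: identity properties are readonly (frozen at publish).
interface AQ {
  readonly aqId: string;
  readonly dataType: string;
  readonly parentId: string | null;
  readonly groupAsSingle: boolean;
  currentVersionNumber: number;
  pendingChanges: AqContent | null;
  aqvs: AQV[];
}

// Publishing consumes the draft and returns an AQ born with AQV 1.
function publish(draft: DraftAQ, aqId: string, publishedBy: string): AQ {
  const firstVersion: AQV = {
    versionNumber: 1,
    content: {
      text: draft.text,
      options: draft.options.slice(),
      helpText: draft.helpText,
      answerFilters: draft.answerFilters.slice(),
    },
    createdAt: new Date().toISOString(),
    createdBy: publishedBy,
  };
  return {
    aqId,
    dataType: draft.dataType,
    parentId: draft.parentId,
    groupAsSingle: draft.groupAsSingle,
    currentVersionNumber: 1,
    pendingChanges: null,
    aqvs: [firstVersion],
  };
}
```

The draft's temporary `draftId` is deliberately not carried over: the returned AQ has only its permanent `aqId`.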

1.2 Property Classification

Properties split into three categories:

| Category | Properties | Rule |
|---|---|---|
| Identity (defines what this question IS) | aqId, dataType, parentId, groupAsSingle | Set at creation/publish, immutable forever |
| Content (describes the question) | text, options, helpText, answerFilters | Versionable via AQV. PA edits through pendingChanges buffer, commits to create new AQV. |
| Derived (computed from context) | currentVersionNumber, aqvs[] | Managed by the system |

Why parent is identity, not versionable: Changing which question this is a child of restructures entity subtrees. If Q3 moves from being under Q1 (cohort label) to Q2 (outcome label), the annotation form shape changes entirely between versions. Existing annotations for Q3 have parentId pointing to Q1's annotation — those tree references become wrong. It's effectively a different question. Create a new AQ instead.

Why dataType is identity: Changing from boolean to string invalidates all existing AVs. A boolean true cannot meaningfully become a string. Create a new AQ instead.

Contradiction with design-decisions.md: Section 1 includes AnswerType and ParentQuestionId in AQVersion (lines 73-74), implying they can change between versions. product-overview.md already correctly classifies these as structural/identity properties (lines 95-109). design-decisions.md must be updated to move these to the AQ identity level.
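As a rough illustration of the identity/content split, the sketch below marks identity and derived properties `readonly` and lets only `pendingChanges` be edited directly. The `commitPendingChanges` function is hypothetical, and it assumes `pendingChanges` holds a full replacement of the content rather than a partial patch; the source does not say which.

```typescript
// Hypothetical, simplified shapes; field names follow the AQ / AQV outline.
interface AqContent {
  text: string;
  options: string[];
  helpText: string | null;
  answerFilters: string[];
}

interface AQV {
  readonly versionNumber: number;
  readonly content: AqContent;
  readonly createdBy: string;
  readonly createdAt: string;
}

interface AQ {
  // Identity: frozen at publish.
  readonly aqId: string;
  readonly dataType: string;
  readonly parentId: string | null;
  readonly groupAsSingle: boolean;
  // Derived: changed only by the commit operation below.
  readonly currentVersionNumber: number;
  readonly aqvs: readonly AQV[];
  // Content buffer: the only part a PA edits after publish.
  pendingChanges: AqContent | null;
}

// Commits the buffer as a new AQV and returns an updated copy of the AQ.
function commitPendingChanges(aq: AQ, committedBy: string): AQ {
  if (aq.pendingChanges === null) {
    return aq; // nothing buffered, so no new version is created
  }
  const next: AQV = {
    versionNumber: aq.currentVersionNumber + 1,
    content: aq.pendingChanges,
    createdBy: committedBy,
    createdAt: new Date().toISOString(),
  };
  return {
    ...aq,
    currentVersionNumber: next.versionNumber,
    aqvs: aq.aqvs.concat([next]),
    pendingChanges: null,
  };
}
```

Because `dataType`, `parentId` and `groupAsSingle` are `readonly`, an attempt to reassign them fails at compile time, which matches the "create a new AQ instead" rule.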

1.3 SQS Does Not Need a Draft Type

Unlike AQ, SQS (Stage Question Set) has no identity properties that are mutable during creation but frozen after. Its only "content" is aqVersionRefs: the ordered list of AQ version references assigned to a stage. This is the same shape whether it's the first assignment or a modification.

SQS (per stage — lives on Project aggregate)
  stageId: GUID
  currentVersionNumber: int (pointer to latest committed SQSV)
  pendingChanges: AQVersionRef[] | null (buffered question version changes)
  sqsvs: SQSV[] (immutable version history)

SQSV
  versionNumber: int (sequential. Identity is (stageId, versionNumber))
  aqVersionRefs: OrderedList<AQVersionRef> (each ref is (questionId, versionNumber))
  createdAt: timestamp
  createdBy: GUID
  impactChoices: { [questionId]: "requireReanswer" | "autoUpdate" | "doNothing" }

1.4 Terminology: Draft vs Pending

| Term | Meaning | Used on |
|---|---|---|
| Draft | Under construction, not yet published/committed. All properties mutable. Separate type from the committed entity. | DraftAQ |
| Pending | Buffered changes to an existing committed entity. Awaiting explicit commit action. | AQ.pendingChanges, SQS.pendingChanges, Annotation.pendingAnswer |

This avoids confusion: "draft" always means "not yet real." "Pending" always means "real entity with uncommitted edits."


2. Annotation and Session Versioning

2.1 The Annotation Entity

The Annotation is a stable identity entity that owns its version history. It maps directly to the current Annotation class — the existing Annotation.Id (GUID) becomes the permanent annotationId.

Annotation (own aggregate — pmAnnotation collection)
  annotationId: GUID (stable, aggregate root ID, = current Annotation.Id for migration)
  questionId: GUID (which AQ this answers)
  studyId: GUID
  annotatorId: GUID | null (null for reconciliation annotations)
  reconciled: boolean
  createdInStageId: GUID (metadata — where first created)
  parentId: GUID? (entity subtree parent — references another annotationId)
  children: GUID[] (entity subtree children — annotationIds)
  root: boolean
  answerType: string
  pendingAnswer: {value, notes} | null (mutable, auto-saved to server)
  currentVersionNumber: int (pointer to latest committed AV)
  crossStageConsistency: {pacValid, staleSince?, reason?} | null (D51 — materialised by domain event)
  avs: AV[] (embedded, immutable version history)

Key properties:

  • annotationId is stable and never changes. Entity subtree references (parentId, children) use annotationId, so they survive versioning.
  • pendingAnswer is the mutable auto-save buffer. Updated on server with each auto-save. Cleared on Save/Complete/Revert.
  • AVs are embedded in the Annotation document — the Annotation and its AVs are the same aggregate.

2.2 Annotation Version (AV)

AV (immutable, embedded in Annotation document)
  versionNumber: int (sequential. Identity is (annotationId, versionNumber))
  value: polymorphic answer (string | bool | int | decimal | arrays)
  notes: string?
  createdAt: timestamp
  createdByAction: "save" | "complete" | "impact-update"
  committedBy: GUID (who committed — important for reconciliation)
  stageId: GUID (which stage this was committed from)
  sessionVersionRef: (sessionId, versionNumber) (which session version created this)
  aqVersionRef: (questionId, versionNumber) (which question version was active when answered)

2.3 Annotation Session

The session is a mutable working entity. It does not need a "draft" wrapper because it is never shared or published — it is always one reviewer's personal workspace for one study in one stage.

AnnotationSession (own aggregate — pmAnnotationSession collection)
  sessionId: GUID (aggregate root ID)
  studyId: GUID
  stageId: GUID
  investigatorId: GUID
  reconciliation: boolean
  status: Incomplete | Complete
  annotationIds: GUID[] (write-time materialisation of working set — D52)
  currentVersionNumber: int? (latest committed snapshot)
  asvs: ASV[] (embedded, immutable version history)

2.4 Annotation Session Version (ASV)

ASV (immutable, embedded in AnnotationSession document)
  versionNumber: int (sequential. Identity is (sessionId, versionNumber))
  annotationAVMap: { [annotationId]: versionNumber } (pinned AV version number for each annotation at save time)
  createdAt: timestamp
  createdByAction: "save" | "complete"
  qsvRef: (questionSetId, versionNumber) (which question set version was active)

2.5 Session Lifecycle

| Action | What happens | Versions created |
|---|---|---|
| Open study | Session created (or resumed) with status: Incomplete. Annotations loaded or created. | None |
| Auto-save | pendingAnswer updated on individual Annotation documents. Server-side, per-field. | None |
| Save | AV created on each annotation from pendingAnswer, pendingAnswer cleared, currentVersionNumber updated. ASV created on session with a snapshot of all {annotationId: versionNumber} pairs (annotationAVMap). | AV per annotation + 1 ASV |
| Complete | Same as Save + status = Complete. | AV per annotation + 1 ASV |
| Revert | Discard pendingAnswer on all session annotations. Restore from last ASV's annotationAVMap. | None |
| Clear | Set pendingAnswer = null on the annotation. Remove from session's annotationIds. | None |
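The Save, Complete and Revert rows can be sketched as in-memory operations. This is an illustration under several assumptions that the source does not state: the function names are hypothetical, `currentVersionNumber` is `null` until the first commit, annotations without a `pendingAnswer` keep their current AV (no new version is created for them), and persistence and transactions are left out entirely.

```typescript
type CommitAction = "save" | "complete";

interface PendingAnswer { value: unknown; notes: string | null }

interface AV {
  versionNumber: number;
  value: unknown;
  notes: string | null;
  createdByAction: CommitAction | "impact-update";
  committedBy: string;
  stageId: string;
}

interface Annotation {
  annotationId: string;
  pendingAnswer: PendingAnswer | null;
  currentVersionNumber: number | null; // null before the first AV (assumption)
  avs: AV[];
}

interface ASV {
  versionNumber: number;
  annotationAVMap: Record<string, number>;
  createdByAction: CommitAction;
}

interface AnnotationSession {
  sessionId: string;
  stageId: string;
  investigatorId: string;
  status: "Incomplete" | "Complete";
  annotationIds: string[];
  currentVersionNumber: number | null;
  asvs: ASV[];
}

// Save / Complete: turn each pendingAnswer into an AV, then pin all AVs in a new ASV.
function commitSession(
  session: AnnotationSession,
  annotations: Record<string, Annotation>,
  action: CommitAction,
): ASV {
  const annotationAVMap: Record<string, number> = {};
  for (const id of session.annotationIds) {
    const ann: Annotation | undefined = annotations[id];
    if (ann === undefined) continue;
    if (ann.pendingAnswer !== null) {
      const versionNumber = (ann.currentVersionNumber ?? 0) + 1;
      ann.avs.push({
        versionNumber,
        value: ann.pendingAnswer.value,
        notes: ann.pendingAnswer.notes,
        createdByAction: action,
        committedBy: session.investigatorId,
        stageId: session.stageId,
      });
      ann.currentVersionNumber = versionNumber;
      ann.pendingAnswer = null;
    }
    if (ann.currentVersionNumber !== null) {
      annotationAVMap[id] = ann.currentVersionNumber;
    }
  }
  const asv: ASV = {
    versionNumber: (session.currentVersionNumber ?? 0) + 1,
    annotationAVMap,
    createdByAction: action,
  };
  session.asvs.push(asv);
  session.currentVersionNumber = asv.versionNumber;
  if (action === "complete") session.status = "Complete";
  return asv;
}

// Revert: discard buffered answers; committed AVs and ASVs are untouched.
function revertSession(
  session: AnnotationSession,
  annotations: Record<string, Annotation>,
): void {
  for (const id of session.annotationIds) {
    const ann: Annotation | undefined = annotations[id];
    if (ann !== undefined) ann.pendingAnswer = null;
  }
}
```

Auto-save is intentionally absent from the sketch: it only overwrites `pendingAnswer` and never creates a version.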

2.6 Cross-Stage Annotation Behavior

Annotations are study-scoped, not stage-scoped. When a shared question appears in multiple stages, both stages' sessions reference the same Annotation entity.

Scenario: AQ-shared is assigned to Stage 1 and Stage 2. Reviewer A answers it in Stage 1.

  1. Stage 1 session opens → Annotation ann-1 created for AQ-shared, createdInStageId: Stage 1
  2. Reviewer answers → pendingAnswer set
  3. Reviewer saves Stage 1 → AV version 1 created, ann-1.currentVersionNumber = 1. Stage 1 ASV records { ann-1: 1 }
  4. Stage 2 session opens → ann-1 included in Stage 2 session's annotationIds (same annotation, shared reference)
  5. Reviewer sees version 1's value (via ann-1.currentVersionNumber)
  6. Reviewer edits → pendingAnswer updated on ann-1
  7. Reviewer saves Stage 2 → AV version 2 created on the same ann-1, ann-1.currentVersionNumber = 2. Stage 2 ASV records { ann-1: 2 }

What Stage 1 sees after this:

  • If the Stage 1 session is Completed: the ASV is frozen at { ann-1: 1 }. Viewing this completed session always shows the answer as it was at completion time.
  • If the Stage 1 session is Incomplete: the reviewer will see currentVersionNumber = 2 (the latest answer, updated from Stage 2) when they reopen. An impact indicator can show "This answer was updated in Stage 2 since your last save." Their last ASV still has { ann-1: 1 } for revert.

Key: No forking, no copying. One annotation identity, multiple AVs. Sessions pin to specific AVs via ASV.
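The pinning rule in this scenario can be expressed as a small resolver: a completed session reads the AV pinned by its last ASV, while an incomplete session reads the annotation's current AV and reports whether that is newer than its pin. The `resolveAnswer` function and the `SessionSnapshot` shape are hypothetical names used only for this sketch.

```typescript
interface AV { versionNumber: number; value: unknown; stageId: string }

interface Annotation {
  annotationId: string;
  currentVersionNumber: number;
  avs: AV[];
}

// Only the session fields needed for the lookup (hypothetical shape).
interface SessionSnapshot {
  status: "Incomplete" | "Complete";
  lastAsvMap: Record<string, number>; // annotationAVMap of the latest ASV
}

interface ResolvedAnswer {
  versionNumber: number;
  value: unknown;
  updatedSinceLastSave: boolean;
}

function resolveAnswer(session: SessionSnapshot, ann: Annotation): ResolvedAnswer | null {
  const pinned: number | undefined = session.lastAsvMap[ann.annotationId];
  // Completed sessions are frozen at their pin; incomplete ones follow the latest AV.
  const shown =
    session.status === "Complete" && pinned !== undefined
      ? pinned
      : ann.currentVersionNumber;
  for (const av of ann.avs) {
    if (av.versionNumber === shown) {
      return {
        versionNumber: shown,
        value: av.value,
        updatedSinceLastSave:
          session.status === "Incomplete" && pinned !== undefined && pinned < shown,
      };
    }
  }
  return null; // no AV with that version number
}
```

With ann-1 holding AV 1 (from Stage 1) and AV 2 (from Stage 2), a completed Stage 1 session pinned at `{ "ann-1": 1 }` resolves to version 1, while an incomplete one resolves to version 2 with `updatedSinceLastSave` set, which is what would drive the impact indicator.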

2.7 Reconciliation Annotations

Reconciliation annotations differ from candidate annotations in one critical way: they are shared across reviewers as well as across stages.

| | Candidate Annotation | Reconciliation Annotation |
|---|---|---|
| Identity scope | {studyId, questionId, annotatorId} (personal) | {studyId, questionId} (shared) |
| Who contributes | One reviewer | Any reconciler via RAVs |
| Shared across stages | Yes (same reviewer's answer) | Yes (the gold standard) |
| Shared across reviewers | No | Yes |
| annotatorId | Set (owner) | null (authorship tracked on AVs) |

Each AV on a reconciliation annotation records committedBy (which reconciler) and stageId (which stage's reconciliation produced it). The gold standard is always the AV that currentVersionNumber points to.

This aligns with the ReconciliationSession model in design-decisions.md Section 2, but uses the AV mechanism rather than a separate ReconciliationAnswer entity. The ReconciliationSessionVersion's Answers map becomes the ASV's annotationAVMap, with each reconciliation annotation's latest AV being the authoritative answer.


3. Impact Assessment on Commit

There are no "immutable structural properties" that prevent editing. Instead, there is a graduated impact scale assessed at commit time.

3.1 AQ Impact Assessment (on pendingChanges commit)

When PA commits pendingChanges to create a new AQV:

| Scenario | Impact |
|---|---|
| No SQSV references this AQ | Commit freely. No dependencies. |
| SQSV exists but no annotations | Commit freely. Structure can change without breaking anything. |
| Annotations exist, only text/helpText changed | Low: existing AVs still valid. Options: "auto-update" or "do nothing." |
| Annotations exist, options changed and existing AVs reference removed options | Medium: affected AVs may need reanswer. |
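One way to read the table is as a small classification function. The sketch below is hypothetical: `assessAqImpact`, `AqCommitContext` and the "none" / "low" / "medium" labels are illustrative names, and any case the table does not list (for example, options changed without touching an answered option, or answerFilters changed) is assumed to fall into "low".

```typescript
type AqImpact = "none" | "low" | "medium";

interface AqContent {
  text: string;
  options: string[];
  helpText: string | null;
  answerFilters: string[];
}

// Facts gathered before commit (hypothetical shape).
interface AqCommitContext {
  referencedBySqsv: boolean; // does any SQSV reference this AQ?
  annotationCount: number; // annotations that answer this AQ
  answeredOptions: string[]; // option values referenced by existing AVs
}

function assessAqImpact(
  ctx: AqCommitContext,
  current: AqContent,
  pending: AqContent,
): AqImpact {
  // Rows 1 and 2: nothing depends on the content yet, so commit freely.
  if (!ctx.referencedBySqsv || ctx.annotationCount === 0) {
    return "none";
  }
  // Row 4: an existing AV references an option that the commit removes.
  const removed = current.options.filter(
    (option) => pending.options.indexOf(option) === -1,
  );
  const answerUsesRemoved = ctx.answeredOptions.some(
    (option) => removed.indexOf(option) !== -1,
  );
  if (answerUsesRemoved) {
    return "medium";
  }
  // Row 3, plus all unlisted cases (assumption): existing AVs remain valid.
  return "low";
}
```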

3.2 SQS Impact Assessment (on pendingChanges commit)

When PA commits SQS pendingChanges to create a new SQSV:

| Change | Impact |
|---|---|
| Questions added | No impact on existing sessions. New questions appear unanswered. |
| Questions removed | Sessions with answers for removed questions: PA chooses (archive, flag, require review). |
| Question version upgraded | PA chooses per question: "require reanswer", "auto-update", "do nothing". |
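The three change types can be derived by comparing the current SQSV's aqVersionRefs with the pending list. The following sketch uses hypothetical helper names (`diffQuestionSets`, `missingImpactChoices`) and treats any differing version number as an upgrade, since the source does not describe downgrades.

```typescript
interface AQVersionRef { questionId: string; versionNumber: number }

type ImpactChoice = "requireReanswer" | "autoUpdate" | "doNothing";

interface VersionChange { questionId: string; from: number; to: number }

interface SqsDiff {
  added: AQVersionRef[];
  removed: AQVersionRef[];
  upgraded: VersionChange[];
}

// Maps questionId to the version number referenced for it.
function indexByQuestion(refs: AQVersionRef[]): Record<string, number> {
  const index: Record<string, number> = {};
  for (const ref of refs) index[ref.questionId] = ref.versionNumber;
  return index;
}

function diffQuestionSets(current: AQVersionRef[], pending: AQVersionRef[]): SqsDiff {
  const before = indexByQuestion(current);
  const after = indexByQuestion(pending);
  const diff: SqsDiff = { added: [], removed: [], upgraded: [] };
  for (const ref of pending) {
    const previous: number | undefined = before[ref.questionId];
    if (previous === undefined) {
      diff.added.push(ref);
    } else if (previous !== ref.versionNumber) {
      // Any version difference counts as an upgrade here (assumption).
      diff.upgraded.push({
        questionId: ref.questionId,
        from: previous,
        to: ref.versionNumber,
      });
    }
  }
  for (const ref of current) {
    if (after[ref.questionId] === undefined) diff.removed.push(ref);
  }
  return diff;
}

// Upgraded questions that still lack a PA impact choice; commit should wait on these.
function missingImpactChoices(
  diff: SqsDiff,
  choices: Record<string, ImpactChoice>,
): string[] {
  return diff.upgraded
    .map((change) => change.questionId)
    .filter((questionId) => choices[questionId] === undefined);
}
```

A commit handler could refuse to create the new SQSV while `missingImpactChoices` returns a non-empty list, which mirrors the rule that the PA confirms impact choices before commit.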

3.3 PA Workflow: Design Tab + Assign Tab

AQV and SQSV commits are two separate processes:

  1. Design tab: PA edits question content → commits pendingChanges → new AQV created. Impact assessment covers annotations for this specific question.
  2. Assign tab: PA changes stage question assignments → commits pendingChanges → new SQSV created. Impact assessment covers all active sessions in the stage. PA must confirm impact choices before commit.

4. MongoDB Aggregate Design

4.1 Collection Boundaries

| Collection | Aggregate root | Contents | Bounded? |
|---|---|---|---|
| pmStudy | Study | Study metadata only; no back-references to annotations or sessions (D50-revised) | Yes |
| pmAnnotation | Annotation | Annotation identity + pendingAnswer + crossStageConsistency (D51) + embedded avs: AV[] | Per doc, grows with saves |
| pmAnnotationSession | Session | Session metadata + annotationIds (write-time materialisation, D52) + embedded asvs: ASV[] | Per doc, grows with saves |
| pmProject | Project | DraftAQs, AQs (or refs), SQS config | Yes |

Contradiction with design-decisions.md: Section 1 says "All changes are embedded within the existing pmProject and pmStudy aggregates. No new collections are created." and README.md says "No New Collections". Our design creates pmAnnotation and pmAnnotationSession as separate collections. This is necessary because:

  • The Study document becomes a contention hotspot with multiple reviewers auto-saving concurrently
  • Document grows unboundedly with annotation version history
  • Loading study metadata doesn't need all annotation version history
  • Annotation and its AVs form a natural aggregate (identity + versions must be consistent)
  • Session and its ASVs form a natural aggregate (same reasoning)

Further refinement (D50-revised): The original D50 proposed that Study hold reference arrays (sessionIds, annotationIds). This is removed because: (a) back-references add write amplification on every annotation/session creation, which is exactly the contention D41/D42 were designed to avoid; (b) consistency risk if the $push fails; (c) annotations and sessions carry studyId and are indexed, so queries by studyId replace the reference arrays.

4.2 Consistency Model

| Operation | Collections touched | Consistency |
|---|---|---|
| Auto-save | Update pendingAnswer on N pmAnnotation docs | N independent single-doc writes (each atomic) |
| Save/Complete | Create AV + update currentVersionNumber on N pmAnnotation docs, create ASV on pmAnnotationSession doc | Multi-document transaction (N+1 docs) |
| Revert | Update pendingAnswer on N pmAnnotation docs | N independent single-doc writes |
| Load session | Read 1 pmAnnotationSession + find({ _id: { $in: annotationIds } }) on pmAnnotation | Two reads |
| Find study's sessions | pmAnnotationSession.find({ studyId }) | Single indexed query (replaces Study back-reference) |
| Find study's annotations | pmAnnotation.find({ studyId }) | Single indexed query (prefix on composite index) |

Save/Complete uses MongoDB multi-document transactions. This is acceptable because:

  • Save is infrequent (explicit user action, not auto-save)
  • N = number of questions answered (tens, not thousands)
  • MongoDB Atlas supports multi-document transactions
  • Each AV creation is idempotent (immutable), so partial failure is recoverable

Auto-save is single-document writes only (no transaction overhead).

4.3 Why AVs Are Embedded in Annotation (Not Separate Collection)

An Annotation and its AVs are the same aggregate. currentVersionNumber must be consistent with the AV list. Embedding means:

  • Creating an AV + updating currentVersionNumber is a single atomic write (no transaction needed for this part)
  • Version history is a single document read
  • No orphaned AVs possible
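To make the single-write claim concrete, the sketch below builds the filter and update documents for one annotation, using the standard MongoDB `$push` and `$set` update operators. The builder function and the `CommitAvCommand` shape are hypothetical, and the `currentVersionNumber` condition in the filter is an optimistic-concurrency guard added here as an assumption; the source does not specify one. The result is plain data that could be handed to an `updateOne` call on pmAnnotation.

```typescript
interface AV {
  versionNumber: number;
  value: unknown;
  notes: string | null;
  createdAt: string;
  createdByAction: "save" | "complete" | "impact-update";
  committedBy: string;
  stageId: string;
}

// Filter and update documents for a single-document write (hypothetical shape).
interface CommitAvCommand {
  filter: { _id: string; currentVersionNumber: number };
  update: {
    $push: { avs: AV };
    $set: { currentVersionNumber: number; pendingAnswer: null };
  };
}

function buildCommitAvCommand(
  annotationId: string,
  expectedVersion: number,
  av: Omit<AV, "versionNumber">,
): CommitAvCommand {
  const nextVersion = expectedVersion + 1;
  return {
    // Matches only if no other writer has committed a version in the meantime.
    filter: { _id: annotationId, currentVersionNumber: expectedVersion },
    update: {
      // Append the new AV, move the pointer and clear the buffer in one write.
      $push: { avs: { ...av, versionNumber: nextVersion } },
      $set: { currentVersionNumber: nextVersion, pendingAnswer: null },
    },
  };
}
```

Because the append, the pointer update and the buffer reset all target the same document, they succeed or fail together; the multi-document transaction described in Section 4.2 is only needed to tie N such writes to the session's ASV.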


5. Contradictions Resolved

| Source | Contradiction | Resolution |
|---|---|---|
| design-decisions.md S1 | AnswerType and ParentQuestionId in AQVersion (versionable) | Move to AQ identity level (immutable after publish). See Section 1.2. |
| design-decisions.md S1 | No DraftAQ concept | Add DraftAQ as separate type. See Section 1.1. |
| design-decisions.md S1 | No pendingChanges / pendingAnswer concept | Add pending buffers for auto-save. See Section 1.4. |
| design-decisions.md S1 | "No new collections are created" | Create pmAnnotation and pmAnnotationSession. See Section 4.1. |
| design-decisions.md S1 | Annotation identity is composite (StudyId, AnnotatorId, QuestionId) | Still true for candidate annotations. Reconciliation annotations: (StudyId, QuestionId) with annotatorId: null. See Section 2.7. |
| design-decisions.md S2 | ReconciliationAnswer as separate entity with Resolution enum | Reconciliation answers are AVs on reconciliation Annotations. committedBy and stageId on AV replace separate ReconciliationAnswer. Resolution tracking moves to session-level metadata. |
| README.md | "All data stays embedded in existing aggregates" | Annotations and sessions move to own collections. Study holds no back-references (D50-revised). See Section 4.1. |
| product-overview.md | No auto-save mechanics described | Server-side auto-save via pendingAnswer. See Section 2.5. |
| D50 (this document) | Study document holds reference arrays (sessionIds, annotationIds) | Revised by D50-revised: Study holds no back-references. Annotations and sessions carry studyId and are indexed. Back-references add write amplification and consistency risk. |
| D41-D50 (this document) | Version entities use independent GUIDs (avId, asvId, aqvId, sqsvId) | Revised by D57: versions use composite identity (rootId, versionNumber). Cross-aggregate references use composite identity Value Objects. Follows DDD aggregate boundary principles. |

6. Updated Decision Log

Decisions from this brainstorming session, extending the D1-D36 log in design-decisions.md:

| # | Decision | Rationale |
|---|---|---|
| D37 | DraftAQ is a separate type from AQ | Creation phase has fundamentally different invariants (all properties mutable) vs modification phase (identity frozen, only content versionable). Conditional logic signals two domain concepts forced into one model. |
| D38 | dataType, parentId, groupAsSingle are identity properties, immutable after publish | Changing parent restructures entity subtrees and form shape. Changing dataType invalidates all AVs. These are not "new versions of the same question"; they are different questions. |
| D39 | SQS does not need a Draft type | SQS has no identity properties. Its only content (aqVersionRefs) is the same shape for initial assignment and subsequent modification. pendingChanges is sufficient. |
| D40 | "Pending" terminology for existing entities, "Draft" for under-construction | Avoids confusion. "Draft" = not yet real. "Pending" = real entity with uncommitted edits. |
| D41 | Annotations are own aggregate in pmAnnotation collection | Avoids Study document contention and unbounded growth. Annotation + embedded AVs form natural aggregate boundary. |
| D42 | Sessions are own aggregate in pmAnnotationSession collection | Same reasoning as D41. Session + embedded ASVs form natural aggregate boundary. |
| D43 | AVs embedded in Annotation document (not separate collection) | Annotation and its AVs must be consistent (currentVersionNumber must match the AV list). Embedding gives single-doc atomicity for AV creation. |
| D44 | Cross-stage annotation sharing via same Annotation entity with multiple AVs | No forking, no copying. Both stages' sessions reference the same annotationId. Editing creates a new AV; other stages' ASVs pin to the older AV. |
| D45 | Reconciliation annotations have annotatorId: null and track authorship on AVs | Reconciliation annotations are shared across stages AND reviewers. The gold standard has no single owner; authorship is tracked per-AV via committedBy. |
| D46 | Server-side auto-save via pendingAnswer field | Users change machines/browsers. Client-only auto-save (IndexedDB) is insufficient. pendingAnswer is a mutable field on the Annotation document, updated with each auto-save. |
| D47 | Save/Complete operations use MongoDB multi-document transactions | Save touches N annotation docs + 1 session doc. Transactions ensure consistency. Acceptable overhead for infrequent explicit user actions. |
| D48 | Graduated impact assessment at commit time, not immutable property restrictions | No property is "forbidden to edit": the system assesses impact and warns proportionally. A dataType change with annotations is allowed; it just requires creating a new AQ (because it IS a new question). |
| D49 | entityInstanceId concept removed; annotationId serves as stable entity identity | The existing Annotation.Id already provides stable identity for entity subtree references. No new concept needed. Migration is direct: current Id becomes annotationId. |
| D50 | Study document holds only references (sessionIds, annotationIds) | Revised by D50-revised. Original reasoning: Study slimmed to metadata + reference arrays. |
| D50-revised | Study document holds no back-references to annotations or sessions | Back-references add write amplification on every annotation/session creation and consistency risk if $push fails. Annotations and sessions carry studyId and are indexed; queries by studyId replace the reference arrays. Eliminates the contention D41/D42 were designed to avoid. |
| D51 | Cross-stage PAC consistency materialised via domain events | When a parent AV is committed, a handler evaluates PAC for child annotations in other stages. crossStageConsistency field on child Annotation is a denormalized projection: non-authoritative, rebuildable. Materialised because read paths (reconciliation, admin, session review) need this frequently. |
| D52 | Session retains annotationIds as write-time materialisation | Annotation-session relationship is many-to-many (one annotation in multiple sessions across stages); annotation side cannot hold the reference. Populated once at session creation or QSV transition. Justified as hot-path optimisation for form load latency. |
| D53 | Cross-stage PAC warning shown immediately on parent answer change | Annotator has mental context of "I'm changing Q-parent from X to Y" at the moment of change. Non-blocking inline indicator (info icon) with clickable modal showing affected stages, descendant tree, PAC evaluation. Waiting until save/complete loses that context. |
| D54 | Reconciler must acknowledge stale annotations before submitting | Stale annotations flagged prominently. On save/submit, summary of affected annotations requires explicit acknowledgement. Reconciler may suppress future prompts for that session ("ignore future warnings"), but inline indicators remain. Balances rigour with workflow efficiency. |
| D55 | Cross-stage PAC enforcement configurable per project | Three levels: flag (default: indicators + acknowledgement), block-reconciliation (individual review required), inform-only (indicators only). Different SR protocols have different rigour requirements; sensible default with customisability. |
| D56 | Update design documents to incorporate D50-revised through D57 | Ensures all decisions, rationale, and alternatives captured consistently across README.md and design-session.md. |
| D57 | Version entities use composite identity (rootId, versionNumber) not independent GUIDs | Follows DDD: non-root entities identified relative to aggregate root, the same pattern as event sourcing (streamId, sequenceNumber). Cross-aggregate references use composite identity Value Objects (AQVersionRef, QSVersionRef, AVRef). Reduces structural coupling. GUIDs can be generated deterministically at bounded-context boundaries if needed. |