Annotation Versioning¶
The foundational Identity + Immutable Versions pattern that makes annotation questions, annotations, and annotation sessions version-tracked and audit-safe.
Overview¶
- Feature Name: Annotation Versioning
- Target Users: Project Administrators, Annotators, Reconcilers, Data Consumers
- Business Value: Full audit traceability from question creation through annotation to reconciliation, enabling safe mid-project question evolution and trustworthy systematic review data
- Phase: Foundation (Phases 2-4 of the Annotation Management & Reconciliation initiative)
Annotation Versioning introduces an immutable versioning system across all core entities in SyRF's annotation pipeline: annotation questions, annotations, and annotation sessions. Every entity follows the same pattern -- a stable identity with append-only immutable version snapshots. When content changes, a new version is created; previous versions are preserved and remain linked to the data collected against them.
This is the foundational layer that all other features in the Annotation Management & Reconciliation initiative depend on. Question management needs versioning to allow safe mid-project edits. The annotation form needs versioning to record which question version each answer was collected against. Reconciliation needs versioning to compare answers to the same question version and produce an auditable gold-standard record. Data export needs versioning to provide full traceability.
Without this pattern, none of the downstream capabilities are possible. With it, SyRF gains the ability to evolve annotation protocols without breaking past data, reconstruct the exact context of any annotation, and produce structured reconciliation records that journals, funders, and regulators can audit.
Problem Statement¶
SyRF's current annotation system treats questions, annotations, and sessions as mutable-in-place entities. When a project administrator edits an annotation question, the change overwrites the previous definition. Although SyRF snapshots the question text onto each annotation at creation time, this capture is incomplete -- it records only the text string, not the full question context (available options, answer type, help text, conditional logic, parent-child relationships).
Because an in-place edit could silently change the meaning of answers that were already collected, SyRF currently locks annotation questions entirely once they have been answered. This prevents silent data corruption but creates a different problem: projects cannot evolve their annotation protocol, even when improvements are clearly needed.
The mutable-in-place model also makes it impossible to:
- Reconstruct what the annotator saw when they recorded their answer (only partial text is preserved)
- Compare answers meaningfully because there is no guarantee two annotators answered the same version of a question
- Build structured reconciliation because the system cannot verify semantic equivalence between candidate answers
- Produce auditable exports because version references do not exist
This feature replaces the mutable model with an immutable versioning pattern that preserves complete history while allowing safe evolution.
Solution¶
All core entities adopt the Identity + Immutable Versions pattern. Each entity has a stable identity (a permanent ID and fixed structural properties) paired with an append-only list of immutable version snapshots. Content changes always create a new version rather than overwriting the previous one.
This pattern applies uniformly across four entity types:
| Entity | Identity (fixed) | Versions (append-only, identified by VersionNumber within parent) |
|---|---|---|
| Annotation Question (AQ) | aqId, dataType, parentQuestionId, groupAsSingle | AQVersion: text, options, helpText, answerFilters |
| Question Set (QS) | qsId, scope, ownerId | QSV: ordered list of AQVersionRefs |
| Annotation | studyId, annotatorId, questionId | AnnotationVersion: answer, aqVersionRef, stageId |
| Annotation Session | sessionId, stageId, investigatorId | SessionVersion: annotationAVMap, qsvRef, status |
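The pattern in the table can be sketched in a few lines. This is a minimal illustration for the AQ row only, not the SyRF implementation: the class names, the snake_case field names, and the `commit` method are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class AQVersion:
    """Immutable content snapshot; frozen=True rejects attribute assignment."""
    version_number: int
    text: str
    options: Tuple[str, ...]
    created_at: datetime


@dataclass
class AnnotationQuestion:
    """Stable identity plus an append-only list of versions."""
    aq_id: str
    data_type: str  # identity property: fixed after activation
    versions: List[AQVersion] = field(default_factory=list)

    @property
    def current_version_number(self) -> int:
        return len(self.versions)

    def commit(self, text: str, options: Tuple[str, ...]) -> AQVersion:
        # Content changes never overwrite: they append the next numbered snapshot.
        version = AQVersion(
            version_number=self.current_version_number + 1,
            text=text,
            options=options,
            created_at=datetime.now(timezone.utc),
        )
        self.versions.append(version)
        return version


aq = AnnotationQuestion(aq_id="aq-1", data_type="select")
aq.commit("Was the study randomised?", ("Yes", "No"))
aq.commit("Was allocation randomised?", ("Yes", "No", "Unclear"))
```

After the two commits the first snapshot is still available unchanged, which is what lets data collected against version 1 stay linked to the exact wording it was collected under.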
Before activation, annotation questions exist as fully mutable DraftAQ entities with no version history. On activation, they convert to versioned AQ entities with their first AQVersion. This two-phase lifecycle separates the unconstrained design phase from the audited modification phase.
Server-side pending buffers (pendingChanges on AQ, pendingAnswer on Annotation) provide auto-save capability without creating versions. Versions are created only on explicit commit actions (Save, Complete), ensuring the version history captures deliberate decisions rather than intermediate typing.
Scope¶
In Scope¶
- DraftAQ entity and the DraftAQ-to-AQ activation lifecycle
- AQ and AQVersion entity model with identity/content/derived property classification
- QS and QSV entity model with parent integrity constraint
- Annotation and AnnotationVersion entity model with cross-stage sharing
- AnnotationSession and SessionVersion entity model with explicit AV pinning
- Reconciliation annotation identity model (annotatorId: null, per-AV authorship)
- Server-side pending buffers for auto-save (pendingChanges, pendingAnswer)
- New MongoDB collections: pmAnnotationQuestion, pmQuestionSet, pmAnnotation, pmAnnotationSession
- Scope and ownership fields (Scope, OwnerId, DerivedFrom) on AQ and QS entities
- Optimistic concurrency on Study document writes
- Migration strategy: existing questions backfilled as v1 AQVersions
Out of Scope / Future¶
- Question management UI (version history panel, diff viewer, versioning wizard) -- see Question Management feature
- Annotation form v2 (virtual scroll, per-question rendering) -- see Annotation Form V2 feature
- Reconciliation workflow (random assignment, candidate blinding, bulk approve) -- see Reconciliation feature
- Cross-project question sharing (import, fork, publish workflows) -- future v2 capability
- Organisation model and org-scoped question libraries -- future v2 capability
- Impact assessment wizard (what happens to existing sessions when QSV changes) -- deferred
- Question reorganization wizards (Copy, Copy & Disable, Copy & Delete) -- deferred
- Agreement metrics computation (Percent Agreement, Cohen's Kappa) -- see Reconciliation feature
Key Concepts¶
Annotation Question (AQ)¶
A versioned question entity that annotators answer. Has a stable identity and an append-only list of content versions. Created by activating a DraftAQ.
AQVersion¶
An immutable snapshot of an annotation question's content properties (text, options, helpText, answerFilters). Each content edit creates a new AQVersion. An AQVersion is identified by its composite identity (QuestionId, VersionNumber) -- its position within its parent AQ aggregate. There is no independent GUID; cross-aggregate references use composite identity Value Objects (D57).
DraftAQ¶
A fully mutable factory object representing a question under construction. All properties (including structural ones like dataType and parent) can be changed freely. Lives on the Project aggregate. Ceases to exist when activated -- the AQ is born with its first AQVersion.
Question Set (QS)¶
A named, versioned collection of question versions. Within a project, referred to as a "Question Set"; in the cross-project catalogue, presented as a "Template". Each stage references a specific QSV.
Question Set Version (QSV)¶
An immutable, ordered list of AQVersionRefs (each a (QuestionId, VersionNumber) pair) defining which question versions a stage presents to annotators, and in which order. Any change to the composition (add, remove, reorder, upgrade a question version) creates a new QSV.
Annotation¶
A stable identity entity representing one annotator's relationship with one question for one study. Owns an append-only list of AnnotationVersions. Identity is composite: (StudyId, AnnotatorId, QuestionId) for candidates, (StudyId, QuestionId) with annotatorId=null for reconciliation.
AnnotationVersion (AV)¶
An immutable snapshot of an answer. Records the value, which AQVersion was answered, which QSV was active, which stage the answer was committed from, and who committed it. Created on Save or Complete actions. Identified by composite identity (AnnotationId, VersionNumber) within its parent Annotation aggregate (D57).
Annotation Session¶
A mutable working entity representing one annotator's annotation workspace for one study in one stage. Owns an append-only list of SessionVersions.
Session Version (ASV)¶
An immutable snapshot of a session's state at save time. Identified by composite identity (SessionId, VersionNumber) within its parent AnnotationSession aggregate (D57). Contains an explicit annotationAVMap pinning each annotation to the specific AV version number that was current when the session was saved. No computed filters or "latest" lookups -- the session knows exactly which answer versions it contains.
Identity Properties¶
Properties that define what an entity IS. Fixed at creation/activation and immutable thereafter. Changing these would fundamentally alter the entity's meaning. For AQ: dataType, parentQuestionId, groupAsSingle.
Content Properties¶
Properties that describe an entity's current state. Versionable -- editing creates a new version. For AQ: text, options, helpText, answerFilters.
Derived Properties¶
Properties computed by the system from context. Not directly editable. For AQ: currentVersionNumber, versions list.
Pending Buffer¶
A mutable field on a committed entity that holds uncommitted edits (auto-saved to the server). Not a version. Cleared when the user commits (creating a new version) or reverts. Examples: pendingChanges on AQ, pendingAnswer on Annotation.
Entity Model¶
DraftAQ (Pre-Activation)¶
DraftAQs are fully mutable factory objects that live on the Project aggregate. They have no version history, no impact assessment requirements, and no downstream dependencies.
| Property | Type | Mutability | Notes |
|---|---|---|---|
| DraftId | Guid | Fixed | Temporary, replaced by AqId on activation |
| DataType | AnswerType | Mutable | boolean, select, checklist, text, numeric, autocomplete |
| ParentId | Guid? | Mutable | Reference to parent DraftAQ or AQ |
| GroupAsSingle | boolean | Mutable | Whether child questions group under parent |
| Text | string | Mutable | Question wording |
| Options | List\<AnswerOption> | Mutable | Dropdown/checkbox choices |
| HelpText | string? | Mutable | Guidance shown alongside question |
| AnswerFilters | List\<AnswerFilter> | Mutable | Conditional display rules |
Lifecycle: When a project admin activates a stage (or explicitly publishes a question), each DraftAQ converts to an AQ. The DraftAQ ceases to exist; the AQ is born with its first AQVersion containing the draft's content properties. The DraftId is discarded and a permanent AqId is assigned.
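A minimal sketch of the activation step, assuming simplified in-memory types. The `activate` function and the field names are hypothetical, and remapping a parent reference that still points at another DraftAQ is deliberately left out.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DraftAQ:
    """Fully mutable: every property, including structural ones, can change."""
    draft_id: str
    data_type: str
    parent_id: Optional[str]
    group_as_single: bool
    text: str
    options: List[str]


@dataclass(frozen=True)
class AQVersion:
    version_number: int
    text: str
    options: Tuple[str, ...]


@dataclass
class AQ:
    aq_id: str
    data_type: str
    parent_question_id: Optional[str]
    group_as_single: bool
    versions: List[AQVersion]


def activate(draft: DraftAQ) -> AQ:
    """Convert a draft into a versioned AQ whose first version holds the draft content."""
    first = AQVersion(version_number=1, text=draft.text, options=tuple(draft.options))
    return AQ(
        aq_id=str(uuid.uuid4()),  # permanent id; the DraftId is not carried over
        data_type=draft.data_type,
        parent_question_id=draft.parent_id,  # assumes the parent is already an AQ (or None)
        group_as_single=draft.group_as_single,
        versions=[first],
    )


draft = DraftAQ("draft-7", "boolean", None, False, "Blinded assessment?", [])
draft.data_type = "select"        # structural edits are still free on a draft
draft.options = ["Yes", "No"]
aq = activate(draft)
```

The structural values copied at activation (`data_type`, `parent_question_id`, `group_as_single`) become the AQ's identity properties; only the content lands in the version list.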
Annotation Question (AQ)¶
| Property | Category | Type | Notes |
|---|---|---|---|
| Id | Identity | Guid | Stable across all versions, assigned at activation |
| Scope | Identity | System \| Organisation \| Researcher \| Project | Ownership scope level |
| OwnerId | Identity | Guid | SystemId, OrgId, InvestigatorId, or ProjectId |
| Category | Identity | QuestionCategory | Classification |
| DataType | Identity | AnswerType | Immutable after activation (D38) |
| ParentQuestionId | Identity | Guid? | Immutable after activation (D38) |
| GroupAsSingle | Identity | boolean | Immutable after activation (D38) |
| DerivedFrom | Identity | Guid? | null if original, source QuestionId if forked |
| PendingChanges | Derived | object? | Mutable auto-save buffer: |
| CurrentVersionNumber | Derived | int | Number of the latest AQVersion |
| Versions | Derived | List\<AQVersion> | Append-only version history |
AQVersion (Immutable Content Snapshot)¶
| Property | Type | Notes |
|---|---|---|
| VersionNumber | int | Sequential: 1, 2, 3, ... Identity within AQ is (QuestionId, VersionNumber) |
| DerivedFrom | AQVersionRef? | null if original, (QuestionId, VersionNumber) of source if forked |
| QuestionText | string | The question wording |
| Options | List\<AnswerOption> | Available answer choices |
| HelpText | string? | Guidance text |
| AnswerFilters | List\<AnswerFilter> | Conditional display/option filters |
| CreatedAt | DateTime | Timestamp of version creation |
| CreatedBy | Guid | InvestigatorId of the admin who committed |
| ChangeReason | string? | Optional explanation of what changed |
| BreakingChange | boolean | Whether this version change invalidates existing answers (set by admin at creation) |
Key rules:
- Editing content properties (text, options, helpText, answerFilters) always creates a new AQVersion. The previous version is preserved.
- Identity properties (dataType, parentQuestionId, groupAsSingle) are immutable after activation. Changing these would fundamentally alter the question -- create a new AQ instead.
- The PendingChanges buffer supports auto-save of in-progress edits. The admin commits pending changes to create a new AQVersion.
- AQVersions are immutable and safe to share across scope boundaries (like Git commits or Docker layers).
Question Set (QS)¶
| Property | Category | Type | Notes |
|---|---|---|---|
| Id | Identity | Guid | QuestionSetId, stable across versions |
| Scope | Identity | System \| Organisation \| Researcher \| Project | Ownership scope |
| OwnerId | Identity | Guid | Owner at the scope level |
| Name | Content | string | Display name |
| Description | Content | string? | Optional description |
| DerivedFrom | Identity | Guid? | null if original, source QuestionSetId if forked |
| Published | Content | bool | Visible in community catalogue when true |
| Metadata | Content | QuestionSetMetadata? | Optional catalogue fields (domain, species, tags) |
| ImportPolicy | Content | AllOrNothing \| SelectiveImport | Controls import behavior |
| EditPolicy | Content | Editable \| ReadOnly \| EditableWithWarning | Controls fork behavior |
| CurrentVersionNumber | Derived | int | Latest QSV number |
| Versions | Derived | List\<QSV> | Append-only version history |
Question Set Version (QSV -- Immutable)¶
| Property | Type | Notes |
|---|---|---|
| VersionNumber | int | Sequential within the QS lineage. Identity is (QuestionSetId, VersionNumber) |
| AQVersionRefs | OrderedList\<AQVersionRef> | The question versions in display order. Each ref is (QuestionId, VersionNumber) |
| CreatedAt | DateTime | Timestamp |
| CreatedBy | Guid | Admin who committed |
| ChangeReason | string? | Optional explanation |
Parent integrity constraint: All ancestors (via ParentQuestionId hierarchy) of any AQVersion in a QSV must also be present in the same QSV. This is enforced at QSV composition time. When a child question is assigned to a stage, its parent AQVersions are automatically included. This creates structural cross-stage question overlap when different stages share parent questions -- an expected and common pattern.
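One way to check the constraint at composition time is to walk each question's ancestor chain and report anything the QSV lacks. The sketch below is illustrative only: `missing_ancestors` and the `parent_of` lookup are hypothetical names, and it reports missing QuestionIds rather than choosing which version of the parent to include, since that choice is not specified here.

```python
from typing import Dict, List, Optional, Tuple

AQVersionRef = Tuple[str, int]  # (QuestionId, VersionNumber)


def missing_ancestors(
    refs: List[AQVersionRef], parent_of: Dict[str, Optional[str]]
) -> List[str]:
    """Return QuestionIds of ancestors that the QSV needs but does not contain."""
    present = {question_id for question_id, _ in refs}
    missing: List[str] = []
    for question_id, _ in refs:
        ancestor = parent_of.get(question_id)
        while ancestor is not None:
            if ancestor not in present and ancestor not in missing:
                missing.append(ancestor)
            ancestor = parent_of.get(ancestor)
    return missing


# Hypothetical hierarchy: species -> strain -> substrain
parent_of = {"species": None, "strain": "species", "substrain": "strain"}

incomplete = missing_ancestors([("substrain", 1)], parent_of)
complete = missing_ancestors([("species", 2), ("strain", 1)], parent_of)
```

A QSV is valid under the constraint when the returned list is empty; a non-empty list names the parents that would have to be added automatically.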
Annotation¶
| Property | Category | Type | Notes |
|---|---|---|---|
| StudyId | Identity | Guid | Which study this answers |
| AnnotatorId | Identity | Guid? | Which annotator (null for reconciliation) |
| QuestionId | Identity | Guid | Which AQ this answers |
| PendingAnswer | Derived | object? | Mutable auto-save buffer: |
| CurrentVersionNumber | Derived | int | Latest AV number |
| Versions | Derived | List\<AV> | Append-only, embedded in document |
Candidate annotation identity: (StudyId, AnnotatorId, QuestionId) -- study-scoped, not stage-scoped. This supports cross-stage question sharing via explicit version references.
Reconciliation annotation identity: (StudyId, QuestionId) with AnnotatorId=null. Authorship is tracked per-AV via CommittedBy. The gold standard has no single owner -- multiple reconcilers may contribute AVs across stages.
AnnotationVersion (AV -- Immutable)¶
| Property | Type | Notes |
|---|---|---|
| VersionNumber | int | Sequential within the Annotation. Identity is (AnnotationId, VersionNumber) |
| AQVersionRef | AQVersionRef | (QuestionId, VersionNumber) of the question version answered |
| QSVRef | QSVersionRef | (QuestionSetId, VersionNumber) of the active question set version |
| Answer | TypedAnswer | bool, int, decimal, string, or arrays |
| Notes | string? | Annotator's notes on this answer |
| CommittedBy | Guid | Who committed (important for reconciliation) |
| StageId | Guid | Which stage this was committed from |
| CreatedAt | DateTime | Timestamp |
| CreatedByAction | string | "save", "complete", or "impact-update" |
| SessionVersionRef | ASVersionRef | (SessionId, VersionNumber) of the session submission that included this AV |
Annotation Session¶
| Property | Category | Type | Notes |
|---|---|---|---|
| Id | Identity | Guid | SessionId (aggregate root) |
| StudyId | Identity | Guid | Which study |
| StageId | Identity | Guid | Which stage |
| InvestigatorId | Identity | Guid | Which annotator |
| Reconciliation | Identity | boolean | Is this a reconciliation session? |
| AnnotationIds | Derived | List\<Guid> | Write-time materialisation of the session's working set (D52). Populated at session creation and QSV transition. See Cross-Stage Behavior. |
| CurrentVersionNumber | Derived | int | Latest ASV number |
| Versions | Derived | List\<ASV> | Append-only, embedded in document |
Session Version (ASV -- Immutable)¶
| Property | Type | Notes |
|---|---|---|
| VersionNumber | int | Sequential within the Session. Identity is (SessionId, VersionNumber) |
| Status | SessionStatus | Incomplete or Completed |
| QSVRef | QSVersionRef | (QuestionSetId, VersionNumber) of the active question set version |
| AnnotationAVMap | Map\<AnnotationId, int> | Pinned AV version number for each annotation at save time |
| ResolvedAQVersionRefs | Set\<AQVersionRef> | Materialized: the (QuestionId, VersionNumber) pairs actually active in this ASV (see Resolved Question Set) |
| CreatedAt | DateTime | Timestamp |
| CreatedByAction | string | "save", "complete", or "qsv-transition" |
| SubmittedAt | DateTime? | Set when status is Completed |
| TransitionMetadata | object? | Present only for qsv-transition ASVs: |
Key rule: The AnnotationAVMap is an explicit snapshot -- it records the exact AV version number for each annotation at the moment of save. The map key is the AnnotationId (aggregate root), the value is the version number within that Annotation's AV list. There are no computed filters or "latest" lookups. This replaces the current AnnotationSession.MatchingAnnotationsPredicate pattern.
ResolvedAQVersionRefs: A materialized field computed on write. It records which AQVersionRefs (QuestionId, VersionNumber) from the QSV are actually "live" in this ASV, accounting for parent-child conditionality and answer filters. Two annotators under the same QSV may have different resolved sets because their answers activate different conditional branches. See the Resolved Question Set and Parent Annotation Condition sections below.
Versioning Rules¶
When New AQVersions Are Created¶
A new AQVersion is created when a project admin commits changes to an activated question's content properties. The flow is:
- Admin edits text, options, helpText, or answerFilters in the question management UI
- Changes are auto-saved to the pendingChanges buffer on the AQ document (no version created yet)
- Admin clicks "Save as New Version" -- the system creates a new AQVersion from pendingChanges and clears the buffer
- The new AQVersion becomes available for inclusion in QSVs
Identity properties (dataType, parentQuestionId, groupAsSingle) cannot be changed on an activated AQ. To change these, the admin must create a new AQ, which is correct because changing them produces a fundamentally different question.
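The flow above can be sketched as follows. This is an assumption-laden illustration: `ActivatedAQ`, `auto_save`, and `save_as_new_version` are hypothetical names, and plain dicts stand in for AQVersion documents purely for brevity (a real snapshot would be immutable).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

CONTENT_FIELDS = {"text", "options", "helpText", "answerFilters"}


@dataclass
class ActivatedAQ:
    aq_id: str
    versions: List[Dict[str, Any]]                 # starts with version 1 from activation
    pending_changes: Optional[Dict[str, Any]] = None

    def auto_save(self, changes: Dict[str, Any]) -> None:
        """Merge edits into the buffer; identity properties are rejected."""
        rejected = set(changes) - CONTENT_FIELDS
        if rejected:
            raise ValueError(f"not content properties: {sorted(rejected)}")
        self.pending_changes = {**(self.pending_changes or {}), **changes}

    def save_as_new_version(self) -> int:
        """Create the next version from the buffer, then clear the buffer."""
        if not self.pending_changes:
            raise ValueError("no pending changes to commit")
        latest = self.versions[-1]
        new_version = {
            **latest,
            **self.pending_changes,
            "versionNumber": latest["versionNumber"] + 1,
        }
        self.versions.append(new_version)
        self.pending_changes = None
        return new_version["versionNumber"]


aq = ActivatedAQ(
    "aq-1",
    versions=[{"versionNumber": 1, "text": "Randomised?", "options": ["Yes", "No"]}],
)
aq.auto_save({"text": "Randomi"})                   # intermediate typing: buffer only
aq.auto_save({"text": "Randomised allocation?"})
new_number = aq.save_as_new_version()
```

Two auto-saves produce one version, and properties the admin did not touch (here `options`) are carried over from the previous snapshot.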
When New QSVs Are Created¶
A new QSV is created when the composition of a stage's question set changes:
- A question is added to or removed from the set
- The display order of questions changes
- A question is upgraded to a newer AQVersion
- A question is downgraded to an older AQVersion
Any of these changes produces a new immutable QSV. The stage's reference is updated to point to the new QSV. What happens to existing in-progress sessions under the old QSV is an admin decision, not system-prescribed.
When New AnnotationVersions Are Created¶
A new AV is created on each affected Annotation when the annotator performs a Save or Complete action:
- Annotator answers questions -- each answer is auto-saved to the pendingAnswer buffer on the corresponding Annotation document (no AV created)
- Annotator clicks Save or Complete -- the system creates a new AV on each Annotation that has a pendingAnswer, clears the buffer, and updates currentVersionNumber
- Each AV records the AQVersionRef and QSVRef that were active at commit time
Revert discards all pendingAnswer buffers without creating versions.
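A minimal sketch of Save/Complete and Revert over a set of annotations. The `commit` and `revert` helpers are hypothetical, and the sketch assumes that a `pending_answer` of `None` means "nothing pending", which a real implementation may need to distinguish from a deliberately blank answer.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AnnotationVersion:
    version_number: int
    answer: Any
    aq_version_ref: Tuple[str, int]   # (QuestionId, VersionNumber)
    qsv_ref: Tuple[str, int]          # (QuestionSetId, VersionNumber)
    stage_id: str
    created_by_action: str            # "save" or "complete"


@dataclass
class Annotation:
    annotation_id: str
    question_id: str
    versions: List[AnnotationVersion] = field(default_factory=list)
    pending_answer: Optional[Any] = None  # assumption: None means no pending edit


def commit(annotations: List[Annotation], aq_versions: Dict[str, int],
           qsv_ref: Tuple[str, int], stage_id: str, action: str) -> List[str]:
    """Create one AV per annotation that has a pending answer; return their ids."""
    committed = []
    for ann in annotations:
        if ann.pending_answer is None:
            continue  # untouched annotations get no new version
        ann.versions.append(AnnotationVersion(
            version_number=len(ann.versions) + 1,
            answer=ann.pending_answer,
            aq_version_ref=(ann.question_id, aq_versions[ann.question_id]),
            qsv_ref=qsv_ref,
            stage_id=stage_id,
            created_by_action=action,
        ))
        ann.pending_answer = None
        committed.append(ann.annotation_id)
    return committed


def revert(annotations: List[Annotation]) -> None:
    """Discard all pending buffers without creating versions."""
    for ann in annotations:
        ann.pending_answer = None


a1, a2 = Annotation("ann-1", "q-1"), Annotation("ann-2", "q-2")
a1.pending_answer = "Mouse"
committed = commit([a1, a2], {"q-1": 3, "q-2": 1}, ("qs-1", 4), "stage-1", "save")
a2.pending_answer = 12
revert([a1, a2])
```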
When New SessionVersions Are Created¶
A new ASV is created on the session at each Save or Complete:
- On Save: ASV created with status=Incomplete and an AnnotationAVMap capturing the current AV for each annotation in the session
- On Complete: ASV created with status=Completed, marking the session as finished
The ASV's AnnotationAVMap is a complete snapshot. Historical ASVs are preserved, enabling time-travel queries and revert operations.
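The snapshot behaviour can be illustrated with a read-only copy of the map taken at save time. The class and method names below are assumptions for the example; only the shape of the map (AnnotationId to AV version number) comes from the model above.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class SessionVersion:
    version_number: int
    status: str                          # "Incomplete" or "Completed"
    annotation_av_map: Mapping[str, int]


@dataclass
class AnnotationSession:
    session_id: str
    versions: List[SessionVersion] = field(default_factory=list)

    def snapshot(self, current_av_numbers: Dict[str, int], complete: bool) -> SessionVersion:
        asv = SessionVersion(
            version_number=len(self.versions) + 1,
            status="Completed" if complete else "Incomplete",
            # Copy, then wrap read-only, so later changes to the caller's dict cannot leak in.
            annotation_av_map=MappingProxyType(dict(current_av_numbers)),
        )
        self.versions.append(asv)
        return asv


session = AnnotationSession("session-1")
current = {"ann-1": 1, "ann-2": 1}
session.snapshot(current, complete=False)   # Save
current["ann-1"] = 2                        # ann-1 gets a new AV before the next save
session.snapshot(current, complete=True)    # Complete
```

Because each ASV holds its own pinned numbers, reading an older ASV is a direct lookup into each Annotation's version list rather than a filter over "latest" state.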
Auto-Save Does Not Create Versions¶
Auto-save updates the mutable pending buffers (pendingChanges on AQ, pendingAnswer on Annotation) but never creates versions. This is deliberate: version history captures explicit user decisions, not intermediate keystrokes. Pending buffers persist server-side so that work survives browser crashes and device switches.
When New ASVs Are Created by QSV Transition¶
When a stage's QSV changes from QSV-old to QSV-new, the system creates a new ASV on each affected annotation session. This ASV carries forward compatible answers and leaves gaps where re-annotation is required. See the Resolved Question Set and Parent Annotation Condition sections below for how the new ASV's AnnotationAVMap is computed, and Question Management for the full admin decision framework.
Breaking Change Transitivity¶
The BreakingChange flag on each AQVersion marks whether that version invalidates answers from the previous version of the same AQ. When a QSV transition jumps across multiple AQVersions (e.g., v2 → v5), the transition is breaking if any version after the old one, up to and including the new one, was marked breaking (here v3, v4, or v5).
The admin sets breakingChange when creating each AQVersion. The system computes transitivity automatically during QSV transitions.
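A sketch of the transitivity check, under two assumptions that are not spelled out above: the range examined runs from the version after the old one through the new one inclusive (which follows from the flag describing the step from the previous version), and only upgrades are handled. The function name is hypothetical.

```python
from typing import Dict


def is_breaking_transition(breaking_by_version: Dict[int, bool], old: int, new: int) -> bool:
    """True if any version after `old`, up to and including `new`, is flagged breaking."""
    if new <= old:
        # Downgrade semantics are not defined in this sketch.
        raise ValueError("sketch covers upgrades only")
    return any(breaking_by_version[v] for v in range(old + 1, new + 1))


# BreakingChange flags per AQVersion number; only v4 was marked breaking.
flags = {1: False, 2: False, 3: False, 4: True, 5: False}

v2_to_v5 = is_breaking_transition(flags, 2, 5)   # crosses v4
v4_to_v5 = is_breaking_transition(flags, 4, 5)   # v4 is the starting point, not crossed
v1_to_v3 = is_breaking_transition(flags, 1, 3)
```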
Resolved Question Set¶
The Resolved Question Set is the subset of AQVersionRefs from a QSV that are actually "live" in a given ASV -- accounting for parent-child conditionality and answer filters. It is a computed projection, not a tracked entity.
Why it exists: A QSV defines all questions that could appear in the annotation form. But parent-child relationships and answer filters mean that the questions actually shown depend on the specific answers given. Two annotators under the same QSV may have different resolved sets because their answers activate different conditional branches.
Why it is not an independent entity: It changes with every answer change (auto-save could alter it), it is fully deterministic from QSV + AVMap + AQ hierarchy, and the ASV's AnnotationAVMap already implicitly captures the same information. Tracking it separately would create churn and consistency risk.
How it is stored: Materialized as ResolvedAQVersionRefs on each ASV at write time. Computed once when the ASV is created and never changes (ASVs are immutable). Available for queries ("which ASVs include AQVersion X?") without recomputing the tree.
Formal definition:
    ResolvedQuestionSet(QSV, AVMap) → Set<AQVersionRef>
    ResolvedQuestionSet(QSV, AVMap) = { q ∈ QSV | PAC(q, QSV, AVMap) }
Parent Annotation Condition (PAC)¶
The Parent Annotation Condition determines whether a given AQVersion is included in the Resolved Question Set for a specific ASV. It is a recursive condition evaluated top-down from root questions through the parent-child hierarchy.
Formal definition:
For an AQVersionRef q in a QSV, with AVMap representing the (partially or fully constructed) annotation-to-AV mapping:
    PAC(q, QSV, AVMap) =
        parent(q) = null                                -- root: always satisfies PAC
        OR (
            PAC(parent(q), QSV, AVMap)                  -- parent satisfies PAC (recursive)
            AND ∃ av ∈ AVMap targeting parent(q)        -- parent has an AV in the map
            AND (
                filters(q) = ∅                          -- no filter: visible whenever parent is
                OR filters(q).satisfied_by(answer(av))  -- parent's answer satisfies filter
            )
        )
Where:
- parent(q) returns the parent AQVersionRef of q (null for root questions), derived from the AQ's ParentQuestionId identity property
- filters(q) returns the answer filters on AQVersionRef q (conditions on the parent's answer that control q's visibility)
- answer(av) returns the answer value stored in AV av (null if the AV is blank)
Key properties:
- If PAC fails for q, it fails for all descendants of q -- the entire subtree is excluded
- If a parent has a blank/null answer, children with answer filters are excluded (the filter cannot be satisfied). Children without filters may still be included depending on implementation -- the default should be exclusion (unanswered parent means unexplored branch)
- PAC is evaluated top-down, so there is no circularity: root questions are evaluated first, then their children using the root's answer, and so on
- The same PAC logic is used both by the annotation form (to determine which questions to render) and by the QSV transition algorithm (to determine which AVs to include in the new ASV)
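The recursive definition and the Resolved Question Set can be sketched together. This is a simplified illustration, not the production algorithm: question ids stand in for AQVersionRefs, `answers` stands in for the answers reachable through the AVMap, filters are plain predicates, and a blank parent answer excludes all children (the default suggested above). All names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Set

Filter = Callable[[Any], bool]


def pac(q: str, parent_of: Dict[str, Optional[str]],
        filters: Dict[str, Filter], answers: Dict[str, Any]) -> bool:
    """Parent Annotation Condition for question q."""
    parent = parent_of[q]
    if parent is None:
        return True                      # root: always satisfies PAC
    if not pac(parent, parent_of, filters, answers):
        return False                     # a failing ancestor excludes the whole subtree
    if answers.get(parent) is None:
        return False                     # missing or blank parent answer: unexplored branch
    answer_filter = filters.get(q)
    return answer_filter is None or answer_filter(answers[parent])


def resolved_question_set(qsv: Iterable[str], parent_of: Dict[str, Optional[str]],
                          filters: Dict[str, Filter], answers: Dict[str, Any]) -> Set[str]:
    return {q for q in qsv if pac(q, parent_of, filters, answers)}


# Hypothetical QSV: "strain" is shown only when "species" is answered "Mouse".
qsv = ["species", "strain", "substrain", "sex"]
parent_of = {"species": None, "strain": "species", "substrain": "strain", "sex": None}
filters = {"strain": lambda answer: answer == "Mouse"}

mouse = resolved_question_set(qsv, parent_of, filters,
                              {"species": "Mouse", "strain": "C57BL/6"})
rat = resolved_question_set(qsv, parent_of, filters,
                            {"species": "Rat", "strain": "Wistar"})
partial = resolved_question_set(qsv, parent_of, filters, {"species": "Mouse"})
```

The `rat` case shows subtree exclusion: the stale "Wistar" answer does not keep `strain` or `substrain` in the set once the filter on `species` fails.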
Data Model¶
Collections¶
| Collection | Aggregate Root | Contents | Rationale |
|---|---|---|---|
| pmAnnotationQuestion | AQ | AQ identity + pendingChanges + embedded AQVersions | Global collection, scope-based access control (D28) |
| pmQuestionSet | QS | QS identity + embedded QSVs | Global collection, scope-based access control |
| pmAnnotation | Annotation | Annotation identity + pendingAnswer + embedded AVs | Own aggregate; avoids Study document contention and unbounded growth (D41) |
| pmAnnotationSession | Session | Session metadata + annotationIds + embedded ASVs | Own aggregate; same reasoning as pmAnnotation (D42) |
| pmStudy (unchanged) | Study | Study metadata only -- no back-references to annotations or sessions | Navigation via queries on pmAnnotation/pmAnnotationSession using StudyId indexes (D50-revised) |
| pmProject (modified) | Project | DraftAQs + project configuration | Holds draft questions before activation |
Why Separate Collections¶
The current system embeds annotation data within the Study document. This creates two problems at scale:
- Write contention: Multiple annotators auto-saving concurrently all compete for the same Study document lock
- Unbounded growth: Each annotation's version history grows with every save, pushing the Study document toward MongoDB's 16 MB document size limit
Extracting annotations and sessions into their own collections (D41, D42) resolves both issues. The Study document is not modified -- it does not need back-references to annotations or sessions because those collections carry studyId and are indexed for efficient lookup (D50-revised). Each Annotation document is its own aggregate with its AVs embedded (D43), ensuring that creating an AV and updating currentVersionNumber is a single atomic write.
Key Indexes¶
pmAnnotationQuestion:
- { Scope: 1, OwnerId: 1 } -- scope-based access queries
- { _id: 1, "Versions.VersionNumber": 1 } -- AQVersion lookup by composite identity (QuestionId, VersionNumber)
pmAnnotation:
- { StudyId: 1, AnnotatorId: 1, QuestionId: 1 } -- unique composite identity (candidate); also serves as prefix index for StudyId-based lookups (replaces Study back-reference)
- { StudyId: 1, QuestionId: 1 } -- reconciliation annotation lookup (where AnnotatorId is null)
pmAnnotationSession:
- { StudyId: 1, StageId: 1, InvestigatorId: 1 } -- session lookup per annotator per stage
- { StudyId: 1 } -- all sessions for a study (replaces Study back-reference)
Scope Fields¶
All AQ and QS entities carry Scope and OwnerId fields for access control:
| Scope | OwnerId | Meaning |
|---|---|---|
| System | SystemId | Platform-curated (CAMARADES) |
| Organisation | OrgId | Org-private library |
| Researcher | InvestigatorId | Personal library |
| Project | ProjectId | Project-local |
The DerivedFrom field on AQ, AQVersion, QS, and QSV creates a fork graph (DAG) that provides full provenance. Any project-local asset can be traced back through its derivation chain to its origin.
Consistency Model¶
| Operation | Collections Touched | Consistency Mechanism |
|---|---|---|
| Auto-save (pendingAnswer) | N pmAnnotation docs | N independent single-doc writes (each atomic) |
| Save/Complete | N pmAnnotation docs + 1 pmAnnotationSession doc | Multi-document transaction (D47) |
| Revert | N pmAnnotation docs | N independent single-doc writes |
| Load session | 1 pmAnnotationSession + N pmAnnotation docs | Two reads (session doc has annotationIds; $in query on pmAnnotation) |
| Find study's sessions | pmAnnotationSession | Query by StudyId index |
| Find study's annotations | pmAnnotation | Query by StudyId prefix on composite index |
Save/Complete uses MongoDB multi-document transactions. This is acceptable because save is an infrequent explicit user action (not auto-save), N is the number of questions answered (tens, not thousands), and MongoDB Atlas fully supports multi-document transactions.
Cross-Stage Behavior¶
Annotations Are Study-Scoped, Not Stage-Scoped¶
A candidate annotation's identity is (StudyId, AnnotatorId, QuestionId) -- it has no stage component. When a shared question appears in multiple stages (structurally forced by the parent integrity constraint on QSVs), both stages' sessions reference the same Annotation entity.
Walkthrough: Question Q-shared is assigned to both Stage 1 and Stage 2. Reviewer A works on both stages.
- Reviewer opens Stage 1 session -- Annotation ann-1 created for Q-shared
- Reviewer answers and saves Stage 1 -- AV version 1 created on ann-1. Stage 1's ASV records {ann-1: 1}
- Reviewer opens Stage 2 session -- ann-1 is included in Stage 2's annotation list (same entity, shared reference)
- Reviewer sees version 1's value (the answer from Stage 1)
- Reviewer edits the answer in Stage 2 context and saves -- AV version 2 created on the same ann-1. Stage 2's ASV records {ann-1: 2}
- Stage 1's completed ASV still pins to {ann-1: 1} -- the answer as it was at Stage 1 completion
Key principle: No forking, no copying. One annotation identity, multiple AVs. Each stage's session version pins to the specific AV that was current when that session was saved. Stage isolation is achieved through ASV snapshots, not through separate annotation entities.
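The walkthrough can be replayed with plain dictionaries. Everything here is a toy stand-in: the `save` and `pinned_answer` helpers and the example answers are invented for illustration, and module-level state is used only to keep the sketch short.

```python
annotation = {"id": "ann-1", "question": "Q-shared", "versions": []}
sessions = {"stage-1": [], "stage-2": []}   # each list holds that stage's ASVs


def save(stage_id: str, answer: str) -> None:
    """Append an AV to the shared annotation, then pin it in the stage's new ASV."""
    number = len(annotation["versions"]) + 1
    annotation["versions"].append(
        {"versionNumber": number, "answer": answer, "stageId": stage_id}
    )
    sessions[stage_id].append({"annotationAVMap": {annotation["id"]: number}})


def pinned_answer(stage_id: str) -> str:
    """Resolve the answer a stage's latest ASV pins, without any 'latest AV' lookup."""
    number = sessions[stage_id][-1]["annotationAVMap"][annotation["id"]]
    return annotation["versions"][number - 1]["answer"]


save("stage-1", "Yes")        # AV 1; Stage 1's ASV pins {ann-1: 1}
save("stage-2", "Unclear")    # AV 2 on the same annotation; Stage 2's ASV pins {ann-1: 2}

stage_1_view = pinned_answer("stage-1")
stage_2_view = pinned_answer("stage-2")
```

Stage 1 still resolves to its original answer after the Stage 2 edit, even though both stages share a single annotation entity.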
AnnotationVersion Carries Stage Context¶
Each AV records a stageId indicating which stage it was committed from. This enables queries like "show me all answers committed from Stage 2" without needing stage-scoped annotation entities.
Reconciliation Annotations Across Stages¶
Reconciliation annotations have annotatorId=null and are shared across stages AND reconcilers. Each AV on a reconciliation annotation records committedBy (which reconciler) and stageId (which stage's reconciliation produced it). The gold standard for a question on a study is always the latest AV on its reconciliation annotation.
When Stage A is reconciled, the reconciler creates AVs on reconciliation annotations for Stage A's questions. When Stage B is reconciled later, the Stage B reconciler sees Stage A's reconciled answers as context and creates new AVs for Stage B's questions. For questions that appear in both stages (due to parent integrity), the Stage B reconciler can confirm or override the Stage A answer by creating a new AV.
Cross-stage disagreement is resolved through the natural act of reconciliation -- no separate conflict detection or resolution workflow is needed.
Cross-Stage PAC Consistency (D51, D53, D54, D55)¶
When a parent annotation question appears in multiple stages and a child question with answer filters exists in only one of those stages, changing the parent's answer in a later stage can invalidate the child annotation's PAC in the earlier stage. This is the cross-stage PAC consistency concern.
The scenario: Q-parent is in Stage 1 and Stage 2. Q-child (with answer filter requiring Q-parent="X") is only in Stage 1. The annotator answers Q-parent="X" and Q-child in Stage 1, then changes Q-parent to "Y" in Stage 2. Q-child's PAC now fails -- it was answered under a condition that no longer holds.
Detection -- real-time in the annotation form (D53): When the annotator changes a parent answer, the annotation form immediately evaluates PAC for all child annotations in other stages that depend on this parent. If any would be invalidated, a non-blocking inline indicator appears beneath the parent question (an information icon). Clicking the icon opens a modal showing:
- Which stages contain affected child annotations
- The descendant annotation tree and descendant annotation questions
- The current vs. new PAC evaluation for each affected child
This shows the annotator the downstream impact at the moment of the change, while the reasoning behind it is still fresh. The indicator remains visible as long as the parent's tentative answer (pendingAnswer) differs from the value required by child PAC filters.
Materialisation -- domain event handler (D51): When an AV is committed on a parent annotation, a ParentAnnotationAnswerChanged domain event is emitted. A handler evaluates PAC for all child annotations across stages that depend on this parent. For any child annotation where PAC now fails, a crossStageConsistency field on the Annotation document is updated:
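The exact shape of crossStageConsistency is not spelled out in this section. As a rough sketch, assuming a simplified equality-based answer filter and hypothetical field names (status, parentAnnotationId, parentAvNumber), a handler for ParentAnnotationAnswerChanged might behave like this:

```python
from dataclasses import dataclass, field


@dataclass
class ChildAnnotation:
    annotation_id: str
    stage_id: str
    required_parent_answer: str  # simplified stand-in for the AQVersion answer filter
    cross_stage_consistency: dict = field(default_factory=dict)


def on_parent_answer_changed(parent_id, parent_av_number, new_answer, children):
    """Hypothetical handler: re-evaluate PAC and update the projection on each child."""
    stale_ids = []
    for child in children:
        if new_answer == child.required_parent_answer:
            # Assumption: a passing PAC resets the projection.
            child.cross_stage_consistency = {"status": "consistent"}
        else:
            child.cross_stage_consistency = {
                "status": "stale",
                "parentAnnotationId": parent_id,
                "parentAvNumber": parent_av_number,
            }
            stale_ids.append(child.annotation_id)
    return stale_ids


# The scenario above: Q-child requires Q-parent="X"; the parent changes to "Y".
q_child = ChildAnnotation("ann-child", "stage-1", required_parent_answer="X")
stale = on_parent_answer_changed("ann-parent", 2, "Y", [q_child])
```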
This is a denormalized projection maintained by domain events -- it depends on the state of other aggregates (the parent annotation's current AV and the child's AQVersion answer filters). It is not derivable from the Annotation's own state. The field is non-authoritative and can be recomputed from source-of-truth state via a batch recomputation if the event handler fails.
Consistency history: No separate history mechanism is needed. The parent annotation's AV timeline is the authoritative record. "Was this child ever stale?" can be answered by replaying the parent's AV sequence against the child's answer filters -- a read-time projection over immutable data.
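The replay idea can be illustrated with a small read-time projection. This sketch assumes the parent's AV timeline is available as an ordered list of answers and that the child's filter is a simple equality check; both are simplifications.

```python
def stale_parent_versions(parent_answers: list[str], required_answer: str) -> list[int]:
    """Return the 1-based parent AV numbers under which the child's filter failed."""
    return [
        number
        for number, answer in enumerate(parent_answers, start=1)
        if answer != required_answer
    ]


# Parent AV timeline: answered "X", changed to "Y", later changed back to "X".
history = ["X", "Y", "X"]
stale_at = stale_parent_versions(history, required_answer="X")
was_ever_stale = bool(stale_at)
```

Because AVs are immutable, the same replay always yields the same result, which is why no separate history store is required.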
Reconciliation behaviour (D54): During reconciliation, stale annotations are prominently flagged. The reconciler has access to both reconciled and candidate annotations from all stages. When saving or submitting a reconciliation session that contains stale annotations, the reconciler must explicitly acknowledge a summary view of all affected annotations. After acknowledgement, they may opt to suppress future acknowledgement prompts for that session ("ignore future warnings"), but the inline indicators remain visible regardless.
Configurable enforcement levels (D55): Project administrators can configure cross-stage PAC enforcement per project:
| Level | Behaviour | Default |
|---|---|---|
| flag | Inline indicators + reconciler acknowledgement on submit | Yes |
| block-reconciliation | Reconciliation submission blocked until stale annotations are explicitly reviewed and each is individually acknowledged | No |
| inform-only | Inline indicators only, no acknowledgement required at submit | No |
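
One way to read the three levels is as a submission gate. The sketch below is an interpretation, not the actual implementation; the enum, function name, and parameters are assumptions.

```python
from enum import Enum


class PacEnforcement(Enum):
    FLAG = "flag"  # default level
    BLOCK_RECONCILIATION = "block-reconciliation"
    INFORM_ONLY = "inform-only"


def can_submit(
    level: PacEnforcement,
    stale_ids: set[str],
    summary_acknowledged: bool,
    individually_acknowledged: set[str],
) -> bool:
    """Decide whether a reconciliation session may be submitted."""
    if not stale_ids or level is PacEnforcement.INFORM_ONLY:
        return True
    if level is PacEnforcement.FLAG:
        # One acknowledgement of the summary view is enough.
        return summary_acknowledged
    # block-reconciliation: every stale annotation must be acknowledged individually.
    return stale_ids <= individually_acknowledged


stale = {"ann-7", "ann-9"}
```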
Migration Strategy¶
Migration is additive only -- no data is deleted or moved.
Phase 7 Migration Steps¶
- Backfill AQVersions: All existing annotation questions become AQ entities with a single v1 AQVersion containing their current content. This is accurate -- they have never been versioned.
- Create initial QSVs: For each stage, create a QuestionSet and initial QSV based on the stage's current AnnotationQuestions list.
- Add scope fields: All existing questions default to Scope=Project and OwnerId=ProjectId.
- Create pmAnnotation/pmAnnotationSession collections: Extract annotation and session data from Study documents into their own collections. Study documents are not modified -- no back-reference arrays needed (D50-revised).
- Backfill version references: Add AQVersionRef and QSVRef references to existing annotations (pointing to the v1 versions, using composite identity Value Objects).
- Auto-promote single-annotator studies: For studies with exactly one completed session per stage where MinAnnotators=1, create ReconciliationSessionVersions with Resolution=SingleAnnotator.
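The first and third steps above can be sketched as a pure transformation. The document shapes here are hypothetical (the real schema is not shown in this section); the point is that the legacy question is copied into a v1 AQVersion and never mutated.

```python
import copy


def backfill_question(legacy: dict, project_id: str) -> dict:
    """Build an AQ document with a single v1 AQVersion from a legacy question."""
    return {
        "id": legacy["id"],
        "scope": "Project",     # default scope for existing questions
        "ownerId": project_id,  # OwnerId defaults to the ProjectId
        "versions": [
            # Deep copy so the legacy document is left untouched (additive only).
            {"versionNumber": 1, "content": copy.deepcopy(legacy)},
        ],
    }


legacy_question = {"id": "q-1", "text": "Species?", "options": ["Mouse", "Rat"]}
aq = backfill_question(legacy_question, "project-9")
```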
Backward Compatibility¶
- Existing annotation form continues to work throughout migration
- Existing question management UI continues to work
- API consumers see the same structures with additional optional fields
- The Reconciled boolean on existing annotations is preserved but not used for new authority determination
- Rollback: new fields can be removed with $unset operations, restoring the previous schema
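A rollback update document could be built as below. The field names come from the migration steps, but their real paths depend on where the annotations are stored (for example, nested inside Study documents), so treat the flat names as an assumption.

```python
def rollback_update(added_fields: list[str]) -> dict:
    """Build a MongoDB update document that removes the fields added by the migration."""
    # $unset ignores the value given for each field; an empty string is conventional.
    return {"$unset": {name: "" for name in added_fields}}


update = rollback_update(["AQVersionRef", "QSVRef"])
# With a driver such as pymongo this could be applied via
# collection.update_many({}, update); the collection handle is not shown here.
```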
Success Criteria¶
- Every annotation question edit after activation creates a new AQVersion; the previous version is preserved and queryable.
- Every AQVersion is immutable -- no field on an AQVersion document changes after creation.
- Identity properties (dataType, parentQuestionId, groupAsSingle) cannot be modified on an activated AQ.
- DraftAQs convert cleanly to AQs on stage activation, with the first AQVersion containing the draft's content.
- Every AnnotationVersion records the AQVersionRef and QSVRef that were active when the answer was collected, using composite identity Value Objects.
- Session versions contain an explicit AnnotationAVMap pinning each annotation to a specific AV version number -- no computed filters.
- Auto-save updates pendingAnswer/pendingChanges without creating versions; versions are created only on explicit Save/Complete.
- Cross-stage annotation sharing works: the same Annotation entity is referenced by sessions in multiple stages, with stage-specific AVs.
- Reconciliation annotations have annotatorId=null with per-AV committedBy attribution.
- Parent integrity constraint is enforced: all ancestors of any AQVersion in a QSV are also present in the same QSV.
- All existing questions are backfilled as v1 AQVersions during migration with no data loss.
- QSV composition changes (add, remove, reorder, upgrade) create new immutable QSVs.
- Optimistic concurrency prevents lost updates on Study document writes.
- Annotations and sessions live in separate collections (pmAnnotation, pmAnnotationSession). Study documents are not modified -- navigation uses StudyId indexes on the new collections.
- Version entities use composite identity (rootId, versionNumber) with no independent GUIDs. Cross-aggregate references use composite identity Value Objects.
- Cross-stage PAC consistency is materialised on child annotations via domain events and flagged in the annotation form in real-time.
- Reconcilers must acknowledge stale annotations before submitting reconciliation sessions (configurable per project).
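The parent integrity criterion lends itself to a simple check at QSV composition time. This sketch works on question IDs rather than AQVersion references and assumes the parent relation is acyclic; the function name and the parent_of mapping are illustrative.

```python
from typing import Optional


def missing_ancestors(
    qsv_question_ids: set[str], parent_of: dict[str, Optional[str]]
) -> set[str]:
    """Return ancestors that a QSV would need to include to satisfy parent integrity."""
    missing: set[str] = set()
    for question_id in qsv_question_ids:
        parent = parent_of.get(question_id)
        while parent is not None:  # walk up to the root; assumes no cycles
            if parent not in qsv_question_ids:
                missing.add(parent)
            parent = parent_of.get(parent)
    return missing


parent_of = {"q-child": "q-parent", "q-parent": "q-root", "q-root": None}
gaps = missing_ancestors({"q-child", "q-parent"}, parent_of)
```

An empty result means the composition is valid; a non-empty one lists the questions that must be added before the QSV can be created.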
Design Decisions¶
| # | Decision | Rationale |
|---|---|---|
| D1 | Identity + Immutable Versions pattern for all entities | Full audit trail, time-travel queries, consistent pattern across the system |
| D2 | AQVersionId as sole reference in QSV (not QuestionId) | QuestionId is derivable from the AQVersion; single reference simplifies lookups |
| D3 | Explicit AnnotationAVMap in sessions (not computed filters) | Eliminates ambiguity; each session knows exactly which answer versions it contains |
| D4 | Annotation identity is study-scoped: (StudyId, AnnotatorId, QuestionId) | Supports cross-stage question sharing via explicit version references; optimistic concurrency on Study writes |
| D11 | QSV is immutable; admin decides session handling on change | System does not prescribe what happens to in-progress sessions; admin chooses |
| D28 | Single global collection for all AQs (pmAnnotationQuestion) | Scope + OwnerId for access control; no routing logic; immutable versions are safe to share |
| D29 | Parent integrity: all AQV ancestors must be in the same QSV | Enforced at QSV composition time; creates structural cross-stage question overlap |
| D37 | DraftAQ is a separate type from AQ | Creation phase has different invariants (all mutable) vs modification phase (identity frozen, only content versionable) |
| D38 | dataType, parentId, groupAsSingle are identity properties -- immutable after activation | Changing parent restructures entity subtrees; changing dataType invalidates all AVs; these changes produce a different question |
| D39 | QSV does not need a Draft type | QSV has no identity properties; its only content (AQVersionIds) is the same shape for initial assignment and modification |
| D40 | "Pending" for existing entities, "Draft" for under-construction | Avoids confusion: "Draft" = not yet real; "Pending" = real entity with uncommitted edits |
| D41 | Annotations are own aggregate in pmAnnotation collection | Avoids Study document contention and unbounded growth; Annotation + embedded AVs form natural aggregate boundary |
| D42 | Sessions are own aggregate in pmAnnotationSession collection | Same reasoning as D41; Session + embedded ASVs form natural aggregate boundary |
| D43 | AVs embedded in Annotation document (not separate collection) | Annotation and its AVs must be consistent (currentAVId must match); embedding gives single-doc atomicity |
| D44 | Cross-stage sharing via same Annotation entity with multiple AVs | No forking, no copying; both stages' sessions reference the same annotationId; ASVs pin to specific AVs |
| D45 | Reconciliation annotations have annotatorId=null; authorship on AVs | Shared across stages AND reconcilers; gold standard has no single owner; committedBy tracks per-AV authorship |
| D46 | Server-side auto-save via pendingAnswer field | Users change machines/browsers; client-only auto-save is insufficient; mutable field on Annotation document |
| D47 | Save/Complete use MongoDB multi-document transactions | Save touches N annotation docs + 1 session doc; transactions ensure consistency; acceptable for infrequent explicit actions |
| D48 | Graduated impact assessment at commit time | No property is forbidden to edit; the system assesses impact and warns proportionally |
| D49 | annotationId serves as stable entity identity (entityInstanceId removed) | Existing Annotation.Id already provides stable identity; no new concept needed |
| D50 | Study document holds only references (sessionIds, annotationIds) | Revised by D50-revised |
| D50-revised | Study document holds no back-references to annotations or sessions | Back-references add write amplification on every annotation/session creation and consistency risk if the $push fails. Annotations and sessions carry studyId and are indexed; queries by StudyId replace the reference arrays. Eliminates the contention that D41/D42 were designed to avoid. |
| D51 | Cross-stage PAC consistency is materialised via domain events | When a parent AV is committed, a handler evaluates PAC for child annotations in other stages. The crossStageConsistency field on the child Annotation is a denormalized projection -- non-authoritative, rebuildable. Materialised because read paths (reconciliation workspace, admin dashboard, session review) need this information frequently. |
| D52 | Session retains annotationIds as write-time materialisation | The annotation-session relationship is many-to-many (one annotation participates in multiple sessions across stages); the annotation side cannot hold the reference. annotationIds is populated once at session creation or QSV transition, not maintained incrementally. Justified as hot-path optimisation for form load latency. |
| D53 | Cross-stage PAC warning shown immediately on parent answer change | The annotator has the mental context of "I'm changing Q-parent from X to Y" at the moment of change. Waiting until save/complete loses that context. Non-blocking inline indicator (information icon) with clickable modal showing affected stages, descendant tree, and PAC evaluation. |
| D54 | Reconciler must acknowledge stale annotations before submitting | Stale annotations are flagged prominently in the reconciliation workspace. On save/submit, a summary of all affected annotations requires explicit acknowledgement. Reconciler may opt to suppress future acknowledgement prompts for that session, but inline indicators remain. Balances methodological rigour with workflow efficiency. |
| D55 | Cross-stage PAC enforcement is configurable per project | Three levels: flag (default -- indicators + acknowledgement), block-reconciliation (individual review required), inform-only (indicators only). Different systematic review protocols have different rigour requirements; sensible default with customisability. |
| D56 | Update design documents to incorporate D50-revised through D57 | Ensures all decisions, rationale, and alternatives are captured consistently across README.md and design-session.md. |
| D57 | Version entities use composite identity (rootId, versionNumber) not independent GUIDs | Follows DDD principle that non-root entities are identified relative to their aggregate root -- same pattern as event sourcing (streamId, sequenceNumber). Cross-aggregate references use composite identity Value Objects (e.g., AQVersionRef, QSVersionRef, AVRef). Reduces structural coupling; version identity is meaningful only within its parent aggregate. GUIDs can be generated deterministically at bounded-context boundaries (exports, external APIs) if needed. |
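
To make D57 concrete, here is a minimal sketch of a composite identity Value Object with a deterministic GUID derived only at a boundary. The attribute names, the export_guid method, and the choice of UUID namespace are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AQVersionRef:
    """Composite identity Value Object: (rootId, versionNumber)."""
    question_id: str     # rootId of the AQ aggregate
    version_number: int

    def export_guid(self) -> uuid.UUID:
        """Derive a stable GUID for exports; the same ref always maps to the same GUID."""
        name = f"aq-version:{self.question_id}:{self.version_number}"
        return uuid.uuid5(uuid.NAMESPACE_URL, name)


a = AQVersionRef("aq-42", 3)
b = AQVersionRef("aq-42", 3)
```

Two refs with the same components compare equal and hash identically, so they behave as values rather than as entities with their own identity.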
Related Documents¶
| Document | Relationship |
|---|---|
| Annotation Management & Reconciliation -- Design Decisions | Authoritative decision reference (D1-D50). This feature spec is derived from Sections 1 and 11. |
| Annotation Versioning Design | Design refinements (D37-D50) covering DraftAQ lifecycle, property classification, annotation/session versioning. Supersedes design-decisions.md where they conflict. |
| Product Specification | PO-facing description of versioning behavior and phased delivery. |
| Annotation Management & Reconciliation -- Master Plan | Engineering overview of all features in the initiative. |
| Reconciliation (future feature spec) | Depends on Annotation Versioning for AV-based reconciliation answers and session versioning. |
| Question Management (future feature spec) | Depends on Annotation Versioning for AQ/AQVersion entities and DraftAQ lifecycle. |
| Annotation Form V2 (future feature spec) | Depends on Annotation Versioning for per-answer version tracking and auto-save via pendingAnswer. |