
Annotation Question Management & Reconciliation — Design Decisions

Purpose: This document captures all architectural and domain model decisions from the brainstorming sessions for this initiative. It is the authoritative reference for design intent. Where this document contradicts earlier spec files (reconciliation.md, data-model-migration.md, README.md), this document takes precedence and those specs must be updated to align.

Parent: Annotation Management & Reconciliation


Contradictions with Existing Spec Files

The following contradictions were identified between brainstorming decisions and the initial spec files, plus refinements from the annotation-versioning brainstorming (D37-D50). All 20 contradictions must be resolved in favour of the authoritative source before implementation begins.

Document precedence: annotation-versioning-design.md (D37-D50) supersedes this document (D1-D36), which in turn supersedes the original spec files.

Layer 1: Original 12 contradictions (this document supersedes spec files)

| # | Spec File | Contradiction | Required Change | Resolution Status |
| --- | --- | --- | --- | --- |
| 1 | reconciliation.md | Uses claim/release workflow (self-assigned tasks) | Replace with random assignment from pool (D9) | Resolved |
| 2 | reconciliation.md | Uses Reconciled flag on annotations as authority mechanism | Replace with Reconciliation Session model (D5) | Resolved |
| 3 | reconciliation.md | DecisionSource enum (AnnotatorA/AnnotatorB/New/Averaged) implies picking a candidate's answer | Remove; the reconciler always creates their own annotation (D7) | Resolved |
| 4 | reconciliation.md | ReconciliationRecord with ClaimedBy, ClaimedAt fields | Remove claiming fields; use random assignment pool (D9) | Resolved |
| 5 | reconciliation.md | No mention of screening reconciliation ordering | Add ordering constraint (D14) | Resolved |
| 6 | reconciliation.md | No candidate blinding requirement | Add as system invariant (D10) | Resolved |
| 7 | data-model-migration.md | ReconciliationRecord schema differs from reconciliation.md | Replace both with Reconciliation Session model (D5) | Resolved |
| 8 | data-model-migration.md | No FinalAnnotationOutcome concept | Add Reconciliation Session entity and migration strategy | Resolved |
| 9 | README.md | API surface shows /claim and /release endpoints | Replace with pool/assignment endpoints (D9) | Resolved |
| 10 | README.md | Domain model diagram shows ReconciliationRecord embedded in ExtractionInfo | Replace with Reconciliation Session model (D5) | Resolved |
| 11 | README.md | Missing session model (Session → SessionVersion) | Add to proposed architecture (D3) | Resolved |
| 12 | README.md | Missing screening reconciliation ordering in milestones | Add dependency note (D14) | Resolved |

Layer 2: D37-D50 contradictions (annotation-versioning-design.md supersedes this document)

| # | Source | Contradiction | Required Change | Resolution Status |
| --- | --- | --- | --- | --- |
| 13 | design-decisions.md S1 (AQVersion) | AnswerType and ParentQuestionId in AQVersion (versionable) | Move to AQ identity level; immutable after publish (D38) | Resolved |
| 14 | design-decisions.md S1 | No DraftAQ concept | Add DraftAQ as separate type from AQ (D37) | Resolved |
| 15 | design-decisions.md S1 | No pendingChanges / pendingAnswer concept | Add pending buffers for auto-save (D40, D46) | Resolved |
| 16 | design-decisions.md S1 / README.md | "No new collections are created" | Create pmAnnotation and pmAnnotationSession as separate collections (D41, D42) | Resolved |
| 17 | design-decisions.md S1 | Annotation identity is composite (StudyId, AnnotatorId, QuestionId) only | Still true for candidates. Reconciliation annotations: (StudyId, QuestionId) with annotatorId: null (D45) | Resolved |
| 18 | design-decisions.md S2 | ReconciliationAnswer as separate entity with Resolution enum | Reconciliation answers become AVs on reconciliation Annotations. committedBy and stageId on AV replace separate ReconciliationAnswer. (D45) | Resolved |
| 19 | product-overview.md | No auto-save mechanics described | Server-side auto-save via pendingAnswer (D46) | Resolved |
| 20 | data-model-migration.md | "Embedded-first, extract-if-needed" principle with all data in pmProject/pmStudy | Annotations and sessions move to own collections. Study holds references only (D41, D42, D50) | Resolved |

Section 1: Entity Model — Identity + Immutable Versions

Every core entity follows the same pattern: a stable identity with append-only immutable versions. All entities are scope-aware, supporting cross-scope sharing (see Section 9).

Draft Annotation Question (DraftAQ)

Before activation, questions exist as DraftAQ entities — fully mutable factory objects that live on the Project aggregate. DraftAQs have no version history, no impact assessment requirements, and no downstream dependencies. Everything is mutable.

DraftAQ (factory — lives on Project aggregate)
├── DraftId: Guid                     ← temporary, replaced by AqId on activation
├── DataType: string                  ← mutable during draft phase
├── ParentId: Guid?                   ← mutable during draft phase
├── GroupAsSingle: boolean            ← mutable during draft phase
├── Text: string                      ← mutable
├── Options: List<AnswerOption>       ← mutable
├── HelpText: string?                 ← mutable
└── AnswerFilters: List<AnswerFilter> ← mutable

When a project admin activates a stage (or explicitly publishes a question), DraftAQs are converted to AQs. The DraftAQ ceases to exist; the AQ is born with its first AQVersion. See annotation-versioning-design.md Section 1.1 for the full rationale.

Annotation Question (AQ → AQVersion)

AnnotationQuestion (Identity — collection: pmAnnotationQuestion)
├── Id: Guid                           ← stable across versions
├── Scope: System | Organisation | Researcher | Project
├── OwnerId: Guid                     ← SystemId | OrgId | InvestigatorId | ProjectId
├── Category: QuestionCategory
├── DataType: AnswerType               ← IDENTITY: immutable after activation (D38)
├── ParentQuestionId: Guid?            ← IDENTITY: immutable after activation (D38)
├── GroupAsSingle: boolean             ← IDENTITY: immutable after activation (D38)
├── DerivedFrom: QuestionId?           ← null if original, source QuestionId if forked
├── PendingChanges: {Text, Options, HelpText, AnswerFilters} | null  ← mutable auto-save buffer (D40)
├── CurrentVersionNumber: int
└── Versions: List<AQVersion>          ← append-only

AQVersion (Immutable Snapshot — content properties only)
├── Id: Guid                           ← AQVersionId, globally unique
├── VersionNumber: int                 ← 1, 2, 3, ...
├── DerivedFrom: AQVersionId?          ← null if original, source AQVersionId if forked
├── QuestionText: string
├── Options: List<AnswerOption>
├── HelpText: string?
├── AnswerFilters: List<AnswerFilter>  ← conditional display / option filters
├── CreatedAt: DateTime
├── CreatedBy: Guid
└── ChangeReason: string?

Key rules:

  • Editing a question's content (text, options, helpText, answerFilters) always creates a new AQVersion. The AQVersion.Id (AQVersionId) is the globally unique identifier used by all downstream references.
  • Identity properties (DataType, ParentQuestionId, GroupAsSingle) are immutable after activation. Changing these would effectively create a different question — use a new AQ instead. See annotation-versioning-design.md Section 1.2 for rationale.
  • The PendingChanges buffer supports server-side auto-save of in-progress edits. The PA commits pending changes to create a new AQVersion. See Section 11 (D40) for details.
  • All AQs regardless of scope live in a single global collection (pmAnnotationQuestion). Scope + OwnerId provide access control. No routing logic needed — query by scope/owner.
  • Immutable AQVersions are safe to share across aggregates and scope boundaries (like Git objects, Docker layers). The DDD Shared Kernel pattern explicitly supports this.
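The commit rule above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the method names `auto_save` and `commit` are hypothetical, and only two content fields (`question_text`, `help_text`) are modelled to keep it short.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class AQVersion:
    """Immutable content snapshot (subset of fields, for illustration)."""
    id: UUID
    version_number: int
    question_text: str
    help_text: Optional[str]
    change_reason: Optional[str]


@dataclass
class AnnotationQuestion:
    id: UUID
    data_type: str                       # identity property: never touched by commit()
    pending_changes: Optional[dict] = None
    versions: list = field(default_factory=list)

    @property
    def current(self) -> AQVersion:
        return self.versions[-1]

    def auto_save(self, **changes) -> None:
        """Merge edits into the mutable buffer; no version is created."""
        self.pending_changes = {**(self.pending_changes or {}), **changes}

    def commit(self, reason: Optional[str] = None) -> AQVersion:
        """Turn the buffer into a new immutable AQVersion and clear it."""
        if not self.pending_changes:
            raise ValueError("nothing to commit")
        base = self.current
        new = AQVersion(
            id=uuid4(),
            version_number=base.version_number + 1,
            question_text=self.pending_changes.get("question_text", base.question_text),
            help_text=self.pending_changes.get("help_text", base.help_text),
            change_reason=reason,
        )
        self.versions.append(new)        # append-only: earlier versions stay untouched
        self.pending_changes = None
        return new
```

Because `AQVersion` is frozen, any attempt to edit a committed version in place fails, which mirrors the "editing always creates a new AQVersion" rule.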

Question Set → Question Set Version (QS → QSV)

The QuestionSet is the unified concept that replaces both StageQuestionSet (SQS) and QuestionTemplate/TemplateVersion from earlier designs. Within a project context, question sets are referred to as "Question Sets"; when browsing the cross-project catalogue, they are presented as "Templates" in the UI.

QuestionSet (Identity — collection: pmQuestionSet)
├── Id: Guid                           ← QuestionSetId, stable across versions
├── Scope: System | Organisation | Researcher | Project
├── OwnerId: Guid                     ← SystemId | OrgId | InvestigatorId | ProjectId
├── Name: string
├── Description: string?
├── DerivedFrom: QuestionSetId?        ← null if original, source QuestionSetId if forked
├── Published: bool                    ← visible in community catalogue when true
├── Metadata: QuestionSetMetadata?     ← optional catalogue fields (domain, species, tags, etc.)
├── ImportPolicy: AllOrNothing | SelectiveImport    ← optional, for shared sets
├── EditPolicy: Editable | ReadOnly | EditableWithWarning  ← derivation/fork policy
├── CurrentVersionNumber: int
└── Versions: List<QuestionSetVersion> ← append-only

QuestionSetVersion (Immutable — QSV)
├── Id: Guid                           ← QSVId, globally unique
├── VersionNumber: int                 ← per-set lineage: 1, 2, 3, ...
├── AQVersionIds: OrderedList<Guid>    ← the question versions, in display order
├── CreatedAt: DateTime
├── CreatedBy: Guid
└── ChangeReason: string?

Key rules:

  • QSV is immutable. Changing the question set (adding, removing, reordering, or upgrading a question version) creates a new QSV with a new QSVId and incremented VersionNumber.
  • QSV entries are AQVersionIds only. The QuestionId is derivable from the AQVersion when needed, but the QSV stores only the version references.
  • A stage references a QSVId to define its active question configuration. What happens to existing sessions when a new QSV is assigned is an admin decision, not system-prescribed. The admin chooses from configurable options such as pinning sessions to their starting QSV or requiring re-annotation.
  • EditPolicy is a fork/derivation policy, not about mutating the QS itself (which follows the immutable versioning pattern). It controls what happens when a project imports this question set: Editable allows free forking, ReadOnly prevents derivation, EditableWithWarning allows derivation with a warning.
  • Published promotes visibility to all authenticated users. Community catalogue = WHERE Published == true. See Section 9 for details.

Parent Integrity Constraint

All ancestors (via ParentQuestionId hierarchy — not DerivedFrom) of any AQVersion in a QSV must also be present in the same QSV. This is enforced at QSV composition time, not at stage assignment time, because AQVersions are always created in the context of a QSV.

This constraint means that when a stage has child questions whose parents belong to another stage's question set, those parent AQVersions are necessarily included in both QSVs. Cross-stage question overlap is therefore structural (forced by parent integrity), not exceptional. This is an expected and common pattern.
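As a rough sketch of how the parent integrity check could be expressed at QSV composition time: the function below assumes the QSV's AQVersionIds have already been resolved to QuestionIds, and that a hypothetical `parent_of` map (QuestionId to ParentQuestionId) is available. Both are illustrative assumptions, not part of the specified model.

```python
from typing import Dict, Iterable, Optional, Set


def missing_ancestors(
    qsv_question_ids: Iterable[str],
    parent_of: Dict[str, Optional[str]],
) -> Set[str]:
    """Return ancestor QuestionIds the QSV needs but does not contain.

    An empty result means the parent integrity constraint holds.
    """
    present = set(qsv_question_ids)
    missing: Set[str] = set()
    for qid in present:
        seen = {qid}                      # guards against accidental cycles
        parent = parent_of.get(qid)
        while parent is not None and parent not in seen:
            if parent not in present:
                missing.add(parent)
            seen.add(parent)
            parent = parent_of.get(parent)
    return missing
```

A composer could either reject a QSV with a non-empty result or auto-include the missing ancestors, which is the behaviour the UI section describes.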

Annotation → AnnotationVersion

Annotation (Identity — collection: pmAnnotation)
├── StudyId: Guid                      ┐
├── AnnotatorId: Guid | null           ├── composite identity (null for reconciliation, D45)
├── QuestionId: Guid                   ┘
├── PendingAnswer: {Value, Notes} | null  ← mutable auto-save buffer (D46)
├── CurrentVersionNumber: int
└── Versions: List<AnnotationVersion>  ← append-only (embedded in same document, D43)

AnnotationVersion (AV — Immutable)
├── Id: Guid                           ← AnnotationVersionId, globally unique
├── VersionNumber: int
├── AQVersionId: Guid                  ← which question version was answered
├── QSVId: Guid                        ← which question set version was active
├── Answer: TypedAnswer                ← bool/int/decimal/string/arrays
├── Notes: string?
├── CommittedBy: Guid                  ← who committed (important for reconciliation, D45)
├── StageId: Guid                      ← which stage this was committed from (D45)
├── CreatedAt: DateTime
├── CreatedByAction: "save" | "complete" | "impact-update"
└── SessionVersionId: Guid             ← which session submission included this

Key rules:

  • Candidate annotation identity is study-scoped: (StudyId, AnnotatorId, QuestionId). Not stage-scoped — this supports cross-stage question sharing via explicit version references.
  • Reconciliation annotation identity is (StudyId, QuestionId) with AnnotatorId: null (D45). Authorship is tracked per-AV via CommittedBy. The gold standard has no single owner — multiple reconcilers may contribute AVs across stages.
  • PendingAnswer is a mutable server-side auto-save buffer, updated with each auto-save (D46). Cleared on Save/Complete/Revert.
  • Every save/submit creates a new AnnotationVersion. Previous versions are preserved.
  • Each AnnotationVersion records the AQVersionId and QSVId that were active when the answer was collected, providing full audit traceability.
  • Annotations live in their own pmAnnotation collection (D41). The Study document holds reference arrays (annotationIds) only (D50). See Section 11 for rationale.
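The identity rules above can be illustrated with a small in-memory stand-in for the pmAnnotation collection. The class and method names are hypothetical, and dictionaries replace the real entities; the point is only that the key is study-scoped (not stage-scoped) and that a `None` annotator marks the reconciliation annotation.

```python
from typing import Dict, Optional, Tuple

# (study_id, annotator_id or None, question_id); None marks a reconciliation annotation
AnnotationKey = Tuple[str, Optional[str], str]


class AnnotationStore:
    """In-memory stand-in for the pmAnnotation collection (illustrative only)."""

    def __init__(self) -> None:
        self._by_key: Dict[AnnotationKey, dict] = {}

    def get_or_create(self, study_id: str, question_id: str,
                      annotator_id: Optional[str] = None) -> dict:
        key = (study_id, annotator_id, question_id)
        return self._by_key.setdefault(
            key, {"key": key, "pending_answer": None, "versions": []}
        )

    def commit(self, annotation: dict, committed_by: str, stage_id: str) -> dict:
        """Turn PendingAnswer into a new AV and clear the buffer."""
        pending = annotation["pending_answer"]
        if pending is None:
            raise ValueError("no pending answer to commit")
        av = {
            "version_number": len(annotation["versions"]) + 1,
            "answer": pending["value"],
            "notes": pending.get("notes"),
            "committed_by": committed_by,
            "stage_id": stage_id,         # the stage is recorded per AV, not in the key
        }
        annotation["versions"].append(av)
        annotation["pending_answer"] = None
        return av
```

With this shape, the same annotator answering the same question from two stages adds two AVs to one Annotation, while the reconciliation Annotation for that study and question is a separate entry.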

Session → SessionVersion

Session (Identity — collection: pmAnnotationSession)
├── Id: Guid                           ← SessionId
├── StageId: Guid
├── InvestigatorId: Guid
├── Reconciliation: boolean            ← is this a reconciliation session?
├── AnnotationIds: List<Guid>          ← references to Annotation entities in this session
├── CurrentVersionNumber: int
└── Versions: List<SessionVersion>     ← append-only (embedded in same document)

SessionVersion (ASV — Immutable)
├── Id: Guid                           ← SessionVersionId, globally unique
├── VersionNumber: int
├── Status: SessionStatus              ← Incomplete | Completed
├── QSVId: Guid                        ← which question set version this submission used
├── AnnotationAVMap: Map<AnnotationId, AVId>  ← pinned AV for each annotation at save time
├── CreatedAt: DateTime
├── CreatedByAction: "save" | "complete"
└── SubmittedAt: DateTime?

Key rules:

  • Sessions hold explicit AnnotationAVMap pinning each annotation to a specific AV at save time, not computed filters. There are no predicates or "latest" lookups — the session knows exactly which annotation versions it contains.
  • This replaces the current AnnotationSession.MatchingAnnotationsPredicate pattern (which uses computed filters based on Reconciled, StageId, AnnotatorId).
  • Sessions live in their own pmAnnotationSession collection (D42). The Study document holds reference arrays (sessionIds) only (D50). See Section 11 for rationale.
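To make the difference between explicit pinning and a "latest" lookup concrete, here is a small sketch. It assumes a simplified input shape (annotation id mapped to its ordered list of AV ids); the function name `snapshot_av_map` is hypothetical.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping


def snapshot_av_map(annotations: Dict[str, List[str]]) -> Mapping[str, str]:
    """Pin each annotation to its newest AV id at save time.

    `annotations` maps annotation id -> AV ids, oldest first (append-only).
    The result is a read-only copy, so later AVs never change an old snapshot.
    """
    pinned = {ann_id: av_ids[-1] for ann_id, av_ids in annotations.items() if av_ids}
    return MappingProxyType(pinned)
```

A SessionVersion built from such a snapshot keeps pointing at the AVs that existed when it was saved, even after the annotator saves again and a newer SessionVersion is written.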

Stages Are Unordered

Stages do not have an inherent sequential order. They can be concurrent and orthogonal — e.g., one stage focused on risk of bias, another on data extraction, running in parallel. The system must not assume stage ordering for any authority or precedence logic.


Section 2: Reconciliation Session — Gold Standard Authority

Why Not Reconciled Annotations?

The current system uses a Reconciled: bool flag on annotations to mark reconciliation answers. The brainstorming established that this approach has fundamental problems:

  1. Duplication: In 3 of 4 scenarios (SingleAnnotator, CandidateAgreement, ManualReconciliation-with-agreement), the system would need to create reconciled annotations that are identical copies of existing annotations.
  2. Authority ambiguity: A Reconciled flag doesn't distinguish between "this annotation was auto-promoted because it was the only one" vs "a reconciler reviewed this and approved it".
  3. Inconsistent with screening: The screening annotation feature uses FinalScreeningOutcome for authority — annotation should follow the same pattern.

The Reconciliation Session Model

Each study has a single reconciliation session that acts as the gold standard. Each time a reconciler reconciles a stage, the system creates a new version of this session. The latest version IS the project-level authoritative answer set for that study.

The reconciliation form presents all AQs in the project (the "Project Question Set"). Only AQs that are both in the current stage's QSV and defined as required are actually required on the form. AQs that are defined as required but do not belong to the current stage's QSV become optional, allowing reconcilers to see and optionally engage with the full project context.

Each stage's reconciliation therefore produces another version of the same session. Over time, every AQ that belongs to a stage receives a reconciled answer, and reconcilers always have the full context of previous answers available.
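The required/optional split on the reconciliation form can be sketched as plain set arithmetic. The function name and the use of string ids are assumptions made for illustration; stage questions that are not defined as required are treated as optional here, which is one reading of the rule.

```python
from typing import Dict, Iterable, Set


def classify_form_questions(
    project_aqs: Iterable[str],
    stage_qsv_aqs: Iterable[str],
    required_aqs: Iterable[str],
) -> Dict[str, Set[str]]:
    """Split the Project Question Set into required and optional form questions."""
    project = set(project_aqs)
    # Required on the form = in this stage's QSV AND defined as required.
    must_answer = project & set(stage_qsv_aqs) & set(required_aqs)
    return {"required": must_answer, "optional": project - must_answer}
```

For a project with questions q1..q4, a stage QSV of {q1, q2}, and required questions {q1, q3}, only q1 is required on the form; q3 is required in general but optional in this stage's reconciliation.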

The reconciliation session model retains its role as the gold standard authority. However, reconciliation answers are now implemented as AnnotationVersions (AVs) on reconciliation Annotations rather than as a separate ReconciliationAnswer entity. The committedBy and stageId fields on AV (see Section 1) replace the separate ResolvedById and OriginStageId fields that were previously on ReconciliationAnswer.

ReconciliationSession (per Study — follows Identity + Versions pattern)
├── CurrentVersionNumber: int
└── Versions: List<ReconciliationSessionVersion>   ← append-only

ReconciliationSessionVersion (Immutable — equivalent to a reconciliation ASV)
├── Id: Guid                           ← globally unique
├── VersionNumber: int
├── ReconciledStageId: Guid            ← which stage's reconciliation produced this version
├── ReconcilerId: Guid                 ← who reconciled
├── QSVId: Guid                        ← the stage's active QSV (audit)
├── AnnotationAVMap: Map<AnnotationId, AVId>  ← reconciliation Annotation → authoritative AV
├── ResolutionMetadata: Map<QuestionId, ResolutionMeta>  ← per-question resolution tracking
├── SubmittedAt: DateTime
└── ChangeReason: string?

ResolutionMeta (per-question metadata — replaces ReconciliationAnswer)
├── Resolution: Resolution             ← how this answer was determined
└── Rationale: string?                 ← optional per-answer rationale

The AnnotationAVMap maps each reconciliation Annotation (identified by annotationId) to its authoritative AV. Since reconciliation annotations have annotatorId: null (D45), each (studyId, questionId) pair has exactly one reconciliation Annotation whose latest AV is the gold standard.

Resolution enum:

| Value | Meaning | AV committed by |
| --- | --- | --- |
| SingleAnnotator | Only one annotator completed; auto-promoted | System (on behalf of the candidate) |
| CandidateAgreement | Multiple annotators agreed; reconciler bulk-approved | Reconciler (creating own AV with agreed answer) |
| ManualReconciliation | Annotators disagreed; reconciler manually resolved | Reconciler (creating own AV with their answer) |

See annotation-versioning-design.md Section 2.7 for the full reconciliation annotation model.

How Versioning Works

When a reconciler reconciles Stage B for a study that already has a v1 (from Stage A):

  1. Load v1's AnnotationAVMap (the current gold standard — a map of reconciliation annotations to their authoritative AVs)
  2. Stage B's QSV questions are required — reconciler must provide answers
  3. All other project questions are optional — previous answers shown as context
  4. Reconciler submits → new AVs created on reconciliation Annotations for Stage B questions, with committedBy: reconcilerId and stageId: Stage B
  5. Carried-forward entries from v1 retain their existing AVs (no new AV created for unchanged answers)
  6. Write v2 with the complete merged AnnotationAVMap

The latest version's AnnotationAVMap is always a complete snapshot of the gold standard — no need to traverse version history.
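The merge in steps 1 to 6 can be sketched as below. Dictionaries stand in for ReconciliationSessionVersion, and the function name `next_session_version` is hypothetical; only the fields needed to show the carry-forward behaviour are included.

```python
from typing import Dict, Optional


def next_session_version(
    previous: Optional[dict],
    stage_id: str,
    reconciler_id: str,
    new_avs: Dict[str, str],
) -> dict:
    """Build the next immutable version from the previous one plus new AVs.

    `new_avs` maps reconciliation annotation id -> newly committed AV id.
    Entries not present in `new_avs` are carried forward unchanged.
    """
    merged = dict(previous["annotation_av_map"]) if previous else {}
    merged.update(new_avs)                # new AVs replace earlier ones for the same annotation
    return {
        "version_number": previous["version_number"] + 1 if previous else 1,
        "reconciled_stage_id": stage_id,
        "reconciler_id": reconciler_id,
        "annotation_av_map": merged,      # complete snapshot, no history traversal needed
    }
```

Copying the previous map before updating keeps v1 intact, so the version history remains a sequence of immutable facts.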

Cross-Stage Overlap

For questions appearing in multiple stages (structurally forced by the parent integrity rule — see Section 1):

  • Stage A reconciler answers q1 → creates AV on q1's reconciliation Annotation → v1 maps {ann-q1: av-1}
  • Stage B reconciler sees v1's answer for q1 as context, plus Stage B's candidate annotations for q1
  • Stage B reconciler provides their own answer for q1 (it's required — q1 is in Stage B's QSV)
  • If they agree with Stage A's answer, they confirm it. If they disagree, they change it.
  • A new AV is created on q1's reconciliation Annotation with committedBy: Stage B reconciler, stageId: Stage B
  • v2 maps {ann-q1: av-2} — the Stage B reconciler's AV

Cross-stage disagreement is resolved through the natural act of reconciliation — no separate conflict detection or resolution workflow is needed. The reconciler sees the previous answer, considers the candidates, and makes a deliberate decision.

Key Properties

  • Not a DDD entity: The reconciliation session is a materialised decision record, not a traditional aggregate. It has no lifecycle or state transitions — each version is an immutable fact.
  • Uses AV mechanism: Rather than a separate ReconciliationAnswer entity, reconciliation answers are AVs on reconciliation Annotations (D45). This unifies the annotation model — both candidate and reconciliation answers are AVs, just on different Annotation entities.
  • Complete snapshot: Each version contains ALL authoritative answers (newly reconciled + carried forward). The latest version is self-sufficient.
  • Audit trail: Version history shows exactly how the gold standard evolved — which stage, which reconciler, what changed at each step.
  • Per-answer attribution: Each reconciliation AV records committedBy (which reconciler) and stageId (which stage), preserving full provenance even when answers are carried forward across versions. Resolution metadata (Resolution, Rationale) is tracked per-question in ResolutionMetadata.

Key Rules

  • The reconciliation session is per study — one gold standard per study for the entire project.
  • Once a reconciliation AV exists for a question, no further candidate sessions are accepted for that study/question in the originating stage.
  • Resolution metadata (Resolution, Rationale) lives on ResolutionMeta in the session version, never on the annotation itself. Annotations are pure data; the reconciliation session carries authority and resolution tracking.
  • The Rationale field defaults to optional. Admins can configure it as required per stage (MVP), with potential per-question configurability in the future.
  • Concurrent stage reconciliation: Handled by optimistic concurrency — if two reconcilers submit for the same study simultaneously, the second writer detects a version mismatch, reloads, merges (stage questions are mostly disjoint), and retries. This is rare in practice (stages reach coverage at different times).
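The optimistic concurrency rule could look roughly like the following. The store class, the exception, and the retry limit are all illustrative assumptions; a real implementation would rely on the database's conditional write rather than an in-memory counter.

```python
from typing import Dict, Tuple


class VersionConflict(Exception):
    """Raised when the stored version moved on since it was loaded."""


class InMemorySessionStore:
    """Stand-in for the persisted reconciliation session of one study."""

    def __init__(self) -> None:
        self.version = 0
        self.av_map: Dict[str, str] = {}

    def load(self) -> Tuple[int, Dict[str, str]]:
        return self.version, dict(self.av_map)

    def write(self, expected_version: int, av_map: Dict[str, str]) -> int:
        if expected_version != self.version:
            raise VersionConflict(f"expected v{expected_version}, found v{self.version}")
        self.version += 1
        self.av_map = dict(av_map)
        return self.version


def submit_with_retry(store: InMemorySessionStore,
                      stage_avs: Dict[str, str],
                      max_attempts: int = 3) -> int:
    """Load, merge this stage's AVs, write; reload and retry on a version mismatch."""
    for _ in range(max_attempts):
        version, current = store.load()
        merged = {**current, **stage_avs}
        try:
            return store.write(version, merged)
        except VersionConflict:
            continue                      # another reconciler won the race: reload and merge again
    raise VersionConflict("gave up after repeated conflicts")
```

Because the merge is re-run on freshly loaded data, the second writer's retry keeps the first writer's answers and adds its own.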

Section 3: Authority Determination Rules

When Is Authority Determined?

Authority is determined per stage when a study reaches sufficient annotation coverage. The outcome is a new version of the study's reconciliation session (see Section 2):

CompletedSessions == 0
  → Not ready. No action.

CompletedSessions == 1, MinAnnotators == 1
  → Auto-promote (SingleAnnotator).
  → System creates a new ReconciliationSessionVersion,
    adding reconciliation AVs for this stage's questions
    that carry the candidate's answers (D45).
  → Resolution = SingleAnnotator; AVs are committed by the System on behalf of the candidate.

CompletedSessions == 1, MinAnnotators > 1
  → Not ready. Needs more annotators.

CompletedSessions >= 2
  → Reconciliation required (regardless of MinAnnotators value).
  → Study enters the reconciliation pool for this stage.

MinAnnotators Is a Readiness Threshold

MinAnnotators controls when a study is eligible for authority determination, not what kind of determination occurs. The actual completed session count determines the pathway:

  • If exactly 1 session completed and that meets MinAnnotators → auto-promote.
  • If 2+ sessions completed → reconciliation is always required, even if MinAnnotators was 1.

How can a study have more sessions than MinAnnotators? Admins assign annotators to stages, not individual studies. Multiple annotators working on the same stage will independently annotate the same studies. The system does not prevent this — it is expected and handled by the reconciliation pathway.
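The decision table above maps directly onto a small function. The enum and function names are hypothetical, and rejecting `MinAnnotators < 1` is an added assumption, since the rules do not cover that case.

```python
from enum import Enum


class Pathway(Enum):
    NOT_READY = "not_ready"
    AUTO_PROMOTE = "auto_promote"                     # Resolution = SingleAnnotator
    RECONCILIATION_REQUIRED = "reconciliation_required"


def determine_pathway(completed_sessions: int, min_annotators: int) -> Pathway:
    """Apply the authority determination rules for one study in one stage."""
    if completed_sessions < 0 or min_annotators < 1:
        raise ValueError("completed_sessions must be >= 0 and min_annotators >= 1")
    if completed_sessions >= 2:
        # Always reconciled, regardless of the MinAnnotators value.
        return Pathway.RECONCILIATION_REQUIRED
    if completed_sessions == 1 and min_annotators == 1:
        return Pathway.AUTO_PROMOTE
    return Pathway.NOT_READY
```

Note that the session count, not the threshold, picks the pathway: `determine_pathway(2, 1)` yields reconciliation even though one annotator would have been enough.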


Section 4: Reconciliation Workflow

Random Assignment (Not Claiming)

Reconcilers are randomly assigned studies from the reconciliation pool. They do not browse or claim tasks.

Rationale:

  • Prevents cherry-picking (avoiding difficult studies)
  • Consistent with the screening annotation reconciliation pattern
  • Ensures fair workload distribution
  • Reduces bias in reconciliation outcomes

Workflow:

  1. Reconciler opens the reconciliation dashboard for a stage
  2. System presents a randomly-selected study from the unresolved pool
  3. Reconciler sees the full Project Question Set:
       • Stage questions (required): candidate answers shown side-by-side (blinded)
       • Previously reconciled questions (context): answers from earlier reconciliation session versions shown as read-only context
       • Other project questions (optional): available for the reconciler to answer if they choose
  4. Reconciler reviews and resolves (or skips with reason)
  5. System creates a new ReconciliationSessionVersion, carrying forward previous answers and adding the new stage's reconciled answers
  6. System presents the next random study
  7. Pool shrinks as studies are resolved
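A minimal sketch of the random assignment step follows. It assumes uniform selection and a per-reconciler `skipped` set so that a skipped study is not immediately offered again; neither detail is specified in this document, and the function name is hypothetical.

```python
import random
from typing import Iterable, Optional


def next_study(unresolved_pool: Iterable[str],
               skipped: Iterable[str] = (),
               rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick one study uniformly at random from the unresolved pool.

    Returns None when nothing is left to offer to this reconciler.
    """
    chooser = rng if rng is not None else random
    candidates = sorted(set(unresolved_pool) - set(skipped))
    if not candidates:
        return None
    return chooser.choice(candidates)
```

Passing a seeded `random.Random` makes the selection reproducible in tests, while production code can rely on the module-level generator.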

Reconciler Always Creates Own Annotation

When reconciliation takes place (any scenario where a reconciler is involved), the reconciler always creates their own annotation, even if they agree with a candidate's answer.

Rationale:

  • Pointing the gold-standard entry (the session's AnnotationAVMap) at a candidate's AnnotationVersion when the reconciler agrees would arbitrarily elevate one candidate over the other in the system.
  • The reconciler's annotation represents a deliberate decision, distinct from the candidates' independent assessments.
  • Clean attribution: whenever a reconciler is involved, the authoritative AV is always the reconciler's own work (committedBy = reconciler, D45).

How this works for agreed studies (bulk approve):

  • Reconciler reviews a batch of studies where all candidates agree.
  • On bulk approve, the system creates reconciliation annotations on behalf of the reconciler (copying the agreed answer, attributed to the reconciler).
  • The AnnotationAVMap entry for each question points to the reconciler's newly created AV (D45).
  • Resolution = CandidateAgreement.

How this works for conflicted studies (manual resolution):

  • Reconciler manually reviews side-by-side comparisons.
  • Reconciler creates their own annotation for each question (may match a candidate or be entirely new).
  • The AnnotationAVMap entry points to the reconciler's AV (D45).
  • Resolution = ManualReconciliation.

Candidate Blinding — System Invariant

Candidate annotators must never see other candidates' answers during annotation. This is a system invariant for independent assessment, not merely a reconciliation UX consideration.

  • During annotation: annotator sees only their own previous answers (if returning to an in-progress session).
  • After completing their session: annotator still cannot see other candidates' answers.
  • Only reconcilers (and admins) see the side-by-side comparison of candidate answers.

This blinding rule applies to all annotation questions at all times, regardless of whether reconciliation is configured or has occurred.

Cross-Stage Visibility — Admin Configurable

When a question has been reconciled in a previous stage, the admin can configure how (or whether) that answer is shown to annotators and reconcilers in subsequent stages:

| Setting | Meaning |
| --- | --- |
| Blind | Annotators and reconcilers in this stage do not see reconciliation answers from other stages |
| ShowOwnPrior | Annotators see only their own prior answers from other stages (not the reconciled gold standard) |
| ShowReconciled | Reconcilers see the reconciled answer from other stages as context (annotators remain blinded) |

This is configured per stage by the project admin. Default: ShowReconciled for reconcilers, Blind for annotators (preserves independent assessment while giving reconcilers maximum context).
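One way to read the visibility settings is as a per-role filter over prior answers. The sketch below assumes the setting is stored per role (as the stated defaults suggest) and that any role/setting pair not described in the table shows nothing; both are assumptions, as are all names.

```python
from enum import Enum
from typing import Dict


class Visibility(Enum):
    BLIND = "Blind"
    SHOW_OWN_PRIOR = "ShowOwnPrior"
    SHOW_RECONCILED = "ShowReconciled"


# Defaults stated in the text: reconcilers see reconciled answers, annotators are blind.
DEFAULT_VISIBILITY = {
    "annotator": Visibility.BLIND,
    "reconciler": Visibility.SHOW_RECONCILED,
}


def visible_context(role: str, setting: Visibility,
                    own_prior: Dict[str, object],
                    reconciled: Dict[str, object]) -> Dict[str, object]:
    """Return the cross-stage answers a user may see, keyed by question id."""
    if role == "annotator" and setting is Visibility.SHOW_OWN_PRIOR:
        return dict(own_prior)
    if role == "reconciler" and setting is Visibility.SHOW_RECONCILED:
        return dict(reconciled)
    # Blind, or a role/setting pair the table does not define: show nothing.
    return {}
```

Under every setting an annotator never receives the `reconciled` map, which keeps the candidate blinding invariant intact.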


Section 5: Agreement Metrics

Metric Definitions

Two metrics are computed:

Percent Agreement (all question types):

PercentAgreement = (agreed_questions / total_questions) × 100

Simple, intuitive, universally applicable.

Cohen's Kappa (categorical questions only):

κ = (Po - Pe) / (1 - Pe)
where:
  Po = observed agreement proportion
  Pe = expected agreement by chance

Accounts for chance agreement. Only meaningful for categorical (boolean, select, checklist) questions; null for free-text and numeric questions.
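Both metrics can be computed in a few lines for the two-annotator case. This sketch assumes two equal-length answer lists for one categorical question across studies; the function names are hypothetical, and returning `None` when chance agreement is 1 (both annotators always give the same single category) is an assumption about how the undefined case should be reported.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def _check(a: Sequence[Hashable], b: Sequence[Hashable]) -> None:
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty answer lists of equal length")


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of items on which both annotators gave the same answer, in percent."""
    _check(a, b)
    agreed = sum(1 for x, y in zip(a, b) if x == y)
    return agreed / len(a) * 100


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> Optional[float]:
    """Cohen's kappa: (Po - Pe) / (1 - Pe), or None when Pe == 1."""
    _check(a, b)
    n = len(a)
    po = sum(1 for x, y in zip(a, b) if x == y) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal proportion per category.
    pe = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if pe == 1.0:
        return None
    return (po - pe) / (1.0 - pe)
```

For answers `[True, True, False, False]` and `[True, False, False, False]`, Po is 0.75 and Pe is 0.5, so percent agreement is 75 and kappa is 0.5.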

Metric Granularity

| Level | Computation |
| --- | --- |
| Per question, per study | Do all annotators agree on this question for this study? |
| Per question, across stage | Percent agreement and kappa across all studies with 2+ annotators |
| Per study | Percent of questions in agreement for this study |
| Stage aggregate | Average percent agreement across all studies |

When Metrics Are Computed

  • On session completion: When a study reaches 2+ completed sessions, per-study metrics are computed.
  • On reconciliation resolution: Updated to reflect resolved state.
  • On demand: Admin can trigger recomputation for a stage.
  • Background: MassTransit consumer handles computation asynchronously.

Section 6: Screening Integration

Separate Processes, Aligned Patterns

Screening annotation reconciliation and data annotation reconciliation are separate processes with aligned patterns and shared infrastructure.

| Aspect | Screening Reconciliation | Annotation Reconciliation |
| --- | --- | --- |
| What is reconciled | Include/exclude decision + exclusion reasons | Annotation answers per question |
| Authority mechanism | FinalScreeningOutcome | Reconciliation Session (versioned gold standard per study) |
| Pool management | Random assignment from pool | Random assignment from pool |
| Bulk operations | Bulk approve agreed studies | Bulk approve agreed studies |
| Reconciler annotation | Creates own screening annotation | Creates own data annotation |
| Outcome | Study included or excluded | Authoritative answer per question (latest session version) |

Why not combine? The entities are fundamentally different (boolean include/exclude vs typed multi-question answers), the reconciliation UIs are different (simple decision vs side-by-side multi-question review), and the downstream effects are different (pipeline gating vs data authority).

What is shared: Pool management logic, random assignment algorithm, bulk approve patterns, metrics computation framework, permission model.

Ordering Constraint

When a project uses both screening and annotation stages:

Candidates complete screening + annotations
  → Screening Annotation Reconciliation
    → Reconciler resolves screening conflicts
      → FinalScreeningOutcome
        → Exclude: Study is excluded. No annotation reconciliation needed.
        → Include: Study is included. Check annotation sessions per stage.
          → CompletedSessions >= MinAnnotators?
            → Yes, 1 session, MinAnnotators==1:
                Auto-promote (SingleAnnotator)
                → New ReconciliationSessionVersion with auto-promoted answers
            → Yes, 2+ sessions:
                Study enters Annotation Reconciliation Pool for this stage
                → Reconciler resolves annotation conflicts
                  → New ReconciliationSessionVersion
                    (carries forward previous answers, adds this stage's)
                    → Study's gold standard evolves toward completeness
            → No:
                Not ready. Awaiting more annotation sessions.

Key constraint: Annotation reconciliation cannot proceed until screening reconciliation has determined that the study is included. Excluded studies are never reconciled for annotations — this saves work and avoids meaningless reconciliation of studies that won't be used.
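The ordering constraint can be sketched as a gate in front of the annotation pathway. The function follows the flow diagram in this section literally, including its `CompletedSessions >= MinAnnotators` readiness check; Section 3 states that 2+ sessions always require reconciliation, and whether the readiness check still applies in that case is not spelled out, so treat that branch as an assumption. The outcome strings and return labels are illustrative.

```python
from typing import Optional


def next_step(screening_outcome: Optional[str],
              completed_sessions: int,
              min_annotators: int) -> str:
    """Decide what happens to a study's annotations for one stage.

    `screening_outcome` is None until screening reconciliation has produced
    a FinalScreeningOutcome, then "Include" or "Exclude".
    """
    if screening_outcome is None:
        return "await_screening_reconciliation"
    if screening_outcome == "Exclude":
        return "excluded_no_annotation_reconciliation"
    if screening_outcome != "Include":
        raise ValueError(f"unknown screening outcome: {screening_outcome!r}")
    if completed_sessions == 0 or completed_sessions < min_annotators:
        return "not_ready"
    if completed_sessions == 1:
        return "auto_promote"             # SingleAnnotator
    return "enter_reconciliation_pool"
```

The first two branches are the key point: nothing annotation-related runs until the study is known to be included.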

No Bypass for Annotation Reconciliation (MVP)

Screening reconciliation supports bypass criteria (admin-configurable rules for skipping the pool) because the screening pipeline is an operational gate — studies must flow through to reach annotation. If every study required manual reconciliation, throughput would bottleneck.

Annotation reconciliation does not need bypass for MVP:

  • There is no equivalent pipeline pressure. Studies with agreed annotations get bulk-approved efficiently.
  • Removing reconciler involvement entirely (bypass) for some studies could undermine the quality assurance purpose of reconciliation.
  • Bulk approve already handles the efficiency concern for agreed studies without removing reconciler oversight.
  • Bypass can be considered as a future enhancement if demand arises.


Section 7: UI/UX Design Decisions

Question Management (Design + Assign Tabs)

  • Editing a question: Opens a "Save as New Version" flow. Admin sees the current version number and is informed they are creating a new version.
  • Version history: Panel showing all versions with diffs. Side-by-side comparison between any two versions.
  • Version badge: Displayed on each question card (e.g., "v3").
  • Locked questions: Questions that have been answered are marked as locked. Editing is still possible but triggers the versioning wizard.

Question Set Version Management

  • Creating a new QSV: Admin selects which AQVersionIds to include and their display order. Drag-and-drop reordering. Version selection per question. Parent integrity is enforced: all ancestors of any selected AQV are automatically included.
  • QSV immutability: Any change to the question set (add, remove, reorder, upgrade version) creates a new QSV.
  • Admin decision on active sessions: When a new QSV is assigned to a stage and sessions exist under the previous QSV, the admin chooses what to do. This is an admin decision, not a system-prescribed behaviour. Options may include: pin existing sessions to their starting QSV, invalidate in-progress sessions, require re-annotation for specific questions, etc. The system presents the options; the admin decides.
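
The parent integrity rule amounts to taking the ancestor closure of the admin's selection. A minimal sketch, assuming a hypothetical `parent_of` lookup that maps each AQVersionId to its parent's AQVersionId (or `None` for a root question); how the real model resolves a parent version is not specified here.

```python
def close_over_ancestors(selected, parent_of):
    """Return the selected AQVersionIds plus every ancestor of each one.

    parent_of is a hypothetical mapping: AQVersionId -> parent AQVersionId,
    with None (or a missing key) meaning the question has no parent.
    """
    included = set()
    for version_id in selected:
        current = version_id
        # Walk upwards until we hit a root or something already included.
        while current is not None and current not in included:
            included.add(current)
            current = parent_of.get(current)
    return included
```

For example, selecting only a grandchild question yields a set containing the grandchild, its parent, and the root, which is the set the new QSV would have to contain.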

Annotation Form

  • Question version changes: If an admin creates a new QSV that upgrades a question version and configures re-annotation, annotators are informed of what changed. The notification is informational — it helps the annotator understand why they are being asked to re-answer.
  • Per-question saving: Form saves individual answers rather than the entire session, enabling auto-save and reducing data loss risk.

Reconciliation Dashboard

  • Random presentation: Reconciler clicks "Next study" to receive a randomly-assigned study from the pool. No browsing or selection.
  • Full Project Question Set: The reconciliation form shows all project AQs. The current stage's QSV questions are required; other project questions are optional. Previously reconciled answers (from earlier session versions) are shown as context.
  • Candidate blinding: Reconciler sees all candidates' answers side-by-side for the current stage's questions. Candidate annotators never see each other's answers (system invariant).
  • Anonymised candidates: During reconciliation review, candidate answers are presented anonymously (e.g., "Annotator A", "Annotator B") to prevent identity bias. The reconciler does not know which annotator gave which answer.
  • Cross-stage context: For questions that have been reconciled in previous stages, the reconciler sees the existing gold standard answer alongside the current stage's candidates (if applicable).
  • Bulk approve: For studies where all candidates agree, reconciler can approve in bulk. System creates reconciliation annotations on the reconciler's behalf and writes a new ReconciliationSessionVersion.
  • Manual resolution: For conflicted studies, reconciler enters their own answer per question. Optional rationale per decision (admin-configurable to require).
  • Metrics visibility: Per-question and per-study agreement metrics visible on the dashboard.
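
Three of the dashboard behaviours above (random presentation, anonymised candidates, and the agreement check behind bulk approve) can be sketched in a few lines. This is illustrative only: the function names and the dictionary shapes are assumptions, and agreement uses exact equality.

```python
import random
import string


def next_study(pool, rng):
    """Hand the reconciler a random study id from the pool, or None if empty."""
    return rng.choice(sorted(pool)) if pool else None


def anonymise(candidate_answers, rng):
    """Relabel candidates as 'Annotator A', 'Annotator B', ... in shuffled order.

    candidate_answers: {annotator_id: {question_id: answer}} (assumed shape).
    Supports up to 26 candidates, which is enough for a sketch.
    """
    ids = list(candidate_answers)
    rng.shuffle(ids)
    return {
        f"Annotator {string.ascii_uppercase[i]}": candidate_answers[annotator_id]
        for i, annotator_id in enumerate(ids)
    }


def all_agree(candidate_answers):
    """True when every candidate gave identical answers to every question."""
    answers = list(candidate_answers.values())
    return all(a == answers[0] for a in answers[1:])
```

Shuffling before labelling matters: if labels followed a stable order such as session creation time, a reconciler could learn over time which annotator tends to be "Annotator A".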

Section 8: Migration Strategy

Verified: No Existing Reconciliation Data

Production database verified via MongoDB MCP:

  • Sessions: 194,741 total, all with Reconciliation: false
  • Annotations: 3,096,894 total, all with Reconciled: false

There are zero reconciliation sessions or reconciled annotations in production. The reconciliation feature was partially implemented in the backend but never exposed in the UI and never used. This means:

  • No legacy reconciliation data handling is needed.
  • The Reconciled flag can be phased out cleanly.
  • Migration is a clean slate for reconciliation-related entities.

Migration Approach: Additive Only

No data is deleted or moved. All changes are additive.

Phase 1: Versioning Infrastructure

  • Add CurrentVersionNumber and Versions[] to AnnotationQuestion
  • Add Scope, OwnerId fields (default: Project scope, ProjectId as owner for existing questions)
  • Mark all existing questions as v1 (accurate — they have never been versioned)
  • Create initial QuestionSet and QuestionSetVersion for each stage based on current Stage.AnnotationQuestions
  • Add version reference fields to annotations (nullable initially, backfilled in Phase 2)
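
To make "additive only" concrete for Phase 1, here is a rough sketch of upgrading one legacy question document. It operates on a plain dictionary rather than a live collection, and the inner `VersionNumber` key and the function name are hypothetical.

```python
import copy


def upgrade_question(doc, project_id):
    """Return a copy of a legacy question document with Phase 1 fields added.

    Existing keys are never removed or overwritten (additive only), so the
    function is safe to run more than once on the same document.
    """
    upgraded = copy.deepcopy(doc)
    upgraded.setdefault("CurrentVersionNumber", 1)
    # "VersionNumber" is a placeholder for whatever a version entry holds.
    upgraded.setdefault("Versions", [{"VersionNumber": 1}])
    upgraded.setdefault("Scope", "Project")
    upgraded.setdefault("OwnerId", project_id)
    return upgraded
```

Because only missing keys are filled in, rolling back means removing the four new fields and nothing else.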

Phase 2: Reconciliation Session Backfill

  • For every study with exactly 1 completed session per stage where MinAnnotators == 1:
      • Create a ReconciliationSessionVersion whose answers carry Resolution = SingleAnnotator
      • ReconciliationAnswer.AnnotationVersionId points to the candidate's (now v1) AnnotationVersionId
  • Studies with 0 completed sessions: no action (not ready)
  • Studies with 2+ completed sessions: enter the reconciliation pool (no auto-backfill; a reconciler must review)

Phase 3: Feature Flag Rollout

  • New question management UI: behind feature flag (extends existing newQuestionManagement flag)
  • Reconciliation dashboard: behind new feature flag
  • Annotation form v2: behind feature flag (independent rollout)
  • Gradual enablement per project

Backward Compatibility

  • Existing annotation form continues to work throughout migration.
  • Existing question management UI continues to work.
  • API consumers see the same structures with additional optional fields.
  • The Reconciled boolean is preserved on existing annotations but not used for new authority determination (replaced by the Reconciliation Session model).

Section 9: Cross-Scope Sharing — Import, Fork, Publish

Discussion: GitHub Discussion #1598 — Feature Spec Question Templates (June 2024)

Problem

Many systematic review projects use the same or very similar annotation questions. Project admins spend time recreating identical questions across projects. CAMARADES also wants to provide "best practice methodology" question sets (e.g., risk of bias, reporting quality) and aggregate structured data across projects for automation/LLM training. This feature was specifically requested by the funder.

Design Principle: Reference-First with Copy-on-Write

Cross-scope sharing uses a reference model, not a copy model. Since AQVersions and QSVs are immutable, they are inherently safe to share across project boundaries — an immutable object can never cause consistency issues when referenced from multiple contexts.

Action               | Result
──────────────────── | ──────────────────────────────────────────────
Import QSV as-is     | Stage references the shared QSV directly. No copy created.
Modify imported QSV  | System creates a new project-local QSV (DerivedFrom → source QSVId)
Use shared AQ as-is  | QSV references the shared AQVersionId directly. No copy created.
Modify imported AQ   | System creates a new project-local AQ (DerivedFrom → source QuestionId) with a new project-local AQVersion (DerivedFrom → source AQVersionId)
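
The two QSV outcomes (reference on import, new local entity on modify) can be sketched with a frozen dataclass. The `QSV` class, its field names, and the `stage` dictionary are assumptions made for this example, not the entity model from Section 1.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)  # frozen: instances cannot be mutated after creation
class QSV:
    id: str
    scope: str
    aq_version_ids: Tuple[str, ...]
    derived_from: Optional[str] = None


def import_as_is(stage, shared):
    """Point the stage at the shared QSV; nothing is copied."""
    stage["active_qsv_id"] = shared.id
    return shared


def modify(stage, source, new_aq_version_ids):
    """Copy-on-write: any change yields a new project-local QSV."""
    local = replace(
        source,
        id=str(uuid.uuid4()),
        scope="Project",
        aq_version_ids=tuple(new_aq_version_ids),
        derived_from=source.id,
    )
    stage["active_qsv_id"] = local.id
    return local
```

The source QSV is untouched by `modify`, so every other project that references it keeps seeing exactly what it imported.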

This follows established patterns: Git (shared immutable commits, fork to modify), Docker (shared immutable layers), npm (shared published versions, fork to patch), and Copy-on-Write filesystems (ZFS/Btrfs).

DDD justification: Immutable objects satisfy the safety requirements for cross-aggregate sharing. In DDD, Value Objects can be freely shared because they have no mutable state. AQVersions are effectively Value Objects with an ID for referencing convenience. The Shared Kernel pattern explicitly supports this — multiple bounded contexts referencing shared immutable concepts.

Unified Entity Model

The QuestionSet and QuestionSetVersion entities defined in Section 1 serve both as project-local question set configurations (assigned to stages) and as cross-project reusable templates. There is no separate "template" entity — a template is simply a QuestionSet at a non-Project scope, optionally with Published: true and rich Metadata.

Within a project context, question sets are referred to as "Question Sets". When browsing the cross-project catalogue, they are presented as "Templates" in the UI. The distinction is purely presentational.

Asset Scoping Model

All versionable assets (AQ, AQVersion, QS, QSV) exist at one of four ownership scope levels:

Ownership Scopes:
  System        → SyRF platform (curated by CAMARADES)
  Organisation  → Org-private library (see Section 10)
  Researcher    → Personal library
  Project       → Project-local

Community Catalogue:
  Not a scope level — it is a query: WHERE Published == true
  Any asset at System, Organisation, or Researcher scope can be published

Visibility rules:

Asset Scope   | Published: false        | Published: true
───────────── | ─────────────────────── | ───────────────────────
System        | All authenticated users | All authenticated users
Organisation  | Org members only        | All authenticated users
Researcher    | Researcher only         | All authenticated users
Project       | Project members only    | N/A (promote to higher scope to publish)
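
The visibility table translates directly into a predicate. A minimal sketch, assuming the caller is already authenticated and that assets and users are plain dictionaries with the hypothetical keys shown in the docstring.

```python
def can_view(asset, user):
    """Apply the visibility table to one asset and one authenticated user.

    asset: {"scope", "owner_id", "published"}      (assumed shape)
    user:  {"id", "org_ids", "project_ids"}        (assumed shape)
    """
    scope = asset["scope"]
    if scope == "System":
        return True
    if scope == "Project":
        # Project assets cannot be published; visible to members only.
        return asset["owner_id"] in user["project_ids"]
    if asset["published"]:
        return True
    if scope == "Organisation":
        return asset["owner_id"] in user["org_ids"]
    if scope == "Researcher":
        return asset["owner_id"] == user["id"]
    raise ValueError(f"unknown scope: {scope}")
```

The Project branch is checked before the Published flag, so a stray `published: true` on a project-local asset never widens its audience.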

Derivation rules:

  • Any scope can derive (fork) from any equal-or-higher-visibility scope
  • Derivation creates a new immutable asset at the target scope
  • The derived asset contains DerivedFrom pointing to the source
  • Unmodified assets in the source are referenced, not copied

Project Question Library

A project maintains a Question Library — a manifest of all AQ references available to the project. This is a flat set of AQVersionId references (not entity copies):

ProjectQuestionLibrary
├── Entries: Set<QuestionLibraryEntry>

QuestionLibraryEntry
├── QuestionId: Guid
├── VersionIds: List<Guid>            ← AQVersionIds available in this project
├── Ownership: Owned | Imported
└── SourceScope: System | Organisation | Researcher | Project

Import invariant: Any QSV assigned to a stage must have all of its constituent AQVersionIds already present in the project's Question Library. This is enforced at QSV assignment time.
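
As a sketch of how the import invariant might be enforced at assignment time, the check below rejects a QSV whose AQVersionIds are not all in the library. Entries follow the QuestionLibraryEntry shape above; the function names, the `ActiveQsvId` key, and the choice of `ValueError` are assumptions.

```python
def missing_from_library(qsv_version_ids, library_entries):
    """Return the AQVersionIds the QSV needs that the library does not hold."""
    available = {
        version_id
        for entry in library_entries
        for version_id in entry["VersionIds"]
    }
    return set(qsv_version_ids) - available


def assign_qsv(stage, qsv_id, qsv_version_ids, library_entries):
    """Assign a QSV to a stage only if the import invariant holds."""
    missing = missing_from_library(qsv_version_ids, library_entries)
    if missing:
        raise ValueError(f"not in Question Library: {sorted(missing)}")
    stage["ActiveQsvId"] = qsv_id  # hypothetical field name
```

A failed assignment leaves the stage on its previous QSV, so the admin can import the missing versions and retry.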

The Fork Graph

The DerivedFrom fields on AQ, AQVersion, QS, and QSV create a fork graph (DAG) that provides full provenance. Any project-local asset can be traced back through its derivation chain to its origin:

AQ "Sample Size" (System scope)
  └── AQVersion v1 (System scope)
        ├── referenced as-is by Org A's QSV (no fork, shared reference)
        │     └── referenced as-is by Project X's stage (no fork)
        └── forked by Project Y → Project AQVersion v1
              (DerivedFrom = System AQVersion v1)
              (added "Include dropouts?" option)
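
Tracing provenance is a walk along DerivedFrom pointers. The sketch below assumes a hypothetical `derived_from` mapping from asset id to source id (`None` for an original asset); the cycle guard is defensive, since a correctly built fork graph is acyclic.

```python
def provenance(asset_id, derived_from):
    """Walk DerivedFrom pointers from an asset back to its origin.

    Returns the chain starting at asset_id and ending at the original asset.
    """
    chain = [asset_id]
    seen = {asset_id}
    while (parent := derived_from.get(chain[-1])) is not None:
        if parent in seen:
            raise ValueError("cycle in DerivedFrom graph")
        chain.append(parent)
        seen.add(parent)
    return chain
```

Applied to the example above, Project Y's forked AQVersion resolves to a two-element chain ending at the System-scope v1, while an asset used as-is has a chain of length one.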

Import Workflow

Import as-is (reference, no copy):

  1. Admin browses community catalogue (search/filter by metadata) or org/system libraries
  2. Selects a QuestionSet (and optionally specific questions if SelectiveImport policy)
  3. AQVersionIds are added to the project's Question Library
  4. Stage's active QSV is set to the selected QSVId (or a subset QSV is created)
  5. No new AQ or AQVersion entities created — the stage references the shared immutable assets
  6. Annotations will reference the shared AQVersionIds directly

Fork to customize:

  1. Admin wants to modify an imported QSV (add/remove/reorder questions, or edit a question)
  2. For modified questions: system creates project-local AQ + AQVersion with DerivedFrom pointers
  3. System creates project-local QSV with DerivedFrom pointer, referencing:
       • Shared AQVersionIds for unchanged questions (shared reference)
       • Project-local AQVersionIds for modified questions (forked)
  4. Stage now references the project-local QSV
  5. New project-local entities added to the Question Library

Publish Workflow

Publish project questions to the community catalogue:

  1. Admin selects questions from their project
  2. Admin enters QuestionSet metadata (name, domain, species, guidance URL)
  3. System creates a new QuestionSet at Researcher or Organisation scope with Published: true
  4. The QSV references the selected AQVersionIds (shared immutable references)
  5. QuestionSet versioned independently from that point

Cross-Project Aggregation

The reference model makes cross-project data aggregation trivially correct:

Projects using a shared QSV as-is:
  → Annotations directly reference shared AQVersionIds
  → Query: ReconciliationSession answers WHERE AQVersionId IN [shared AQVersionIds]
  → Result: All authoritative answers, semantically identical across projects

Projects that forked:
  → Follow DerivedFrom chain to identify forked AQVersions
  → Decide if forks are semantically comparable (did the fork change meaning?)

This is the key enabler for CAMARADES' LLM training data extraction use case: when hundreds of projects reference the same AQVersionId, their reconciliation session answers are structured, comparable training data with zero ambiguity about semantic equivalence.
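
The aggregation query for as-is projects is a filter and a group-by. A minimal in-memory sketch, assuming each authoritative answer is a dictionary with hypothetical `ProjectId`, `AQVersionId`, and `Value` keys; a real implementation would run this against the stored reconciliation session answers.

```python
from collections import defaultdict


def aggregate(answers, shared_version_ids):
    """Group authoritative answers by shared AQVersionId across projects."""
    shared = set(shared_version_ids)
    grouped = defaultdict(list)
    for answer in answers:
        if answer["AQVersionId"] in shared:
            grouped[answer["AQVersionId"]].append(
                (answer["ProjectId"], answer["Value"])
            )
    return dict(grouped)
```

Answers recorded against forked AQVersions are excluded by construction; including them is a separate decision that needs the DerivedFrom chain and a judgement about semantic comparability.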

System Questions as a QuestionSet

The 18 hardcoded system questions (with fixed GUIDs) should be remodelled as a CAMARADES curated QuestionSet — the "SyRF Core" question set at System scope with ImportPolicy: AllOrNothing and EditPolicy: ReadOnly. This provides:

  • A versioning pathway for system questions (currently impossible)
  • Conceptual alignment: system questions ARE shared questions from the platform
  • A migration path to eventually remove hardcoded GUIDs

This is a future migration, not an MVP requirement.

Physical Storage

All AQs live in a single global collection (pmAnnotationQuestion) regardless of scope. Scope + OwnerId fields provide access control. No routing logic needed — query by scope/owner.

All QuestionSets live in a single global collection (pmQuestionSet) regardless of scope. Same pattern.

Since QSVs reference AQVersionIds by Guid, resolution is straightforward — query pmAnnotationQuestion by AQVersionId. The immutability guarantee makes this efficient and infinitely cacheable.
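
Because a resolved AQVersion can never change, a cache needs no invalidation logic. The sketch below uses an in-memory list as a stand-in for pmAnnotationQuestion; the `VersionResolver` class and the `Id`/`Versions` keys are illustrative assumptions.

```python
class VersionResolver:
    """Resolve AQVersionIds against a stand-in for pmAnnotationQuestion."""

    def __init__(self, questions):
        self._questions = questions
        self._cache = {}
        self.lookups = 0  # number of times the collection was scanned

    def resolve(self, version_id):
        """Return the version document, scanning the collection at most once."""
        if version_id in self._cache:
            return self._cache[version_id]
        self.lookups += 1
        for question in self._questions:
            for version in question["Versions"]:
                if version["Id"] == version_id:
                    # Safe to keep forever: versions are immutable.
                    self._cache[version_id] = version
                    return version
        raise KeyError(version_id)
```

Misses are deliberately not cached, since a version id that does not exist yet may be created later.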

Prioritization

Phase          | Scope                                                                                  | Depends On
────────────── | ────────────────────────────────────────────────────────────────────────────────────── | ───────────────────────────────
With M1        | Scope, OwnerId, DerivedFrom fields on AQ/AQVersion (with defaults for existing data)   | Question versioning (M1)
With M1        | pmQuestionSet collection, QS/QSV entities, stage references QSVId                      | Question versioning (M1)
New milestone  | Curated catalogue, import-as-reference workflow, Question Library                      | M1 complete
Follow-on      | Fork-to-customize workflow, community publishing, metadata search                      | Import MVP
Follow-on      | Cross-project aggregation API, data export, LLM training pipeline                      | Sharing + Reconciliation Session
Follow-on      | System questions as QuestionSet, update propagation notifications                      | Sharing stable

Section 10: Organisation Model

Overview

Organisations provide a multi-tenancy layer above projects. Projects and users belong to organisations, enabling:

  • Private template libraries: Org-level question templates visible only to org members
  • Centralised user management: Invite/remove users at org level
  • Shared methodology: Enforce or recommend org-standard question sets
  • Permission hierarchy: Org-level roles in addition to existing project/stage roles

Entity Model

Organisation (New Aggregate — new collection: pmOrganisation)
├── Id: Guid
├── Name: string
├── Slug: string                      ← URL-friendly identifier
├── Settings: OrgSettings
│   ├── DefaultEditPolicy: Editable | ReadOnly | EditableWithWarning
│   ├── RequireTemplateForNewStages: bool
│   └── AllowCommunityPublishing: bool
├── Members: List<OrgMembership>
│   ├── InvestigatorId: Guid
│   ├── Role: OrgOwner | OrgAdmin | OrgMember
│   └── JoinedAt: DateTime
└── CreatedAt: DateTime

Relationship to Existing Entities

Organisation
  ├── has many → Projects (Project.OrganisationId: Guid?)
  ├── has many → Members (OrgMembership → Investigator)
  └── owns → Org-scoped AQs, AQVersions, QuestionSets

Project (modified)
  ├── OrganisationId: Guid?           ← nullable for backward compatibility
  └── (all existing fields unchanged)

Investigator (modified)
  ├── OrgMemberships: List<OrgMembershipRef>?
  │   ├── OrganisationId: Guid
  │   └── Role: OrgOwner | OrgAdmin | OrgMember
  └── (all existing fields unchanged)

Permission Hierarchy

System (SyRF platform)
  └── Organisation
        ├── OrgOwner:  Full org management, delete org, transfer ownership
        ├── OrgAdmin:  Manage members, manage org templates, create projects
        ├── OrgMember: View org templates, import into projects, create projects
        └── Project (inherits org context but NOT automatic access)
              ├── ProjectAdmin: Manage project settings, stages, members
              ├── ProjectMember: Participate in assigned stages
              └── Stage
                    ├── Annotate
                    ├── Reconcile
                    └── (other stage activities)

Key principle: Org-level roles do not automatically grant project-level access. An OrgAdmin can create projects and manage org templates, but must still be explicitly added to a project to access its data. This follows the principle of least privilege and prevents accidental data exposure.
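
The key principle is easiest to see as two independent checks. A minimal sketch: the `member_ids` key on the project is hypothetical, and treating OrgOwner as able to manage templates is an inference from "Full org management" rather than something stated outright.

```python
def can_manage_org_templates(investigator_id, org):
    """Org-level capability: decided by the org role alone."""
    roles = {m["InvestigatorId"]: m["Role"] for m in org["Members"]}
    return roles.get(investigator_id) in {"OrgOwner", "OrgAdmin"}


def can_access_project(investigator_id, project):
    """Project-level access: only explicit project membership counts.

    The investigator's org role is deliberately not consulted.
    """
    return investigator_id in project["member_ids"]
```

Keeping the org role out of `can_access_project` entirely, rather than adding an override branch, means an OrgAdmin gains project access the same way everyone else does: by being added to the project.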

Template Visibility with Organisations

Visibility matrix:

Who can see...          | System | Org A private | Org B private | Community | Project X (Org A)
─────────────────────── | ────── | ───────────── | ───────────── | ──────── | ─────────────────
System templates        | ✓      | ✓             | ✓             | ✓        | ✓
Org A templates         | ✗      | ✓             | ✗             | ✗        | ✓
Org B templates         | ✗      | ✗             | ✓             | ✗        | ✗
Community templates     | ✓      | ✓             | ✓             | ✓        | ✓
Project X local assets  | ✗      | ✗             | ✗             | ✗        | ✓

An OrgAdmin can publish an org template to the community catalogue, changing its visibility from org-private to public. This is a one-way promotion (private → public).

Researcher-Level Libraries

In addition to org and project scopes, individual researchers can maintain a personal question library:

Researcher-scoped assets:
├── Scope: Researcher
├── OwnerId: InvestigatorId
├── Visibility: private to the researcher
└── Can be imported into any project the researcher is a member of

This enables researchers to build up personal libraries of question sets they reuse across their projects and organisations.

Migration Approach

The organisation model is additive — existing projects and users continue to work without organisations:

  1. Project.OrganisationId is nullable — existing projects have no org
  2. Investigator.OrgMemberships is nullable — existing users have no org affiliations
  3. Organisations can be created and populated gradually
  4. Existing projects can be migrated into organisations when ready

  • Phase 1: Add nullable org fields to Project and Investigator
  • Phase 2: Organisation CRUD, member management, org-level UI
  • Phase 3: Org template libraries, visibility rules
  • Phase 4: Org settings enforcement (e.g., require template for new stages)

MongoDB Storage

New collection: pmOrganisation

{
  _id: CSUUID("..."),
  name: "CAMARADES Edinburgh",
  slug: "camarades-edinburgh",
  settings: {
    defaultEditPolicy: "EditableWithWarning",
    requireTemplateForNewStages: false,
    allowCommunityPublishing: true
  },
  members: [
    {
      investigatorId: CSUUID("..."),
      role: "OrgAdmin",
      joinedAt: ISODate("2026-01-15T00:00:00Z")
    }
  ],
  createdAt: ISODate("2026-01-15T00:00:00Z")
}

Modified pmProject:

{
  _id: CSUUID("..."),
  organisationId: CSUUID("..."),  // nullable — null for legacy projects
  // ... all existing fields unchanged
}

Decision Log

Summary of all decisions made during brainstorming, for quick reference:

#   | Decision | Rationale
─── | ──────── | ─────────
D1  | Identity + Immutable Versions pattern for all entities | Provides full audit trail, enables time-travel queries, consistent pattern
D2  | AQVersionId as sole reference in QSV | QuestionId is derivable; single reference simplifies lookups
D3  | Explicit AnnotationVersionIds in sessions (not computed filters) | Eliminates ambiguity; each session knows exactly what it contains
D4  | Annotation identity is study-scoped: (StudyId, AnnotatorId, QuestionId) | Supports cross-stage question sharing via explicit version refs
D5  | Reconciliation Session as authority mechanism (not Reconciled flag or separate FAO entity) | Versioned gold standard per study; each stage's reconciliation is another version; latest version IS the authoritative answer set
D6  | Resolution metadata on ReconciliationAnswer only, not on annotations | Clean separation: annotations are pure data; the reconciliation session carries authority
D7  | Reconciler always creates own annotation when involved | Avoids arbitrarily elevating one candidate; clean attribution
D8  | MinAnnotators is a readiness threshold, not authority mechanism | CompletedSessions count determines pathway, not MinAnnotators
D9  | Random assignment for reconciliation (not claiming) | Prevents cherry-picking; fair distribution; matches screening pattern
D10 | Candidate blinding is a system invariant | Independent assessment requires annotators never see each other's answers
D11 | QSV is immutable; admin decides session handling on change | System does not prescribe what happens to sessions; admin chooses
D12 | Rationale is optional by default, admin-configurable | Per-stage for MVP; potentially per-question in future
D13 | Separate screening and annotation reconciliation processes | Different entities, UIs, and downstream effects; share infrastructure
D14 | Screening reconciliation precedes annotation reconciliation | Excluded studies skip annotation reconciliation entirely
D15 | No annotation reconciliation bypass for MVP | Bulk approve handles efficiency; bypass risks undermining quality assurance
D16 | Reconciler annotations attributed to reconciler | Bulk approve creates annotations on reconciler's behalf, not copied from candidates
D17 | No legacy reconciliation migration needed | Verified: 0 reconciled annotations in production
D18 | Additive-only migration | No data deleted; new fields alongside existing; rollback = $unset
D19 | Reference-first with copy-on-write for cross-scope sharing | Immutable assets are safe to share; eliminates duplication; cross-project aggregation is trivially correct
D20 | Four-level ownership scoping: System, Organisation, Researcher, Project | Enables private org libraries, personal libraries, and project-local customization
D21 | Fork graph (DAG) for provenance tracking via DerivedFrom | Complete traceable lineage; any asset traceable back to its origin
D22 | QuestionSet/QuestionSetVersion unifies SQS and QuestionTemplate | Single entity serves both as stage question config and cross-project template; eliminates TemplateVersion indirection
D23 | Organisation as new aggregate with nullable foreign keys | Additive adoption; existing projects/users unaffected until migrated
D24 | Org roles do NOT grant automatic project access | Principle of least privilege; explicit project membership still required
D25 | System questions to be remodelled as CAMARADES curated QuestionSet (future) | Enables versioning of system questions; conceptual alignment
D26 | Researcher-level personal question libraries | Researchers build reusable libraries across their projects
D27 | Community is a Published flag, not a scope level | Community catalogue = WHERE Published == true; avoids a fifth scope level; any ownership scope can publish
D28 | Single global collection for all AQs (pmAnnotationQuestion) | Scope + OwnerId fields for access control; no routing logic; shared immutable versions are safe
D29 | Parent integrity: all AQV ancestors must be in the same QSV | Enforced at QSV composition time; creates structural cross-stage question overlap
D30 | Stages are unordered; they can be concurrent and orthogonal | No sequential ordering assumption; no "latest stage wins" precedence
D31 | Cross-stage question overlap is structural, not exceptional | Forced by parent integrity rule; expected and common pattern
D32 | Cross-stage visibility is admin-configurable per stage | Blind, ShowOwnPrior, or ShowReconciled; default: reconcilers see context, annotators are blinded
D33 | Reconciliation session is a materialised decision record, not a DDD entity | No lifecycle or state transitions; computed synchronously on the study document; not a separate aggregate
D34 | Reconciliation form shows full Project Question Set | Stage questions required, others optional; reconcilers have full project context; cross-stage answers visible
D35 | Cross-stage disagreement resolved through natural reconciliation | Later stage's reconciler sees and engages with earlier answers; no separate conflict resolution workflow needed
D36 | Project Question Library as flat reference manifest | Set of AQVersionId references, not entity copies; import invariant enforced at QSV assignment

Section 11: Annotation Versioning & Entity Model Refinements (D37-D50)

Source: annotation-versioning-design.md — brainstorming session (Feb 2026). These decisions refine D1-D36 above. Where they conflict, the D37-D50 decisions take precedence and the affected sections of this document have been updated inline (see audit trail comments).

Decision Table

#   | Decision | Rationale
─── | ──────── | ─────────
D37 | DraftAQ is a separate type from AQ | Creation phase has fundamentally different invariants (all properties mutable) vs modification phase (identity frozen, only content versionable). Conditional logic signals two domain concepts forced into one model.
D38 | dataType, parentId, groupAsSingle are identity properties, immutable after publish | Changing parent restructures entity subtrees and form shape. Changing dataType invalidates all AVs. These are not "new versions of the same question"; they are different questions.
D39 | SQS does not need a Draft type | SQS has no identity properties. Its only content (questionIds) is the same shape for initial assignment and subsequent modification. pendingChanges is sufficient.
D40 | "Pending" terminology for existing entities, "Draft" for under-construction | Avoids confusion. "Draft" = not yet real. "Pending" = real entity with uncommitted edits.
D41 | Annotations are own aggregate in pmAnnotation collection | Avoids Study document contention and unbounded growth. Annotation + embedded AVs form natural aggregate boundary.
D42 | Sessions are own aggregate in pmAnnotationSession collection | Same reasoning as D41. Session + embedded ASVs form natural aggregate boundary.
D43 | AVs embedded in Annotation document (not separate collection) | Annotation and its AVs must be consistent (currentAVId must match). Embedding gives single-doc atomicity for AV creation.
D44 | Cross-stage annotation sharing via same Annotation entity with multiple AVs | No forking, no copying. Both stages' sessions reference the same annotationId. Editing creates a new AV; other stages' ASVs pin to the older AV.
D45 | Reconciliation annotations have annotatorId: null and track authorship on AVs | Reconciliation annotations are shared across stages AND reviewers. The gold standard has no single owner; authorship is tracked per-AV via committedBy.
D46 | Server-side auto-save via pendingAnswer field | Users change machines/browsers. Client-only auto-save (IndexedDB) is insufficient. pendingAnswer is a mutable field on the Annotation document, updated with each auto-save.
D47 | Save/Complete operations use MongoDB multi-document transactions | Save touches N annotation docs + 1 session doc. Transactions ensure consistency. Acceptable overhead for infrequent explicit user actions.
D48 | Graduated impact assessment at commit time, not immutable property restrictions | No property is "forbidden to edit"; the system assesses impact and warns proportionally. A dataType change with annotations is allowed, it just requires creating a new AQ (because it IS a new question).
D49 | entityInstanceId concept removed; annotationId serves as stable entity identity | The existing Annotation.Id already provides stable identity for entity subtree references. No new concept needed. Migration is direct: current Id becomes annotationId.
D50 | Study document holds only references (sessionIds, annotationIds) | Study is slimmed to metadata + reference arrays. Actual annotation and session data lives in their own collections. Prevents document bloat and contention.

Collection Boundaries Summary

Collection            | Aggregate root | Contents | Rationale
───────────────────── | ────────────── | ──────── | ─────────
pmStudy               | Study          | Study metadata + sessionIds: GUID[] + annotationIds: GUID[] (references only) | Slimmed to avoid contention and bloat (D50)
pmAnnotation          | Annotation     | Annotation identity + pendingAnswer + embedded avs: AV[] | Own aggregate boundary; grows with saves (D41, D43)
pmAnnotationSession   | Session        | Session metadata + annotationIds + embedded asvs: ASV[] | Own aggregate boundary; grows with saves (D42)
pmProject             | Project        | DraftAQs, AQs (or refs), SQS config | Unchanged
pmAnnotationQuestion  | AQ             | AQ identity + embedded AQVersions | Global collection, scope-based access (D28)
pmQuestionSet         | QS             | QS identity + embedded QSVs | Global collection, scope-based access

Key Patterns Established

  • DraftAQ → AQ lifecycle: Fully mutable drafts convert to versioned entities on activation (D37)
  • Identity / Content / Derived property classification: Determines which properties can be versioned vs require a new entity (D38)
  • Pending buffers: pendingChanges on AQ/SQS and pendingAnswer on Annotation enable server-side auto-save without creating versions (D40, D46)
  • Reconciliation via AV mechanism: Reconciliation answers are AVs on annotatorId: null annotations, unifying candidate and reconciliation answer storage (D45)
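
The pending-buffer pattern for annotations can be sketched as two operations on one document: auto-save overwrites the buffer, and an explicit save turns it into a new AV. The field names pendingAnswer, avs, currentAVId, and committedBy come from the decisions above; the `id` and `answer` keys and both function names are assumptions, and the sketch ignores the session document and transaction from D47.

```python
import uuid


def auto_save(annotation, answer):
    """Overwrite the mutable pending buffer; no version is created."""
    annotation["pendingAnswer"] = answer


def commit(annotation, committed_by):
    """Turn the pending answer into a new AV and clear the buffer.

    Returns the new AV, or None when there is nothing pending.
    """
    if "pendingAnswer" not in annotation:
        return None
    av = {
        "id": str(uuid.uuid4()),
        "answer": annotation.pop("pendingAnswer"),
        "committedBy": committed_by,
    }
    annotation["avs"].append(av)
    annotation["currentAVId"] = av["id"]
    return av
```

Any number of auto-saves between two commits produces exactly one AV, which keeps the version history aligned with deliberate user actions instead of keystrokes.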

Open Items

Items that remain unresolved or require product-owner input:

  1. Decimal comparison tolerance: Should decimal answer comparison use a tolerance for agreement detection? Current decision: exact match (safer). Tolerance can be added later.

  2. Reconciliation access control: Who can reconcile? Options: (a) any project member, (b) designated reconcilers only, (c) project administrators only. Recommendation: (b) designated reconcilers, consistent with screening.

  3. Export format changes: Adding version references and reconciliation metrics to exports may affect downstream tools. Survey current consumers before finalising.

  4. Legacy annotation audit: Should we audit for annotations referencing non-existent questions before migration? Low risk (versioning is additive) but good hygiene.

  5. Admin UI for session handling on QSV change: Exact options and UX for the admin decision when assigning a new QSV to a stage with active sessions. Needs UX design.

  6. Notification system for re-annotation: When admin requires re-annotation after a question version change, how are annotators notified? In-app notification, email, or both?

  7. QuestionSet update notifications: When a shared QuestionSet author publishes a new version, should projects referencing the old version be notified? How? Opt-in upgrade flow needs UX design.

  8. QuestionSet deprecation policy: What happens when a shared QuestionSet is deprecated? Projects still reference immutable versions (safe), but the catalogue should communicate lifecycle status.

  9. Organisation migration strategy: When and how do existing projects migrate into organisations? Voluntary opt-in? Bulk migration? What happens to projects that never join an org?

  10. Auth0 integration for org-level roles: How do org-level permissions integrate with the existing Auth0-based authentication? New claims/roles needed?

  11. Selective import and parent integrity interaction: For QuestionSets with SelectiveImport policy, the import UI needs to show a question picker. Parent integrity means selecting a child question auto-includes all ancestors — UX should make this clear.

  12. Optimistic concurrency strategy for reconciliation session: When concurrent stage reconciliation produces a version conflict, the merge strategy is straightforward (disjoint stage questions), but the implementation details (retry logic, UI feedback) need design.

  13. ReconciliationAnswer per-question rationale vs per-session: Current design supports per-answer rationale. Is this needed for MVP, or is per-session rationale sufficient? Per-answer is richer but adds UI complexity.

  14. Reconciliation session re-reconciliation workflow: When a stage needs re-reconciliation (e.g., new candidate data), how should previously carried-forward answers from other stages be handled? Preserve as-is, or present for re-confirmation?