Collection Infrastructure
Summary
Phase 3 creates the database foundations on which the question versioning and annotation systems will be built. Think of it as building new filing cabinets before we can reorganise the files. This phase makes no user-visible changes, but without these structures, Phases 4-7 cannot proceed.
The Problem
Today, all question data is stored inside the project record itself. As projects grow -- more questions, more versions, more annotation data -- this single-document approach becomes a bottleneck. Multiple team members working at the same time can interfere with each other's saves, and the amount of data that can fit in a single record has hard limits imposed by the database.
We need dedicated storage spaces for versioned questions and question sets, designed from the start with the right structure to support future cross-project sharing and PRISMA compliance.
What We're Building
Two new database collections (dedicated storage areas) are being created:
- Annotation Questions collection -- A dedicated space for versioned annotation questions. Each question will have its own record with room to grow as new versions are added. Questions include ownership and scope information that will enable future cross-project sharing.
- Question Sets collection -- A dedicated space for question set snapshots. Each snapshot captures exactly which questions (at which versions, in which order) are assigned to a stage at a given point in time. This is the foundation of the audit trail -- being able to reconstruct what annotators saw at any point.
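To make the two record shapes concrete, here is a minimal Python sketch of how they might fit together. It is an illustration under assumptions, not the actual schema: every class and field name (`AnnotationQuestion`, `QuestionSetSnapshot`, `owner_id`, `scope`, `stage_id`, `items`) is hypothetical, and the real collections live in the database rather than in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class QuestionVersion:
    """One immutable version of a question's wording."""
    version: int
    text: str


@dataclass
class AnnotationQuestion:
    """One record per question; new versions accumulate inside it."""
    question_id: str
    owner_id: str  # hypothetical ownership field
    scope: str     # hypothetical scope field, e.g. "project"
    versions: list[QuestionVersion] = field(default_factory=list)

    def add_version(self, text: str) -> QuestionVersion:
        # Versions are numbered from 1 and are never edited in place.
        new_version = QuestionVersion(version=len(self.versions) + 1, text=text)
        self.versions.append(new_version)
        return new_version

    def get_version(self, version: int) -> QuestionVersion:
        if not 1 <= version <= len(self.versions):
            raise KeyError(f"{self.question_id} has no version {version}")
        return self.versions[version - 1]


@dataclass(frozen=True)
class QuestionSetSnapshot:
    """Which questions, at which versions, in which order, for one stage."""
    stage_id: str
    captured_at: datetime
    items: tuple[tuple[str, int], ...]  # ordered (question_id, version) pairs


def reconstruct(
    snapshot: QuestionSetSnapshot,
    questions: dict[str, AnnotationQuestion],
) -> list[str]:
    """Return the question texts exactly as annotators saw them."""
    return [questions[qid].get_version(ver).text for qid, ver in snapshot.items]
```

Because a snapshot pins each question to a version number, adding a later version to a question does not change what an earlier snapshot reconstructs. That property is what makes the audit trail possible.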
Additionally, we are adding a safety mechanism called "optimistic concurrency" to study records. This prevents a situation where two people saving at the same time accidentally overwrite each other's work -- the system detects the conflict and asks the second person to retry.
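The idea behind optimistic concurrency can be sketched in a few lines of Python. This is a simplified, single-process illustration, not the platform's implementation: the store class, the `expected_version` parameter, and the integer version counter are all assumptions made for the example. A real database performs the version check and the write as one atomic operation.

```python
class ConflictError(Exception):
    """Raised when a record changed since the caller last read it."""


class InMemoryStudyStore:
    """Hypothetical stand-in for the study records collection."""

    def __init__(self) -> None:
        self._records: dict[str, tuple[int, dict]] = {}  # id -> (version, data)

    def create(self, study_id: str, data: dict) -> int:
        self._records[study_id] = (1, dict(data))
        return 1

    def read(self, study_id: str) -> tuple[int, dict]:
        version, data = self._records[study_id]
        return version, dict(data)  # copy so callers cannot mutate the store

    def save(self, study_id: str, data: dict, expected_version: int) -> int:
        current_version, _ = self._records[study_id]
        if current_version != expected_version:
            # Someone else saved first: reject instead of overwriting.
            raise ConflictError(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._records[study_id] = (new_version, dict(data))
        return new_version
```

If two people read the same record at version 1, the first save succeeds and moves the record to version 2. The second save still carries `expected_version=1`, so it is rejected with `ConflictError`; that person re-reads the record, reapplies the change, and saves again.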
Why This Matters
Although users will not see any changes from this phase, it is a critical enabler. The infrastructure choices made here -- how data is partitioned, how concurrent access is handled, how scope fields are structured -- directly affect the performance and reliability of every feature built in Phases 4-7. Getting the foundations right now avoids costly restructuring later.
How It Connects
This infrastructure phase is a prerequisite for Phase 4 (Question Lifecycle), which builds the draft-to-activation workflow on top of these collections. Without dedicated collections, the versioning system would have nowhere to store its data. The scope and ownership fields prepare for cross-project question sharing in a future release.
See also: Platform Architecture | Technical Spec