Study Lifecycle Status Model and Source Type Taxonomy
1. Purpose
This document defines the two classification systems that enable PRISMA 2020 flow diagram count derivation:
- Study Lifecycle Status -- the 9-state model tracking a study's position in the systematic review pipeline
- Source Type Taxonomy -- the 6-value classification of search sources into PRISMA's dual-column structure
Together, these systems constrain every PRISMA box derivation rule. Without a clear lifecycle model, PRISMA boxes cannot be populated. Without source types, the PRISMA dual-column structure (databases/registers vs. other sources) cannot be supported.
This document is a binding constraint on Phases 12-16 of the SyRF platform evolution.
Normative language: "MUST" indicates an absolute requirement. "SHALL" indicates mandatory behavior. "SHOULD" indicates a strong recommendation. "MAY" indicates optional behavior.
Cross-references: prisma-flow-diagram-mapping.md (PRISMA box-to-field mapping), three-level-data-model.md (Publication/Citation/Study entity specifications)
Requirement coverage: PRISMA-01, PRISMA-02, PRISMA-04, PRISMA-05
2. Critical Distinction: Lifecycle Status vs. Screening Outcome
This section prevents the single most dangerous data model mistake. Conflating lifecycle status with screening outcome makes per-profile multi-stage pipelines impossible, breaking PRISMA boxes 4-5 and 8-9. Read this section before any lifecycle or screening implementation.
Lifecycle Status
Lifecycle Status tracks the study's position in the overall review pipeline. It is a SINGLE value on the Study entity. It answers: "Where is this study in the process?"
- Stored as: `Study.lifecycleStatus` (enum, one value at a time)
- Changed by: system events (import, dedup, retrieval) or admin actions
- Scope: the study's existence and availability in the review, regardless of any screening decision
Screening Outcome
Screening Outcome tracks per-profile inclusion/exclusion decisions. It is an ARRAY of per-profile results on the Study entity. It answers: "What was decided about this study under criteria set X?"
- Stored as: `Study.screeningOutcomes[]` (array, one entry per screening profile)
- Changed by: screener decisions, reconciliation outcomes
- Scope: a specific screening profile's determination about this study
Why They Are Different
- A study can be Excluded under one screening profile and Included under another. In a multi-stage pipeline (e.g., title/abstract screening followed by full-text screening), the same study participates in multiple screening profiles. Each profile produces its own outcome independently.
- Lifecycle status is about the study's existence in the review, not about any particular screening decision. A study that is "Excluded" at title/abstract screening still exists in the review -- it is still `Active` in terms of lifecycle. It simply did not pass one screening gate.
- PRISMA uses BOTH systems for different boxes:
  - Lifecycle status feeds: box 3 (duplicates, automation removal), box 6-7 (retrieval), box 10/16 (included studies)
  - Screening outcomes feed: box 4-5 (title/abstract screening), box 8-9 (full-text screening)
- Changing a screening profile's agreement mode MUST NOT require updating a lifecycle status field. If it does, the model is wrong.
Worked Example
Consider a study imported from PubMed:
- Import: Study created with `lifecycleStatus = Active`
- Title/Abstract screening: Screeners evaluate the study under the T/A screening profile
  - `screeningOutcomes[0] = { profileId: TA_PROFILE, result: Excluded, reason: "Wrong population" }`
  - `lifecycleStatus` is STILL `Active` -- the study was not removed from the review, just excluded under one profile
- Alternative scenario: If the study passes ALL screening profiles and is fully included:
  - `screeningOutcomes[0] = { profileId: TA_PROFILE, result: Included }`
  - `screeningOutcomes[1] = { profileId: FT_PROFILE, result: Included }`
  - `lifecycleStatus` transitions to `Included` -- the study has reached the terminal state
The lifecycle status transitions to Included only when the study passes all required screening profiles. Individual profile exclusions do NOT change lifecycle status.
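The worked example can be sketched in C#. This is a minimal illustration, not the platform's implementation: the `Study` class shape, the `ApplyOutcome` helper, and the `requiredProfiles` parameter are assumptions introduced here. Only the enum and field names come from this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Subset of the lifecycle enum; only the two states the example needs.
public enum StudyLifecycleStatus { Active = 0, Included = 5 }
public enum ScreeningResult { Pending = 0, Included = 1, Excluded = 2, Conflict = 3 }

public record ScreeningOutcome(Guid ProfileId, ScreeningResult Result, string? Reason = null);

// Hypothetical minimal Study: ONE lifecycle value, MANY per-profile outcomes.
public class Study
{
    public StudyLifecycleStatus LifecycleStatus { get; private set; } = StudyLifecycleStatus.Active;
    public List<ScreeningOutcome> ScreeningOutcomes { get; } = new List<ScreeningOutcome>();

    // Records one profile's outcome, then re-evaluates the aggregate rule.
    public void ApplyOutcome(ScreeningOutcome outcome, IReadOnlyCollection<Guid> requiredProfiles)
    {
        ScreeningOutcomes.RemoveAll(o => o.ProfileId == outcome.ProfileId);
        ScreeningOutcomes.Add(outcome);

        // Only "every required profile is Included" moves the lifecycle status.
        // A single Excluded outcome leaves the status untouched.
        bool allIncluded = requiredProfiles.Count > 0 && requiredProfiles.All(id =>
            ScreeningOutcomes.Any(o => o.ProfileId == id && o.Result == ScreeningResult.Included));

        if (LifecycleStatus == StudyLifecycleStatus.Active && allIncluded)
            LifecycleStatus = StudyLifecycleStatus.Included;
    }
}
```

Note that nothing in `ApplyOutcome` writes to `LifecycleStatus` on an exclusion, which is the separation this section requires.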
3. StudyLifecycleStatus Enum
    public enum StudyLifecycleStatus
    {
        // === Import phase ===
        Active = 0,                  // Default. Study is available for screening/review.
                                     // This is the initial state for all imported studies.
        // === Dedup phase ===
        Duplicate = 1,               // Confirmed duplicate (auto-confirmed or admin-confirmed).
                                     // Study is excluded from all stage pools.
        PendingDuplicateReview = 2,  // Probable duplicate awaiting admin review.
                                     // Study is temporarily excluded from stage pools.
        // === Retrieval phase (PRISMA boxes 6-7) ===
        FullTextSought = 3,          // Full text retrieval has been attempted.
        FullTextNotRetrieved = 4,    // Full text could not be obtained.
                                     // Study excluded from full-text screening pools.
        // === Terminal states ===
        Included = 5,                // Final: study included in the review.
                                     // Determined when study passes all required screening profiles.
        Merged = 6,                  // Merged into another study during duplicate resolution.
                                     // Original data preserved but study excluded from pools.
        // === Pre-screen removal (PRISMA box 3) ===
        RemovedByAutomation = 7,     // Removed by automation tool before screening.
        RemovedOther = 8             // Removed for other pre-screen reasons.
    }
Detailed Status Definitions
Active (0) -- Default State
| Property | Value |
|---|---|
| Definition | Study is available for screening, annotation, and review. This is the default state. |
| Trigger | Study import (initial creation); or admin reversal of Duplicate/PendingDuplicateReview; or successful full-text retrieval (FullTextSought -> Active) |
| Set by | System (on import), Admin (on reversal) |
| Appears in stage pools | Yes -- Active studies are the ONLY studies that appear in screening stage pools |
| PRISMA box | Not directly counted. Active is the working state; studies in Active participate in screening (boxes 4, 5, 8, 9) via screeningOutcomes. |
Duplicate (1) -- Confirmed Duplicate
| Property | Value |
|---|---|
| Definition | Confirmed duplicate of another study. Either auto-confirmed by the dedup algorithm (high confidence) or manually confirmed by an admin. |
| Trigger | ASySD auto-confirmation (high confidence pair); or admin confirmation of probable duplicate |
| Set by | System (auto-confirm), Admin (manual confirm) |
| Appears in stage pools | No -- Duplicate studies are excluded from all stage pools |
| PRISMA box | Box 3: duplicates count. The total duplicate count is derived from COUNT(Citations) - COUNT(unique Studies WHERE lifecycleStatus NOT IN (Duplicate, Merged)). |
PendingDuplicateReview (2) -- Probable Duplicate
| Property | Value |
|---|---|
| Definition | Probable duplicate flagged by the dedup algorithm but below the auto-confirmation confidence threshold. Awaiting admin review in the duplicate review queue. |
| Trigger | ASySD flags pair as probable duplicate (below auto-confirm threshold) |
| Set by | System (dedup algorithm) |
| Appears in stage pools | No -- PendingDuplicateReview studies are conservatively excluded from pools to prevent screening duplicates |
| PRISMA box | Not directly counted in PRISMA. If confirmed as duplicate, moves to Duplicate and counted in box 3. If rejected, returns to Active. |
FullTextSought (3) -- Retrieval Attempted
| Property | Value |
|---|---|
| Definition | Full text retrieval has been attempted for this study. This is a transitional state between title/abstract screening and full-text screening. |
| Trigger | Admin or system initiates full-text retrieval for a study that passed title/abstract screening |
| Set by | System (retrieval workflow), Admin (manual action) |
| Appears in stage pools | No -- study is in retrieval limbo, not yet available for full-text screening |
| PRISMA box | Box 6 (dbr_sought_reports) for Column 1 sources; Box 12 (other_sought_reports) for Column 2 sources. Count includes studies that have been sought, whether or not retrieval succeeded. |
FullTextNotRetrieved (4) -- Retrieval Failed
| Property | Value |
|---|---|
| Definition | Full text could not be obtained. The study cannot proceed to full-text screening. This is a terminal state (but admin can override). |
| Trigger | Retrieval process reports failure; admin marks as not retrievable |
| Set by | System (retrieval failure), Admin (manual determination) |
| Appears in stage pools | No -- study cannot be screened without full text |
| PRISMA box | Box 7 (dbr_notretrieved_reports) for Column 1 sources; Box 13 (other_notretrieved_reports) for Column 2 sources. |
Included (5) -- Final Inclusion
| Property | Value |
|---|---|
| Definition | Study is included in the systematic review. This is a terminal state determined when the study passes all required screening profiles. |
| Trigger | Study passes all required screening profiles (all screeningOutcomes show result = Included) |
| Set by | System (automatically when all screening profiles resolve to Included) |
| Appears in stage pools | No -- study has completed the pipeline. It MAY appear in annotation pools for data extraction. |
| PRISMA box | Box 10 (new_studies, new_reports), Box 16 (total_studies, total_reports). |
Merged (6) -- Merged During Dedup
| Property | Value |
|---|---|
| Definition | Study has been merged into another study during duplicate resolution. The original study's data (Citations) is preserved but the study itself is no longer an active participant in the review. |
| Trigger | Admin confirms merge during duplicate review; Citations moved to canonical study |
| Set by | Admin (via merge workflow) |
| Appears in stage pools | No -- the merged study is superseded by the canonical study |
| PRISMA box | Box 3: Merged studies contribute to the duplicates count (Citations count - unique Studies). |
RemovedByAutomation (7) -- Pre-Screen Automation Removal
| Property | Value |
|---|---|
| Definition | Study was removed by an automation tool before screening. Examples: machine learning classifiers, rule-based filters, format validators. |
| Trigger | Automation tool marks study as ineligible |
| Set by | System (automation tool) |
| Appears in stage pools | No |
| PRISMA box | Box 3: excluded_automatic count. |
RemovedOther (8) -- Pre-Screen Other Removal
| Property | Value |
|---|---|
| Definition | Study was removed for pre-screening reasons other than automation or deduplication. Examples: retracted publications, known irrelevant records, admin cleanup. |
| Trigger | Admin marks study for removal |
| Set by | Admin |
| Appears in stage pools | No |
| PRISMA box | Box 3: excluded_other count. |
4. Valid State Transitions
State Transition Diagram
    Import ---> Active ---+---> Duplicate (terminal, reversible by admin)
                          |
                          +---> PendingDuplicateReview ---+---> Duplicate
                          |                               +---> Active (admin: not duplicate)
                          |
                          +---> FullTextSought ---+---> FullTextNotRetrieved (terminal)
                          |                       +---> Active (full text retrieved successfully)
                          |
                          +---> Included (terminal)
                          |
                          +---> Merged (terminal)
                          |
                          +---> RemovedOther (terminal)
                          |
                          +---> RemovedByAutomation (terminal)
State Transition Table
| # | From State | To State | Trigger | Actor | Reversible? | Notes |
|---|---|---|---|---|---|---|
| T1 | (new) | Active | Study imported | System | N/A | Initial state for all imported studies |
| T2 | Active | Duplicate | Dedup auto-confirm (high confidence) | System | Yes (admin) | Admin can reverse to Active |
| T3 | Active | PendingDuplicateReview | Dedup flags probable duplicate | System | Yes (resolves to Duplicate or Active) | Transitional state |
| T4 | Active | FullTextSought | Full-text retrieval initiated | System/Admin | Yes (returns to Active on success) | Transitional state |
| T5 | Active | Included | All screening profiles pass | System | Admin override only | Terminal state |
| T6 | Active | Merged | Admin confirms merge | Admin | Admin override only | Terminal state |
| T7 | Active | RemovedByAutomation | Automation tool marks ineligible | System | Admin override only | Terminal state |
| T8 | Active | RemovedOther | Admin removes for other reasons | Admin | Admin override only | Terminal state |
| T9 | PendingDuplicateReview | Duplicate | Admin confirms duplicate | Admin | Yes (admin can reverse to Active) | Admin review resolution |
| T10 | PendingDuplicateReview | Active | Admin rejects duplicate | Admin | N/A (already resolved) | Admin review resolution |
| T11 | Duplicate | Active | Admin reverses dedup decision | Admin | Yes | Exceptional: admin determines studies are not duplicates |
| T12 | FullTextSought | Active | Full text obtained successfully | System/Admin | N/A | Study returns to Active for full-text screening |
| T13 | FullTextSought | FullTextNotRetrieved | Retrieval fails | System/Admin | Admin override only | Terminal state |
Key Transition Rules
- Only Active studies appear in screening stage pools. This is a system invariant. All other lifecycle states exclude the study from all screening pools.
- PendingDuplicateReview studies are conservatively excluded from pools. Even though they may not be duplicates, they are withheld from screening until an admin resolves the status. This prevents wasted screening effort on probable duplicates.
- Duplicate -> Active is allowed. An admin can reverse a dedup decision if they determine the studies are not actually duplicates. This is important for quality assurance.
- FullTextSought -> Active is the normal flow. When full text is obtained, the study returns to Active and becomes available for full-text screening.
- Terminal states (Included, Merged, RemovedByAutomation, RemovedOther, FullTextNotRetrieved) are normally irreversible. Admin can override in exceptional cases, but the system does not provide routine transition paths back from terminal states.
- Lifecycle status transitions MUST NOT be triggered by an individual screening outcome. The system SHALL NOT automatically change `lifecycleStatus` when a single screening profile excludes a study. The transition to `Included` happens ONLY when all required screening profiles resolve to Included.
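The transition table and the pool invariant can be encoded as a lookup, sketched below under stated assumptions. The `LifecycleTransitions` class and its method names are hypothetical. The set covers only the routine transitions T2-T13 and does not model exceptional admin overrides out of terminal states.

```csharp
using System.Collections.Generic;

public enum StudyLifecycleStatus
{
    Active = 0, Duplicate = 1, PendingDuplicateReview = 2, FullTextSought = 3,
    FullTextNotRetrieved = 4, Included = 5, Merged = 6, RemovedByAutomation = 7, RemovedOther = 8
}

public static class LifecycleTransitions
{
    // Routine transitions from the table. T1 (creation) has no "from" state and is omitted.
    private static readonly HashSet<(StudyLifecycleStatus, StudyLifecycleStatus)> Allowed =
        new HashSet<(StudyLifecycleStatus, StudyLifecycleStatus)>
        {
            (StudyLifecycleStatus.Active, StudyLifecycleStatus.Duplicate),                 // T2
            (StudyLifecycleStatus.Active, StudyLifecycleStatus.PendingDuplicateReview),    // T3
            (StudyLifecycleStatus.Active, StudyLifecycleStatus.FullTextSought),            // T4
            (StudyLifecycleStatus.Active, StudyLifecycleStatus.Included),                  // T5
            (StudyLifecycleStatus.Active, StudyLifecycleStatus.Merged),                    // T6
            (StudyLifecycleStatus.Active, StudyLifecycleStatus.RemovedByAutomation),       // T7
            (StudyLifecycleStatus.Active, StudyLifecycleStatus.RemovedOther),              // T8
            (StudyLifecycleStatus.PendingDuplicateReview, StudyLifecycleStatus.Duplicate), // T9
            (StudyLifecycleStatus.PendingDuplicateReview, StudyLifecycleStatus.Active),    // T10
            (StudyLifecycleStatus.Duplicate, StudyLifecycleStatus.Active),                 // T11
            (StudyLifecycleStatus.FullTextSought, StudyLifecycleStatus.Active),            // T12
            (StudyLifecycleStatus.FullTextSought, StudyLifecycleStatus.FullTextNotRetrieved) // T13
        };

    public static bool IsAllowed(StudyLifecycleStatus from, StudyLifecycleStatus to) =>
        Allowed.Contains((from, to));

    // System invariant: only Active studies appear in screening stage pools.
    public static bool AppearsInStagePools(StudyLifecycleStatus status) =>
        status == StudyLifecycleStatus.Active;
}
```

Keeping the allowed pairs in one set makes the table the single source of truth; an admin override path, if implemented, would bypass `IsAllowed` explicitly rather than widen the set.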
5. Screening Outcome Model
This section defines the per-profile screening outcome structure that complements the lifecycle status model. The full ScreeningOutcome specification will be detailed in Phase 15; this placeholder ensures lifecycle status and screening outcomes are architecturally separate from the start.
ScreeningOutcome Structure
    // Embedded array on Study: Study.screeningOutcomes[]
    public class ScreeningOutcome
    {
        public Guid ProfileId { get; set; }                  // Which screening profile
        public Guid StageId { get; set; }                    // Which stage
        public ScreeningResult Result { get; set; }          // Included, Excluded, Conflict, Pending
        public string? PrimaryExclusionReason { get; set; }  // For PRISMA box 9/15
        public DateTime? ResolvedAt { get; set; }
        public ScreeningAuthority Authority { get; set; }    // CandidateAgreement, Reconciled
    }
    public enum ScreeningResult
    {
        Pending = 0,   // Not yet determined
        Included = 1,  // Passed this screening profile
        Excluded = 2,  // Failed this screening profile
        Conflict = 3   // Screening disagreement, awaiting reconciliation
    }
    public enum ScreeningAuthority
    {
        CandidateAgreement = 0,  // All screeners agreed
        Reconciled = 1           // Reconciler resolved disagreement
    }
How Screening Outcomes and Lifecycle Status Interact
- Screening outcomes are per-profile. A study may have outcomes for multiple screening profiles (e.g., one for title/abstract, one for full-text). Each entry in `screeningOutcomes[]` represents one profile's determination.
- Lifecycle status reflects the aggregate result. The system evaluates all `screeningOutcomes[]` entries to determine lifecycle transitions:
  - If ALL required profiles have `Result = Included`, the system MAY transition `lifecycleStatus` to `Included`.
  - If any profile has `Result = Excluded`, the study does NOT change lifecycle status -- it remains `Active` but is excluded from downstream pools governed by that profile's outcome.
- PRISMA counting uses both independently:
  - Box 4 (`records_screened`): Uses stage pool membership (Active studies entering screening)
  - Box 5 (`records_excluded` at T/A): Uses `screeningOutcomes[TA_PROFILE].Result = Excluded`
  - Box 8 (`dbr_assessed`): Uses stage pool membership (studies entering FT screening)
  - Box 9 (`dbr_excluded` with reasons): Uses `screeningOutcomes[FT_PROFILE].PrimaryExclusionReason`
  - Box 10 (`new_studies`): Uses `lifecycleStatus = Included`
Note: This is a design-phase specification. The full ScreeningOutcome model will be detailed in Phase 15. This placeholder ensures lifecycle status and screening outcomes are architecturally separate from the start.
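As an illustration of how boxes 5 and 9 read from screening outcomes alone, the following sketch counts exclusions per profile with LINQ. The `Study` record, the `ScreeningCounts` class, and the "(no reason recorded)" fallback label are assumptions made for this example. The source-column and lifecycle filters that the full derivation rules apply are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ScreeningResult { Pending = 0, Included = 1, Excluded = 2, Conflict = 3 }

public record ScreeningOutcome(Guid ProfileId, ScreeningResult Result, string? PrimaryExclusionReason = null);

// Hypothetical minimal Study carrying only its per-profile outcomes.
public record Study(List<ScreeningOutcome> ScreeningOutcomes);

public static class ScreeningCounts
{
    // Box 5 style: number of studies whose outcome under the given profile is Excluded.
    public static int CountExcluded(IEnumerable<Study> studies, Guid profileId) =>
        studies.Count(s => s.ScreeningOutcomes.Any(o =>
            o.ProfileId == profileId && o.Result == ScreeningResult.Excluded));

    // Box 9 style: exclusions under the given profile, grouped by primary reason.
    public static Dictionary<string, int> CountByReason(IEnumerable<Study> studies, Guid profileId) =>
        studies.SelectMany(s => s.ScreeningOutcomes)
               .Where(o => o.ProfileId == profileId && o.Result == ScreeningResult.Excluded)
               .GroupBy(o => o.PrimaryExclusionReason ?? "(no reason recorded)")
               .ToDictionary(g => g.Key, g => g.Count());
}
```

Neither method reads `lifecycleStatus`, which reflects the rule that these boxes depend on a specific profile's outcome rather than on the study's pipeline position.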
6. Source Type Taxonomy
6a. SearchSourceType Enum
    public enum SearchSourceType
    {
        // Column 1: Databases and Registers (PRISMA boxes 2, 4-9)
        Database = 0,           // Bibliographic databases: PubMed, Embase, CINAHL,
                                // Web of Science, Scopus, PsycINFO, etc.
        Register = 1,           // Study registers: ClinicalTrials.gov, CENTRAL, ICTRP, etc.
        // Column 2: Other Sources (PRISMA boxes 11-15)
        Website = 2,            // Website searches (including Google Scholar for
                                // subject searching in some cases)
        Organisation = 3,       // Organisations contacted for studies
        CitationSearching = 4,  // Forward/backward citation chasing
        Other = 5               // Other methods: expert contacts, conference abstracts, etc.
    }
6b. PRISMA Column Assignment
| Source Type | PRISMA Column | PRISMA Boxes | Description |
|---|---|---|---|
| Database | Column 1: Databases and Registers | box2, box4-box9 | Bibliographic databases (PubMed, Embase, Scopus, etc.) |
| Register | Column 1: Databases and Registers | box2, box4-box9 | Study registers (ClinicalTrials.gov, CENTRAL, ICTRP) |
| Website | Column 2: Other Sources | box11-box15 | Website searches |
| Organisation | Column 2: Other Sources | box11-box15 | Organisations contacted for studies |
| CitationSearching | Column 2: Other Sources | box11-box15 | Forward/backward citation chasing |
| Other | Column 2: Other Sources | box11-box15 | Other methods (expert contacts, conference abstracts) |
Column assignment rule:
- Column 1 (Databases/Registers): `sourceType IN (Database, Register)`
- Column 2 (Other Sources): `sourceType IN (Website, Organisation, CitationSearching, Other)`
6c. Entity Modifications
Fields to add to SystematicSearch:
| Field | Type | Nullable | Description |
|---|---|---|---|
| `sourceType` | `SearchSourceType?` | Yes (nullable for migration) | PRISMA source classification |
| `sourceName` | `string?` | Yes | Free-text source name (e.g., "PubMed", "Embase") |
Fields to add to SearchImportJob:
| Field | Type | Nullable | Description |
|---|---|---|---|
| `sourceType` | `SearchSourceType?` | Yes | Source type known at import time |
| `sourceName` | `string?` | Yes | Source name known at import time |
Rationale for both entities: The SystematicSearch entity represents the search itself, while SearchImportJob represents a specific import operation. Source type is known at import time and propagated to the SystematicSearch for PRISMA reporting. Both entities carry the field to ensure traceability from import to PRISMA count.
6d. Common Source Name Registry
The following table provides suggested standard source names to promote consistency across projects. Source names are free text, not an enum -- this registry is advisory only.
| SearchSourceType | Suggested Names |
|---|---|
| Database | PubMed, Embase, CINAHL, Web of Science, Scopus, PsycINFO, MEDLINE, Cochrane Library |
| Register | ClinicalTrials.gov, CENTRAL, ICTRP, WHO ICTRP, EU Clinical Trials Register |
| Website | Google Scholar, specific institutional websites |
| Organisation | (free text: organization name) |
| CitationSearching | Forward citation, Backward citation, Snowballing |
| Other | Expert contact, Conference abstract, Grey literature |
Note: A future UI enhancement MAY provide autocomplete suggestions from this registry.
6e. LibraryFileType to SourceType Inference
For migration (Phase 16), the following table documents which LibraryFileType values can be safely inferred to a SearchSourceType. This enables automated backfill of the sourceType field on existing SystematicSearch records.
| LibraryFileType | Inferred SourceType | Confidence | Rationale |
|---|---|---|---|
| PubmedXml | Database | HIGH | PubMed is always a bibliographic database |
| EndnoteXml | Unknown | LOW | Could be any source exported via EndNote |
| LivingSearchJson | Unknown | LOW | Depends on search configuration; could be database or register |
| TsvLibrary | Unknown | LOW | Generic format; could come from any source type |
| CsvLibrary | Unknown | LOW | Generic format; could come from any source type |
Migration rule: Only PubmedXml SHALL be automatically backfilled to sourceType = Database. All other file types SHALL leave sourceType as null, requiring admin manual classification. The admin interface SHALL provide a bulk classification tool for setting source types on existing searches.
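The migration rule can be sketched as a pure function. The `LibraryFileType` enum below is a stand-in for the existing type (its member order and numeric values are assumptions), and `SourceTypeBackfill` is a hypothetical helper name.

```csharp
public enum SearchSourceType
{
    Database = 0, Register = 1, Website = 2, Organisation = 3, CitationSearching = 4, Other = 5
}

// Stand-in for the existing enum; member order and values here are assumptions.
public enum LibraryFileType { PubmedXml, EndnoteXml, LivingSearchJson, TsvLibrary, CsvLibrary }

public static class SourceTypeBackfill
{
    // Only the HIGH-confidence mapping is inferred; everything else stays null
    // so that an admin classifies it manually.
    public static SearchSourceType? Infer(LibraryFileType fileType) => fileType switch
    {
        LibraryFileType.PubmedXml => SearchSourceType.Database,
        _ => (SearchSourceType?)null
    };

    // Never overwrites a source type that has already been set.
    public static SearchSourceType? Backfill(SearchSourceType? existing, LibraryFileType fileType) =>
        existing ?? Infer(fileType);
}
```

Returning `null` rather than a default value keeps unclassified searches visible to the bulk classification tool.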
7. PRISMA Count Derivation Rules
This section provides the complete derivation table linking every PRISMA box to the exact query that populates it. All 17 boxes and 34 fields are covered.
Notation conventions:
- ss = SystematicSearch entity
- s = Study entity
- c = Citation (embedded on Study as s.citations[])
- s.sourceColumn = derived from the sourceType of the study's primary Citation. Column 1 = Database or Register; Column 2 = Website, Organisation, CitationSearching, or Other
Box 1: Previous Studies (Updated Reviews -- DEFERRED)
| PRISMA Field | Derivation | Data Source | Status |
|---|---|---|---|
| `previous_studies` | N/A -- SyRF does not support updated reviews | N/A | DEFERRED |
| `previous_reports` | N/A -- SyRF does not support updated reviews | N/A | DEFERRED |
Box 2: Records Identified from Databases and Registers
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `database_results` | `SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Database` | SystematicSearch (ss) |
| `database_specific_results` | `GROUP BY ss.sourceName WHERE ss.projectId = :projectId AND ss.sourceType = Database; FORMAT "{sourceName} (n={numberOfCitations})"` | SystematicSearch |
| `register_results` | `SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Register` | SystematicSearch |
| `register_specific_results` | `GROUP BY ss.sourceName WHERE ss.projectId = :projectId AND ss.sourceType = Register; FORMAT "{sourceName} (n={numberOfCitations})"` | SystematicSearch |
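The box 2 rules translate to a filter, a sum, and a grouped format. The sketch below assumes a simplified `SystematicSearch` record with just the fields the rule uses; the `PerSourceCounts` class, the ordinal sort, and the "(unnamed)" fallback label are illustrative choices not specified by this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum SearchSourceType
{
    Database = 0, Register = 1, Website = 2, Organisation = 3, CitationSearching = 4, Other = 5
}

// Simplified shape: only the fields the derivation rule reads.
public record SystematicSearch(Guid ProjectId, SearchSourceType? SourceType, string? SourceName, int NumberOfCitations);

public static class PerSourceCounts
{
    // e.g. database_results / register_results: total records for one source type.
    public static int Total(IEnumerable<SystematicSearch> searches, Guid projectId, SearchSourceType type) =>
        searches.Where(ss => ss.ProjectId == projectId && ss.SourceType == type)
                .Sum(ss => ss.NumberOfCitations);

    // e.g. database_specific_results: one "{sourceName} (n={count})" entry per source name.
    public static List<string> PerSource(IEnumerable<SystematicSearch> searches, Guid projectId, SearchSourceType type) =>
        searches.Where(ss => ss.ProjectId == projectId && ss.SourceType == type)
                .GroupBy(ss => ss.SourceName ?? "(unnamed)")
                .OrderBy(g => g.Key, StringComparer.Ordinal)
                .Select(g => $"{g.Key} (n={g.Sum(ss => ss.NumberOfCitations)})")
                .ToList();
}
```

Searches with a null `sourceType` match no type and therefore fall out of both results, consistent with the legacy-data edge case in section 9.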
Box 3: Records Removed Before Screening
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `duplicates` | `LET totalCitations = SUM(COUNT(s.citations)) across all s WHERE s.projectId = :projectId; LET uniqueActiveStudies = COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus NOT IN (Duplicate, Merged); RETURN totalCitations - uniqueActiveStudies` | Citation (c), Study (s) |
| `excluded_automatic` | `COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = RemovedByAutomation` | Study |
| `excluded_other` | `COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = RemovedOther` | Study |
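A small worked calculation may make the `duplicates` rule concrete. The sketch assumes a flattened `StudyRow` record (hypothetical) in which `CitationCount` stands for the length of `s.citations[]`; project filtering is omitted.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum StudyLifecycleStatus
{
    Active = 0, Duplicate = 1, PendingDuplicateReview = 2, FullTextSought = 3,
    FullTextNotRetrieved = 4, Included = 5, Merged = 6, RemovedByAutomation = 7, RemovedOther = 8
}

// Hypothetical flattened view: lifecycle status plus the number of embedded Citations.
public record StudyRow(StudyLifecycleStatus LifecycleStatus, int CitationCount);

public static class PreScreenRemovals
{
    // duplicates = total Citations - studies that are neither Duplicate nor Merged.
    public static int Duplicates(IReadOnlyCollection<StudyRow> studies)
    {
        int totalCitations = studies.Sum(s => s.CitationCount);
        int uniqueStudies = studies.Count(s =>
            s.LifecycleStatus != StudyLifecycleStatus.Duplicate &&
            s.LifecycleStatus != StudyLifecycleStatus.Merged);
        return totalCitations - uniqueStudies;
    }

    // excluded_automatic / excluded_other: a plain count by lifecycle status.
    public static int CountWithStatus(IEnumerable<StudyRow> studies, StudyLifecycleStatus status) =>
        studies.Count(s => s.LifecycleStatus == status);
}
```

For example, take five imported records: a canonical study holding two Citations (one moved in from a Merged study that now holds none), one Duplicate study, one other Active study, and one study removed by automation. That gives 5 Citations minus 3 non-duplicate studies, so `duplicates` is 2.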
Box 4: Records Screened (Databases/Registers)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `records_screened` | `COUNT(s) WHERE s.projectId = :projectId AND s ENTERED title/abstract screening stage pool AND s.sourceColumn = 1` | Study + Stage pool tracking |
Note: "Entered stage pool" requires stage pool tracking. The precise mechanism is specified in Phase 14 (Stage Filtering). For PRISMA purposes, a study is "screened" if it was made available to screeners in the title/abstract screening stage, regardless of the outcome.
Box 5: Records Excluded at Title/Abstract (Databases/Registers)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `records_excluded` | `COUNT(s) WHERE s.projectId = :projectId AND s.screeningOutcomes[titleAbstractProfileId].Result = Excluded AND s.sourceColumn = 1` | Study.screeningOutcomes |
Box 6: Reports Sought for Retrieval (Databases/Registers)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `dbr_sought_reports` | `COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus IN (FullTextSought, FullTextNotRetrieved, Active, Included) AND s.screeningOutcomes[titleAbstractProfileId].Result = Included AND s.sourceColumn = 1` | Study |
Note: This counts studies that passed T/A screening and entered the retrieval phase, whether or not retrieval succeeded. Studies with lifecycleStatus = FullTextNotRetrieved are included because they were sought; box 7 is a subset of box 6. Studies with lifecycleStatus = Active are included because a successful retrieval returns the study to Active for FT screening.
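The retrieval counts can be sketched as two filters over a flattened study view. `StudyRow` and `RetrievalCounts` are hypothetical names, and project filtering is omitted. The sketch counts FullTextNotRetrieved studies as sought, following the FullTextSought definition in section 3 ("whether or not retrieval succeeded"); that inclusion is this example's reading of the rule.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum StudyLifecycleStatus
{
    Active = 0, Duplicate = 1, PendingDuplicateReview = 2, FullTextSought = 3,
    FullTextNotRetrieved = 4, Included = 5, Merged = 6, RemovedByAutomation = 7, RemovedOther = 8
}
public enum ScreeningResult { Pending = 0, Included = 1, Excluded = 2, Conflict = 3 }

// Hypothetical flattened view: status, T/A outcome, and derived PRISMA column (null = unclassified).
public record StudyRow(StudyLifecycleStatus LifecycleStatus, ScreeningResult TitleAbstractResult, int? SourceColumn);

public static class RetrievalCounts
{
    private static readonly StudyLifecycleStatus[] SoughtStates =
    {
        StudyLifecycleStatus.FullTextSought, StudyLifecycleStatus.FullTextNotRetrieved,
        StudyLifecycleStatus.Active, StudyLifecycleStatus.Included
    };

    // Boxes 6 / 12: passed T/A screening and is in, or has been through, retrieval.
    public static int Sought(IEnumerable<StudyRow> rows, int column) =>
        rows.Count(r => r.SourceColumn == column
                     && r.TitleAbstractResult == ScreeningResult.Included
                     && SoughtStates.Contains(r.LifecycleStatus));

    // Boxes 7 / 13: retrieval failed.
    public static int NotRetrieved(IEnumerable<StudyRow> rows, int column) =>
        rows.Count(r => r.SourceColumn == column
                     && r.LifecycleStatus == StudyLifecycleStatus.FullTextNotRetrieved);
}
```

Passing `column: 2` yields the "other sources" variants (boxes 12 and 13) from the same code. Rows with a null column match neither call.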
Box 7: Reports Not Retrieved (Databases/Registers)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `dbr_notretrieved_reports` | `COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = FullTextNotRetrieved AND s.sourceColumn = 1` | Study |
Box 8: Reports Assessed for Eligibility (Databases/Registers)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `dbr_assessed` | `COUNT(s) WHERE s.projectId = :projectId AND s ENTERED full-text screening stage pool AND s.sourceColumn = 1` | Study + Stage pool |
Box 9: Reports Excluded with Reasons (Databases/Registers)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `dbr_excluded` | `GROUP BY s.screeningOutcomes[fullTextProfileId].PrimaryExclusionReason WHERE s.projectId = :projectId AND s.screeningOutcomes[fullTextProfileId].Result = Excluded AND s.sourceColumn = 1; FORMAT "{reason} (n={count})"` | Study.screeningOutcomes |
Box 10: New Studies Included
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `new_studies` | `COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = Included` | Study |
| `new_reports` | `LET includedStudies = s WHERE s.projectId = :projectId AND s.lifecycleStatus = Included; SUM(COUNT(s.citations)) for each includedStudy` | Citation |
Box 11: Records Identified from Other Sources
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `website_results` | `SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Website` | SystematicSearch |
| `organisation_results` | `SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Organisation` | SystematicSearch |
| `citations_results` | `SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = CitationSearching` | SystematicSearch |
Box 12: Reports Sought for Retrieval (Other Sources)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `other_sought_reports` | Same as box 6 (`dbr_sought_reports`) but `s.sourceColumn = 2` | Study |
Box 13: Reports Not Retrieved (Other Sources)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `other_notretrieved_reports` | Same as box 7 (`dbr_notretrieved_reports`) but `s.sourceColumn = 2` | Study |
Box 14: Reports Assessed for Eligibility (Other Sources)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `other_assessed` | Same as box 8 (`dbr_assessed`) but `s.sourceColumn = 2` | Study + Stage pool |
Box 15: Reports Excluded with Reasons (Other Sources)
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `other_excluded` | Same as box 9 (`dbr_excluded`) but `s.sourceColumn = 2` | Study.screeningOutcomes |
Box 16: Total Studies Included
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `total_studies` | Same as box 10 `new_studies` (no previous studies for non-updated reviews) | Study |
| `total_reports` | Same as box 10 `new_reports` | Citation |
Note: When updated review support is added (box 1), box 16 becomes new_studies + previous_studies and new_reports + previous_reports.
Box 17: Studies Included in Meta-Analysis
| PRISMA Field | Derivation | Data Source |
|---|---|---|
| `total_studies_ma` | `COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = Included AND s.metaAnalysisIncluded = true` | Study |
| `total_reports_ma` | `LET maStudies = s WHERE s.lifecycleStatus = Included AND s.metaAnalysisIncluded = true; SUM(COUNT(s.citations)) for each maStudy` | Citation |
Source Column Derivation
The s.sourceColumn used throughout the derivation rules is derived from the study's import history:
    sourceColumn(study) =
        LET primaryCitation = study.citations[0]   // First (earliest) Citation
        LET sourceType = primaryCitation.sourceType
        IF sourceType IN (Database, Register) THEN 1   // Column 1: Databases/Registers
        ELSE IF sourceType IN (Website, Organisation, CitationSearching, Other) THEN 2   // Column 2: Other
        ELSE NULL   // Unclassified (legacy data)
Rule: When a study has multiple Citations from different source types (e.g., found in both PubMed and a website), the FIRST import's sourceType determines the PRISMA column assignment. All Citations are preserved for per-source counting (box 2, box 11), but the study appears in only one column for screening/inclusion boxes (4-9, 12-15).
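The column rule can be written as a short C# function. This sketch assumes that `Citations` is ordered by import time, so that index 0 is the earliest Citation; the `Citation` and `Study` records and the `SourceColumn` class are simplified stand-ins for illustration.

```csharp
using System.Collections.Generic;

public enum SearchSourceType
{
    Database = 0, Register = 1, Website = 2, Organisation = 3, CitationSearching = 4, Other = 5
}

// Simplified stand-ins carrying only what the rule reads.
public record Citation(SearchSourceType? SourceType);
public record Study(IReadOnlyList<Citation> Citations);

public static class SourceColumn
{
    // Returns 1 (databases/registers), 2 (other sources), or null (unclassified legacy data).
    public static int? Of(Study study)
    {
        if (study.Citations.Count == 0) return null;

        // The FIRST import decides the column, even if later Citations differ.
        return study.Citations[0].SourceType switch
        {
            SearchSourceType.Database or SearchSourceType.Register => 1,
            SearchSourceType.Website or SearchSourceType.Organisation
                or SearchSourceType.CitationSearching or SearchSourceType.Other => 2,
            _ => (int?)null
        };
    }
}
```

A study found first in PubMed and later on a website resolves to column 1; the reverse import order resolves to column 2.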
8. Current Data Model Gap Summary
This table provides a quick reference for the specific fields needed on each entity to support PRISMA count derivation.
| Entity | Field | Type | Added In | PRISMA Boxes Served | Status |
|---|---|---|---|---|---|
| SystematicSearch | `sourceType` | `SearchSourceType?` | Phase 7 | box2, box11 | Gap -- not present |
| SystematicSearch | `sourceName` | `string?` | Phase 7 | box2, box11 | Gap -- not present |
| SearchImportJob | `sourceType` | `SearchSourceType?` | Phase 12 | (propagates to SystematicSearch) | Gap -- not present |
| SearchImportJob | `sourceName` | `string?` | Phase 12 | (propagates to SystematicSearch) | Gap -- not present |
| Study | `lifecycleStatus` | `StudyLifecycleStatus?` | Phase 12 | box3, box6-7, box10, box12-13, box16 | Gap -- not present |
| Study | `citations[]` | `Citation[]` | Phase 12 | box2, box3, box10-11, box16-17 | Gap -- not present |
| Study | `publicationId` | `Guid?` | Phase 12 | (cross-project identity) | Gap -- not present |
| Study | `duplicateGroupId` | `Guid?` | Phase 12 | box3 | Gap -- not present |
| Study | `fullTextStatus` | `FullTextStatus?` | Phase 12 | box6-7, box8, box12-14 | Gap -- not present |
| Study | `screeningOutcomes[]` | `ScreeningOutcome[]` | Phase 13/15 | box4-5, box8-9, box14-15 | Gap -- not present |
| Study | `metaAnalysisIncluded` | `bool?` | Phase 16 | box17 | Gap -- not present |
Summary: 11 fields across 3 entities are needed for complete PRISMA support. All are additive (nullable) and introduced incrementally across Phases 7, 12, 13/15, and 16.
9. Edge Cases
Study Imported from Multiple Sources of Different Types
Scenario: A study's Citations include one from PubMed (Database, Column 1) and one from a website (Website, Column 2).
Rule: Use the FIRST import's sourceType for PRISMA column assignment. The study appears in one column for screening/inclusion boxes (4-9, 12-15). All Citations are preserved and counted in their respective source boxes (box 2, box 11).
Rationale: PRISMA requires each study to appear in exactly one column for the screening/inclusion phases. The first import represents the study's initial entry point into the review. Per-source record counts (boxes 2, 11) remain accurate because Citations are immutable and always counted by their own sourceType.
Study with No SourceType (Legacy Data)
Scenario: An existing study's SystematicSearch has no sourceType (field is null).
Rule: The study is counted in PRISMA totals that do not require source column assignment (e.g., box 3 duplicates, box 10 included studies). For source-specific boxes (box 2, box 4-9, box 11-15), the study cannot be assigned to a column and is excluded from those counts.
Migration: Phase 16 backfills sourceType where determinable from LibraryFileType (only PubmedXml -> Database is high-confidence). Admin interface allows manual classification for remaining records.
Study Screened Under Multiple Profiles
Scenario: A study is screened under a title/abstract profile and then a full-text profile, with different outcomes.
Rule: Each profile produces its own entry in screeningOutcomes[]. PRISMA counts use specific profiles:
- Title/abstract profile outcomes feed boxes 4-5 (records screened, records excluded)
- Full-text profile outcomes feed boxes 8-9, 14-15 (reports assessed, reports excluded with reasons)
There is no conflict because each PRISMA box references a specific screening profile, not the aggregate of all profiles.
Study Passes All Screening but Has FullTextNotRetrieved Status
Scenario: Edge case where lifecycle status is FullTextNotRetrieved but a screening outcome shows Included.
Rule: This scenario SHOULD NOT occur in normal operation. FullTextNotRetrieved studies are excluded from full-text screening pools, so they cannot receive a full-text screening outcome. If it occurs (e.g., due to admin override), the lifecycleStatus takes precedence: the study is NOT counted as included (box 10) because FullTextNotRetrieved is a terminal state.
Duplicate Study with Existing Screening Data
Scenario: A study is screened and then later identified as a duplicate (e.g., a late-arriving import reveals the duplicate relationship).
Rule: When the study's lifecycleStatus transitions to Duplicate, the study is excluded from all pools and from PRISMA screening counts. The screeningOutcomes[] data is preserved (not deleted) for audit purposes, but it is no longer counted in PRISMA boxes. The screening data from the duplicate study MAY be available for reconciliation on the canonical study (see deduplication specification, Phase 12).
10. Cross-References
- PRISMA Flow Diagram Mapping: prisma-flow-diagram-mapping.md -- Complete box-to-field mapping referencing the lifecycle states and source types defined here
- Three-Level Data Model: three-level-data-model.md -- Entity specifications for Publication, Citation, and Study that carry the fields referenced in this document (`Study.lifecycleStatus`, `Study.screeningOutcomes[]`, `Citation.sourceType`, `SystematicSearch.sourceType`)
Requirement Coverage
| Requirement ID | Coverage in This Document |
|---|---|
| PRISMA-01 | Complete: All 34 PRISMA fields have derivation rules using lifecycle status and/or screening outcomes |
| PRISMA-02 | Complete: Source type taxonomy with 6 values and PRISMA column assignment rules |
| PRISMA-04 | Complete: Study lifecycle status model with 9 states, valid transitions, and PRISMA count derivation rules |
| PRISMA-05 | Partial: Deduplication counts derivable per source type via Citation.sourceType; full dedup counting requires Phase 12 implementation |