Study Lifecycle Status Model and Source Type Taxonomy

1. Purpose

This document defines the two classification systems that enable PRISMA 2020 flow diagram count derivation:

  1. Study Lifecycle Status -- the 9-state model tracking a study's position in the systematic review pipeline
  2. Source Type Taxonomy -- the 6-value classification of search sources into PRISMA's dual-column structure

Together, these systems constrain every PRISMA box derivation rule. Without a clear lifecycle model, PRISMA boxes cannot be populated. Without source types, the PRISMA dual-column structure (databases/registers vs. other sources) cannot be supported.

This document is a binding constraint on Phases 12-16 of the SyRF platform evolution.

Normative language: "MUST" indicates an absolute requirement. "SHALL" indicates mandatory behavior. "SHOULD" indicates a strong recommendation. "MAY" indicates optional behavior.

Cross-references: prisma-flow-diagram-mapping.md (PRISMA box-to-field mapping), three-level-data-model.md (Publication/Citation/Study entity specifications)

Requirement coverage: PRISMA-01, PRISMA-02, PRISMA-04, PRISMA-05


2. Critical Distinction: Lifecycle Status vs. Screening Outcome

This section prevents the single most dangerous data model mistake. Conflating lifecycle status with screening outcome makes per-profile multi-stage pipelines impossible, breaking PRISMA boxes 4-5 and 8-9. Read this section before any lifecycle or screening implementation.

Lifecycle Status

Lifecycle Status tracks the study's position in the overall review pipeline. It is a SINGLE value on the Study entity. It answers: "Where is this study in the process?"

  • Stored as: Study.lifecycleStatus (enum, one value at a time)
  • Changed by: system events (import, dedup, retrieval) or admin actions
  • Scope: the study's existence and availability in the review, regardless of any screening decision

Screening Outcome

Screening Outcome tracks per-profile inclusion/exclusion decisions. It is an ARRAY of per-profile results on the Study entity. It answers: "What was decided about this study under criteria set X?"

  • Stored as: Study.screeningOutcomes[] (array, one entry per screening profile)
  • Changed by: screener decisions, reconciliation outcomes
  • Scope: a specific screening profile's determination about this study

Why They Are Different

  1. A study can be Excluded under one screening profile and Included under another. In a multi-stage pipeline (e.g., title/abstract screening followed by full-text screening), the same study participates in multiple screening profiles. Each profile produces its own outcome independently.

  2. Lifecycle status is about the study's existence in the review, not about any particular screening decision. A study that is "Excluded" at title/abstract screening still exists in the review -- it is still Active in terms of lifecycle. It simply did not pass one screening gate.

  3. PRISMA uses BOTH systems for different boxes:

       • Lifecycle status feeds: box 3 (duplicates, automation removal), boxes 6-7 (retrieval), boxes 10/16 (included studies)
       • Screening outcomes feed: boxes 4-5 (title/abstract screening), boxes 8-9 (full-text screening)

  4. Changing a screening profile's agreement mode MUST NOT require updating a lifecycle status field. If it does, the model is wrong.

Worked Example

Consider a study imported from PubMed:

  1. Import: Study created with lifecycleStatus = Active
  2. Title/Abstract screening: Screeners evaluate the study under the T/A screening profile
       • screeningOutcomes[0] = { profileId: TA_PROFILE, result: Excluded, reason: "Wrong population" }
       • lifecycleStatus is STILL Active -- the study was not removed from the review, just excluded under one profile
  3. Alternative scenario: If the study passes ALL screening profiles and is fully included:
       • screeningOutcomes[0] = { profileId: TA_PROFILE, result: Included }
       • screeningOutcomes[1] = { profileId: FT_PROFILE, result: Included }
       • lifecycleStatus transitions to Included -- the study has reached the terminal state

The lifecycle status transitions to Included only when the study passes all required screening profiles. Individual profile exclusions do NOT change lifecycle status.
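
The worked example can be sketched in code as follows. This is a minimal illustration, not the SyRF implementation: the `Study` class, the `LifecycleRules` helper, and the `requiredProfileIds` parameter are hypothetical names introduced here, and the enums are trimmed to the values the example needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed copies of the enums in sections 3 and 5 (only the values used here).
public enum StudyLifecycleStatus { Active = 0, Included = 5 }
public enum ScreeningResult { Pending = 0, Included = 1, Excluded = 2, Conflict = 3 }

public sealed record ScreeningOutcome(Guid ProfileId, ScreeningResult Result, string? Reason = null);

// Hypothetical, simplified Study: one lifecycle value plus an array of per-profile outcomes.
public sealed class Study
{
    public StudyLifecycleStatus LifecycleStatus { get; set; } = StudyLifecycleStatus.Active;
    public List<ScreeningOutcome> ScreeningOutcomes { get; } = new();
}

public static class LifecycleRules
{
    // True only when every required profile has an Included outcome on the study.
    public static bool PassesAllRequiredProfiles(Study study, IReadOnlyCollection<Guid> requiredProfileIds) =>
        requiredProfileIds.Count > 0 &&
        requiredProfileIds.All(id =>
            study.ScreeningOutcomes.Any(o => o.ProfileId == id && o.Result == ScreeningResult.Included));

    // Records one profile's outcome. An exclusion never changes LifecycleStatus;
    // only the aggregate check can move an Active study to Included.
    public static void RecordOutcome(Study study, ScreeningOutcome outcome, IReadOnlyCollection<Guid> requiredProfileIds)
    {
        study.ScreeningOutcomes.RemoveAll(o => o.ProfileId == outcome.ProfileId);
        study.ScreeningOutcomes.Add(outcome);

        if (study.LifecycleStatus == StudyLifecycleStatus.Active &&
            PassesAllRequiredProfiles(study, requiredProfileIds))
        {
            study.LifecycleStatus = StudyLifecycleStatus.Included;
        }
    }
}
```

Recording an Excluded outcome for the T/A profile leaves `LifecycleStatus` at Active. Only once both the T/A and full-text profiles carry Included outcomes does the study move to Included.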


3. StudyLifecycleStatus Enum

public enum StudyLifecycleStatus
{
    // === Import phase ===
    Active = 0,                  // Default. Study is available for screening/review.
                                 // This is the initial state for all imported studies.

    // === Dedup phase ===
    Duplicate = 1,               // Confirmed duplicate (auto-confirmed or admin-confirmed).
                                 // Study is excluded from all stage pools.
    PendingDuplicateReview = 2,  // Probable duplicate awaiting admin review.
                                 // Study is temporarily excluded from stage pools.

    // === Retrieval phase (PRISMA boxes 6-7) ===
    FullTextSought = 3,          // Full text retrieval has been attempted.
    FullTextNotRetrieved = 4,    // Full text could not be obtained.
                                 // Study excluded from full-text screening pools.

    // === Terminal states ===
    Included = 5,                // Final: study included in the review.
                                 // Determined when study passes all required screening profiles.
    Merged = 6,                  // Merged into another study during duplicate resolution.
                                 // Original data preserved but study excluded from pools.

    // === Pre-screen removal (PRISMA box 3) ===
    RemovedByAutomation = 7,     // Removed by automation tool before screening.
    RemovedOther = 8             // Removed for other pre-screen reasons.
}

Detailed Status Definitions

Active (0) -- Default State

| Property | Value |
|---|---|
| Definition | Study is available for screening, annotation, and review. This is the default state. |
| Trigger | Study import (initial creation); or admin reversal of Duplicate/PendingDuplicateReview; or successful full-text retrieval (FullTextSought -> Active) |
| Set by | System (on import), Admin (on reversal) |
| Appears in stage pools | Yes -- Active studies are the ONLY studies that appear in screening stage pools |
| PRISMA box | Not directly counted. Active is the working state; studies in Active participate in screening (boxes 4, 5, 8, 9) via screeningOutcomes. |

Duplicate (1) -- Confirmed Duplicate

| Property | Value |
|---|---|
| Definition | Confirmed duplicate of another study. Either auto-confirmed by the dedup algorithm (high confidence) or manually confirmed by an admin. |
| Trigger | ASySD auto-confirmation (high confidence pair); or admin confirmation of probable duplicate |
| Set by | System (auto-confirm), Admin (manual confirm) |
| Appears in stage pools | No -- Duplicate studies are excluded from all stage pools |
| PRISMA box | Box 3: duplicates count. The total duplicate count is derived from COUNT(Citations) - COUNT(unique Studies WHERE lifecycleStatus NOT IN (Duplicate, Merged)). |

PendingDuplicateReview (2) -- Probable Duplicate

| Property | Value |
|---|---|
| Definition | Probable duplicate flagged by the dedup algorithm but below the auto-confirmation confidence threshold. Awaiting admin review in the duplicate review queue. |
| Trigger | ASySD flags pair as probable duplicate (below auto-confirm threshold) |
| Set by | System (dedup algorithm) |
| Appears in stage pools | No -- PendingDuplicateReview studies are conservatively excluded from pools to prevent screening duplicates |
| PRISMA box | Not directly counted in PRISMA. If confirmed as duplicate, moves to Duplicate and is counted in box 3. If rejected, returns to Active. |

FullTextSought (3) -- Retrieval Attempted

| Property | Value |
|---|---|
| Definition | Full text retrieval has been attempted for this study. This is a transitional state between title/abstract screening and full-text screening. |
| Trigger | Admin or system initiates full-text retrieval for a study that passed title/abstract screening |
| Set by | System (retrieval workflow), Admin (manual action) |
| Appears in stage pools | No -- study is in retrieval limbo, not yet available for full-text screening |
| PRISMA box | Box 6 (dbr_sought_reports) for Column 1 sources; Box 12 (other_sought_reports) for Column 2 sources. Count includes studies that have been sought, whether or not retrieval succeeded. |

FullTextNotRetrieved (4) -- Retrieval Failed

| Property | Value |
|---|---|
| Definition | Full text could not be obtained. The study cannot proceed to full-text screening. This is a terminal state (but admin can override). |
| Trigger | Retrieval process reports failure; admin marks as not retrievable |
| Set by | System (retrieval failure), Admin (manual determination) |
| Appears in stage pools | No -- study cannot be screened without full text |
| PRISMA box | Box 7 (dbr_notretrieved_reports) for Column 1 sources; Box 13 (other_notretrieved_reports) for Column 2 sources. |

Included (5) -- Final Inclusion

| Property | Value |
|---|---|
| Definition | Study is included in the systematic review. This is a terminal state determined when the study passes all required screening profiles. |
| Trigger | Study passes all required screening profiles (all screeningOutcomes show result = Included) |
| Set by | System (automatically when all screening profiles resolve to Included) |
| Appears in stage pools | No -- study has completed the pipeline. It MAY appear in annotation pools for data extraction. |
| PRISMA box | Box 10 (new_studies, new_reports), Box 16 (total_studies, total_reports). |

Merged (6) -- Merged During Dedup

| Property | Value |
|---|---|
| Definition | Study has been merged into another study during duplicate resolution. The original study's data (Citations) is preserved but the study itself is no longer an active participant in the review. |
| Trigger | Admin confirms merge during duplicate review; Citations moved to canonical study |
| Set by | Admin (via merge workflow) |
| Appears in stage pools | No -- the merged study is superseded by the canonical study |
| PRISMA box | Box 3: Merged studies contribute to the duplicates count (Citations count - unique Studies). |

RemovedByAutomation (7) -- Pre-Screen Automation Removal

| Property | Value |
|---|---|
| Definition | Study was removed by an automation tool before screening. Examples: machine learning classifiers, rule-based filters, format validators. |
| Trigger | Automation tool marks study as ineligible |
| Set by | System (automation tool) |
| Appears in stage pools | No |
| PRISMA box | Box 3: excluded_automatic count. |

RemovedOther (8) -- Pre-Screen Other Removal

| Property | Value |
|---|---|
| Definition | Study was removed for pre-screening reasons other than automation or deduplication. Examples: retracted publications, known irrelevant records, admin cleanup. |
| Trigger | Admin marks study for removal |
| Set by | Admin |
| Appears in stage pools | No |
| PRISMA box | Box 3: excluded_other count. |

4. Valid State Transitions

State Transition Diagram

Import ---> Active ---+---> Duplicate (reversible by admin: Duplicate ---> Active)
                      |
                      +---> PendingDuplicateReview ---+---> Duplicate (admin: confirmed duplicate)
                      |                               +---> Active (admin: not duplicate)
                      |
                      +---> FullTextSought ---+---> FullTextNotRetrieved (terminal)
                      |                       +---> Active (full text retrieved successfully)
                      |
                      +---> Included (terminal)
                      |
                      +---> Merged (terminal)
                      |
                      +---> RemovedOther (terminal)
                      |
                      +---> RemovedByAutomation (terminal)

State Transition Table

| # | From State | To State | Trigger | Actor | Reversible? | Notes |
|---|---|---|---|---|---|---|
| T1 | (new) | Active | Study imported | System | N/A | Initial state for all imported studies |
| T2 | Active | Duplicate | Dedup auto-confirm (high confidence) | System | Yes (admin) | Admin can reverse to Active |
| T3 | Active | PendingDuplicateReview | Dedup flags probable duplicate | System | Yes (resolves to Duplicate or Active) | Transitional state |
| T4 | Active | FullTextSought | Full-text retrieval initiated | System/Admin | Yes (returns to Active on success) | Transitional state |
| T5 | Active | Included | All screening profiles pass | System | Admin override only | Terminal state |
| T6 | Active | Merged | Admin confirms merge | Admin | Admin override only | Terminal state |
| T7 | Active | RemovedByAutomation | Automation tool marks ineligible | System | Admin override only | Terminal state |
| T8 | Active | RemovedOther | Admin removes for other reasons | Admin | Admin override only | Terminal state |
| T9 | PendingDuplicateReview | Duplicate | Admin confirms duplicate | Admin | Yes (admin can reverse to Active) | Admin review resolution |
| T10 | PendingDuplicateReview | Active | Admin rejects duplicate | Admin | N/A (already resolved) | Admin review resolution |
| T11 | Duplicate | Active | Admin reverses dedup decision | Admin | Yes | Exceptional: admin determines studies are not duplicates |
| T12 | FullTextSought | Active | Full text obtained successfully | System/Admin | N/A | Study returns to Active for full-text screening |
| T13 | FullTextSought | FullTextNotRetrieved | Retrieval fails | System/Admin | Admin override only | Terminal state |

Key Transition Rules

  1. Only Active studies appear in screening stage pools. This is a system invariant. All other lifecycle states exclude the study from all screening pools.
  2. PendingDuplicateReview studies are conservatively excluded from pools. Even though they may not be duplicates, they are withheld from screening until an admin resolves the status. This prevents wasted screening effort on probable duplicates.
  3. Duplicate -> Active is allowed. An admin can reverse a dedup decision if they determine the studies are not actually duplicates. This is important for quality assurance.
  4. FullTextSought -> Active is the normal flow. When full text is obtained, the study returns to Active and becomes available for full-text screening.
  5. Terminal states (Included, Merged, RemovedByAutomation, RemovedOther, FullTextNotRetrieved) are normally irreversible. Admin can override in exceptional cases, but the system does not provide routine transition paths back from terminal states.
  6. Lifecycle status transitions MUST NOT be triggered by an individual screening outcome. The system SHALL NOT change lifecycleStatus when a single screening profile excludes or includes a study. The only screening-driven transition is Active -> Included (T5), which happens ONLY when all required screening profiles resolve to Included.
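
One way to enforce the routine transitions is a lookup of allowed (from, to) pairs. The sketch below assumes the transition table is the complete set of routine paths; the `LifecycleTransitions` class and its method names are hypothetical, and exceptional admin overrides out of terminal states are deliberately not modelled.

```csharp
using System.Collections.Generic;
using S = StudyLifecycleStatus;

public enum StudyLifecycleStatus
{
    Active = 0, Duplicate = 1, PendingDuplicateReview = 2, FullTextSought = 3,
    FullTextNotRetrieved = 4, Included = 5, Merged = 6, RemovedByAutomation = 7, RemovedOther = 8
}

public static class LifecycleTransitions
{
    // Routine transitions T2-T13. T1 (import) has no "from" state: it creates the study as Active.
    private static readonly HashSet<(S From, S To)> Allowed = new()
    {
        (S.Active, S.Duplicate),                     // T2
        (S.Active, S.PendingDuplicateReview),        // T3
        (S.Active, S.FullTextSought),                // T4
        (S.Active, S.Included),                      // T5
        (S.Active, S.Merged),                        // T6
        (S.Active, S.RemovedByAutomation),           // T7
        (S.Active, S.RemovedOther),                  // T8
        (S.PendingDuplicateReview, S.Duplicate),     // T9
        (S.PendingDuplicateReview, S.Active),        // T10
        (S.Duplicate, S.Active),                     // T11
        (S.FullTextSought, S.Active),                // T12
        (S.FullTextSought, S.FullTextNotRetrieved),  // T13
    };

    // True when the table lists a routine transition from one state to the other.
    public static bool CanTransition(S from, S to) => Allowed.Contains((from, to));

    // Rule 1: only Active studies appear in screening stage pools.
    public static bool AppearsInScreeningPools(S status) => status == S.Active;
}
```

With this table, `CanTransition(Included, Active)` is false, which matches rule 5: leaving a terminal state is an exceptional admin action and not a routine path.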

5. Screening Outcome Model

This section defines the per-profile screening outcome structure that complements the lifecycle status model. The full ScreeningOutcome specification will be detailed in Phase 15; this placeholder ensures lifecycle status and screening outcomes are architecturally separate from the start.

ScreeningOutcome Structure

// Embedded array on Study: Study.screeningOutcomes[]
public class ScreeningOutcome
{
    public Guid ProfileId { get; set; }         // Which screening profile
    public Guid StageId { get; set; }           // Which stage
    public ScreeningResult Result { get; set; }  // Included, Excluded, Conflict, Pending
    public string? PrimaryExclusionReason { get; set; }  // For PRISMA box 9/15
    public DateTime? ResolvedAt { get; set; }
    public ScreeningAuthority Authority { get; set; }  // CandidateAgreement, Reconciled
}

public enum ScreeningResult
{
    Pending = 0,     // Not yet determined
    Included = 1,    // Passed this screening profile
    Excluded = 2,    // Failed this screening profile
    Conflict = 3     // Screening disagreement, awaiting reconciliation
}

public enum ScreeningAuthority
{
    CandidateAgreement = 0,  // All screeners agreed
    Reconciled = 1           // Reconciler resolved disagreement
}

How Screening Outcomes and Lifecycle Status Interact

  1. Screening outcomes are per-profile. A study may have outcomes for multiple screening profiles (e.g., one for title/abstract, one for full-text). Each entry in screeningOutcomes[] represents one profile's determination.

  2. Lifecycle status reflects the aggregate result. The system evaluates all screeningOutcomes[] entries to determine lifecycle transitions:

       • If ALL required profiles have Result = Included, the system transitions lifecycleStatus to Included (transition T5).
       • If any profile has Result = Excluded, the study does NOT change lifecycle status -- it remains Active but is excluded from downstream pools governed by that profile's outcome.

  3. PRISMA counting uses both independently:

       • Box 4 (records_screened): Uses stage pool membership (Active studies entering screening)
       • Box 5 (records_excluded at T/A): Uses screeningOutcomes[TA_PROFILE].Result = Excluded
       • Box 8 (dbr_assessed): Uses stage pool membership (studies entering FT screening)
       • Box 9 (dbr_excluded with reasons): Uses screeningOutcomes[FT_PROFILE].PrimaryExclusionReason
       • Box 10 (new_studies): Uses lifecycleStatus = Included
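
To illustrate how the two systems are read independently, the sketch below computes one count from each. The record shapes and the `PrismaCounts` class are hypothetical simplifications, and the per-column filter (sourceColumn) used by the real box 5 derivation is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed copies of the enums in sections 3 and 5.
public enum StudyLifecycleStatus { Active = 0, Included = 5 }
public enum ScreeningResult { Pending = 0, Included = 1, Excluded = 2, Conflict = 3 }

public sealed record ScreeningOutcome(Guid ProfileId, ScreeningResult Result);
public sealed record Study(StudyLifecycleStatus LifecycleStatus, IReadOnlyList<ScreeningOutcome> ScreeningOutcomes);

public static class PrismaCounts
{
    // Box 5: reads ONLY the title/abstract profile's outcome, never lifecycleStatus.
    public static int RecordsExcluded(IEnumerable<Study> studies, Guid titleAbstractProfileId) =>
        studies.Count(s => s.ScreeningOutcomes.Any(o =>
            o.ProfileId == titleAbstractProfileId && o.Result == ScreeningResult.Excluded));

    // Box 10: reads ONLY lifecycleStatus, never an individual screening outcome.
    public static int NewStudies(IEnumerable<Study> studies) =>
        studies.Count(s => s.LifecycleStatus == StudyLifecycleStatus.Included);
}
```

A study excluded at title/abstract is counted by `RecordsExcluded` while its lifecycle status is still Active, so it never contributes to `NewStudies`.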

Note: This is a design-phase specification. The full ScreeningOutcome model will be detailed in Phase 15. This placeholder ensures lifecycle status and screening outcomes are architecturally separate from the start.


6. Source Type Taxonomy

6a. SearchSourceType Enum

public enum SearchSourceType
{
    // Column 1: Databases and Registers (PRISMA boxes 2, 4-9)
    Database = 0,          // Bibliographic databases: PubMed, Embase, CINAHL,
                           // Web of Science, Scopus, PsycINFO, etc.
    Register = 1,          // Study registers: ClinicalTrials.gov, CENTRAL, ICTRP, etc.

    // Column 2: Other Sources (PRISMA boxes 11-15)
    Website = 2,           // Website searches (including Google Scholar for
                           // subject searching in some cases)
    Organisation = 3,      // Organisations contacted for studies
    CitationSearching = 4, // Forward/backward citation chasing
    Other = 5              // Other methods: expert contacts, conference abstracts, etc.
}

6b. PRISMA Column Assignment

| Source Type | PRISMA Column | PRISMA Boxes | Description |
|---|---|---|---|
| Database | Column 1: Databases and Registers | box2, box4-box9 | Bibliographic databases (PubMed, Embase, Scopus, etc.) |
| Register | Column 1: Databases and Registers | box2, box4-box9 | Study registers (ClinicalTrials.gov, CENTRAL, ICTRP) |
| Website | Column 2: Other Sources | box11-box15 | Website searches |
| Organisation | Column 2: Other Sources | box11-box15 | Organisations contacted for studies |
| CitationSearching | Column 2: Other Sources | box11-box15 | Forward/backward citation chasing |
| Other | Column 2: Other Sources | box11-box15 | Other methods (expert contacts, conference abstracts) |

Column assignment rule:

  • Column 1 (Databases/Registers): sourceType IN (Database, Register)
  • Column 2 (Other Sources): sourceType IN (Website, Organisation, CitationSearching, Other)
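
The column assignment rule maps directly onto a small function. In this sketch the `PrismaColumns` class and the `ColumnFor` method are hypothetical names; returning null for an unclassified source type is an assumption that follows the nullable sourceType field described in 6c.

```csharp
public enum SearchSourceType
{
    Database = 0, Register = 1, Website = 2, Organisation = 3, CitationSearching = 4, Other = 5
}

public static class PrismaColumns
{
    // Returns 1 for Databases/Registers, 2 for Other Sources, null when unclassified.
    public static int? ColumnFor(SearchSourceType? sourceType) => sourceType switch
    {
        SearchSourceType.Database or SearchSourceType.Register => 1,
        SearchSourceType.Website or SearchSourceType.Organisation
            or SearchSourceType.CitationSearching or SearchSourceType.Other => 2,
        _ => (int?)null // null or an undefined enum value: no column can be assigned
    };
}
```

Listing all six values explicitly, instead of treating everything that is not Column 1 as Column 2, means a value added to the enum later is left unassigned until the rule is updated.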

6c. Entity Modifications

Fields to add to SystematicSearch:

| Field | Type | Nullable | Description |
|---|---|---|---|
| sourceType | SearchSourceType? | Yes (nullable for migration) | PRISMA source classification |
| sourceName | string? | Yes | Free-text source name (e.g., "PubMed", "Embase") |

Fields to add to SearchImportJob:

| Field | Type | Nullable | Description |
|---|---|---|---|
| sourceType | SearchSourceType? | Yes | Source type known at import time |
| sourceName | string? | Yes | Source name known at import time |

Rationale for both entities: The SystematicSearch entity represents the search itself, while SearchImportJob represents a specific import operation. Source type is known at import time and propagated to the SystematicSearch for PRISMA reporting. Both entities carry the field to ensure traceability from import to PRISMA count.

6d. Common Source Name Registry

The following table provides suggested standard source names to promote consistency across projects. Source names are free text, not an enum -- this registry is advisory only.

| SearchSourceType | Suggested Names |
|---|---|
| Database | PubMed, Embase, CINAHL, Web of Science, Scopus, PsycINFO, MEDLINE, Cochrane Library |
| Register | ClinicalTrials.gov, CENTRAL, WHO ICTRP, EU Clinical Trials Register |
| Website | Google Scholar, specific institutional websites |
| Organisation | (free text: organisation name) |
| CitationSearching | Forward citation, Backward citation, Snowballing |
| Other | Expert contact, Conference abstract, Grey literature |

Note: A future UI enhancement MAY provide autocomplete suggestions from this registry.

6e. LibraryFileType to SourceType Inference

For migration (Phase 16), the following table documents which LibraryFileType values can be safely inferred to a SearchSourceType. This enables automated backfill of the sourceType field on existing SystematicSearch records.

| LibraryFileType | Inferred SourceType | Confidence | Rationale |
|---|---|---|---|
| PubmedXml | Database | HIGH | PubMed is always a bibliographic database |
| EndnoteXml | Unknown (leave null) | LOW | Could be any source exported via EndNote |
| LivingSearchJson | Unknown (leave null) | LOW | Depends on search configuration; could be database or register |
| TsvLibrary | Unknown (leave null) | LOW | Generic format; could come from any source type |
| CsvLibrary | Unknown (leave null) | LOW | Generic format; could come from any source type |

Migration rule: Only PubmedXml SHALL be automatically backfilled to sourceType = Database. All other file types SHALL leave sourceType as null, requiring admin manual classification. The admin interface SHALL provide a bulk classification tool for setting source types on existing searches.
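
A minimal sketch of the backfill rule follows. The `LibraryFileType` enum here is a hypothetical mirror containing only the values named in the table (the real enum may have other members or numeric values), and `SourceTypeBackfill` is an illustrative helper. Preserving an existing classification is an assumption; the migration rule itself only covers records where sourceType is absent.

```csharp
public enum SearchSourceType
{
    Database = 0, Register = 1, Website = 2, Organisation = 3, CitationSearching = 4, Other = 5
}

// Hypothetical mirror of the LibraryFileType values listed in the inference table.
public enum LibraryFileType { PubmedXml, EndnoteXml, LivingSearchJson, TsvLibrary, CsvLibrary }

public static class SourceTypeBackfill
{
    // Only PubmedXml is inferred; every other file type stays null for admin classification.
    public static SearchSourceType? Infer(LibraryFileType fileType) =>
        fileType == LibraryFileType.PubmedXml ? SearchSourceType.Database : (SearchSourceType?)null;

    // Assumption: a sourceType that is already set is never overwritten by the backfill.
    public static SearchSourceType? Backfill(SearchSourceType? existing, LibraryFileType fileType) =>
        existing ?? Infer(fileType);
}
```

Records that still have a null sourceType after the backfill are the ones the bulk classification tool would need to surface to an admin.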


7. PRISMA Count Derivation Rules

This section provides the complete derivation table linking every PRISMA box to the exact query that populates it. All 17 boxes and 34 fields are covered.

Notation conventions:

  • ss = SystematicSearch entity
  • s = Study entity
  • c = Citation (embedded on Study as s.citations[])
  • s.sourceColumn = derived from the sourceType of the study's primary Citation. Column 1 = Database or Register; Column 2 = Website, Organisation, CitationSearching, or Other

Box 1: Previous Studies (Updated Reviews -- DEFERRED)

| PRISMA Field | Derivation | Data Source | Status |
|---|---|---|---|
| previous_studies | N/A -- SyRF does not support updated reviews | N/A | DEFERRED |
| previous_reports | N/A -- SyRF does not support updated reviews | N/A | DEFERRED |

Box 2: Records Identified from Databases and Registers

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| database_results | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Database | SystematicSearch (ss) |
| database_specific_results | GROUP BY ss.sourceName WHERE ss.projectId = :projectId AND ss.sourceType = Database; FORMAT "{sourceName} (n={numberOfCitations})" | SystematicSearch |
| register_results | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Register | SystematicSearch |
| register_specific_results | GROUP BY ss.sourceName WHERE ss.projectId = :projectId AND ss.sourceType = Register; FORMAT "{sourceName} (n={numberOfCitations})" | SystematicSearch |

Box 3: Records Removed Before Screening

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| duplicates | LET totalCitations = SUM(COUNT(s.citations)) across all s WHERE s.projectId = :projectId; LET uniqueActiveStudies = COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus NOT IN (Duplicate, Merged); RETURN totalCitations - uniqueActiveStudies | Citation (c), Study (s) |
| excluded_automatic | COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = RemovedByAutomation | Study |
| excluded_other | COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = RemovedOther | Study |
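
The box 3 derivations can be sketched with LINQ as shown below. The `Study` record is a hypothetical projection carrying only the lifecycle status and the number of embedded Citations, and the input is assumed to be already filtered to one project.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum StudyLifecycleStatus
{
    Active = 0, Duplicate = 1, PendingDuplicateReview = 2, FullTextSought = 3,
    FullTextNotRetrieved = 4, Included = 5, Merged = 6, RemovedByAutomation = 7, RemovedOther = 8
}

// Hypothetical projection of a Study: its status and how many Citations it embeds.
public sealed record Study(StudyLifecycleStatus LifecycleStatus, int CitationCount);

public static class Box3
{
    // duplicates = total Citations - Studies that are neither Duplicate nor Merged.
    public static int Duplicates(IEnumerable<Study> projectStudies)
    {
        var studies = projectStudies.ToList();
        int totalCitations = studies.Sum(s => s.CitationCount);
        int uniqueStudies = studies.Count(s =>
            s.LifecycleStatus != StudyLifecycleStatus.Duplicate &&
            s.LifecycleStatus != StudyLifecycleStatus.Merged);
        return totalCitations - uniqueStudies;
    }

    // excluded_automatic: studies removed by an automation tool before screening.
    public static int ExcludedAutomatic(IEnumerable<Study> projectStudies) =>
        projectStudies.Count(s => s.LifecycleStatus == StudyLifecycleStatus.RemovedByAutomation);

    // excluded_other: studies removed for other pre-screen reasons.
    public static int ExcludedOther(IEnumerable<Study> projectStudies) =>
        projectStudies.Count(s => s.LifecycleStatus == StudyLifecycleStatus.RemovedOther);
}
```

As a worked instance: a canonical study holding 2 Citations after a merge, the Merged study holding 0, an Active study and its Duplicate with 1 Citation each, and one RemovedByAutomation study with 1 Citation give 5 Citations and 3 non-duplicate studies, so duplicates = 2.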

Box 4: Records Screened (Databases/Registers)

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| records_screened | COUNT(s) WHERE s.projectId = :projectId AND s ENTERED title/abstract screening stage pool AND s.sourceColumn = 1 | Study + Stage pool tracking |

Note: "Entered stage pool" requires stage pool tracking. The precise mechanism is specified in Phase 14 (Stage Filtering). For PRISMA purposes, a study is "screened" if it was made available to screeners in the title/abstract screening stage, regardless of the outcome.

Box 5: Records Excluded at Title/Abstract (Databases/Registers)

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| records_excluded | COUNT(s) WHERE s.projectId = :projectId AND s.screeningOutcomes[titleAbstractProfileId].Result = Excluded AND s.sourceColumn = 1 | Study.screeningOutcomes |

Box 6: Reports Sought for Retrieval (Databases/Registers)

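| PRISMA Field | Derivation | Data Source |
|---|---|---|
| dbr_sought_reports | COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus IN (FullTextSought, FullTextNotRetrieved, Active, Included) AND s.screeningOutcomes[titleAbstractProfileId].Result = Included AND s.sourceColumn = 1 | Study |

Note: This counts studies that passed T/A screening and entered the retrieval phase, whether or not retrieval succeeded. Studies with lifecycleStatus = FullTextNotRetrieved are included because every report counted in box 7 was first sought, so box 7 is a subset of box 6. Studies with lifecycleStatus = Active are included because a successful retrieval returns the study to Active for FT screening.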

Box 7: Reports Not Retrieved (Databases/Registers)

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| dbr_notretrieved_reports | COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = FullTextNotRetrieved AND s.sourceColumn = 1 | Study |

Box 8: Reports Assessed for Eligibility (Databases/Registers)

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| dbr_assessed | COUNT(s) WHERE s.projectId = :projectId AND s ENTERED full-text screening stage pool AND s.sourceColumn = 1 | Study + Stage pool |

Box 9: Reports Excluded with Reasons (Databases/Registers)

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| dbr_excluded | GROUP BY s.screeningOutcomes[fullTextProfileId].PrimaryExclusionReason WHERE s.projectId = :projectId AND s.screeningOutcomes[fullTextProfileId].Result = Excluded AND s.sourceColumn = 1; FORMAT "{reason} (n={count})" | Study.screeningOutcomes |
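
The grouped "{reason} (n={count})" output could be produced along these lines. The record shapes and the `Box9` class are hypothetical. Two details are assumptions that the derivation does not specify: the fallback label for a missing reason, and the ordering (largest group first, then alphabetical).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ScreeningResult { Pending = 0, Included = 1, Excluded = 2, Conflict = 3 }

public sealed record ScreeningOutcome(Guid ProfileId, ScreeningResult Result, string? PrimaryExclusionReason);

// Hypothetical projection: SourceColumn is the derived PRISMA column (1, 2, or null).
public sealed record Study(int? SourceColumn, IReadOnlyList<ScreeningOutcome> ScreeningOutcomes);

public static class Box9
{
    // Groups full-text exclusions by primary reason for one PRISMA column.
    public static IReadOnlyList<string> ExcludedWithReasons(
        IEnumerable<Study> projectStudies, Guid fullTextProfileId, int sourceColumn) =>
        projectStudies
            .Where(s => s.SourceColumn == sourceColumn)
            .SelectMany(s => s.ScreeningOutcomes)
            .Where(o => o.ProfileId == fullTextProfileId && o.Result == ScreeningResult.Excluded)
            .GroupBy(o => o.PrimaryExclusionReason ?? "No reason recorded") // assumed fallback label
            .OrderByDescending(g => g.Count())                              // assumed ordering
            .ThenBy(g => g.Key, StringComparer.Ordinal)
            .Select(g => $"{g.Key} (n={g.Count()})")
            .ToList();
}
```

Calling the same method with `sourceColumn` set to 2 gives the box 15 (other_excluded) variant.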

Box 10: New Studies Included

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| new_studies | COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = Included | Study |
| new_reports | LET includedStudies = s WHERE s.projectId = :projectId AND s.lifecycleStatus = Included; SUM(COUNT(s.citations)) for each includedStudy | Citation |

Box 11: Records Identified from Other Sources

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| website_results | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Website | SystematicSearch |
| organisation_results | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Organisation | SystematicSearch |
| citations_results | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = CitationSearching | SystematicSearch |

Box 12: Reports Sought for Retrieval (Other Sources)

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| other_sought_reports | Same as box 6 (dbr_sought_reports) but s.sourceColumn = 2 | Study |

Box 13: Reports Not Retrieved (Other Sources)

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| other_notretrieved_reports | Same as box 7 (dbr_notretrieved_reports) but s.sourceColumn = 2 | Study |

Box 14: Reports Assessed for Eligibility (Other Sources)

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| other_assessed | Same as box 8 (dbr_assessed) but s.sourceColumn = 2 | Study + Stage pool |

Box 15: Reports Excluded with Reasons (Other Sources)

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| other_excluded | Same as box 9 (dbr_excluded) but s.sourceColumn = 2 | Study.screeningOutcomes |

Box 16: Total Studies Included

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| total_studies | Same as box 10 new_studies (no previous studies for non-updated reviews) | Study |
| total_reports | Same as box 10 new_reports | Citation |

Note: When updated review support is added (box 1), box 16 becomes new_studies + previous_studies and new_reports + previous_reports.

Box 17: Studies Included in Meta-Analysis

| PRISMA Field | Derivation | Data Source |
|---|---|---|
| total_studies_ma | COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = Included AND s.metaAnalysisIncluded = true | Study |
| total_reports_ma | LET maStudies = s WHERE s.lifecycleStatus = Included AND s.metaAnalysisIncluded = true; SUM(COUNT(s.citations)) for each maStudy | Citation |

Source Column Derivation

The s.sourceColumn used throughout the derivation rules is derived from the study's import history:

sourceColumn(study) =
  LET primaryCitation = study.citations[0]  // First (earliest) Citation
  LET sourceType = primaryCitation.sourceType
  IF sourceType IN (Database, Register) THEN 1      // Column 1: Databases/Registers
  ELSE IF sourceType IN (Website, Organisation, CitationSearching, Other) THEN 2  // Column 2: Other
  ELSE NULL                                          // Unclassified (legacy data)

Rule: When a study has multiple Citations from different source types (e.g., found in both PubMed and a website), the FIRST import's sourceType determines the PRISMA column assignment. All Citations are preserved for per-source counting (box 2, box 11), but the study appears in only one column for screening/inclusion boxes (4-9, 12-15).
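
The first-import rule can be sketched as below. The `Citation` and `Study` records are hypothetical simplifications, the Citations list is assumed to be ordered earliest-first (as the pseudocode comment implies), and returning null for a study with no Citations is an assumption the pseudocode does not cover.

```csharp
using System.Collections.Generic;

public enum SearchSourceType
{
    Database = 0, Register = 1, Website = 2, Organisation = 3, CitationSearching = 4, Other = 5
}

public sealed record Citation(SearchSourceType? SourceType);

// Assumption: Citations are stored earliest-first, so index 0 is the first import.
public sealed record Study(IReadOnlyList<Citation> Citations);

public static class SourceColumn
{
    public static int? For(Study study)
    {
        // Assumption: a study with no Citations cannot be assigned to a column.
        if (study.Citations.Count == 0) return null;

        // Only the FIRST Citation decides the column; later Citations are ignored here.
        return study.Citations[0].SourceType switch
        {
            SearchSourceType.Database or SearchSourceType.Register => 1,
            SearchSourceType.Website or SearchSourceType.Organisation
                or SearchSourceType.CitationSearching or SearchSourceType.Other => 2,
            _ => (int?)null // unclassified (legacy data)
        };
    }
}
```

A study first imported from PubMed and later found on a website resolves to Column 1. The website Citation still counts toward box 11 through its own sourceType.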


8. Current Data Model Gap Summary

This table provides a quick reference for the specific fields needed on each entity to support PRISMA count derivation.

| Entity | Field | Type | Added In | PRISMA Boxes Served | Status |
|---|---|---|---|---|---|
| SystematicSearch | sourceType | SearchSourceType? | Phase 7 | box2, box11 | Gap -- not present |
| SystematicSearch | sourceName | string? | Phase 7 | box2, box11 | Gap -- not present |
| SearchImportJob | sourceType | SearchSourceType? | Phase 12 | (propagates to SystematicSearch) | Gap -- not present |
| SearchImportJob | sourceName | string? | Phase 12 | (propagates to SystematicSearch) | Gap -- not present |
| Study | lifecycleStatus | StudyLifecycleStatus? | Phase 12 | box3, box6-7, box10, box12-13, box16 | Gap -- not present |
| Study | citations[] | Citation[] | Phase 12 | box2, box3, box10-11, box16-17 | Gap -- not present |
| Study | publicationId | Guid? | Phase 12 | (cross-project identity) | Gap -- not present |
| Study | duplicateGroupId | Guid? | Phase 12 | box3 | Gap -- not present |
| Study | fullTextStatus | FullTextStatus? | Phase 12 | box6-7, box8, box12-14 | Gap -- not present |
| Study | screeningOutcomes[] | ScreeningOutcome[] | Phase 13/15 | box4-5, box8-9, box14-15 | Gap -- not present |
| Study | metaAnalysisIncluded | bool? | Phase 16 | box17 | Gap -- not present |

Summary: 11 fields across 3 entities are needed for complete PRISMA support. All are additive (nullable) and introduced incrementally across Phases 7, 12, 13/15, and 16.


9. Edge Cases

Study Imported from Multiple Sources of Different Types

Scenario: A study's Citations include one from PubMed (Database, Column 1) and one from a website (Website, Column 2).

Rule: Use the FIRST import's sourceType for PRISMA column assignment. The study appears in one column for screening/inclusion boxes (4-9, 12-15). All Citations are preserved and counted in their respective source boxes (box 2, box 11).

Rationale: PRISMA requires each study to appear in exactly one column for the screening/inclusion phases. The first import represents the study's initial entry point into the review. Per-source record counts (boxes 2, 11) remain accurate because Citations are immutable and always counted by their own sourceType.

Study with No SourceType (Legacy Data)

Scenario: An existing study's SystematicSearch has no sourceType (field is null).

Rule: The study is counted in PRISMA totals that do not require source column assignment (e.g., box 3 duplicates, box 10 included studies). For source-specific boxes (box 2, box 4-9, box 11-15), the study cannot be assigned to a column and is excluded from those counts.

Migration: Phase 16 backfills sourceType where determinable from LibraryFileType (only PubmedXml -> Database is high-confidence). Admin interface allows manual classification for remaining records.

Study Screened Under Multiple Profiles

Scenario: A study is screened under a title/abstract profile and then a full-text profile, with different outcomes.

Rule: Each profile produces its own entry in screeningOutcomes[]. PRISMA counts use specific profiles:

  • Title/abstract profile outcomes feed boxes 4-5 (records screened, records excluded)
  • Full-text profile outcomes feed boxes 8-9, 14-15 (reports assessed, reports excluded with reasons)

There is no conflict because each PRISMA box references a specific screening profile, not the aggregate of all profiles.

Study Passes All Screening but Has FullTextNotRetrieved Status

Scenario: Edge case where lifecycle status is FullTextNotRetrieved but a screening outcome shows Included.

Rule: This scenario SHOULD NOT occur in normal operation. FullTextNotRetrieved studies are excluded from full-text screening pools, so they cannot receive a full-text screening outcome. If it occurs (e.g., due to admin override), the lifecycleStatus takes precedence: the study is NOT counted as included (box 10) because FullTextNotRetrieved is a terminal state.

Duplicate Study with Existing Screening Data

Scenario: A study is screened and then later identified as a duplicate (e.g., a late-arriving import reveals the duplicate relationship).

Rule: When the study's lifecycleStatus transitions to Duplicate, the study is excluded from all pools and from PRISMA screening counts. The screeningOutcomes[] data is preserved (not deleted) for audit purposes, but it is no longer counted in PRISMA boxes. The screening data from the duplicate study MAY be available for reconciliation on the canonical study (see deduplication specification, Phase 12).


10. Cross-References

  • PRISMA Flow Diagram Mapping: prisma-flow-diagram-mapping.md -- Complete box-to-field mapping referencing the lifecycle states and source types defined here
  • Three-Level Data Model: three-level-data-model.md -- Entity specifications for Publication, Citation, and Study that carry the fields referenced in this document (Study.lifecycleStatus, Study.screeningOutcomes[], Citation.sourceType, SystematicSearch.sourceType)

Requirement Coverage

| Requirement ID | Coverage in This Document |
|---|---|
| PRISMA-01 | Complete: All 34 PRISMA fields have derivation rules using lifecycle status and/or screening outcomes |
| PRISMA-02 | Complete: Source type taxonomy with 6 values and PRISMA column assignment rules |
| PRISMA-04 | Complete: Study lifecycle status model with 9 states, valid transitions, and PRISMA count derivation rules |
| PRISMA-05 | Partial: Deduplication counts derivable per source type via Citation.sourceType; full dedup counting requires Phase 12 implementation |