Study Lifecycle Status Model and Source Type Taxonomy

1. Purpose

This document defines the two classification systems that enable PRISMA 2020 flow diagram count derivation:

Study Lifecycle Status -- the 9-state model tracking a study's position in the systematic review pipeline
Source Type Taxonomy -- the 6-value classification of search sources into PRISMA's dual-column structure

Together, these systems constrain every PRISMA box derivation rule. Without a clear lifecycle model, PRISMA boxes cannot be populated. Without source types, the PRISMA dual-column structure (databases/registers vs. other sources) cannot be supported.

This document is a binding constraint on Phases 12-16 of the SyRF platform evolution.

Normative language: "MUST" indicates an absolute requirement. "SHALL" indicates mandatory behavior. "SHOULD" indicates a strong recommendation. "MAY" indicates optional behavior.

Cross-references: prisma-flow-diagram-mapping.md (PRISMA box-to-field mapping), three-level-data-model.md (Publication/Citation/Study entity specifications)

Requirement coverage: PRISMA-01, PRISMA-02, PRISMA-04, PRISMA-05

2. Critical Distinction: Lifecycle Status vs. Screening Outcome

This section prevents the single most dangerous data model mistake. Conflating lifecycle status with screening outcome makes per-profile multi-stage pipelines impossible, breaking PRISMA boxes 4-5 and 8-9. Read this section before any lifecycle or screening implementation.

Lifecycle Status

Lifecycle Status tracks the study's position in the overall review pipeline. It is a SINGLE value on the Study entity. It answers: "Where is this study in the process?"

Stored as: Study.lifecycleStatus (enum, one value at a time)
Changed by: system events (import, dedup, retrieval) or admin actions
Scope: the study's existence and availability in the review, regardless of any screening decision

Screening Outcome

Screening Outcome tracks per-profile inclusion/exclusion decisions. It is an ARRAY of per-profile results on the Study entity. It answers: "What was decided about this study under criteria set X?"

Stored as: Study.screeningOutcomes[] (array, one entry per screening profile)
Changed by: screener decisions, reconciliation outcomes
Scope: a specific screening profile's determination about this study

Why They Are Different

A study can be Excluded under one screening profile and Included under another. In a multi-stage pipeline (e.g., title/abstract screening followed by full-text screening), the same study participates in multiple screening profiles. Each profile produces its own outcome independently.
Lifecycle status is about the study's existence in the review, not about any particular screening decision. A study that is "Excluded" at title/abstract screening still exists in the review -- it is still Active in terms of lifecycle. It simply did not pass one screening gate.
PRISMA uses BOTH systems for different boxes:
Lifecycle status feeds: box 3 (duplicates, automation removal), boxes 6-7 and 12-13 (retrieval), boxes 10, 16, and 17 (included studies)
Screening outcomes feed: boxes 4-5 (title/abstract screening), boxes 8-9 and 14-15 (full-text screening)
Changing a screening profile's agreement mode MUST NOT require updating a lifecycle status field. If it does, the model is wrong.

Worked Example

Consider a study imported from PubMed:

Import: Study created with lifecycleStatus = Active
Title/Abstract screening: Screeners evaluate the study under the T/A screening profile
screeningOutcomes[0] = { profileId: TA_PROFILE, result: Excluded, reason: "Wrong population" }
lifecycleStatus is STILL Active -- the study was not removed from the review, just excluded under one profile
Alternative scenario: If the study passes ALL screening profiles and is fully included:
screeningOutcomes[0] = { profileId: TA_PROFILE, result: Included }
screeningOutcomes[1] = { profileId: FT_PROFILE, result: Included }
lifecycleStatus transitions to Included -- the study has reached the terminal state

The lifecycle status transitions to Included only when the study passes all required screening profiles. Individual profile exclusions do NOT change lifecycle status.
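
The separation described above can be sketched in a few lines of C#. This is a minimal illustration under stated assumptions, not the SyRF implementation: StudySketch, Record, the trimmed enums, and the string profile ids are hypothetical names introduced only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum LifecycleStatus { Active, Included }           // subset of StudyLifecycleStatus
public enum ProfileResult { Pending, Included, Excluded }  // subset of ScreeningResult

public sealed class StudySketch
{
    // Single value: where the study is in the review pipeline.
    public LifecycleStatus Lifecycle { get; private set; } = LifecycleStatus.Active;

    // One entry per screening profile, keyed by a hypothetical profile id.
    public Dictionary<string, ProfileResult> Outcomes { get; } =
        new Dictionary<string, ProfileResult>();

    // Records one profile's decision, then re-evaluates the aggregate rule.
    public void Record(string profileId, ProfileResult result,
                       IReadOnlyCollection<string> requiredProfiles)
    {
        Outcomes[profileId] = result;

        bool allIncluded = requiredProfiles.Count > 0 && requiredProfiles.All(id =>
            Outcomes.TryGetValue(id, out var r) && r == ProfileResult.Included);

        // A single exclusion never touches Lifecycle; only full inclusion does.
        if (allIncluded && Lifecycle == LifecycleStatus.Active)
        {
            Lifecycle = LifecycleStatus.Included;
        }
    }
}
```

Recording Excluded under the T/A profile leaves Lifecycle at Active. Only after Included has been recorded under every required profile does Lifecycle move to Included.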

3. StudyLifecycleStatus Enum

public enum StudyLifecycleStatus
{
    // === Import phase ===
    Active = 0,                  // Default. Study is available for screening/review.
                                 // This is the initial state for all imported studies.

    // === Dedup phase ===
    Duplicate = 1,               // Confirmed duplicate (auto-confirmed or admin-confirmed).
                                 // Study is excluded from all stage pools.
    PendingDuplicateReview = 2,  // Probable duplicate awaiting admin review.
                                 // Study is temporarily excluded from stage pools.

    // === Retrieval phase (PRISMA boxes 6-7) ===
    FullTextSought = 3,          // Full text retrieval has been attempted.
    FullTextNotRetrieved = 4,    // Full text could not be obtained.
                                 // Study excluded from full-text screening pools.

    // === Terminal states ===
    Included = 5,                // Final: study included in the review.
                                 // Determined when study passes all required screening profiles.
    Merged = 6,                  // Merged into another study during duplicate resolution.
                                 // Original data preserved but study excluded from pools.

    // === Pre-screen removal (PRISMA box 3) ===
    RemovedByAutomation = 7,     // Removed by automation tool before screening.
    RemovedOther = 8             // Removed for other pre-screen reasons.
}

Detailed Status Definitions

Active (0) -- Default State

Property	Value
Definition	Study is available for screening, annotation, and review. This is the default state.
Trigger	Study import (initial creation); or admin reversal of Duplicate/PendingDuplicateReview; or successful full-text retrieval (FullTextSought -> Active)
Set by	System (on import or successful full-text retrieval), Admin (on reversal)
Appears in stage pools	Yes -- Active studies are the ONLY studies that appear in screening stage pools
PRISMA box	Not directly counted. Active is the working state; studies in Active participate in screening (boxes 4, 5, 8, 9) via screeningOutcomes.

Duplicate (1) -- Confirmed Duplicate

Property	Value
Definition	Confirmed duplicate of another study. Either auto-confirmed by the dedup algorithm (high confidence) or manually confirmed by an admin.
Trigger	ASySD auto-confirmation (high confidence pair); or admin confirmation of probable duplicate
Set by	System (auto-confirm), Admin (manual confirm)
Appears in stage pools	No -- Duplicate studies are excluded from all stage pools
PRISMA box	Box 3: `duplicates` count. The total duplicate count is derived from `COUNT(Citations) - COUNT(unique Studies WHERE lifecycleStatus NOT IN (Duplicate, Merged))`.

PendingDuplicateReview (2) -- Probable Duplicate

Property	Value
Definition	Probable duplicate flagged by the dedup algorithm but below the auto-confirmation confidence threshold. Awaiting admin review in the duplicate review queue.
Trigger	ASySD flags pair as probable duplicate (below auto-confirm threshold)
Set by	System (dedup algorithm)
Appears in stage pools	No -- PendingDuplicateReview studies are conservatively excluded from pools to prevent screening duplicates
PRISMA box	Not directly counted in PRISMA. If confirmed as duplicate, moves to Duplicate and counted in box 3. If rejected, returns to Active.

FullTextSought (3) -- Retrieval Attempted

Property	Value
Definition	Full text retrieval has been attempted for this study. This is a transitional state between title/abstract screening and full-text screening.
Trigger	Admin or system initiates full-text retrieval for a study that passed title/abstract screening
Set by	System (retrieval workflow), Admin (manual action)
Appears in stage pools	No -- study is in retrieval limbo, not yet available for full-text screening
PRISMA box	Box 6 (`dbr_sought_reports`) for Column 1 sources; Box 12 (`other_sought_reports`) for Column 2 sources. Count includes studies that have been sought, whether or not retrieval succeeded.

FullTextNotRetrieved (4) -- Retrieval Failed

Property	Value
Definition	Full text could not be obtained. The study cannot proceed to full-text screening. This is a terminal state (but admin can override).
Trigger	Retrieval process reports failure; admin marks as not retrievable
Set by	System (retrieval failure), Admin (manual determination)
Appears in stage pools	No -- study cannot be screened without full text
PRISMA box	Box 7 (`dbr_notretrieved_reports`) for Column 1 sources; Box 13 (`other_notretrieved_reports`) for Column 2 sources.

Included (5) -- Final Inclusion

Property	Value
Definition	Study is included in the systematic review. This is a terminal state determined when the study passes all required screening profiles.
Trigger	Study passes all required screening profiles (all screeningOutcomes show result = Included)
Set by	System (automatically when all screening profiles resolve to Included)
Appears in stage pools	No -- study has completed the pipeline. It MAY appear in annotation pools for data extraction.
PRISMA box	Box 10 (`new_studies`, `new_reports`), Box 16 (`total_studies`, `total_reports`).

Merged (6) -- Merged During Dedup

Property	Value
Definition	Study has been merged into another study during duplicate resolution. The original study's data (Citations) is preserved but the study itself is no longer an active participant in the review.
Trigger	Admin confirms merge during duplicate review; Citations moved to canonical study
Set by	Admin (via merge workflow)
Appears in stage pools	No -- the merged study is superseded by the canonical study
PRISMA box	Box 3: Merged studies contribute to the `duplicates` count (Citations count - unique Studies).

RemovedByAutomation (7) -- Pre-Screen Automation Removal

Property	Value
Definition	Study was removed by an automation tool before screening. Examples: machine learning classifiers, rule-based filters, format validators.
Trigger	Automation tool marks study as ineligible
Set by	System (automation tool)
Appears in stage pools	No
PRISMA box	Box 3: `excluded_automatic` count.

RemovedOther (8) -- Pre-Screen Other Removal

Property	Value
Definition	Study was removed for pre-screening reasons other than automation or deduplication. Examples: retracted publications, known irrelevant records, admin cleanup.
Trigger	Admin marks study for removal
Set by	Admin
Appears in stage pools	No
PRISMA box	Box 3: `excluded_other` count.

4. Valid State Transitions

State Transition Diagram

Import ---> Active ---+---> Duplicate (terminal, reversible by admin)
                      |
                      +---> PendingDuplicateReview ---+---> Duplicate
                      |                               +---> Active (admin: not duplicate)
                      |
                      +---> FullTextSought ---+---> FullTextNotRetrieved (terminal)
                      |                       +---> Active (full text retrieved successfully)
                      |
                      +---> Included (terminal)
                      |
                      +---> Merged (terminal)
                      |
                      +---> RemovedOther (terminal)
                      |
                      +---> RemovedByAutomation (terminal)

State Transition Table

#	From State	To State	Trigger	Actor	Reversible?	Notes
T1	(new)	Active	Study imported	System	N/A	Initial state for all imported studies
T2	Active	Duplicate	Dedup auto-confirm (high confidence)	System	Yes (admin)	Admin can reverse to Active
T3	Active	PendingDuplicateReview	Dedup flags probable duplicate	System	Yes (resolves to Duplicate or Active)	Transitional state
T4	Active	FullTextSought	Full-text retrieval initiated	System/Admin	Yes (returns to Active on success)	Transitional state
T5	Active	Included	All screening profiles pass	System	Admin override only	Terminal state
T6	Active	Merged	Admin confirms merge	Admin	Admin override only	Terminal state
T7	Active	RemovedByAutomation	Automation tool marks ineligible	System	Admin override only	Terminal state
T8	Active	RemovedOther	Admin removes for other reasons	Admin	Admin override only	Terminal state
T9	PendingDuplicateReview	Duplicate	Admin confirms duplicate	Admin	Yes (admin can reverse to Active)	Admin review resolution
T10	PendingDuplicateReview	Active	Admin rejects duplicate	Admin	N/A (already resolved)	Admin review resolution
T11	Duplicate	Active	Admin reverses dedup decision	Admin	Yes	Exceptional: admin determines studies are not duplicates
T12	FullTextSought	Active	Full text obtained successfully	System/Admin	N/A	Study returns to Active for full-text screening
T13	FullTextSought	FullTextNotRetrieved	Retrieval fails	System/Admin	Admin override only	Terminal state
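
One way to enforce the table is a lookup of routine transitions keyed by source state. The sketch below is illustrative only: LifecycleTransitions and IsAllowed are hypothetical names, the enum is re-declared so the block stands alone, and exceptional admin overrides out of terminal states are assumed to go through a separate path that is not modelled here.

```csharp
using System.Collections.Generic;

public enum StudyLifecycleStatus
{
    Active = 0, Duplicate = 1, PendingDuplicateReview = 2, FullTextSought = 3,
    FullTextNotRetrieved = 4, Included = 5, Merged = 6,
    RemovedByAutomation = 7, RemovedOther = 8
}

public static class LifecycleTransitions
{
    // Routine transitions T2-T13. T1 (creation as Active) has no source state.
    private static readonly Dictionary<StudyLifecycleStatus, HashSet<StudyLifecycleStatus>> Allowed =
        new Dictionary<StudyLifecycleStatus, HashSet<StudyLifecycleStatus>>
        {
            [StudyLifecycleStatus.Active] = new HashSet<StudyLifecycleStatus>
            {
                StudyLifecycleStatus.Duplicate,              // T2
                StudyLifecycleStatus.PendingDuplicateReview, // T3
                StudyLifecycleStatus.FullTextSought,         // T4
                StudyLifecycleStatus.Included,               // T5
                StudyLifecycleStatus.Merged,                 // T6
                StudyLifecycleStatus.RemovedByAutomation,    // T7
                StudyLifecycleStatus.RemovedOther            // T8
            },
            [StudyLifecycleStatus.PendingDuplicateReview] = new HashSet<StudyLifecycleStatus>
            {
                StudyLifecycleStatus.Duplicate,              // T9
                StudyLifecycleStatus.Active                  // T10
            },
            [StudyLifecycleStatus.Duplicate] = new HashSet<StudyLifecycleStatus>
            {
                StudyLifecycleStatus.Active                  // T11
            },
            [StudyLifecycleStatus.FullTextSought] = new HashSet<StudyLifecycleStatus>
            {
                StudyLifecycleStatus.Active,                 // T12
                StudyLifecycleStatus.FullTextNotRetrieved    // T13
            }
        };

    // True when the table lists a routine transition from 'from' to 'to'.
    public static bool IsAllowed(StudyLifecycleStatus from, StudyLifecycleStatus to) =>
        Allowed.TryGetValue(from, out var targets) && targets.Contains(to);
}
```

States with no entry in the lookup (Included, Merged, RemovedByAutomation, RemovedOther, FullTextNotRetrieved) have no routine outgoing transitions, which matches their description as terminal.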

Key Transition Rules

Only Active studies appear in screening stage pools. This is a system invariant. All other lifecycle states exclude the study from all screening pools.
PendingDuplicateReview studies are conservatively excluded from pools. Even though they may not be duplicates, they are withheld from screening until an admin resolves the status. This prevents wasted screening effort on probable duplicates.
Duplicate -> Active is allowed. An admin can reverse a dedup decision if they determine the studies are not actually duplicates. This is important for quality assurance.
FullTextSought -> Active is the normal flow. When full text is obtained, the study returns to Active and becomes available for full-text screening.
Terminal states (Included, Merged, RemovedByAutomation, RemovedOther, FullTextNotRetrieved) are normally irreversible. Admin can override in exceptional cases, but the system does not provide routine transition paths back from terminal states.
Lifecycle status transitions MUST NOT be triggered by an individual screening outcome. The system SHALL NOT automatically change lifecycleStatus when a single screening profile excludes a study. The only screening-driven transition is Active -> Included (T5), which happens ONLY when all required screening profiles resolve to Included.

5. Screening Outcome Model

This section defines the per-profile screening outcome structure that complements the lifecycle status model. The full ScreeningOutcome specification will be detailed in Phase 15; this placeholder ensures lifecycle status and screening outcomes are architecturally separate from the start.

ScreeningOutcome Structure

using System;  // Guid, DateTime

// Embedded array on Study: Study.screeningOutcomes[]
public class ScreeningOutcome
{
    public Guid ProfileId { get; set; }         // Which screening profile
    public Guid StageId { get; set; }           // Which stage
    public ScreeningResult Result { get; set; }  // Included, Excluded, Conflict, Pending
    public string? PrimaryExclusionReason { get; set; }  // For PRISMA box 9/15
    public DateTime? ResolvedAt { get; set; }
    public ScreeningAuthority Authority { get; set; }  // CandidateAgreement, Reconciled
}

public enum ScreeningResult
{
    Pending = 0,     // Not yet determined
    Included = 1,    // Passed this screening profile
    Excluded = 2,    // Failed this screening profile
    Conflict = 3     // Screening disagreement, awaiting reconciliation
}

public enum ScreeningAuthority
{
    CandidateAgreement = 0,  // All screeners agreed
    Reconciled = 1           // Reconciler resolved disagreement
}

How Screening Outcomes and Lifecycle Status Interact

Screening outcomes are per-profile. A study may have outcomes for multiple screening profiles (e.g., one for title/abstract, one for full-text). Each entry in screeningOutcomes[] represents one profile's determination.
Lifecycle status reflects the aggregate result. The system evaluates all screeningOutcomes[] entries to determine lifecycle transitions:
If ALL required profiles have Result = Included, the system transitions lifecycleStatus to Included (transition T5).
If any profile has Result = Excluded, the study does NOT change lifecycle status -- it remains Active but is excluded from downstream pools governed by that profile's outcome.
PRISMA counting uses both independently:
Box 4 (records_screened): Uses stage pool membership (Active studies entering screening)
Box 5 (records_excluded at T/A): Uses screeningOutcomes[TA_PROFILE].Result = Excluded
Box 8 (dbr_assessed): Uses stage pool membership (studies entering FT screening)
Box 9 (dbr_excluded with reasons): Uses screeningOutcomes[FT_PROFILE].PrimaryExclusionReason
Box 10 (new_studies): Uses lifecycleStatus = Included
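
To make "uses both independently" concrete, the sketch below counts box 5 from screening outcomes alone and box 10 from lifecycle status alone. It is a simplified illustration: StudyRow, PrismaCounts, the trimmed enums, and the string profile id are hypothetical, and the source-column filter that box 5 also requires is left out to keep the example short.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum Lifecycle { Active, Included }          // subset of StudyLifecycleStatus
public enum Result { Pending, Included, Excluded }  // subset of ScreeningResult

public sealed class StudyRow
{
    public Lifecycle Status { get; set; }
    public Dictionary<string, Result> Outcomes { get; set; } =
        new Dictionary<string, Result>();
}

public static class PrismaCounts
{
    // Box 5: reads the T/A profile's outcome only; lifecycle is not consulted.
    public static int RecordsExcluded(IEnumerable<StudyRow> studies, string taProfileId) =>
        studies.Count(s => s.Outcomes.TryGetValue(taProfileId, out var r)
                           && r == Result.Excluded);

    // Box 10: reads lifecycle only; screening outcomes are not consulted.
    public static int NewStudies(IEnumerable<StudyRow> studies) =>
        studies.Count(s => s.Status == Lifecycle.Included);
}
```

A study excluded at T/A contributes to RecordsExcluded while its Status is still Active, so neither count needs to know how the other is stored.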

Note: This is a design-phase specification. The full ScreeningOutcome model will be detailed in Phase 15. This placeholder ensures lifecycle status and screening outcomes are architecturally separate from the start.

6. Source Type Taxonomy

6a. SearchSourceType Enum

public enum SearchSourceType
{
    // Column 1: Databases and Registers (PRISMA boxes 2, 4-9)
    Database = 0,          // Bibliographic databases: PubMed, Embase, CINAHL,
                           // Web of Science, Scopus, PsycINFO, etc.
    Register = 1,          // Study registers: ClinicalTrials.gov, CENTRAL, ICTRP, etc.

    // Column 2: Other Sources (PRISMA boxes 11-15)
    Website = 2,           // Website searches (including Google Scholar for
                           // subject searching in some cases)
    Organisation = 3,      // Organisations contacted for studies
    CitationSearching = 4, // Forward/backward citation chasing
    Other = 5              // Other methods: expert contacts, conference abstracts, etc.
}

6b. PRISMA Column Assignment

Source Type	PRISMA Column	PRISMA Boxes	Description
Database	Column 1: Databases and Registers	box2, box4-box9	Bibliographic databases (PubMed, Embase, Scopus, etc.)
Register	Column 1: Databases and Registers	box2, box4-box9	Study registers (ClinicalTrials.gov, CENTRAL, ICTRP)
Website	Column 2: Other Sources	box11-box15	Website searches
Organisation	Column 2: Other Sources	box11-box15	Organisations contacted for studies
CitationSearching	Column 2: Other Sources	box11-box15	Forward/backward citation chasing
Other	Column 2: Other Sources	box11-box15	Other methods (expert contacts, conference abstracts)

Column assignment rule:

Column 1 (Databases/Registers): sourceType IN (Database, Register)
Column 2 (Other Sources): sourceType IN (Website, Organisation, CitationSearching, Other)

6c. Entity Modifications

Fields to add to SystematicSearch:

Field	Type	Nullable	Description
`sourceType`	SearchSourceType?	Yes (nullable for migration)	PRISMA source classification
`sourceName`	string?	Yes	Free-text source name (e.g., "PubMed", "Embase")

Fields to add to SearchImportJob:

Field	Type	Nullable	Description
`sourceType`	SearchSourceType?	Yes	Source type known at import time
`sourceName`	string?	Yes	Source name known at import time

Rationale for both entities: The SystematicSearch entity represents the search itself, while SearchImportJob represents a specific import operation. Source type is known at import time and propagated to the SystematicSearch for PRISMA reporting. Both entities carry the field to ensure traceability from import to PRISMA count.

6d. Common Source Name Registry

The following table provides suggested standard source names to promote consistency across projects. Source names are free text, not an enum -- this registry is advisory only.

SearchSourceType	Suggested Names
Database	PubMed, Embase, CINAHL, Web of Science, Scopus, PsycINFO, MEDLINE, Cochrane Library
Register	ClinicalTrials.gov, CENTRAL, WHO ICTRP, EU Clinical Trials Register
Website	Google Scholar, specific institutional websites
Organisation	(free text: organisation name)
CitationSearching	Forward citation, Backward citation, Snowballing
Other	Expert contact, Conference abstract, Grey literature

Note: Source name is free text, not an enum. The registry is advisory to promote consistency. A future UI enhancement MAY provide autocomplete suggestions from this registry.

6e. LibraryFileType to SourceType Inference

For migration (Phase 16), the following table documents which LibraryFileType values can be safely inferred to a SearchSourceType. This enables automated backfill of the sourceType field on existing SystematicSearch records.

LibraryFileType	Inferred SourceType	Confidence	Rationale
PubmedXml	Database	HIGH	PubMed is always a bibliographic database
EndnoteXml	None (leave null)	LOW	Could be any source exported via EndNote
LivingSearchJson	None (leave null)	LOW	Depends on search configuration; could be database or register
TsvLibrary	None (leave null)	LOW	Generic format; could come from any source type
CsvLibrary	None (leave null)	LOW	Generic format; could come from any source type

Migration rule: Only PubmedXml SHALL be automatically backfilled to sourceType = Database. All other file types SHALL leave sourceType as null, requiring admin manual classification. The admin interface SHALL provide a bulk classification tool for setting source types on existing searches.
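
The migration rule reduces to a one-case mapping. The sketch below is a hedged illustration: SourceTypeBackfill and Infer are hypothetical names, and LibraryFileType is re-declared here with only the five values listed in the table, so the real enum may contain more members.

```csharp
// Assumption: only the five file types named in the inference table.
public enum LibraryFileType { PubmedXml, EndnoteXml, LivingSearchJson, TsvLibrary, CsvLibrary }

public enum SearchSourceType
{
    Database = 0, Register = 1, Website = 2,
    Organisation = 3, CitationSearching = 4, Other = 5
}

public static class SourceTypeBackfill
{
    // Returns Database for PubmedXml; null for everything else,
    // leaving those searches for manual admin classification.
    public static SearchSourceType? Infer(LibraryFileType fileType) =>
        fileType == LibraryFileType.PubmedXml
            ? SearchSourceType.Database
            : (SearchSourceType?)null;
}
```

Returning null rather than a fallback value keeps low-confidence file types visibly unclassified, so the bulk classification tool can find them.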

7. PRISMA Count Derivation Rules

This section provides the complete derivation table linking every PRISMA box to the exact query that populates it. All 17 boxes and 34 fields are covered.

Notation conventions:
- ss = SystematicSearch entity
- s = Study entity
- c = Citation (embedded on Study as s.citations[])
- s.sourceColumn = derived from the sourceType of the study's primary Citation. Column 1 = Database or Register; Column 2 = Website, Organisation, CitationSearching, or Other

Box 1: Previous Studies (Updated Reviews -- DEFERRED)

PRISMA Field	Derivation	Data Source	Status
`previous_studies`	N/A -- SyRF does not support updated reviews	N/A	DEFERRED
`previous_reports`	N/A -- SyRF does not support updated reviews	N/A	DEFERRED

Box 2: Records Identified from Databases and Registers

PRISMA Field	Derivation	Data Source
`database_results`	`SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Database`	SystematicSearch (ss)
`database_specific_results`	`GROUP BY ss.sourceName WHERE ss.projectId = :projectId AND ss.sourceType = Database; FORMAT "{sourceName} (n={numberOfCitations})"`	SystematicSearch
`register_results`	`SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Register`	SystematicSearch
`register_specific_results`	`GROUP BY ss.sourceName WHERE ss.projectId = :projectId AND ss.sourceType = Register; FORMAT "{sourceName} (n={numberOfCitations})"`	SystematicSearch
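
The box 2 derivations translate naturally to LINQ. The sketch below is one possible reading, not a definitive implementation: SearchRow and Box2 are hypothetical names, the project filter is assumed to have been applied already, and summing numberOfCitations across searches that share a sourceName is an assumption about what the GROUP BY is meant to produce.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum SearchSourceType
{
    Database = 0, Register = 1, Website = 2,
    Organisation = 3, CitationSearching = 4, Other = 5
}

// Stand-in for a SystematicSearch already filtered to one project.
public sealed class SearchRow
{
    public SearchSourceType? SourceType { get; set; }
    public string SourceName { get; set; } = "";
    public int NumberOfCitations { get; set; }
}

public static class Box2
{
    // database_results / register_results: total records for one source type.
    public static int Total(IEnumerable<SearchRow> searches, SearchSourceType type) =>
        searches.Where(s => s.SourceType == type).Sum(s => s.NumberOfCitations);

    // *_specific_results: one "{sourceName} (n={count})" line per source name.
    public static List<string> Specific(IEnumerable<SearchRow> searches, SearchSourceType type) =>
        searches.Where(s => s.SourceType == type)
                .GroupBy(s => s.SourceName)
                .OrderBy(g => g.Key, StringComparer.Ordinal)
                .Select(g => $"{g.Key} (n={g.Sum(s => s.NumberOfCitations)})")
                .ToList();
}
```

Searches whose SourceType is null never match the filter, so unclassified legacy searches stay out of both fields.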

Box 3: Records Removed Before Screening

PRISMA Field	Derivation	Data Source
`duplicates`	`LET totalCitations = SUM(COUNT(s.citations)) across all s WHERE s.projectId = :projectId; LET uniqueActiveStudies = COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus NOT IN (Duplicate, Merged); RETURN totalCitations - uniqueActiveStudies`	Citation (c), Study (s)
`excluded_automatic`	`COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = RemovedByAutomation`	Study
`excluded_other`	`COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = RemovedOther`	Study
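
The duplicates arithmetic is easiest to check with concrete numbers. The sketch below is a minimal illustration: StudyRow, Box3, and CitationCount (standing in for the length of s.citations[]) are hypothetical names, and the input is assumed to be already restricted to one project.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum StudyLifecycleStatus
{
    Active = 0, Duplicate = 1, PendingDuplicateReview = 2, FullTextSought = 3,
    FullTextNotRetrieved = 4, Included = 5, Merged = 6,
    RemovedByAutomation = 7, RemovedOther = 8
}

public sealed class StudyRow
{
    public StudyLifecycleStatus Status { get; set; }
    public int CitationCount { get; set; }  // stands in for s.citations.Length
}

public static class Box3
{
    // duplicates = all Citations minus studies that are neither Duplicate nor Merged.
    public static int Duplicates(IEnumerable<StudyRow> studies)
    {
        var list = studies.ToList();
        int totalCitations = list.Sum(s => s.CitationCount);
        int uniqueStudies = list.Count(s =>
            s.Status != StudyLifecycleStatus.Duplicate &&
            s.Status != StudyLifecycleStatus.Merged);
        return totalCitations - uniqueStudies;
    }

    // excluded_automatic / excluded_other: plain count by lifecycle status.
    public static int CountWithStatus(IEnumerable<StudyRow> studies, StudyLifecycleStatus status) =>
        studies.Count(s => s.Status == status);
}
```

For example, take a canonical study holding 2 Citations (one moved from a Merged study that now holds 0), an Active study with 1 Citation, its confirmed Duplicate with 1 Citation, and a RemovedByAutomation study with 1 Citation. That gives 5 Citations and 3 studies outside Duplicate/Merged, so duplicates = 2.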

Box 4: Records Screened (Databases/Registers)

PRISMA Field	Derivation	Data Source
`records_screened`	`COUNT(s) WHERE s.projectId = :projectId AND s ENTERED title/abstract screening stage pool AND s.sourceColumn = 1`	Study + Stage pool tracking

Note: "Entered stage pool" requires stage pool tracking. The precise mechanism is specified in Phase 14 (Stage Filtering). For PRISMA purposes, a study is "screened" if it was made available to screeners in the title/abstract screening stage, regardless of the outcome.

Box 5: Records Excluded at Title/Abstract (Databases/Registers)

PRISMA Field	Derivation	Data Source
`records_excluded`	`COUNT(s) WHERE s.projectId = :projectId AND s.screeningOutcomes[titleAbstractProfileId].Result = Excluded AND s.sourceColumn = 1`	Study.screeningOutcomes

Box 6: Reports Sought for Retrieval (Databases/Registers)

PRISMA Field	Derivation	Data Source
`dbr_sought_reports`	`COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus IN (FullTextSought, FullTextNotRetrieved, Active, Included) AND s.screeningOutcomes[titleAbstractProfileId].Result = Included AND s.sourceColumn = 1`	Study

Note: This counts studies that passed T/A screening and entered the retrieval phase, whether or not retrieval succeeded. Studies with lifecycleStatus = Active are included because a successful retrieval returns the study to Active for FT screening. Studies with lifecycleStatus = FullTextNotRetrieved are included because a report that could not be obtained was still sought (box 7 is a subset of box 6).

Box 7: Reports Not Retrieved (Databases/Registers)

PRISMA Field	Derivation	Data Source
`dbr_notretrieved_reports`	`COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = FullTextNotRetrieved AND s.sourceColumn = 1`	Study

Box 8: Reports Assessed for Eligibility (Databases/Registers)

PRISMA Field	Derivation	Data Source
`dbr_assessed`	`COUNT(s) WHERE s.projectId = :projectId AND s ENTERED full-text screening stage pool AND s.sourceColumn = 1`	Study + Stage pool

Box 9: Reports Excluded with Reasons (Databases/Registers)

PRISMA Field	Derivation	Data Source
`dbr_excluded`	`GROUP BY s.screeningOutcomes[fullTextProfileId].PrimaryExclusionReason WHERE s.projectId = :projectId AND s.screeningOutcomes[fullTextProfileId].Result = Excluded AND s.sourceColumn = 1; FORMAT "{reason} (n={count})"`	Study.screeningOutcomes
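
The grouped output for box 9 can be sketched as follows. This is illustrative only: FullTextOutcome and Box9 are hypothetical names, the input is assumed to hold one full-text outcome per Column 1 study in the project, and ordering the lines by descending count is a presentation choice that the derivation rule does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for s.screeningOutcomes[fullTextProfileId], pre-filtered to Column 1.
public sealed class FullTextOutcome
{
    public bool Excluded { get; set; }
    public string PrimaryExclusionReason { get; set; } = "";
}

public static class Box9
{
    // One "{reason} (n={count})" line per primary exclusion reason.
    public static List<string> ExcludedWithReasons(IEnumerable<FullTextOutcome> outcomes) =>
        outcomes.Where(o => o.Excluded)
                .GroupBy(o => o.PrimaryExclusionReason)
                .OrderByDescending(g => g.Count())
                .ThenBy(g => g.Key, StringComparer.Ordinal)
                .Select(g => $"{g.Key} (n={g.Count()})")
                .ToList();
}
```

Because each outcome carries a single PrimaryExclusionReason, every excluded study lands in exactly one group and the per-reason counts sum to the number of excluded reports.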

Box 10: New Studies Included

PRISMA Field	Derivation	Data Source
`new_studies`	`COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = Included`	Study
`new_reports`	`LET includedStudies = s WHERE s.projectId = :projectId AND s.lifecycleStatus = Included; SUM(COUNT(s.citations)) for each includedStudy`	Citation

Box 11: Records Identified from Other Sources

PRISMA Field	Derivation	Data Source
`website_results`	`SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Website`	SystematicSearch
`organisation_results`	`SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Organisation`	SystematicSearch
`citations_results`	`SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = CitationSearching`	SystematicSearch

Box 12: Reports Sought for Retrieval (Other Sources)

PRISMA Field	Derivation	Data Source
`other_sought_reports`	Same as box 6 (`dbr_sought_reports`) but `s.sourceColumn = 2`	Study

Box 13: Reports Not Retrieved (Other Sources)

PRISMA Field	Derivation	Data Source
`other_notretrieved_reports`	Same as box 7 (`dbr_notretrieved_reports`) but `s.sourceColumn = 2`	Study

Box 14: Reports Assessed for Eligibility (Other Sources)

PRISMA Field	Derivation	Data Source
`other_assessed`	Same as box 8 (`dbr_assessed`) but `s.sourceColumn = 2`	Study + Stage pool

Box 15: Reports Excluded with Reasons (Other Sources)

PRISMA Field	Derivation	Data Source
`other_excluded`	Same as box 9 (`dbr_excluded`) but `s.sourceColumn = 2`	Study.screeningOutcomes

Box 16: Total Studies Included

PRISMA Field	Derivation	Data Source
`total_studies`	Same as box 10 `new_studies` (no previous studies for non-updated reviews)	Study
`total_reports`	Same as box 10 `new_reports`	Citation

Note: When updated review support is added (box 1), box 16 becomes new_studies + previous_studies and new_reports + previous_reports.

Box 17: Studies Included in Meta-Analysis

PRISMA Field	Derivation	Data Source
`total_studies_ma`	`COUNT(s) WHERE s.projectId = :projectId AND s.lifecycleStatus = Included AND s.metaAnalysisIncluded = true`	Study
`total_reports_ma`	`LET maStudies = s WHERE s.projectId = :projectId AND s.lifecycleStatus = Included AND s.metaAnalysisIncluded = true; SUM(COUNT(s.citations)) for each maStudy`	Citation

Source Column Derivation

The s.sourceColumn used throughout the derivation rules is derived from the study's import history:

sourceColumn(study) =
  LET primaryCitation = study.citations[0]  // First (earliest) Citation
  LET sourceType = primaryCitation.sourceType
  IF sourceType IN (Database, Register) THEN 1      // Column 1: Databases/Registers
  ELSE IF sourceType IN (Website, Organisation, CitationSearching, Other) THEN 2  // Column 2: Other
  ELSE NULL                                          // Unclassified (legacy data)

Rule: When a study has multiple Citations from different source types (e.g., found in both PubMed and a website), the FIRST import's sourceType determines the PRISMA column assignment. All Citations are preserved for per-source counting (box 2, box 11), but the study appears in only one column for screening/inclusion boxes (4-9, 12-15).
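
A C# rendering of the derivation and the first-import rule might look like the sketch below. It is a hedged illustration: SourceColumn and ForStudy are hypothetical names, and the input list is assumed to hold each Citation's sourceType in import order, with null marking unclassified legacy data.

```csharp
using System.Collections.Generic;

public enum SearchSourceType
{
    Database = 0, Register = 1, Website = 2,
    Organisation = 3, CitationSearching = 4, Other = 5
}

public static class SourceColumn
{
    // Returns 1 or 2 based on the FIRST Citation only; null when the
    // study has no Citations or the first Citation is unclassified.
    public static int? ForStudy(IReadOnlyList<SearchSourceType?> citationSourceTypes)
    {
        if (citationSourceTypes.Count == 0) return null;

        switch (citationSourceTypes[0])
        {
            case SearchSourceType.Database:
            case SearchSourceType.Register:
                return 1;   // Column 1: Databases/Registers
            case SearchSourceType.Website:
            case SearchSourceType.Organisation:
            case SearchSourceType.CitationSearching:
            case SearchSourceType.Other:
                return 2;   // Column 2: Other Sources
            default:
                return null; // Unclassified (legacy data)
        }
    }
}
```

Later Citations never change the result: a study first imported from PubMed and later found on a website stays in Column 1, while the reverse import order places it in Column 2.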

8. Current Data Model Gap Summary

This table provides a quick reference for the specific fields needed on each entity to support PRISMA count derivation.

Entity	Field	Type	Added In	PRISMA Boxes Served	Status
SystematicSearch	`sourceType`	SearchSourceType?	Phase 7	box2, box11	Gap -- not present
SystematicSearch	`sourceName`	string?	Phase 7	box2, box11	Gap -- not present
SearchImportJob	`sourceType`	SearchSourceType?	Phase 12	(propagates to SystematicSearch)	Gap -- not present
SearchImportJob	`sourceName`	string?	Phase 12	(propagates to SystematicSearch)	Gap -- not present
Study	`lifecycleStatus`	StudyLifecycleStatus?	Phase 12	box3, box6-7, box10, box12-13, box16-17	Gap -- not present
Study	`citations[]`	Citation[]	Phase 12	box2, box3, box10-11, box16-17	Gap -- not present
Study	`publicationId`	Guid?	Phase 12	(cross-project identity)	Gap -- not present
Study	`duplicateGroupId`	Guid?	Phase 12	box3	Gap -- not present
Study	`fullTextStatus`	FullTextStatus?	Phase 12	box6-7, box8, box12-14	Gap -- not present
Study	`screeningOutcomes[]`	ScreeningOutcome[]	Phase 13/15	box4-5, box8-9, box14-15	Gap -- not present
Study	`metaAnalysisIncluded`	bool?	Phase 16	box17	Gap -- not present

Summary: 11 fields across 3 entities are needed for complete PRISMA support. All are additive (nullable) and introduced incrementally across Phases 7, 12, 13/15, and 16.

9. Edge Cases

Study Imported from Multiple Sources of Different Types

Scenario: A study's Citations include one from PubMed (Database, Column 1) and one from a website (Website, Column 2).

Rule: Use the FIRST import's sourceType for PRISMA column assignment. The study appears in one column for screening/inclusion boxes (4-9, 12-15). All Citations are preserved and counted in their respective source boxes (box 2, box 11).

Rationale: PRISMA requires each study to appear in exactly one column for the screening/inclusion phases. The first import represents the study's initial entry point into the review. Per-source record counts (boxes 2, 11) remain accurate because Citations are immutable and always counted by their own sourceType.

Study with No SourceType (Legacy Data)

Scenario: An existing study's SystematicSearch has no sourceType (field is null).

Rule: The study is counted in PRISMA totals that do not require source column assignment (e.g., box 3 duplicates, box 10 included studies). For source-specific boxes (box 2, box 4-9, box 11-15), the study cannot be assigned to a column and is excluded from those counts.

Migration: Phase 16 backfills sourceType where determinable from LibraryFileType (only PubmedXml -> Database is high-confidence). Admin interface allows manual classification for remaining records.

Study Screened Under Multiple Profiles

Scenario: A study is screened under a title/abstract profile and then a full-text profile, with different outcomes.

Rule: Each profile produces its own entry in screeningOutcomes[]. PRISMA counts use specific profiles:
- Title/abstract profile outcomes feed boxes 4-5 (records screened, records excluded)
- Full-text profile outcomes feed boxes 8-9, 14-15 (reports assessed, reports excluded with reasons)

There is no conflict because each PRISMA box references a specific screening profile, not the aggregate of all profiles.

Study Passes All Screening but Has FullTextNotRetrieved Status

Scenario: Edge case where lifecycle status is FullTextNotRetrieved but a screening outcome shows Included.

Rule: This scenario SHOULD NOT occur in normal operation. FullTextNotRetrieved studies are excluded from full-text screening pools, so they cannot receive a full-text screening outcome. If it occurs (e.g., due to admin override), the lifecycleStatus takes precedence: the study is NOT counted as included (box 10) because FullTextNotRetrieved is a terminal state.

Duplicate Study with Existing Screening Data

Scenario: A study is screened and then later identified as a duplicate (e.g., a late-arriving import reveals the duplicate relationship).

Rule: When the study's lifecycleStatus transitions to Duplicate, the study is excluded from all pools and from PRISMA screening counts. The screeningOutcomes[] data is preserved (not deleted) for audit purposes, but it is no longer counted in PRISMA boxes. The screening data from the duplicate study MAY be available for reconciliation on the canonical study (see deduplication specification, Phase 12).

10. Cross-References

PRISMA Flow Diagram Mapping: prisma-flow-diagram-mapping.md -- Complete box-to-field mapping referencing the lifecycle states and source types defined here
Three-Level Data Model: three-level-data-model.md -- Entity specifications for Publication, Citation, and Study that carry the fields referenced in this document (Study.lifecycleStatus, Study.screeningOutcomes[], Citation.sourceType, SystematicSearch.sourceType)

Requirement Coverage

Requirement ID	Coverage in This Document
PRISMA-01	Complete: All 34 PRISMA fields have derivation rules using lifecycle status and/or screening outcomes
PRISMA-02	Complete: Source type taxonomy with 6 values and PRISMA column assignment rules
PRISMA-04	Complete: Study lifecycle status model with 9 states, valid transitions, and PRISMA count derivation rules
PRISMA-05	Partial: Deduplication counts derivable per source type via Citation.sourceType; full dedup counting requires Phase 12 implementation