PRISMA 2020 Flow Diagram: Box-to-Field Mapping Specification

Purpose

This document is a binding constraint on all data model decisions in Phases 3-16 of the SyRF platform evolution. Every field in the PRISMA 2020 flow diagram is mapped to a specific SyRF data model entity, field, and derivation rule. Implementation phases MUST NOT introduce data structures that make any derivation rule in this document impossible to execute.

Normative language: "MUST" indicates an absolute requirement. "SHALL" indicates mandatory behavior. "SHOULD" indicates a strong recommendation. "MAY" indicates optional behavior.

Source authority: PRISMA 2020 Statement (Page et al., 2021, BMJ 372:n71) and the PRISMA2020 R package CSV template (github.com/prisma-flowdiagram/PRISMA2020).

1. PRISMA 2020 Flow Diagram Overview

Template Variants

The PRISMA 2020 flow diagram has four template variants:

| Variant | Review Type | Source Columns | Boxes |
|---|---|---|---|
| New review, databases/registers only | New | 1 column | Reduced |
| New review, databases/registers + other sources | New | 2 columns | Full (17 boxes) |
| Updated review, databases/registers only | Updated | 1 column + previous studies | Reduced + box1 |
| Updated review, databases/registers + other sources | Updated | 2 columns + previous studies | Full + box1 |

SyRF targets the most comprehensive new-review variant: "new review with databases, registers, and other sources" (17 boxes, 34 fields). The data model MUST support all 34 fields. The "updated review" boxes (box1) are deferred, but the data model SHALL NOT preclude future support.

Three Phases of the PRISMA Flow

  1. Identification -- Records discovered from all sources, with duplicate/automation removal
  2. Screening -- Records screened at title/abstract and full-text levels, split by source column
  3. Included -- Studies and reports included in the final review and/or meta-analysis

PRISMA Source Columns

The PRISMA 2020 diagram separates identification sources into two parallel columns:

  • Column 1: Databases and Registers -- Bibliographic databases (PubMed, Embase, Scopus, etc.) and study registers (ClinicalTrials.gov, CENTRAL, ICTRP)
  • Column 2: Other Sources -- Websites, organisations, citation searching (forward/backward), and other methods

All subsequent screening and inclusion boxes are tracked per-column to provide per-source-type transparency.

2. Terminology Mapping

PRISMA 2020 distinguishes three levels of bibliographic granularity (Page et al., 2021; Rethlefsen et al., 2022). This section provides the binding mapping between PRISMA terminology and SyRF entities.

| PRISMA Term | Definition | SyRF Entity | SyRF Scope | Relationship |
|---|---|---|---|---|
| Record | Individual citation retrieved from a database or source | Citation | Project-scoped | Immutable. One created per citation per import. |
| Record (deduplicated) | Unique bibliographic identity after duplicate removal | Publication (pmPublication) | System-scoped | Global. Accumulates best-of-breed metadata across all projects. |
| Report | Full-text document retrieved for eligibility assessment | Study with fullTextStatus tracking | Project-scoped | A record that has progressed to full-text retrieval. |
| Study | Unique research investigation (may have multiple reports) | Study (pmStudy) after dedup consolidation | Project-scoped | Multiple Citations may link to one Study. |

Key invariants:

  • A Citation is NEVER deleted after creation (PRISMA requirement: per-source counting MUST remain derivable).
  • A Publication is NEVER deleted (system-scoped bibliographic identity persists even if all linked Citations are removed from projects).
  • A Study may have many Citations (after dedup confirms they represent the same research).
  • The fullTextStatus field on Study distinguishes between "record" and "report" in PRISMA terminology.

3. Box-by-Box Mapping

3.1 Identification Phase

Box 1: Previous Studies (Updated Reviews -- Deferred)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 1 | previous_studies | Studies included in previous version of the review | N/A (deferred) | N/A -- SyRF does not currently support updated reviews | Deferred | PRISMA-01 |
| 2 | previous_reports | Reports included in previous version of the review | N/A (deferred) | N/A -- SyRF does not currently support updated reviews | Deferred | PRISMA-01 |

Box 2: Records Identified from Databases and Registers

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 3 | database_results | Total records identified from databases | SystematicSearch.numberOfCitations | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Database | 7, 12 | PRISMA-01, PRISMA-02 |
| 4 | database_specific_results | Per-database breakdown (e.g., "PubMed, n=450; Embase, n=320") | SystematicSearch.sourceName, SystematicSearch.numberOfCitations | GROUP BY ss.sourceName WHERE ss.sourceType = Database; FORMAT AS "{sourceName} (n={numberOfCitations})" | 7, 12 | PRISMA-01, PRISMA-02 |
| 5 | register_results | Total records identified from registers | SystematicSearch.numberOfCitations | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Register | 7, 12 | PRISMA-01, PRISMA-02 |
| 6 | register_specific_results | Per-register breakdown | SystematicSearch.sourceName, SystematicSearch.numberOfCitations | GROUP BY ss.sourceName WHERE ss.sourceType = Register; FORMAT AS "{sourceName} (n={numberOfCitations})" | 7, 12 | PRISMA-01, PRISMA-02 |
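
The Box 2 rules can be illustrated with a minimal Python sketch over an in-memory list of searches. The SystematicSearch dataclass, the helper names, and the sample counts are hypothetical and only mirror the field names used in the rules. Two details are assumptions rather than part of the specification: the "; " separator between entries, and the projectId filter on the GROUP BY rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SystematicSearch:
    # Hypothetical in-memory stand-in for the SyRF entity.
    project_id: str
    source_type: str   # e.g. "Database", "Register"
    source_name: str   # e.g. "PubMed"
    number_of_citations: int


def total_results(searches, project_id, source_type):
    """SUM(ss.numberOfCitations) for one project and one source type."""
    return sum(
        s.number_of_citations
        for s in searches
        if s.project_id == project_id and s.source_type == source_type
    )


def specific_results(searches, project_id, source_type):
    """GROUP BY sourceName, formatted as '{sourceName} (n={numberOfCitations})'."""
    per_source = defaultdict(int)
    for s in searches:
        if s.project_id == project_id and s.source_type == source_type:
            per_source[s.source_name] += s.number_of_citations
    # Assumption: entries are joined with "; " and sorted by source name.
    return "; ".join(f"{name} (n={n})" for name, n in sorted(per_source.items()))


# Illustrative data only.
searches = [
    SystematicSearch("p1", "Database", "PubMed", 450),
    SystematicSearch("p1", "Database", "Embase", 320),
    SystematicSearch("p1", "Register", "ClinicalTrials.gov", 40),
    SystematicSearch("p2", "Database", "PubMed", 99),
]
database_results = total_results(searches, "p1", "Database")
database_specific = specific_results(searches, "p1", "Database")
print(database_results)   # 770
print(database_specific)  # Embase (n=320); PubMed (n=450)
```

The same two helpers cover the register fields by passing "Register" as the source type.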

Box 3: Records Removed Before Screening

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 7 | duplicates | Duplicate records removed | Study.lifecycleStatus, Study.citations[] | LET totalCitations = COUNT(study.citations) across all studies in project; LET uniqueStudies = COUNT(studies WHERE lifecycleStatus NOT IN (Duplicate, Merged)); RETURN totalCitations - uniqueStudies | 12 | PRISMA-01, DEDUP-01 |
| 8 | excluded_automatic | Records marked ineligible by automation tools | Study.lifecycleStatus | COUNT(studies) WHERE studies.projectId = :projectId AND studies.lifecycleStatus = RemovedByAutomation | 12, 16 | PRISMA-01 |
| 9 | excluded_other | Records removed for other pre-screening reasons | Study.lifecycleStatus | COUNT(studies) WHERE studies.projectId = :projectId AND studies.lifecycleStatus = RemovedOther | 12, 16 | PRISMA-01 |
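
A small worked example may help when reading the duplicates rule. The sketch below is illustrative only: the Study dataclass is a hypothetical stand-in, the citation identifiers are invented, and it assumes that a Merged study's citations have already been moved to the surviving study (the specification does not state where they live).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Study:
    # Hypothetical stand-in for the SyRF Study entity.
    lifecycle_status: str
    citations: List[str] = field(default_factory=list)


def duplicates_removed(studies):
    """totalCitations - uniqueStudies, following the rule for field #7."""
    total_citations = sum(len(s.citations) for s in studies)
    unique_studies = sum(
        1 for s in studies if s.lifecycle_status not in ("Duplicate", "Merged")
    )
    return total_citations - unique_studies


def count_by_status(studies, status):
    """COUNT(studies) WHERE lifecycleStatus = status (fields #8 and #9)."""
    return sum(1 for s in studies if s.lifecycle_status == status)


studies = [
    Study("Active", ["pubmed-1", "embase-7"]),   # two citations, one investigation
    Study("Merged", []),                          # assumed: citations moved to survivor
    Study("Active", ["pubmed-2"]),
    Study("RemovedByAutomation", ["embase-9"]),
]
duplicates = duplicates_removed(studies)                              # 4 citations - 3 studies
excluded_automatic = count_by_status(studies, "RemovedByAutomation")
print(duplicates, excluded_automatic)  # 1 1
```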

Box 11: Records Identified from Other Sources

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 10 | website_results | Records from websites | SystematicSearch.numberOfCitations | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Website | 7, 12 | PRISMA-01, PRISMA-02 |
| 11 | organisation_results | Records from organisations | SystematicSearch.numberOfCitations | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = Organisation | 7, 12 | PRISMA-01, PRISMA-02 |
| 12 | citations_results | Records from citation searching | SystematicSearch.numberOfCitations | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType = CitationSearching | 7, 12 | PRISMA-01, PRISMA-02 |

3.2 Screening Phase -- Databases/Registers Column

Box 4: Records Screened (Databases/Registers)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 13 | records_screened | Records screened at title/abstract (databases/registers) | Study.citations[].sourceType, Study.lifecycleStatus, screening stage pool | COUNT(studies) WHERE studies.projectId = :projectId AND studies.lifecycleStatus = Active AND study HAS citation WITH sourceType IN (Database, Register) AND study ENTERED title/abstract screening stage pool | 13, 14 | PRISMA-01, SCR-05 |

Box 5: Records Excluded at Title/Abstract (Databases/Registers)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 14 | records_excluded | Records excluded at title/abstract screening (databases/registers) | Study.screeningOutcomes[], Study.citations[].sourceType | COUNT(studies) WHERE studies.projectId = :projectId AND study HAS citation WITH sourceType IN (Database, Register) AND FinalScreeningOutcome(titleAbstractProfileId) = Excluded | 15 | PRISMA-01, SCR-04 |
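
The predicate "study HAS citation WITH sourceType IN (...)" recurs in every per-column rule. The following minimal sketch shows one way to read it. It assumes studies are plain dictionaries and that the result of FinalScreeningOutcome(titleAbstractProfileId) is already available as a precomputed value. Both the dictionary keys and the sample data are hypothetical.

```python
DATABASES_REGISTERS = {"Database", "Register"}
OTHER_SOURCES = {"Website", "Organisation", "CitationSearching", "Other"}


def has_citation_in(citation_source_types, column):
    """True if at least one of the study's citations belongs to the column."""
    return any(t in column for t in citation_source_types)


def records_excluded(studies, column):
    """Count studies in the column whose title/abstract outcome is Excluded."""
    return sum(
        1
        for s in studies
        if has_citation_in(s["citation_source_types"], column)
        and s["title_abstract_outcome"] == "Excluded"
    )


# Illustrative data; "title_abstract_outcome" stands in for
# FinalScreeningOutcome(titleAbstractProfileId).
studies = [
    {"citation_source_types": ["Database"], "title_abstract_outcome": "Excluded"},
    {"citation_source_types": ["Database", "Website"], "title_abstract_outcome": "Excluded"},
    {"citation_source_types": ["Register"], "title_abstract_outcome": "Included"},
    {"citation_source_types": ["Website"], "title_abstract_outcome": "Excluded"},
]
dbr_excluded_count = records_excluded(studies, DATABASES_REGISTERS)
other_excluded_count = records_excluded(studies, OTHER_SOURCES)
print(dbr_excluded_count, other_excluded_count)  # 2 2
```

As written, a study holding citations from both columns satisfies the predicate for each column, so this sketch counts the second study once per column.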

Box 6: Reports Sought for Retrieval (Databases/Registers)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 15 | dbr_sought_reports | Reports sought for full-text retrieval (databases/registers) | Study.fullTextStatus, Study.citations[].sourceType | COUNT(studies) WHERE studies.projectId = :projectId AND study HAS citation WITH sourceType IN (Database, Register) AND study.fullTextStatus IN (Sought, Retrieved, NotRetrieved) AND FinalScreeningOutcome(titleAbstractProfileId) = Included | 12, 16 | PRISMA-01 |

Box 7: Reports Not Retrieved (Databases/Registers)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 16 | dbr_notretrieved_reports | Reports not retrieved (databases/registers) | Study.fullTextStatus, Study.citations[].sourceType | COUNT(studies) WHERE studies.projectId = :projectId AND study HAS citation WITH sourceType IN (Database, Register) AND study.fullTextStatus = NotRetrieved | 12, 16 | PRISMA-01 |

Box 8: Reports Assessed for Eligibility (Databases/Registers)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 17 | dbr_assessed | Reports assessed for eligibility at full-text (databases/registers) | Study.fullTextStatus, Study.citations[].sourceType, full-text screening stage pool | COUNT(studies) WHERE studies.projectId = :projectId AND study HAS citation WITH sourceType IN (Database, Register) AND study.fullTextStatus = Retrieved AND study ENTERED full-text screening stage pool | 13, 14, 15 | PRISMA-01, SCR-05 |

Box 9: Reports Excluded with Reasons (Databases/Registers)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 18 | dbr_excluded | Reports excluded with reasons at full-text (databases/registers) | Study.screeningOutcomes[], Study.citations[].sourceType | GROUP BY FinalScreeningOutcome(fullTextProfileId).annotation.primaryReason WHERE study HAS citation WITH sourceType IN (Database, Register) AND FinalScreeningOutcome(fullTextProfileId) = Excluded; FORMAT AS "{reason} (n={count})" | 15 | PRISMA-01, SCR-02 |
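
The grouped "excluded with reasons" output can be sketched with a counter keyed on the primary reason. This is a minimal illustration under stated assumptions: studies are plain dictionaries, the full-text outcome and its primaryReason annotation are precomputed, the reason strings are invented, and the "; " separator is not specified by this document.

```python
from collections import Counter

DATABASES_REGISTERS = {"Database", "Register"}


def excluded_with_reasons(studies, column):
    """GROUP BY primaryReason for full-text exclusions in one PRISMA column."""
    counts = Counter(
        s["primary_reason"]
        for s in studies
        if any(t in column for t in s["citation_source_types"])
        and s["full_text_outcome"] == "Excluded"
    )
    # Assumption: entries are sorted by reason and joined with "; ".
    return "; ".join(
        f"{reason} (n={count})" for reason, count in sorted(counts.items())
    )


# Illustrative data; the reasons are hypothetical examples.
studies = [
    {"citation_source_types": ["Database"], "full_text_outcome": "Excluded",
     "primary_reason": "Wrong population"},
    {"citation_source_types": ["Register"], "full_text_outcome": "Excluded",
     "primary_reason": "Wrong population"},
    {"citation_source_types": ["Database"], "full_text_outcome": "Excluded",
     "primary_reason": "No control group"},
    {"citation_source_types": ["Database"], "full_text_outcome": "Included",
     "primary_reason": None},
    {"citation_source_types": ["Website"], "full_text_outcome": "Excluded",
     "primary_reason": "Wrong population"},
]
dbr_excluded = excluded_with_reasons(studies, DATABASES_REGISTERS)
print(dbr_excluded)  # No control group (n=1); Wrong population (n=2)
```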

3.3 Screening Phase -- Other Sources Column

Box 12: Reports Sought for Retrieval (Other Sources)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 19 | other_sought_reports | Reports sought for retrieval (other sources) | Study.fullTextStatus, Study.citations[].sourceType | COUNT(studies) WHERE studies.projectId = :projectId AND study HAS citation WITH sourceType IN (Website, Organisation, CitationSearching, Other) AND study.fullTextStatus IN (Sought, Retrieved, NotRetrieved) | 12, 16 | PRISMA-01 |

Box 13: Reports Not Retrieved (Other Sources)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 20 | other_notretrieved_reports | Reports not retrieved (other sources) | Study.fullTextStatus, Study.citations[].sourceType | COUNT(studies) WHERE studies.projectId = :projectId AND study HAS citation WITH sourceType IN (Website, Organisation, CitationSearching, Other) AND study.fullTextStatus = NotRetrieved | 12, 16 | PRISMA-01 |

Box 14: Reports Assessed for Eligibility (Other Sources)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 21 | other_assessed | Reports assessed for eligibility at full-text (other sources) | Study.fullTextStatus, Study.citations[].sourceType, full-text screening stage pool | COUNT(studies) WHERE studies.projectId = :projectId AND study HAS citation WITH sourceType IN (Website, Organisation, CitationSearching, Other) AND study.fullTextStatus = Retrieved AND study ENTERED full-text screening stage pool | 13, 14, 15 | PRISMA-01, SCR-05 |

Box 15: Reports Excluded with Reasons (Other Sources)

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 22 | other_excluded | Reports excluded with reasons at full-text (other sources) | Study.screeningOutcomes[], Study.citations[].sourceType | GROUP BY FinalScreeningOutcome(fullTextProfileId).annotation.primaryReason WHERE study HAS citation WITH sourceType IN (Website, Organisation, CitationSearching, Other) AND FinalScreeningOutcome(fullTextProfileId) = Excluded; FORMAT AS "{reason} (n={count})" | 15 | PRISMA-01, SCR-02 |

3.4 Included Phase

Box 10: New Studies Included

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 23 | new_studies | New studies included in the review | Study.lifecycleStatus | COUNT(studies) WHERE studies.projectId = :projectId AND studies.lifecycleStatus = Included | 12, 16 | PRISMA-01, PRISMA-03 |
| 24 | new_reports | Reports of new included studies | Study.lifecycleStatus, Study.citations[] | LET includedStudies = studies WHERE lifecycleStatus = Included; SUM(COUNT(study.citations)) for each includedStudy | 12, 16 | PRISMA-01, PRISMA-03 |
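
The difference between counting studies and counting reports is easiest to see with numbers. The sketch below is a minimal illustration in which studies are plain dictionaries. The keys and citation identifiers are hypothetical.

```python
def new_studies(studies):
    """COUNT(studies) WHERE lifecycleStatus = Included (field #23)."""
    return sum(1 for s in studies if s["lifecycle_status"] == "Included")


def new_reports(studies):
    """SUM(COUNT(study.citations)) over included studies (field #24)."""
    return sum(
        len(s["citations"]) for s in studies if s["lifecycle_status"] == "Included"
    )


# Illustrative data only.
studies = [
    {"lifecycle_status": "Included", "citations": ["c1", "c2"]},  # one study, two reports
    {"lifecycle_status": "Included", "citations": ["c3"]},
    {"lifecycle_status": "Active", "citations": ["c4"]},          # not counted
]
print(new_studies(studies), new_reports(studies))  # 2 3
```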

Box 16: Total Studies Included

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 25 | total_studies | Total studies included in the review (new + previous) | Study.lifecycleStatus | COUNT(studies) WHERE studies.projectId = :projectId AND studies.lifecycleStatus = Included (same as new_studies for non-updated reviews) | 12, 16 | PRISMA-01 |
| 26 | total_reports | Total reports of included studies | Study.lifecycleStatus, Study.citations[] | LET includedStudies = studies WHERE lifecycleStatus = Included; SUM(COUNT(study.citations)) for each includedStudy (same as new_reports for non-updated reviews) | 12, 16 | PRISMA-01 |

Box 17: Studies Included in Meta-Analysis

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 27 | total_studies_ma | Studies included in quantitative synthesis (meta-analysis) | Study.lifecycleStatus, Study.metaAnalysisIncluded (future) | COUNT(studies) WHERE studies.projectId = :projectId AND studies.lifecycleStatus = Included AND study.metaAnalysisIncluded = true (requires future metaAnalysisIncluded boolean on Study) | 16 | PRISMA-01 |
| 28 | total_reports_ma | Reports of studies included in meta-analysis | Study.metaAnalysisIncluded, Study.citations[] | LET maStudies = studies WHERE lifecycleStatus = Included AND metaAnalysisIncluded = true; SUM(COUNT(study.citations)) for each maStudy | 16 | PRISMA-01 |

3.5 Supplementary Fields (Other Source Identification Detail)

These fields appear in the PRISMA2020 R package template as sub-items of the "other sources" identification:

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 29 | other_results | Total records from all other sources combined | SystematicSearch.numberOfCitations | SUM(ss.numberOfCitations) WHERE ss.projectId = :projectId AND ss.sourceType IN (Website, Organisation, CitationSearching, Other) | 7, 12 | PRISMA-01, PRISMA-02 |
| 30 | other_specific_results | Per-source breakdown for other sources | SystematicSearch.sourceName, SystematicSearch.numberOfCitations | GROUP BY ss.sourceName WHERE ss.sourceType IN (Website, Organisation, CitationSearching, Other); FORMAT AS "{sourceName} (n={numberOfCitations})" | 7, 12 | PRISMA-01, PRISMA-02 |

3.6 Derived Aggregate Fields

These fields appear in the PRISMA2020 R package and are computed from the box-level fields above:

| # | PRISMA Field | Description | SyRF Data Source | Derivation Rule | Phase | Req ID |
|---|---|---|---|---|---|---|
| 31 | dbr_total_identified | Total records from databases + registers | Derived | database_results + register_results (fields #3 + #5) | 7, 12 | PRISMA-01 |
| 32 | other_total_identified | Total records from other sources | Derived | website_results + organisation_results + citations_results (fields #10 + #11 + #12) | 7, 12 | PRISMA-01 |
| 33 | total_identified | Grand total records identified from all sources | Derived | dbr_total_identified + other_total_identified (fields #31 + #32) | 7, 12 | PRISMA-01 |
| 34 | records_after_removal | Records remaining after duplicates and pre-screen removals | Derived | total_identified - duplicates - excluded_automatic - excluded_other (field #33 - #7 - #8 - #9) | 12 | PRISMA-01 |
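
The four derived fields are plain arithmetic over the box-level counts, sketched below. The function name and all input numbers are illustrative assumptions. Only the formulas come from the table.

```python
def derive_aggregates(f):
    """Compute fields #31-#34 from a dict of box-level counts."""
    dbr_total = f["database_results"] + f["register_results"]          # #3 + #5
    other_total = (
        f["website_results"] + f["organisation_results"] + f["citations_results"]
    )                                                                   # #10 + #11 + #12
    total = dbr_total + other_total                                     # #31 + #32
    remaining = (
        total - f["duplicates"] - f["excluded_automatic"] - f["excluded_other"]
    )                                                                   # #33 - #7 - #8 - #9
    return {
        "dbr_total_identified": dbr_total,
        "other_total_identified": other_total,
        "total_identified": total,
        "records_after_removal": remaining,
    }


# Illustrative counts only.
fields = {
    "database_results": 770, "register_results": 40,
    "website_results": 10, "organisation_results": 5, "citations_results": 25,
    "duplicates": 120, "excluded_automatic": 30, "excluded_other": 0,
}
aggregates = derive_aggregates(fields)
print(aggregates["total_identified"], aggregates["records_after_removal"])  # 850 700
```

Note that, as written, field #32 sums only fields #10 to #12, so searches whose sourceType is Other contribute to other_results (field #29) but not to this total.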

4. Source Type Taxonomy

The system MUST categorize every SystematicSearch into one of the following source types. This taxonomy directly maps to the two PRISMA columns.

SearchSourceType enum:
  Database = 0          // PRISMA Column 1: Bibliographic databases (PubMed, Embase, Scopus, etc.)
  Register = 1          // PRISMA Column 1: Study registers (ClinicalTrials.gov, CENTRAL, ICTRP)
  Website = 2           // PRISMA Column 2: Website searches
  Organisation = 3      // PRISMA Column 2: Organisational contacts
  CitationSearching = 4 // PRISMA Column 2: Forward/backward citation searching
  Other = 5             // PRISMA Column 2: Other methods (expert contacts, conference abstracts)

Column assignment rule:

  • Column 1 (Databases/Registers): sourceType IN (Database, Register)
  • Column 2 (Other Sources): sourceType IN (Website, Organisation, CitationSearching, Other)
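
The taxonomy and the column assignment rule can be expressed compactly. The following is a minimal Python sketch, not the SyRF implementation. The enum values follow the listing above, while the Python naming style and the prisma_column helper are assumptions.

```python
from enum import IntEnum


class SearchSourceType(IntEnum):
    DATABASE = 0
    REGISTER = 1
    WEBSITE = 2
    ORGANISATION = 3
    CITATION_SEARCHING = 4
    OTHER = 5


# Column 1 membership; every other source type falls into column 2.
COLUMN_1 = frozenset({SearchSourceType.DATABASE, SearchSourceType.REGISTER})


def prisma_column(source_type: SearchSourceType) -> int:
    """Return 1 for Databases/Registers and 2 for Other Sources."""
    return 1 if source_type in COLUMN_1 else 2


print(prisma_column(SearchSourceType.REGISTER))  # 1
print(prisma_column(SearchSourceType(4)))        # 2 (CitationSearching)
```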

Migration note: Existing SystematicSearch records have LibraryFileType (file format) but no sourceType. The sourceType field SHALL be nullable. Migration in Phase 7 adds the field; Phase 16 backfills where determinable (e.g., LibraryFileType.PubmedXml implies sourceType = Database). See requirement PRISMA-02.
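
The backfill step can be pictured as a lookup that leaves sourceType unset whenever the file format does not determine it. This is a sketch under stated assumptions: only the PubmedXml mapping comes from the note above, while the function name, the string representation of the enums, and the "EndnoteXml" example value are hypothetical.

```python
from typing import Optional

# Only mapping stated in this document; further entries would need to be
# confirmed against the actual LibraryFileType values.
BACKFILL_RULES = {"PubmedXml": "Database"}


def backfill_source_type(library_file_type: str) -> Optional[str]:
    """Return the implied sourceType, or None when it cannot be determined."""
    return BACKFILL_RULES.get(library_file_type)


print(backfill_source_type("PubmedXml"))   # Database
print(backfill_source_type("EndnoteXml"))  # None (hypothetical format, stays null)
```

Returning None rather than a default value keeps the nullable field honest: an unclassified legacy search stays unclassified instead of being assigned to a column it may not belong to.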

5. Gap Analysis

The following table documents every gap between the current SyRF data model and the PRISMA 2020 requirements.

| # | PRISMA Requirement | Current SyRF State | Gap Description | Resolution Phase | Priority |
|---|---|---|---|---|---|
| G1 | Per-source-type record counts (boxes 2, 11) | SystematicSearch has NumberOfStudies but no source type classification | No sourceType field on SystematicSearch -- cannot categorize searches into Database/Register/Website/Organisation/CitationSearching/Other | Phase 7 (add field), Phase 16 (backfill) | Critical |
| G2 | Per-database/register breakdown (box 2) | No explicit sourceName on SystematicSearch | Cannot produce "PubMed, n=450; Embase, n=320" format without explicit source name | Phase 7 (add field), Phase 12 (populate on import) | Critical |
| G3 | Duplicate record count (box 3) | No deduplication tracking at all | No mechanism to track which Citations are duplicates or how many were removed | Phase 12 | Critical |
| G4 | Records excluded by automation (box 3) | No lifecycle status on Study | No way to mark studies as RemovedByAutomation | Phase 12 (lifecycle status model) | High |
| G5 | Records excluded for other reasons (box 3) | No lifecycle status on Study | No way to mark studies as RemovedOther | Phase 12 (lifecycle status model) | High |
| G6 | Records vs. Studies distinction | Study entity conflates individual citations with unique research investigations (1:1 mapping today) | Cannot derive "records identified" separately from "studies included" -- the three-level model (Publication/Citation/Study) resolves this | Phase 12 | Critical |
| G7 | Full-text retrieval tracking (boxes 6-7, 12-13) | Not tracked at all | No fullTextStatus field -- cannot count "reports sought for retrieval" or "reports not retrieved" | Phase 12 (add field to Study) | High |
| G8 | Reports excluded with structured reasons (boxes 9, 15) | Screening decisions are binary (Included/Excluded) with no structured reason | No exclusion reason taxonomy -- cannot produce "Reason1, n=X; Reason2, n=Y" format | Phase 15 (screening annotations) | Critical |
| G9 | Per-profile screening outcomes | ScreeningInfo tracks individual decisions but not stage-scoped outcomes | Cannot derive per-profile Included/Excluded counts needed for boxes 4-5, 8-9, 14-15 | Phase 13 (screening profiles), Phase 15 (screening outcomes) | Critical |
| G10 | Lifecycle status model | No lifecycle status field on Study | Cannot track study progression (Active -> Duplicate/Screened/Included) for any PRISMA terminal state boxes | Phase 12 (add field + enum) | Critical |
| G11 | Meta-analysis inclusion tracking (box 17) | Not tracked | No metaAnalysisIncluded flag on Study -- cannot count studies/reports included in quantitative synthesis | Phase 16 | Medium |

Summary: 11 gaps identified. 7 are Critical (block PRISMA reporting entirely), 3 are High (block specific boxes), and 1 is Medium (blocks an optional box).

6. Deferred Boxes

Box 1: Previous Studies (Updated Reviews)

The PRISMA 2020 flow diagram includes box 1 (previous_studies, previous_reports) for updated systematic reviews -- reviews that build upon a prior version. SyRF does not currently support the "updated review" workflow.

Decision: Box 1 fields are deferred. The data model SHALL NOT preclude future support:

  • The Study.lifecycleStatus enum MAY be extended with a PreviouslyIncluded value in a future phase.
  • The total_studies and total_reports fields (box 16) currently equal new_studies and new_reports (box 10) because there are no previous studies to add. When updated review support is added, box 16 SHALL become new_studies + previous_studies and new_reports + previous_reports.
  • No architectural changes are required to support this -- it is an additive data model extension.

Implication: For non-updated reviews, total_studies == new_studies and total_reports == new_reports. The PRISMA flow diagram generator MUST handle this equivalence correctly.
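
One way to keep the box 16 computation forward-compatible is to treat the previous counts as optional inputs that default to zero. This is a sketch of that idea only. The function name and the sample numbers are hypothetical.

```python
def box16_totals(new_studies, new_reports, previous_studies=0, previous_reports=0):
    """Return (total_studies, total_reports) for box 16.

    For non-updated reviews the previous counts stay at zero, so the totals
    equal the box 10 values.
    """
    return new_studies + previous_studies, new_reports + previous_reports


print(box16_totals(12, 15))        # (12, 15) -- non-updated review
print(box16_totals(12, 15, 3, 4))  # (15, 19) -- future updated review
```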

7. Cross-References

  • Three-Level Data Model Specification: three-level-data-model.md -- Formal entity specifications for Publication, Citation, and Study that this mapping references.
  • PRISMA 2020 Statement: Page et al. (2021), BMJ 372:n71 -- prisma-statement.org
  • PRISMA2020 R Package: github.com/prisma-flowdiagram/PRISMA2020 -- CSV template defining all 34 fields.
  • ASySD Paper: Hair et al. (2023), BMC Biology 21, 189 -- Deduplication algorithm.

Requirement Coverage

| Requirement ID | Coverage in This Document |
|---|---|
| PRISMA-01 | Complete: All 34 PRISMA fields mapped to SyRF entities with derivation rules |
| PRISMA-02 | Complete: Source type taxonomy defined with 6-value enum and column assignment rules |
| PRISMA-03 | Partial: Terminology mapping to three-level model; full entity specs in three-level-data-model.md |
| DEDUP-01 | Referenced: Duplicate count derivation rules reference dedup service output |
| SCR-02 | Referenced: Structured exclusion reasons needed for boxes 9, 15 |
| SCR-04 | Referenced: FinalScreeningOutcome needed for boxes 5, 6, 9, 15 |
| SCR-05 | Referenced: Stage study pools needed for boxes 4, 8, 14 (fields #13, #17, #21) |

Appendix A: PRISMA Box Reference Diagram

+------------------------------------------------------------------+
|                        IDENTIFICATION                            |
|                                                                  |
|  Databases/Registers (Column 1)    Other Sources (Column 2)      |
|  +----------------------------+    +----------------------------+|
|  | box2: Records identified   |    | box11: Records identified  ||
|  | - database_results         |    | - website_results          ||
|  | - database_specific_results|    | - organisation_results     ||
|  | - register_results         |    | - citations_results        ||
|  | - register_specific_results|    +----------------------------+|
|  +----------------------------+                                  |
|                                                                  |
|  +----------------------------+                                  |
|  | box3: Records removed      |                                  |
|  | - duplicates               |                                  |
|  | - excluded_automatic       |                                  |
|  | - excluded_other           |                                  |
|  +----------------------------+                                  |
+------------------------------------------------------------------+
|                         SCREENING                                |
|                                                                  |
|  Databases/Registers           Other Sources                     |
|  +----------------------------+  +------------------------------+|
|  | box4: records_screened     |  |                              ||
|  | box5: records_excluded     |  |                              ||
|  | box6: dbr_sought_reports   |  | box12: other_sought_reports  ||
|  | box7: dbr_notretrieved     |  | box13: other_notretrieved    ||
|  | box8: dbr_assessed         |  | box14: other_assessed        ||
|  | box9: dbr_excluded         |  | box15: other_excluded        ||
|  +----------------------------+  +------------------------------+|
+------------------------------------------------------------------+
|                         INCLUDED                                 |
|                                                                  |
|  +----------------------------+                                  |
|  | box10: new_studies,        |                                  |
|  |        new_reports         |                                  |
|  +----------------------------+                                  |
|  | box16: total_studies,      |                                  |
|  |        total_reports       |                                  |
|  +----------------------------+                                  |
|  | box17: total_studies_ma,   |                                  |
|  |        total_reports_ma    |                                  |
|  +----------------------------+                                  |
+------------------------------------------------------------------+

Appendix B: Field Index

Quick reference: all 34 fields sorted by field number.

| # | Field | Box |
|---|---|---|
| 1 | previous_studies | box1 |
| 2 | previous_reports | box1 |
| 3 | database_results | box2 |
| 4 | database_specific_results | box2 |
| 5 | register_results | box2 |
| 6 | register_specific_results | box2 |
| 7 | duplicates | box3 |
| 8 | excluded_automatic | box3 |
| 9 | excluded_other | box3 |
| 10 | website_results | box11 |
| 11 | organisation_results | box11 |
| 12 | citations_results | box11 |
| 13 | records_screened | box4 |
| 14 | records_excluded | box5 |
| 15 | dbr_sought_reports | box6 |
| 16 | dbr_notretrieved_reports | box7 |
| 17 | dbr_assessed | box8 |
| 18 | dbr_excluded | box9 |
| 19 | other_sought_reports | box12 |
| 20 | other_notretrieved_reports | box13 |
| 21 | other_assessed | box14 |
| 22 | other_excluded | box15 |
| 23 | new_studies | box10 |
| 24 | new_reports | box10 |
| 25 | total_studies | box16 |
| 26 | total_reports | box16 |
| 27 | total_studies_ma | box17 |
| 28 | total_reports_ma | box17 |
| 29 | other_results | box11 |
| 30 | other_specific_results | box11 |
| 31 | dbr_total_identified | Derived |
| 32 | other_total_identified | Derived |
| 33 | total_identified | Derived |
| 34 | records_after_removal | Derived |