Phase 16: Release 3 Migration, Data Export, and PRISMA

Release 3 -- The Capstone

Phase 16 is the final phase of the entire roadmap. It migrates existing data, delivers enhanced data export, and produces the PRISMA 2020 flow diagram -- the deliverable that researchers and journals require.

Summary

This phase has three parts: (1) data migration to backfill study statuses and convert existing screening data to the new structured format, (2) enhanced data export including reconciliation status, agreement metrics, and screening decisions, and (3) automatic PRISMA 2020 flow diagram generation from the study data. The PRISMA diagram is the ultimate deliverable -- everything built across all three releases feeds into it.

Part 1: Data Migration

What Happens

  • Study status backfill: Studies currently have no formal status tracking. This migration introduces the nine-state lifecycle model, which tracks a study's position from import through to inclusion, and sets every existing study to "Active"
  • Screening data conversion: Existing binary screening decisions (include/exclude) are migrated to the new structured format, with screeningOutcomes[] entries created on each study document
  • Source type metadata: Systematic searches are tagged with their source type (Database, Register, Website, Organisation, Citation Searching, Other) where determinable. This metadata drives the dual-column PRISMA layout
  • Stage settings migration: Legacy boolean configuration fields on stages are migrated to the new unified stage settings schema
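The first two migration steps can be sketched in Python as follows. This is a minimal illustration, not the actual migration: only `screeningOutcomes` and the "Active" status come from the text above, while `lifecycleStatus`, `legacyDecisions`, `reviewerId`, `included`, `outcome`, and `migratedFromLegacy` are hypothetical names chosen for the example.

```python
from copy import deepcopy


def migrate_study(study: dict) -> dict:
    """Return a migrated copy of a study document; existing fields are never altered."""
    migrated = deepcopy(study)

    # Status backfill: only set the status when it is absent, so reruns are idempotent.
    migrated.setdefault("lifecycleStatus", "Active")  # hypothetical field name

    # Screening data conversion: one structured entry per legacy binary decision.
    if "screeningOutcomes" not in migrated:
        migrated["screeningOutcomes"] = [
            {
                "reviewerId": decision["reviewerId"],  # hypothetical field
                "outcome": "Include" if decision["included"] else "Exclude",
                "migratedFromLegacy": True,  # hypothetical provenance marker
            }
            for decision in study.get("legacyDecisions", [])  # hypothetical legacy field
        ]
    return migrated


legacy = {
    "id": "s1",
    "legacyDecisions": [
        {"reviewerId": "r1", "included": True},
        {"reviewerId": "r2", "included": False},
    ],
}
migrated = migrate_study(legacy)
```

The legacy decisions stay on the document untouched, and running the function a second time changes nothing, which is one way to keep a backfill safe to retry.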

Safety Guarantees

All changes are additive: the migration only adds fields and never deletes or overwrites existing data. Rolling back therefore means removing the newly added fields, which a simple database operation can do.
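Because the migration only adds fields, a rollback amounts to dropping them again. The sketch below shows the idea on an in-memory document; the field names in `ADDED_FIELDS` are assumptions for illustration, and a real rollback would run the equivalent operation against the database.

```python
# Hypothetical names of the fields the migration adds; the real list
# would come from the migration itself.
ADDED_FIELDS = ("lifecycleStatus", "screeningOutcomes")


def rollback_study(study: dict) -> dict:
    """Drop the fields added by the migration, leaving original data as it was."""
    return {key: value for key, value in study.items() if key not in ADDED_FIELDS}


before = {"id": "s1", "title": "Example study"}
after = {**before, "lifecycleStatus": "Active", "screeningOutcomes": []}
restored = rollback_study(after)
```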

Part 2: Enhanced Data Export

What Changes

Existing CSV and Google Sheets exports are enriched with data from all three releases:

| New Export Data | What It Shows | Source |
| --- | --- | --- |
| Reconciliation status | Whether each question per study has been reconciled, and how (auto-promoted, candidate agreement, or manual reconciliation) | Release 2 |
| Question version references | Which question version each annotation was collected against, and which question set version was active | Release 1 |
| Agreement metrics | Percent Agreement and Cohen's Kappa per question and per study | Release 2 |
| Screening decisions | Structured exclusion reasons with primary and sub-reasons | Release 3 |
| Deduplication reports | Import record counts, duplicate group membership, canonical enrichment provenance | Release 3 |
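For readers unfamiliar with the two agreement metrics in the table, here is a minimal Python sketch for the two-rater case. Percent agreement is the share of items on which both raters gave the same answer, and Cohen's Kappa corrects it for chance agreement: kappa = (p_o - p_e) / (1 - p_e). The function names and the two-rater restriction are choices made for this example, not a description of how SyRF computes the metrics.

```python
from collections import Counter


def percent_agreement(rater_a: list, rater_b: list) -> float:
    """Share of items on which both raters gave the same answer (p_o)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's Kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(rater_a, rater_b)
    n = len(rater_a)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected chance agreement: product of each rater's marginal proportions,
    # summed over categories (a missing category counts as zero).
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


a = ["yes", "yes", "no", "no"]
b = ["yes", "no", "no", "no"]
print(percent_agreement(a, b))  # 0.75
print(cohens_kappa(a, b))       # 0.5
```

In the example the raters agree on 3 of 4 items (p_o = 0.75), chance agreement is (2*1 + 2*3) / 16 = 0.5, and so kappa is 0.25 / 0.5 = 0.5.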

Standalone Exports

  • Agreement metrics CSV: Per-question and per-study agreement statistics for publication
  • Deduplication report: Complete audit trail of duplicate detection and resolution decisions
  • Screening breakdown: Exclusion counts by reason, ready for PRISMA integration
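The screening breakdown export can be pictured as a simple group-and-count over structured screening outcomes. The sketch below is illustrative only: the record fields (`outcome`, `primaryReason`, `subReason`) and the CSV column names are assumptions, not the actual export schema.

```python
import csv
import io
from collections import Counter


def screening_breakdown_csv(outcomes: list) -> str:
    """Count exclusions by (primary reason, sub-reason) and render them as CSV."""
    counts = Counter(
        (o["primaryReason"], o.get("subReason", ""))
        for o in outcomes
        if o["outcome"] == "Exclude"
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["primary_reason", "sub_reason", "n"])
    # Sort for a stable, reproducible row order.
    for (primary, sub), n in sorted(counts.items()):
        writer.writerow([primary, sub, n])
    return buffer.getvalue()


outcomes = [
    {"outcome": "Exclude", "primaryReason": "Wrong population", "subReason": "Paediatric"},
    {"outcome": "Exclude", "primaryReason": "Wrong population", "subReason": "Paediatric"},
    {"outcome": "Exclude", "primaryReason": "No control group"},
    {"outcome": "Include"},
]
breakdown = screening_breakdown_csv(outcomes)
```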

Part 3: PRISMA 2020 Flow Diagram

The Deliverable

The PRISMA 2020 flow diagram is a standardised visual summary of the entire systematic review process. Journals require it. Funding bodies expect it. Producing it manually is tedious and error-prone.

SyRF auto-generates this diagram from the study data, showing exactly how many studies went through each stage of the review:

```mermaid
flowchart TD
    subgraph Identification
        B2["Records from databases/registers<br/>(n = ?)"]
        B11["Records from other sources<br/>(n = ?)"]
        B3["Duplicates removed (n = ?)<br/>Removed by automation (n = ?)<br/>Removed other (n = ?)"]
    end

    subgraph Screening
        B4["Records screened<br/>(n = ?)"]
        B5["Records excluded<br/>(n = ?)"]
        B6["Reports sought<br/>(n = ?)"]
        B7["Reports not retrieved<br/>(n = ?)"]
        B8["Reports assessed<br/>(n = ?)"]
        B9["Reports excluded with reasons<br/>Reason 1 (n = ?)<br/>Reason 2 (n = ?)<br/>Reason 3 (n = ?)"]
    end

    subgraph Included
        B10["New studies included<br/>(n = ?)"]
        B17["Studies in meta-analysis<br/>(n = ?)"]
    end

    B2 --> B3
    B3 --> B4
    B11 --> B4
    B4 --> B5
    B4 --> B6
    B6 --> B7
    B6 --> B8
    B8 --> B9
    B8 --> B10
    B10 --> B17

    style Identification fill:#e1f5fe
    style Screening fill:#fff3e0
    style Included fill:#e8f5e9
```

How the Numbers Are Computed

Every box in the PRISMA diagram is computed automatically from data held in SyRF:

| PRISMA Section | Data Source | Phases That Provide the Data |
| --- | --- | --- |
| Records identified | Citation counts, grouped by source type | Phase 12 (deduplication creates Citations) |
| Duplicates removed | Study lifecycle status counts | Phase 12 (dedup sets Duplicate/Merged status) |
| Records screened / excluded | Screening outcomes per profile | Phase 13 (profiles), Phase 15 (screening annotations) |
| Reports sought / not retrieved | Full-text retrieval status | Existing functionality, enriched with lifecycle status |
| Reports excluded with reasons | Structured exclusion reasons | Phase 15 (screening annotations) |
| Studies included | Lifecycle status = Included | Phase 12 (lifecycle model) |
| Studies in meta-analysis | metaAnalysisIncluded flag | Phase 16 (new field) |
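To make the mapping concrete, the sketch below derives a few of the boxes from in-memory records. It is a simplified illustration under stated assumptions: `metaAnalysisIncluded` and the Duplicate, Merged, and Included statuses come from the table above, while `sourceType`, `lifecycleStatus`, `exclusionReason`, the "Excluded" status, and the output keys are hypothetical names.

```python
from collections import Counter

# Statuses set by deduplication, per the table above.
DEDUP_STATUSES = ("Duplicate", "Merged")


def prisma_counts(citations: list, studies: list) -> dict:
    """Derive a subset of PRISMA box counts from citation and study records."""
    identified = Counter(c["sourceType"] for c in citations)
    status = Counter(s["lifecycleStatus"] for s in studies)
    reasons = Counter(
        s["exclusionReason"] for s in studies if s.get("exclusionReason")
    )
    return {
        "records_identified_by_source": dict(identified),
        "duplicates_removed": sum(status[name] for name in DEDUP_STATUSES),
        "reports_excluded_by_reason": dict(reasons),
        "studies_included": status["Included"],
        "studies_in_meta_analysis": sum(
            1 for s in studies if s.get("metaAnalysisIncluded")
        ),
    }


citations = [
    {"sourceType": "Database"},
    {"sourceType": "Database"},
    {"sourceType": "Database"},
    {"sourceType": "Website"},
]
studies = [
    {"lifecycleStatus": "Duplicate"},
    {"lifecycleStatus": "Included", "metaAnalysisIncluded": True},
    {"lifecycleStatus": "Included"},
    {"lifecycleStatus": "Excluded", "exclusionReason": "Wrong population"},
]
counts = prisma_counts(citations, studies)
```

Every value is a pure function of the stored records, so the diagram can be regenerated at any time without hand-maintained tallies.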

Dual-Column Layout

The PRISMA 2020 diagram separates sources into two columns:

  • Column 1 (Databases and Registers): PubMed, Embase, Scopus, ClinicalTrials.gov, CENTRAL, etc.
  • Column 2 (Other Sources): Websites, organisations contacted, citation searching, other methods

The source type taxonomy defined in Phase 2 and populated in Phase 12 drives this column assignment automatically.
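The column assignment can be sketched as a lookup on the source type. The six source type values are those listed under Part 1; the field names `sourceType` and `recordCount`, and the choice to treat an untagged search as "Other", are assumptions made for this example.

```python
from collections import Counter

# Source types that belong in column 1; everything else goes to column 2.
DATABASES_AND_REGISTERS = {"Database", "Register"}


def prisma_column(source_type: str) -> int:
    """Return 1 for databases and registers, 2 for all other sources."""
    return 1 if source_type in DATABASES_AND_REGISTERS else 2


def identified_per_column(searches: list) -> dict:
    """Sum imported record counts per PRISMA column."""
    totals = Counter()
    for search in searches:
        # Assumption: a search without a determinable source type counts as "Other".
        source_type = search.get("sourceType", "Other")
        totals[prisma_column(source_type)] += search["recordCount"]
    return {1: totals[1], 2: totals[2]}


searches = [
    {"sourceType": "Database", "recordCount": 1200},
    {"sourceType": "Register", "recordCount": 300},
    {"sourceType": "Citation Searching", "recordCount": 45},
    {"recordCount": 5},  # untagged search
]
per_column = identified_per_column(searches)
```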

Export Formats

The PRISMA diagram is available as:

  • Interactive web view: Rendered in the browser with clickable boxes that drill down to the underlying studies
  • Structured data export: JSON/CSV with all 34 PRISMA fields for use in other tools
  • Input for the PRISMA2020 R package: Exported data can be fed into the community PRISMA2020 R package to produce publication-quality diagrams

Why This Is the Capstone

Everything in the roadmap feeds into the PRISMA diagram:

  • Release 1 (Phases 3-7): Question versioning and annotation form provide the audit trail
  • Release 2 (Phases 8-11): Reconciliation provides gold-standard answers and agreement metrics
  • Release 3 (Phases 12-16): Deduplication provides accurate record counts, screening profiles provide structured decisions, and the three-level data model makes all counts derivable

The PRISMA 2020 flow diagram is the single most important deliverable of the entire platform evolution. It transforms SyRF from a data collection tool into a complete systematic review platform.

How It Connects

| Connection | Detail |
| --- | --- |
| Phase 2 (PRISMA Specification) | Implements the binding constraints defined in Phase 2 -- all 17 PRISMA boxes, 34 fields |
| Phase 12 (Deduplication) | Citation counts and dedup status feed PRISMA identification boxes |
| Phase 15 (Screening Annotations) | Structured exclusion reasons feed PRISMA screening boxes |
| Phase 10 (Reconciliation Workflow) | Agreement metrics and reconciliation status feed enhanced exports |
| All phases | Every phase contributes data to the PRISMA diagram |

For the platform architecture overview, see platform-architecture.md. For the PRISMA box-to-field mapping, see prisma-flow-diagram-mapping.md.


Phase 12 (dedup) cleans the data. Phase 13 (profiles) configures screening criteria. Phase 14 (filtering) routes studies to stages. Phase 15 (screening) adds structured decisions. Phase 16 (export/PRISMA) delivers the output.