Phase 16: Release 3 Migration, Data Export, and PRISMA

Release 3 -- The Capstone

Phase 16 is the final phase of the entire roadmap. It migrates existing data, delivers enhanced data export, and produces the PRISMA 2020 flow diagram, the deliverable that researchers and journals require.

Summary

This phase has three parts: (1) data migration to backfill study statuses and convert existing screening data to the new structured format, (2) enhanced data export including reconciliation status, agreement metrics, and screening decisions, and (3) automatic PRISMA 2020 flow diagram generation from the study data. The PRISMA diagram is the ultimate deliverable -- everything built across all three releases feeds into it.

Part 1: Data Migration

What Happens

Study status backfill: All existing studies receive a lifecycle status; until now, studies have had no formal status tracking. The migration sets every existing study to "Active" and introduces the nine-state lifecycle model, which tracks a study's position from import through to inclusion.
Screening data conversion: Existing binary screening decisions (include/exclude) are converted to the new structured format by creating screeningOutcomes[] entries on each study document.
Source type metadata: Systematic searches are tagged with their source type (Database, Register, Website, Organisation, Citation Searching, Other) where it can be determined. This metadata drives the dual-column PRISMA layout.
Stage settings migration: Legacy boolean configuration fields on stages are migrated to the new unified stage settings schema.
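
The status backfill and screening conversion can be sketched as a pure function over one study document. This is a minimal illustration, not the actual migration: only screeningOutcomes[] is named in this document, so the field names `lifecycleStatus` and `screeningDecisions`, and the shape of each entry, are assumptions made for the example.

```python
from copy import deepcopy


def migrate_study(study: dict) -> dict:
    """Return a migrated copy of a study document; nothing is removed."""
    migrated = deepcopy(study)

    # Status backfill: pre-existing studies become "Active" unless already set.
    migrated.setdefault("lifecycleStatus", "Active")

    # Screening conversion: wrap each legacy binary decision in a structured
    # entry. The legacy field (hypothetical name) is left in place.
    if "screeningOutcomes" not in migrated:
        migrated["screeningOutcomes"] = [
            {
                "investigatorId": decision.get("investigatorId"),
                "outcome": "include" if decision.get("included") else "exclude",
                "exclusionReason": None,  # legacy data has no structured reason
            }
            for decision in migrated.get("screeningDecisions", [])
        ]
    return migrated
```

Because the function only adds fields and skips documents that already carry them, running it twice gives the same result as running it once, which is a useful property for a migration that may need to be retried.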

Safety Guarantees

All changes are additive: the migration only adds fields and never deletes existing data. Rollback is therefore possible by removing the new fields with a simple database operation.
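
A rollback under this guarantee can be pictured as dropping the added fields and leaving everything else untouched. The sketch below works on plain dictionaries rather than a real database, and the collection names and every field name except screeningOutcomes are hypothetical.

```python
# Fields the migration is assumed to add, per collection (names illustrative).
ADDED_FIELDS = {
    "studies": ("lifecycleStatus", "screeningOutcomes"),
    "searches": ("sourceType",),
    "stages": ("stageSettings",),
}


def rollback(collection: str, document: dict) -> dict:
    """Return a copy of the document without the fields added by the migration."""
    added = ADDED_FIELDS[collection]
    return {key: value for key, value in document.items() if key not in added}
```

In a document store the same effect would typically be achieved with one bulk field-removal update per collection, which is why an additive migration keeps rollback cheap.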

Part 2: Enhanced Data Export

What Changes

Existing CSV and Google Sheets exports are enriched with data from all three releases:

New Export Data	What It Shows	Source
Reconciliation status	Whether each question per study has been reconciled, and how (auto-promoted, candidate agreement, or manual reconciliation)	Release 2
Question version references	Which question version each annotation was collected against, and which question set version was active	Release 1
Agreement metrics	Percent Agreement and Cohen's Kappa per question and per study	Release 2
Screening decisions	Structured exclusion reasons with primary and sub-reasons	Release 3
Deduplication reports	Import record counts, duplicate group membership, canonical enrichment provenance	Release 3
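
For reference, the two agreement metrics named in the table are standard statistics. The sketch below computes them for two annotators answering the same set of items; how SyRF handles more than two annotators or missing answers is not specified here, so the two-rater setup is an assumption.

```python
import math
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which two annotators gave the same answer."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both annotators must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category at random.
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if math.isclose(p_expected, 1.0):
        return float("nan")  # undefined when both raters only ever use one category
    return (p_observed - p_expected) / (1.0 - p_expected)
```

For example, answers of yes, yes, no, no from one annotator and yes, no, no, no from the other give a percent agreement of 0.75 and a kappa of 0.5, since half of the observed agreement would be expected by chance.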

Standalone Exports

Agreement metrics CSV: Per-question and per-study agreement statistics for publication
Deduplication report: Complete audit trail of duplicate detection and resolution decisions
Screening breakdown: Exclusion counts by reason, ready for PRISMA integration
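
As an illustration of the screening breakdown, the sketch below tallies exclusions by primary and sub-reason and renders them as CSV. It assumes each study is counted once, using its first excluding entry in screeningOutcomes[]; the keys `outcome`, `primaryReason`, and `subReason` are hypothetical names.

```python
import csv
import io
from collections import Counter


def screening_breakdown(studies):
    """Count excluded studies by (primary reason, sub-reason)."""
    counts = Counter()
    for study in studies:
        for entry in study.get("screeningOutcomes", []):
            if entry.get("outcome") == "exclude":
                key = (entry.get("primaryReason") or "Unspecified",
                       entry.get("subReason") or "")
                counts[key] += 1
                break  # count each study at most once
    return counts


def breakdown_to_csv(counts):
    """Render the counts as CSV text, sorted by reason for stable output."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["primary_reason", "sub_reason", "n"])
    for (primary, sub), n in sorted(counts.items()):
        writer.writerow([primary, sub, n])
    return buffer.getvalue()
```

Counting per study rather than per screening decision matters because PRISMA reports how many records were excluded, not how many individual exclusion votes were cast.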

Part 3: PRISMA 2020 Flow Diagram

The Deliverable

The PRISMA 2020 flow diagram is a standardised visual summary of the entire systematic review process. Journals require it. Funding bodies expect it. Producing it manually is tedious and error-prone.

SyRF auto-generates this diagram from the study data, showing exactly how many studies went through each stage of the review:

flowchart TD
    subgraph Identification
        B2["Records from databases/registers<br/>(n = ?)"]
        B11["Records from other sources<br/>(n = ?)"]
        B3["Duplicates removed (n = ?)<br/>Removed by automation (n = ?)<br/>Removed other (n = ?)"]
    end

    subgraph Screening
        B4["Records screened<br/>(n = ?)"]
        B5["Records excluded<br/>(n = ?)"]
        B6["Reports sought<br/>(n = ?)"]
        B7["Reports not retrieved<br/>(n = ?)"]
        B8["Reports assessed<br/>(n = ?)"]
        B9["Reports excluded with reasons<br/>Reason 1 (n = ?)<br/>Reason 2 (n = ?)<br/>Reason 3 (n = ?)"]
    end

    subgraph Included
        B10["New studies included<br/>(n = ?)"]
        B17["Studies in meta-analysis<br/>(n = ?)"]
    end

    B2 --> B3
    B3 --> B4
    B11 --> B4
    B4 --> B5
    B4 --> B6
    B6 --> B7
    B6 --> B8
    B8 --> B9
    B8 --> B10
    B10 --> B17

    style Identification fill:#e1f5fe
    style Screening fill:#fff3e0
    style Included fill:#e8f5e9

How the Numbers Are Computed

Every box in the PRISMA diagram is computed automatically from existing data:

PRISMA Section	Data Source	Phases That Provide the Data
Records identified	Citation counts, grouped by source type	Phase 12 (deduplication creates Citations)
Duplicates removed	Study lifecycle status counts	Phase 12 (dedup sets Duplicate/Merged status)
Records screened / excluded	Screening outcomes per profile	Phase 13 (profiles), Phase 15 (screening annotations)
Reports sought / not retrieved	Full-text retrieval status	Existing functionality, enriched with lifecycle status
Reports excluded with reasons	Structured exclusion reasons	Phase 15 (screening annotations)
Studies included	Lifecycle status = Included	Phase 12 (lifecycle model)
Studies in meta-analysis	`metaAnalysisIncluded` flag	Phase 16 (new field)
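
Several rows of this table reduce to simple counts over citations and studies. The sketch below shows that idea for four of the boxes. It assumes a `lifecycleStatus` field holding the status names mentioned above (Duplicate, Merged, Included) and a `sourceType` field on each citation; both field names are illustrative, and only `metaAnalysisIncluded` comes from this document.

```python
from collections import Counter


def prisma_counts(citations, studies):
    """Derive a subset of PRISMA box counts from citations and studies."""
    status = Counter(study.get("lifecycleStatus") for study in studies)
    by_source = Counter(c.get("sourceType", "Unknown") for c in citations)
    return {
        "records_identified": len(citations),
        "records_identified_by_source": dict(by_source),
        # Deduplication marks removed records as Duplicate or Merged.
        "duplicates_removed": status["Duplicate"] + status["Merged"],
        "studies_included": status["Included"],
        # Only included studies can contribute to the meta-analysis box.
        "studies_in_meta_analysis": sum(
            1 for study in studies
            if study.get("lifecycleStatus") == "Included"
            and study.get("metaAnalysisIncluded")
        ),
    }
```

Because every number is derived from stored status rather than typed in by hand, the diagram stays consistent with the underlying data whenever it is regenerated.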

Dual-Column Layout

The PRISMA 2020 diagram separates sources into two columns:

Column 1 (Databases and Registers): PubMed, Embase, Scopus, ClinicalTrials.gov, CENTRAL, etc.
Column 2 (Other Sources): Websites, organisations contacted, citation searching, other methods

The source type taxonomy defined in Phase 2 and populated in Phase 12 drives this column assignment automatically.
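
The column assignment can be expressed as a lookup on the source type. In this sketch the six source type names are taken from the migration section, while the `sourceType` key, the column labels, and the separate "unassigned" bucket for untagged records are assumptions.

```python
from collections import Counter

DATABASES_AND_REGISTERS = frozenset({"Database", "Register"})
OTHER_SOURCES = frozenset({"Website", "Organisation", "Citation Searching", "Other"})


def prisma_column(source_type):
    """Map a source type to the PRISMA 2020 column it belongs to."""
    if source_type in DATABASES_AND_REGISTERS:
        return "databases_and_registers"
    if source_type in OTHER_SOURCES:
        return "other_sources"
    return "unassigned"  # source type missing or not determinable


def records_per_column(citations):
    """Count citations per PRISMA column."""
    return Counter(prisma_column(c.get("sourceType")) for c in citations)
```

Keeping untagged records in their own bucket, rather than silently placing them in either column, makes gaps in the source type metadata visible before the diagram is published.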

Export Formats

The PRISMA diagram is available as:

Interactive web view: Rendered in the browser with clickable boxes that drill down to the underlying studies
Structured data export: JSON/CSV with all 34 PRISMA fields for use in other tools
PRISMA2020 R package input: Exported data can be fed into the community PRISMA2020 R package to produce publication-quality diagrams

Why This Is the Capstone

Everything in the roadmap feeds into the PRISMA diagram:

Release 1 (Phases 3-7): Question versioning and annotation form provide the audit trail
Release 2 (Phases 8-11): Reconciliation provides gold-standard answers and agreement metrics
Release 3 (Phases 12-16): Deduplication provides accurate record counts, screening profiles provide structured decisions, and the three-level data model makes all counts derivable

The PRISMA 2020 flow diagram is the single most important deliverable of the entire platform evolution. It transforms SyRF from a data collection tool into a complete systematic review platform.

How It Connects

Connection	Detail
Phase 2 (PRISMA Specification)	Implements the binding constraints defined in Phase 2 -- all 17 PRISMA boxes, 34 fields
Phase 12 (Deduplication)	Citation counts and dedup status feed PRISMA identification boxes
Phase 15 (Screening Annotations)	Structured exclusion reasons feed PRISMA screening boxes
Phase 10 (Reconciliation Workflow)	Agreement metrics and reconciliation status feed enhanced exports
All phases	Every phase contributes data to the PRISMA diagram

For the platform architecture overview, see platform-architecture.md. For the PRISMA box-to-field mapping, see prisma-flow-diagram-mapping.md.

Phase 12 (dedup) cleans the data. Phase 13 (profiles) configures screening criteria. Phase 14 (filtering) routes studies to stages. Phase 15 (screening) adds structured decisions. Phase 16 (export/PRISMA) delivers the output.