Phase 16: Release 3 Migration, Data Export, and PRISMA
Release 3 -- The Capstone
Phase 16 is the final phase of the entire roadmap. It migrates existing data, delivers enhanced data export, and produces the PRISMA 2020 flow diagram, the deliverable that researchers and journals require.
Summary
This phase has three parts: (1) data migration to backfill study statuses and convert existing screening data to the new structured format, (2) enhanced data export including reconciliation status, agreement metrics, and screening decisions, and (3) automatic PRISMA 2020 flow diagram generation from the study data. The PRISMA diagram is the ultimate deliverable -- everything built across all three releases feeds into it.
Part 1: Data Migration
What Happens
- Study status backfill: All existing studies receive a lifecycle status. Currently, studies have no formal status tracking. This migration sets all existing studies to "Active" and introduces the nine-state lifecycle model that tracks a study's position from import through to inclusion
- Screening data conversion: Existing binary screening decisions (include/exclude) are migrated to the new structured format, with `screeningOutcomes[]` entries created on each study document
- Source type metadata: Systematic searches are tagged with their source type (Database, Register, Website, Organisation, Citation Searching, Other) where determinable. This metadata drives the dual-column PRISMA layout
- Stage settings migration: Legacy boolean configuration fields on stages are migrated to the new unified stage settings schema
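The two study-level steps above (status backfill and screening conversion) can be sketched as a pure function over a single study document. This is a minimal sketch, not the production migration: it assumes studies are plain dictionaries, and the field names `status` and `screeningDecision` and the shape of each `screeningOutcomes[]` entry are hypothetical. Only the "Active" default and the `screeningOutcomes[]` array come from the text.

```python
from copy import deepcopy
from typing import Any

# Hypothetical legacy field holding the old binary decision ("include" / "exclude").
LEGACY_DECISION_FIELD = "screeningDecision"


def migrate_study(study: dict[str, Any]) -> dict[str, Any]:
    """Return a migrated copy of a study document; the input is left untouched.

    Additive only: fields are added, never removed or overwritten, so running
    the migration a second time returns an identical document.
    """
    migrated = deepcopy(study)

    # 1. Status backfill: studies have no status yet, so they start as "Active".
    migrated.setdefault("status", "Active")

    # 2. Screening conversion: wrap the legacy binary decision in a structured entry.
    decision = migrated.get(LEGACY_DECISION_FIELD)
    if decision in ("include", "exclude") and "screeningOutcomes" not in migrated:
        migrated["screeningOutcomes"] = [
            {
                "decision": decision,
                "exclusionReasons": [],  # legacy data carries no structured reasons
                "migratedFromLegacy": True,  # hypothetical provenance flag
            }
        ]
    return migrated
```

Keeping the legacy field and guarding every write makes the step idempotent, so an interrupted run can simply be re-run.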
Safety Guarantees
All changes are additive. No existing data is deleted. Rollback is possible by removing new fields with a simple database operation.
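Because every change is additive, rollback reduces to deleting the fields the migration introduced. A minimal sketch over in-memory documents; the `MIGRATION_FIELDS` tuple is hypothetical and would need to list exactly the fields this migration added. In a document database the equivalent is a single bulk update that unsets those fields.

```python
from typing import Any, Iterable

# Hypothetical list of fields added to study documents by the Phase 16 migration.
MIGRATION_FIELDS = ("status", "screeningOutcomes")


def rollback_study(
    study: dict[str, Any], fields: Iterable[str] = MIGRATION_FIELDS
) -> dict[str, Any]:
    """Return a copy of the study with the migration-added fields removed."""
    drop = set(fields)
    # Every pre-existing field is kept untouched, so the original document is restored.
    return {key: value for key, value in study.items() if key not in drop}
```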
Part 2: Enhanced Data Export
What Changes
Existing CSV and Google Sheets exports are enriched with data from all three releases:
| New Export Data | What It Shows | Introduced In |
|---|---|---|
| Reconciliation status | Whether each question per study has been reconciled, and how (auto-promoted, candidate agreement, or manual reconciliation) | Release 2 |
| Question version references | Which question version each annotation was collected against, and which question set version was active | Release 1 |
| Agreement metrics | Percent Agreement and Cohen's Kappa per question and per study | Release 2 |
| Screening decisions | Structured exclusion reasons with primary and sub-reasons | Release 3 |
| Deduplication reports | Import record counts, duplicate group membership, canonical enrichment provenance | Release 3 |
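One way to flatten the new export data into the existing CSV format is one row per study and question. The column names below are hypothetical placeholders mapped onto the rows of the table above; the real export schema may differ.

```python
import csv
import io
from typing import Any

# Hypothetical column names for the enriched export.
COLUMNS = [
    "studyId",
    "questionId",
    "questionVersion",
    "questionSetVersion",
    "reconciliationStatus",  # auto-promoted / candidate agreement / manual
    "percentAgreement",
    "cohensKappa",
    "screeningDecision",
    "primaryExclusionReason",
    "subReasons",
    "duplicateGroupId",
]


def write_enriched_csv(rows: list[dict[str, Any]]) -> str:
    """Serialise one row per (study, question) pair to CSV text."""
    buffer = io.StringIO()
    # Missing values become empty cells; unknown keys raise ValueError,
    # which surfaces schema drift early.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        out = dict(row)
        # Flatten the list of sub-reasons into a single cell.
        out["subReasons"] = "; ".join(row.get("subReasons", []))
        writer.writerow(out)
    return buffer.getvalue()
```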
Standalone Exports
- Agreement metrics CSV: Per-question and per-study agreement statistics for publication
- Deduplication report: Complete audit trail of duplicate detection and resolution decisions
- Screening breakdown: Exclusion counts by reason, ready for PRISMA integration
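For the agreement metrics CSV, Percent Agreement is the share of items on which two reviewers gave the same answer, and Cohen's Kappa corrects that share for chance agreement: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected from each reviewer's answer frequencies. A minimal two-reviewer sketch; the function names are illustrative, and how SyRF treats more than two reviewers or missing answers is not specified in this document.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of items on which the two reviewers gave the same answer."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty answer lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> Optional[float]:
    """Cohen's kappa for two reviewers answering the same items.

    Returns None when p_e == 1 (both reviewers used one identical category
    for every item), where kappa is undefined.
    """
    n = len(a)
    p_o = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both reviewers pick category k, summed over k.
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1:
        return None
    return (p_o - p_e) / (1 - p_e)
```

For example, two reviewers who agree on 3 of 4 records, with include/exclude splits of 2/2 and 1/3, have 75% agreement, but p_e = (2×1 + 2×3) / 16 = 0.5, so kappa = (0.75 - 0.5) / 0.5 = 0.5.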
Part 3: PRISMA 2020 Flow Diagram
The Deliverable
The PRISMA 2020 flow diagram is a standardised visual summary of the entire systematic review process. Many journals require it, and funding bodies expect it. Producing it manually is tedious and error-prone.
SyRF auto-generates this diagram from the study data, showing how many records, reports, and studies passed through each stage of the review:
```mermaid
flowchart TD
    subgraph Identification
        B2["Records from databases/registers<br/>(n = ?)"]
        B11["Records from other sources<br/>(n = ?)"]
        B3["Duplicates removed (n = ?)<br/>Removed by automation (n = ?)<br/>Removed other (n = ?)"]
    end
    subgraph Screening
        B4["Records screened<br/>(n = ?)"]
        B5["Records excluded<br/>(n = ?)"]
        B6["Reports sought<br/>(n = ?)"]
        B7["Reports not retrieved<br/>(n = ?)"]
        B8["Reports assessed<br/>(n = ?)"]
        B9["Reports excluded with reasons<br/>Reason 1 (n = ?)<br/>Reason 2 (n = ?)<br/>Reason 3 (n = ?)"]
    end
    subgraph Included
        B10["New studies included<br/>(n = ?)"]
        B17["Studies in meta-analysis<br/>(n = ?)"]
    end
    B2 --> B3
    B3 --> B4
    B11 --> B4
    B4 --> B5
    B4 --> B6
    B6 --> B7
    B6 --> B8
    B8 --> B9
    B8 --> B10
    B10 --> B17
    style Identification fill:#e1f5fe
    style Screening fill:#fff3e0
    style Included fill:#e8f5e9
```
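Because each box in the screening section is derived from the one before it, generated counts can be sanity-checked before rendering. A sketch of such a check for boxes B4 to B10, using the box ids from the diagram above. It assumes every record not excluded at screening is sought as a report, and that each excluded report is counted once, under its primary reason; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PrismaCounts:
    """Hypothetical container for the screening-section box counts."""
    records_screened: int  # B4
    records_excluded: int  # B5
    reports_sought: int  # B6
    reports_not_retrieved: int  # B7
    reports_assessed: int  # B8
    exclusion_reasons: dict[str, int] = field(default_factory=dict)  # B9
    studies_included: int = 0  # B10


def consistency_errors(c: PrismaCounts) -> list[str]:
    """Return a message for every flow invariant the counts violate."""
    errors: list[str] = []
    if c.reports_sought != c.records_screened - c.records_excluded:
        errors.append("B6 should equal B4 - B5")
    if c.reports_assessed != c.reports_sought - c.reports_not_retrieved:
        errors.append("B8 should equal B6 - B7")
    reports_excluded = sum(c.exclusion_reasons.values())
    # A study may have several reports, so studies can only be <= included reports.
    if c.studies_included > c.reports_assessed - reports_excluded:
        errors.append("B10 cannot exceed B8 minus reports excluded (B9)")
    return errors
```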
How the Numbers Are Computed
Every box in the PRISMA diagram is computed automatically from existing data:
| PRISMA Section | Data Source | Phases That Provide the Data |
|---|---|---|
| Records identified | Citation counts, grouped by source type | Phase 12 (deduplication creates Citations) |
| Duplicates removed | Study lifecycle status counts | Phase 12 (dedup sets Duplicate/Merged status) |
| Records screened / excluded | Screening outcomes per profile | Phase 13 (profiles), Phase 15 (screening annotations) |
| Reports sought / not retrieved | Full-text retrieval status | Existing functionality, enriched with lifecycle status |
| Reports excluded with reasons | Structured exclusion reasons | Phase 15 (screening annotations) |
| Studies included | Lifecycle status = Included | Phase 12 (lifecycle model) |
| Studies in meta-analysis | `metaAnalysisIncluded` flag | Phase 16 (new field) |
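The mapping in the table can be sketched as a single pass over the studies. This is a simplified illustration, not SyRF's query logic: it assumes one title/abstract screening decision per study (rather than outcomes per screening profile), treats both the Duplicate and Merged statuses as removed duplicates, and uses a hypothetical flattened `Study` record in place of the real document model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Study:
    """Hypothetical flattened view of a study document."""
    status: str  # lifecycle status, e.g. "Included", "Duplicate", "Merged"
    screening_decision: Optional[str] = None  # "include" / "exclude"; None if unscreened
    exclusion_reason: Optional[str] = None  # primary reason if excluded at full text
    meta_analysis_included: bool = False  # the new metaAnalysisIncluded flag


def prisma_counts(studies: list[Study]) -> dict[str, object]:
    """Derive PRISMA box counts from study-level data."""
    statuses = Counter(s.status for s in studies)
    screened = [s for s in studies if s.screening_decision is not None]
    reasons = Counter(s.exclusion_reason for s in studies if s.exclusion_reason)
    return {
        # Assumption: both deduplication outcomes count as removed duplicates.
        "duplicates_removed": statuses["Duplicate"] + statuses["Merged"],
        "records_screened": len(screened),
        "records_excluded": sum(s.screening_decision == "exclude" for s in screened),
        "reports_excluded_by_reason": dict(reasons),
        "studies_included": statuses["Included"],
        "studies_in_meta_analysis": sum(
            s.meta_analysis_included for s in studies if s.status == "Included"
        ),
    }
```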
Dual-Column Layout
The PRISMA 2020 diagram separates sources into two columns:
- Column 1 (Databases and Registers): PubMed, Embase, Scopus, ClinicalTrials.gov, CENTRAL, etc.
- Column 2 (Other Sources): Websites, organisations contacted, citation searching, other methods
The source type taxonomy defined in Phase 2 and populated in Phase 12 drives this column assignment automatically.
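Column assignment then reduces to a fixed lookup from source type. A sketch using the six source types listed in Part 1; the enum and function names are hypothetical, and it assumes each identified record inherits the source type of the search it came from.

```python
from collections import Counter
from enum import Enum


class SourceType(Enum):
    # The six source types listed in Part 1 of this phase.
    DATABASE = "Database"
    REGISTER = "Register"
    WEBSITE = "Website"
    ORGANISATION = "Organisation"
    CITATION_SEARCHING = "Citation Searching"
    OTHER = "Other"


# PRISMA 2020 column 1 covers databases and registers; everything else is column 2.
DATABASES_AND_REGISTERS = {SourceType.DATABASE, SourceType.REGISTER}


def prisma_column(source: SourceType) -> int:
    return 1 if source in DATABASES_AND_REGISTERS else 2


def records_per_column(citation_sources: list[SourceType]) -> Counter[int]:
    """Count identified records for each PRISMA column."""
    return Counter(prisma_column(s) for s in citation_sources)
```

Searches whose source type could not be determined during migration would still need an explicit rule before they can be placed in a column; this sketch leaves that choice open.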
Export Formats
The PRISMA diagram is available as:
- Interactive web view: Rendered in the browser with clickable boxes that drill down to the underlying studies
- Structured data export: JSON/CSV with all 34 PRISMA fields for use in other tools
- PRISMA2020 R package compatibility: Exported data can be fed into the community PRISMA2020 R package to produce publication-quality diagrams
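The structured data export can be as simple as serialising the computed counts. A sketch writing the same mapping to JSON and to a two-column CSV; the field names in the example are hypothetical placeholders for the 34 fields defined in the Phase 2 specification, and feeding the PRISMA2020 R package would additionally require mapping them onto that package's own input template.

```python
import csv
import io
import json


def to_json(counts: dict[str, int]) -> str:
    """Serialise PRISMA counts as a JSON object, keys sorted for stable diffs."""
    return json.dumps(counts, indent=2, sort_keys=True)


def to_csv(counts: dict[str, int]) -> str:
    """Serialise PRISMA counts as a two-column CSV: field name, count."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["field", "n"])
    for name, n in counts.items():
        writer.writerow([name, n])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical field names; the real ones come from the Phase 2 specification.
    example = {"records_screened": 120, "records_excluded": 95}
    print(to_json(example))
    print(to_csv(example))
```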
Why This Is the Capstone
Everything in the roadmap feeds into the PRISMA diagram:
- Release 1 (Phases 3-7): Question versioning and annotation form provide the audit trail
- Release 2 (Phases 8-11): Reconciliation provides gold-standard answers and agreement metrics
- Release 3 (Phases 12-16): Deduplication provides accurate record counts, screening profiles provide structured decisions, and the three-level data model makes all counts derivable
The PRISMA 2020 flow diagram is the single most important deliverable of the entire platform evolution. It transforms SyRF from a data collection tool into a complete systematic review platform.
How It Connects
| Connection | Detail |
|---|---|
| Phase 2 (PRISMA Specification) | Implements the binding constraints defined in Phase 2 -- all 17 PRISMA boxes, 34 fields |
| Phase 12 (Deduplication) | Citation counts and dedup status feed PRISMA identification boxes |
| Phase 15 (Screening Annotations) | Structured exclusion reasons feed PRISMA screening boxes |
| Phase 10 (Reconciliation Workflow) | Agreement metrics and reconciliation status feed enhanced exports |
| All phases | Every phase contributes data to the PRISMA diagram |
For the platform architecture overview, see platform-architecture.md. For the PRISMA box-to-field mapping, see prisma-flow-diagram-mapping.md.
Phase 12 (dedup) cleans the data. Phase 13 (profiles) configures screening criteria. Phase 14 (filtering) routes studies to stages. Phase 15 (screening) adds structured decisions. Phase 16 (export/PRISMA) delivers the output.