AI Development Readiness Analysis — Releases 1-3¶

Analysis of specification gaps, tooling requirements, and strategies for reliably shipping professional-quality features using Claude Code.

Part 1: Specification Gap Analysis¶

Release 1 — Annotation Versioning & Form Rebuild (Phases 3-7)¶

CRITICAL Gaps (Block Implementation)¶

Phase	Category	Gap
3-7	Spec Organisation	No single source of truth — requirements scattered across `design-decisions.md`, `annotation-versioning-design.md`, phase READMEs, and `prisma-constraint-annotations.md`. Implementers must synthesise 5+ documents per phase.
4	API Contracts	Activation endpoint undefined — no request/response schemas for the DraftAQ → ActiveAQ transition.
5	API Contracts	Auto-save endpoint unspecified — no endpoint path, payload schema, debounce strategy, or conflict resolution on concurrent saves.
5	Error Handling	Auto-save failure recovery undefined — no retry strategy, no offline queue, no user notification pattern.
7 vs 16	Migration	PRISMA field backfill timing contradiction — Phase 7 says "forward-compatible fields added" but Phase 16 says "backfill lifecycle statuses". Unclear which phase actually writes the fields.
7	Migration	No failure/recovery procedures — partial migration failure (OOM, network timeout) has no documented restart or rollback path.
All	Rollback	No rollback procedures for any phase — feature flag toggle behaviour, data consistency guarantees, and rollback testing scenarios are all absent.
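To make the auto-save gap concrete, the following is a minimal sketch of one possible conflict rule: a version token checked on every save. The type names, the `baseVersion` field, and the "reject stale saves" policy are all assumptions for illustration; the specification does not define any of them.

```typescript
// Hypothetical shapes: the real auto-save contract is still undefined in the spec.
interface DraftSnapshot {
  version: number; // incremented on every accepted save
  answers: Record<string, string>;
}

interface AutoSaveRequest {
  baseVersion: number; // version the client last loaded
  answers: Record<string, string>;
}

type AutoSaveResult =
  | { status: "saved"; snapshot: DraftSnapshot }
  | { status: "conflict"; current: DraftSnapshot };

// Accept the save only if the client edited the latest version;
// otherwise return the stored snapshot so the client can merge or prompt the user.
function applyAutoSave(stored: DraftSnapshot, request: AutoSaveRequest): AutoSaveResult {
  if (request.baseVersion !== stored.version) {
    return { status: "conflict", current: stored };
  }
  return {
    status: "saved",
    snapshot: { version: stored.version + 1, answers: { ...request.answers } },
  };
}
```

A rule of this shape gives the Phase 3 contention tests and the Phase 5 error-handling section something precise to assert against, whichever policy is eventually chosen.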

HIGH Gaps (Significant Risk)¶

Phase	Category	Gap
3	Testing	No test strategy for optimistic concurrency under contention.
4	Edge Cases	DraftAQ deletion rules undefined (can a draft be deleted after partial use?).
5	Performance	`<200ms` load target stated but no measurement methodology, no baseline, no CI regression detection.
5	Edge Cases	Cross-stage annotation sharing with QSV transitions ambiguous.
5	Security	"Candidate blinding enforced at API level" — no security matrix specifying per-endpoint access control.
6	Edge Cases	QSV transition with breaking changes (removing required questions) has undefined sub-cases.
6	UI	Diff visualisation format for version comparison unspecified.
7	Migration	Question backfill algorithm lacks complete field mapping for all AQ types.
3-4	Security	Access control rules for Scope (System/Organisation/Researcher/Project) undefined: no RBAC matrix.

MEDIUM Gaps¶

No contention-reduction performance targets for Phase 3.
Angular 21 @angular/forms/signals is experimental — no contingency if API changes.
Version history endpoint response format unspecified (Phase 6).
No integration test plan matrix (which existing entities each phase must modify).

Release 2 — Reconciliation (Phases 8-11)¶

CRITICAL Gaps¶

Phase	Category	Gap
8	API Contracts	Group CRUD endpoints undefined — no OpenAPI spec for creating, assigning, or managing permission groups.
9	Trigger	Authority determination trigger undefined — what event triggers the check (import completion? annotation session completion? admin action?)? No endpoint to manually trigger or query pending determinations.
9-10	State Machine	Reconciliation state transitions not formally specified — when does a study enter the pool? What happens on partial completion? No endpoint to override auto-promotion (e.g., force single-annotator study into reconciliation).
10	Algorithm	Random assignment algorithm unspecified — no fairness guarantees, no load-balancing, no handling of reconciler unavailability or assignment timeout/expiration.
10	Edge Cases	Bulk approve threshold undefined — "all candidates agree" needs precise equality: floating-point tolerance? case sensitivity? set order for multi-select? datetime precision?
10	Concurrency	Race condition resolution undefined — two reconcilers submit simultaneously; no conflict resolution, retry, or notification strategy. Data corruption risk.
11	Migration	Permission mapping incomplete — existing admin/member roles map to new groups, but edge cases (users with both roles, orphaned memberships) not addressed.
11	Migration	Reconciliation backfill ambiguity — RSV for single-annotator studies: how is `committedBy` set? What if MinAnnotators changed since session completion? What if study edited after last session?
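The bulk-approve question ("all candidates agree") only becomes testable once equality is defined per answer type. The sketch below shows one way such a definition could look. Every rule in it (the numeric tolerance, case-insensitive text, order-insensitive multi-select, whole-second datetime precision, and requiring at least two candidates) is an assumption that the spec would need to confirm or replace, and the `Answer` type is hypothetical.

```typescript
// Hypothetical answer representation; the real annotation types are not specified here.
type Answer =
  | { kind: "number"; value: number }
  | { kind: "text"; value: string }
  | { kind: "multiSelect"; value: string[] }
  | { kind: "dateTime"; value: string }; // ISO 8601 string

// Assumed tolerance; the spec would need to state the real one.
const NUMERIC_TOLERANCE = 1e-9;

function answersEqual(a: Answer, b: Answer): boolean {
  if (a.kind === "number" && b.kind === "number") {
    return Math.abs(a.value - b.value) <= NUMERIC_TOLERANCE;
  }
  if (a.kind === "text" && b.kind === "text") {
    // Assumed: ignore case and surrounding whitespace.
    return a.value.trim().toLowerCase() === b.value.trim().toLowerCase();
  }
  if (a.kind === "multiSelect" && b.kind === "multiSelect") {
    // Assumed: selection order is irrelevant and duplicates collapse.
    const left = new Set(a.value);
    const right = new Set(b.value);
    return left.size === right.size && Array.from(left).every((v) => right.has(v));
  }
  if (a.kind === "dateTime" && b.kind === "dateTime") {
    // Assumed: compare at whole-second precision; unparseable dates never match.
    return Math.floor(Date.parse(a.value) / 1000) === Math.floor(Date.parse(b.value) / 1000);
  }
  return false; // answers of different kinds never agree
}

// Assumed: a single candidate is not "agreement" and cannot be bulk-approved.
function allCandidatesAgree(candidates: Answer[]): boolean {
  const first = candidates[0];
  if (first === undefined || candidates.length < 2) return false;
  return candidates.slice(1).every((c) => answersEqual(first, c));
}
```

Writing the rules as code like this makes each open question a one-line decision that can be covered by a unit test before Phase 10 starts.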

HIGH Gaps¶

Phase	Category	Gap
8	Security	Permission enforcement at API level — no middleware/attribute specification.
9	Metrics	Cohen's Kappa calculation not specified for non-binary annotation types.
9	Edge Cases	MinAnnotators changed mid-review — what happens to studies with 0 completed sessions?
10	UI	Anonymised side-by-side comparison layout unspecified.
10	Edge Cases	Reconciler assigned study but goes inactive — no timeout/reassignment specified.
10	Cross-Stage	Question in Stage A and Stage B — if Stage A not yet reconciled when Stage B starts, which answer prevails?
10	Terminology	"Consensus" vs "agreement" undefined for 3+ annotators (majority ≠ unanimity). Optional questions: if some skip, is that agreement?
11	Migration	Rollback procedures absent; partial migration recovery undefined.
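The Kappa gap flagged for Phase 9 is easier to close once the simplest case is written down. The following is a minimal sketch of unweighted Cohen's Kappa, κ = (p_o − p_e) / (1 − p_e), assuming exactly two annotators and unordered categorical answers. Ordinal or numeric answer types would likely need a weighted variant, and three or more annotators a different statistic (for example Fleiss' kappa); neither is covered here. The function name is illustrative.

```typescript
// Cohen's kappa for two raters over nominal (unordered) categories.
// Returns NaN when expected agreement is 1, i.e. both raters used the same
// single category for every item, where kappa is undefined.
function cohensKappa(raterA: string[], raterB: string[]): number {
  if (raterA.length !== raterB.length || raterA.length === 0) {
    throw new Error("Ratings must be non-empty and the same length");
  }
  const n = raterA.length;
  const countsA = new Map<string, number>();
  const countsB = new Map<string, number>();
  let agreed = 0;
  for (let i = 0; i < n; i++) {
    const a = raterA[i] as string;
    const b = raterB[i] as string;
    if (a === b) agreed++;
    countsA.set(a, (countsA.get(a) ?? 0) + 1);
    countsB.set(b, (countsB.get(b) ?? 0) + 1);
  }
  const observed = agreed / n; // p_o: proportion of items rated identically
  let expected = 0; // p_e: chance agreement from each rater's category frequencies
  countsA.forEach((countA, category) => {
    expected += (countA / n) * ((countsB.get(category) ?? 0) / n);
  });
  return (observed - expected) / (1 - expected);
}
```

For example, with ratings `["i", "i", "e", "e"]` and `["i", "e", "e", "e"]`, observed agreement is 0.75 and expected agreement is 0.5, giving κ = 0.5.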

Release 3 — Screening & PRISMA (Phases 12-16)¶

CRITICAL Gaps¶

Phase	Category	Gap
12	External Dependency	ASySD (R-based dedup engine) integration unspecified — no API contract, error handling for R subprocess failures, timeout/memory limits, or fallback. No test data sets or expected outputs defined.
12	Database	Publication collection schema incomplete — index definitions stated but field types, constraints, and CSUUID handling not fully specified.
12	Error Handling	No recovery strategy if dedup subprocess fails — orphaned `PendingDuplicateReview` studies could accumulate. No rollback if bulk merge is wrong (unmerge not specified).
12	Edge Cases	Dedup reversal undefined — if Duplicate → Active reversal happens, do annotation sessions reactivate? Does study re-enter screening pools?
13	State Machine	Agreement mode logic undefined — 3 modes (single, dual-manual, dual-automated) have no formal state transition diagrams.
13	Immutability	"Immutable once used" definition ambiguous — does "used" mean: one screener made one decision (strictest)? profile assigned to stage (loose)? study entered reconciliation (middle)?
13-15	API Contracts	10+ endpoints undefined — screening profile CRUD, stage filtering, exclusion reason taxonomy, screening reconciliation, filter rule dry-run, PRISMA generation all lack request/response schemas.
16	PRISMA	Concrete validation queries missing — all 34 PRISMA fields need testable derivation queries; none are provided.
16	Migration	screeningOutcomes[] migration source unclear — Phase 13, 14, 15 introduce components in sequence. Which phase migrates existing screening decisions? Which profile do old decisions belong to?
16	Migration	Phase 16 migration scripts not written — pre/post validation queries absent. No specification for handling null sourceType in PRISMA counts.

HIGH Gaps¶

Phase	Category	Gap
12	Performance	Dedup performance at scale (100K+ studies) — no benchmarks or targets. Async vs sync unclear (must complete before 200 OK?).
12	Security	Cross-project Publication enrichment — Publications accumulate metadata from all projects. Confidential project metadata could leak to other projects. No access control specified.
12	Lifecycle	StudyLifecycleStatus vs ScreeningOutcome relationship unclear — who sets `Included`? Automatic when all profiles return Included? What if excluded in one profile, included in another?
12 vs 10	Contradiction	Dedup Scenario 3 conflicts with reconciliation — merged study's annotation sessions become candidates on canonical study, but reconciliation spec says random blinded assignment. Interaction undefined.
13	Immutability	Screening profile immutability enforcement mechanism unspecified. How does admin correct errors on immutable profiles?
13	Edge Cases	Profile clone semantics undefined — does clone inherit relationships? Are decisions tracked separately?
14	Filtering	Filter rule engine specification incomplete — operators, field types, combinators, AND/OR logic undefined. No dry-run endpoint to test rules.
14	Edge Cases	Filter rules referencing deleted Screening Profile — undefined behaviour.
15	Edge Cases	Screening exclusion across profiles — what if Study excluded in Profile A is included in Profile B? What is its lifecycleStatus?
15	Structure	Structured exclusion reasons — hierarchical sub-reasons mandatory vs optional not specified. How are reasons versioned if profile edited after decisions recorded?
15	Ordering	"Screening must complete before annotation" — enforced by API (400 error) or advisory?
16	Export	PRISMA PDF/SVG generation technology choice not made. Incremental update strategy undefined.
16	Performance	No performance target for PRISMA generation at scale (10K, 100K studies).
All	Testing	No coverage expectations, no performance benchmarks, no E2E test strategy.
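The Phase 14 filter-rule gap (operators, combinators, dry-run) can be illustrated with a very small rule evaluator. This is a sketch of the kind of grammar the spec would need to pin down, not the intended design: the operator set (`eq`, `gt`, `contains`, `and`, `or`, `not`), the `StudyRecord` shape, and the `dryRun` summary are all hypothetical.

```typescript
// Hypothetical rule grammar; operators and field names are illustrative only.
type FieldValue = string | number | boolean | null;
type StudyRecord = Record<string, FieldValue>;

type Rule =
  | { op: "eq"; field: string; value: FieldValue }
  | { op: "gt"; field: string; value: number }
  | { op: "contains"; field: string; value: string }
  | { op: "and"; rules: Rule[] }
  | { op: "or"; rules: Rule[] }
  | { op: "not"; rule: Rule };

function matches(rule: Rule, study: StudyRecord): boolean {
  switch (rule.op) {
    case "eq":
      // A missing field is treated as null.
      return (study[rule.field] ?? null) === rule.value;
    case "gt": {
      const v = study[rule.field];
      return typeof v === "number" && v > rule.value;
    }
    case "contains": {
      const v = study[rule.field];
      return typeof v === "string" && v.toLowerCase().includes(rule.value.toLowerCase());
    }
    case "and":
      return rule.rules.every((r) => matches(r, study)); // empty "and" matches everything
    case "or":
      return rule.rules.some((r) => matches(r, study)); // empty "or" matches nothing
    case "not":
      return !matches(rule.rule, study);
  }
}

// Dry run: report what a rule would select without changing any study.
function dryRun(rule: Rule, studies: StudyRecord[]): { matched: number; total: number; sampleIds: FieldValue[] } {
  const hits = studies.filter((s) => matches(rule, s));
  return {
    matched: hits.length,
    total: studies.length,
    sampleIds: hits.slice(0, 5).map((s) => s["id"] ?? null),
  };
}
```

Even a toy version like this surfaces decisions the spec currently leaves open, such as how missing fields compare and what an empty `and` or `or` means.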

Contradictions & Ambiguities (All Releases)¶

Entity naming inconsistency: FullTextStatus vs fullTextStatus, ScreeningOutcome vs ScreeningDecision.
Citation immutability vs sourceType correction: Citations are immutable (Phase 12), but Phase 16 backfill may need to update sourceType — violating immutability.
"Active" study definition overloaded: Phase 12 says lifecycleStatus = Active is default; Phase 14 says "Only Active in pools". But PendingDuplicateReview and FullTextSought are excluded — so "Active" operationally means only the literal enum value, not a category.
"Authority" terminology overloaded: Used for (1) auto-promote vs reconciliation (Phase 9), (2) ScreeningAuthority enum (Phase 12/13), (3) reconciler role (Phase 10).
Deferred decisions with no timeline: Living search dedup (Phase 12), updated reviews (post-R3), threshold tuning (future).
DedupAuditLog mentioned in constraints but never formally defined.
Cross-reference inconsistencies between spec documents; some reference ../../../.planning/ paths that may not exist.
Release boundary unclear: StudyLifecycleStatus is Release 3 (Phase 12) but used by Release 2 reconciliation.

Gap Count Summary¶

Release	Critical	High	Medium	Total
R1 (Phases 3-7)	7	9	4	20
R2 (Phases 8-11)	8	8	0	16
R3 (Phases 12-16)	10	14	0	24
Cross-Release	0	0	8	8
Total	25	31	12	68

Part 2: Current Test Infrastructure Assessment¶

What Exists¶

Layer	Framework	Coverage	Status
.NET Unit Tests	xUnit 2.6.6 + Moq	8 test projects, 66 test files	Running in CI
Angular Unit Tests	Vitest 4.0.14	258 spec files	~30% disabled (broken standalone migration)
MongoDB Integration	TestContainers (MongoDB 7.0)	MongoDbTestFixture with CSUUID support	Available but excluded from CI
Test Builders	Fluent builders	ProjectBuilder, StudyBuilder, InvestigatorBuilder	Good foundation
Coverage Thresholds	Vitest / coverlet	Angular: 50% stmt, 40% branch; .NET: collected but not enforced	Partial
E2E Tests	Minimal	1 sidenav test	Effectively none
API Contract Tests	None	—	Missing entirely

Critical Gaps for AI-Driven Development¶

~30% of Angular tests disabled — broken standalone component migration in project-admin/, shared/annotation/, pdf-tools/, stage/ folders.
No .NET coverage thresholds — AI can silently reduce coverage.
No E2E tests — no safety net for cross-service integration.
No API contract tests — schema drift between frontend and backend goes undetected.
Integration tests excluded from CI — only run manually.

Part 3: Essential Tools & Services¶

Tier 1 — Must Have (Before Starting Phase 3)¶

MCP Servers¶

Server	Purpose	Why Essential
MongoDB MCP Server	Query collections, inspect schemas, test migrations	Direct DB interaction during development; validates CSUUID queries. Use `readonly` flag for production.
SonarQube MCP Server	Real-time quality gate feedback	Autonomous "PR-to-green" workflow — Claude checks quality, fixes issues, re-scans. 25 tools including `get_project_quality_gate_status`.
GitHub MCP Server	PR management, issue tracking, CI status	Already partially configured; essential for PR workflow automation
Context7	Up-to-date, version-specific documentation	Eliminates hallucinated APIs for Angular 21, .NET 10, MongoDB driver. Fetches real docs instead of training data.

Claude Code Configuration¶

Item	Purpose
CLAUDE.md restructure	Current file is ~600 lines. Split into root (50-100 lines) + `@import` files in `.claude/rules/`. Target <200 lines per file.
PreToolUse hooks	Block `kubectl apply`, `helm install` (GitOps enforcement). Block edits to linter/formatter configs (known AI anti-pattern: Claude modifies configs to pass instead of fixing code).
PostToolUse hooks	Auto-run `dotnet format` on .cs files and `prettier` on .ts files after every edit. Auto-run affected test file to catch regressions immediately.
Stop hook auto-reviewer	Spawn a subagent with critical reviewer persona when Claude finishes. Reviews all modified files, returns errors to force corrections before human review. See O'Reilly pattern.
Phase-specific CLAUDE.md	Per-phase context files loaded via `@import` — entity schemas, API contracts, acceptance criteria.
PreToolUse config protection	Block edits to `.eslintrc`, `.prettierrc`, `Directory.Build.props`, `vitest.config.ts`, `angular.json`; prevents AI from weakening quality gates.
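The PreToolUse rules in the table above reduce to a small decision function. The sketch below assumes the hook receives a JSON payload with `tool_name` and `tool_input` fields (with `command` for Bash and `file_path` for Edit/Write); that shape should be checked against the current Claude Code hooks documentation. The function name, the pattern lists, and the `HookDecision` type are illustrative.

```typescript
// Assumed payload shape: tool name plus tool-specific input fields.
interface ToolCall {
  tool_name: string;
  tool_input: { command?: string; file_path?: string };
}

type HookDecision = { allow: true } | { allow: false; reason: string };

// Commands that must go through GitOps rather than be run directly.
const BLOCKED_COMMANDS = [/\bkubectl\s+apply\b/, /\bhelm\s+install\b/];
// Quality-gate configuration files the agent should not edit.
const PROTECTED_FILES = [".eslintrc", ".prettierrc", "Directory.Build.props", "vitest.config.ts"];

function evaluateToolCall(call: ToolCall): HookDecision {
  if (call.tool_name === "Bash") {
    const command = call.tool_input.command ?? "";
    const hit = BLOCKED_COMMANDS.find((pattern) => pattern.test(command));
    if (hit) return { allow: false, reason: "Cluster changes must go through GitOps: " + command };
  }
  if (call.tool_name === "Edit" || call.tool_name === "Write") {
    const path = call.tool_input.file_path ?? "";
    const fileName = path.split(/[\\/]/).pop() ?? "";
    // Match the exact name or a suffixed variant such as .eslintrc.json.
    const protectedName = PROTECTED_FILES.find((name) => fileName === name || fileName.startsWith(name + "."));
    if (protectedName) return { allow: false, reason: "Quality-gate config is protected: " + fileName };
  }
  return { allow: true };
}
```

In a real hook script, the decision would be computed from the payload read on stdin and a block would be signalled through the hook's documented blocking mechanism. Keeping the logic in a pure function like this means the blocking rules themselves can be unit tested.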

Quality Gates in CI¶

Gate	Tool	Current	Needed
.NET coverage threshold	coverlet	Collected, not enforced	Enforce 70% on new code
Angular test restoration	Vitest	30% disabled	Fix or delete broken tests before Phase 3
API contract tests	Pact or similar	None	Add for all API ↔ PM service boundaries
.NET analysers	Roslyn / SonarQube	Basic	Enable nullable reference types, security analysers
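A "70% on new code" gate is just a ratio over the lines a PR touches. The sketch below shows that calculation under stated assumptions: the `FileCoverage` shape is hypothetical, `changedLines` is assumed to contain only executable lines, and how those line sets are extracted from a coverlet report and a git diff is out of scope.

```typescript
// Hypothetical per-file input; extracting these sets from real reports is not shown.
interface FileCoverage {
  file: string;
  changedLines: number[]; // executable lines added or modified in the PR
  coveredLines: number[]; // lines executed by the test run
}

// Percentage of changed lines that tests executed; 100 when nothing executable changed.
function newCodeCoverage(files: FileCoverage[]): number {
  let changed = 0;
  let covered = 0;
  for (const f of files) {
    const hit = new Set(f.coveredLines);
    changed += f.changedLines.length;
    covered += f.changedLines.filter((line) => hit.has(line)).length;
  }
  return changed === 0 ? 100 : (covered / changed) * 100;
}

// The gate passes when new-code coverage meets the threshold (70 by default, per the table above).
function passesCoverageGate(files: FileCoverage[], thresholdPercent = 70): boolean {
  return newCodeCoverage(files) >= thresholdPercent;
}
```

Measuring only changed lines keeps the gate achievable while the legacy backlog remains uncovered, which is why the table targets new code rather than the whole codebase.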

Tier 2 — Strongly Recommended¶

Tool/Service	Purpose
Playwright MCP Server	E2E browser testing for Angular UI. Claude can write and run Playwright tests against the dev server.
SonarCloud (now SonarQube Cloud)	Continuous code quality analysis on every PR. Combined with the MCP server, enables autonomous quality remediation.
OpenAPI/Swagger spec generation	Auto-generate API contracts from .NET controllers. Provides machine-readable specs Claude can implement against.
Architecture Decision Records (ADRs)	Already in use. Add ADR for each resolved specification gap to maintain decision audit trail.
Sentry MCP Server	Already configured. Use for production error context during development.
Claude Code GitHub Action	`@claude` mentions in PRs for AI-powered review. Install via `/install-github-app`.
Claude Code Security Review Action	OWASP-aligned security analysis on PR diffs with severity ratings.

Tier 3 — Nice to Have¶

Tool/Service	Purpose
Docker MCP Server	Manage dev containers, spin up local MongoDB/RabbitMQ for integration testing.
Apidog MCP Server	Connect AI to API specifications for contract-first development.
Custom prompt hooks	Semantic code review on every `Write`/`Edit` — evaluate against phase spec constraints.

Part 4: Workflow Strategy for AI-Driven Feature Development¶

1. Specification Preparation (Before Each Phase)¶

Each phase needs a single, self-contained technical specification containing:

Phase N Specification
├── Entity Schemas (with field types, indexes, constraints)
├── API Contracts (OpenAPI-style: endpoints, request/response, error codes)
├── State Transitions (formal diagrams for any state machines)
├── Acceptance Criteria (testable, with concrete scenarios)
├── Error Handling (every failure mode with recovery strategy)
├── Migration Plan (with rollback, validation queries, failure recovery)
├── Security Matrix (per-endpoint access control)
├── Performance Targets (measurable, with baseline methodology)
└── Test Strategy (unit, integration, E2E scenarios)

Why: Claude Code works best with unambiguous, consolidated specifications. Scattered requirements across multiple documents cause hallucination and inconsistency. This aligns with the emerging Spec-Driven Development (SDD) methodology: review at phase gates, not during implementation.

Phase spec prompt structure (use XML tags for Claude's contract-style processing):

<task>Implement the Publication entity and repository</task>
<constraints>
- Must use CSUUID (GuidRepresentation.CSharpLegacy)
- Collection name: pmPublication (follows pm prefix convention)
- DOI and PMID fields must have unique sparse indexes
</constraints>
<acceptance-criteria>
- All tests in ReferenceRepositoryTests pass
- dotnet build succeeds with zero warnings
- dotnet format --verify-no-changes passes
</acceptance-criteria>
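If phase prompts are produced for many components, generating them from structured data keeps the three sections consistent. The following is a minimal sketch of such a renderer; the `PhaseTask` type and `renderPhasePrompt` function are hypothetical helpers, not part of any existing tooling.

```typescript
// Hypothetical structured form of a phase task prompt.
interface PhaseTask {
  task: string;
  constraints: string[];
  acceptanceCriteria: string[];
}

// Render the task in the same tag layout as the prompt structure shown above.
function renderPhasePrompt(spec: PhaseTask): string {
  const bullets = (items: string[]): string => items.map((item) => "- " + item).join("\n");
  return [
    "<task>" + spec.task + "</task>",
    "<constraints>",
    bullets(spec.constraints),
    "</constraints>",
    "<acceptance-criteria>",
    bullets(spec.acceptanceCriteria),
    "</acceptance-criteria>",
  ].join("\n");
}
```

Because the input is plain data, the same `PhaseTask` objects could also be checked for completeness (for example, rejecting a task with no acceptance criteria) before a session starts.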

2. Implementation Workflow Per Phase¶

┌─────────────────────────────────────────────┐
│  1. RESEARCH (SubAgent, read-only)          │
│     - Read phase spec + existing code       │
│     - Identify all files to modify          │
│     - Map dependencies                      │
│  ↓                                          │
│  2. PLAN (Plan Mode)                        │
│     - Claude proposes implementation plan   │
│     - Human reviews & approves              │
│  ↓  /clear                                  │
│  3. IMPLEMENT (Fresh session per component) │
│     - Write failing tests first (TDD)       │
│     - Implement against tests               │
│     - PostToolUse hooks run linter + tests  │
│     - SonarQube MCP checks quality          │
│  ↓                                          │
│  4. VERIFY                                  │
│     - Full test suite                       │
│     - SonarQube quality gate                │
│     - Integration tests                     │
│     - Manual review of security-critical    │
│  ↓                                          │
│  5. PR + REVIEW                             │
│     - PR created with /commit-push-pr       │
│     - CI runs full pipeline                 │
│     - SonarCloud PR decoration              │
│     - Human reviews diff                    │
└─────────────────────────────────────────────┘

3. Context Management Rules¶

Rule	Rationale
One phase per session	Prevents context bleed between phases.
`/clear` between components	Each component (entity, API, UI) gets fresh context.
Delegate exploration to SubAgents	Keep main context clean for implementation.
Manual `/compact` at 50%	Avoid the "agent dumb zone" where quality degrades.
Phase spec loaded via `@import`	Always available without consuming conversation context.

4. What to Delegate vs. Supervise¶

Fully Delegate to AI	Supervise Closely
Test generation	Authentication/authorisation logic
Entity/model classes	Data migration scripts
Boilerplate CRUD endpoints	Optimistic concurrency implementation
UI components (non-security)	CSUUID/MongoDB serialisation
Helm chart updates	Feature flag toggle logic
Documentation updates	Cross-service message contracts
Formatting, linting fixes	Rollback procedures

Part 5: Key Risks & Mitigations¶

Risk 1: Specification Ambiguity → AI Hallucination¶

Risk: Claude fills gaps in specs with plausible but incorrect implementations.

Mitigation: Consolidate each phase into a single spec document with zero ambiguity. Use TDD so tests define behaviour, not prose.

Risk 2: Angular Test Debt → Silent Regressions¶

Risk: With roughly 30% of Angular tests disabled, AI changes to the project-admin/, annotation/, and stage/ components have no safety net.

Mitigation: Fix or delete broken tests before starting Phase 5 (Annotation Form V2). This is a prerequisite.

Risk 3: No API Contract Tests → Schema Drift¶

Risk: Frontend and backend evolve independently; schema mismatches are only found in production.

Mitigation: Add Pact or similar contract testing for the API ↔ PM service boundary. Generate OpenAPI specs from controllers.

Risk 4: CSUUID Complexity → Data Corruption¶

Risk: MongoDB CSUUID (BinData subtype 3) is a non-obvious serialisation format. AI-generated queries or migrations could use the wrong GUID format.

Mitigation: Document CSUUID patterns in phase-specific CLAUDE.md. Add test helpers that validate GUID format. Use TestContainers for all DB-touching code.
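The reason CSUUID is easy to get wrong is its byte order: the C# legacy representation stores the bytes produced by .NET's `Guid.ToByteArray()`, in which the first three groups of the GUID are little-endian. A minimal sketch of a conversion helper that a front-end or test utility could use to validate values is shown below; the function names are illustrative and not part of any existing helper library.

```typescript
// Convert a canonical UUID string to the 16-byte layout of .NET's Guid.ToByteArray(),
// which is what the C# legacy representation (BinData subtype 3) stores.
function uuidToCSharpLegacyBytes(uuid: string): number[] {
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) throw new Error("Not a valid UUID: " + uuid);
  const bytes: number[] = [];
  for (let i = 0; i < 32; i += 2) bytes.push(parseInt(hex.slice(i, i + 2), 16));
  // The first three groups (4, 2, and 2 bytes) are byte-reversed; the last 8 bytes keep their order.
  return [
    ...bytes.slice(0, 4).reverse(),
    ...bytes.slice(4, 6).reverse(),
    ...bytes.slice(6, 8).reverse(),
    ...bytes.slice(8),
  ];
}

// Inverse conversion: the same group-wise reversal restores the canonical order.
function csharpLegacyBytesToUuid(bytes: number[]): string {
  if (bytes.length !== 16) throw new Error("Expected 16 bytes");
  const ordered = [
    ...bytes.slice(0, 4).reverse(),
    ...bytes.slice(4, 6).reverse(),
    ...bytes.slice(6, 8).reverse(),
    ...bytes.slice(8),
  ];
  const hex = ordered.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join("");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}
```

For `00112233-4455-6677-8899-aabbccddeeff`, the stored bytes begin `33 22 11 00 55 44 77 66` and end with `88 99 aa bb cc dd ee ff` unchanged. A query that compares against the standard (subtype 4) byte order would silently match nothing.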

Risk 5: Context Window Limits → Incomplete Implementation¶

Risk: Complex phases (especially Phase 12 - Dedup) may exceed single-session capacity.

Mitigation: Break phases into sub-tasks of ≤500 lines of change each. Use worktree isolation for parallel sub-tasks. Maintain a phase-level checklist in the todo system.

Risk 6: Context Degradation → Quality Collapse¶

Risk: Output quality degrades as context fills with long debugging sessions, accumulated exploration, and conversation history. 66% of AI-generated code has subtle issues ("almost right" problem). 41% of AI-generated code is revised within 2 weeks.

Mitigation: /clear between tasks. Manual /compact at 50% (not 80%). Use subagents for exploration. Write handoff files for multi-session work. Never trust claims; require test execution as proof.

Risk 7: Config Modification Anti-Pattern¶

Risk: Claude modifies linter/formatter configs to make violations pass instead of fixing the code.

Mitigation: PreToolUse hook blocking edits to .eslintrc, .prettierrc, Directory.Build.props, vitest.config.ts, angular.json.

Risk 8: Migration Safety → Production Data Loss¶

Risk: Phase 7, 11, and 16 migrations run against the production database (syrftest). Staging shares the same DB.

Mitigation: Implement MongoDB Testing Strategy (database isolation) BEFORE Release 1 deployment. Add pre-migration snapshots. Test migrations against anonymised production copies.
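Alongside snapshots, the missing restart path for partial migration failures can be made testable by structuring migrations as checkpointed batches. The sketch below models this in memory under stated assumptions: `MigrationStore` stands in for a collection plus a persisted checkpoint, and both it and `migrateInBatches` are hypothetical. A real migration would also need each batch write and checkpoint update to be durable.

```typescript
// In-memory stand-in for a collection plus a persisted checkpoint (both hypothetical).
interface MigrationStore<T> {
  docs: T[];
  nextIndex: number; // index of the first document not yet migrated
}

// Process documents in batches, advancing the checkpoint only after a whole batch succeeds.
// If migrateOne throws, earlier batches stay committed, the failing batch is left untouched,
// and a rerun resumes at nextIndex.
function migrateInBatches<T>(store: MigrationStore<T>, migrateOne: (doc: T) => T, batchSize: number): void {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  while (store.nextIndex < store.docs.length) {
    const end = Math.min(store.nextIndex + batchSize, store.docs.length);
    // Transform first (may throw); nothing is written until the whole batch is ready.
    const migrated = store.docs.slice(store.nextIndex, end).map(migrateOne);
    store.docs.splice(store.nextIndex, migrated.length, ...migrated);
    store.nextIndex = end;
  }
}
```

With this shape, a test can inject a failure mid-run, assert that already-migrated documents are not transformed twice, and then confirm that a second run completes the remainder.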

Part 6: Recommended Action Plan¶

Immediate (Before Phase 3 Development)¶

Restructure CLAUDE.md — Split into root + @import files, target <200 lines each.
Install MCP servers — MongoDB, SonarQube, GitHub (if not already).
Configure Claude Code hooks: PostToolUse on Edit/Write runs the affected test file; PreToolUse on Bash blocks kubectl apply and helm install.
Fix Angular test debt — Restore or delete the 30% disabled tests.
Enforce .NET coverage thresholds — Minimum 70% on new code in CI.
Consolidate Phase 3-4 specs — Single document per phase with all sections above.

Short-Term (During Release 1)¶

Add API contract tests — Pact or similar for API ↔ PM service boundary.
Generate OpenAPI specs — From .NET controllers, use as implementation contracts.
Implement database isolation — syrf_staging and syrf_pr_N databases (MongoDB Testing Strategy).
Create phase-specific @import files — Entity schemas, API contracts, acceptance criteria.

Medium-Term (Before Release 2)¶

Add Playwright E2E tests — Critical user journeys (login, create project, annotate, export).
Set up SonarCloud PR decoration — Quality gate on every PR.
Consolidate Phase 8-11 specs — Resolve all gaps identified above.
Define reconciliation algorithm formally — State machine diagrams, fairness proofs.

Long-Term (Before Release 3)¶

ASySD integration contract — Formal API spec for R-based dedup engine.
PRISMA validation test suite — All 34 fields with derivation queries.
Performance benchmarking framework — For dedup at scale (100K+ studies).
Consolidate Phase 12-16 specs — Resolve all gaps identified above.

Summary¶

The phase specifications are strong on business logic and domain modelling but weak on:

Technical implementation details (API contracts, schemas, state machines)
Error handling and recovery (auto-save failures, migration rollback, dedup subprocess crashes)
Testing and verification (no E2E strategy, disabled tests, no contract tests)
Security specifications (access control matrices, blinding enforcement)

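The current test infrastructure provides a moderate safety net for backend changes (unit tests run in CI, but coverage is not enforced and integration tests are excluded from CI) and is insufficient for confident Angular development because roughly 30% of the Angular tests are disabled.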

With the recommended tooling (MCP servers, hooks, SonarQube, contract tests) and workflow (TDD, fresh sessions, specification consolidation), Claude Code can reliably implement these features — but the specifications must be completed first. AI amplifies specification quality: good specs → excellent code; ambiguous specs → plausible but wrong code.
