AI Development Readiness Analysis — Releases 1-3¶
Analysis of specification gaps, tooling requirements, and strategies for reliably shipping professional-quality features using Claude Code.
Part 1: Specification Gap Analysis¶
Release 1 — Annotation Versioning & Form Rebuild (Phases 3-7)¶
CRITICAL Gaps (Block Implementation)¶
| Phase | Category | Gap |
|---|---|---|
| 3-7 | Spec Organisation | No single source of truth — requirements scattered across design-decisions.md, annotation-versioning-design.md, phase READMEs, and prisma-constraint-annotations.md. Implementers must synthesise 5+ documents per phase. |
| 4 | API Contracts | Activation endpoint undefined — no request/response schemas for the DraftAQ → ActiveAQ transition. |
| 5 | API Contracts | Auto-save endpoint unspecified — no endpoint path, payload schema, debounce strategy, or conflict resolution on concurrent saves. |
| 5 | Error Handling | Auto-save failure recovery undefined — no retry strategy, no offline queue, no user notification pattern. |
| 7 vs 16 | Migration | PRISMA field backfill timing contradiction — Phase 7 says "forward-compatible fields added" but Phase 16 says "backfill lifecycle statuses". Unclear which phase actually writes the fields. |
| 7 | Migration | No failure/recovery procedures — partial migration failure (OOM, network timeout) has no documented restart or rollback path. |
| All | Rollback | No rollback procedures for any phase — feature flag toggle behaviour, data consistency guarantees, and rollback testing scenarios are all absent. |
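One way to close the Phase 5 auto-save recovery gap is to write the retry policy down as a pure function, so it can be unit-tested without timers. The sketch below is illustrative only: the names `RetryPolicy` and `retryDelays` and every numeric value are assumptions, not part of the existing specification.

```typescript
// Hypothetical retry policy for auto-save; all names and values are assumptions.
interface RetryPolicy {
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // cap applied to every delay
  maxAttempts: number; // retries before the user is notified
}

// Returns capped exponential backoff delays, one entry per retry attempt.
function retryDelays(policy: RetryPolicy): number[] {
  const result: number[] = [];
  for (let attempt = 0; attempt < policy.maxAttempts; attempt++) {
    const delay = policy.baseDelayMs * Math.pow(2, attempt);
    result.push(Math.min(delay, policy.maxDelayMs));
  }
  return result;
}

const exampleDelays = retryDelays({ baseDelayMs: 500, maxDelayMs: 5000, maxAttempts: 5 });
console.log(exampleDelays); // [500, 1000, 2000, 4000, 5000]
```

A spec that states the policy in this form also answers when the user is notified: after the last delay in the list has elapsed without a successful save.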
HIGH Gaps (Significant Risk)¶
| Phase | Category | Gap |
|---|---|---|
| 3 | Testing | No test strategy for optimistic concurrency under contention. |
| 4 | Edge Cases | DraftAQ deletion rules undefined (can a draft be deleted after partial use?). |
| 5 | Performance | <200ms load target stated but no measurement methodology, no baseline, no CI regression detection. |
| 5 | Edge Cases | Cross-stage annotation sharing with QSV transitions ambiguous. |
| 5 | Security | "Candidate blinding enforced at API level" — no security matrix specifying per-endpoint access control. |
| 6 | Edge Cases | QSV transition with breaking changes (removing required questions) has undefined sub-cases. |
| 6 | UI | Diff visualisation format for version comparison unspecified. |
| 7 | Migration | Question backfill algorithm lacks complete field mapping for all AQ types. |
| 3-4 | Security | Access control rules for Scope (System/Organisation/Researcher/Project) undefined; no RBAC matrix. |
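The Phase 3 contention test gap can be illustrated with a small in-memory model of version-checked writes. This is a sketch under stated assumptions, not the project's persistence code: `VersionedStore`, `WriteResult`, and the integer version counter are hypothetical names chosen for illustration.

```typescript
// Minimal in-memory model of optimistic concurrency (illustrative only).
interface Versioned<T> {
  version: number;
  value: T;
}

type WriteResult<T> =
  | { ok: true; doc: Versioned<T> }
  | { ok: false; currentVersion: number };

class VersionedStore<T> {
  private doc: Versioned<T>;

  constructor(initial: T) {
    this.doc = { version: 1, value: initial };
  }

  read(): Versioned<T> {
    return { version: this.doc.version, value: this.doc.value };
  }

  // Succeeds only if the caller read the latest version.
  write(expectedVersion: number, value: T): WriteResult<T> {
    if (expectedVersion !== this.doc.version) {
      return { ok: false, currentVersion: this.doc.version };
    }
    this.doc = { version: this.doc.version + 1, value: value };
    return { ok: true, doc: this.read() };
  }
}

// Contention scenario: two writers read the same version, then both write.
const versionedStore = new VersionedStore<string>("draft");
const readerA = versionedStore.read();
const readerB = versionedStore.read();
const firstWrite = versionedStore.write(readerA.version, "edit from A");
const secondWrite = versionedStore.write(readerB.version, "edit from B");
console.log(firstWrite.ok, secondWrite.ok); // true false
```

A contention test strategy would assert exactly this outcome against the real repository: one writer wins, the other receives a conflict it can retry or surface to the user.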
MEDIUM Gaps¶
- No performance contention-reduction targets for Phase 3.
- Angular 21 `@angular/forms/signals` is experimental: no contingency if the API changes.
- Version history endpoint response format unspecified (Phase 6).
- No integration test plan matrix (which existing entities must each phase modify).
Release 2 — Reconciliation (Phases 8-11)¶
CRITICAL Gaps¶
| Phase | Category | Gap |
|---|---|---|
| 8 | API Contracts | Group CRUD endpoints undefined — no OpenAPI spec for creating, assigning, or managing permission groups. |
| 9 | Trigger | Authority determination trigger undefined — what event triggers the check (import completion? annotation session completion? admin action?)? No endpoint to manually trigger or query pending determinations. |
| 9-10 | State Machine | Reconciliation state transitions not formally specified — when does a study enter the pool? What happens on partial completion? No endpoint to override auto-promotion (e.g., force single-annotator study into reconciliation). |
| 10 | Algorithm | Random assignment algorithm unspecified — no fairness guarantees, no load-balancing, no handling of reconciler unavailability or assignment timeout/expiration. |
| 10 | Edge Cases | Bulk approve threshold undefined — "all candidates agree" needs precise equality: floating-point tolerance? case sensitivity? set order for multi-select? datetime precision? |
| 10 | Concurrency | Race condition resolution undefined — two reconcilers submit simultaneously; no conflict resolution, retry, or notification strategy. Data corruption risk. |
| 11 | Migration | Permission mapping incomplete — existing admin/member roles map to new groups, but edge cases (users with both roles, orphaned memberships) not addressed. |
| 11 | Migration | Reconciliation backfill ambiguity — RSV for single-annotator studies: how is committedBy set? What if MinAnnotators changed since session completion? What if study edited after last session? |
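The bulk-approve gap for Phase 10 comes down to choosing an equality rule per answer type. The sketch below shows one possible set of choices so the open questions become concrete. The `Answer` shapes, the `EqualityRules` options, and the decision to compare dates by string prefix are all assumptions for illustration; the spec defines none of them.

```typescript
// Hypothetical answer shapes; the real annotation types are not given in the spec.
type Answer =
  | { kind: "number"; value: number }
  | { kind: "text"; value: string }
  | { kind: "multiSelect"; value: string[] }
  | { kind: "date"; value: string }; // assumed ISO 8601, same time zone on both sides

interface EqualityRules {
  numberTolerance: number;   // absolute tolerance for numeric answers
  caseSensitiveText: boolean;
  dateGranularity: "day" | "second";
}

// Sorted copy with duplicates removed, so multi-select answers compare as sets.
function uniqueSorted(items: string[]): string[] {
  return items.filter((v, i) => items.indexOf(v) === i).sort();
}

function answersAgree(a: Answer, b: Answer, rules: EqualityRules): boolean {
  if (a.kind === "number" && b.kind === "number") {
    return Math.abs(a.value - b.value) <= rules.numberTolerance;
  }
  if (a.kind === "text" && b.kind === "text") {
    const x = a.value.trim();
    const y = b.value.trim();
    return rules.caseSensitiveText ? x === y : x.toLowerCase() === y.toLowerCase();
  }
  if (a.kind === "multiSelect" && b.kind === "multiSelect") {
    const x = uniqueSorted(a.value);
    const y = uniqueSorted(b.value);
    return x.length === y.length && x.every((v, i) => v === y[i]);
  }
  if (a.kind === "date" && b.kind === "date") {
    const len = rules.dateGranularity === "day" ? 10 : 19;
    return a.value.slice(0, len) === b.value.slice(0, len);
  }
  return false; // different kinds never agree
}

// Compares every candidate with the first one (a simplification: with a
// non-zero numeric tolerance, pairwise agreement is not transitive).
function allCandidatesAgree(candidates: Answer[], rules: EqualityRules): boolean {
  const [first, ...rest] = candidates;
  if (first === undefined) return false;
  return rest.every((c) => answersAgree(first, c, rules));
}
```

Each branch corresponds to one question the spec leaves open (tolerance, case sensitivity, set order, datetime precision), so writing the rules as data makes them reviewable at the phase gate.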
HIGH Gaps¶
| Phase | Category | Gap |
|---|---|---|
| 8 | Security | Permission enforcement at API level — no middleware/attribute specification. |
| 9 | Metrics | Cohen's Kappa calculation not specified for non-binary annotation types. |
| 9 | Edge Cases | MinAnnotators changed mid-review — what happens to studies with 0 completed sessions? |
| 10 | UI | Anonymised side-by-side comparison layout unspecified. |
| 10 | Edge Cases | Reconciler assigned study but goes inactive — no timeout/reassignment specified. |
| 10 | Cross-Stage | Question in Stage A and Stage B — if Stage A not yet reconciled when Stage B starts, which answer prevails? |
| 10 | Terminology | "Consensus" vs "agreement" undefined for 3+ annotators (majority ≠ unanimity). Optional questions: if some skip, is that agreement? |
| 11 | Migration | Rollback procedures absent; partial migration recovery undefined. |
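For the Phase 9 metrics gap, note that the standard two-rater Cohen's Kappa, κ = (p_o − p_e) / (1 − p_e), already covers nominal labels with more than two categories. The sketch below implements that textbook formula. It is not a proposal for ordinal, numeric, or multi-select answers, or for three or more annotators, which would still need a specified metric. The function name and the string-label input are assumptions.

```typescript
// Cohen's kappa for two annotators over nominal (unordered) labels.
function cohensKappa(rater1: string[], rater2: string[]): number {
  if (rater1.length !== rater2.length || rater1.length === 0) {
    throw new Error("Both raters must label the same non-empty set of items");
  }
  const n = rater1.length;
  const counts1: Record<string, number> = {};
  const counts2: Record<string, number> = {};
  let agreed = 0;
  for (let i = 0; i < n; i++) {
    const label1 = rater1[i] as string;
    const label2 = rater2[i] as string;
    counts1[label1] = (counts1[label1] || 0) + 1;
    counts2[label2] = (counts2[label2] || 0) + 1;
    if (label1 === label2) agreed++;
  }
  const observed = agreed / n; // p_o: proportion of items with identical labels
  let expected = 0;            // p_e: agreement expected by chance
  for (const label of Object.keys(counts1)) {
    // Labels used by only one rater contribute zero to chance agreement.
    expected += ((counts1[label] || 0) / n) * ((counts2[label] || 0) / n);
  }
  if (expected === 1) return 1; // both raters used one identical label throughout
  return (observed - expected) / (1 - expected);
}

// Three categories, four items, one disagreement: kappa is 7/11, about 0.636.
console.log(cohensKappa(["a", "b", "c", "a"], ["a", "b", "c", "b"]));
```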
Release 3 — Screening & PRISMA (Phases 12-16)¶
CRITICAL Gaps¶
| Phase | Category | Gap |
|---|---|---|
| 12 | External Dependency | ASySD (R-based dedup engine) integration unspecified — no API contract, error handling for R subprocess failures, timeout/memory limits, or fallback. No test data sets or expected outputs defined. |
| 12 | Database | Publication collection schema incomplete — index definitions stated but field types, constraints, and CSUUID handling not fully specified. |
| 12 | Error Handling | No recovery strategy if dedup subprocess fails — orphaned PendingDuplicateReview studies could accumulate. No rollback if bulk merge is wrong (unmerge not specified). |
| 12 | Edge Cases | Dedup reversal undefined — if Duplicate → Active reversal happens, do annotation sessions reactivate? Does study re-enter screening pools? |
| 13 | State Machine | Agreement mode logic undefined — 3 modes (single, dual-manual, dual-automated) have no formal state transition diagrams. |
| 13 | Immutability | "Immutable once used" definition ambiguous — does "used" mean: one screener made one decision (strictest)? profile assigned to stage (loose)? study entered reconciliation (middle)? |
| 13-15 | API Contracts | 10+ endpoints undefined — screening profile CRUD, stage filtering, exclusion reason taxonomy, screening reconciliation, filter rule dry-run, PRISMA generation all lack request/response schemas. |
| 16 | PRISMA | Concrete validation queries missing — all 34 PRISMA fields need testable derivation queries; none are provided. |
| 16 | Migration | screeningOutcomes[] migration source unclear — Phase 13, 14, 15 introduce components in sequence. Which phase migrates existing screening decisions? Which profile do old decisions belong to? |
| 16 | Migration | Phase 16 migration scripts not written — pre/post validation queries absent. No specification for handling null sourceType in PRISMA counts. |
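A formal state machine for the Phase 13 agreement modes could be expressed as a transition table, which is reviewable and directly testable. The sketch below covers only a guessed version of the dual-manual mode. Every state and event name is a hypothetical placeholder, since the spec does not define them.

```typescript
// All state and event names below are hypothetical placeholders.
type ScreeningState =
  | "AwaitingFirstDecision"
  | "AwaitingSecondDecision"
  | "Agreed"
  | "InReconciliation"
  | "Resolved";

type ScreeningEvent =
  | "firstDecision"
  | "secondDecisionMatches"
  | "secondDecisionDiffers"
  | "reconcilerDecides";

// Transition table: any (state, event) pair not listed here is illegal.
const dualManualTransitions: Record<ScreeningState, Partial<Record<ScreeningEvent, ScreeningState>>> = {
  AwaitingFirstDecision: { firstDecision: "AwaitingSecondDecision" },
  AwaitingSecondDecision: {
    secondDecisionMatches: "Agreed",
    secondDecisionDiffers: "InReconciliation",
  },
  Agreed: {},
  InReconciliation: { reconcilerDecides: "Resolved" },
  Resolved: {},
};

function nextState(state: ScreeningState, event: ScreeningEvent): ScreeningState {
  const target = dualManualTransitions[state][event];
  if (target === undefined) {
    throw new Error("Illegal transition: " + event + " in state " + state);
  }
  return target;
}

console.log(nextState("AwaitingSecondDecision", "secondDecisionDiffers")); // InReconciliation
```

With one such table per mode (single, dual-manual, dual-automated), the acceptance criteria can enumerate every legal and illegal transition instead of describing them in prose.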
HIGH Gaps¶
| Phase | Category | Gap |
|---|---|---|
| 12 | Performance | Dedup performance at scale (100K+ studies) — no benchmarks or targets. Async vs sync unclear (must complete before 200 OK?). |
| 12 | Security | Cross-project Publication enrichment — Publications accumulate metadata from all projects. Confidential project metadata could leak to other projects. No access control specified. |
| 12 | Lifecycle | StudyLifecycleStatus vs ScreeningOutcome relationship unclear — who sets Included? Automatic when all profiles return Included? What if excluded in one profile, included in another? |
| 12 vs 10 | Contradiction | Dedup Scenario 3 conflicts with reconciliation — merged study's annotation sessions become candidates on canonical study, but reconciliation spec says random blinded assignment. Interaction undefined. |
| 13 | Immutability | Screening profile immutability enforcement mechanism unspecified. How does admin correct errors on immutable profiles? |
| 13 | Edge Cases | Profile clone semantics undefined — does clone inherit relationships? Are decisions tracked separately? |
| 14 | Filtering | Filter rule engine specification incomplete — operators, field types, combinators, AND/OR logic undefined. No dry-run endpoint to test rules. |
| 14 | Edge Cases | Filter rules referencing deleted Screening Profile — undefined behaviour. |
| 15 | Edge Cases | Screening exclusion across profiles — what if Study excluded in Profile A is included in Profile B? What is its lifecycleStatus? |
| 15 | Structure | Structured exclusion reasons — hierarchical sub-reasons mandatory vs optional not specified. How are reasons versioned if profile edited after decisions recorded? |
| 15 | Ordering | "Screening must complete before annotation" — enforced by API (400 error) or advisory? |
| 16 | Export | PRISMA PDF/SVG generation technology choice not made. Incremental update strategy undefined. |
| 16 | Performance | No performance target for PRISMA generation at scale (10K, 100K studies). |
| All | Testing | No coverage expectations, no performance benchmarks, no E2E test strategy. |
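To make the Phase 14 filter-rule gap concrete, the sketch below shows one possible rule grammar with AND/OR combinators and a dry-run helper that counts matches without changing anything. The operator set, the `all`/`any` combinator names, and the flat `StudyRecord` shape are assumptions for illustration, not the specified engine.

```typescript
// Hypothetical rule grammar: field names, operators, and combinators are assumptions.
type FieldValue = string | number;
type StudyRecord = Record<string, FieldValue>;

interface Condition {
  field: string;
  op: "eq" | "neq" | "gt" | "lt" | "contains";
  value: FieldValue;
}

type Rule = Condition | { all: Rule[] } | { any: Rule[] }; // all = AND, any = OR

function ruleMatches(rule: Rule, study: StudyRecord): boolean {
  if ("all" in rule) return rule.all.every((r) => ruleMatches(r, study));
  if ("any" in rule) return rule.any.some((r) => ruleMatches(r, study));
  const actual = study[rule.field];
  const expected = rule.value;
  const op = rule.op;
  if (actual === undefined) return false; // a missing field never matches
  if (op === "eq") return actual === expected;
  if (op === "neq") return actual !== expected;
  if (typeof actual === "number" && typeof expected === "number") {
    if (op === "gt") return actual > expected;
    if (op === "lt") return actual < expected;
  }
  if (typeof actual === "string" && typeof expected === "string" && op === "contains") {
    return actual.indexOf(expected) !== -1;
  }
  return false; // operator not applicable to these value types
}

// Dry run: report how many studies a rule would select, without side effects.
function dryRun(rule: Rule, studies: StudyRecord[]): { matched: number; total: number } {
  const matched = studies.filter((s) => ruleMatches(rule, s)).length;
  return { matched: matched, total: studies.length };
}

const exampleRule: Rule = {
  all: [
    { field: "year", op: "gt", value: 2000 },
    { any: [
      { field: "species", op: "eq", value: "mouse" },
      { field: "species", op: "eq", value: "rat" },
    ] },
  ],
};
const sampleStudies: StudyRecord[] = [
  { year: 2010, species: "mouse" },
  { year: 1995, species: "rat" },
  { year: 2015, species: "dog" },
];
console.log(dryRun(exampleRule, sampleStudies)); // { matched: 1, total: 3 }
```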
Contradictions & Ambiguities (All Releases)¶
- Entity naming inconsistency:
FullTextStatusvsfullTextStatus,ScreeningOutcomevsScreeningDecision. - Citation immutability vs sourceType correction: Citations are immutable (Phase 12), but Phase 16 backfill may need to update sourceType — violating immutability.
- "Active" study definition overloaded: Phase 12 says
lifecycleStatus = Activeis default; Phase 14 says "Only Active in pools". ButPendingDuplicateReviewandFullTextSoughtare excluded — so "Active" operationally means only the literal enum value, not a category. - "Authority" terminology overloaded: Used for (1) auto-promote vs reconciliation (Phase 9), (2) ScreeningAuthority enum (Phase 12/13), (3) reconciler role (Phase 10).
- Deferred decisions with no timeline: Living search dedup (Phase 12), updated reviews (post-R3), threshold tuning (future).
DedupAuditLogmentioned in constraints but never formally defined.- Cross-reference inconsistencies between spec documents; some reference
../../../.planning/paths that may not exist. - Release boundary unclear:
StudyLifecycleStatusis Release 3 (Phase 12) but used by Release 2 reconciliation.
Gap Count Summary¶
| Release | Critical | High | Medium | Total |
|---|---|---|---|---|
| R1 (Phases 3-7) | 7 | 9 | 4 | 20 |
| R2 (Phases 8-11) | 8 | 8 | 0 | 16 |
| R3 (Phases 12-16) | 10 | 14 | 0 | 24 |
| Cross-Release | 0 | 0 | 8 | 8 |
| Total | 25 | 31 | 12 | 68 |
Part 2: Current Test Infrastructure Assessment¶
What Exists¶
| Layer | Framework | Coverage | Status |
|---|---|---|---|
| .NET Unit Tests | xUnit 2.6.6 + Moq | 8 test projects, 66 test files | Running in CI |
| Angular Unit Tests | Vitest 4.0.14 | 258 spec files | ~30% disabled (broken standalone migration) |
| MongoDB Integration | TestContainers (MongoDB 7.0) | MongoDbTestFixture with CSUUID support | Available but excluded from CI |
| Test Builders | Fluent builders | ProjectBuilder, StudyBuilder, InvestigatorBuilder | Good foundation |
| Coverage Thresholds | Angular: 50% stmt, 40% branch | .NET: collected but not enforced | Partial |
| E2E Tests | Minimal | 1 sidenav test | Effectively none |
| API Contract Tests | None | — | Missing entirely |
Critical Gaps for AI-Driven Development¶
- ~30% of Angular tests disabled: broken standalone component migration in `project-admin/`, `shared/annotation/`, `pdf-tools/`, `stage/` folders.
- No .NET coverage thresholds: AI can silently reduce coverage.
- No E2E tests: no safety net for cross-service integration.
- No API contract tests: schema drift between frontend and backend goes undetected.
- Integration tests excluded from CI: only run manually.
Part 3: Essential Tools & Services¶
Tier 1 — Must Have (Before Starting Phase 3)¶
MCP Servers¶
| Server | Purpose | Why Essential |
|---|---|---|
| MongoDB MCP Server | Query collections, inspect schemas, test migrations | Direct DB interaction during development; validates CSUUID queries. Use readonly flag for production. |
| SonarQube MCP Server | Real-time quality gate feedback | Autonomous "PR-to-green" workflow — Claude checks quality, fixes issues, re-scans. 25 tools including get_project_quality_gate_status. |
| GitHub MCP Server | PR management, issue tracking, CI status | Already partially configured; essential for PR workflow automation |
| Context7 | Up-to-date, version-specific documentation | Eliminates hallucinated APIs for Angular 21, .NET 10, MongoDB driver. Fetches real docs instead of training data. |
Claude Code Configuration¶
| Item | Purpose |
|---|---|
| CLAUDE.md restructure | Current file is ~600 lines. Split into root (50-100 lines) + @import files in .claude/rules/. Target <200 lines per file. |
| PreToolUse hooks | Block kubectl apply, helm install (GitOps enforcement). Block edits to linter/formatter configs (known AI anti-pattern: Claude modifies configs to pass instead of fixing code). |
| PostToolUse hooks | Auto-run dotnet format on .cs files and prettier on .ts files after every edit. Auto-run affected test file to catch regressions immediately. |
| Stop hook auto-reviewer | Spawn a subagent with critical reviewer persona when Claude finishes. Reviews all modified files, returns errors to force corrections before human review. See O'Reilly pattern. |
| Phase-specific CLAUDE.md | Per-phase context files loaded via @import — entity schemas, API contracts, acceptance criteria. |
| PreToolUse config protection | Block edits to .eslintrc, .prettierrc, Directory.Build.props, vitest.config.ts — prevents AI from weakening quality gates. |
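The decision logic behind the two PreToolUse guards in the table can be sketched as a pure function. The `ToolCall` shape and the function name `blockReason` are hypothetical. How the function is wired into an actual hook (input format, exit codes) should be taken from the Claude Code Hooks Reference rather than from this sketch.

```typescript
// Hypothetical, simplified view of a tool call; not the real hook payload.
interface ToolCall {
  toolName: string;  // e.g. "Edit", "Write", "Bash"
  filePath?: string; // set for file-editing tools
  command?: string;  // set for Bash
}

// File names taken from the table above; variants such as ".eslintrc.json" also match.
const PROTECTED_FILES = [".eslintrc", ".prettierrc", "Directory.Build.props", "vitest.config.ts"];
const BLOCKED_COMMANDS = [/\bkubectl\s+apply\b/, /\bhelm\s+install\b/];

// Returns a human-readable reason when the call should be blocked, else null.
function blockReason(call: ToolCall): string | null {
  const filePath = call.filePath;
  if ((call.toolName === "Edit" || call.toolName === "Write") && filePath !== undefined) {
    const fileName = filePath.split("/").pop() || "";
    const isProtected = PROTECTED_FILES.some(
      (p) => fileName === p || fileName.indexOf(p + ".") === 0
    );
    if (isProtected) {
      return "Edits to " + fileName + " are blocked: fix the code, not the config.";
    }
  }
  const command = call.command;
  if (call.toolName === "Bash" && command !== undefined) {
    if (BLOCKED_COMMANDS.some((pattern) => pattern.test(command))) {
      return "Direct cluster changes are blocked: deploy through GitOps.";
    }
  }
  return null;
}

console.log(blockReason({ toolName: "Edit", filePath: "web/.eslintrc.json" }) !== null); // true
console.log(blockReason({ toolName: "Bash", command: "kubectl get pods" }));             // null
```

Keeping the decision separate from the hook wiring means the block list itself can be covered by ordinary unit tests.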
Quality Gates in CI¶
| Gate | Tool | Current | Needed |
|---|---|---|---|
| .NET coverage threshold | coverlet | Collected, not enforced | Enforce 70% on new code |
| Angular test restoration | Vitest | 30% disabled | Fix or delete broken tests before Phase 3 |
| API contract tests | Pact or similar | None | Add for all API ↔ PM service boundaries |
| .NET analysers | Roslyn / SonarQube | Basic | Enable nullable reference types, security analysers |
Tier 2 — Strongly Recommended¶
| Tool/Service | Purpose |
|---|---|
| Playwright MCP Server | E2E browser testing for Angular UI. Claude can write and run Playwright tests against the dev server. |
| SonarCloud (or SonarQube Cloud) | Continuous code quality analysis on every PR. Combined with MCP server, enables autonomous quality remediation. |
| OpenAPI/Swagger spec generation | Auto-generate API contracts from .NET controllers. Provides machine-readable specs Claude can implement against. |
| Architecture Decision Records (ADRs) | Already in use. Add ADR for each resolved specification gap to maintain decision audit trail. |
| Sentry MCP Server | Already configured. Use for production error context during development. |
| Claude Code GitHub Action | @claude mentions in PRs for AI-powered review. Install via /install-github-app. |
| Claude Code Security Review Action | OWASP-aligned security analysis on PR diffs with severity ratings. |
Tier 3 — Nice to Have¶
| Tool/Service | Purpose |
|---|---|
| Docker MCP Server | Manage dev containers, spin up local MongoDB/RabbitMQ for integration testing. |
| Apidog MCP Server | Connect AI to API specifications for contract-first development. |
| Custom prompt hooks | Semantic code review on every Write/Edit — evaluate against phase spec constraints. |
Part 4: Workflow Strategy for AI-Driven Feature Development¶
1. Specification Preparation (Before Each Phase)¶
Each phase needs a single, self-contained technical specification containing:
```
Phase N Specification
├── Entity Schemas (with field types, indexes, constraints)
├── API Contracts (OpenAPI-style: endpoints, request/response, error codes)
├── State Transitions (formal diagrams for any state machines)
├── Acceptance Criteria (testable, with concrete scenarios)
├── Error Handling (every failure mode with recovery strategy)
├── Migration Plan (with rollback, validation queries, failure recovery)
├── Security Matrix (per-endpoint access control)
├── Performance Targets (measurable, with baseline methodology)
└── Test Strategy (unit, integration, E2E scenarios)
```
Why: Claude Code works best with unambiguous, consolidated specifications. Scattered requirements across multiple documents cause hallucination and inconsistency. This aligns with the emerging Spec-Driven Development (SDD) methodology: review at phase gates, not during implementation.
Phase spec prompt structure (use XML tags for Claude's contract-style processing):
```xml
<task>Implement the Publication entity and repository</task>
<constraints>
- Must use CSUUID (GuidRepresentation.CSharpLegacy)
- Collection name: pmPublication (follows pm prefix convention)
- DOI and PMID fields must have unique sparse indexes
</constraints>
<acceptance-criteria>
- All tests in ReferenceRepositoryTests pass
- dotnet build succeeds with zero warnings
- dotnet format --verify-no-changes passes
</acceptance-criteria>
```
2. Implementation Workflow Per Phase¶
```
┌─────────────────────────────────────────────┐
│ 1. RESEARCH (SubAgent, read-only)           │
│    - Read phase spec + existing code        │
│    - Identify all files to modify           │
│    - Map dependencies                       │
│            ↓                                │
│ 2. PLAN (Plan Mode)                         │
│    - Claude proposes implementation plan    │
│    - Human reviews & approves               │
│            ↓ /clear                         │
│ 3. IMPLEMENT (Fresh session per component)  │
│    - Write failing tests first (TDD)        │
│    - Implement against tests                │
│    - PostToolUse hooks run linter + tests   │
│    - SonarQube MCP checks quality           │
│            ↓                                │
│ 4. VERIFY                                   │
│    - Full test suite                        │
│    - SonarQube quality gate                 │
│    - Integration tests                      │
│    - Manual review of security-critical     │
│            ↓                                │
│ 5. PR + REVIEW                              │
│    - PR created with /commit-push-pr        │
│    - CI runs full pipeline                  │
│    - SonarCloud PR decoration               │
│    - Human reviews diff                     │
└─────────────────────────────────────────────┘
```
3. Context Management Rules¶
| Rule | Rationale |
|---|---|
| One phase per session | Prevents context bleed between phases. |
| `/clear` between components | Each component (entity, API, UI) gets fresh context. |
| Delegate exploration to SubAgents | Keep main context clean for implementation. |
| Manual `/compact` at 50% | Avoid the "agent dumb zone" where quality degrades. |
| Phase spec loaded via `@import` | Always available without consuming conversation context. |
4. What to Delegate vs. Supervise¶
| Fully Delegate to AI | Supervise Closely |
|---|---|
| Test generation | Authentication/authorisation logic |
| Entity/model classes | Data migration scripts |
| Boilerplate CRUD endpoints | Optimistic concurrency implementation |
| UI components (non-security) | CSUUID/MongoDB serialisation |
| Helm chart updates | Feature flag toggle logic |
| Documentation updates | Cross-service message contracts |
| Formatting, linting fixes | Rollback procedures |
Part 5: Key Risks & Mitigations¶
Risk 1: Specification Ambiguity → AI Hallucination¶
Risk: Claude fills gaps in specs with plausible but incorrect implementations.
Mitigation: Consolidate each phase into a single spec document with zero ambiguity. Use TDD so tests define behaviour, not prose.
Risk 2: Angular Test Debt → Silent Regressions¶
Risk: 30% of Angular tests disabled means AI changes to project-admin/, annotation/, stage/ components have no safety net.
Mitigation: Fix or delete broken tests before starting Phase 5 (Annotation Form V2). This is a prerequisite.
Risk 3: No API Contract Tests → Schema Drift¶
Risk: Frontend and backend evolve independently; schema mismatches only found in production.
Mitigation: Add Pact or similar contract testing for API ↔ PM service boundary. Generate OpenAPI specs from controllers.
Risk 4: CSUUID Complexity → Data Corruption¶
Risk: MongoDB CSUUID (BinData subtype 3) is a non-obvious serialisation format. AI-generated queries or migrations could use wrong GUID format.
Mitigation: Document CSUUID patterns in phase-specific CLAUDE.md. Add test helpers that validate GUID format. Use TestContainers for all DB-touching code.
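The reason CSUUID is easy to get wrong is byte order: the C# legacy representation stores the bytes produced by .NET's `Guid.ToByteArray()`, in which the first three groups of the GUID are little-endian. The sketch below shows the reordering for one GUID. The helper names are hypothetical, and it is intended as the kind of test helper the mitigation describes, not as a replacement for the driver's serialiser.

```typescript
// Converts a canonical GUID string to the byte order of .NET's Guid.ToByteArray(),
// which is what the C# legacy representation (BinData subtype 3) stores.
function guidToCSharpLegacyBytes(guid: string): number[] {
  const hex = guid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error("Not a GUID: " + guid);
  }
  const bytes: number[] = [];
  for (let i = 0; i < 32; i += 2) {
    bytes.push(parseInt(hex.slice(i, i + 2), 16));
  }
  // The first three groups (4, 2, 2 bytes) are little-endian;
  // the final 8 bytes keep their textual order.
  return bytes.slice(0, 4).reverse()
    .concat(bytes.slice(4, 6).reverse())
    .concat(bytes.slice(6, 8).reverse())
    .concat(bytes.slice(8));
}

// Hex helper used only to make the result easy to read.
function bytesToHex(bytes: number[]): string {
  return bytes.map((b) => (b < 16 ? "0" : "") + b.toString(16)).join("");
}

console.log(bytesToHex(guidToCSharpLegacyBytes("00112233-4455-6677-8899-aabbccddeeff")));
// 33221100554477668899aabbccddeeff
```

A query or migration that writes the GUID bytes in textual order instead would produce a different binary value and silently fail to match existing documents.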
Risk 5: Context Window Limits → Incomplete Implementation¶
Risk: Complex phases (especially Phase 12 - Dedup) may exceed single-session capacity.
Mitigation: Break phases into sub-tasks of ≤500 lines of change each. Use worktree isolation for parallel sub-tasks. Maintain a phase-level checklist in the todo system.
Risk 6: Context Degradation → Quality Collapse¶
Risk: Performance degrades as context fills. Long debugging sessions, accumulated exploration, and conversation history degrade output quality. 66% of AI-generated code has subtle issues ("almost right" problem). 41% of AI-generated code is revised within 2 weeks.
Mitigation: /clear between tasks. Manual /compact at 50% (not 80%). Use subagents for exploration. Write handoff files for multi-session work. Never trust claims — require test execution as proof.
Risk 7: Config Modification Anti-Pattern¶
Risk: Claude modifies linter/formatter configs to make violations pass instead of fixing the code.
Mitigation: PreToolUse hook blocking edits to .eslintrc, .prettierrc, Directory.Build.props, vitest.config.ts, angular.json.
Risk 8: Migration Safety → Production Data Loss¶
Risk: Phase 7, 11, 16 migrations run against production database (syrftest). Staging shares the same DB.
Mitigation: Implement MongoDB Testing Strategy (database isolation) BEFORE Release 1 deployment. Add pre-migration snapshots. Test migrations against anonymised production copies.
Part 6: Recommended Action Plan¶
Immediate (Before Phase 3 Development)¶
- Restructure CLAUDE.md: split into root + `@import` files, target <200 lines each.
- Install MCP servers: MongoDB, SonarQube, GitHub (if not already).
- Configure Claude Code hooks:
  - `PostToolUse` on `Edit`/`Write`: run affected test file.
  - `PreToolUse` on `Bash`: block `kubectl apply`, `helm install`.
- Fix Angular test debt: restore or delete the 30% disabled tests.
- Enforce .NET coverage thresholds: minimum 70% on new code in CI.
- Consolidate Phase 3-4 specs: single document per phase with all sections above.
Short-Term (During Release 1)¶
- Add API contract tests: Pact or similar for API ↔ PM service boundary.
- Generate OpenAPI specs: from .NET controllers, use as implementation contracts.
- Implement database isolation: `syrf_staging` and `syrf_pr_N` databases (MongoDB Testing Strategy).
- Create phase-specific `@import` files: entity schemas, API contracts, acceptance criteria.
Medium-Term (Before Release 2)¶
- Add Playwright E2E tests — Critical user journeys (login, create project, annotate, export).
- Set up SonarCloud PR decoration — Quality gate on every PR.
- Consolidate Phase 8-11 specs — Resolve all gaps identified above.
- Define reconciliation algorithm formally — State machine diagrams, fairness proofs.
Long-Term (Before Release 3)¶
- ASySD integration contract — Formal API spec for R-based dedup engine.
- PRISMA validation test suite — All 34 fields with derivation queries.
- Performance benchmarking framework — For dedup at scale (100K+ studies).
- Consolidate Phase 12-16 specs — Resolve all gaps identified above.
Summary¶
The phase specifications are strong on business logic and domain modelling but weak on:
- Technical implementation details (API contracts, schemas, state machines)
- Error handling and recovery (auto-save failures, migration rollback, dedup subprocess crashes)
- Testing and verification (no E2E strategy, disabled tests, no contract tests)
- Security specifications (access control matrices, blinding enforcement)
The current test infrastructure provides a medium-to-high level of safety for backend changes, since the .NET unit tests run in CI, but it is insufficient for confident Angular development because roughly 30% of the Angular tests are disabled.
With the recommended tooling (MCP servers, hooks, SonarQube, contract tests) and workflow (TDD, fresh sessions, specification consolidation), Claude Code can reliably implement these features — but the specifications must be completed first. AI amplifies specification quality: good specs → excellent code; ambiguous specs → plausible but wrong code.
Sources¶
- Claude Code Hooks Reference
- Claude Code Best Practices
- Using CLAUDE.md Files
- SonarQube MCP Server
- PR-to-Green with Claude + SonarQube MCP
- MongoDB MCP Server
- Context7 MCP Server
- Claude Code GitHub Action
- Claude Code Security Review Action
- Auto-Reviewing Claude's Code (O'Reilly)
- Spec-Driven Development
- Claude Code Hooks Production Quality Patterns
- How I Use Every Claude Code Feature
- Parallel AI Coding with Git Worktrees
- Trail of Bits Claude Code Config