
Automated PR Review Service

Robert Matsuoka · Updated 2026-03-11

Tags: engineering-platform, code-intelligence, developer-experience, cost-optimization, automation, q1-2026

Initiative: Automated PR Review Service

The Bet

We believe that by building an automated PR review service on top of mcp-vector-search and duetto-code-intelligence, we can:

  • Replace CodeRabbit/Augment with an internal solution
  • Save ~$50/seat/month across the entire engineering organization
  • Match or exceed commercial tool quality with vector search + knowledge graph context
  • Maintain full control over code analysis infrastructure and data

Background

Current State

Commercial Tools in Use:

  • CodeRabbit or Augment for automated PR reviews
  • Cost: ~$50/seat/month per developer
  • Annual cost (50 engineers): ~$30,000/year
  • Limited customization and integration with internal tooling

New Capabilities Available:

  • mcp-vector-search now has comprehensive code review capabilities:
      • analyze review command with security, architecture, and performance analysis
      • Vector search + knowledge graph for deep codebase context
      • SARIF output format for standardized findings
      • Specialized LLM prompts per review type (OWASP, CWE, SOLID)
  • duetto-code-intelligence provides the foundation for GitHub integration

The Problem

Pain Points with Commercial Tools:

  1. Cost: $30K+/year for features we can build internally
  2. Limited Context: Generic code analysis without Duetto-specific knowledge
  3. No Customization: Can't adapt review criteria to Duetto standards
  4. Data Privacy: External services analyzing proprietary code
  5. Integration Gaps: Doesn't understand Duetto architecture patterns

The Opportunity

What We Can Build:

  • Automated service monitoring all Duetto GitHub org PRs
  • Leverages mcp-vector-search review capabilities:
      • Security review (OWASP/CWE vulnerabilities)
      • Architecture review (SOLID principles, patterns)
      • Performance review (complexity, optimizations)
  • Contextual analysis using vector search + knowledge graph:
      • Understands related code across the repository
      • Recognizes Duetto-specific patterns and conventions
      • Learns from historical PR reviews and decisions
  • PR comments as the first deliverable (GitHub integration)

Solution Design

Architecture

GitHub Webhooks
    ↓
Automated PR Review Service (duetto-code-intelligence based)
    ↓
mcp-vector-search Code Review Engine
    ↓
    ├─ Vector Search (find relevant code context)
    ├─ Knowledge Graph (understand relationships)
    ├─ LLM Analysis (Claude 3.5 Sonnet via AWS Bedrock)
    └─ SARIF Output (structured findings)
    ↓
GitHub API (post PR comments)

Components

1. PR Monitor Service

  • GitHub webhook listener for all Duetto org repositories
  • Filters: new PRs, PR updates, specific labels
  • Queue management for processing PRs
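The listener's entry-point logic can be sketched as follows. The function names and the enqueue step are illustrative assumptions, but the X-Hub-Signature-256 HMAC verification and the pull_request event actions are standard GitHub webhook behavior:

```python
import hmac
import hashlib
import json

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's HMAC-SHA256 webhook signature (constant-time compare)."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

def should_enqueue(event: str, payload: dict) -> bool:
    """Filter deliveries: only newly opened or updated pull requests."""
    if event != "pull_request":
        return False
    return payload.get("action") in {"opened", "synchronize", "reopened"}

# Example delivery (secret and payload are made up for illustration)
secret = b"webhook-secret"
body = json.dumps({"action": "opened", "number": 42}).encode()
sig = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()

if verify_signature(secret, body, sig):
    payload = json.loads(body)
    if should_enqueue("pull_request", payload):
        print(f"enqueue PR #{payload['number']}")  # hand off to the review queue
```

Rejecting unsigned or mis-signed deliveries before parsing keeps the service from processing spoofed webhook calls.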

2. Review Orchestrator

  • Clones PR branch and target branch
  • Runs mcp-vector-search review analysis:
      • Security review (vulnerabilities, secrets)
      • Architecture review (patterns, SOLID principles)
      • Performance review (complexity, bottlenecks)
  • Aggregates findings across all review types
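The aggregation step might look like this sketch. The result shapes are simplified assumptions borrowing SARIF field names (ruleId, level), not the actual mcp-vector-search output; de-duplication and severity ordering are the point:

```python
# Merge the per-review-type result lists, drop duplicate findings, and
# order by severity so the most important issues surface first.
SEVERITY_ORDER = {"error": 0, "warning": 1, "note": 2}

def aggregate(runs: dict[str, list[dict]]) -> list[dict]:
    seen = set()
    merged = []
    for review_type, results in runs.items():
        for r in results:
            key = (r["ruleId"], r["file"], r["line"])
            if key in seen:
                continue  # same finding reported by two review types
            seen.add(key)
            merged.append({**r, "review_type": review_type})
    merged.sort(key=lambda r: (SEVERITY_ORDER.get(r["level"], 3), r["file"], r["line"]))
    return merged

findings = aggregate({
    "security": [{"ruleId": "CWE-798", "file": "auth.py", "line": 10, "level": "error"}],
    "architecture": [{"ruleId": "SOLID-SRP", "file": "svc.py", "line": 5, "level": "warning"},
                     {"ruleId": "CWE-798", "file": "auth.py", "line": 10, "level": "error"}],
})
```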

3. Context Engine (mcp-vector-search)

  • Vector search for similar code and historical fixes
  • Knowledge graph for understanding code relationships
  • Duetto-specific pattern recognition

4. Comment Formatter

  • Converts SARIF findings to GitHub PR comment format
  • Prioritizes critical/high findings
  • Provides actionable recommendations with context
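A minimal formatter sketch, assuming SARIF 2.1.0-shaped input (runs → results → ruleId/level/message/locations); the Markdown layout and severity ordering are illustrative choices, not a fixed format:

```python
def sarif_to_comment(sarif: dict) -> str:
    """Flatten a SARIF log into one Markdown PR comment, errors first."""
    order = {"error": 0, "warning": 1, "note": 2}
    results = []
    for run in sarif.get("runs", []):
        results.extend(run.get("results", []))
    results.sort(key=lambda r: order.get(r.get("level", "note"), 3))
    lines = ["## Automated PR Review"]
    for r in results:
        loc = r["locations"][0]["physicalLocation"]
        where = f'{loc["artifactLocation"]["uri"]}:{loc["region"]["startLine"]}'
        level = r.get("level", "note").upper()
        lines.append(f'- **[{level}]** {r["ruleId"]} at `{where}`: {r["message"]["text"]}')
    return "\n".join(lines)

# Example: one security finding (illustrative values)
sarif = {"runs": [{"results": [
    {"ruleId": "CWE-89", "level": "error",
     "message": {"text": "Possible SQL injection"},
     "locations": [{"physicalLocation": {
         "artifactLocation": {"uri": "db/query.py"},
         "region": {"startLine": 42}}}]},
]}]}
comment = sarif_to_comment(sarif)
```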

5. GitHub Integration

  • Posts review comments to PR
  • Inline comments for specific lines (future)
  • Review status updates (future)
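PR-level comments go through GitHub's Issues API (POST /repos/{owner}/{repo}/issues/{number}/comments); inline review comments use a separate Pulls endpoint and are deferred. A minimal sketch, with the helper name, repository, and token as placeholders:

```python
import json
import urllib.request

def build_comment_request(owner: str, repo: str, pr_number: int,
                          body: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that adds a PR-level comment."""
    url = f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments"
    data = json.dumps({"body": body}).encode()
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )

# Illustrative values; sending is a one-liner once credentials exist
req = build_comment_request("duetto", "pricing-service", 42,
                            "## Automated PR Review\n...", "ghp_placeholder")
# urllib.request.urlopen(req)  # actually send (omitted here)
```

Separating request construction from sending keeps the integration testable without network access.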

Tech Stack

  • Base: duetto-code-intelligence (existing GitHub integration patterns)
  • Analysis Engine: mcp-vector-search (new review capabilities)
  • LLM: Claude 3.5 Sonnet via AWS Bedrock
  • Deployment: TBD (AWS Lambda, Cloud Run, or Kubernetes)
  • Storage: Vector DB (Chroma/Lance), Knowledge Graph (Kùzu)

Success Metrics

| Metric | Current | Target | Measurement |
|---|---|---|---|
| Cost savings | $30K/year (external tools) | $0 licensing | Budget tracking |
| Review quality | CodeRabbit baseline | Match or exceed | Developer satisfaction survey |
| Review speed | TBD | < 5 min per PR | Service metrics |
| False positive rate | TBD | < 10% | Developer feedback |
| Adoption rate | 0% | 80% of PRs | PR coverage tracking |
| Developer satisfaction | Baseline (CodeRabbit) | ≥ 4/5 rating | Quarterly survey |

Risks & Mitigation

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Review quality below commercial tools | Medium | High | Start with pilot team, iterate based on feedback |
| Performance too slow for large PRs | Medium | Medium | Optimize vector search, implement incremental analysis |
| High false positive rate | High | Medium | Fine-tune prompts, build feedback loop for improvements |
| GitHub rate limiting | Low | Medium | Implement queue management, request rate limit increase |
| Infrastructure costs exceed savings | Low | High | Monitor costs closely, optimize LLM usage and caching |

Timeline

Phase 1: POC (2 weeks)

  • Basic PR webhook listener
  • Single review type integration (security)
  • Console output (no GitHub comments yet)

Phase 2: MVP (2 weeks)

  • All review types (security, architecture, performance)
  • GitHub PR comment integration
  • Deploy to pilot team (5-10 developers)

Phase 3: Production (4 weeks)

  • Production deployment to all repositories
  • Monitoring and alerting
  • Developer feedback loop
  • Cost tracking

Total: 8 weeks to full production deployment

Experiments

E-2026-ENG-001: POC - Single Repository PR Review

Goal: Validate mcp-vector-search code review quality on real PRs

Approach:

  • Select 1 active repository (e.g., pricing-service)
  • Run mcp-vector-search reviews on the last 10 merged PRs
  • Compare findings against CodeRabbit/Augment reviews
  • Measure: precision, recall, developer usefulness ratings
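The precision/recall measurement could be computed as in this sketch, where ground truth is the set of findings confirmed valid (by developers or by the commercial tools) and the finding keys are illustrative:

```python
def precision_recall(our_findings: set, ground_truth: set) -> tuple[float, float]:
    """Precision: fraction of our findings that are valid.
    Recall: fraction of valid issues that we caught."""
    true_positives = len(our_findings & ground_truth)
    precision = true_positives / len(our_findings) if our_findings else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Illustrative findings keyed by (rule, file, line)
ours = {("CWE-89", "db.py", 12), ("CWE-798", "auth.py", 3), ("SOLID-SRP", "svc.py", 40)}
truth = {("CWE-89", "db.py", 12), ("CWE-798", "auth.py", 3), ("CWE-22", "io.py", 7)}
p, r = precision_recall(ours, truth)  # 2 of our 3 findings valid; 2 of 3 issues caught
```

Both values here are 2/3, below the ≥80%/≥70% bars, which is the kind of per-PR signal the experiment aggregates.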

Success Criteria:

  • ≥80% precision (findings are valid issues)
  • ≥70% recall (catches issues found by commercial tools)
  • ≥4/5 developer usefulness rating

Time box: 1 week

E-2026-ENG-002: MVP - Automated Comment Service

Goal: Validate end-to-end automation and GitHub integration

Approach:

  • Deploy webhook service for pilot repository
  • Auto-post review comments on new PRs
  • Collect developer feedback for 2 weeks
  • Iterate on comment format and findings presentation

Success Criteria:

  • Service processes 100% of PRs within 5 minutes
  • ≥70% of comments marked as "helpful" by developers
  • Zero false positives flagged as critical/high severity

Time box: 3 weeks (1 week build, 2 weeks pilot)

Dependencies

  • mcp-vector-search review capabilities: ✅ Available (recent implementation)
  • duetto-code-intelligence codebase: Access to GitHub integration patterns
  • GitHub org admin access: For webhook configuration
  • OpenRouter API key: For Claude Opus access (cost management)
  • Deployment infrastructure: AWS/GCP account for service hosting

Cost Analysis

Current Cost (Commercial Tools)

  • CodeRabbit/Augment: $50/seat/month
  • 50 engineers: $2,500/month = $30,000/year

Projected Cost (Internal Solution)

Development:

  • 1 engineer × 8 weeks = ~$20K one-time

Operating Costs (annual):

  • LLM API (OpenRouter): ~$5K/year (estimated)
      • Assumes 100 PRs/week, ~100K tokens/review (all review types combined), $0.01/1K tokens
      • ≈ $100/week ≈ $5,200/year
  • Infrastructure: ~$2K/year
      • Serverless functions or small Kubernetes deployment
      • Vector DB storage
  • Maintenance: ~$10K/year (20% of engineer time)

Total Year 1: $37K (development + operating)
Total Year 2+: $17K/year (operating only)

ROI:

  • Year 1: $37K total cost, a $7K net investment over commercial tools; recouped in the first ~7 months of Year 2
  • Year 2+: $13K/year savings (43% cost reduction)
  • 3-year cumulative savings: ~$19K (-$7K + $13K + $13K, undiscounted)
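The roll-up can be sanity-checked from the line items above (all inputs are the document's own estimates):

```python
# Annual operating line items
llm_annual = 5_200           # LLM API, ~$100/week
infra_annual = 2_000         # serverless/K8s + vector DB storage
maintenance_annual = 10_000  # ~20% of one engineer's time
development_once = 20_000    # 1 engineer x 8 weeks

operating_annual = llm_annual + infra_annual + maintenance_annual  # ~$17K
year1_total = development_once + operating_annual                  # ~$37K

commercial_annual = 50 * 50 * 12  # $50/seat/month x 50 engineers = $30K

year1_delta = year1_total - commercial_annual                # extra spend in Year 1
steady_state_savings = commercial_annual - operating_annual  # ~$13K/year from Year 2
payback_months = year1_delta / (steady_state_savings / 12)   # months into Year 2
```

With these inputs, the Year 1 delta is about $7K and payback lands roughly 7 months into Year 2, matching the ROI bullets.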

Note: Cost savings increase with team growth (commercial tools scale linearly with headcount)

Open Questions

  1. Deployment Model: AWS Lambda, Google Cloud Run, or Kubernetes?
  2. Feedback Mechanism: How do developers mark comments as helpful/unhelpful?
  3. Customization: How do teams configure review criteria per repository?
  4. Integration: Should we integrate with Slack for review notifications?
  5. Inline Comments: When to implement line-specific comments vs PR-level?
  6. Rate Limiting: What's Duetto's GitHub API rate limit? Do we need to request increase?

Related Tools

  • mcp-vector-search: Code review engine foundation
      • /analyze review security|architecture|performance
      • Vector search + knowledge graph context
      • SARIF output format
  • duetto-code-intelligence: GitHub integration patterns
  • CodeRabbit/Augment: Commercial tools we're replacing

Acceptance Criteria

  • [ ] POC experiment completed and validated (E-2026-ENG-001)
  • [ ] MVP deployed to pilot team (E-2026-ENG-002)
  • [ ] Developer satisfaction ≥ 4/5 rating
  • [ ] Cost tracking shows ≥30% savings vs commercial tools
  • [ ] False positive rate < 10%
  • [ ] Production deployment to all repositories
  • [ ] Documentation: setup guide, architecture, troubleshooting