Automated PR Review Service
The Bet
We believe that by building an automated PR review service on top of mcp-vector-search and duetto-code-intelligence, we can:

- Replace CodeRabbit/Augment with an internal solution
- Save ~$50/seat/month across the entire engineering organization
- Match or exceed commercial tool quality with vector search + KG context
- Maintain full control over code analysis infrastructure and data
Background
Current State
Commercial Tools in Use:

- CodeRabbit or Augment for automated PR reviews
- Cost: ~$50/seat/month per developer
- Annual cost (50 engineers): ~$30,000/year
- Limited customization and integration with internal tooling
New Capabilities Available:
- mcp-vector-search now has comprehensive code review capabilities:
  - analyze review command with security, architecture, performance analysis
  - Vector search + knowledge graph for deep codebase context
  - SARIF output format for standardized findings
  - Specialized LLM prompts per review type (OWASP, CWE, SOLID)
- duetto-code-intelligence provides a foundation for GitHub integration
The Problem
Pain Points with Commercial Tools:

1. Cost: $30K+/year for features we can build internally
2. Limited Context: generic code analysis without Duetto-specific knowledge
3. No Customization: can't adapt review criteria to Duetto standards
4. Data Privacy: external services analyzing proprietary code
5. Integration Gaps: doesn't understand Duetto architecture patterns
The Opportunity
What We Can Build:

- Automated service monitoring all Duetto GitHub org PRs
- Leverages mcp-vector-search review capabilities:
  - Security review (OWASP/CWE vulnerabilities)
  - Architecture review (SOLID principles, patterns)
  - Performance review (complexity, optimizations)
- Contextual analysis using vector search + knowledge graph:
  - Understands related code across the repository
  - Recognizes Duetto-specific patterns and conventions
  - Learns from historical PR reviews and decisions
- PR comments as first deliverable (GitHub integration)
Solution Design
Architecture
GitHub Webhooks
↓
Automated PR Review Service (duetto-code-intelligence based)
↓
mcp-vector-search Code Review Engine
↓
├─ Vector Search (find relevant code context)
├─ Knowledge Graph (understand relationships)
├─ LLM Analysis (Claude 3.5 Sonnet via AWS Bedrock)
└─ SARIF Output (structured findings)
↓
GitHub API (post PR comments)
Components
1. PR Monitor Service
   - GitHub webhook listener for all Duetto org repositories
   - Filters: new PRs, PR updates, specific labels
   - Queue management for processing PRs
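A minimal sketch of the monitor's filter-and-enqueue step, assuming a plain HMAC check of GitHub's `X-Hub-Signature-256` header. The secret, queue, and payload field names for the `pull_request` event follow GitHub's webhook docs; the web framework and durable queue are left out as deployment choices:

```python
import hashlib
import hmac
import json
import queue

# Placeholder secret and in-process queue; production would use a real
# secret store and a durable queue (e.g. SQS or Pub/Sub).
WEBHOOK_SECRET = b"replace-me"
REVIEW_QUEUE: "queue.Queue[dict]" = queue.Queue()

def verify_signature(body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 HMAC over the raw request body."""
    expected = "sha256=" + hmac.new(WEBHOOK_SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header or "")

def handle_webhook(event: str, body: bytes, signature: str) -> bool:
    """Filter pull_request events and enqueue them for review."""
    if not verify_signature(body, signature):
        return False
    if event != "pull_request":
        return False
    payload = json.loads(body)
    if payload.get("action") not in {"opened", "synchronize", "reopened"}:
        return False
    pr = payload["pull_request"]
    REVIEW_QUEUE.put({
        "repo": payload["repository"]["full_name"],
        "number": pr["number"],
        "head_sha": pr["head"]["sha"],
        "base_ref": pr["base"]["ref"],
    })
    return True
```

Label-based filtering would slot in as one more check on `payload["pull_request"]["labels"]` before enqueueing.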
2. Review Orchestrator
   - Clones PR branch and target branch
   - Runs mcp-vector-search review analysis:
     - Security review (vulnerabilities, secrets)
     - Architecture review (patterns, SOLID principles)
     - Performance review (complexity, bottlenecks)
   - Aggregates findings across all review types
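The aggregation step can be sketched as a pure merge over SARIF documents (one per review type), de-duplicating findings that two passes both report. The (ruleId, file, line) key is our own convention, not something the SARIF spec mandates:

```python
def aggregate_findings(sarif_docs: list[dict]) -> list[dict]:
    """Merge SARIF results from several review types, dropping duplicates
    keyed on (ruleId, file, startLine)."""
    seen: set[tuple] = set()
    merged: list[dict] = []
    for doc in sarif_docs:
        for run in doc.get("runs", []):
            for result in run.get("results", []):
                loc = (result.get("locations") or [{}])[0].get("physicalLocation", {})
                key = (
                    result.get("ruleId"),
                    loc.get("artifactLocation", {}).get("uri"),
                    loc.get("region", {}).get("startLine"),
                )
                if key not in seen:
                    seen.add(key)
                    merged.append(result)
    return merged
```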
3. Context Engine (mcp-vector-search) - Vector search for similar code and historical fixes - Knowledge graph for understanding code relationships - Duetto-specific pattern recognition
4. Comment Formatter
   - Converts SARIF findings to GitHub PR comment format
   - Prioritizes critical/high findings
   - Provides actionable recommendations with context
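A sketch of the SARIF-to-comment conversion, prioritizing on SARIF's standard `level` values (`error`/`warning`/`note`); the ordering, labels, and comment layout are our own convention:

```python
# Severity order and labels are our convention over SARIF's standard levels.
_ORDER = {"error": 0, "warning": 1, "note": 2}
_LABEL = {"error": "High", "warning": "Medium", "note": "Low"}

def sarif_to_comment(sarif: dict, max_findings: int = 10) -> str:
    """Flatten a SARIF document into one prioritized PR comment body."""
    findings = []
    for run in sarif.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "review")
        for result in run.get("results", []):
            loc = (result.get("locations") or [{}])[0].get("physicalLocation", {})
            findings.append({
                "level": result.get("level", "warning"),
                "text": result.get("message", {}).get("text", ""),
                "file": loc.get("artifactLocation", {}).get("uri", "?"),
                "line": loc.get("region", {}).get("startLine", 0),
                "tool": tool,
            })
    findings.sort(key=lambda f: _ORDER.get(f["level"], 3))
    out = ["## Automated PR Review", ""]
    for f in findings[:max_findings]:
        out.append(f"- **{_LABEL.get(f['level'], f['level'])}** "
                   f"`{f['file']}:{f['line']}`: {f['text']} ({f['tool']})")
    if not findings:
        out.append("No findings.")
    return "\n".join(out)
```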
5. GitHub Integration
   - Posts review comments to PR
   - Inline comments for specific lines (future)
   - Review status updates (future)
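Posting a PR-level comment goes through GitHub's Issues comments endpoint (inline review comments, the future item above, use the Pulls API instead). A sketch that separates building the request from sending it, so the builder stays unit-testable offline; the repo name and token are placeholders:

```python
import json
from urllib import request

API_ROOT = "https://api.github.com"

def build_comment_request(repo: str, pr_number: int,
                          body: str, token: str) -> request.Request:
    """Build (without sending) the POST that adds a PR conversation comment."""
    data = json.dumps({"body": body}).encode()
    return request.Request(
        f"{API_ROOT}/repos/{repo}/issues/{pr_number}/comments",
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )

def post_comment(req: request.Request) -> dict:
    """Send the prepared request and return the created comment as JSON."""
    with request.urlopen(req) as resp:
        return json.load(resp)
```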
Tech Stack
- Base: duetto-code-intelligence (existing GitHub integration patterns)
- Analysis Engine: mcp-vector-search (new review capabilities)
- LLM: Claude 3.5 Sonnet via AWS Bedrock
- Deployment: TBD (AWS Lambda, Cloud Run, or Kubernetes)
- Storage: Vector DB (Chroma/Lance), Knowledge Graph (Kùzu)
Success Metrics
| Metric | Current | Target | Measurement |
|---|---|---|---|
| Cost savings | $30K/year (external tools) | $0 licensing | Budget tracking |
| Review quality | CodeRabbit baseline | Match or exceed | Developer satisfaction survey |
| Review speed | TBD | < 5 min per PR | Service metrics |
| False positive rate | TBD | < 10% | Developer feedback |
| Adoption rate | 0% | 80% of PRs | PR coverage tracking |
| Developer satisfaction | Baseline (CodeRabbit) | ≥ 4/5 rating | Quarterly survey |
Risks & Mitigation
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Review quality below commercial tools | Medium | High | Start with pilot team, iterate based on feedback |
| Performance too slow for large PRs | Medium | Medium | Optimize vector search, implement incremental analysis |
| High false positive rate | High | Medium | Fine-tune prompts, build feedback loop for improvements |
| GitHub rate limiting | Low | Medium | Implement queue management, request rate limit increase |
| Infrastructure costs exceed savings | Low | High | Monitor costs closely, optimize LLM usage and caching |
Timeline
Phase 1: POC (2 weeks)
- Basic PR webhook listener
- Single review type integration (security)
- Console output (no GitHub comments yet)
Phase 2: MVP (2 weeks)
- All review types (security, architecture, performance)
- GitHub PR comment integration
- Deploy to pilot team (5-10 developers)
Phase 3: Production (4 weeks)
- Production deployment to all repositories
- Monitoring and alerting
- Developer feedback loop
- Cost tracking
Total: 8 weeks to full production deployment
Experiments
E-2026-ENG-001: POC - Single Repository PR Review
Goal: Validate mcp-vector-search code review quality on real PRs
Approach:

- Select 1 active repository (e.g., pricing-service)
- Run mcp-vector-search reviews on the last 10 merged PRs
- Compare findings against CodeRabbit/Augment reviews
- Measure: precision, recall, developer usefulness ratings
Success Criteria:

- ≥80% precision (findings are valid issues)
- ≥70% recall (catches issues found by commercial tools)
- ≥4/5 developer usefulness rating
Time box: 1 week
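The precision/recall comparison above can be scored by keying each finding on (file, line, rule) and treating the commercial-tool findings as the reference set. In the real experiment precision would come from human validation of our findings; here the reference set stands in for ground truth, which is a simplification of this sketch:

```python
def score_reviews(ours: set, reference: set) -> dict:
    """Precision: share of our findings present in the reference set.
    Recall: share of reference findings we also caught."""
    true_positives = len(ours & reference)
    return {
        "precision": true_positives / len(ours) if ours else 0.0,
        "recall": true_positives / len(reference) if reference else 0.0,
    }
```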
E-2026-ENG-002: MVP - Automated Comment Service
Goal: Validate end-to-end automation and GitHub integration
Approach:

- Deploy webhook service for pilot repository
- Auto-post review comments on new PRs
- Collect developer feedback for 2 weeks
- Iterate on comment format and findings presentation
Success Criteria:

- Service processes 100% of PRs within 5 minutes
- ≥70% of comments marked as "helpful" by developers
- Zero false positives flagged as critical/high severity
Time box: 3 weeks (1 week build, 2 weeks pilot)
Dependencies
- mcp-vector-search review capabilities: ✅ Available (recent implementation)
- duetto-code-intelligence codebase: Access to GitHub integration patterns
- GitHub org admin access: For webhook configuration
- LLM API access (AWS Bedrock or OpenRouter): For Claude 3.5 Sonnet (cost management)
- Deployment infrastructure: AWS/GCP account for service hosting
Cost Analysis
Current Cost (Commercial Tools)
- CodeRabbit/Augment: $50/seat/month
- 50 engineers: $2,500/month = $30,000/year
Projected Cost (Internal Solution)
Development:

- 1 engineer × 8 weeks = ~$20K one-time
Operating Costs (annual):

- LLM API (OpenRouter): ~$5K/year (estimated)
  - Assume 100 PRs/week, ~100K tokens/review, $0.01/1K tokens
  - = $100/week = $5,200/year
- Infrastructure: ~$2K/year
  - Serverless functions or small Kubernetes deployment
  - Vector DB storage
- Maintenance: ~$10K/year (20% of engineer time)
Total Year 1: $37K (development + operating)
Total Year 2+: $17K/year (operating only)
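The operating figures above can be sanity-checked in a few lines. Note that $100/week at $0.01/1K tokens implies roughly 100K tokens per review, which is the assumption used here:

```python
# Assumptions restated from the cost breakdown; the token figure is
# inferred from the $100/week total (100 PRs × ~100K tokens × $0.01/1K).
PRS_PER_WEEK = 100
TOKENS_PER_REVIEW = 100_000
PRICE_PER_1K_TOKENS = 0.01

weekly_llm = PRS_PER_WEEK * TOKENS_PER_REVIEW / 1_000 * PRICE_PER_1K_TOKENS
annual_llm = weekly_llm * 52

DEV_ONE_TIME = 20_000   # 1 engineer × 8 weeks
INFRA_ANNUAL = 2_000    # serverless or small k8s + vector DB storage
MAINT_ANNUAL = 10_000   # ~20% of engineer time

year1_total = DEV_ONE_TIME + annual_llm + INFRA_ANNUAL + MAINT_ANNUAL
year2_total = annual_llm + INFRA_ANNUAL + MAINT_ANNUAL
```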
ROI:

- Year 1: $37K total ($7K more than commercial tools; recouped within the first 7 months of Year 2)
- Year 2+: $13K/year savings (43% cost reduction)
- 3-year savings: ~$19K (undiscounted)
Note: Cost savings increase with team growth (commercial tools scale linearly with headcount)
Open Questions
- Deployment Model: AWS Lambda, Google Cloud Run, or Kubernetes?
- Feedback Mechanism: How do developers mark comments as helpful/unhelpful?
- Customization: How do teams configure review criteria per repository?
- Integration: Should we integrate with Slack for review notifications?
- Inline Comments: When to implement line-specific comments vs PR-level?
- Rate Limiting: What's Duetto's GitHub API rate limit? Do we need to request increase?
Related Work
- mcp-vector-search: Code review engine foundation
  - analyze review command (security | architecture | performance)
  - Vector search + knowledge graph context
  - SARIF output format
- duetto-code-intelligence: GitHub integration patterns
- CodeRabbit/Augment: Commercial tools we're replacing
Acceptance Criteria
- [ ] POC experiment completed and validated (E-2026-ENG-001)
- [ ] MVP deployed to pilot team (E-2026-ENG-002)
- [ ] Developer satisfaction ≥ 4/5 rating
- [ ] Cost tracking shows ≥30% savings vs commercial tools
- [ ] False positive rate < 10%
- [ ] Production deployment to all repositories
- [ ] Documentation: setup guide, architecture, troubleshooting
Related Documents
- mcp-vector-search Review System - Code review capabilities
- duetto-code-intelligence - GitHub integration foundation
- Engineering Platform README