Quality Engineering Team Charter
Mission
Build and maintain shared test infrastructure, CI/CD quality gates, and developer tooling that enable developer-owned testing at scale — so that every engineering team at Duetto has a paved road to high-quality software without building their own testing infrastructure.
Team Metrics
This team uses a dual primary metric model that evolves with maturity. Quality Gate Adoption Rate measures whether the infrastructure is in place; Defect Detection Rate measures whether it actually works. Both are needed — adoption without effectiveness is theatre, effectiveness without adoption is irrelevant.
Primary Metric A — Quality Gate Adoption Rate (Leading)
Measures the reach of the team's infrastructure. This is the dominant metric during Phase 1-2 (months 1-6) when the immediate problem is "we have no gates." It remains important but stabilises as adoption matures.
| Attribute | Value |
|---|---|
| Metric | Quality Gate Adoption Rate |
| Definition | Percentage of active repositories meeting Phase 2+ quality gates (integration tests, contract verification, security scanning, coverage reporting) |
| Baseline | 0% (no quality gates currently enforced beyond basic linting and unit tests) |
| Target | 80% of active repos at Phase 2+ within 6 months; 95%+ within 12 months |
| Measurement | GitHub Actions workflow audit — automated weekly scan of repo CI configurations against gate tier definitions |
| Cadence | Weekly automated report, monthly team review |
Why this metric matters: Today, no repository enforces quality gates beyond basic linting. The team's first job is to build the infrastructure and make it easy to adopt. Until adoption is above 70%, no outcome metric is meaningful because the sample size is too small.
When this metric becomes secondary: Once adoption stabilises above 80%, the focus shifts to Defect Detection Rate. Adoption remains tracked but is no longer the primary driver of team priorities — the question shifts from "are gates in place?" to "are gates catching real problems?"
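The weekly adoption scan can be sketched roughly as follows. This is an illustrative sketch only: the gate names and the Phase 2 tier definition are hypothetical stand-ins, and the real audit would parse each repo's GitHub Actions configuration rather than take a pre-built set of gate names.

```python
# Illustrative sketch of the weekly adoption scan. Gate names and the
# Phase 2 tier definition below are hypothetical; the real scan would
# read each repo's CI workflow configuration.

PHASE_2_GATES = {"integration-tests", "contract-verification",
                 "security-scanning", "coverage-reporting"}

def meets_phase_2(repo_gates: set[str]) -> bool:
    """A repo meets Phase 2+ when every Phase 2 gate is enabled."""
    return PHASE_2_GATES <= repo_gates

def adoption_rate(repos: dict[str, set[str]]) -> float:
    """Percentage of active repos at Phase 2+ quality gates."""
    if not repos:
        return 0.0
    adopted = sum(meets_phase_2(gates) for gates in repos.values())
    return 100.0 * adopted / len(repos)

# Invented example data: one repo at Phase 2+, one at Phase 1 only.
repos = {
    "pricing-service": PHASE_2_GATES | {"lint", "unit-tests"},
    "booking-api": {"lint", "unit-tests"},
}
print(f"{adoption_rate(repos):.0f}%")  # 50%
```

The subset check (`<=`) means a repo with extra gates still counts, so tightening a tier definition only ever lowers the reported rate, never inflates it.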
Primary Metric B — Defect Detection Rate (Lagging)
Measures the effectiveness of the team's infrastructure. This metric becomes the dominant primary metric from Phase 3 onward (months 6+) as adoption matures and the question shifts from "did teams adopt gates?" to "do the gates actually catch bugs?"
| Attribute | Value |
|---|---|
| Metric | Defect Detection Rate |
| Definition | Percentage of detectable defects caught by CI quality gates before reaching production, out of all defects discovered (CI-caught + production-escaped, excluding defects no reasonable CI gate could catch) |
| Baseline | TBD (establish once gate adoption >50% and incident RCA tagging is in place) |
| Target | >70% of detectable defects caught in CI within 12 months |
| Measurement | Jira incident RCA tags (caught-in-ci vs escaped-to-production) + CI gate failure logs correlated with defect tickets |
| Cadence | Monthly calculation, quarterly deep-dive |
Why this metric matters: Adoption alone doesn't prove value. A team could have 100% gate adoption with gates so lenient they catch nothing. Defect Detection Rate answers the question engineering leadership actually cares about: "is the investment in quality infrastructure reducing production incidents?"
How it's measured — the RCA tagging model:
For this metric to work, the team must establish an incident classification practice:
| Classification | Definition | Who Tags | Example |
|---|---|---|---|
| Caught in CI | Defect was detected by a quality gate before merge or deploy | Automatic (CI gate failure → Jira ticket) | SpotBugs caught null pointer; Pact contract test caught breaking schema change |
| Escaped to production | Defect reached production and was discovered via alert, user report, or incident | On-call engineer during RCA | API returned 500 due to unhandled edge case; pricing regression shipped undetected |
| Could have been caught | Escaped defect where RCA determines an existing or proposed gate should have caught it | QE team during monthly review | Missing integration test for new endpoint; Hammer would have caught pricing deviation if automated |
| Not CI-detectable | Escaped defect that no reasonable CI gate could catch (config issues, data-dependent, infrastructure failure) | QE team during monthly review | AWS region failover; third-party API behaviour change |
Defect Detection Rate = caught-in-ci / (caught-in-ci + escaped-to-production − not-ci-detectable) × 100
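The formula above translates directly into a small calculation. The numbers in the example are invented; in practice the inputs would come from Jira RCA tags and correlated CI gate failure logs.

```python
# Sketch of the Defect Detection Rate calculation from the formula
# above. Example counts are invented; real inputs come from Jira RCA
# tags (caught-in-ci vs escaped-to-production) and CI failure logs.

def defect_detection_rate(caught_in_ci: int,
                          escaped_to_production: int,
                          not_ci_detectable: int) -> float:
    """DDR = caught / (caught + escaped - not-detectable) * 100.

    `not_ci_detectable` is the subset of escaped defects that no
    reasonable CI gate could have caught, so it is excluded from
    the denominator.
    """
    detectable = caught_in_ci + escaped_to_production - not_ci_detectable
    if detectable <= 0:
        return 0.0
    return 100.0 * caught_in_ci / detectable

# 40 caught in CI, 20 escaped, 10 of the escapes not CI-detectable:
# 40 / (40 + 20 - 10) = 80%
print(defect_detection_rate(40, 20, 10))  # 80.0
```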
Phased measurement rollout:
| Phase | Period | Focus | Measurement Readiness |
|---|---|---|---|
| Phase 1 | Months 1-3 | Establish RCA tagging in Jira; begin classifying incidents | Baseline only — no target yet |
| Phase 2 | Months 3-6 | Correlate CI gate failures with prevented defects; refine classification | First meaningful calculation; set initial target |
| Phase 3 | Months 6-12 | Defect Detection Rate becomes dominant primary metric | Monthly tracking; gate effectiveness reviews drive team priorities |
Metric Evolution Summary
Months 1-6 (Phase 1-2) Months 6-12 (Phase 3) Months 12+ (Phase 4)
┌────────────────────────┐ ┌────────────────────────┐ ┌────────────────────────┐
│ PRIMARY: │ │ PRIMARY: │ │ PRIMARY: │
│ Quality Gate Adoption │ → │ Defect Detection Rate │ → │ Defect Detection Rate │
│ │ │ │ │ │
│ ESTABLISHING: │ │ SECONDARY: │ │ MAINTENANCE: │
│ Defect Detection Rate │ │ Quality Gate Adoption │ │ Quality Gate Adoption │
│ (baseline + RCA tags) │ │ (maintain >80%) │ │ (maintain >95%) │
└────────────────────────┘ └────────────────────────┘ └────────────────────────┘
Secondary Metrics
| Metric | Baseline | Target | Measurement |
|---|---|---|---|
| CI Build Success Rate (first run) | TBD | >90% across all repos | GitHub Actions metrics / DataDog |
| Flaky Test Rate | TBD (estimated >5%) | <1% of total test suite | DataDog test visibility dashboard |
| E2E Framework Consolidation | 3 frameworks (Selenium + Cypress + Playwright) | 1 framework (Playwright) | CI pipeline audit — framework count |
| Mean Time to Fix Flaky Test | TBD | <5 business days from detection | Jira ticket SLA tracking |
| Code Coverage Visibility | 0 repos reporting coverage | 100% of active repos | CI artifact / CodeCov / SonarCloud |
| Hammer CI Automation | 0 automated runs | 100% of pricing PRs + nightly develop | GitHub Actions workflow logs |
Counter-Metrics (Guard Rails)
| Metric | Acceptable Range | Alert If |
|---|---|---|
| PR Pipeline Duration (P90) | <15 min | >20 min sustained over 1 week |
| Quality Gate False Positive Rate | <2% of blocked PRs | >5% (gates blocking legitimate changes) |
| Developer Satisfaction with CI/Tooling | >7/10 NPS | <6/10 in quarterly survey |
| Infrastructure Cost (CI runners) | Within budget | >20% increase quarter-over-quarter without corresponding test count growth |
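The P90 duration guard rail can be sketched as below. The percentile method (nearest-rank) and the sample durations are assumptions for illustration; production numbers would come from GitHub Actions metrics in DataDog.

```python
# Sketch of the P90 pipeline-duration guard rail. The nearest-rank
# percentile method and the sample durations are assumptions; real
# data would come from CI metrics in DataDog.
import math

def p90(samples_min: list[float]) -> float:
    """Nearest-rank 90th percentile of pipeline durations (minutes)."""
    ranked = sorted(samples_min)
    idx = math.ceil(0.9 * len(ranked)) - 1
    return ranked[idx]

def should_alert(daily_p90s: list[float], threshold: float = 20.0) -> bool:
    """Alert when P90 exceeds the threshold every day for a full week,
    matching the ">20 min sustained over 1 week" condition."""
    return len(daily_p90s) >= 7 and all(p > threshold for p in daily_p90s)

durations = [8, 9, 10, 11, 12, 13, 14, 15, 22, 25]  # one day's runs
print(p90(durations))                               # 22
print(should_alert([21] * 7))                       # True
print(should_alert([21, 19, 21, 21, 21, 21, 21]))   # False: one day recovered
```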
Scope
In Scope
Shared Infrastructure (both tracks):
- CI/CD quality gate pipelines — design, implement, and maintain GitHub Actions workflow templates for all quality gate phases (Foundation → Contracts → Performance → Optimization)
- Reusable CI pipeline templates — GitHub Actions workflows for Java/Spring, React/Next.js, and Python ML repos with quality gates built in
- Flaky test detection and remediation systems — automatic quarantine (>3 failures in 7 days → quarantine + ticket), unified tracking dashboard, SLA enforcement
- Quality dashboards — DataDog dashboards for test health (coverage, flaky rate, execution time), pipeline health (build success, duration, gate pass rate), and production health per team and per track
- AI code review tooling — CodeRabbit rule configuration (path-based test enforcement, anti-pattern detection) and Augment Code integration (semantic test gap detection)
- Test reporting and visibility — unified test result aggregation across frameworks, GitHub Actions test summary annotations, coverage delta PR comments (CodeCov/SonarCloud)
- Quality onboarding materials — documentation, golden-path example repos, and onboarding scripts for new engineers
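The auto-quarantine rule (>3 failures in 7 days) can be sketched as a pure function over failure timestamps. The timestamps and the ticket-filing hook are illustrative; the real system would read failures from the CI test-result store.

```python
# Sketch of the flaky-test auto-quarantine rule: quarantine when a
# test fails more than 3 times in a trailing 7-day window. Timestamps
# are invented; real data would come from the CI test-result store.
from datetime import datetime, timedelta

FAILURE_THRESHOLD = 3   # quarantine once failures exceed this count
WINDOW = timedelta(days=7)

def should_quarantine(failure_times: list[datetime],
                      now: datetime) -> bool:
    """True when the test failed more than FAILURE_THRESHOLD times
    within the trailing 7-day window ending at `now`."""
    recent = [t for t in failure_times if now - t <= WINDOW]
    return len(recent) > FAILURE_THRESHOLD

now = datetime(2026, 1, 15)
failures = [now - timedelta(days=d) for d in (0, 1, 2, 6, 30)]
print(should_quarantine(failures, now))  # True: 4 failures inside the window
```

Keeping the decision a pure function of timestamps makes the rule trivially unit-testable before it is wired to quarantine + ticket creation.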
App/Platform-specific:
- Playwright infrastructure — shared Playwright configuration, browser management, CI sharding setup, Page Object conventions, Codegen integration
- Selenium-to-Playwright migration execution — AI-accelerated conversion (Claude Code skill), parallel run validation, infrastructure decommission (20 Selenium runners + 12 Cypress containers → Playwright shards)
- Cypress-to-Playwright migration execution — same-language conversion, Cypress Cloud decommission
- Testcontainers configurations — shared Docker container configs for MongoDB, PostgreSQL, Redis, LocalStack (SQS/SNS/Kinesis/S3), RabbitMQ
- Pact broker management — contract broker infrastructure, can-i-deploy gate integration, message contract support for event-driven services
- Test data factories — shared factories/fixtures for Duetto entities (hotels, rates, reservations, users)
- Static analysis enforcement — make SpotBugs blocking (baseline existing), add Snyk/Trivy dependency scanning
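The test data factory pattern above can be sketched as follows. The entity fields and defaults are invented for illustration and would be replaced by the real Duetto entity schemas.

```python
# Hypothetical sketch of a shared test data factory. Field names and
# defaults are invented; real factories would mirror the production
# entity schemas for hotels, rates, reservations, and users.
from dataclasses import dataclass
import itertools

_ids = itertools.count(1)  # unique IDs so fixtures never collide

@dataclass
class Hotel:
    hotel_id: int
    name: str
    currency: str = "USD"

@dataclass
class Rate:
    hotel_id: int
    room_type: str
    amount: float

def make_hotel(**overrides) -> Hotel:
    """Factory with sensible defaults; tests override only what matters."""
    defaults = {"hotel_id": next(_ids), "name": "Test Hotel"}
    defaults.update(overrides)
    return Hotel(**defaults)

def make_rate(hotel: Hotel, **overrides) -> Rate:
    """Rates are created against an existing hotel to keep IDs consistent."""
    defaults = {"hotel_id": hotel.hotel_id, "room_type": "standard",
                "amount": 199.0}
    defaults.update(overrides)
    return Rate(**defaults)

hotel = make_hotel(name="Harbour View")
rate = make_rate(hotel, amount=249.0)
print(rate.hotel_id == hotel.hotel_id, rate.amount)  # True 249.0
```

The point of the pattern is that a test states only the fields it cares about; everything else gets a valid default, which keeps tests short and resilient to schema growth.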
Intelligence-specific:
- Great Expectations infrastructure — shared configuration, suite templates, integration into Airflow DAGs as blocking gates
- MLflow validation gates — automated accuracy comparison vs baseline, champion/challenger pipeline infrastructure
- Data drift monitoring — alerting integration with DataDog for distribution shift detection
- Python CI pipeline templates — standardized Ruff + MyPy strict + pytest + coverage for all Intelligence repos
- Golden file test harness — reusable snapshot testing infrastructure for inference endpoints
- Hammer automation — GitHub Actions workflow for pricing PRs (curated hotel sample), nightly develop runs (full hotel set), configurable tolerance thresholds, structured JSON output, DataDog integration, PR comment summaries; long-term: containerize and decouple from monolith
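The golden-file harness idea can be sketched in a few lines: serialize an inference endpoint's output, compare it to a stored snapshot, and create the snapshot on first run. The file layout and tolerance value are assumptions for illustration, not the real harness.

```python
# Minimal sketch of a golden-file (snapshot) check for inference
# endpoints. The directory layout and TOLERANCE value are assumptions;
# the real harness would define its own storage and drift policy.
import json
import math
import tempfile
from pathlib import Path

TOLERANCE = 1e-6  # numeric drift allowed before the check fails

def check_golden(name: str, output: dict, golden_dir: Path) -> bool:
    """Compare `output` to the stored golden file, creating the file
    on first run. Returns True when the output matches (or was just
    established)."""
    golden_path = golden_dir / f"{name}.json"
    if not golden_path.exists():
        golden_path.write_text(json.dumps(output, sort_keys=True, indent=2))
        return True  # first run establishes the golden file
    golden = json.loads(golden_path.read_text())
    if golden.keys() != output.keys():
        return False
    return all(
        math.isclose(golden[k], output[k], abs_tol=TOLERANCE)
        if isinstance(golden[k], float) else golden[k] == output[k]
        for k in golden
    )

with tempfile.TemporaryDirectory() as d:
    out = {"rate": 187.25, "confidence": 0.91}
    print(check_golden("pricing-v2", out, Path(d)))  # True (file created)
    print(check_golden("pricing-v2", out, Path(d)))  # True (matches)
    drifted = {"rate": 190.0, "confidence": 0.91}
    print(check_golden("pricing-v2", drifted, Path(d)))  # False (drift)
```

A numeric tolerance rather than exact equality is what makes snapshots workable for model outputs, where tiny floating-point differences across environments are expected.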
Out of Scope
- Writing product-level tests — owned by engineering teams; this team builds the infrastructure teams use to test
- Setting team-level test strategy — owned by embedded QEs in the Quality Guild; this team provides the tools, QEs provide the strategy
- Model accuracy decisions — owned by ML engineers; this team builds the validation infrastructure (MLflow gates, drift monitoring)
- Quality standards governance — owned by the Quality Guild (see TC-006); this team implements what the guild decides
- Production incident response — owned by on-call engineering teams
- Application performance optimization — owned by product teams; this team provides k6 infrastructure and CI integration for performance testing
- Security policy definition — owned by Security team; this team integrates scanning tools into CI pipelines
Active Initiatives
No active initiatives yet — to be defined as part of the Quality Engineering Strategy rollout.
Team Members
| Role | Person | Responsibilities |
|---|---|---|
| Staff/Lead QE (Team Lead) | [TBD] | Technical vision, framework decisions (Playwright over Cypress, Pact for contracts, Great Expectations for data quality), CI/CD quality gate architecture, migration strategies, tooling evaluation and selection |
| Quality Engineer (App/Platform focus) | [TBD] | Playwright infrastructure, Testcontainers configs, Pact broker, test data factories, Selenium/Cypress migration execution, static analysis enforcement |
| Quality Engineer (Intelligence focus) | [TBD, Phase 2] | Great Expectations infrastructure, MLflow validation gates, data drift monitoring, Python CI templates, Hammer automation, golden file test harness |
Team Lead Responsibilities (Staff/Lead QE — L6)
The Staff/Lead QE is the most senior technical IC in the Quality Guild. In addition to leading this team, they:
- Set the technical vision for test automation across the organization
- Design reusable CI/CD quality gate architecture — GitHub Actions templates, gate tier definitions, pipeline optimization
- Make tooling decisions — evaluate and select tools, define integration patterns
- Act as the technical counterpart to the Guild Lead — Guild Lead owns people and governance, Staff/Lead QE owns technical strategy and infrastructure
- Mentor team members and embedded QEs on automation best practices
Stakeholders
- Quality Guild (TC-006): Sets standards and governance that this team implements; primary consumer of dashboards and infrastructure
- Embedded QEs (App/Platform + Intelligence): Use infrastructure built by this team; provide requirements for test tooling and framework needs
- Engineering Teams (all): End consumers of CI pipeline templates, quality gates, Testcontainers configs, test data factories, and Playwright infrastructure
- Engineering Leadership: Approve infrastructure investment, quality gate enforcement levels, tool procurement
- DevOps / Platform Engineering: Collaborate on CI runner provisioning, Docker image management, AWS infrastructure for test environments (LocalStack, ECS for Hammer)
- Intelligence Team Leads: Partner on Hammer automation priorities, Great Expectations rollout, MLflow gate requirements
- Security Team: Align on SAST/DAST integration into CI pipelines (Snyk/Trivy configuration)
Quarterly Review
Q1 2026 Review (Planned)
Date: TBD (end of Phase 1)
Primary Metric Movement: 0% → TBD (target: 30% of repos at Phase 1 gates)
Note: Q1 2026 is a target date. Actual timeline may be delayed based on hiring pipeline — all team roles are currently unfilled.
| Initiative | Expected Outcome | Success Criteria |
|---|---|---|
| Phase 1 CI gates | All active repos have build + unit + lint gates | 100% repo coverage |
| Coverage visibility | Coverage reports published as CI artifacts | All repos reporting; baselines established |
| Selenium foundation | Playwright project ready, migration skill created | 10+ P0 tests converted and passing |
| Hammer CI | Automated pricing PR runs + nightly develop | Hammer running on 100% of pricing PRs |
| Quality dashboard v1 | DataDog dashboard live | All teams can see their test health metrics |
Next Quarter Focus: Phase 2 gates (contracts, security), bulk Selenium/Cypress conversion, Pact broker, flaky test auto-quarantine, Hammer structured reporting.