
Quality Engineering Team Charter

Updated 2026-03-11
Tags: quality-engineering, test-infrastructure, ci-cd, developer-experience, automation


Mission

Build and maintain shared test infrastructure, CI/CD quality gates, and developer tooling that enable developer-owned testing at scale — so that every engineering team at Duetto has a paved road to high-quality software without building their own testing infrastructure.

Team Metrics

This team uses a dual primary metric model that evolves with maturity. Quality Gate Adoption Rate measures whether the infrastructure is in place; Defect Detection Rate measures whether it actually works. Both are needed — adoption without effectiveness is theatre, effectiveness without adoption is irrelevant.

Primary Metric A — Quality Gate Adoption Rate (Leading)

Measures the reach of the team's infrastructure. This is the dominant metric during Phase 1-2 (months 1-6) when the immediate problem is "we have no gates." It remains important but stabilises as adoption matures.

| Attribute | Value |
| --- | --- |
| Metric | Quality Gate Adoption Rate |
| Definition | Percentage of active repositories meeting Phase 2+ quality gates (integration tests, contract verification, security scanning, coverage reporting) |
| Baseline | 0% (no quality gates currently enforced beyond basic linting and unit tests) |
| Target | 80% of active repos at Phase 2+ within 6 months; 95%+ within 12 months |
| Measurement | GitHub Actions workflow audit — automated weekly scan of repo CI configurations against gate tier definitions |
| Cadence | Weekly automated report, monthly team review |

Why this metric matters: Today, no repository enforces quality gates beyond basic linting. The team's first job is to build the infrastructure and make it easy to adopt. Until adoption is above 70%, no outcome metric is meaningful because the sample size is too small.

When this metric becomes secondary: Once adoption stabilises above 80%, the focus shifts to Defect Detection Rate. Adoption remains tracked but is no longer the primary driver of team priorities — the question shifts from "are gates in place?" to "are gates catching real problems?"
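The weekly workflow audit could look roughly like the sketch below. The Phase 2 gate job names and the example repos are illustrative assumptions, not the team's actual gate tier definitions:

```python
# Illustrative sketch of the weekly adoption audit. The PHASE_2_GATES job
# names are hypothetical; real gate tier definitions live with the QE team.
PHASE_2_GATES = {"integration-test", "pact-verify", "security-scan", "coverage-report"}

def repo_meets_phase_2(job_names):
    """A repo passes if its CI workflows define every Phase 2 gate job."""
    return PHASE_2_GATES.issubset(job_names)

def adoption_rate(repos):
    """repos: mapping of repo name -> set of CI job names found in its workflows."""
    if not repos:
        return 0.0
    passing = sum(1 for jobs in repos.values() if repo_meets_phase_2(jobs))
    return 100.0 * passing / len(repos)

# Two hypothetical repos: one fully gated, one lint-only.
repos = {
    "rates-service": {"build", "unit-test", "integration-test",
                      "pact-verify", "security-scan", "coverage-report"},
    "legacy-admin": {"build", "lint", "unit-test"},
}
print(adoption_rate(repos))  # 50.0
```

In practice the job names would be extracted from each repo's `.github/workflows` YAML by the scheduled audit job, and the result published to the weekly report.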

Primary Metric B — Defect Detection Rate (Lagging)

Measures the effectiveness of the team's infrastructure. This metric becomes the dominant primary metric from Phase 3 onward (months 6+) as adoption matures and the question shifts from "did teams adopt gates?" to "do the gates actually catch bugs?"

| Attribute | Value |
| --- | --- |
| Metric | Defect Detection Rate |
| Definition | Percentage of defects caught by CI quality gates before reaching production, out of all defects discovered (CI-caught + production-escaped) |
| Baseline | TBD (establish once gate adoption >50% and incident RCA tagging is in place) |
| Target | >70% of detectable defects caught in CI within 12 months |
| Measurement | Jira incident RCA tags (caught-in-ci vs escaped-to-production) + CI gate failure logs correlated with defect tickets |
| Cadence | Monthly calculation, quarterly deep-dive |

Why this metric matters: Adoption alone doesn't prove value. A team could have 100% gate adoption with gates so lenient they catch nothing. Defect Detection Rate answers the question engineering leadership actually cares about: "is the investment in quality infrastructure reducing production incidents?"

How it's measured — the RCA tagging model:

For this metric to work, the team must establish an incident classification practice:

| Classification | Definition | Who Tags | Example |
| --- | --- | --- | --- |
| Caught in CI | Defect was detected by a quality gate before merge or deploy | Automatic (CI gate failure → Jira ticket) | SpotBugs caught null pointer; Pact contract test caught breaking schema change |
| Escaped to production | Defect reached production and was discovered via alert, user report, or incident | On-call engineer during RCA | API returned 500 due to unhandled edge case; pricing regression shipped undetected |
| Could have been caught | Escaped defect where RCA determines an existing or proposed gate should have caught it | QE team during monthly review | Missing integration test for new endpoint; Hammer would have caught pricing deviation if automated |
| Not CI-detectable | Escaped defect that no reasonable CI gate could catch (config issues, data-dependent, infrastructure failure) | QE team during monthly review | AWS region failover; third-party API behaviour change |

Defect Detection Rate = caught-in-ci / (caught-in-ci + escaped-to-production − not-ci-detectable) × 100
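The formula translates directly into a small calculation over monthly RCA tag counts. The example counts below are invented for illustration:

```python
# Sketch of the Defect Detection Rate calculation from monthly Jira RCA
# tag counts; the example numbers are invented.
def defect_detection_rate(caught_in_ci, escaped, not_ci_detectable):
    """DDR = caught / (caught + escaped - not_ci_detectable) * 100.

    Defects no reasonable CI gate could catch are excluded from the
    denominator, so gates are judged only on detectable defects."""
    detectable = caught_in_ci + escaped - not_ci_detectable
    if detectable <= 0:
        return 0.0
    return 100.0 * caught_in_ci / detectable

# e.g. 40 defects caught in CI, 20 escaped, of which 5 were not CI-detectable:
print(round(defect_detection_rate(40, 20, 5), 1))  # 72.7
```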

Phased measurement rollout:

| Phase | Period | Focus | Measurement Readiness |
| --- | --- | --- | --- |
| Phase 1 | Months 1-3 | Establish RCA tagging in Jira; begin classifying incidents | Baseline only — no target yet |
| Phase 2 | Months 3-6 | Correlate CI gate failures with prevented defects; refine classification | First meaningful calculation; set initial target |
| Phase 3 | Months 6-12 | Defect Detection Rate becomes dominant primary metric | Monthly tracking; gate effectiveness reviews drive team priorities |

Metric Evolution Summary

Months 1-6 (Phase 1-2)              Months 6-12 (Phase 3)              Months 12+ (Phase 4)
┌────────────────────────┐          ┌────────────────────────┐          ┌────────────────────────┐
│ PRIMARY:               │          │ PRIMARY:               │          │ PRIMARY:               │
│  Quality Gate Adoption │    →     │  Defect Detection Rate │    →     │  Defect Detection Rate │
│                        │          │                        │          │                        │
│ ESTABLISHING:          │          │ SECONDARY:             │          │ MAINTENANCE:           │
│  Defect Detection Rate │          │  Quality Gate Adoption │          │  Quality Gate Adoption │
│  (baseline + RCA tags) │          │  (maintain >80%)       │          │  (maintain >95%)       │
└────────────────────────┘          └────────────────────────┘          └────────────────────────┘

Secondary Metrics

| Metric | Baseline | Target | Measurement |
| --- | --- | --- | --- |
| CI Build Success Rate (first run) | TBD | >90% across all repos | GitHub Actions metrics / DataDog |
| Flaky Test Rate | TBD (estimated >5%) | <1% of total test suite | DataDog test visibility dashboard |
| E2E Framework Consolidation | 3 frameworks (Selenium + Cypress + Playwright) | 1 framework (Playwright) | CI pipeline audit — framework count |
| Mean Time to Fix Flaky Test | TBD | <5 business days from detection | Jira ticket SLA tracking |
| Code Coverage Visibility | 0 repos reporting coverage | 100% of active repos | CI artifact / CodeCov / SonarCloud |
| Hammer CI Automation | 0 automated runs | 100% of pricing PRs + nightly develop | GitHub Actions workflow logs |

Counter-Metrics (Guard Rails)

| Metric | Acceptable Range | Alert If |
| --- | --- | --- |
| PR Pipeline Duration (P90) | <15 min | >20 min sustained over 1 week |
| Quality Gate False Positive Rate | <2% of blocked PRs | >5% (gates blocking legitimate changes) |
| Developer Satisfaction with CI/Tooling | >7/10 NPS | <6/10 in quarterly survey |
| Infrastructure Cost (CI runners) | Within budget | >20% increase quarter-over-quarter without corresponding test count growth |
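For the P90 duration guard rail, a nearest-rank percentile over recent pipeline runs is one straightforward calculation. The sample durations below are invented:

```python
# Sketch of the P90 pipeline-duration guard rail; the duration samples are
# invented, and nearest-rank is one common percentile method.
def p90(durations_min):
    """Smallest duration with at least 90% of samples at or below it."""
    ordered = sorted(durations_min)
    rank = max(1, -(-len(ordered) * 90 // 100))  # ceil(0.9 * n)
    return ordered[rank - 1]

runs = [8, 9, 10, 10, 11, 12, 12, 13, 14, 22]  # ten PR pipeline runs, minutes
print(p90(runs))  # 14 -> inside the <15 min acceptable range
```

A single 22-minute outlier does not trip the guard rail; a sustained shift of the distribution does, which matches the "sustained over 1 week" alert condition.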

Scope

In Scope

Shared Infrastructure (both tracks):

  • CI/CD quality gate pipelines — design, implement, and maintain GitHub Actions workflow templates for all quality gate phases (Foundation → Contracts → Performance → Optimization)
  • Reusable CI pipeline templates — GitHub Actions workflows for Java/Spring, React/Next.js, and Python ML repos with quality gates built in
  • Flaky test detection and remediation systems — automatic quarantine (>3 failures in 7 days → quarantine + ticket), unified tracking dashboard, SLA enforcement
  • Quality dashboards — DataDog dashboards for test health (coverage, flaky rate, execution time), pipeline health (build success, duration, gate pass rate), and production health per team and per track
  • AI code review tooling — CodeRabbit rule configuration (path-based test enforcement, anti-pattern detection) and Augment Code integration (semantic test gap detection)
  • Test reporting and visibility — unified test result aggregation across frameworks, GitHub Actions test summary annotations, coverage delta PR comments (CodeCov/SonarCloud)
  • Quality onboarding materials — documentation, golden-path example repos, and onboarding scripts for new engineers
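The auto-quarantine rule (>3 failures in 7 days → quarantine + ticket) can be sketched as follows; the threshold and window come from the charter, while the data shapes are assumptions for illustration:

```python
from datetime import datetime, timedelta

# Sketch of the flaky-test auto-quarantine rule: more than 3 failures
# inside a rolling 7-day window triggers quarantine (and, in the real
# system, a Jira ticket). Data shapes here are illustrative.
FAILURE_THRESHOLD = 3
WINDOW = timedelta(days=7)

def should_quarantine(failure_times, now):
    """failure_times: datetimes of recorded CI failures for one test."""
    recent = [t for t in failure_times if now - t <= WINDOW]
    return len(recent) > FAILURE_THRESHOLD

now = datetime(2026, 3, 11)
flaky = [now - timedelta(days=d) for d in (0, 1, 2, 3)]    # 4 failures this week
stale = [now - timedelta(days=d) for d in (8, 9, 10, 11)]  # all outside the window
print(should_quarantine(flaky, now), should_quarantine(stale, now))  # True False
```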

App/Platform-specific:

  • Playwright infrastructure — shared Playwright configuration, browser management, CI sharding setup, Page Object conventions, Codegen integration
  • Selenium-to-Playwright migration execution — AI-accelerated conversion (Claude Code skill), parallel run validation, infrastructure decommission (20 Selenium runners + 12 Cypress containers → Playwright shards)
  • Cypress-to-Playwright migration execution — same-language conversion, Cypress Cloud decommission
  • Testcontainers configurations — shared Docker container configs for MongoDB, PostgreSQL, Redis, LocalStack (SQS/SNS/Kinesis/S3), RabbitMQ
  • Pact broker management — contract broker infrastructure, can-i-deploy gate integration, message contract support for event-driven services
  • Test data factories — shared factories/fixtures for Duetto entities (hotels, rates, reservations, users)
  • Static analysis enforcement — make SpotBugs blocking (baseline existing), add Snyk/Trivy dependency scanning

Intelligence-specific:

  • Great Expectations infrastructure — shared configuration, suite templates, integration into Airflow DAGs as blocking gates
  • MLflow validation gates — automated accuracy comparison vs baseline, champion/challenger pipeline infrastructure
  • Data drift monitoring — alerting integration with DataDog for distribution shift detection
  • Python CI pipeline templates — standardized Ruff + MyPy strict + pytest + coverage for all Intelligence repos
  • Golden file test harness — reusable snapshot testing infrastructure for inference endpoints
  • Hammer automation — GitHub Actions workflow for pricing PRs (curated hotel sample), nightly develop runs (full hotel set), configurable tolerance thresholds, structured JSON output, DataDog integration, PR comment summaries; long-term: containerize and decouple from monolith
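A configurable tolerance check of the kind Hammer-style golden comparisons rely on might look like this sketch; the field names and the 0.5% default are assumptions, not Hammer's actual thresholds:

```python
# Illustrative golden-file comparison with a configurable tolerance, in the
# spirit of Hammer's tolerance thresholds. Field names and the 0.5% default
# are assumptions for this sketch.
def within_tolerance(actual, golden, tolerance_pct=0.5):
    """Return (ok, deviations): ok is True if every field in `golden`
    deviates from its golden value by at most tolerance_pct percent."""
    deviations = {}
    for key, golden_value in golden.items():
        actual_value = actual[key]
        if golden_value == 0:
            pct = float("inf") if actual_value != 0 else 0.0
        else:
            pct = abs(actual_value - golden_value) / abs(golden_value) * 100
        if pct > tolerance_pct:
            deviations[key] = pct
    return len(deviations) == 0, deviations

golden = {"bar_rate": 200.0, "suite_rate": 450.0}
ok, devs = within_tolerance({"bar_rate": 200.4, "suite_rate": 450.0}, golden)
print(ok)  # True: a 0.2% deviation is inside the 0.5% threshold
```

Returning the per-field deviations alongside the boolean is what makes the structured JSON output and PR comment summaries cheap to produce.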

Out of Scope

  • Writing product-level tests — owned by engineering teams; this team builds the infrastructure teams use to test
  • Setting team-level test strategy — owned by embedded QEs in the Quality Guild; this team provides the tools, QEs provide the strategy
  • Model accuracy decisions — owned by ML engineers; this team builds the validation infrastructure (MLflow gates, drift monitoring)
  • Quality standards governance — owned by the Quality Guild (see TC-006); this team implements what the guild decides
  • Production incident response — owned by on-call engineering teams
  • Application performance optimization — owned by product teams; this team provides k6 infrastructure and CI integration for performance testing
  • Security policy definition — owned by Security team; this team integrates scanning tools into CI pipelines

Active Initiatives

No active initiatives yet — to be defined as part of the Quality Engineering Strategy rollout.

Team Members

| Role | Person | Responsibilities |
| --- | --- | --- |
| Staff/Lead QE (Team Lead) | [TBD] | Technical vision, framework decisions (Playwright over Cypress, Pact for contracts, Great Expectations for data quality), CI/CD quality gate architecture, migration strategies, tooling evaluation and selection |
| Quality Engineer (App/Platform focus) | [TBD] | Playwright infrastructure, Testcontainers configs, Pact broker, test data factories, Selenium/Cypress migration execution, static analysis enforcement |
| Quality Engineer (Intelligence focus) | [TBD, Phase 2] | Great Expectations infrastructure, MLflow validation gates, data drift monitoring, Python CI templates, Hammer automation, golden file test harness |

Team Lead Responsibilities (Staff/Lead QE — L6)

The Staff/Lead QE is the most senior technical IC in the Quality Guild. In addition to leading this team, they:

  • Set the technical vision for test automation across the organization
  • Design reusable CI/CD quality gate architecture — GitHub Actions templates, gate tier definitions, pipeline optimization
  • Make tooling decisions — evaluate and select tools, define integration patterns
  • Act as the technical counterpart to the Guild Lead — Guild Lead owns people and governance, Staff/Lead QE owns technical strategy and infrastructure
  • Mentor team members and embedded QEs on automation best practices

Stakeholders

  • Quality Guild (TC-006): Sets standards and governance that this team implements; primary consumer of dashboards and infrastructure
  • Embedded QEs (App/Platform + Intelligence): Use infrastructure built by this team; provide requirements for test tooling and framework needs
  • Engineering Teams (all): End consumers of CI pipeline templates, quality gates, Testcontainers configs, test data factories, and Playwright infrastructure
  • Engineering Leadership: Approve infrastructure investment, quality gate enforcement levels, tool procurement
  • DevOps / Platform Engineering: Collaborate on CI runner provisioning, Docker image management, AWS infrastructure for test environments (LocalStack, ECS for Hammer)
  • Intelligence Team Leads: Partner on Hammer automation priorities, Great Expectations rollout, MLflow gate requirements
  • Security Team: Align on SAST/DAST integration into CI pipelines (Snyk/Trivy configuration)

Quarterly Review

Q1 2026 Review (Planned)

Date: TBD (end of Phase 1)
Primary Metric Movement: 0% → TBD (target: 30% of repos at Phase 1 gates)

Note: Q1 2026 is a target date. Actual timeline may be delayed based on hiring pipeline — all team roles are currently unfilled.

| Initiative | Expected Outcome | Success Criteria |
| --- | --- | --- |
| Phase 1 CI gates | All active repos have build + unit + lint gates | 100% repo coverage |
| Coverage visibility | Coverage reports published as CI artifacts | All repos reporting; baselines established |
| Selenium foundation | Playwright project ready, migration skill created | 10+ P0 tests converted and passing |
| Hammer CI | Automated pricing PR runs + nightly develop | Hammer running on 100% of pricing PRs |
| Quality dashboard v1 | DataDog dashboard live | All teams can see their test health metrics |

Next Quarter Focus: Phase 2 gates (contracts, security), bulk Selenium/Cypress conversion, Pact broker, flaky test auto-quarantine, Hammer structured reporting.