Platform Team Charter
Effective: March 2026 | Duration: 8 months (March 2026 -- October 2026) | Charter Owner: Shiv Yadav, Director of Engineering | Review Cadence: Monthly with Engineering Leadership; formal re-charter at Month 8
Executive Summary
The problem: Duetto's core data infrastructure -- MongoDB and Redis -- is years past end-of-life, actively blocking revenue (stuck optimize jobs, import failures), costing real money (10x KMS cost spike, $10K/deal ghost hotel onboarding), and carrying unpatched security vulnerabilities. The system cannot scale past its current load. The 70:28 growth target will break it.
The plan: A single focused team, organized around the MongoDB upgrade as the critical path, accelerated by Delphix data virtualization and AI-assisted development. Reaching MongoDB 4.4 by Month 2 unlocks batch server scaling and resolves the two highest-impact customer-facing issues. Reaching v7+ by Month 5 enables sharding. Delphix compresses the upgrade by pipelining validation: while version N goes to production, all 9 applications are already pre-validated against version N+1 on virtual clones. Redis, config safety, and growth blockers run as parallel tracks.
The ask: Approve the 5-person team (3.5 FTE) at the allocations specified. Approve Delphix Continuous Data provisioning via AWS Marketplace (AMI deployed in our AWS account; annual contract recommended; ~12TB uncompressed footprint across RS1-RS9, within 25TB limit). Grant the team authority to coordinate cross-team app validation windows. Ensure Forecasting (Hammer validation) and DevOps (AL2023 VM migration) are aligned as dependencies.
Purpose
Make the Duetto platform safe to scale. Eliminate the infrastructure ceilings, security gaps, and manual toil that prevent reliable growth to 70:28 and beyond. In 8 months, any team should be able to onboard customers faster, deploy changes safely, and operate at 2x current load without heroics.
Scope
In Scope
- MongoDB upgrade path: FCV 4.2 → 4.4 → 5.0 → 6.0 → 7.0+, including sharding strategy
- Application compatibility validation across all MongoDB consumers
- Redis ElastiCache migration (closing CVE SRE-2987)
- Cluster rebalancing and co53 data isolation
- KMS call caching in the monolith (`KmsJwtValidator`, `KmsSignatureValidator` in `api/`)
- Server config validation CI gate and OpenFeature provider activation (PLG-253)
- Ghost hotel architecture (Phase 1) in partnership with Frictionless Onboarding
- Data retention policy and archival implementation beyond as-of collections
Out of Scope
- Monolith rewrite or large-scale refactoring
- EKS / OpenSpace re-platform (separate track)
- Feature development for product teams
- Data Platform ETL pipeline work (non-monorepo pipelines; GCSP is in-scope as a monorepo application)
Cost of Inaction
| Impact | Current Cost | Source |
|---|---|---|
| Stuck optimize jobs | Customers unable to publish rates during peak hours | PLA-4306 / RATE-6809 |
| Batch server ceiling | System capped at 16 servers since June 2025 rollback (was 24) | On-call war room, June 2025 |
| Import delays | Support's #1 priority; emergency bulk changes blocked during peaks | TSQ-1598 |
| KMS cost spike | CloudTrail: $713 → $6,000 in 11 days; KMS: $76 → $752; 12% stage error rate | Arek's analysis, Feb 2 |
| Redis CVE | Known critical vulnerability on v3.2.12 (June 2020); unpatched in production | SRE-2987 |
| Ghost hotel onboarding | ~$10K FTE cost per deal with ghost hotels (e.g., B&B Hotels: 300+ ghosts) | David Gerrard, Onboarding |
| Config drift outages | Stage crashed Jan 9 from a single missing property override | PR #8781 |
| 15 years unarchived data | Inflated storage, slower queries, longer upgrade windows, harder sharding design; only as-of collections archived today | -- |
| Bus factor | The primary engineer across 6 of these problems has departed | -- |
Every one of these gets worse with growth. The MongoDB version ceiling is the binding constraint -- it blocks scaling, blocks performance, blocks sharding, and blocks vendor support.
The Root Problem: MongoDB
The production MongoDB binary is 4.2.15 with Feature Compatibility Version locked at 4.0. Demo was upgraded to FCV 4.2 on Jan 15, 2026. Stage and production remain at FCV 4.0.
MongoDB 4.2 has a known findAndModify() contention bug that spikes CPU to 100% under load, queuing all tasks for minutes or hours. This single defect is the root cause of stuck optimize jobs, import performance failures, and the batch server scaling ceiling. Upgrading to 4.4 mitigates the bug. Reaching v7+ enables sharding, permanently solving cluster rebalancing.
David Gerrard was told "60-90 days to 4.4" in December 2025. That window has passed.
Upgrade Path (Accelerated)
Delphix enables pipelined validation: while Step N goes to production, virtual clones at Step N+1 are already being tested by all application teams in parallel. This eliminates the sequential validate-wait-upgrade-revalidate cycle that makes traditional MongoDB upgrades take months per step.
Current state: Binary 4.2.15 | FCV 4.0 (prod/stage) | FCV 4.2 (demo only)
Month 1: FCV → 4.2 on stage + prod ← demo complete (Jan 15)
Delphix: 4.4 clones live for all 9 apps ← parallel validation starts
Month 2: Binary + FCV → 4.4 ← MILESTONE: findAndModify mitigated
Delphix: 5.0 clones live ← next validation pipelined
Month 3: → 5.0 ← driver compatibility validated on clones
Delphix: 6.0 clones live
Month 4: → 6.0 ← DRE may lag (AL2023 dependency)
Delphix: 7.0 clones live
Month 5: → 7.0+ ← MILESTONE: sharding unlocked for RS2-RS9
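A minimal sketch of the pipelining rule the path above encodes (illustrative only -- the version list and output format are assumptions, not Delphix output): while step N rolls to production, clones validate step N+1.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the pipelined upgrade schedule. Each step pairs the
// version rolling to production with the version already under validation on
// Delphix clones. Class and method names are invented for this sketch.
public class UpgradePipeline {
    static final List<String> PATH = List.of("4.2", "4.4", "5.0", "6.0", "7.0");

    // For each step, emit what goes to prod and what the clones validate next.
    static List<String> schedule() {
        List<String> steps = new ArrayList<>();
        for (int i = 1; i < PATH.size(); i++) {
            String deploying = PATH.get(i);
            String validating = (i + 1 < PATH.size()) ? PATH.get(i + 1) : null;
            steps.add("prod -> " + deploying
                    + (validating != null ? " | clones validate " + validating : ""));
        }
        return steps;
    }

    public static void main(String[] args) {
        schedule().forEach(System.out::println);
    }
}
```

The point of the sketch is the overlap: validation of N+1 never waits for N to finish rolling out, which is what collapses months of sequential validate-upgrade-validate cycles.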
Applications Requiring Validation at Each Step
| Application | Owner | Risk Level | Notes |
|---|---|---|---|
| Monolith (`app/`) | Platform | Critical | Java driver 5.6.1 already supports 4.x-8.0+ -- no driver upgrade needed. Validation is server-behavior testing, not client compatibility. Uses Redisson 3.45.0 for distributed locking, login attempts, rate limiting. |
| GCSP (`gcdb-stream-processor/`) | Platform (monorepo) | Critical | Separate Spring Boot 3.5.5 app; shares monolith's data/query/tools modules and same MongoDB driver 5.6.1. `readPreference.isForceSecondary=true` -- reads from secondaries. Uses Redisson 3.52.0 (different version from monolith). Ram authored its connection pool listener. |
| Hammer (`hammer/`) | Forecasting | Hard blocker | CLI tool in monorepo; shares same driver 5.6.1 + api/data/query/tools modules. Connects directly via `dbhost`/`replicaSet` properties. Catherine Daves: must validate before prod upgrade. |
| Capture (`capture/`) | Platform | High | Separate Jetty server in monorepo; same architecture and driver as monolith. |
| data-export (`data-export/`) | Platform | High | CLI runner in monorepo; shares same data layer. |
| DBUpgrader | Platform | High | Uses data module DbUpgrader class directly. |
| 24 Hour Sync | Platform | High | Spring component scan fragility. |
| AWS DMS | Data Engineering | Medium | Non-monorepo; separate data replication tool. |
| DRE (`dari-haskell`) | DRE | Medium | Haskell service (GHC 8.10.7, stack lts-18.11, Nix ghc8107Binary). Uses community mongoDB Haskell driver (~2.7.x) with OP_MSG wire protocol support -- NOT in MongoDB's official driver compatibility matrix. MongoDB access via `Duetto.Dre.Db.DbMongo` using `find`, `updateMany` (Upsert), `delete`, `runCommand` (indexes). Connects to replica sets (SRV + TLS) with master access mode (writes to primary). Does NOT use `findAndModify()` (it appears only in design comments in `Cache.hs`), so the 4.2 contention bug does not directly affect DRE. Has its own db-upgrader CLI (separate from the monolith's DbUpgrader). AL2023 VM blocker: the compiled Haskell binary links against system glibc/zlib/pcre; recompilation is required for AL2023. Additional risk: if the Haskell driver needs upgrading for server 6.0+/7.0+ compatibility, it may force a stack resolver upgrade (lts-18 is from 2021) and potentially a GHC version bump. Testing against each MongoDB version carries higher uncertainty than for monorepo apps because it's a community driver with no official server version guarantees. |
| Sandbox / LTEs | Platform/DevOps | Low | Sanitized prod data environments. |
Independent Problems (Parallel Tracks)
Redis CVE (SRE-2987): Production Redis v3.2.12 with known critical vulnerability. ElastiCache PoC running in dev/stage. Migration runbook written. Redisson is deeply embedded in the monolith: distributed locking (SingletonLockManager via Lua scripts in tools/), login attempt tracking (LoginAttemptInfoService in frontend/), ETL rate limiting and integration callbacks (etl/), and backfill queuing in GCSP. Monolith uses Redisson 3.45.0; GCSP uses 3.52.0 (version split). The monolith migration is the hardest step, not the easiest. Rollout: EBS → integration services → monolith.
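To make the migration's hardest constraint concrete, here is a deliberately simplified in-memory model of the lease/TTL lock semantics the Redisson Lua scripts provide (non-reentrant, single-process; class names are invented for this sketch). This is the behavior any ElastiCache target must preserve for SingletonLockManager callers, not production code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of lease-based distributed locking (the Redis SET NX PX
// pattern). Real Redisson locks are reentrant and watchdog-renewed; this sketch
// only captures the acquire-with-TTL and owner-checked-release invariants.
public class LeaseLock {
    private record Lease(String owner, long expiresAtMillis) {}
    private final Map<String, Lease> locks = new ConcurrentHashMap<>();

    // Acquire iff the key is free or its lease has expired.
    public boolean tryLock(String key, String owner, long ttlMillis, long nowMillis) {
        Lease next = new Lease(owner, nowMillis + ttlMillis);
        Lease current = locks.merge(key, next,
                (old, n) -> old.expiresAtMillis() <= nowMillis ? n : old);
        return current.owner().equals(owner)
                && current.expiresAtMillis() == next.expiresAtMillis();
    }

    // Release only if still held by the caller (the check Redisson's Lua
    // script performs atomically on the server).
    public boolean unlock(String key, String owner) {
        Lease l = locks.get(key);
        if (l == null || !l.owner().equals(owner)) return false;
        return locks.remove(key, l);
    }
}
```

Validating the ElastiCache endpoint means confirming these invariants hold under Redisson 3.45.0 and 3.52.0 alike, since the monolith and GCSP are on different client versions.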
Ghost Hotels / co53: 20,000+ competitor rate shop hotels in the client database. ~$10K FTE cost per deal. Strategic blocker for PLG and 70:28. Antonio Cortes (Frictionless Onboarding) is modularizing Tenant Management. Data isolation benefits from MongoDB sharding at v7+.
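As a toy illustration of why sharding removes manual rebalancing (the real shard-key choice is an M5 design decision, and `ShardRouter` is invented for this sketch): placement becomes a pure function of the company id, so new companies, ghost hotels included, spread across shards automatically instead of being hand-placed:

```java
// Hypothetical hashed-shard-key routing. With a hashed company-id key, MongoDB
// assigns chunks automatically; no operator decides where co53 lives.
public class ShardRouter {
    public static int shardFor(String companyId, int shardCount) {
        // floorMod keeps the result non-negative for any hashCode value
        return Math.floorMod(companyId.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        System.out.println("co53 -> shard " + shardFor("co53", 8));
    }
}
```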
KMS Call Caching: The monolith's KmsJwtValidator and KmsSignatureValidator (in api/src/main/java/com/duetto/auth/) call kmsClient.verify() on every authenticated request with zero caching. The KmsClient bean (AwsClientConfiguration.java) is a bare KmsClient.builder().region(region).build(). Every JWT validation and every inter-service signature check = an AWS KMS API call. This is what caused the 10x cost spike: CloudTrail $713→$6K, KMS $76→$752. common-boot-auth v1.1.3 added caching for domain services, but the monolith's own auth path remains uncached. The departed engineer had monolith-side fix code that was never verified or merged.
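A hedged sketch of the scoped fix: memoize verification results per token for a short TTL so repeated requests with the same JWT don't each round-trip to AWS KMS. `CachingVerifier`, its TTL, and the `Predicate` stand-in for `kmsClient.verify()` are all illustrative; the real change lands inside `KmsJwtValidator`/`KmsSignatureValidator`:

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Illustrative TTL cache wrapped around an expensive verification call.
// In the monolith the delegate would be the kmsClient.verify() round trip.
public class CachingVerifier {
    private record Entry(boolean valid, long expiresAtMillis) {}
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Predicate<String> delegate; // stands in for the KMS call
    private final long ttlMillis;

    public CachingVerifier(Predicate<String> delegate, Duration ttl) {
        this.delegate = delegate;
        this.ttlMillis = ttl.toMillis();
    }

    public boolean verify(String token, long nowMillis) {
        Entry e = cache.get(token);
        if (e != null && e.expiresAtMillis() > nowMillis) return e.valid();
        boolean valid = delegate.test(token); // the expensive KMS round trip
        cache.put(token, new Entry(valid, nowMillis + ttlMillis));
        return valid;
    }

    public static void main(String[] args) {
        int[] kmsCalls = {0};
        CachingVerifier v = new CachingVerifier(
                t -> { kmsCalls[0]++; return true; }, Duration.ofMinutes(5));
        v.verify("jwt-abc", 0);
        v.verify("jwt-abc", 1_000); // served from cache, no second KMS call
        System.out.println("KMS calls made: " + kmsCalls[0]);
    }
}
```

A production version would also bound cache size and respect token expiry, but even this shape collapses N identical validations per TTL window into one billable KMS call.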
Data Retention / 15 Years of Unarchived Data: The system only archives as-of collections. Everything else -- 15 years of transactional data, logs, historical records -- sits in RS2-RS9 indefinitely. This inflates storage costs, slows queries, increases upgrade risk (more data to migrate and validate at each step), and makes sharding design harder because you're partitioning 15 years of accumulation rather than a right-sized working set. There is no retention policy and no archival pipeline beyond as-of snapshots.
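The core decision an archival pass makes can be sketched in a few lines (the retention window and class name are placeholders; the actual policy and per-collection cutoffs come out of A4 with business sign-off):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative retention split: records newer than the cutoff stay in the
// working set; older records become archive candidates. The real pipeline
// operates per collection with policy-driven windows, not a single constant.
public class RetentionSplit {
    public static Map<Boolean, List<Instant>> split(
            List<Instant> recordDates, Instant now, int retentionDays) {
        Instant cutoff = now.minus(retentionDays, ChronoUnit.DAYS);
        // true -> keep in working set, false -> archive candidate
        return recordDates.stream()
                .collect(Collectors.partitioningBy(d -> !d.isBefore(cutoff)));
    }
}
```

The sharding payoff is the `false` partition: every record moved out before M5 is a record that never has to be migrated, validated, or balanced across shards.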
Server Config Fragility: No CI validation for Spring property overrides across environments. Jan 9 stage crash from a single missing property. OpenFeature SDK is wired in (OpenFeatureConfiguration.java, FeatureFlagEvaluator.java in api/) but uses NoOpProvider -- it's a placeholder pending PLG-253 (Datadog provider). Feature flags always return false today.
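The gate's core check is small (this is an assumed design, pending the actual S2 implementation): diff the base profile's keys against each environment's overrides and fail the build on anything missing, so a gap like the Jan 9 one breaks CI instead of stage:

```java
import java.util.Properties;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch of the CI gate's key check. Real Spring profiles layer defaults, so a
// production gate would distinguish "must be overridden per env" keys from
// keys with safe defaults; this sketch treats every base key as required.
public class ConfigGate {
    public static SortedSet<String> missingOverrides(Properties base, Properties env) {
        SortedSet<String> missing = new TreeSet<>(base.stringPropertyNames());
        missing.removeAll(env.stringPropertyNames());
        return missing; // non-empty -> fail the build
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("mongo.replicaSet", "rs1");
        base.setProperty("redis.host", "localhost");
        Properties stage = new Properties();
        stage.setProperty("mongo.replicaSet", "rs1-stage");
        System.out.println("Missing in stage: " + missingOverrides(base, stage));
    }
}
```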
Team
Membership
| Engineer | Allocation | Current Team | Role on This Charter |
|---|---|---|---|
| Matthew Moulds | 100% | DevOps/SRE | Infrastructure lead. Driving Redis migration, MongoDB upgrade infra, config CI gate. |
| Ian Farmer | 50% | Data Platform | MongoDB architecture. Upgrade design, sharding strategy, cluster rebalancing. |
| Ramprasad Ramachandran | 50% | Data Platform | Data boundary expert. App-side validation for every MongoDB consumer (GCSP, DMS, Capture, 24-Hour Sync). |
| Arek Kossendowski | 50% | DevOps/OpenSpace | Deployment safety. Config validation CI gate, Chef cookbook coordination, AL2023 VM migration. |
Total effective capacity: ~2.5 FTE
Note: Yancy Matherne has moved fully to the Forecasting team (TC-002). Application layer lead role needs reassignment.
One full-time anchor owns the mission day-to-day. Three 50% contributors bring specialist knowledge while staying connected to their current teams.
Capability Coverage
| Capability | Matthew | Yancy | Ian | Ram R | Arek |
|---|---|---|---|---|---|
| Delphix / clone pipeline | Lead | | Design | Support | |
| MongoDB upgrade (infra) | Lead | | Design | App validation | Chef/certs |
| MongoDB upgrade (app-side) | | Monolith | Design | Lead | |
| Redis / ElastiCache | Lead | Support | | | |
| Monolith internals | | Lead | | | |
| Cluster operations | Support | | Lead | Support | |
| GCSP (monorepo, secondary reads) | | Support | Support | Lead | |
| CI/CD & config safety | Support | | | | Lead |
| Data retention / archival | | App logic | Lead | Collection analysis | |
| Ghost hotel architecture | | Monolith side | | Data side | Design |
| Terraform / IaC | Lead | | | | Support |
Operating Model
- Weekly MongoDB upgrade checkpoint (30 min): Where are we in the path? What app validation is pending? What's blocked? This is the team's most important recurring meeting.
- Monthly leadership review: Progress against milestones, leading indicators (batch server count, optimize job failures, import P95).
- Async-first communication in a dedicated Slack channel for the charter.
- Delphix data virtualization (AWS Marketplace): Delphix Continuous Data, deployed as an AMI in our AWS account via Marketplace. Creates virtual clones of production MongoDB (RS1-RS9) at each target version. This is the key enabler for the compressed upgrade timeline. Instead of sequential validate-upgrade-validate cycles, Delphix lets us pipeline: pre-validate the next version on virtual clones while the current version rolls to production. Each application team (Hammer, GCSP, DMS, etc.) gets their own clone to test against in parallel rather than waiting for a shared staging window. Instant rollback on test failures compresses the feedback loop from hours to minutes. Delphix supports MongoDB 4.2.x through 8.x via the Mongopy connector; `OnlineMongoDump` ingestion supports cross-version cloning (staging can be a higher version than the source). Runs on `r5n.4xlarge` (recommended); annual contract pricing based on data volume. Current footprint: ~5.4TB compressed across RS1-RS9, estimated ~12TB uncompressed (2.5x ratio) -- well within Delphix's 25TB ingestion limit. ROI: if Delphix compresses the MongoDB path by even 2 months, it saves more in 3.5 FTE engineering time and incident cost than the licensing. Setup: Month 1, in parallel with the driver audit.
- AI-assisted development: The team will use AI tooling (including claude-mpm for multi-agent orchestration) to accelerate driver audits, compatibility analysis, code review, documentation, and test generation. Tasks like auditing 9 applications across 5 MongoDB versions, generating validation test suites, and tracing orphaned code are materially faster with AI assistance.
- Architecture Decision Records (ADRs) for key decisions (sharding strategy, co53 isolation approach, config validation design).
Goals
Track 1: MongoDB Upgrade (Critical Path -- Accelerated with Delphix)
The upgrade timeline is compressed by pipelining validation: while version N rolls to production, Delphix clones at version N+1 are already being tested by all 9 application teams in parallel.
| Goal | Delivers | Measure of Success | Target | Lead |
|---|---|---|---|---|
| M0. Delphix Setup + Clone Pipeline | Parallel validation infra | Delphix engine provisioned; RS1-RS9 ingested via OnlineMongoDump; first 4.4 virtual clones available to app teams | Month 1 | Matthew + Ian |
| M1. App Inventory & Behavior Validation | Prerequisite | All 10 apps inventoried. Monorepo driver is already 5.6.1 (supports 4.x-8.0+) -- no driver upgrade needed there. Focus is server-behavior validation: secondary read behavior, aggregation pipeline changes, index behavior at each version. Hammer validated on Delphix 4.4 clone. DRE requires separate audit: Haskell community mongoDB driver (~2.7.x) is NOT in MongoDB's official compatibility matrix; uses find/updateMany/delete/runCommand (no findAndModify); runs on GHC 8.10.7 / lts-18.11 (2021-era); if driver upgrade needed at 6.0+, cascades to resolver + GHC upgrade. DMS audited separately. | Month 1 | Ram R + Ian |
| M2. FCV 4.2 → Stage + Prod | Unblock 4.4 | FCV 4.2 in stage and prod with zero regressions | Month 1 | Matthew + Arek |
| M3. Upgrade to 4.4 | Stuck jobs, import perf, batch ceiling | Prod on 4.4; batch servers scaled to 24+; RATE-6809 closed; TSQ-1598 P95 improved >50%. Delphix 5.0 clones already being validated. | Month 2 | Ian + Ram R + Matthew |
| M4. Upgrade to 5.0 | Path to v7 | Prod on 5.0; all consumers validated. Delphix 6.0 clones already being validated. | Month 3 | Ian + Ram R + Matthew |
| M5. Upgrade to 6.0 → 7.0+ | Sharding / automated rebalancing | Prod on 7.0+; sharding strategy approved for RS2-RS9 company data; automated company rebalancing replaces current manual process | Month 4-5 | Ian + Ram R + Matthew + Arek |
Track 2: Security & Infrastructure (Parallel)
| Goal | Delivers | Measure of Success | Target | Lead |
|---|---|---|---|---|
| S1. Redis → ElastiCache | CVE closure | All non-monolith services on ElastiCache; SRE-2987 closed | Month 4 | Matthew |
| S2. Config Validation CI Gate | Release safety | Zero config-drift outages; CI gate blocking missing overrides | Month 3 | Arek + Matthew |
Track 3: Application Layer (Parallel)
| Goal | Delivers | Measure of Success | Target | Lead |
|---|---|---|---|---|
| A1. KMS Call Caching Fix | Cost spike fix | Add caching to `KmsJwtValidator.verify()` and `KmsSignatureValidator.verify()` in monolith; KMS/CloudTrail costs to baseline; stage error rate drops | Month 1 | Yancy |
| A2. Ghost Hotel Phase 1 | Growth unblock | Architecture approved; onboarding time for ghost-hotel deals reduced >80% | Month 6 | Yancy + Arek |
| A3. Cluster 7 Stabilization | Near-term risk | No cluster >30 companies; co53 isolated; Cluster 7 load reduced >50% | Month 2 | Ian + Ram R |
| A4. Data Retention Policy & Archival | Upgrade speed, storage, sharding readiness | Retention policy defined with business sign-off; archival pipeline implemented for top collections by volume; RS2-RS9 working set reduced before M5 sharding design | Month 4-6 | Ian + Yancy |
Sequencing
Month 1: PREPARE + INFRASTRUCTURE
M0 Delphix engine provisioned; RS1-RS9 ingested .... Matthew + Ian
→ Spin up 4.4 virtual clones for all app teams
M1 App inventory & driver audit (AI-assisted) ...... Ram R + Ian
→ Hammer starts validating on Delphix 4.4 clone immediately
M2 FCV 4.2 → stage, then prod ...................... Matthew + Arek
A1 KMS call caching fix (quick win) .................. Yancy
A3 Cluster 7 tactical rebalancing begins ............ Ian + Ram R
Month 2: THE BIG UNLOCK
M3 Upgrade to 4.4 .................................. Ian + Ram R + Matthew
→ All 9 apps already pre-validated on Delphix clones
→ findAndModify bug mitigated
→ batch servers scale past 16
→ stuck optimize jobs resolve (PLA-4306)
→ import performance improves (TSQ-1598)
→ Delphix spins up 5.0 clones; app validation starts
S2 Server config validation CI gate ................. Arek + Matthew
A3 Cluster 7 rebalancing completes .................. Ian + Ram R
Month 3: 5.0 + SECURE
M4 Upgrade to 5.0 .................................. Ian + Ram R + Matthew
→ Pre-validated on Delphix clones during Month 2
→ Delphix spins up 6.0 clones; DRE/AL2023 check
S1 Redis → ElastiCache migration begins ............. Matthew
A4 Data retention: policy definition begins ......... Ian + Yancy
Month 4-5: FINISH THE PATH + RIGHT-SIZE
M5 Upgrade to 6.0 → 7.0+ .......................... Ian + Ram R + Matthew + Arek
→ Pre-validated on Delphix clones during Month 3-4
→ DRE may lag if AL2023 VMs not ready (not a hard blocker)
→ Sharding strategy for RS2-RS9 company data
S1 Redis → ElastiCache migration completes .......... Matthew
A4 Data retention: archival pipeline implementation .. Ian + Yancy
→ Archive top collections by volume in RS2-RS9
→ Right-size working set for sharding
Month 6-8: SHARDING + GROWTH + HARDENING
M5 Sharding implementation for RS2-RS9 .............. Ian + Ram R + Matthew
→ Automated company rebalancing replaces manual process
A2 Ghost hotel Phase 1 ............................. Yancy + Arek
- Feature flag rollout infra ....................... Arek
- Charter review + next mission .................... All
What Month 2 Delivers
When production reaches 4.4, these outcomes happen as a direct consequence of the upgrade:
- Stuck optimize jobs (PLA-4306): `findAndModify` contention drops. Jobs that stuck for hours during peak complete normally. The Rates team can close RATE-6809.
- Import performance (TSQ-1598): Imports stop competing with batch processing for RS1 metadata locks. Support's #1 priority resolves.
- Batch server scaling: System moves from 16 to 24+ batch servers. This is raw throughput capacity denied since the June 2025 rollback.
This is the moment the charter earns credibility with every dependent team. With Delphix, it happens a full month earlier than a traditional sequential approach because all 9 applications were pre-validated on virtual clones during Month 1.
Risks & Dependencies
| Risk | Impact | Mitigation | Owner |
|---|---|---|---|
| Delphix setup takes longer than Month 1 | Pipelined validation delayed; falls back to sequential approach | Begin Delphix provisioning in Week 1; use OnlineMongoDump for initial ingestion (simplest path for RS1-RS9) | Matthew + Ian |
| Hammer validation takes longer than Month 1 | Blocks M3 (4.4 upgrade) | Hammer gets its own Delphix clone in Week 2; Catherine's team validates on their schedule without blocking others | Ram R → Charter Owner |
| Unexpected server-behavior change at 4.4 | Delays M3 | Driver 5.6.1 is compatible across full path; risk is server-side (aggregation pipeline changes, secondary read lag, index behavior). Delphix clones catch this in Month 1 before prod upgrade | Ian + Ram R |
| Departed engineer's KMS caching fix code cannot be found | A1 delayed | Fix is scoped: add caching layer around `kmsClient.verify()` in `KmsJwtValidator` and `KmsSignatureValidator` (`api/src/main/java/com/duetto/auth/`). Yancy can implement from scratch if orphaned code isn't recoverable -- the classes are small. | Yancy |
| AL2023 VM migration not ready for M5 | Blocks 6.0+ upgrade for DRE | DRE can lag behind main upgrade; not a hard blocker for monolith path. DRE binary (GHC 8.10.7) links against glibc/zlib/pcre; recompilation required for AL2023 | Arek (coordinates with DevOps) |
| DRE Haskell driver compatibility at 6.0+ | May require stack resolver upgrade | Community mongoDB Haskell driver (~2.7.x, OP_MSG) is not in MongoDB's official compatibility matrix. If 6.0+/7.0+ changes break it, upgrading the driver may cascade to upgrading lts-18 → lts-22+ and GHC 8.10 → 9.x. Validate on Delphix clones early | Ian + DRE team |
| Matthew overloaded (Redis + MongoDB infra + config) | Track 1 or Track 2 slips | Redis migration (S1) defers to Month 4 intentionally; MongoDB infra is priority | Charter Owner |
| 50% contributors pulled back to home teams | Velocity drops | Charter requires explicit leadership agreement on allocation; escalation path defined | Charter Owner → CTO |
| Ghost hotel work blocked by Frictionless Onboarding availability | A2 slips | Phase 1 scoped to platform-side architecture; Frictionless Onboarding partnership is additive, not blocking | Yancy + Arek |
Cross-Team Dependencies
| Dependency | Team | What We Need | By When |
|---|---|---|---|
| Hammer validation on MongoDB 4.4 | Forecasting (Catherine Daves) | Test window + confirmation that Hammer runs clean on 4.4 | Month 1 |
| AL2023 VM migration timeline | DevOps (Arek's home team) | VM upgrade for DRE environments before MongoDB 6.0 step | Month 6 |
| Chef cookbook updates per upgrade step | DevOps | Coordinated cookbook changes for each MongoDB version bump | Ongoing |
| Frictionless Onboarding partnership | Antonio Cortes' team | Joint architecture for ghost hotel decoupling | Month 4-6 |
| AWS DMS validation | Data Engineering | Confirm DMS works at each MongoDB version | At each step |
Success Criteria (Mission-Level)
At the end of 8 months, this charter succeeds if:
- Production MongoDB is on v7+ by Month 5 and a sharding strategy for RS2-RS9 company data is in implementation, enabling automated rebalancing (RS1 metadata stays as-is).
- Batch servers are running at 24+ with no CPU saturation incidents.
- Zero customer-impacting stuck optimize jobs per month (vs. current recurring PLA-4306).
- Redis CVE SRE-2987 is closed with all non-monolith services on managed ElastiCache.
- Zero config-drift-caused outages with CI validation gate in place.
- KMS/CloudTrail costs returned to baseline (pre-January 2026 levels) via caching in `KmsJwtValidator` and `KmsSignatureValidator`.
- Ghost hotel onboarding time reduced >80% for deals with existing ghosts.
- Data retention policy in effect with archival pipeline running; RS2-RS9 working set measurably reduced before sharding design.
- No single-person dependency -- every critical system (MongoDB, Redis, KMS, config, DRE) has at least 2 engineers who can independently operate, troubleshoot, and recover it.
Authority & Escalation
- The charter team has authority to schedule upgrade windows across non-production environments without additional approval.
- Production upgrade windows require 48-hour notice in `#release` and `#on-call-coordination` with a rollback plan documented.
- Cross-team validation requests (Hammer, DMS, DRE) can be made directly; if unresponsive within 5 business days, escalate to Charter Owner.
- If any 50% contributor's allocation is reduced below 50% by their home team, the Charter Owner escalates to the CTO.
What This Charter Is NOT
- Not a rewrite. We are upgrading infrastructure, not rebuilding the monolith.
- Not the EKS modernization project. OpenSpace re-platform is a separate track.
- Not feature work. We build the floor that feature teams stand on.
- Not 8 independent workstreams. MongoDB is the spine. Everything else runs in parallel but Track 1 is the critical path.
- Not a traditional upgrade timeline. Delphix pipelined validation and AI-assisted development compress what would normally be 12+ months of sequential upgrades into 5.
The best platform teams are invisible. Today, too many people at Duetto think about the platform -- because it's in their way. In 8 months, the goal is for the platform to be the thing nobody worries about anymore.