
Intelligence Team: Vision and Strategy

Updated: 2026-03-11
Tags: analytics, intelligence, pricing, forecasting, ml-platform, rpos, strategy, living-document


Living document. Last updated: Feb 26, 2026. Owners: Jon Ham, Sujoy Guha, Anoushka Shahane, Rohit Puntambekar


North Star

The Intelligence team exists to turn Duetto from a decision-support tool into a self-driving Revenue and Profit Operating System (RPOS).

Company OKRs:

  • NPS: 50 (starting at 45)
  • RPOS Maturity Score: 30 (starting at 0)
  • Live ARR: $90M (starting at $71.7M, including HotStats)
  • Adjusted EBITDA floor: $8M

7028 strategy: 70,000 hotels by 2028. Requires diversifying from rooms-and-rates revenue management into a full RPOS and changing the Go-to-Market motion. Intelligence is the engine behind the first part.

Revenue framing: Two revenue levers, Rooms and Groups. Not model releases. Models are one of three co-equal investment pillars alongside Data and UI/UX.

B&B signal: Biggest deal in company history ($4.95M, 939 hotels, 86K rooms). Feature request #1 from the customer: group forecast and total revenue forecast. "Focusing on the bottom line is critical, particularly for companies backed by private equity." The RPOS direction and group forecasting urgency are validated by the market.


The Problems We Need to Solve

Pricing

Today's system is a nightly batch LP optimizer sitting behind a legacy Pricerator gate. The LP optimizer computes the mathematically optimal price given the environment, but it runs once per day. When a hotel asks for a price three times in a day, it gets the same answer all three times. Pricerator, the legacy system, at least re-evaluated with updated information on each run. In that narrow sense, the LP is a regression.

The LP can't scale to our full hotel base. Roughly 7,000 hotels are active. About half train successfully. Of those, only about 1,000 pass the quality filters necessary for LP pricing. The rest fail training, mostly from upstream data issues (missing bookings, missing rate push data, insufficient history). If ML pricing only works on 1,000 hotels, we can't credibly claim a new pricing engine.

We're also only pricing the default room type and segment. Multi-room-type pricing, group-aware transient pricing, and segment-level optimization are not addressed by the current architecture.

Forecasting

The legacy forecast is a historical average. Our V1 ML forecast (arrivals-by-LOS, boosted trees) improved WAPE from 0.38 to 0.31 on test hotels, but it's not in production yet.

Customers want a total forecast: transient plus group. B&B asked for it. Strawberry asked for it. "Forecast and Trust" is a top CS-driven problem area. We don't forecast group at all. Group wash (the gap between blocked rooms and actual bookings) is handled by a separate legacy model. The transient and group forecasts are disconnected pipelines, so we can't give customers a unified demand picture.

The forecast is also disconnected from pricing. Users expect that if they change the price, the forecast should change. If they see unconstrained demand exceeding capacity, they expect to raise prices confidently. We support neither interaction today.

Platform

MLP has built a solid foundation. The infrastructure to train and serve models exists. But the philosophy has been maximum flexibility: data scientists build everything end-to-end, bespoke, every time. No canonical feature set, no standard training pipeline, no standard inference pipeline, no feature store.

At FLYR, the ML platform team built standardized pipelines (dataset generation, training, inference) and a feature store that all models could pull from. The tradeoff was inflexibility (locked into TensorFlow, forced into time-series TF records). We're at the opposite extreme: so flexible that every new model is built from scratch, which slows iteration and makes troubleshooting opaque.

Some of the training failures in pricing may trace back to this problem. The elasticity model generates its features on the fly rather than pulling from a standard, validated feature set. Standardizing data and feature generation pipelines through MLP could address a meaningful share of training failures without touching the model itself.

Trust and UX

Revenue managers are not accepting our rate recommendations nearly often enough. They can't see why the system recommends what it does. They can't influence the system's inputs (demand expectations) the way they can in airline RM systems. They don't get confidence intervals. They don't get explanations.

"Pricing Guard Rails" is the #1 customer theme from CS analysis: customers want more control and simplified workflows. "Forecast and Trust" is #2. These are Intelligence problems wearing UX clothes.

Data

Too many hotels fail training for pricing. The data maturity across features varies: competitive pricing is already incorporated in the Double ML model, but event data is not. Group data lives in MongoDB with no export pipeline to our data lake. Sellable capacity (blocks + wash) is missing, which blocks both group pricing and forecasting accuracy. Cost data exists in HotStats but we're only scratching the surface of what we can use and how.

Standardizing data and feature generation, building toward a feature store, and integrating external data sources (events, cost data) are all part of the same strategic push. The model is only as good as the data it trains on.


Pricing

The strategic arc

The arc for pricing is a shift from batch optimization (compute the answer once, serve it all day) to dynamic inference (given the current context, what's the right price right now).

Phase 1: LP Optimizer (now). Prove that ML pricing beats legacy. Measure rate acceptance and directional revenue lift. Scale from 1K to 2K hotels. The current work matters, but it's not the destination.

Phase 2: RL as Optimizer. Train a reinforcement learning model on the Double ML demand curves. The RL model learns the optimal pricing policy for all possible states of the environment. Deployment becomes inference, not optimization. A price request with updated context (new bookings, new competitor rates) gets a genuinely new answer.

The LP optimizer takes ~4 seconds per optimization and has a 60-second cutoff with full constraints. On-demand optimization is infeasible. RL converts pricing from a compute problem to an inference problem. Early tests show RL matching or slightly beating the constrained LP after one minute of training in JAX. Customers are requesting real-time inference. RL as optimizer isn't just a cost optimization; it's a new product capability.
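The optimization-vs-inference distinction can be sketched with a toy policy. The linear form, weights, and feature names below are invented for illustration, not the production model; the point is only that once a policy is trained, serving a price is a cheap function of the current state, so fresh context yields a fresh answer.

```python
# Toy illustration of "pricing as inference" (weights and features invented):
# serving a price is a function evaluation over the current state.
POLICY_W = {"bookings_on_hand": 0.8, "competitor_rate": 0.5, "days_to_arrival": -0.3}
POLICY_B = 40.0

def policy_price(state: dict[str, float]) -> float:
    """Inference: one weighted sum, not a 4-second LP solve."""
    return sum(POLICY_W[k] * v for k, v in state.items()) + POLICY_B

# Morning request: 20 rooms on hand, competitor at $180, 14 days out.
morning = policy_price({"bookings_on_hand": 20, "competitor_rate": 180, "days_to_arrival": 14})
# Afternoon: competitor dropped to $160 and 5 more bookings landed.
afternoon = policy_price({"bookings_on_hand": 25, "competitor_rate": 160, "days_to_arrival": 14})
# Unlike a cached nightly batch answer, the two requests price differently.
```

The real policy would be a trained network, but the serving-cost profile is the same: a forward pass per request, so on-demand re-pricing becomes feasible.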

Phase 3: Simulation-trained RL. Build a simulation environment grounded in real hotel data. Pre-train a "bookings foundation model" on synthetic demand curves using domain randomization. Fine-tune per hotel with real historical data. The foundation model solves cold-start (new hotels get the base model), handles rare scenarios (super-peak demand, event-driven spikes), and sidesteps legal constraints on cross-hotel data pooling (trained on synthetic data, fine-tuned on the hotel's own data).

Phase 4: Shop-level dynamic pricing. Price delivered at the moment of shop, incorporating shopper context (length of stay, loyalty status, channel, booking window). Requires controlling the shopping flow and collecting shopping-level data. Long-term destination, not a near-term deliverable.

Technical bets

Neural architectures over trees. The current Double ML uses tree-based models. Trees can't handle non-constant elasticity, a known limitation that pushes the model toward pricing at the extremes. Neural ODE or similar architectures enable instrumental variables, non-constant elasticity, and better integration of external features. The direction is "when, not if."
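The limitation is easiest to see with a toy contrast (both curves and all parameters invented): a power-law demand curve has the same elasticity at every price, which is what a constant-elasticity model effectively assumes, while a logistic curve's elasticity changes sharply across the price range.

```python
import math

def power_demand(p, a=1e6, b=2.0):
    """Power-law demand: constant elasticity of -b everywhere."""
    return a * p ** (-b)

def logistic_demand(p, cap=100.0, k=0.05, p0=100.0):
    """Logistic demand: price sensitivity varies with the price level."""
    return cap / (1.0 + math.exp(k * (p - p0)))

def point_elasticity(demand, p, eps=1e-4):
    """Numerical elasticity: d ln D / d ln p at price p."""
    return (math.log(demand(p * (1 + eps))) - math.log(demand(p))) / math.log(1 + eps)

# Power law: identical elasticity at $80 and $120.
e_pow_low, e_pow_high = point_elasticity(power_demand, 80), point_elasticity(power_demand, 120)
# Logistic: demand is far more price-sensitive at the high end.
e_log_low, e_log_high = point_elasticity(logistic_demand, 80), point_elasticity(logistic_demand, 120)
```

A model restricted to the first shape misprices hotels whose true demand looks like the second, especially near the extremes where the curves diverge most.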

Loosen the Pricerator gate. The LP optimizer's recommendations are filtered through Pricerator's Poisson gate, which decides whether a price change recommendation is even surfaced. The ML model should decide whether a change is warranted based on magnitude and confidence, not legacy logic.

Price change velocity. Users like small, incremental changes. A $20 jump startles them. But when there's good reason (a major event announced), they need fast response. The system should modulate change velocity based on confidence and context.

Multi-room-type and segment pricing. The current system prices only the default room type and segment. Expanding to multi-room-type pricing raises architectural questions (sell-up optimization vs. choice model with shared reward). Neural architectures make multi-grain optimization more tractable. Medium-term priority that needs scoping.

Group-aware transient pricing. Transient pricing must be aware of group commitments. In an RL framework, group state (committed blocks, expected wash) becomes part of the state definition. Displacement cost modeling (whether to accept a group at a given rate, given the transient demand it would displace) is the analytical foundation for group-transient coordination. We're not modeling displacement costs today. We should be.

User control

Users want to influence the system, not just accept or reject its output. In airline RM, revenue managers adjust demand expectations and the system re-optimizes. We need a similar mechanism.

Pricing profiles: Give users modes (conservative / standard / aggressive). Conservative biases toward occupancy. Aggressive chases rate. Standard is the model's default. Users get a sense of control without overriding individual stay-night prices across hundreds of hotels.

Demand influence: Users adjust the forecast (up or down for a period), and that adjustment feeds back as a feature to the pricing algorithm. The system re-prices based on the user's demand view. Experienced RMs expect this interaction pattern.
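The demand-influence loop can be sketched with a toy constant-elasticity model. The demand form, parameters, and the capacity-clearing rule below are illustrative only; the point is the mechanism: the user's demand adjustment flows through as an input and the price changes in response.

```python
# Toy demand-influence loop (all parameters invented). With demand
# D(p) = a * p**(-b) exceeding capacity at current prices, the price that
# sells exactly `capacity` rooms is (a / capacity)**(1/b).
def capacity_clearing_price(a: float, b: float, capacity: float) -> float:
    return (a / capacity) ** (1.0 / b)

base_a, b, capacity = 1e6, 2.0, 50.0
p_model = capacity_clearing_price(base_a, b, capacity)              # model's own demand view
p_adjusted = capacity_clearing_price(base_a * 1.2, b, capacity)     # user: "demand is 20% higher"
# The user's demand adjustment feeds back as a higher recommended price.
```

This is the interaction experienced RMs expect from airline systems: adjust the demand expectation, watch the price re-optimize, rather than overriding the price itself.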


Forecasting

The strategic arc

Phase 1: ML Forecast V1 (now). Arrivals-by-LOS model, boosted trees. Improved WAPE from 0.38 to 0.31. Get into shadow mode, then production. Shane absorbing Mila's scope into a combined pipeline.

Phase 2: Unified transient + group model. A single model with a common feature set, predicting two outcomes: transient demand and group demand. Both targets share context (historical bookings, group commits, wash estimates, events, competitive environment). The group wash model's predictions feed the forecast as features.

The unified model is the single biggest customer request ("total forecast"). You can't do it with historical averages because of the curse of dimensionality: not enough group data per hotel to average. An ML model that maps features to predictions can consume the context and generalize. The architecture details (multi-head, shared representations, specific framework) are left to the applied data scientists. The strategic bet is that a unified model is simpler and more powerful than maintaining disconnected transient and group pipelines.

Phase 3: Forecast-price connection. The forecast should be a function of current price. If a user raises the price, the forecast should adjust. In the other direction, user adjustments to the forecast (demand influence) should feed back to pricing. The forecast responds to price, the user adjusts the forecast, pricing responds to the user's adjustment.

Phase 4: Unconstrained demand forecast. Today's "unconstrained" forecast is unconstrained by accident: the arrivals-by-LOS prediction sometimes exceeds capacity when transformed to stay-night level. A proper unconstrained demand forecast tells the RM that demand exceeds supply and they have pricing power. The model should produce both constrained and unconstrained forecasts, available via the API.

Technical bets

Event data as a standard feature. PredictHQ has 13M events joinable to hotels by location. Even if the accuracy signal is marginal, the marketing value is high and customers expect it. Event data should be standard in both pricing and forecasting models.

Forecast error as North Star. Unlike pricing (where the counterfactual is unknowable), forecast accuracy is directly measurable. We forecasted X, Y happened, the difference is the error. WAPE, MAPE, bias: pick the metrics, measure them, improve them.
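The metrics named above are standard definitions and fit in a few lines; the sketch below uses invented numbers purely to show the computation.

```python
# Standard forecast-error metrics: WAPE, MAPE, and signed bias.
def wape(actual, forecast):
    """Weighted absolute percentage error: total |error| over total actuals."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(abs(a) for a in actual)

def mape(actual, forecast):
    """Mean absolute percentage error: weights low-demand days more heavily than WAPE."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def bias(actual, forecast):
    """Signed: positive means we over-forecast on net."""
    return sum(f - a for a, f in zip(actual, forecast)) / sum(actual)

actual, forecast = [10, 20, 30, 40], [12, 18, 33, 37]
# wape -> 0.10; bias -> 0.0 (over- and under-forecasts cancel); mape ~0.119
```

The WAPE/MAPE distinction matters for hotels with strong seasonality: MAPE punishes misses on quiet days, WAPE weights by volume.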

Arrivals-by-LOS as the canonical grain. The forecast operates at arrival-by-length-of-stay, then transforms to stay-night level. Arrivals matter for staffing and check-in capacity, so the operational grain is preserved. The constrainer post-processes the forecast to ensure physical feasibility. Constrainer logic may need re-evaluation as the model matures.
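The grain transform and the constrainer are both simple mechanically; the sketch below (data shapes illustrative, not the production schema) shows the arrivals-by-LOS to stay-night mapping and why the raw series should survive constraining: the unconstrained values are the signal that demand exceeds supply.

```python
# Sketch: arrivals-by-LOS -> stay-night transform, plus a capacity constrainer.
def to_stay_nights(arrivals: dict[int, dict[int, float]], horizon: int) -> list[float]:
    """arrivals[day][los] = forecast arrivals on `day` staying `los` nights.
    An arrival on day d with LOS n occupies nights d..d+n-1."""
    nights = [0.0] * horizon
    for day, by_los in arrivals.items():
        for los, count in by_los.items():
            for night in range(day, min(day + los, horizon)):
                nights[night] += count
    return nights

def constrain(nights: list[float], capacity: float) -> list[float]:
    """Post-process for physical feasibility; expose both series via the API."""
    return [min(n, capacity) for n in nights]

arrivals = {0: {1: 5, 2: 3}, 1: {1: 4}}
unconstrained = to_stay_nights(arrivals, horizon=3)    # [8.0, 7.0, 0.0]
constrained = constrain(unconstrained, capacity=6.0)   # [6.0, 6.0, 0.0]
```

Night 0 exceeding capacity in the unconstrained series is exactly the "you have pricing power" signal described under Phase 4.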


Platform

Target state

Borrowing from the FLYR model (without its rigidity):

  • Feature store. A canonical, versioned superset of features available to all models. Pricing, forecasting, and RL all pull from the same store. Iterate on features centrally.
  • Training pipelines. Standard pipelines that accept model code and produce trained artifacts. Data scientists write the model; the pipeline handles data loading, orchestration, checkpointing, and artifact storage.
  • Inference pipelines. Standard pipelines that load a trained model and serve predictions. Support batch (nightly LP) and real-time (RL inference, forecast queries).
  • Evaluation framework. Standard metrics computation across model types. Not each applied DS building their own evaluation notebook.
  • Experiment tracking. Offline experiments (backtesting, A/B analysis) tracked in a consistent format.
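A minimal, hypothetical feature-store interface could look like the sketch below. The class, method names, and data shapes are placeholders (not an existing internal API); the design point is versioned feature groups behind a single read path that pricing, forecasting, and RL all share.

```python
# Hypothetical minimal feature-store interface (names are placeholders).
class FeatureStore:
    def __init__(self):
        # (group, version) -> {entity_id -> {feature_name -> value}}
        self._groups: dict[tuple[str, int], dict[str, dict[str, float]]] = {}

    def register(self, group: str, version: int, rows: dict[str, dict[str, float]]) -> None:
        """Versioned registration: a model can pin the exact feature set it trained on."""
        self._groups[(group, version)] = rows

    def get_features(self, group: str, version: int,
                     entity_id: str, names: list[str]) -> list[float]:
        """Single read path shared by training and inference (no train/serve skew)."""
        row = self._groups[(group, version)][entity_id]
        return [row[n] for n in names]

store = FeatureStore()
store.register("hotel_daily", 1, {"hotel_42": {"otb": 55.0, "comp_rate": 179.0}})
feats = store.get_features("hotel_daily", 1, "hotel_42", ["otb", "comp_rate"])
```

The version pin is what prevents the FLYR failure mode in reverse: models stay reproducible without forcing every model into one data format.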

The balance

FLYR's mistake was over-standardization: locked into TensorFlow, forced into time-series TF records, couldn't do non-time-series problems without deconstructing the data. We need standards that accelerate iteration without constraining architecture choice. The pipelines should be model-agnostic. The feature store should support tabular, time-series, and graph data. The evaluation framework should be extensible.

Ownership

MLP (Rohit, Hakim's team) should own the pipelines and standards. Applied data scientists should focus on experimentation and model development, not infrastructure. Hakim's current philosophy is "I built the foundation, you can build whatever you want." That's the flexibility extreme. The alignment needed is between Rohit, Hakim, Jon, and the applied data scientists on where the boundary sits.


Profit Optimization (RPOS)

Near-term: Refine the reward function with HotStats data

HotStats provides monthly P&L data by hotel: variable costs, non-room revenues, departmental breakdowns. Even at monthly granularity, incorporating cost and non-room revenue data changes the optimization objective.

The pricing model should never optimize for ADR. It should maximize total future expected reward. Today, the reward function approximates room revenue. The refinement is to net out average variable cost per booking and add average non-room revenue per booking, so the model optimizes for net contribution rather than gross room revenue.

A hotel where ancillary revenue (F&B, gaming, spa) dominates room revenue should chase occupancy. A hotel where room revenue is the primary margin driver should chase rate. The model should figure that out from the data, not from user overrides.

The fastest path to demonstrable RPOS value: same pricing model, refined reward function, measurably different recommendations for hotels where total revenue diverges from room revenue.
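The mechanism can be shown with a toy example (all numbers invented): the same price search, with the reward switched from room revenue to net contribution, lands on a low rate for an ancillary-heavy hotel and a high rate for a rooms-heavy one, with no user override involved.

```python
# Toy illustration of the refined reward (demand curve and numbers invented):
# reward = demand * (rate - variable_cost + non_room_revenue).
def best_price(variable_cost: float, non_room_rev: float) -> int:
    def demand(p):
        return max(0.0, 100.0 - 0.5 * p)   # toy linear demand
    def reward(p):
        return demand(p) * (p - variable_cost + non_room_rev)   # net contribution
    return max(range(0, 200), key=reward)

# Rooms-heavy hotel: little ancillary spend per booking -> chase rate.
rooms_heavy = best_price(variable_cost=20.0, non_room_rev=5.0)
# Ancillary-heavy hotel (F&B, gaming, spa) -> fill rooms at a lower rate.
ancillary_heavy = best_price(variable_cost=20.0, non_room_rev=120.0)
```

Same model, same search, different objective: the divergence between the two recommendations is the demonstrable RPOS value described above.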

Medium-term: Channel cost optimization

Some channels are more expensive than others (OTA commissions vs. direct bookings). When demand exceeds supply, the system should recommend closing expensive channels first. Even a rules-based approach on top of the demand forecast would be a meaningful step. The open question is whether HotStats or another source provides channel-level cost data at sufficient granularity, and how the recommendation surfaces in the UX for the user to act on.
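Even the rules-based version is a short piece of logic. The sketch below (channel names, costs, and demand splits all invented) closes the most expensive channels first until expected demand fits capacity.

```python
# Rules-based channel-cost sketch (all data invented): when forecast demand
# exceeds capacity, recommend closing the costliest channels first.
def channels_to_close(channels: list[dict], capacity: float) -> list[str]:
    remaining = sum(c["expected_demand"] for c in channels)
    closed = []
    for ch in sorted(channels, key=lambda c: c["cost_pct"], reverse=True):
        if remaining <= capacity:
            break
        closed.append(ch["name"])
        remaining -= ch["expected_demand"]
    return closed

channels = [
    {"name": "direct", "cost_pct": 0.02, "expected_demand": 60},
    {"name": "ota_a",  "cost_pct": 0.18, "expected_demand": 40},
    {"name": "ota_b",  "cost_pct": 0.25, "expected_demand": 30},
]
closed = channels_to_close(channels, capacity=100)   # closes the 25%-commission OTA
```

The hard part isn't this logic; it's sourcing channel-level cost data at sufficient granularity and surfacing the recommendation where the user can act on it, as noted above.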

Long-term: Granular revenue and cost by segment

True segment-level profit optimization requires revenue and cost data below the monthly hotel level. Booking-level non-room revenue may be available through PMS integrations (OHIP). Booking-level cost data is harder; we'd likely need to model it, breaking down the monthly P&L to segment or booking level using allocation logic or statistical estimation.

The goal: discriminate between segments based on total contribution. Encourage high-value, low-cost segments to book. Discourage low-value, high-cost segments. Price accordingly. We don't have the data infrastructure for this yet, but the architecture decisions we make now (RL reward functions, feature store design) should anticipate it.


Design Principle: Every Model Delivers Three Things

  1. The prediction. A price, a forecast, a wash estimate.
  2. Confidence. How certain is the model? A forecast with a tight confidence interval means something different than one with a wide range. A pricing recommendation where the difference in expected value between the recommended price and the current price is $0.50 is different from one where it's $50.
  3. Explainability. Why did the model recommend this? Not a full SHAP decomposition (50-100x inference cost), but directional attribution: "demand is high for this period," "competitor rates increased," "group block is displacing transient capacity."

Building trust requires all three. The prediction alone isn't enough. Users need to know how confident the system is and why it's recommending what it's recommending.


Data Strategy

External data priorities

Events (PredictHQ). Not yet incorporated into our models. 13M events, joinable by location. Even if the accuracy signal is marginal, customers expect it and it's a strong marketing differentiator. Should be standard in both pricing and forecasting.

Competitive pricing. Already incorporated in the Double ML model. The maturity level varies across the feature set, and there's room to improve how competitive data is sourced and integrated (Lighthouse, OTA scraping, auto-comp rate jobs).

HotStats cost and revenue data. Monthly P&L by hotel. Enables the RPOS pivot. We're only scratching the surface of what's available and how to use it. Legal and contractual constraints may limit some applications.

Internal data priorities

Training data quality. Too many hotels fail training. Some of the failure traces back to bespoke feature generation in the elasticity model rather than standardized pipelines. A categorized failure analysis (missing bookings vs. missing rate push vs. infrastructure errors) would clarify which failures are data problems and which are pipeline problems.

Group data in the data lake. Group quotation data lives in MongoDB with no export pipeline to Athena. Data engineering needs to build the pipeline so we can use group data for modeling.

Sellable capacity (blocks + wash). Missing data that blocks both group pricing and forecasting accuracy. The optimizer prices assuming full capacity when blocks exist.


Experimentation Strategy

The roadmap is the experiment sequence. Each initiative is a hypothesis with a timeline and a decision gate. The plan isn't "build feature X by date Y." The plan is "test hypothesis X by date Y, and if it works, proceed to Z."

Infrastructure that enables this:

  • Stay-date level experimentation, so each hotel serves as its own control
  • Eppo integration for all experiments, scaling our ability to run concurrent tests
  • Standard evaluation pipelines, not bespoke analysis per experiment
  • Power analysis as a standard step (straightforward with current AI workflows)
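The power-analysis step is a textbook calculation under a normal approximation; a sketch (metric and effect size in the example are illustrative):

```python
from math import ceil
from statistics import NormalDist

# Standard two-sample power analysis (normal approximation): the check that
# should precede every experiment in the sequence.
def sample_size_per_arm(effect: float, sigma: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Observations per arm to detect `effect` (same units as sigma),
    two-sided test at level alpha with the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / effect) ** 2)

# e.g. detecting a half-standard-deviation shift in the outcome metric:
n = sample_size_per_arm(effect=0.5, sigma=1.0)   # ~63 observations per arm
```

Running this before committing a decision gate to a date tells us whether the gate is even answerable in the time allotted.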


Simulation and RL

The simulation environment serves three purposes, in order of near-term to long-term value:

Model evaluation and regression testing. Before any pricing model goes live, run it through the simulator against a battery of demand scenarios. Catches edge cases that backtesting on historical data misses.

What-if scenario tool. Ground the simulation in a specific hotel's data. Run scenarios: "what if I change the price to X?" and see projected outcomes across thousands of simulated booking paths. Monte Carlo simulation for hotel pricing.
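The what-if tool is, at its core, Monte Carlo over simulated booking paths. The sketch below uses an invented price-sensitive Poisson demand model and invented parameters; the production simulator would be grounded in the hotel's data, but the shape of the answer (a distribution of outcomes per candidate price, not a point estimate) is the same.

```python
import numpy as np

# Monte Carlo what-if sketch (demand model and parameters invented).
def simulate(price, n_paths=2000, days=30, capacity=80,
             base_rate=3.0, ref_price=100.0, elasticity=2.0, seed=0):
    rng = np.random.default_rng(seed)
    lam = base_rate * (price / ref_price) ** (-elasticity)   # price-sensitive arrival rate
    daily = rng.poisson(lam, size=(n_paths, days))           # bookings per day, per path
    total = np.minimum(daily.sum(axis=1), capacity)          # sell out at capacity
    return total.mean(), total.std()

occ_100, spread_100 = simulate(price=100.0)   # "what if I hold at $100?"
occ_150, spread_150 = simulate(price=150.0)   # "what if I push to $150?"
# Higher price -> fewer expected bookings; the spread quantifies the risk.
```

The same machinery doubles as the RL training substrate: swap the fixed demand parameters for domain-randomized ones and the scenario tool becomes the environment generator.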

RL training data. Synthetic demand curves for pre-training a foundation pricing model. Domain randomization creates diverse training environments. The foundation model learns general pricing dynamics, then fine-tunes to specific hotels with real data.

RL workstreams

  1. RL as optimizer (current priority: train on DML demand curves, deploy as inference)
  2. Simulation grounding (align simulated booking curves with real hotel data)
  3. Data augmentation (simulate rare scenarios to improve offline RL)
  4. Full simulation pre-training + per-hotel fine-tuning (long-term)
  5. Pipeline infrastructure (training jobs, inference jobs, evaluation at scale)

Risks

Resourcing

  • Mila departed (lead transient DS). Shane absorbing combined pipeline.
  • Yancy pulled to memory incident. Forecasting decoupling blocked.
  • Multiple departures in Feb. Domain knowledge gaps across the team.
  • VP of Data Science hiring in progress. Timeline unknown.

Technical

  • Training failures block the path to full hotel coverage. Needs categorized analysis.
  • Batch-only optimizer is a regression from Pricerator's 3x/day updates.
  • Sellable capacity data blocked. Impacts group pricing and forecasting.

Sujoy, Anoushka, Rohit: your turn. Add to it. Challenge it. Keep it alive.