Four Lessons on How RCTs Work as a Measurement Model

Measuring lift in ongoing, dynamic programs with noisy, “spiky” medical data presents real-world difficulties not often found in classic “one and done” Randomized Controlled Trial (RCT) scenarios.

Lift is the improvement in a targeted outcome for the trial population relative to the control group: for example, a reduction in ER visits if the goal is to reduce ER visits.

NextHealth Technologies uses an RCT measurement approach in its platform to determine the impact of interventions: it compares the post-intervention performance of trial groups (campaigns) to that of control groups drawn from the same populations that did not receive the intervention. Randomized assignment removes any potential selection bias. A previous blog post provided further background and explored how we determine whether a measured difference is statistically significant or should be considered random noise.
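To make the core comparison concrete, here is a minimal sketch of computing lift and a significance level for an outcome where lower is better (such as ER visits or cost). The function name and the Welch-style normal approximation are our illustrative choices, not NextHealth's actual implementation:

```python
import math
import statistics

def lift_and_pvalue(trial, control):
    """Lift is (control mean - trial mean) as a fraction of the control
    mean, so a positive lift means the trial group improved. The p-value
    uses a two-sample Welch comparison with a normal approximation.
    Illustrative sketch only."""
    m_t, m_c = statistics.mean(trial), statistics.mean(control)
    v_t, v_c = statistics.variance(trial), statistics.variance(control)
    # standard error of the difference in means (Welch form)
    se = math.sqrt(v_t / len(trial) + v_c / len(control))
    z = (m_c - m_t) / se
    # two-sided p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    lift = (m_c - m_t) / m_c
    return lift, p
```

With a trial group averaging 1.5 visits against a control group averaging 2.5, this reports a lift of 0.4 (a 40% reduction) with a p-value near zero.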

After using RCTs in dozens of client campaigns, NextHealth has learned four important lessons about how RCTs work as a measurement model.

1. Trimming the Mean
In some programs, cost is derived from underlying claims with extremely high variability in annualized net paid amounts. In these situations, the true lift and its p-value are harder to determine: a few “outlier” claims can obscure the dynamics exhibited by the vast majority of members. In principle, the effects would eventually average out, but programs have limited population sizes and limited observation time. To deal with this, NextHealth employs a “trimmed mean” approach: at each measurement point, the members with the top 1% (or another agreed-upon fraction) of annualized cost values are removed from the trial and control data sets before computing lift.

2. Closing Campaigns
The NextHealth adaptive learning approach to campaign execution encourages experimentation and empirical measurement. Campaigns vary in their delivery mode and message content, and some will be found to work better than others within a given population. An individual campaign, even one with a relatively large number of member-years, can produce no significant signal for an extended period. Its noisy experience would continue to accumulate and accrue to the overall program measurement, creating a “ballast” effect. In such cases, we may instead choose to remove the trial members from that campaign to get a “cleaner” signal from the remaining campaigns. The cost experience and member-years for those members, from the program start time forward, are no longer included. The members themselves are returned to the candidate pool for potential reselection after a sufficient hold-out period (e.g., six months).
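One way to picture the bookkeeping when a campaign is closed (the data model, field names, and 182-day hold-out are illustrative assumptions, not NextHealth's schema):

```python
from datetime import date, timedelta

HOLDOUT = timedelta(days=182)  # roughly a six-month hold-out (assumption)

def close_campaign(members, experience, campaign_id, closed_on):
    """Remove a closed campaign's members and all of their accumulated
    cost experience from program measurement, and record the date each
    member becomes eligible for reselection. Illustrative sketch only."""
    closed = {m["id"] for m in members if m["campaign"] == campaign_id}
    remaining = [e for e in experience if e["member_id"] not in closed]
    eligible_again = {mid: closed_on + HOLDOUT for mid in closed}
    return remaining, eligible_again
```

Note that the experience filter drops the closed members' records from program start forward, matching the "no longer included" rule above, rather than only truncating them at the close date.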

3. Sandbox Campaigns / Programs
NextHealth may designate a campaign as experimental and exclude it from the “official” program lift/ROI calculations. Any such campaign is set up in a separate “sandbox” program, allowing both NHT and the client to experiment with different messaging or intervention modes. If the campaign is later moved to the program where “official” measurements take place, members already assigned as trials are not carried over; it is treated as a new campaign. Sandbox campaigns are declared as such before they start.

4. Use Risk Ranking to Define Populations, Not to Order Interventions
The NextHealth platform ranks members of a population by the predicted value of selected key performance indicators (KPIs), or weighted combinations of them, in order to target the “highest-risk” members. To maximize impact in a limited amount of time, it would be tempting to assign members to interventions each month by simply skimming off the current highest-risk members. This would be fine if all campaigns/interventions started and ended at the same time (and campaign assignment for each selected member was randomized): the campaigns would then all have similar members, and inter-campaign comparisons would be statistically valid. In practice, interventions may not all be deployable at program start because of delays in approval or other technical or logistical considerations. In that case, a risk-ranked deployment method would create imbalances; later-starting campaigns would have only lower-risk members to work with.

It is better to define the entire target population via a percentile of the KPI predictions, such as ‘all members meeting conditions A, B, and C whose predicted KPI value exceeds X’. Then, within this subset, we randomly select members for interventions. X is set so that the number of members available over the course of the program is high enough to yield statistically valid results, but no higher.

These four examples illustrate how NHT balances scientific rigor with practical considerations to measure impact for our clients and ensure the best possible outcomes.

Download a whitepaper to learn more about how NextHealth clients achieve causal outcomes and use behavioral nudges to change member behavior.

See how the NextHealth analytics solution reduced avoidable emergency room visits by 25%.