Sample Ratio Mismatch

What is Sample Ratio Mismatch?

Sample Ratio Mismatch (SRM) occurs in A/B testing when the observed proportion of samples in the control and treatment groups differs significantly from the expected proportion. For instance, if an A/B test is designed to split users evenly between two groups (50% control, 50% treatment), but the actual split ends up being 60% control and 40% treatment, an SRM has occurred. This discrepancy can arise for various reasons, such as counting errors, sampling bias, or technical bugs in the experiment setup.

Detecting and addressing SRM is crucial because it can lead to biased results, undermining the validity of the experiment’s conclusions.


How to check for Sample Ratio Mismatch

The check for SRM is a chi-squared goodness-of-fit test. If you’re already running manual significance calculations on your metrics, you can use the same process to check for SRM, or you can use an online calculator to do the math for you.

Lukas Vermeer and GIGA Calculator both have easy-to-use SRM calculators.

When checking manually, the typical threshold for flagging SRM is p < 0.01.
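If you prefer to script the check rather than use a calculator, the following is a minimal sketch of the same chi-squared goodness-of-fit test in Python using SciPy. The user counts and the 50/50 split are made-up numbers for illustration.

    # Minimal SRM check: chi-squared goodness-of-fit test on user counts.
    # The counts and the 50/50 split below are made-up numbers for illustration.
    from scipy.stats import chisquare

    observed = [5214, 4918]        # users counted in control and treatment
    expected_split = [0.5, 0.5]    # the split the campaign was configured with

    total = sum(observed)
    expected = [total * p for p in expected_split]

    statistic, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi-squared = {statistic:.2f}, p = {p_value:.4f}")

    if p_value < 0.01:
        print("Possible SRM: observed split differs from the configured split.")
    else:
        print("No evidence of SRM at the p < 0.01 threshold.")

Note that the counts here are users, not visits, in line with the guidance below.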

Always check for SRM against users

Whether you’re using an online calculator or checking manually, you always want to check for SRM against the number of users in each variation, not visits. 

Users represent the unique visitors who have been assigned to and counted in your campaign. This is where SRM can indicate an underlying problem. The reason not to use visits is that the changes that you’re testing in your campaign can greatly affect the return rate of your users, so a higher or lower visit count could simply be an artifact of the different experiences of each variation.

This is why we always check for SRM against users first. Of course, you can check visits in addition to users if you want. For example, if you detect SRM in visits, but not in assigned users, you can conclude that there is nothing broken in your test configuration, but your campaign is influencing the return rate of your users.

When should I check for SRM?

Newly launched campaigns need some time for patterns to develop and swings to even out, so the best practice is to check for SRM a day or so after launching a new test. Don’t panic if you see SRM in the first day or so; campaigns with low traffic might need a bit more time to even out. Additionally, updates to your website, marketing efforts, or other changes outside of the campaign can introduce SRM while a campaign is active, so continue to check periodically while the campaign is running.


Common causes of SRM and how to fix them

Counting

Audiences control which users are eligible for your campaign, while counting tracks whether a user actually saw any changes and includes only those users in reports. If your counting mechanism is biased toward one group over the other, you’ll end up with a variation that either includes users who never saw a change or is missing users who did. This is more likely to be a problem if you have turned off counting in some changes or set up custom counting via metrics.

If you suspect counting to be the issue, you may need to redefine your counting criteria to ensure all of your variations receive an equal representation of users. For example, if your campaign is designed to add a new widget to your homepage, either at the top or the bottom of the page, your campaign should count all users who visit the homepage instead of only counting users who saw the widget.   
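As a rough illustration of the widget example, the sketch below contrasts a biased counting rule with an unbiased one. The visit fields and page path are hypothetical, not a SiteSpect API.

    # Hypothetical illustration of the two counting rules from the widget example.
    # The visit dictionary and its field names are made up, not a SiteSpect API.

    def count_user_biased(visit: dict) -> bool:
        """Biased: only counts users whose variation actually rendered the widget,
        so users in a variation that hides or moves the widget are dropped."""
        return visit["page"] == "/home" and visit.get("widget_rendered", False)

    def count_user_unbiased(visit: dict) -> bool:
        """Unbiased: counts every assigned user who reaches the homepage,
        regardless of which version of the widget (if any) they saw."""
        return visit["page"] == "/home"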

Bots

This problem can occasionally pop up when testing new platforms, technologies, or third-party apps. Sometimes these new technologies introduce a bug that results in certain users being inappropriately marked as bots, real bots not being flagged, or both. Either issue will make your results unreliable. You might be able to find a segment of users (returning users, logged-in users, etc.) that is free from the bot discrepancy to help you assess results for a portion of your users.

Redirect Campaigns

No matter what tool you’re using to run your test, redirects can introduce problems similar to those seen when testing new platforms, technologies, or third-party apps. This is especially problematic if you’re testing a common entry page with a high bounce rate: the slight increase in time to complete a redirect can cause a small portion of users to drop off before the new page loads. You can minimize this by creating a manual control group so all of your variations include a redirect, even the control. And when using SiteSpect, try our URL Rewrite option, which changes the path on the request instead of the response, virtually eliminating the delay added by a standard redirect.

Changing Traffic Split

Your campaign can appear to have SRM if you change the traffic split between variation groups mid-campaign. This is not necessarily a problem, though doing so will result in an uneven distribution of new and returning users, affecting the results. If your goal is to ramp up traffic to a new experience, the best practice is to keep the split even between groups and adjust the campaign frequency when you want more or less traffic assigned to the variation.
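A toy calculation with made-up traffic numbers shows why this happens: after a mid-campaign split change, the cumulative counts match neither the old nor the new configured split, so a naive check against the current split fires even though assignment is working correctly.

    # Made-up numbers: one period at a 50/50 split, then a later period at 80/20.
    from scipy.stats import chisquare

    period_1 = [5000, 5000]   # users assigned while the split was 50/50
    period_2 = [8000, 2000]   # users assigned after changing the split to 80/20

    observed = [a + b for a, b in zip(period_1, period_2)]   # [13000, 7000]
    total = sum(observed)

    # Checking the cumulative counts against the current 80/20 configuration
    # reports SRM, even though each period matched its configured split.
    statistic, p_value = chisquare(observed, [total * 0.8, total * 0.2])
    print(f"p against the current 80/20 split: {p_value:.3g}")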

Omnichannel Assignments

Omnichannel campaigns assign users to variation groups based on the user ID they get assigned on your site. A slight SRM when Omnichannel is enabled is generally not a problem, but a large difference can indicate a problem with the distribution of your user IDs. For example, if all unauthenticated users get an ID of “guest” or “null”, all of those users will be assigned to the same variation group. If you see this happening, your Omnichannel cookie settings can be configured to exclude the value your site uses for unknown users.
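To make the “guest”/“null” problem concrete, here is a hypothetical sketch of hash-based assignment; the hashing scheme and the placeholder values are illustrative assumptions, not SiteSpect’s actual Omnichannel implementation.

    # Hypothetical hash-based assignment showing why placeholder IDs skew the split.
    # The hashing scheme and placeholder list are illustrative assumptions only.
    import hashlib

    PLACEHOLDER_IDS = {"guest", "null", ""}   # values a site might use for unknown users

    def assign_variation(user_id: str, n_variations: int = 2):
        if user_id.strip().lower() in PLACEHOLDER_IDS:
            return None   # exclude unknown users instead of bucketing them together
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return int(digest, 16) % n_variations

    # Without the exclusion, every "guest" user hashes to the same bucket,
    # piling all unauthenticated traffic into a single variation group.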

Email/Ad Testing

SRM occurring in a campaign using our Email/Ad testing feature is not always a problem, depending on how your campaign is configured. 

SRM can be a problem when your campaign is configured to include normal traffic in addition to the email/ad traffic. The non-email/ad traffic will be assigned randomly and evenly between variations, while the email/ad visits will be assigned based on which version of the email they clicked through. This can change the composition of user types in each variation, and you need equal representation in your variations to properly compare results between experiences.

The best practice when using the email/ad testing option is to include an audience that only allows users from one of the email (or ad) versions into your campaign. The click-through rate might differ with each email version, causing a difference in total users assigned to each variation in your campaign, but the composition of each group will remain consistent. In this scenario, since one email version might cause a high click-through rate but low conversion, you’ll likely want to look at total conversions in addition to conversion rate for each group.

Internal users

While not common, occasionally your own internal users can cause a sample ratio mismatch if they are actively trying to get assigned to one variation over another. Generally, internal users do not contribute enough traffic to affect results, though a campaign with low traffic and active internal users could see a small shift. Since internal users tend to behave very differently from your regular users, it’s best to exclude internal users from reports whenever possible.



Analyzing Campaigns with Sample Ratio Mismatch

Most causes of SRM result in an unequal distribution of user types, which invalidates your overall campaign results. Ideally, you would catch SRM early, identify and fix the reason, and start a new campaign with clean assignments. But if your campaign has already been running for a while, or you aren’t able to restart, you may still be able to learn and draw some conclusions.

If you can pinpoint the reason for the SRM, and reliably segment out those users, then you can analyze your campaign using the visits from all other users. For example, let’s say you have a problem with the code in your new campaign breaking for some users on a specific browser, and that breaking code is causing the bot detection to fail. So long as this error is not happening on other browsers, you can segment out all visits to the problem browser and perform your analysis on the remaining users.

Keep in mind that depending on the volume and typical behaviors of the users you are removing, there is always a chance that the remaining users will not fully represent your complete user base. But this is certainly a better alternative to throwing out the campaign data altogether, allowing you to gain insights to shape your next iteration.
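As a sketch of that segmentation approach, assuming you can export user-level data with a browser and a variation column (hypothetical file and column names), you could filter out the problem browser and re-run the SRM check on the remaining users:

    # Sketch: drop users on the problem browser, then re-check SRM on the rest.
    # The file name, column names, and 50/50 split are assumptions for illustration.
    import pandas as pd
    from scipy.stats import chisquare

    users = pd.read_csv("campaign_users.csv")   # one row per assigned user

    clean = users[users["browser"] != "ProblemBrowser"]

    counts = clean["variation"].value_counts()
    observed = [counts.get("control", 0), counts.get("treatment", 0)]
    total = sum(observed)

    statistic, p_value = chisquare(observed, [total * 0.5, total * 0.5])
    print(f"SRM on remaining users: chi-squared = {statistic:.2f}, p = {p_value:.4f}")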