Whether you are a junior just getting into ad monetization or an experienced senior who thinks they’ve seen it all, the fact is that we all get surprised every now and then when making seemingly minor changes to our setup. If you’re lucky, a change you didn’t expect much from can bring a nice, unexpected uplift, while in other cases, you might be in for an unpleasant surprise.
In this month’s newsletter, we compiled a few examples of tests that we ran on the mediation and networks side of the ad monetization coin. There’s more, of course, but we hand-picked these because they happened on more than one occasion, across different apps, genres, ad formats, mediation platforms, ad networks, countries, etc., so they are by no means weird outliers.
We’ll start with the one that questions the testing methodology itself.
1. How Meaningful are the Mediation Tests We Run?
Back in 2023, we decided to “test the test”, or, to be more precise, the reliability of the testing suites offered by the mediation platforms we use. To do this, we ran an “empty test”, meaning we started an AB test with no changes applied to either group. So:
- Group A (Control) - No changes performed
- Group B (Experimental) - No changes performed
The results we got back then were somewhat discouraging. The difference between the groups was up to 3.78%, which raises the question: if we perform some optimizations and the uplift is less than 4%, does it even make sense to interpret it as an improvement and, consequently, to promote the experimental setup to all users?
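If you want intuition for why two untouched groups can still diverge, here’s a minimal sketch in Python with made-up numbers (group size, revenue distribution are our assumptions, not our actual test data). It splits identical users into two groups many times and records how far apart their ARPDAU lands by chance alone; with heavy-tailed ad revenue, differences of a few percent are routine.

```python
# A/A ("empty test") sketch: both groups are drawn from the same revenue model,
# so any difference between them is pure noise. All numbers below are
# illustrative assumptions, not data from the tests in this newsletter.
import numpy as np

rng = np.random.default_rng(seed=7)

def group_arpdau(n_users: int) -> float:
    """ARPDAU of one group: mean of heavy-tailed per-user daily ad revenue."""
    revenue = rng.lognormal(mean=-4.0, sigma=1.5, size=n_users)
    return revenue.mean()

diffs = []
for _ in range(1_000):                   # repeat the empty test many times
    control = group_arpdau(50_000)       # Group A - no changes
    experimental = group_arpdau(50_000)  # Group B - no changes
    diffs.append(abs(experimental / control - 1) * 100)

print(f"median |difference|: {np.median(diffs):.2f}%")
print(f"95th percentile:     {np.percentile(diffs, 95):.2f}%")
```

The exact percentile doesn’t matter; the point is that every mediation AB test has a noise floor, and any reported uplift below that floor is indistinguishable from an empty test.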
It’s been more than two years, so we were hoping that the situation would have improved. To test that hope, we started a series of empty tests again. The tests were run on two mediation platforms that, according to our research from May 2025, hold close to 80% of the market - MAX by AppLovin and LevelPlay by Unity.
You might be amused (or not!) to find out that our hopes were shattered. You can see the results from a dozen different apps below (sorted by app scale, in decreasing order). We’d love to hear from you if you decide to run the test yourself! What kind of results did you get?
2. Can You Even Run a Banner Refresh Rate AB Test Reliably?
This one may be the most problematic one in our books. Banners can be a significant source of revenue, especially in hyper-casual games. It’s considered a best practice to test your refresh rate, and in the past year, many publishers have replaced the 10-second refresh rate with an even shorter one - 5 seconds. We also tested this strategy, and while it sometimes gave good results, at other times the only outcome was a headache from the numbers we got back.
Take a look at this one. The control group was set to a 10-second refresh rate, the experimental group to 5 seconds. When we were concluding the test, we were thrilled with the results: a massive +38.9% increase in ARPDAU. We happily promoted the 5-second setup to all users.
And we lived happily ever after? Not quite. After promoting the test group to all users, performance was nowhere near where it should have been, and we soon realized that the actual uplift (comparing performance before the test against performance after it) was only +8% - quite a difference! Digging into this further, we learned that, for some reason, performance in the control group had decreased during the test, which inflated the uplift the mediation was reporting because it was measuring the test group against a lower baseline. Again, this kind of result has happened to us more than a few times, so now, when we have the opportunity, we run a Firebase test (instead of a mediation test) or we simply take a chance and measure the results before vs. after.
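To make the discrepancy concrete, here is a small worked sketch with round, made-up numbers (not the actual figures from this test): if the control group slips below the pre-test baseline during the experiment, the test-vs-control uplift the mediation reports comes out much larger than the before-vs-after uplift you actually see once the setup is promoted.

```python
# Illustrative numbers only - not the real data from the refresh-rate test above.
baseline_arpdau = 0.100  # ARPDAU of all users before the test (10s refresh)
control_arpdau  = 0.078  # control group (10s refresh) during the test: it declined
test_arpdau     = 0.108  # experimental group (5s refresh) during the test

reported = test_arpdau / control_arpdau - 1   # what the mediation dashboard shows
actual   = test_arpdau / baseline_arpdau - 1  # before-vs-after view after promotion

print(f"reported uplift (test vs. control):    +{reported:.1%}")  # ~ +38%
print(f"actual uplift (vs. pre-test baseline): +{actual:.1%}")    # +8.0%
```

Same experimental group, two very different conclusions - which is why we now sanity-check mediation results against a pre-test baseline whenever we can.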
3. How Can a Bidder Decrease Your Ad ARPDAU?
Switching from waterfalls to bidding has resolved all headaches for every Ad Monetization Manager out there. There’s no more manual optimization, our eCPM and ad ARPDAU are through the roof, and it’s never been easier to run UA because LTV has gone up so much. And who needs control over their own inventory, right?
Well, that probably sums up the gaslighting narrative that ad networks and mediators alike have been preaching for the past few years.
We do want to acknowledge that there’s far less manual work nowadays on the ad-serving side of the business, but this came with (almost) all control being taken away from publishers. Bidding was supposed to bring efficiency and remove latency. The auction should run under the following assumptions:
- It runs at the same time for all ad networks, so no matter whether you have 5 or 10 ad networks, there shouldn’t be any difference when it comes to latency.
- The auction is fair, and the bidder that responded with the highest bid will always serve the ad, unless a technical issue prevented the winner from serving it. And if you’re working with a market-leading mediation, that should be a rare exception, right?
With these two assumptions in mind, we are always surprised to see negative results from an AB test that adds a new bidding network to our setup. Bidders shouldn’t add latency, they should win only when they have the highest bid, and the general logic is that they should improve the competition. So how come our eCPMs go down in the test group?
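For reference, here is a toy version of the “fair auction” logic implied by the two assumptions above, with randomly generated bid values that are purely illustrative. In a unified auction where the highest bid always wins, the winning eCPM is a maximum over bids, and adding one more bid can never push that maximum down.

```python
# Toy unified auction under the two assumptions above; bid values are made up.
import random

random.seed(42)

def winning_ecpm(bids: list[float]) -> float:
    """The highest bid wins and serves the ad, so it sets the winning eCPM."""
    return max(bids)

drops = 0
for _ in range(100_000):
    incumbents = [random.uniform(0.5, 20.0) for _ in range(5)]  # existing bidders
    new_bidder = random.uniform(0.5, 20.0)                      # the added bidding network

    before = winning_ecpm(incumbents)
    after = winning_ecpm(incumbents + [new_bidder])
    if after < before:
        drops += 1

print(f"auctions where eCPM dropped after adding a bidder: {drops}")  # always 0 here
```

If the real auction behaved like this sketch, a new bidder could only leave eCPM unchanged or raise it; the drops we keep seeing suggest that at least one of the two assumptions doesn’t fully hold in practice.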
In particular, we’ve had numerous negative results where the new bidder was generating a significant share of revenue and impressions (>5%) but somehow the test group (containing the new bidder) would have worse results (lower ad ARPDAU) than the control group. Here are a few examples of those tests:
Looking into the results further, we always see that the eCPM in the test group is lower than in the control group, which is what drives the ad ARPDAU decrease in the first place.
We’ve brought up this question multiple times to various mediation and network representatives, and we've never really gotten a reply that made sense to us. If you, our dear reader, have one to offer, please feel free to let us know your hypotheses.
4. How Can a Network That Doesn’t Work Be So Important for Competitiveness?
It’s no secret that Meta Audience Network’s performance deteriorated after Apple deprecated the IDFA. For example, Meta’s median share of wallet across the portfolio of GameBiz publishers is 1.4%, the average is 2.6%, and it ranges from 0.23% to 6.9% (the higher shares of wallet usually appearing in environments with less competition).
When the share of wallet is too low (say, below 1%), it’s not uncommon for us to AB test removing the network, since it’s not bringing much benefit anyway. However, with Meta specifically, we’ve seen more than once that removing it brings a disproportionate negative impact on ad ARPDAU in the test group. It’s almost as if the mere presence of Meta in the auction has a positive impact on the rest of the bidders.
This was true not only for Meta as a bidder, but also for instances of some other ad networks. Even though waterfall instances are largely a thing of the past, you may still be using them, especially in banner waterfalls: AdMob or GAM instances, AppLovin instances on LevelPlay, COPPA traffic via AdMob mediation, etc.
If you decide to remove low-performing instances via an AB test, you might be in for a surprise. Here are just a couple of examples from our archive.
In the first example, we are looking at an iOS app and an interstitial waterfall in the United States. In total, we removed 5 out of 47 instances, with a combined share of wallet of 0.69%. This somehow led to a -10.7% decrease in ad ARPDAU.
One more: an iOS app and a banner waterfall in the United States. Removing a single instance (out of 14) with a 1.6% share of wallet resulted in a -5.6% decrease in ad ARPDAU.
5. How Can a Waterfall Instance That Doesn’t Exist Generate Impressions?
I’m sure we’ve all been there. Working with dozens or hundreds of instances in a single day used to be part of our everyday work. Even now, when most of the revenue comes from bidders, if you have a lot of banner traffic and you are using Google Ad Manager (GAM) partners, you might have days where you are juggling a lot of instances and IDs that need to be connected correctly in your mediation stack. And mistakes happen, right? Just one wrong letter or number and the instance ID won’t be correct.
The surprising part is when you notice that mistake and realize that the instance has generated hundreds of thousands of impressions. Neither the mediation nor the GAM network was able to explain how this is possible. What was particularly troubling was that, when we tested what would happen if we completely made up a GAM placement ID (and confirmed with the GAM partner that this placement doesn’t exist in their system), the mediation was still reporting impressions. This meant that the mediation was not assigning those ad opportunities to a network that could actually serve the ad, and hence we had an opportunity cost on our hands.
To this day, neither the mediator nor the GAM network in question has been able to explain this phenomenon.
What did you think about the experiments we ran above? Did any of these surprise you? Have you had similar experiences in your games? Any other mediation and network experiments that you ran that left you feeling confused? We’d love to hear from you!