Best Practices For Reducing Infrastructure Costs at Scale
Picture this scenario: Your mobile game is gaining rapid popularity, with a growing player base every day. But here's the challenge—your cloud infrastructure costs are also climbing. In this article, we'll dive into practical strategies used by Metaplay to control these rising expenses. We'll explore our hybrid IaaS-PaaS approach and share valuable insights tailored to mobile game developers looking to trim their infrastructure costs as they scale up.
How do cloud costs change at scale?
As discussed in our look at the barebones stack, the costs there were driven largely by fixed costs (hourly prices that you cannot fully get rid of). These are very typical when operating at a small scale, and while they make up a proportionally large part of the invoice at the beginning, in absolute terms they’re still relatively small.
When scaling up, however, other cost drivers emerge that eclipse these fixed costs. Let's dive into some of those:
Additional Cost #1: The Database Layer
Consider, for example, the database layer. At small scale, instance costs are clearly the key cost driver. It’s not unreasonable to say that in the beginning, instance costs effectively make up all the database costs.
However, as games start scaling, you will see a necessary increase in database costs: from scaling up database instance sizes, from I/O pricing, or from backup/snapshot storage.
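To make the shift concrete, here is a minimal sketch of a database cost breakdown. All prices and usage figures are hypothetical placeholders chosen for illustration, not real provider rates, and the names `DatabaseUsage`, `monthly_database_cost`, and `instance_share` are our own for this example.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation for monthly billing


@dataclass
class DatabaseUsage:
    instance_hourly_usd: float   # hypothetical hourly instance price
    io_requests_millions: float  # monthly I/O requests, in millions
    backup_storage_gb: float     # billable backup/snapshot storage


def monthly_database_cost(usage, io_usd_per_million=0.20, backup_usd_per_gb=0.02):
    """Return a per-component breakdown of monthly database cost in USD.

    The default unit prices are placeholders, not real provider rates.
    """
    breakdown = {
        "instance": usage.instance_hourly_usd * HOURS_PER_MONTH,
        "io": usage.io_requests_millions * io_usd_per_million,
        "backup": usage.backup_storage_gb * backup_usd_per_gb,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


def instance_share(breakdown):
    """Fraction of the total database cost that comes from the instance itself."""
    return breakdown["instance"] / breakdown["total"]


# Hypothetical small-scale and large-scale usage profiles.
small = monthly_database_cost(DatabaseUsage(0.10, 5, 20))
large = monthly_database_cost(DatabaseUsage(1.00, 5000, 10000))

print(f"Instance share at small scale: {instance_share(small):.0%}")
print(f"Instance share at large scale: {instance_share(large):.0%}")
```

With these made-up numbers, the instance accounts for nearly the whole bill at small scale and well under half of it at large scale, which is the pattern described above.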
Additional Cost #2: Load Balancing
A similar pattern emerges for load balancing: at small scale the load balancers are essentially a fixed cost. However, at scale, that fixed hourly cost becomes minor compared to the costs from load balancer “capacity units”.
Capacity units are AWS’s way of aggregating the work done by a load balancer, capturing multiple dimensions ranging from throughput to connection counts. We have also seen content distribution costs become quite high at scale compared to the rest of the stack, in some cases reaching almost a quarter of total infrastructure costs.
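As a rough illustration of how capacity units can overtake the fixed hourly price, the sketch below bills the single most heavily used dimension, which is how AWS describes capacity unit billing for Application Load Balancers. The thresholds and prices are modelled on AWS's published Application Load Balancer pricing but should be treated as assumptions; check the current price list for your region and load balancer type. The traffic figures are hypothetical.

```python
# Illustrative values modelled on AWS Application Load Balancer pricing.
# They are assumptions for this sketch, not authoritative rates.
FIXED_HOURLY_USD = 0.0225         # assumed fixed price per load balancer hour
CAPACITY_UNIT_HOURLY_USD = 0.008  # assumed price per capacity unit hour

# Assumed amount of work one capacity unit covers in each dimension.
NEW_CONNECTIONS_PER_SEC = 25
ACTIVE_CONNECTIONS = 3000
PROCESSED_GB_PER_HOUR = 1.0
RULE_EVALUATIONS_PER_SEC = 1000


def capacity_units(new_conn_per_sec, active_conn, gb_per_hour, rule_evals_per_sec):
    """Return billed capacity units: only the most heavily used dimension counts."""
    return max(
        new_conn_per_sec / NEW_CONNECTIONS_PER_SEC,
        active_conn / ACTIVE_CONNECTIONS,
        gb_per_hour / PROCESSED_GB_PER_HOUR,
        rule_evals_per_sec / RULE_EVALUATIONS_PER_SEC,
    )


def hourly_usage_cost(units):
    """Usage-based hourly cost in USD, on top of the fixed hourly price."""
    return units * CAPACITY_UNIT_HOURLY_USD


# Hypothetical traffic profiles for a small and a large game.
small = capacity_units(new_conn_per_sec=5, active_conn=500,
                       gb_per_hour=0.5, rule_evals_per_sec=10)
large = capacity_units(new_conn_per_sec=2000, active_conn=300_000,
                       gb_per_hour=200, rule_evals_per_sec=100)

for name, units in (("small", small), ("large", large)):
    print(f"{name}: fixed ${FIXED_HOURLY_USD:.4f}/h, "
          f"usage ${hourly_usage_cost(units):.4f}/h")
```

In this example the small game's usage cost stays below the fixed hourly price, while the large game's usage cost is many times higher than it.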
Costs At Scale: It's Not All Bad
Having said this, there are some positive observations to remember:
1. In the greater scheme of things, these costs are often relatively minor.
2. Cloud and infrastructure costs can diminish at large scale.
3. Most of the rising costs discussed above can be addressed through optimization: moving rarely accessed data to cold storage, switching to alternative content distribution providers, and improving financial planning via Reserved Instances or Savings Plans.
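Whether a commitment such as a Reserved Instance or Savings Plan pays off depends on how much of the committed capacity you actually use. The sketch below works through the break-even arithmetic under a simplified model where the commitment is billed for every hour at a discounted rate. The 35% discount and the $1.00 hourly price are hypothetical placeholders, not real rates.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly billing


def on_demand_cost(hourly_usd, utilization):
    """Monthly on-demand cost when the instance runs for a fraction of the month."""
    return hourly_usd * HOURS_PER_MONTH * utilization


def committed_cost(hourly_usd, discount):
    """Monthly cost of a commitment: billed for every hour, at a discounted rate."""
    return hourly_usd * HOURS_PER_MONTH * (1 - discount)


def break_even_utilization(discount):
    """Utilization above which the commitment is cheaper than on-demand."""
    return 1 - discount


DISCOUNT = 0.35  # hypothetical commitment discount

print(f"Break-even utilization: {break_even_utilization(DISCOUNT):.0%}")
for utilization in (0.5, 0.9):
    print(f"utilization {utilization:.0%}: "
          f"on-demand ${on_demand_cost(1.00, utilization):.2f}, "
          f"committed ${committed_cost(1.00, DISCOUNT):.2f}")
```

Under this model, a steady baseline that runs most of the month is a good candidate for a commitment, while capacity that is only needed for occasional peaks is usually better left on-demand.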
Best Practices For Reducing Infrastructure Costs At Scale
#1: Prioritize Quality Code
Developing a game locally and running it for a small technical demo audience is wildly different from running a game that's played by hundreds of thousands of people every single day. Quick and sloppy prototyping can often be tolerated at the beginning of game projects.
And that's understandable: the benefits from spending extra time optimizing your code are small compared to being able to quickly demonstrate new game mechanics to test out retention numbers or pass funding gates. However, when scaling up a game, these types of shortcuts together with quality assurance issues can be one of the single biggest drivers for increasing costs.
While this area is not, strictly speaking, limited to the domain of cloud infrastructure, it’s highly relevant to understand and consider when evolving and maturing a game production, along with the organization that runs it. Keeping player models tight, code paths optimized, and memory management under control means fewer compute resources and lower costs as you grow.
#2: Leverage Empirical Testing For More Accurate Estimates
An excellent way to understand resource consumption at scale is to carry out realistic load testing against the game.
This is true for any backend project, and there are various tools available, but in the Metaplay ecosystem one that we particularly like is our bot client framework (read more about that in the Metaplay docs).
This allows you to easily build bots that mimic the action patterns of real players, and we provide tooling to help run bot clients in large quantities.
Running bots, besides being heaps of fun, helps us in two crucial ways:
1. Hunting down regressions before they hit production.
2. Understanding the impact of new features and functionality on resource consumption.
For cost estimation, running load tests is easily the best way of getting meaningful cost estimates while taking into account the nature of the game. These load tests are also convenient for capacity planning ahead of time.
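One simple way to turn a load test result into a capacity and cost estimate is to extrapolate the measured resource usage per bot to a target player count. The sketch below assumes CPU is the limiting resource and that usage scales roughly linearly with concurrent players, which you should verify by running tests at several bot counts. The function names, measurements, and the hourly node price are all hypothetical.

```python
import math

HOURS_PER_MONTH = 730  # common approximation for monthly billing


def estimate_nodes(measured_cpu_cores, bots_in_test, target_concurrent_players,
                   cores_per_node, target_utilization=0.6):
    """Estimate the number of nodes needed for a target concurrent player count.

    Assumes CPU usage scales linearly with concurrent players and leaves
    headroom by only planning to use `target_utilization` of each node.
    """
    cores_per_player = measured_cpu_cores / bots_in_test
    required_cores = cores_per_player * target_concurrent_players
    usable_cores_per_node = cores_per_node * target_utilization
    return math.ceil(required_cores / usable_cores_per_node)


def estimate_monthly_compute_cost(nodes, node_hourly_usd):
    """Monthly compute cost in USD for a given node count and hourly price."""
    return nodes * node_hourly_usd * HOURS_PER_MONTH


# Hypothetical load test: 10,000 bots consumed 6 CPU cores in total.
nodes = estimate_nodes(
    measured_cpu_cores=6,
    bots_in_test=10_000,
    target_concurrent_players=100_000,
    cores_per_node=8,
)
monthly_cost = estimate_monthly_compute_cost(nodes, node_hourly_usd=0.40)

print(f"Estimated nodes: {nodes}")
print(f"Estimated monthly compute cost: ${monthly_cost:,.2f}")
```

Re-running the same estimate after each major feature lands gives an early signal when a change shifts the cost per player.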
Key Takeaways
Scaling up mobile game infrastructure leads to changing cost dynamics.
As your game grows, fixed costs are eclipsed by additional database layer and load balancing costs.
Despite challenges, scaling offers opportunities for cost reduction.
Prioritize quality code to minimize resource consumption and expenses.
Utilize empirical testing, such as load tests, for accurate cost estimates and capacity planning.