Modern backend systems have reached a level of complexity that organizations struggle to manage. Many find themselves in a constant cycle of hiring, reorganization, and reprioritization to better align themselves with that complexity, only to find familiar problems recurring. Perversely, this constant organizational flux adds to the complexity of their backend systems, creating a vicious cycle.
Simplicity won’t be found at the end of the infrastructure-platform journey. Instead, the journey settles into an incremental yet cyclical steady state of streamlined complexity. Breaking the cycle requires a paradigm shift in how cloud development is approached, something we’ve been working on with Klotho and InfraCopilot for the past several years. Regardless of where you are in the cycle of coping with backend complexity, we can help you step outside of the loop.
My name is Ala Shiban. I’m the co-founder of Klotho, building the Klotho Platform and InfraCopilot. I led the Cloud Platform organization at Riot Games, a centralized technology team that enables hundreds of engineers to ship the genre-defining League of Legends and Valorant live services for over 200 million users around the globe.
How we got here
Small startups have small teams of developers balancing all the development responsibilities, from coding to basic CI/CD setups. As the company scales, there are more services to connect and deploy. The extra burden draws developers away from their product focus and leads to hiring the first dedicated cloud ‘devops’ engineer.
Once the company reaches the Series A stage, that engineer is tasked with handling the complexities of operationalizing services across many repositories, usually laid out in a microservice architecture. Their efforts clear the way for the developers to turn their focus back to the product.
Yet, as the startup’s journey continues, dedicated product teams are spun up to capture more of the opportunity the startup creates. This results in more services, which means more cloud engineering work than one engineer can handle. A shared cloud engineering team is formed, focusing on creating reusable tools and templates.
Despite its expertise, the shared team has limited capacity, and supporting everyone’s needs becomes impossible. With limited organizational tools, prioritizing across varying product groups is just as hard. This forces product teams to come up with their own infrastructure stopgaps, moving away from the shared team’s offerings.
When the divergence becomes large enough and the benefits of centralizing the cloud engineers become more evident, a reorganization is brought about to put more directional thinking and more rigorous prioritization in place.
That starts the shift towards platform engineering and the beginning of the self-reinforcing cycle.
Step 1 – Split out Infrastructure
This is the common entry point for organizations as they scale: cloud engineering work has grown large enough that it becomes worth separating into its own entity, tasked with creating software to facilitate product teams’ delivery. This is often accomplished by consolidating the previous “anything goes” set of technologies onto a standardized set of choices, allowing for economies of scale in creating an ecosystem around them.
Step 2 – Grassroots Platform
However, for those choices to fully gel into a cohesive ecosystem sensitive to the company’s particular needs, some software engineering needs to be applied. Often this is noticed first by team members within the infrastructure organization who have a software engineering background. With good intent and initiative, they take it upon themselves to start creating that software. This early work proves very valuable to product teams, and the desire for a fully realized version rises quickly.
Step 3 – Formal Platform
Responding to that desire, the infrastructure team’s leadership takes on the scope of providing a cohesive platform for their infrastructure. There’s a vision of an easy-to-use, self-service interface to the infrastructure that’s aligned tightly with the organization’s needs, and a high-powered FAANG-like hire is brought in to realize it. These types of systems take a tremendous amount of work. The new platform team created to make it happen quickly balloons, and priorities become strained between what the platform needs and what infrastructure needs.
Step 4 – Infrastructure-Platform MegaOrg
Even if everything goes well and the vision is executed successfully, someone still needs to maintain the software as its underlying technology and the company’s needs naturally shift. So the large team created to stand the new platform up calcifies into its own organization, treating the platform it’s responsible for as a product unto itself and further distancing it from the infrastructure organization it came from. As platform teams enter a more operational phase of iterative refinement, they can suffer from many problems, ranging from bored engineers looking to reinvent the wheel to senior talent bleeding out to projects that are building something new.
All the while, the company continues to adapt, its engineering practices evolve, and its expertise in backend development increases. Each product group’s ambition grows, and some of the commonalities that brought all parties into a shared platform solution in the first place stop being quite so common. With the underlying technologies now mostly standardized across the product groups, and their tooling ecosystem maturing, perhaps it would be better if we de-scoped and went back to just handling the infrastructure…
The industry is now streamlining the complexity of cloud computing, which is long overdue but insufficient. As growing use of cloud infrastructure increases complexity even further, it will continue to drive the need for more complex organizations to streamline the technical complexity. The desire for simplicity as we journey from startups to infrastructure-platform orgs isn’t met at the journey’s end. Instead, the journey settles into an incremental yet cyclical steady state of streamlined complexity, one that never fully tames the complexity itself.
Simplicity isn’t found in the infrastructure-platform engineering reorganization loop. To find it, we need a paradigm shift. That paradigm shift is beginning, and it’s ready for you to take part. Learn more about it by visiting Klotho and InfraCopilot.