Airtel Digital re-architected its Xstream and Wynk Music applications and the support infrastructure underpinning them to improve its ability to offer new features.
Airtel Xstream is an over-the-top (OTT) platform for film and TV streaming, while Wynk Music is a music and podcast streaming app.
Speaking at the recent AWS Innovate event, Airtel Digital’s head of DevOps Hitesh Bhatia said that both applications “serve personalised content at an exabyte scale” to tens of millions of listeners and viewers.
However, Bhatia said the company encountered “a certain set of challenges” as its streaming services grew, which “made us realise that our model was not scalable”.
While the applications were already built with a microservices-based architecture, the services “were tightly coupled”, which made changes difficult and hampered the company’s ability to deploy new features to customers in what is a highly competitive market.
“Eventually, we were building distributed monoliths,” Bhatia said.
“Our services were not independent - they were tightly coupled. We often deployed multiple services at the same time to prevent failures.
“Due to this, feature rollouts became very difficult, and we often had failures cascading to other services.”
Bhatia said that testing and development became “very difficult and time consuming”, and that a developer needed to have “knowledge of the full codebase” to make effective changes.
In addition, the microservices themselves “were becoming over-chatty” due to their tight coupling, meaning they were becoming expensive to run.
“All these factors slowed down feature rollouts, and time to market suffered heavily,” he said.
One of the first steps to resolving this was to apply the single responsibility principle to microservices, “where each microservice had one responsibility, and the responsibility to do it well”.
Bhatia said that teams were also restructured into pods - “a small group of focused, self-sufficient individuals”, with their work mapped to the way applications were designed.
But more than that, the company “re-architected, rebuilt and re-hosted the applications and our infrastructure stack”.
One of the key changes was that Airtel Digital “segregated out infrastructure for each microservice to remove any kind of dependency on the other service”.
Another change has seen the company shift its infrastructure stack more “towards Amazon Managed Services (AMS), like Application Load Balancer, ECS [Elastic Container Service], Kinesis and DynamoDB,” Bhatia said.
“[This] allowed us to build fast, focus more on business requirements rather than administrative tasks, and rebuild our observability stack,” he said.
The new observability stack is based on Prometheus, and includes other components such as Grafana and Telegraf.
A simple example of one of its uses is “to track application errors before and after deployment, to check the health of the last deployment,” Bhatia said.
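As a rough illustration of that kind of check, the sketch below compares an error rate measured before a deployment with one measured after it. It is not Airtel Digital's implementation: the `Window` class, the `deployment_is_healthy` function and the one-percentage-point threshold are all hypothetical, and in practice the totals would come from queries against a metrics store such as Prometheus rather than hard-coded numbers.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request and error totals for one time window (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Treat a window with no traffic as having a zero error rate.
        return self.errors / self.requests if self.requests else 0.0


def deployment_is_healthy(before: Window, after: Window,
                          max_increase: float = 0.01) -> bool:
    """Return True if the error rate rose by no more than max_increase.

    max_increase is an assumed threshold, expressed as a fraction of requests.
    """
    return after.error_rate - before.error_rate <= max_increase


# Illustrative numbers only, not real measurements.
before = Window(requests=10_000, errors=50)   # 0.5% errors before the deploy
after = Window(requests=10_000, errors=400)   # 4.0% errors after the deploy
print(deployment_is_healthy(before, after))   # False
```

A check like this could run shortly after each rollout, with a failed result prompting a closer look or a rollback.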
Bhatia added that the result of all the work is better-designed microservices and an application architecture that can scale to meet growing streaming needs, since each component of the architecture can be scaled up independently as and when required.
He also said that fault isolation had improved, as had the company’s ability to limit the “impact radius of any failure”.
“Our feature rollout process and time-to-market improved as well,” he said.
“The infrastructure also complemented our microservices. Now we are able to respond better to variations in traffic, and [our] focus goes into building business use cases rather than administrative tasks.”