Black Friday, Cyber Monday, marketing campaigns, TV spots: exciting things are happening for e-commerce and marketplace teams! A crowd of customers is flooding the apps and buying goods and services.
Meanwhile, in front of their computers, engineers are nervous about whether their infrastructure and software will gracefully handle these peaks of activity. Campaign-driven bursts of traffic are often unpredictable and aggressive, multiplying requests per second (RPS) by a factor of ten. At any point in time, the infrastructure must be ready to scale instantly.
This is especially true for the payment processing part of the user journey. All the effort of acquiring a customer and driving them through the funnel finally converges on one ultimate question: “was the payment successfully authorized?” There is no room for uncertainty or approximation at this stage, and no room for failure after so much effort.
Scaling the infrastructure vertically or horizontally?
How can payment engineers handle a twentyfold traffic multiplier within a second without compromising the availability and accuracy of their systems?
A common answer to that question is to scale the infrastructure vertically. Modern cloud providers like AWS or GCP make it quite easy to increase computing and networking capacity, which can be a quick win for engineers under time pressure to deliver immediate performance improvements. However, constantly over-provisioning the infrastructure is very costly, so this should only happen episodically, when you know in advance that such a burst of traffic is coming, which is often not the case.
Additionally, no matter how much money is thrown at it, there are always hard limits, like instance sizes or network I/O, on how far an infrastructure can really scale vertically.
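To make that concrete, here is a minimal sketch of a one-off vertical resize using the AWS SDK for JavaScript v3. The instance id, region, and target type are placeholders, and an EC2 instance must be stopped before its type can be changed:

```typescript
import { EC2Client, ModifyInstanceAttributeCommand } from "@aws-sdk/client-ec2";

// One-off vertical scaling: bump a stopped EC2 instance to a larger type.
// Instance id, region, and target type below are placeholders.
const ec2 = new EC2Client({ region: "eu-central-1" });

async function resizeInstance(): Promise<void> {
  await ec2.send(
    new ModifyInstanceAttributeCommand({
      InstanceId: "i-0123456789abcdef0",
      InstanceType: { Value: "m5.4xlarge" }, // bigger box, bigger bill
    }),
  );
}

resizeInstance().catch(console.error);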
Opting for a distributed architecture and sharding data stores is always a good practice: it lets you scale horizontally and keeps costs proportional to traffic. But even with an optimal design and the latest technologies, it is hard to bootstrap new instances fast enough when the traffic burst is this sudden and aggressive.
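To illustrate one common sharding technique, here is a minimal consistent-hashing sketch: each key maps to a stable shard, and adding a shard only remaps a small fraction of keys. The `ShardRing` class and shard names are hypothetical, not a specific product API:

```typescript
import { createHash } from "crypto";

// Minimal consistent-hash ring: keys map to shards, and adding a shard
// only remaps a small fraction of keys instead of reshuffling everything.
class ShardRing {
  private ring = new Map<number, string>(); // hash point -> shard name
  private points: number[] = [];            // sorted hash points

  constructor(shards: string[], private vnodes = 100) {
    shards.forEach((s) => this.addShard(s));
  }

  private hash(value: string): number {
    return createHash("md5").update(value).digest().readUInt32BE(0);
  }

  addShard(shard: string): void {
    // Virtual nodes spread each shard around the ring for better balance.
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.set(this.hash(`${shard}#${i}`), shard);
    }
    this.points = [...this.ring.keys()].sort((a, b) => a - b);
  }

  shardFor(key: string): string {
    const h = this.hash(key);
    // First ring point clockwise from the key's hash (wrap to the start).
    const point = this.points.find((p) => p >= h) ?? this.points[0];
    return this.ring.get(point)!;
  }
}

// Route each payment record to a shard by a stable key (e.g., merchant id).
const ring = new ShardRing(["payments-db-1", "payments-db-2", "payments-db-3"]);
console.log(ring.shardFor("merchant-42")); // always the same shard for this key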
Pre-computing some operations is a good way to serve faster and cheaper responses to clients. Advanced caching strategies in particular are your ally for withstanding bursts of traffic while the rest of the infrastructure scales in the background.
Unfortunately for payment engineers, almost everything they do is dynamic, depending on the context of the checkout: the user, the cart items, and the payment method. So the surface of what can be cached in payment processing is generally limited to the payment method lookup, assuming the filtering rules aren’t too advanced.
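As an illustration, a small in-memory cache with a short TTL can front that lookup. Everything below is a sketch: `fetchPaymentMethods` and the keying scheme are hypothetical stand-ins for the real service:

```typescript
// Minimal in-memory TTL cache in front of a (hypothetical) payment method
// lookup. During a burst, repeated lookups for the same checkout context
// are served from memory while the backing service scales.
type Entry<T> = { value: T; expiresAt: number };

class TtlCache<T> {
  private store = new Map<string, Entry<T>>();
  constructor(private ttlMs: number) {}

  async getOrLoad(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.store.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value;
    const value = await load(); // cache miss: hit the real service once
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}

// Hypothetical upstream call that computes the eligible payment methods.
async function fetchPaymentMethods(country: string, currency: string): Promise<string[]> {
  return ["card", "paypal", "sofort"]; // stand-in for the real lookup
}

const methodCache = new TtlCache<string[]>(60_000); // 60s TTL

async function lookupPaymentMethods(country: string, currency: string) {
  // Key on the parts of the checkout context that actually drive the result.
  return methodCache.getOrLoad(`${country}:${currency}`, () =>
    fetchPaymentMethods(country, currency),
  );
}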
Asynchronous authorization to support scale
Another design mechanism commonly leveraged by engineers to achieve high availability is to become eventually consistent. Even though it is a good practice with virtues beyond scalability, not all use cases can afford that share of uncertainty. Typically, the result of a payment authorization cannot be approximate, as it gates many subsequent actions for merchants, such as delivering a good or a service. To make matters worse, this operation generally involves many upstream third parties (PSPs, networks, acquirers, issuers, etc.), which increases both the latency and the risk of failure.
A good general practice we encourage at Payrails is to make the payment authorization asynchronous. The user experience stays the same, but in the background clients receive an HTTP 202 acknowledgment that the request was valid and taken into account. Subsequently, clients either wait for a notification or long-poll the payment entity to fetch the authorization status. During burst episodes, this asynchronous design is more graceful and scalable than holding that many open connections while juggling timeouts and retries.
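Here is a minimal sketch of that flow, assuming an Express-style service; the endpoints, the in-memory store, and the `authorizeWithUpstreams` helper are illustrative, not the actual Payrails API:

```typescript
import express from "express";
import { randomUUID } from "crypto";

const app = express();
app.use(express.json());

// In-memory stand-in for the payment entity store.
const payments = new Map<string, { status: "pending" | "authorized" | "declined" }>();

app.post("/payments", (req, res) => {
  const id = randomUUID();
  payments.set(id, { status: "pending" });

  // Kick off the slow, third-party-heavy authorization out of band.
  authorizeWithUpstreams(id).catch(() => payments.set(id, { status: "declined" }));

  // 202: request accepted; the client polls (or receives a webhook) for the result.
  res.status(202).location(`/payments/${id}`).json({ id, status: "pending" });
});

app.get("/payments/:id", (req, res) => {
  const payment = payments.get(req.params.id);
  if (!payment) {
    res.sendStatus(404);
    return;
  }
  res.json({ id: req.params.id, status: payment.status });
});

// Hypothetical upstream orchestration (PSP, network, acquirer, issuer...).
async function authorizeWithUpstreams(id: string): Promise<void> {
  await new Promise((r) => setTimeout(r, 2_000)); // simulate upstream latency
  payments.set(id, { status: "authorized" });
}

app.listen(3000);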
Managing bursts is often about buying time while the infrastructure automatically scales up in the background. For that, buffering requests and, ultimately, rate-limiting are good allies. Both are nowadays easier to implement at the container level with service meshes like Istio.io or Linkerd.io.
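For illustration, the admission logic a mesh applies at the proxy level can be sketched in application code as a token bucket with a short wait buffer; the class, capacity, and rates below are arbitrary:

```typescript
// Application-level sketch of what a mesh enforces at the edge:
// a token bucket that admits steady traffic, buffers a short backlog,
// and sheds load once the wait budget is exhausted.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSec: number) {
    this.tokens = capacity;
  }

  tryTake(): boolean {
    const now = Date.now();
    // Refill proportionally to elapsed time, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSec,
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(100, 100); // burst of 100, 100 req/s sustained

async function admit(maxWaitMs = 500): Promise<boolean> {
  const deadline = Date.now() + maxWaitMs;
  while (Date.now() < deadline) {
    if (bucket.tryTake()) return true; // admitted
    await new Promise((r) => setTimeout(r, 25)); // buffer: wait and retry
  }
  return false; // shed: caller should respond 429 with a Retry-After header
}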
It is important to note that both techniques affect the user experience: buffering increases latency, and rate-limiting means a request is denied and then retried. If the infrastructure scales fast enough in the background, the inconvenience should resolve within a few seconds.
Ultimately, to avoid overwhelming retries during bursts, you can route a share of your traffic (e.g., the share you rate-limit) to a failover journey that offers a sub-optimal user experience but can still process payments. Hosted payment pages and components are a great way to do that.
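A sketch of that routing decision, with a hypothetical hosted checkout URL and a stubbed admission check standing in for the real rate-limit signal:

```typescript
import express from "express";

const HOSTED_CHECKOUT_URL = "https://pay.example.com/hosted-checkout"; // placeholder

const app = express();

// Stand-in for the admission decision (e.g., the token bucket above,
// or a signal from the mesh that this request was rate-limited).
async function admit(): Promise<boolean> {
  return Math.random() > 0.1; // pretend ~10% of burst traffic is shed
}

app.get("/checkout", async (req, res) => {
  if (await admit()) {
    res.json({ journey: "native" }); // normal in-app payment flow
  } else {
    // Degraded but functional: hand the session to the hosted payment page
    // instead of returning a hard failure that triggers retries.
    res.redirect(303, `${HOSTED_CHECKOUT_URL}?session=${String(req.query.session ?? "")}`);
  }
});

app.listen(3001);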
The Payrails team understands the challenge of scaling payment acceptance
For payment engineers to join their business partners in celebrating high-traffic marketing events, they must be obsessed with scalability and resilience from the earliest stages of feature, software architecture, and infrastructure design. There is no easy way to manage bursts, but there are many options you can leverage by design. All the strategies mentioned here can be useful in different contexts, or assembled as lines of defense, to ensure payments are always available for your customers.
This is a long and steady journey, but by constantly pushing your limits and making load tests and disaster recovery exercises part of your software development lifecycle, you should be able to sleep well at night :)
Our team of payment experts has built an infrastructure that processes millions of payments every day, and we are happy to chat about your specific payment challenges and how we could help you solve them.