Two Rules to Make Microservices and Cloud-Native Work

Oversimplification is never a good thing, and sometimes it can be downright destructive.  One of those times is when we look at “distributed” application models, which include such things as serverless and microservices.  Classical wisdom says that distributing applications down to microservices is good.  Classical wisdom says that hosted service features are best when distributed across the carrier cloud.  There’s a lot of goodness in that classical wisdom, but there are also some major oversimplifications.  The good news is that there’s almost surely a path to resolving some or all of the problems of distributed applications, and that could be the most profound thing we could do to promote a new computing model.

Software used to be “monolithic”, meaning that there was one big software application that ran in one place and did one specific thing.  A bank might run its demand deposit accounting (DDA, meaning savings and checking) application, another application for lending, another for investing, and so forth.  Because core business applications tend to be focused on database activity, it actually makes sense to think about many of these applications in monolithic terms.

There are two problems with monolithic programs.  First, they tend to be enormous, meaning that there’s a lot of code involved.  Their massive size, and the fact that structural rules to make them more maintainable are often ignored during development, make changing or fixing them difficult and even risky.  Some financial institutions I’ve worked with had core applications that they approached with real fear, and that’s in a highly regulated industry.  The second problem is that they are vulnerable.  If the program crashes, the system it’s running on fails, or the network connection to the site where the system is located breaks, the application is down.

Componentization, meaning the division of programs into components based on some logical subdivision of the tasks the programs do, is one step in fixing this.  Componentized programs are easier to maintain because they’re smaller, and even if good structural practices aren’t followed within the components, their smaller size makes it easier to follow the code.

Where things get a bit more complicated is when we see that componentization has two dimensions.  For decades, it’s been possible to divide programs into autonomous procedures/functions, meaning smaller units of code that were called on when needed.  These units of code were all part of one program, meaning they were loaded as a unit and thus generated what looked like monolithic applications.  Inside, they were componentized.

The second dimension came along fairly quickly.  If we had separate components, say our procedures or functions, why not allow them to be separately hosted?  A distributed component model did just that, breaking programs up into components that could be linked via a network connection.  Things like remote procedure calls (RPC), the Service Oriented Architecture (SOA), the Common Object Request Broker Architecture (CORBA), and more recent approaches like RESTful interfaces and the service mesh are all models supporting the distribution of components.

When you have distributed components, you have to think about things differently.  In particular, you have to consider three questions related to that approach.  First, where do you put things so that they can be discovered?  Second, what defines the pattern of workflow among the components, and how does that impact quality of experience?  Third, how do distributable-model benefits like resilience and scalability impact the first two?  We need to look at all three.

The general term used to describe the “where” point is orchestration.  The industry is converging on the notion that software is containerized, and that an orchestrator called Kubernetes does the work of putting things where they need to be.  Kubernetes provides (or supports) both deployment and discovery, and it can also handle redeployment in case of failures, and even some scaling under load.  If something more complex has to be deployed, you can add a service mesh technology (like Istio, by far the market leader in that space) to improve the load balancing and discovery processes.
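
To make the “where” mechanics concrete, here’s a minimal sketch of deploying and exposing one component through the Kubernetes API, using the official Python client.  The component name, image, and namespace are hypothetical, and a real deployment would add probes, resource requests, and the rest.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in a pod

labels = {"app": "step-1"}  # hypothetical component name

# Deployment: Kubernetes decides where the replicas run and restarts them on failure.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="step-1"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="step-1",
                    image="registry.example.com/step-1:1.0",  # hypothetical image
                    ports=[client.V1ContainerPort(container_port=8080)],
                )
            ]),
        ),
    ),
)
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)

# Service: a stable, discoverable name for the component, wherever its pods land.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="step-1"),
    spec=client.V1ServiceSpec(
        selector=labels,
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```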

A lot of people might say that service mesh tools are what determine distributed workflows, but that’s not really true.  A service mesh carries workflows, but what really determines them is the way the application was divided into components to start with, and how those components expect to communicate among themselves.  This is the place where oversimplification can kill the success of a distributed application or service.

Let’s say that we have a “transaction” that requires three “steps” to process.  One seemingly logical way to componentize the application that processes the transaction is to create Step-1, Step-2, and Step-3 components.  Sometimes that might work; often it’s a bad idea, and here’s why.

If the transaction processing steps are all required for all transactions, then every distributed component will have to participate in order.  To start off, that means that the delay associated with getting the output of Step-1 to the input of Step-2, and so on, will accumulate.  The more components, the more steps, the more delay.
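
A back-of-the-envelope sketch makes the accumulation obvious; the figures below are assumptions, not measurements.

```python
# Assumed one-way delays in milliseconds; substitute your own measurements.
processing_ms = [4.0, 6.0, 5.0]   # work done inside Step-1, -2, and -3
hop_ms = [2.5, 2.5]               # network hops Step-1 -> Step-2 and Step-2 -> Step-3

monolithic = sum(processing_ms)                  # all steps co-hosted: 15.0 ms
distributed = sum(processing_ms) + sum(hop_ms)   # add inter-component transit: 20.0 ms
print(monolithic, distributed)
```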

Where this step-wise componentization makes sense is if there’s a lot of stuff done in each step, so it takes time to complete it, and in particular if some (hopefully many) transactions don’t have to go all the way through the stream of steps.  This matches the cloud front-end model of applications today.  The user spends most of the time interacting with a cloud-front-end component set that is highly elastic and resilient, and when the interaction is done, the result is a single transaction that gets posted to the back-end process, usually in the data center.

One area of distributed components that doesn’t make sense is the “service chain” model NFV gave us.  If we have a bunch of components that are simply aligned in sequence, with everything that goes in the front coming out the back, we’ve got something that should be implemented in one place only, with no network delay between components.  The single-load or monolithic VNF in NFV would also make sense for another reason; it’s actually more reliable.

Suppose our Step-1 through Step-3 were three VNFs in a service chain.  A failure of any of the three components now breaks the service, and the MTBF of the service will be lower (the service will fail more often) than would be the case in a monolithic VNF.  For three steps, assuming the same MTBF for all the components, we’d have an MTBF a third of that base component MTBF.  In scalability planning for variable loads, consider that if everything has to go through all three steps in order, it’s going to be hard to scale one of them and have any impact on performance.
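
The arithmetic behind that claim is simple if we assume independent components with exponentially distributed failures: failure rates add across a series chain, so the chain’s MTBF is the reciprocal of the summed reciprocals.  A quick sketch:

```python
# Series failure model: the chain is up only if every VNF in it is up.
def chain_mtbf(component_mtbfs):
    return 1.0 / sum(1.0 / m for m in component_mtbfs)

print(chain_mtbf([10_000.0] * 3))   # three identical 10,000-hour VNFs -> ~3,333 hours
print(chain_mtbf([10_000.0]))       # the monolithic equivalent keeps the full 10,000
```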

We can take this lesson to edge computing.  Latency is our enemy, right?  That’s why we’re looking to host a distributable component at the edge, after all.  However, if that edge component relies on a “deeper” connection to a successor component for processing (a chain that goes user>edge>metro, for example), we’ve not improved latency end to end.  Only if we host all the processing of that transaction or event at the edge do we reduce latency.
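
A quick illustration with assumed (not measured) one-way delays shows why; if the edge component has to call deeper anyway, the edge hop is pure overhead.

```python
# Assumed one-way delays in ms; the point is the comparison, not the numbers.
user_to_edge, edge_to_metro, user_to_metro = 5, 15, 18

edge_only  = 2 * user_to_edge                     # all processing at the edge: 10 ms
edge_chain = 2 * (user_to_edge + edge_to_metro)   # edge hop plus a deeper call: 40 ms
metro_only = 2 * user_to_metro                    # skip the edge entirely: 36 ms
```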

Complex workflows generate more latency and reduce overall MTBF versus an optimized workflow that componentizes only where the components created actually make sense in workflow terms.  Don’t separate things that are always together.  One of the most profound uses of simulation in cloud and network design is simulating distributed component behavior in specific workflows.  That could tell you what’s actually going to improve things, and what’s likely to generate a massive, expensive, embarrassing application or service failure.
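
A toy version of that kind of simulation might look like the sketch below.  The sites, delays, and MTBF figures are assumptions; the point is that a given componentization and placement can be scored for both latency and failure exposure before anything is deployed.

```python
# Toy workflow simulator: each component has a site and a processing time, and
# the fabric charges a transit delay for every site-to-site hop in the workflow.
SITE_DELAY_MS = {("edge", "metro"): 12.0, ("metro", "core"): 8.0, ("edge", "core"): 20.0}

def transit(a, b):
    if a == b:
        return 0.0
    return SITE_DELAY_MS.get((a, b)) or SITE_DELAY_MS[(b, a)]

def evaluate(workflow):
    """workflow: ordered list of (site, processing_ms, mtbf_hours) tuples."""
    latency = sum(p for _, p, _ in workflow)
    latency += sum(transit(workflow[i][0], workflow[i + 1][0])
                   for i in range(len(workflow) - 1))
    mtbf = 1.0 / sum(1.0 / m for _, _, m in workflow)   # series failure model
    return latency, mtbf

# Distributed across edge and metro vs. co-hosted in the metro.
print(evaluate([("edge", 4, 10_000), ("metro", 6, 10_000), ("metro", 5, 10_000)]))
print(evaluate([("metro", 4, 10_000), ("metro", 6, 10_000), ("metro", 5, 10_000)]))
```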

You might conclude from this that not everything is a candidate for microservice implementation; in fact, most applications may not be.  Classical wisdom is wrong.  Edge hosting can increase latency.  Service chains reduce reliability and performance.  You have to plan all distributed applications, networks, and services using a new set of rules.  Then you can make the best trade-off, and the nature of that trade-off depends on something we typically don’t consider, and it’s more dynamic than we think.

The first rule of distributability, if there is such a thing, is that there exists a “work fabric” into which components are connected, and whose properties determine the optimal trade-offs in distributability.  Workflows move through the work fabric, which in practice is the vehicle through which inter-component connections are made.  The most important property of the work fabric is its delay gradient: the further work has to move through the fabric, the more delay it picks up.

The minimum delay in the fabric is the delay associated with component calling, which is typically small.  Then comes the network binding delay, which consists of the time needed to discover where work needs to go and dispatch it on its way.  Finally, there’s network delay itself.  Since the first two delay elements are roughly constant no matter where components are placed, the delay gradient is created by transit delay in the network.

That means that for distributability to be optimized, you need to minimize network delay.  You can’t speed up light or electrons, but you can reduce queuing and handling, and that should be considered mandatory if you want to build an efficient work fabric.
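
Put as a simple model (the constants here are assumed, not measured): inter-component delay is a fixed call-plus-binding cost plus a transit term, and only the transit term grows with distance, so only the transit term creates the gradient.

```python
# Assumed constants: call overhead and discovery/binding cost don't depend on
# distance; per-hop handling and propagation do, and they create the gradient.
CALL_MS, BIND_MS = 0.05, 0.5
PER_HOP_HANDLING_MS, PROPAGATION_MS_PER_KM = 0.2, 0.005   # ~5 microseconds/km in fiber

def fabric_delay(hops, km):
    return CALL_MS + BIND_MS + hops * PER_HOP_HANDLING_MS + km * PROPAGATION_MS_PER_KM

print(fabric_delay(0, 0))       # co-hosted: the 0.55 ms floor
print(fabric_delay(4, 50))      # same metro: ~1.6 ms
print(fabric_delay(12, 1500))   # cross-region: ~10.5 ms
```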

The second rule of distributability (same qualification!) is that an optimum distributed model will adapt to the workflows by moving its elements within the work fabric.  The best way to picture this is to say that workflows are attractive; if you have two components linked via a workflow, they attract each other through the workflow.  The optimum location for a component in a distributed work fabric is reached when the attracting forces of all the workflows it’s involved in have balanced each other.

This suggests that in service meshes or any other distributed structure, we should presume that over time the workflow properties of the application will combine with the work fabric to minimize collective latency.  We move stuff, in short.  Whenever a new instance of something is needed, spin it up where it better balances those workflow attractions.  Same with redeployment.  When you have to improve load-handling by scaling, you scale in a location that’s optimum for the workflows you expect to handle.
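
One simple way to model that attraction, as a hedged sketch: treat each workflow as a weight on the distance between its endpoints, and place (or re-place, or scale) a component where the weighted pull balances.  A weighted centroid over coarse fabric coordinates is a rough stand-in for that balance point.

```python
# Hedged sketch: peers is a list of (position, workflow_intensity) pairs, where
# position is a coarse (x, y) coordinate in the work fabric and intensity is
# messages (or bytes) per second flowing over that workflow.
def balance_point(peers):
    total = sum(w for _, w in peers)
    x = sum(p[0] * w for p, w in peers) / total
    y = sum(p[1] * w for p, w in peers) / total
    return (x, y)   # heavier workflows pull the component closer to their peers

# A component that talks mostly to an edge-resident peer drifts toward the edge.
print(balance_point([((0, 0), 900), ((100, 0), 100)]))   # -> (10.0, 0.0)
```

Minimizing real latency rather than weighted distance would call for something closer to a geometric median over the fabric’s actual delay matrix, but the centroid captures the intuition: heavy workflows pull components toward their partners.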

This has major implications for the optimization of load balancing.  You can’t just spin something up where you have the space and hope for the best.  You have to spin something up where it creates a better (or at least the best of the not-better) total delay.  You always look to minimize the delay gradient overall.

One minor implication of this is that every microservice/component model should work transparently if the components are attracted all the way into being co-hosted.  We should not demand separation of components.  If QoE demands, we should co-host them.  Any model of dynamic movement of components demands dynamic binding, and co-hosting means that we should be able to eliminate the network discovery and coupling processes in our work fabric if the components are running together.
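
In code terms, the binding decision might look like the sketch below.  The in-process registry and the HTTP fallback are hypothetical stand-ins for whatever discovery and transport the work fabric actually uses; the point is that co-hosted components short-circuit both.

```python
import json
import urllib.request

# Hypothetical in-process registry: components loaded into this runtime register
# a plain callable here, so co-hosted calls skip discovery and the network entirely.
LOCAL_COMPONENTS = {}

def invoke(component, payload):
    target = LOCAL_COMPONENTS.get(component)
    if target is not None:
        return target(payload)   # co-hosted: a direct function call, no network binding
    # Otherwise fall back to dynamic network binding (name resolution plus HTTP here).
    req = urllib.request.Request(
        f"http://{component}.default.svc/invoke",   # hypothetical service address
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```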

What does this then do to “orchestration”?  If the microservices are going to rush around based on workflow attractions, is where they start even meaningful?  Might we start everything co-hosted and migrate outward if workflow attractions say that proximity of components doesn’t add enough to QoE to compensate for the risk of a facility failure taking everything out?  Do we need anything more than approximation?  I don’t think this changes the dynamic of deployment (Kubernetes), but it might well change how we try to optimize where we host things.

This isn’t the kind of thing you hear about.  We’re not looking at either of these rules today, at least not in an organized way, and if we don’t, we’ll end up creating a work fabric model that won’t deliver the best quality of experience.  A good model might come from the top, or it might (and likely will) come as an evolution to the service mesh.  I do think it will come, though, because we need it to get the most from distributed applications and services.