How Bad is the “Cloud-Native” Problem for Operators?

Light Reading sets a lot of the dialog in the industry, and so when they raise a topic it’s important to me, to my clients, and to those who read my blog.  A recent example is an article citing a discussion at the TMF’s Digital Transformation World event in Nice.  The piece recounts operator frustration with the “cloud native” claim and concepts.  This mirrors the complaints I’ve heard myself, and I want to cover those complaints here.

The article’s main thrust is introduced with comments from Telecom Italia, including “We firmly need vendors to step up in order for us to be able to give to our friends what they need”, referring to the needs of business customers.  According to the operator’s spokesperson, the specific issue is in cloud-native technology and its implications.  “We are talking about adding software and core at the edge and we need to have orchestration. We have plenty of orchestration but not yet the right one and that is something vendors need to work on. We are talking about cloud-native and guess what? Vendors are not yet delivering cloud-native software for us. Time is running out.”

I agree with the sentiment expressed here; as I said, I’ve heard the complaints myself.  I also agree that vendors haven’t exactly been forthcoming with their cloud-native products, not exactly truthful in their claims, and not at all responsive in meeting operator needs.  So, is it time for another vendor indictment?  Only in part, because part of the problem is in the notion of “cloud-native” itself, and part is with the operators.

Cloud-native means “designed explicitly to take advantage of the features of cloud computing”.  If you think about that for a minute, you realize that the whole concept flies in the face of much of what we used to think was essential in cloud services.  Current software is not cloud-native.  If you port it to the cloud, the result is non-cloud-native software running in the cloud.  If you had cloud-native software now, you’d already be running it in the cloud.  That means cloud-native is about developing software explicitly for the cloud.

Software runs on a platform that consists of hardware (the server), an operating system, and “middleware” that provides access to incremental specialized features.  If you ask a software developer to write software, the logical first question is “what does it run on?” because you’d need to know the interfaces, the APIs, that are exposed by the platform and thus available to the software.  Let’s ask that obvious question: what is the API set that defines “cloud-native”?  Answer: We don’t know.  It’s not that we don’t have any; we have too many.  The public cloud providers all offer the cloud equivalent of “middleware” in their web services.  If you had to write software that would run on any operating system with any set of middleware tools, what would you end up with?  Probably the classic one-line “Hello World” program, because doing anything else would make you platform-specific.

Carrier cloud is a kind of nascent public cloud: we don’t have it yet, but most operators say they want it and most also have specific things they think it will do.  Making it do those things in a “cloud-native” way has to start with a definition of what the carrier cloud platform would look like to a programmer.  Once that’s been covered, it would be possible to define how applications for the carrier cloud would look, and from that we could launch development projects.

Who defines the platform?  If the answer is “vendors”, then we’ll probably have at least a couple dozen platforms, not one.  That means that there will be no “carrier cloud” but rather a bunch of incompatible versions of one, which reduces the chances of anyone developing much for any of them.  If the answer is “operators”, then we also have a couple dozen platform choices, but this time they’ll all be suppositional and described in terms of what they should do, not what they consist of.  You can’t write any code for that mess.  If you say “standards groups”, you’ll wait five years to see whether any result emerges and whether it’s useful.  If “open source” is the answer, you’d need to identify the open-source elements of the platform, which brings you back to the same question of who does the defining.

There probably is no right answer to how to get to a carrier cloud platform, which is the biggest reason why we don’t have cloud-native technology of the kind Telecom Italia wants.  There is, however, a best answer.

The cloud computing community has been feverishly expanding the scope of tools that contribute to cloud-native functionality.  There are often multiple solutions to the same problem, but because the tools build on each other to create ecosystems, we already have a fairly limited number of ecosystems with fairly standard elements.  We know cloud-native is about containers.  We know Kubernetes is the way orchestration will be done.  We know that there will be a service mesh employed to connect and load-balance elements of our applications.  We know how an application looks, how it connects.  We know, in short, quite a bit.

What don’t we know?  First, we don’t yet know which tools we’d choose for the service mesh, monitoring, and management.  We don’t know how orchestration extends to controlling state for stateless microservices or serverless functions, though we have some implementations.  What we know least about is how to address the specific issues network operators face, such as how data-plane traffic among high-speed software instances of device functionality should flow in an application mesh.

Most of all, we don’t know “functional orchestration”.  Yes, orchestration has been exploding as a topic in the cloud space, for deployments in containers, and even in a limited sense in NFV, but this is what we could call “structural orchestration”, the creation of the application or service structure.  In a true cloud-native system, with elements that are stateless, you need to orchestrate the event flows through the application or service in order to get components that know nothing about context (microservices) to behave systematically.

Ironically, the very TMF that hosted the meeting Light Reading was covering had the answer to this ages ago, with its NGOSS Contract work, which proposed that a data model mediate the event-to-process relationships and set the context for stateless processes.  Without this approach, it’s very difficult to build a cloud-native version of any application or a cloud-native implementation of any network service.
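To make the idea of a data model mediating events concrete, here is a minimal sketch in Python.  It is not the TMF’s NGOSS Contract specification; it only illustrates the general pattern under stated assumptions.  The service element record, its state names, the events, and the process functions (deploy, record_fault, redeploy) are all hypothetical.  The point is that the processes hold no state of their own: all context lives in the model, and a state/event table in that model decides which process handles each event and what state comes next.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical service-model record: ALL context lives here, not in the processes.
@dataclass
class ServiceElement:
    name: str
    state: str = "ordered"
    data: Dict[str, str] = field(default_factory=dict)

# A stateless process receives a copy of the context plus the event and
# returns the updates it wants written back; it remembers nothing itself.
Process = Callable[[Dict[str, str], str], Dict[str, str]]

def deploy(data: Dict[str, str], event: str) -> Dict[str, str]:
    return {"deployed": "yes"}

def record_fault(data: Dict[str, str], event: str) -> Dict[str, str]:
    return {"last_fault": event}

def redeploy(data: Dict[str, str], event: str) -> Dict[str, str]:
    count = int(data.get("redeploys", "0")) + 1
    return {"redeploys": str(count)}

# Hypothetical state/event table: (current state, event) -> (process, next state).
STATE_EVENT_TABLE: Dict[Tuple[str, str], Tuple[Process, str]] = {
    ("ordered", "activate"): (deploy, "active"),
    ("active", "fault"): (record_fault, "degraded"),
    ("degraded", "repair"): (redeploy, "active"),
}

def dispatch(elem: ServiceElement, event: str) -> str:
    """Route an event to a process using the model's state, then update the model."""
    key = (elem.state, event)
    if key not in STATE_EVENT_TABLE:
        # Event not meaningful in this state; ignoring it is one possible policy.
        return elem.state
    process, next_state = STATE_EVENT_TABLE[key]
    updates = process(dict(elem.data), event)  # process sees a copy of the context
    elem.data.update(updates)                  # the model, not the process, keeps state
    elem.state = next_state
    return elem.state
```

Because the processes hold nothing between calls, any instance of them could handle any event, which is what lets them scale and be replaced freely in the cloud; the sequencing logic sits in the model rather than in the code.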

The point here is that we don’t have a huge problem with our carrier cloud architecture, except perhaps in the political sense.  That political problem was generated because we started out by trying to define how to build carrier cloud applications with no framework in mind, and so we cobbled together stuff.  That stuff isn’t cloud-native because the framework we seized on isn’t cloud-native and never will be.  The cobbling created fragmentation, which is one of the points the article makes about the current cloud-native situation.  Nothing fits with everything, because there was no everything-framework to start with.

We have to accept that all that early stuff was done wrong, and we have to start working on doing it right.