Microsoft has decided to go its own way on service mesh, or at least to try. Given that Istio, a Google development but still open-source, is the de facto standard in service mesh, why would Microsoft make that decision? Would a service mesh battle help the cloud-native space? Is there a carrier cloud dimension to all of this? Interesting things to speculate about, so let’s speculate.
One thing that doesn’t seem to demand much speculating is that the announcement timing is related to Google’s decision not to contribute Istio to the Cloud Native Computing Foundation (CNCF). The fact that Microsoft will contribute its Open Service Mesh to the CNCF is the most obvious point made in the announcement. Its support of the Service Mesh Interface (SMI) API, which is backed by CNCF members, is the second-most-obvious point. The third? It’s a “lightweight” implementation of service mesh.
All of the service mesh technologies in general use are based on something like the Envoy proxy, which is a software sidecar, an agent that links to a microservice and provides the microservice with basic connectivity features. Microsoft proposed last year to make SMI a universal API, an open way of communicating with services (and proxies), and thus enhancing service portability among mesh implementations. It seems pretty clear that Microsoft hopes that the combination of a lightweight service mesh and an open API will get Open Service Mesh a lot of traction.
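To make the “open way of communicating with services” idea concrete, a mesh-agnostic traffic policy under SMI looks roughly like the following. This is a sketch based on the SMI TrafficSplit resource; the service names are invented for illustration and any given mesh may support a different API version.

```yaml
# Hypothetical SMI TrafficSplit: send 10% of the traffic addressed to
# "checkout" to a canary version, whatever mesh sits underneath.
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: checkout-canary
spec:
  service: checkout          # the root service clients address
  backends:
  - service: checkout-v1     # current version
    weight: 90
  - service: checkout-v2     # the canary
    weight: 10
```

The point of SMI is that this same resource should work whether the mesh underneath is Istio, Linkerd, Consul, or Open Service Mesh, which is what makes it a portability play.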
That raises our first truly interesting question, which is “Why?” What could it be about service mesh that matters so much? Microsoft adopted Kubernetes, another Google project, after all. The answer here, I believe, is that service mesh is the foundation of cloud-native, as much as Kubernetes is the foundation of container deployments. Kubernetes is heading toward being a de facto standard, and if the market is heading to cloud-native, you can’t differentiate your cloud-native approach by adopting the same tools as everyone else.
Despite the way the market has talked about the cloud and cloud-native, it’s always been really a development issue. The cloud has features that the typical data center doesn’t have. If those features are to be fully exploited, applications have to be written to take full advantage of them, which is what “cloud-native” means. The application was written to run in the cloud, not moved there from somewhere it was already running.
The central adaptation of components to cloud-native status is making them into microservices, which are stateless (in some way) features that can be spun up and run, taken down, moved, and so forth, as conditions demand. Having a bunch of application features floating out there in the ether isn’t exactly how most application developers think, and the tools to make this work start with an abstraction of connectivity, or virtual connection fabric. That’s what service meshes provide.
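The “abstraction of connectivity” can be sketched in a few lines: the application never addresses a remote service directly, it talks only to a proxy on localhost (the sidecar), and the proxy handles the actual routing. This is a minimal, self-contained illustration of the pattern, not any real mesh’s API; all ports and names are invented, and a real sidecar like Envoy adds discovery, mTLS, retries, and telemetry at the point marked below.

```python
# Minimal sketch of the sidecar-proxy pattern behind service meshes.
# The app talks only to its local sidecar; the sidecar forwards to the
# actual backend. Ports and names are illustrative.
import http.client
import http.server
import threading

BACKEND_PORT = 18081   # the "real" microservice
SIDECAR_PORT = 18080   # the proxy the app actually talks to

class Backend(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from backend"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # keep the example quiet
        pass

class Sidecar(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # In a real mesh, this is where service discovery, mTLS,
        # retries, and telemetry would live.
        conn = http.client.HTTPConnection("localhost", BACKEND_PORT)
        conn.request("GET", self.path)
        resp = conn.getresponse()
        body = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass

def serve(port, handler):
    srv = http.server.HTTPServer(("localhost", port), handler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

backend = serve(BACKEND_PORT, Backend)
sidecar = serve(SIDECAR_PORT, Sidecar)

# The application only ever addresses its local sidecar.
conn = http.client.HTTPConnection("localhost", SIDECAR_PORT)
conn.request("GET", "/")
result = conn.getresponse().read().decode()
print(result)  # hello from backend

backend.shutdown()
sidecar.shutdown()
```

Because the application’s view of the network is just “localhost,” the mesh can move, scale, or replace the backend without the application noticing, which is exactly the property that makes stateless microservices practical.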
The reason this is important is that the cloud’s role in computing underwent a major transformation just last year, when enterprises recognized that a true “hybrid cloud” was a combination of their current data center and a cloud front-end component set that provided the user interface to browsers and mobile devices. This insight showed that you don’t “move something to the cloud”, but rather create something new in the cloud that mates with your current applications. That thinking is a stepping-stone to the realization that cloud-native applications would have to be truly different. What makes them different, as enterprises will eventually realize, starts with service meshes.
In July, I blogged about why Google might want to retain development direction control over Istio and other projects that it elected not to submit to the CNCF. My suggestion was that Google sees an architecture for cloud-native emerging, sees Istio as a key piece of that architecture, and wants to make sure Istio develops according to Google’s cloud-native vision. Their decision on governance control wouldn’t prevent others from exploiting Istio, but it would prevent having Istio go off-track, which implies Google has a track for it. Which implies it’s strategically important to Google, which implies it would be to Microsoft too.
The next interesting insight we can derive here relates to the “lightweight” property of Microsoft’s Open Service Mesh. Why would a lightweight version of service mesh be needed when there are at least three well-known full-feature ones (Istio, Linkerd, and Consul)? The answer is that all the current service meshes were designed with the full mission of service mesh in mind. Enterprises, having just figured out that there really was such a thing as “cloud-native development”, are hardly pushing the technical boundaries of service mesh thinking with their current projects. You don’t buy a backhoe when you need to plant a bush.
But you may need a backhoe eventually, if you’re going to dig enough holes. I think Open Service Mesh is a basic framework, not a complete competitor to things like Istio but a framework on which further feature development can be built. The CNCF, then, would be not only a way of rubbing Google’s nose in the you-know-what, but a way of encouraging open development of advanced service mesh features, based on the expanding experience and needs of enterprise cloud prospects and customers.
Whether this is going to work will obviously depend on two things. The first is the pace at which enterprises gain insight into the totality of cloud-native development techniques and their value propositions. The second is the pace at which open contributions to extend Open Service Mesh are offered and accepted. If the first horse in our race runs way ahead, then Open Service Mesh quickly becomes a shovel in a backhoe race. If the second outpaces the first, then Open Service Mesh can hope to address user needs as they develop, never confronting enterprises or enterprise software developers with more complexity than current applications demand.
If we rethink our horserace in terms of Google versus Microsoft, we could frame the race in terms of the pace of education versus the pace of development. Google needs to accelerate the full cloud-native understanding of the development community. They also need to ensure their own cloud can support the full range of Istio features, and do so with reasonable ease of use and cost. Microsoft needs to prime the pump on Open Service Mesh enhancement, seeding any reasonable project with resources and contributing insight into how those features should guide the expansion of service mesh overall.
Where, then, does “carrier cloud” fit into all of this? Recent surveys have suggested that there are no new standards areas that operators feel are critical to them; they reject most by a 2:1 margin or better. Surveys also suggest that operators believe that OSS/BSS modernization is hopelessly stalled (I found the same attitudes back in 2013). Operator initiatives in software design (NFV, ONAP) don’t suggest a lot of experience with or understanding of cloud-native development. Thus, I think, it would be safe to say that the operators are in no danger of pushing the boundaries of a basic service mesh.
And yet…5G function hosting is clearly a mission that all the public cloud providers think they could make money on. What could we say is the cause of carrier cloud outsource interest, if not lack of understanding of carrier cloud by the carriers themselves? We therefore could have the classic situation of a demand driver whose “demander” can’t muster much sophistication in meeting the demand themselves. That could favor an Open Service Mesh model; simple things for simple folk.
Then there’s the mission of 5G Core hosting itself. What telco standards group or open-source activity has framed out a carrier-cloud-ready architecture? There’s no reason to think that the 3GPP did, or could do, that. We could use a service mesh in the 5G control plane for sure. We could (IMHO) adapt it to serve as an agile data plane too. The technical requirements of service mesh in 5G control-plane applications are fairly basic, though (again, IMHO) still beyond the very basic Open Service Mesh feature set. The adaptation of Open Service Mesh to the control-plane mission would be simple to accomplish, and adding data plane not much harder.
That’s if you know what you’re doing, of course. Given that the 3GPP stuff isn’t really framed for cloud-native, some architecting of the structure, preserving the standard interfaces, will be necessary. Microsoft may well be in the best position to do all this adapting and adding. They bought Metaswitch, who has more experience with open, software-based, IMS/EPC/5G implementation than pretty much anyone out there.
The process could give Microsoft an edge in 5G, but also an edge in the battle for the hearts and minds of evolving service-mesh users. 5G signaling is a great event-driven application, a good test for a service mesh and microservice implementation. As Microsoft/Metaswitch goes through it, Microsoft could learn a lot about what’s really needed in Open Service Mesh enhancement, and make sure their own cloud service is optimized to provide it.
The pressure, then, is likely on Google first, and Amazon and IBM as well. Google needs to make 5G an Istio application, with a strong framework and whatever specialization is required. They have perhaps the best microservice and service mesh people in the industry, but I’m not sure where they are with respect to 5G details. They’ll need those skills if they want to ride 5G and carrier cloud to a broader service mesh (and public cloud) victory.
Or they could try to jump ahead, of course. There’s still the possibility of educating the enterprise buyer, but there are a lot of enterprises to be educated, and only a dozen or so big network operators who collectively account for the majority of the 100,000 potential new carrier cloud data centers. It’s the opportunity to get operators to outsource those data centers, and then to build on that base, that seems the most direct path to success here. It will be interesting to see who takes it.