I think most people would agree that there’s a fusion underway among cloud computing, virtual networking, and NFV. I think the relationship goes deeper than that: the future of both network infrastructure and computing depends on harmonizing all of it into a single trend. Even Google, which is probably as far along as anyone on this point, would admit we’re not there yet, but Google seems to know the way and is heading in that direction. I had an opportunity to dig deep into Google’s approach late last week, and we all need to be looking carefully at their ideas.
The cloud is exploding, taking over the world, or so you’d think from reading the stories. Wrong. Right now we’re in the bearskins-and-stone-knives phase of cloud computing, and we’re still knuckle-dragging in the NFV space. The reason, arguably, is that we’ve approached both from a logical but suboptimal starting point.
Cloud computing for businesses today is mostly about hosted server consolidation, meaning it focuses on taking an application and running it in a virtual machine. Yes, that’s a positive step. Yes, anything that’s going to make use of pooled server resources is going to eventually decompose to deploying something on a VM or into a container. But what makes an ecosystem is the network relationship of the elements, and the classic cloud vision doesn’t really stress that. We already see issues arising as we look to deploy systems of components that cooperate to create an application. We can expect worse when we start to think about having applications actually exploit the unique capabilities of the cloud.
NFV today is about virtualizing network functions. Since the presumption is that the functions run on some form of commercial off-the-shelf server, virtualizing them so they can be hosted there is an essential step. But if functions shed the boundaries of devices when they become virtual, shouldn’t we think about the structure of services more deeply than assuming we succeed by just substituting networks of instances for networks of boxes?
Virtualization is about creating an abstract asset to represent a system of resources. An application sees a virtual server rather than a distributed, connected pool of servers. It also sees an abstract super-box instead of a whole set of nodes and trunks and protocol layers. What’s inside the box, and how the mapping occurs, is where the magic comes in. My point is that if we constrain virtualization to describe the same stuff we had without it, have we really made progress?
Paraphrasing what a Google pundit said at an ONF conference, network virtualization enables cloud benefits. You can’t build the network the old way if you want the cloud to be a new paradigm. Whatever is inside the compute or virtual-network abstraction, networking connects it and makes an ecosystem out of piece parts. In short, we have to translate our cloud goals to cloud reality from a network-first perspective.
Well, what are the goals? Google’s experience has shown that the keys to the cloud, in a technical sense, are to reduce latency to a minimum and, in a related sense, to recognize that it’s easier to move applications to data than the other way around. Service value is diluted if latency is high, of course. One way to reduce it is to improve network connectivity within the cloud. Another is to send processes to where they’re needed, with the assurance that you can run them there and connect them into the application ecosystem they support. Agile, dynamic composition. I think that’s also a fair statement of NFV’s technical requirements.
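To make the “move applications to data” point concrete, here’s a minimal sketch in Python of the kind of latency-aware placement decision such a cloud has to make constantly. The EdgeSite structure, site names, and numbers are invented for illustration; this isn’t anyone’s actual scheduler.

```python
# Hypothetical sketch: latency-aware placement of a process near its data.
# EdgeSite, the site names, and the latency/capacity figures are illustrative
# assumptions, not any real platform's API or data.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeSite:
    name: str
    rtt_ms_to_data: float   # measured round-trip time to the data source
    free_vcpus: int         # spare capacity at the site


def place_near_data(sites: List[EdgeSite], needed_vcpus: int) -> Optional[EdgeSite]:
    """Pick the lowest-latency site that can actually host the process."""
    candidates = [s for s in sites if s.free_vcpus >= needed_vcpus]
    return min(candidates, key=lambda s: s.rtt_ms_to_data, default=None)


if __name__ == "__main__":
    sites = [
        EdgeSite("metro-east", rtt_ms_to_data=4.0, free_vcpus=16),
        EdgeSite("metro-west", rtt_ms_to_data=11.0, free_vcpus=64),
        EdgeSite("regional-hub", rtt_ms_to_data=28.0, free_vcpus=512),
    ]
    chosen = place_near_data(sites, needed_vcpus=8)
    print(f"host the process at: {chosen.name if chosen else 'no capacity anywhere'}")
```

The point of the exercise is that placement becomes a network decision (which site is closest to the data) before it’s a compute decision (which site has capacity), which is exactly the network-first framing I’m arguing for.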
If this is the cloud mantra, then network-centricity isn’t just a logical approach, it’s the inevitable consequence. Google is increasingly framing its cloud around Andromeda, its network architecture. Even OpenStack, which arguably started by building networks to connect compute instances, seems to be gradually moving toward building networks and adding VMs to them. If you sum all the developments, they point to getting to network virtualization using five corollary principles or requirements:
- You have to have a lot of places near the edge to host processes, rather than hosting in a small number (one, in the limiting case) of centralized complexes. Google’s map showing its process hosting sites looks like a map showing the major global population centers.
- You have to build applications explicitly to support this sort of process-to-need migration. It’s surprising how many application architectures today harken back to the days of the mainframe computer and even (gasp!) punched cards. Google has been evolving software development to create more inherent application/component agility.
- Process centers have to be networked so well that latency among them is minimal. The real service of the network of the future is process hosting, and it will look a lot more like data center interconnect (DCI) than what we think of today. In Google’s network this is handled with an SDN core, plus hosted-BGP technology at the edge of the core that looks a lot like a virtual router control plane.
- The “service network” has to be entirely virtual, and entirely insulated from the physical network. You don’t partition address spaces as much as provide truly independent networks that can use whatever address space they like. But some process elements have to be members of multiple address spaces, and address-to-process assignment has to be intrinsically capable of load-balancing. This is what Google does with Andromeda.
- If service or “network” functions are to be optimal, they need to be built to a “design pattern,” a set of development rules and APIs, so that they’re consistent in their structure and relate to the service ecosystem in a common way (a rough sketch of what such a pattern might look like follows this list). Andromeda defines this kind of structure too, harnessing not only hosted functions but also in-line packet processors with function agility.
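As a rough illustration of the “common design pattern” idea in that last point, the sketch below shows, in Python, the kind of uniform contract that would let an orchestrator treat every function the same way. The VirtualFunction interface, its method names, and the toy filter are assumptions I’ve made for illustration; they do not reflect Andromeda’s real APIs.

```python
# Hypothetical sketch of a common "design pattern" for hosted service functions.
# The VirtualFunction interface, its method names, and the toy filter below are
# illustrative assumptions, not Andromeda's (or any real platform's) actual APIs.
from abc import ABC, abstractmethod
from typing import Optional


class VirtualFunction(ABC):
    """A uniform lifecycle and packet-path contract, so an orchestrator can
    deploy, connect, scale, and relocate any function in the same way."""

    @abstractmethod
    def configure(self, params: dict) -> None:
        """Apply function-specific configuration (addresses, policies, etc.)."""

    @abstractmethod
    def process(self, packet: bytes) -> Optional[bytes]:
        """Handle one packet; return it (possibly modified) or None to drop it."""

    @abstractmethod
    def health(self) -> dict:
        """Report state the orchestrator can use for scaling and repair."""


class PrefixFilter(VirtualFunction):
    """A trivial example function: drop packets that start with blocked byte prefixes."""

    def __init__(self) -> None:
        self.blocked = set()

    def configure(self, params: dict) -> None:
        self.blocked = {p.encode("ascii") for p in params.get("blocked_prefixes", [])}

    def process(self, packet: bytes) -> Optional[bytes]:
        return None if any(packet.startswith(p) for p in self.blocked) else packet

    def health(self) -> dict:
        return {"status": "up", "blocked_prefixes": len(self.blocked)}
```

The specific methods don’t matter; what matters is that every function relates to the rest of the service ecosystem through the same contract, which is what makes function agility and in-place substitution practical.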
A cloud built using these principles would really correspond more to a PaaS than to simple IaaS, and the reason is that the goal is to create a new kind of application and not to run old ones. What I think sets Google apart from other cloud proponents is their dedication to that vision, a dedication that’s led them to structure their own business around their own cloud architecture. Amazon may lead in cloud services to others, but Google leads in the mass of cloud resources deployed and the service consumption their cloud supports.
So let’s say that this is the “ideal cloud.” You can see that the biggest challenge that a prospective cloud (or NFV) provider would face is the need to pre-load the investment to secure the structure that could support this cloud model. You can’t evolve into this model from a single data center because you violate most of the rules with your first deployment and you don’t even validate the business or technology goals you’ve set.
You can probably see now why I’ve said that the network operators could be natural winners in the cloud race. They have edge real estate to exploit, real estate others would have to acquire at significant cost. They also have a number of applications that could help jump-start the ROI from a large-scale edge deployment—most notably the mobile/5G stuff, but also CDN. Remember, CDN is one of the biggest pieces in Google’s service infrastructure.
Some of the operators know the truth here. I had a conversation perhaps three years ago with a big US Tier One, and asked where they believed they would deploy NFV data centers. The response was “Every place we have real estate!” Interestingly, that same person, still at the same Tier One, is doubtful today that they could realize even 20% of that early goal.
Some vendors, reading this, will shout with delight and say that I’m only now coming around to their view. Operators, they say, have resisted doing the right thing with NFV and they need to toss cultural constraints aside and spend like sailors. Of course, incumbents in current infrastructure sectors have been saying operators needed to toss constraints and reinvest in legacy infrastructure to carry the growing traffic load, profit notwithstanding. I’m not saying any of this. What I’m saying is that the value of the cloud was realized by an OTT (Google) by framing a holistic model of services and infrastructure, validating it, and deploying it. The same thing would have to happen for anyone trying to succeed in the cloud, or in a cloud-centric application like NFV.
The reason we’re not doing what’s needed is often said to be delays in the standards, but that’s not true in my view. The problem is that we don’t have the goal Google had, that service/infrastructure ecosystem. We’re focused on IaaS as a business cloud service, and for NFV on virtual versions of the same old boxes connected in the same old way. As I said, piecework in either area won’t build the cloud of the future, carrier cloud or any other cloud.
AT&T and Orange, which have announced a partnership on NFV to speed things up, are probably going to end up defining their own approach; AT&T has already released its ECOMP plan. That approach will either converge on the Google model or put AT&T and those who follow ECOMP in a box. So why not just learn from Google? You can get all the detail you’d need just by viewing YouTube presentations from Google at ONF events.
This same structure would be the ideal model for IoT, as I’ve said before. Sending processes to data seems like the prescription for IoT handling, in fact. I believe in IoT based on the correct model, which is one built on big-data collection, analytics, and agile processes. Google clearly has all those things now, and if operators don’t want to be relegated to being the plumbers of the IoT skyscraper, they’ll need to start thinking about replicating the approach.