Are Problems with Virtual Networks Rooted in Abstraction Shortfalls?

What we have here is a failure of abstraction.  Yes, a stylistic variant on the famous “failure to communicate” line in “Cool Hand Luke”, but true, and never more so than at this dawn of a new decade.  In 2020, we need to fix some of the problems we’ve created in the past, and to do that, we need to understand what was “wrong” so we can make it “right”.  A little communicating wouldn’t hurt, either.

I don’t think anyone would dispute that transformation in networking has been, still is, and will be dependent on “virtualization”.  That’s a broad term that even Wikipedia tends to define circularly (their definition uses the word “virtual”), but in popular usage it means creating an element that stands in for another.  Users “see” the virtual element, and a mapping between virtual and real completes the connection of user to reality.  The virtual element, then, is an abstraction.

In networking, we’ve had “virtual private networks” or VPNs for years; they are abstractions of the router-and-circuit private networks that few today even remember.  The VPN abstraction is mapped to shared infrastructure in a variety of ways, including RFC 2547 (BGP/MPLS VPNs), MPLS itself, SD-WAN, SDN, and a host of overlay-network technologies that tend to be called either “SDN” or “SD-WAN”.  In all of these cases, the goal is to create an abstraction of private-IP connectivity, then map it to suitable infrastructure.  In effect, virtual private networks virtualized a giant router, then mapped it to a network of things.

In computing, virtualization had, and still has, multiple roots.  Virtual machines based on hypervisors date back to the late ’60s, and IBM’s commercial VM/CMS system was arguably the first widely deployed model.  Containers came along a bit over a decade later, starting as what was essentially a partitioning of the UNIX file system and evolving into something that partitioned operating-system and hardware resources differently from the way VMs do.  In both the VM and container evolutions, the constant was that applications and components were what ran inside them.

The cloud has taken a somewhat different view, one that we’ll see is the root of our current dilemma.  The cloud presented a “service” view: “Infrastructure”, “Platform”, or “Software” as a Service.  Yes, there are often discrete virtual-server things in the cloud, but the properties of the cloud itself are what’s important.  The cloud is a higher level of abstraction, and it demonstrated two things: that we can choose what the abstraction is in “virtualization”, and that the abstraction may present features and capabilities that the lower-level “real” resources don’t, because the mapping of abstraction to reality can itself create features and capabilities.

We can choose what to abstract.  We can choose what properties our abstraction will have by defining its features as we like, then mapping those defined features to resource behavior.  Virtualization in this form becomes a way of taking complicated things that don’t exist as single elements and making them simple to manipulate and use.  The abstraction is more than the sum of the parts, more important than the actual resources.  The mapping is the real product of the abstraction.  This is the lesson of the cloud, a lesson that’s been difficult to learn.

The dawn of our current transformation initiatives in networking is arguably the early part of the last decade, when two initiatives got a lot of attention.  One was Software Defined Networking (SDN), which in its formal manifestation was an abstraction of an entire network as a single virtual device, mapping the control/management plane to a central controller and the nodes to simple forwarding devices under the control of that central element.  This is the ONF “OpenFlow” model.  The other was Network Functions Virtualization (NFV), which mapped the functions of network elements to hosted software instances.  Here, as the name suggests, it was “network functions” that were abstracted.  Both had a significant, one could even say “fatal”, flaw.

The problem with SDN is that it tried to define abstraction/virtualization by defining new resources.  We needed OpenFlow switches, and we needed central controllers, neither of which really existed.  That meant that SDN had to succeed by what could be called “enclave deployments”, pieces of network here and there that could, for a variety of reasons, be fork-lifted to the new model.  However, SDN didn’t initially define exactly how even multiple SDN enclaves could be centrally controlled.  SDN still seems confined to enclaves today.

NFV’s problem was the vagueness of the term “network function”.  One obvious meaning would be the observable features of the network, things like “routing” or “discovery”.  Another obvious meaning would be the collection of functional elements already in place—routers, firewalls, CPE, whatever.  Ideally, we should have defined an NFV model that didn’t care what definition of “function” we used, but because visualizing the future network in terms of the evolution of current devices is far easier, we created an NFV model that virtualized devices.  That was too “low” a level of abstraction.

A good virtualization approach has two properties best defined as “negatives”.  First, a good virtualization model doesn’t constrain the scope of abstractions under it.  If some abstraction offers a strong benefit, then the model should allow for it without restriction.  Second, a good virtualization model doesn’t propagate low-level element assumptions into the abstraction.  Any implementation that can be mapped to the abstraction and satisfy its visible properties should be accepted, even encouraged.

If we apply these two properties to SDN, we find both violated.  What SDN should have done was to abstract network forwarding in the most general sense, not IP forwarding alone.  It should not have assumed that all networks were necessarily IP networks, nor that the need to create OpenFlow compatibility in existing IP devices should constrain the forwarding features available.  IP SDN, in short, is a subset of “SDN” as it should be.
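To make this concrete, here’s a minimal sketch in Python (all names are hypothetical; this is not the OpenFlow API) of a forwarding abstraction that isn’t bound to IP: a rule matches arbitrary named header fields, so the same model could carry Ethernet, MPLS, IP, or something not yet invented.

# A protocol-agnostic forwarding sketch; nothing below assumes IP.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Packet = Dict[str, object]           # header-field name -> value

@dataclass
class ForwardingRule:
    match: Dict[str, object]         # e.g. {"mpls.label": 1021} or {"eth.vlan": 200}
    action: Callable[[Packet], str]  # returns an output port name
    priority: int = 0

@dataclass
class ForwardingElement:
    rules: List[ForwardingRule] = field(default_factory=list)

    def forward(self, packet: Packet) -> Optional[str]:
        # Highest-priority rule whose match fields all appear in the packet wins.
        for rule in sorted(self.rules, key=lambda r: -r.priority):
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action(packet)
        return None                  # no match: punt to the central controller

element = ForwardingElement([
    ForwardingRule({"mpls.label": 1021}, lambda p: "port-3", priority=10),
    ForwardingRule({"eth.vlan": 200}, lambda p: "port-7", priority=5),
])
print(element.forward({"mpls.label": 1021}))   # -> "port-3", with no IP header in sight

Nothing in that abstraction cares whether the fields describe IP; IP forwarding drops out as one special case, which is the relationship “SDN” should have had to IP SDN.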

For NFV, we also have two violations, and there’s no better way to see that than to reflect on the current debate over “CNFs” versus “VNFs”.  I should be able to compose a service from any useful set of functions, whether they are hosted or embedded, containerized or in virtual machines, functional/stateless or monolithic/stateful.  What I have instead is a focus on a standard mapping strategy, the MANO model, that all but codifies the notion of device virtualization and strongly influences, if not constrains, the specific things I can deploy and the way I deploy them.
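As a sketch of what “composition without caring” might look like (hypothetical names; this is not the ETSI MANO information model), a service could reference functions purely by what they do, with packaging treated as a detail of the mapping:

# A service composed of functions identified by role, not by packaging.
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class NetworkFunction:
    role: str                                            # e.g. "firewall"
    realization: Literal["vm", "container", "embedded"]  # mapping detail, invisible to the service
    endpoint: str                                        # where the mapped implementation lives

@dataclass
class Service:
    name: str
    functions: List[NetworkFunction]

branch_access = Service("branch-access", [
    NetworkFunction("firewall", "container", "k8s://edge-cluster/fw"),
    NetworkFunction("router", "vm", "openstack://core/vrouter-12"),
    NetworkFunction("wan-optimizer", "embedded", "device://cpe-0042"),
])

The service layer sees a firewall, a router, and a WAN optimizer; one happens to be a CNF, one a VNF, and one a feature of a real device, and the composition shouldn’t care which is which.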

If we’d taken one or both of these network-transformation approaches to the cloud, we wouldn’t have much cloud computing today.  Who, today, looks at the cloud as simply a resource pool?  Most cloud applications rely on web services presented by the cloud provider and integrated into applications.  We’re creating a totally composable set of abstract resources that we can then build into an application platform, and on which we can then host applications/components.  Composable computing, by creating a composable abstraction.

We have an indirect acknowledgement of the importance of totally unbounded abstraction, and an approach to implementing the concept, all in one handy package.  “Intent modeling” is a technique for describing an abstract system by its external properties (“intent”) and mapping those properties to an unspecified (and therefore totally flexible) implementation.  It creates a functional “black box” that can represent anything.  All you need to complete it is a way of implementing it that isn’t a one-off, and a way of linking intent-modeled black boxes into some sort of model that defines both how implementations are standardized and how functions can be collected to create broader experiences.  This is where abstraction meets modeling.
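A minimal sketch of such a black box, continuing in Python with hypothetical names: the outside world sees only the declared intent, and the mapping to an implementation is deliberately opaque and swappable.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class IntentElement:
    name: str
    intent: Dict[str, object]                     # externally visible properties only
    implement: Callable[["IntentElement"], None]  # opaque mapping to resources or nested elements
    children: List["IntentElement"] = field(default_factory=list)

    def deploy(self) -> None:
        self.implement(self)          # satisfy this element's own intent...
        for child in self.children:
            child.deploy()            # ...then delegate downward to linked black boxes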

Modeling can exist at multiple levels, which is actually good for the notion of unbounded abstraction.  If you are modeling a physical system, it’s important to represent the physical structure.  Such a model would be appropriate if you were building a network of virtual devices.  If you were modeling a functional system, you’d want the model to represent a hierarchy of composition: the “system” decomposes into “subsystems”, which further decompose until you reach a point where the model describes something that directly maps to resources.
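Continuing the sketch above, a functional hierarchy might look like this, where only the leaf elements bind directly to resources (again, the names are purely illustrative):

def to_resources(element: IntentElement) -> None:
    print(f"binding '{element.name}' to real resources to satisfy {element.intent}")

def decompose(element: IntentElement) -> None:
    print(f"'{element.name}' decomposes into {[c.name for c in element.children]}")

vpn_service = IntentElement(
    name="enterprise-vpn",
    intent={"sites": 40, "availability": "99.99%"},
    implement=decompose,
    children=[
        IntentElement("core-transport", {"latency-ms": 30}, to_resources),
        IntentElement("access", {"bandwidth-mbps": 100}, to_resources),
    ],
)
vpn_service.deploy()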

Unbounded abstraction kind of mandates unbounded modeling, which means we shouldn’t be too concerned about standardizing a modeling language.  That doesn’t mean we should assume that any language or approach fits all missions.  Obviously, a language that describes physical structure is limiting if what we’re trying to do is virtualize functions.  Similarly, there may be “functional modeling” that’s inefficient or even ineffective when you try to apply it to physical systems like virtual-device networks.  But I think functional modeling works way more often than physical modeling, and I’d point out that if you’re hosting virtual devices in a cloud, your virtual-device network isn’t really a physical system.

There may be an optimum modeling approach for unbounded abstraction, and at some level I tend to believe there is one.  At a more practical level, I think the key to unbounded abstraction is having a modeling approach, combined with having a “service factory” that’s responsible for using the model to corral the functional elements and manage their lifecycles.  The two go together, which is one good reason why we should hope that a single optimum approach can be found.
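To illustrate how the two go together, here’s one way (hypothetical, not any standard’s API) a “service factory” could use the intent-model sketch above to drive lifecycles; the factory walks the model rather than knowing anything about specific functions.

class ServiceFactory:
    def __init__(self, model: IntentElement):
        self.model = model

    def instantiate(self) -> None:
        self.model.deploy()                       # map every black box to its implementation

    def audit(self) -> List[str]:
        # Lifecycle management: walk the model and report elements whose declared
        # intent is no longer being met (the detection logic is left abstract here).
        broken, stack = [], [self.model]
        while stack:
            element = stack.pop()
            if not self._intent_satisfied(element):
                broken.append(element.name)
            stack.extend(element.children)
        return broken

    def _intent_satisfied(self, element: IntentElement) -> bool:
        return True                               # placeholder: real telemetry would decide

factory = ServiceFactory(vpn_service)
factory.instantiate()
print(factory.audit())                            # an empty list means all intents are met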

But we shouldn’t demand a single optimum approach, or even expect one.  What’s important about unbounded abstraction is to keep it free until we know whether “definitions” are “constraints” in disguise.  We’ve made that mistake before.  What’s needed instead is some deep-think on what a “black-box” network should really look like, rather than how it’s going to be implemented, because abstractions have to start at the top.  I’m hopeful that in 2020 we’ll get some answers to this most critical question.