NFV: The History of Wrong and Right

We may need a summary of my cloud-centric view of network infrastructure.  I blog four days of every week and I don’t want to repeat myself, but those who don’t follow me regularly may have a hard time assembling the complex picture from multiple blogs.  It is a complex picture, too, one that can’t be turned into a sexy 500-word piece.  Thus, this isn’t going to be one.  In fact, it will probably take 500 words just to set the stage.

The traditional view of networking divides software and hardware.  Devices like routers, switches, optical transport devices, CPE, and so forth are the hardware side, and network management and operations management tools (the classic NMS, OSS, and BSS) are the software.  Operators have been complaining about both these areas for at least a decade.

On the software side, most operators have said that OSS/BSS systems are monolithic, meaning they’re big centralized applications that don’t scale well and are difficult to customize and change.  They believe that network management is too operations-center focused, making it dependent on expensive and error-prone human processes.  On the hardware side, they believe vendors have been gouging them on pricing and trying to lock them into a vendor’s specific products, which then exacerbates the gouging.

On a more subtle level, experts on traffic handling (especially Google) have believed that the traditional model of IP network response to topology changes and congestion, which is adaptive route management supported via local status exchanges, makes traffic management difficult and leads to underutilization in some areas and congestion in others.

The traditional view of computing is that software frames the applications that deliver value, and hardware hosts the software.  Virtualization in the computing and IT world focuses on creating “virtual hardware”, using things like virtual-machine (VM) or container technology.  This allows software to be assigned to virtual hardware drawn from a pool of resources, eliminating the 1:1 relationship between software and host that’s been traditional.

Network operators use software and computing in their operations systems, and the expertise operators have with software is found in the CIO organization, which is responsible for the OSS/BSS stuff.  The “network” side, under the COO, is responsible for running the networks, and the “science and technology” people under the CTO are responsible for network technology and service evolution planning.  The way these groups operate has been defined for a generation, and the fact that network hardware is a massive investment has focused the CTO/COO groups on creating standards to ensure interoperability among devices, reducing that lock-in and gouging risk.

In the first five years or so of the new millennium, John Reilly of the TMF came up with what should have been the defining notion in network transformation, the “NGOSS Contract”.  John’s idea was to use a data model (the TMF’s SID) to define how events (changes in network or service conditions) were coupled to the appropriate processes.  The original diagram (remember this was in the opening decade of the 21st century) postulated using the SOA standards, but the principle would work just the same with modern APIs.
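To make the idea concrete, here’s a minimal sketch of event-to-process coupling in the NGOSS Contract spirit.  It’s in Python, and the field and process names are purely my own illustration, not the TMF SID schema: the point is that the data model carries the bindings and a generic dispatcher does the rest, so behavior is declared in data rather than baked into a monolithic application.

```python
# Illustrative only: a toy data-model fragment in the NGOSS Contract spirit.
# The real TMF SID is far richer; these names are hypothetical.
service_model_element = {
    "name": "AccessConnection",
    "state": "active",
    # The event-to-process coupling is carried in the data model itself:
    "event_bindings": {
        ("active", "link_down"): "reroute_access_process",
        ("active", "order_change"): "resize_access_process",
        ("provisioning", "deploy_complete"): "activate_access_process",
    },
}

def dispatch(element, event, process_registry):
    """Run whatever process the model binds to (current state, event)."""
    process_name = element["event_bindings"].get((element["state"], event))
    if process_name is not None:
        process_registry[process_name](element, event)
```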

The TMF embarked on another initiative in roughly 2008, called the Service Delivery Framework (SDF) for composing services from multiple functional components.  I was a member of the SDF group, and toward the end of the work, five Tier One operators who were also in the TMF asked me to do a proof-of-concept execution of SDF principles.  This resulted in my original ExperiaSphere project, which used XML to define the service contract and Java to implement the processes.  Some of the concepts and terminology of ExperiaSphere were incorporated into TMF SDF.

In roughly 2010, we started to see various efforts to virtualize networking.  Software-defined networking (SDN) proposed to replace expensive proprietary hardware with cheaper “white-box” forwarding devices, controlled by a central hosted software element via a new protocol, OpenFlow.  The most important feature of SDN was that it “virtualized” an entire IP network, not the devices themselves.  This approach is mirrored in Google’s SDN deployment, which creates a kind of giant virtual BGP router, with BGP at the edge and SDN inside.

NFV came along at the end of 2012, maturing into the ETSI Industry Specification Group (ISG) in 2013.  The goal of NFV was to replace “appliances”, meaning largely edge CPE boxes, with cloud-hosted instances.  NFV was thus the first network-side initiative to adopt a “software” model, combining software (virtual network functions, or VNFs), virtualization (VMs and OpenStack), and hardware resource pools (NFV Infrastructure, or NFVi).

NFV had a goal of virtualizing devices, not services, and further was focused not on “infrastructure” devices shared among services and customers, but rather on devices that were more customer-specific.  While there were use cases in the NFV ISG that included 5G or mobile services, the body never really addressed the difference between a per-customer-per-service element and an element used collectively.  While they adopted the early cloud VM strategy of OpenStack, they ignored the established DevOps tools in favor of defining their own management and orchestration (MANO).  All this tuned NFV to be a slave to OSS/BSS/NMS practices, which encouraged it to follow the same largely monolithic application model.

While all this was going on, the cloud community was exploding with hosted features designed to create cloud-specific applications, the precursor to what we’d now call “cloud-native”.  VMs and infrastructure as a service (IaaS) were giving way to containers (Docker) and container orchestration (Kubernetes), and microservices and serverless computing added a lot of elasticity and dynamism to the cloud picture.  By 2017 it was fairly clear that containers would define the baseline strategy for applications, and by 2018 that Kubernetes would define orchestration for containers, making it the central tool in application deployment.

The concern that NFV was taking the wrong tack was raised from the very first US meeting in 2013.  The TMF’s GB942 NGOSS Contract approach had postulated a model-driven, event-coupled process back in about 2008, and the very first proof-of-concept approved by the ISG demonstrated this approach.  Despite this, the early NFV software implementations adopted by the network operators were all traditional and monolithic.  I reflected the TMF vision in my ExperiaSphere Phase II project, and there are extensive tutorials on that available HERE.

My goal with ExperiaSphere was to expand on the TMF service modeling to reflect modern intent-model principles.  A “function” in ExperiaSphere had a consistent interface, meaning that any implementation of a given function had to be equivalent in all ways (thus, it was the responsibility of the organization implementing the function to meet the function’s interface specifications).  That meant that a function could be realized by a device or device system, a hosted software instance or a multi-component implementation, and so forth.  I also made it clear that each modeled element of a service had its own state/event table and its own set of associated processes to run when an event was received and matched to the table.  Finally, I released the specifications for open use, without attribution to me or CIMI Corporation, as long as the term “ExperiaSphere” and my documentation were not included or referenced without permission.
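As a rough sketch of the “consistent interface” idea (the class and method names below are hypothetical; the actual ExperiaSphere work used XML service models and Java processes), any realization of a modeled function, whether a physical device or a hosted instance, has to present the same interface, and each modeled element carries its own state/event table:

```python
from abc import ABC, abstractmethod

class FirewallFunction(ABC):
    """An intent-modeled function: every realization must honor this interface."""

    # Each modeled element owns its state/event table; the bound processes live outside it.
    state_event_table = {
        ("deploying", "deploy_complete"): "notify_parent_ready",
        ("active", "fault"): "redeploy_or_escalate",
    }

    @abstractmethod
    def deploy(self, params: dict) -> None: ...

    @abstractmethod
    def status(self) -> str: ...

class ApplianceFirewall(FirewallFunction):
    """Realized by configuring a physical device (details omitted)."""
    def deploy(self, params: dict) -> None: ...
    def status(self) -> str: return "active"

class HostedFirewall(FirewallFunction):
    """Realized by deploying a software instance on a resource pool (details omitted)."""
    def deploy(self, params: dict) -> None: ...
    def status(self) -> str: return "active"
```

Whoever supplies a realization is responsible for meeting the interface; the service model neither knows nor cares which realization it gets.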

If you presume that the TMF framework for event-to-process binding is accepted, you have something that would look very much like the microservice-centric public-cloud vision we now see.  The “service contract”, meaning the data model describing the commercial and infrastructure commitments of the service, becomes the thing that maintains state and directs the reaction of the service to events.  If the processes themselves are truly microservices, this allows any instance of a process to serve when the related state/event condition occurs, which in turn lets you spin up processes as needed.  The result is something that’s highly scalable.
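A hedged sketch of that flow, reusing the hypothetical event_bindings structure from the earlier fragment: because the durable contract record holds the state, the worker processes are stateless, and any replica that happens to be running can field the event.

```python
# Sketch: state lives in the service contract, not in the process, so any
# instance of a process (any microservice replica) can handle a given event.
# contract_store and processes are hypothetical stand-ins for a data store
# and a registry of stateless worker functions.
def handle_event(contract_store, service_id, element_name, event, processes):
    contract = contract_store.load(service_id)                 # durable service contract
    element = contract["elements"][element_name]
    process_name = element["event_bindings"].get((element["state"], event))
    if process_name:
        new_state = processes[process_name](element, event)    # stateless worker runs
        element["state"] = new_state                           # state goes back to the model
        contract_store.save(service_id, contract)
```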

The monolithic and traditional model of management starts with EMS, then upward to NMS and SMS, with the latter becoming the domain of operators’ OSS/BSS systems.  If the implementation of NFV is constrained by the EMS/NMS/OSS/BSS progression, then it is not inherently scalable or resilient.  You can make it more so by componentizing the structure, but you can’t make it like the cloud if you start with a non-cloud model.  The lack of a service-contract-centric TMF-based approach also complicates “onboarding” and integration because it doesn’t build in a way of requiring that all the elements of a service fit in a standardized (modeled) way.

By 2018, operators and vendors were starting to recognize that the NFV ISG model was too monolithic and rigid, requiring too much customization of VNFs and too much specialization of infrastructure.  The cloud-container approach, meanwhile, was demonstrating that it could be easy to onboard components, to deploy on varied resource pools, etc.  This led some operators to promote the notion of “container network functions”, taking the view that if you containerized VNFs (into CNFs), you’d inherit all the benefits of the cloud-container-Kubernetes revolution.  Another group tried to standardize resource classes, thinking that this would make the NFV approach of resource pools and virtual infrastructure managers workable.

Neither of these approaches is workable, in fact.  NFV launched the convergence of network and cloud, but did so without knowing what the cloud was.  As a result, its approach never supported its own goal, because it let its specifications diverge from the sweep of cloud technology, which ultimately answered all the questions of function deployment in a way that’s demonstrably commercially viable, because it’s used daily in the cloud.

The cloud is a community approach to problem-solving, and this kind of approach always leads to a bit of churning.  I have a foot in both worlds, and I think that everything network operators need to fulfill both the SDN and NFV missions optimally is already available in open-source form.  All that’s needed is to integrate it.  We have open-source TOSCA-based service modeling.  We have Kubernetes orchestration, which can be driven by TOSCA models.  We have monitoring and tools for lifecycle automation, and perhaps best of all, we have application-centric implementations of function deployment that are totally compatible with the higher-level (above the network) services that most operators believe have to be exploited to create an upturn in their revenue line.
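As a loose illustration of that last point (the TOSCA-side field names here are simplified and hypothetical, and real model-to-orchestrator translators do far more), a function described in a TOSCA-style node template can be mapped mechanically onto a standard Kubernetes Deployment manifest:

```python
# Loose illustration: map a simplified TOSCA-style node template onto a
# Kubernetes Deployment manifest (a plain dict matching the apps/v1 schema).
def tosca_node_to_deployment(node_name, node_template):
    props = node_template.get("properties", {})
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": node_name},
        "spec": {
            "replicas": props.get("replicas", 1),
            "selector": {"matchLabels": {"app": node_name}},
            "template": {
                "metadata": {"labels": {"app": node_name}},
                "spec": {
                    "containers": [{
                        "name": node_name,
                        "image": props.get("image"),
                        "ports": [{"containerPort": p} for p in props.get("ports", [])],
                    }],
                },
            },
        },
    }

# Hypothetical example: a firewall function described in the service model.
manifest = tosca_node_to_deployment("vfirewall", {
    "properties": {"image": "example/vfirewall:1.0", "replicas": 2, "ports": [8080]},
})
```

The resulting manifest deploys through kubectl or the Kubernetes API like any other application workload, which is exactly the point: network functions deploy the way cloud applications deploy.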

A cloud-centric NFV would be one based on the prevailing cloud tools, and conforming to the trends in how those tools are applied.  There is little to gain (if anything) from trying to retrofit cloud concepts onto NFV, because NFV wasn’t really a cloud concept from the first.  It would be fairly easy to simply adopt the combination of TOSCA, the “Kubernetes ecosystem”, microservices, SDN and segment routing control plane principles, and build a cloud-ready alternative.  In fact, it would take less effort than has already been spent trying to support things like CNFs and NFVI classes, not to mention the earlier NFV efforts.

I’m frustrated by where we are with this, as you can probably tell.  I’ve fought these issues since I first drafted a response to the Call for Action that launched the NFV ISG in 2012, through that first meeting in the Bay Area in 2013, and through that first PoC.  This was a failure of process, because at least some of us tried to warn people that we were heading to the wrong place.  Recognizing that process failure is important, because the cloud software movement has succeeded largely because it didn’t have a formal process to fail.  Innovation by example and iteration is the rule in open source.  It should have been so in NFV, and the concept of NFV can rise from the implementation ashes only if NFV forgets “standards” and “specifications” and embraces open source, the cloud, and intent modeling.

I’m not going to blog further on NFV, unless it’s to say that somebody got smart and launched an open-source NFV project that starts with the TOSCA and Kubernetes foundation that’s already in place.  If that happens, I’ll enjoy blogging about its success.