Resource Modeling and Open Infrastructure in SDN/NFV

I said in my last blog that starting from the top in viewing next-gen services and infrastructure led to a refinement of operator-sponsored models for SDN/NFV deployment.  In that blog, I took the process down to what I said was the critical point—the “Network Function” that replaces the TMF concept of a “Resource-Facing Service” and forms the boundary between OSS/BSS and service management on one side and the down-in-the-dirt management of infrastructure on the other.  I said then I’d dig down below the NF level in a later blog, and this is it.

Recapping the critical lead-in to this discussion, an NF in my approach is an intent model that describes a function that can be composed into a service.  The NF has to be put into a commercial wrapping, which could be done most easily by assigning it to a Customer-Facing Service (CFS) and giving it commercial attributes.  But you still have to realize the abstraction of an NF somehow, and that’s today’s discussion.

It would be lovely if we could simply jump from the NF directly into some convenient and simple API—the old saw of the “DoJob” function.  In some cases that might even work, provided that the scope of the service and the capabilities of the management layers of infrastructure combined so that one API covered all the features needed everywhere.  In today’s world, that’s unlikely.

An NF would logically have to decompose first into what I’ll call domains, which represent collections of infrastructure that obey a common management system.  If I want to sell a VPN or VLAN and add virtual-CPE features, I’d need to have a subdivision representing the various management frameworks through which I needed to operate, likely based on the location of the endpoints.  Thus, a VPN NF would decompose into a series of “administrative NFs” that represented the various collections of endpoints that each could serve.  I’d pick several of these during the primary NF decomposition, based on just where I needed to project service access.

Inside my primary administrative domains, I’d probably have further subdivisions that represented any different technical routes to the same functional capability.  Here I might command routers, and there I might use SDN, and over on the far right (or left, depending on your preferred political leaning!) I might have to deploy virtual elements using NFV.  If I follow this kind of “domain dissection” process downward, each branch would eventually lead me to something that actually controls or deploys resources.
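To make this “domain dissection” concrete, here’s a rough sketch in Python of the kind of recursive decomposition I’m describing.  Everything in it (the class, the domain names, the leaf actions) is an illustrative assumption rather than a standard or a product API; the point is only that every branch eventually bottoms out in something that actually commands resources.

    # Illustrative only: recursive "domain dissection" of a VPN network function.
    class NetworkFunction:
        def __init__(self, name, children=None, realize=None):
            self.name = name
            self.children = children or []   # sub-NFs: administrative or technology domains
            self.realize = realize           # leaf action that actually drives resources

        def decompose(self, depth=0):
            print("  " * depth + self.name)
            if self.realize:
                self.realize()               # leaf: command a controller, EMS, or NFV deployer
            for child in self.children:
                child.decompose(depth + 1)

    # Leaf actions stand in for real management/control interfaces.
    us_east = NetworkFunction("US-East sites (routers)",
                              realize=lambda: print("    -> provision via router management API"))
    us_west = NetworkFunction("US-West sites (SDN)",
                              realize=lambda: print("    -> push forwarding rules via SDN controller"))
    vcpe = NetworkFunction("Branch vCPE (NFV)",
                           realize=lambda: print("    -> deploy virtual functions via NFV"))

    # Administrative domains group endpoints by the management framework that serves them.
    us_admin = NetworkFunction("Administrative domain: US", children=[us_east, us_west, vcpe])
    vpn_nf = NetworkFunction("NF: VPN", children=[us_admin])

    vpn_nf.decompose()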

I’ve argued in the past that it could often be logical to assume that a given operator would build up from the resources to expose a set of cooperative functional behaviors.  These are what are available for combination into NFs, either as alternative elements or cooperative ones.  An operator could use this bottom-up exposure mapping to decide just what network or hosted features they wanted service architects to be able to use.  I’ve described this bottom-up behavioral exposure as being the function of a “resource architect”.

This is similar to an aspect of cloud deployment.  In modern cloud tools (DevOps) we have applications at the top, which deploy onto virtual resources in the middle.  These in turn are deployed onto real resources at the bottom.  The intermediary abstraction, a VM or container, means that the top processes and the bottom ones know of each other only insofar as both support a common binding abstraction.

From a modeling perspective, I could visualize the central or binding NF concept as being the border between resource-independent and resource-specific behavior.  At the top, I assemble functions.  At the bottom I coerce cooperative behavior from resource pools.  The top focuses on what features are and do, and the bottom on how they’re actually created.  This difference could be reflected in a number of ways, but one is to shift from a functional or abstract view of a service model to a topological view, one that reflects the location and nature of the real stuff.  You could see this in terms of a modeling language shift, from something like TOSCA to something like YANG.

Or not, and let me illustrate why.  The modeling that’s provided at a given layer depends on the role the model has to play.  I could decompose my binding NF into a YANG model of domain connectivity, and that would be essential if I had to build a multi-domain resource commitment by commanding stuff per-domain and then linking what I’ve built.  But if I had a supermanager that saw all the domains and somehow mysteriously accommodated different vendors and technologies inside them all, I could then simply tell that supermanager to build the function.  In the first case, I decompose into YANG, and in the second I simply invoke a supermanager API.
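To picture that difference, here’s a minimal sketch of the same binding NF realized both ways.  The “supermanager” call is hypothetical, and the per-domain steps are just stand-ins for whatever a real topology model would drive.

    # Two illustrative realizations of the same binding NF.
    def realize_by_domain_model(domains):
        # In the spirit of decomposing into a per-domain connectivity model (the
        # role YANG might play) and then linking what each domain has built.
        for d in domains:
            print(f"commit {d['tech']} domain '{d['name']}' through its own manager")
        print("stitch the inter-domain links")

    def realize_by_supermanager(service_name):
        # In the spirit of invoking a single API on a manager that sees all domains.
        print(f"supermanager.build('{service_name}')   # one opaque call")

    binding_nf = {"name": "VPN-binding", "domains": [
        {"name": "metro-east", "tech": "router"},
        {"name": "metro-west", "tech": "SDN"},
    ]}

    # Above the intent model the choice is invisible; pick whichever exists.
    realize_by_domain_model(binding_nf["domains"])
    realize_by_supermanager(binding_nf["name"])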

It’s likely that these supermanagers, and in fact many of the lower-level management systems that are eventually the target of a hierarchical decomposition, will have to know about topology.  If that’s the case, though, the need is fulfilled inside the intent model and it’s all invisible to the outside world.

This difference highlights a key question on openness.  At the NF level, everything that realizes the function is equivalent, so the NF is open.  However, if I have a huge, complex god-box below, whose capabilities support only my own products, I’ve created at least a good start at a de facto lock-in for buyers.  There is “titular openness” but perhaps not openness at a true and practical level.

What this suggests to me is that for open infrastructure and an open implementation of NFV and SDN, we need to demand multi-layer decomposition below my critical functional-model (NF) level.  Today, in NFV, low-level deployments via the virtual infrastructure manager function aren’t required to be modeled through higher layers.  That encourages subducting my administrative and geographic domain functions into the VIM, which means that vendors who have product-specific VIMs can then build silo implementations.

Some operators have suggested that something like OpenDaylight, with its universal architecture that connects pretty much anything using pretty much anything, is a solution.  It’s true that a standard supermanager would open things up, but I wonder whether a monolithic implementation is the best approach; why not model it?  In any event, OpenDaylight manages connectivity not function deployment, so you can’t use it for all the supermanager functions.  Operator architectures already reflect that limitation in how they place SDN control and NFV deployment relative to their own models.

I think that the need to generalize supermanager functions also argues against saying that you have to transition to a topology-modeled approach like YANG below the NF.  You can model all of this stuff with TOSCA, though YANG might be more straightforward as a means of describing domain and nodal connectivity, and thus of framing routes.  A combination of the two seems a reasonable approach, but only if somebody steps up to propose a good, open, model for doing it.  I’m still waiting to hear one.

Synthesizing a General Model for Next-Gen Networking From Operator Architectures

We now have a number of public or semi-public operator architectures for their next-gen networks, and we’ve had semi-public vendor architectures for some time.  The “semi-” qualifier here means that there are a number in both spaces that are not explicitly NDA, but are nevertheless not particularly well described.  We also have a number of viewpoints on what has to be done, not reflected well in any of the models.  I’ve tried to synthesize a general model from all this, and I want to try to describe it here.  Comments welcome, of course!

A good approach has to start at the top, with the conception of a retail service as distinct from a network service.  A retail service (which the TMF calls a “product”) is a commercial offering that will normally combine several network services to create something that has the geographic scope and functional utility needed.  Retail services are the province of OSS/BSS systems, and they have a price and an SLA associated with them.  Customer care necessarily focuses on the retail service, though that focus has to be qualified by the fact that some of the underlayment is also at least somewhat visible to a customer.

What is that underlayment?  In my view, the TMF has the best answer for a part of it—the notion of a “customer-facing service” as a component of a “product”.  A CFS is an orderable element of a retail service that, like the overarching retail service or product, has a price and an SLA.  It’s my view that this is all mediated by OSS/BSS systems too, because the distinction between a “product” and a “CFS” is minimal; one is a collection of the other.

The next-gen question really comes down to what’s under the CFS layer, for three reasons.  First, it’s pretty clear that below CFS we’re getting to the place where “deployment” and “management” become technical tasks.  Second, because this is the point of technology injection it’s also the point where next-gen and current-gen become relevant distinctions.  Finally, at this point we have to start thinking about what makes up a “manageable element” to an OSS/BSS and how we’d express or model it.

The TMF places “resource-facing services” under CFSs, and my own work in the SDN/NFV space has suggested to me that this may be too simplistic a concept.  My own suggestion is that we think about how virtualization and the cloud work and use their mechanism to guide the model from this point down.

In virtualization and the cloud, we create a virtual artifact that represents a repository for functionality.  That artifact is then “assigned” or “deployed” by associating it with resources.  Thus, what I would suggest is that we define the next layer as the Network Function layer.  We compose CFSs and products by collecting network functions.  It seems to me, then, that these NFs are the lowest point at which OSS/BSS systems have visibility and influence.  The management of “services” is the collected management of NFs.

An NF could, in theory, be created in two ways.  First, we could have a lower-level “network service” functionality that’s created and managed through a management system API.  Second, we could have something that has to be deployed—NFV virtual network functions (VNFs) or router instances or cloud application components.  This is the process that I’ve called “binding” in my own work—we’re binding a network function to an implementation thereof.  I suggested (without success) that the TMF define a domain for this—to complement their service, resource, and product domains.

The NF binding process is what gives next-gen infrastructure both operations efficiency and service agility.  Anything that can deliver the characteristics of a NF is a suitable implementation of the NF.  The concept is inherently open, inherently multi-vendor and multi-technology.  But the binding is critical because it has to harmonize whatever is below into a single model.

The cloud and virtualization wouldn’t be worth much if an application hosted in a container or virtual machine had to know what specific hardware and platform software was underneath.  The abstraction (container, VM) insulates applications from those details, and so the NF has to insulate the OSS/BSS and its related services/products from the infrastructure details too.  So the NF is a management translator in effect.  It has a lifecycle because it’s the compositional base for services that have lifecycles.  The binding process has to be able to drive the lifecycle state of the NF no matter what the resource details are below.
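Here’s a tiny sketch of what that translation might look like in code.  The state names, the resource kinds, and the worst-state-wins rule are all my own assumptions, not anything from a specification.

    # Illustrative management translation: an NF presents one functional state
    # upward no matter how its resources report status below.
    RESOURCE_TO_NF_STATE = {
        # (resource kind, resource-reported status) -> functional NF state
        ("router_ems", "in-service"): "active",
        ("router_ems", "alarm"):      "degraded",
        ("vnf_host",   "running"):    "active",
        ("vnf_host",   "deploying"):  "activating",
        ("vnf_host",   "failed"):     "failed",
    }

    def nf_state(resource_reports):
        """Collapse per-resource reports into one NF lifecycle state (worst wins)."""
        order = ["active", "activating", "degraded", "failed"]
        states = [RESOURCE_TO_NF_STATE.get(r, "failed") for r in resource_reports]
        return max(states, key=order.index)

    # Two different implementations of the same NF report very different things...
    legacy_impl = [("router_ems", "in-service"), ("router_ems", "alarm")]
    vnf_impl = [("vnf_host", "running"), ("vnf_host", "deploying")]

    # ...but the OSS/BSS above sees only the harmonized functional state.
    print(nf_state(legacy_impl))   # degraded
    print(nf_state(vnf_impl))      # activating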

VMs and containers are the keys to virtualization because they’re the intermediary abstraction.  The NF is the key to next-gen infrastructure for the same reason.  Thus, I think it’s how we formulate and implement the NFs that will make the difference in any model of next-gen networks.

There are two pathways to defining an NF.  The first is the commercial pathway, which says that an NF is defined when an opportunity, competition, or optimization goal introduces a value to some feature that’s not currently available or well-supported.  The second is the exposure pathway which says that if there’s something that can be done at the technical level within infrastructure, it can be exposed for exploitation.  In either case, there are some specific things that have to be provided.

The first requirement is that an abstraction has to really be an abstraction.  The concept of an “intent model” has recently emerged to describe the notion of an element that’s defined by what it intends to do, not how it does it.  Clearly that’s the critical property here, whichever direction the defining of an NF has taken.  Operators have to define their own NFs as proper intent models, and they have to demand that vendors who claim to offer support for “services” define the features in intent-model form so they map nicely to NFs.

The second point is the one I already raised about management translation.  From the top, the NF has to present management in functional terms.  From the bottom it has to collect management in technical terms, because resources are technical elements that have to be managed as they are.  Since there are by definition multiple ways to realize a given NF abstraction, the NF processes have to harmonize the different models below into a single functional management model above.

I cannot overstress the importance of this.  With functional abstraction at the NF level accompanied by management abstraction, any two implementations of a given NF are equivalent at their top interface.  That means that as pieces of a service they can be substituted for one another.

One interesting aspect of this point is that even if we were to have absolutely no standardization of how VNFs were deployed and managed and onboarded, we could simply present different approaches under a common NF functional umbrella and compose them interchangeably.  Another is that we could build both VNF and legacy infrastructure versions of an NF and use them interchangeably as well.  Finally, if we put a model hierarchy with decision points on deployment underneath an NF, we could mix infrastructure and build services across the mixture because the NF composition process for all the different infrastructures in place would be harmonized.

This doesn’t solve everyone’s problems, of course.  VNF providers would either have to try to promote some harmony below the NF level or they’d have to provide implementations of the NFs in their market target zone, and potentially provide them for every infrastructure option below.  Thus, NF abstraction is a complete tool for OSS/BSS management integration but only a starting point for how the next level should be managed.

I do think that we could gain some insight even down there, though.  I propose that an intent-model concept of functional layers is “fractal” in that what’s a service to one layer is the infrastructure of the layer below.  If NFs decompose into what are essentially sub-NFs, then the hierarchy says that by intent-modeling every layer we can use alternative implementations there in an equivalent way.  We then simplify the way in which we create openness because NF-decomposition is now a standard process at all layers and it solves the openness problem where we use it—which is everywhere.  That presupposes a common modeling approach at least as far down as the point where we get very configuration-specific.  That’s been my own approach all along, and I hope that this explains why.

How would that lower-level decomposition work?  That’s a topic for a future blog!

Where Could We First See SDN Success in the Network Operator Space?

Software-defined networking has a lot of potential, but we’ve learned in our industry that vague potential doesn’t create a revolution.  More important than potential are the specific places where network operators see SDN creating a significant shift in capex.  There’s still a lot of variation to contend with regarding exactly how soon these places will be sown with the seeds of SDN revolution, but at least we know where things might happen.

The obvious SDN opportunity, which is the carrier cloud, is also the most problematic in timing and scope.  Operators five years ago were gaga over cloud opportunity; it outstripped everything else in my surveys.  Today few operators see themselves as giants in public cloud computing, and most of those who see a role for cloud services think that the role will develop out of something else, like IoT.  But offsetting the market uncertainty is the assurance that at least they know what SDN technology in the cloud would look like.

Cloud computing demands virtualization on a large scale, and the more cloud application components or features you host, the more virtual networks you need.  Stuff like containers makes the SDN opportunity bigger by concentrating more stuff in the same space, server-wise.  Even non-cloud-service applications of cloud data centers, including NFV and IoT, would demand a host of virtual networks, and SDN breaks down the barriers created by L2/L3 virtualization through VLAN and VPN technology.

The problem here is that while carrier cloud demands SDN, it demands more than simple OpenFlow or vSwitches.  Google and Amazon, no slouches in the cloud, know that and have developed their own virtualization models.  Google’s, called Andromeda, is an open architecture, and both are well-publicized, but the operators apparently didn’t get the memo (or press release).  To do virtualization well, you need not only separate address spaces and subnets for the tenant applications, but also an architected gateway approach that includes address mapping between the virtual subnets and customer or other networks.  As a standard, SDN doesn’t address this, though it would in theory not be difficult to add.
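Purely to illustrate the address-mapping idea (and with no claim that this is how Andromeda or Amazon’s networking actually works), here’s a minimal sketch of a tenant gateway:

    # Illustrative tenant gateway: each tenant gets a private address space,
    # and the gateway maps selected private endpoints to externally visible ones.
    class TenantGateway:
        def __init__(self):
            self.mappings = {}   # (tenant, private_ip) -> external_ip

        def expose(self, tenant, private_ip, external_ip):
            self.mappings[(tenant, private_ip)] = external_ip

        def translate_outbound(self, tenant, private_ip):
            return self.mappings.get((tenant, private_ip), "DROP: not exposed")

    gw = TenantGateway()
    # Two tenants can reuse the same private subnet without collision...
    gw.expose("tenant-a", "10.0.0.5", "203.0.113.10")
    gw.expose("tenant-b", "10.0.0.5", "203.0.113.11")

    print(gw.translate_outbound("tenant-a", "10.0.0.5"))  # 203.0.113.10
    print(gw.translate_outbound("tenant-b", "10.0.0.5"))  # 203.0.113.11
    print(gw.translate_outbound("tenant-b", "10.0.0.9"))  # DROP: not exposed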

The technical reason why data-center SDN is a low apple is that the technology of Ethernet needs augmenting because of forwarding-table bridging limitations and VLAN segmentation, and it’s easy to do that with SDN.  That means, logically, that the next place where operators see SDN deploying is in carrier Ethernet.  Some form of Ethernet-metro connectivity is intrinsic in carrier cloud because there’d surely be multiple data centers in a given metro area.  It’s easy to see how this could be adapted to provide carrier Ethernet services to businesses, and to provide a highly agile and virtualized Ethernet-based L2 substrate for other (largely IP) services.

The challenge with the business side of the carrier Ethernet driver is the opportunity scope.  First, business Ethernet services are site-connectivity services.  Globally there are probably on the order of six million candidate sites if you’re very forgiving on pricing assumptions, compared of course to billions of consumer broadband and mobile opportunities.  With consumer broadband making high-speed services (over 50 Mbps) available routinely, it’s harder to defend the need to connect all these sites with Ethernet, and if you look at the number of headquarters sites that do demand Ethernet you cut the site population by an order of magnitude or more.

On the plus side, the number of “Ethernet substrate” opportunities is growing as metro networks in particular get more complex and data-center interconnect (DCI) interest grows for cloud providers, content providers, and enterprises.  The Metro Ethernet Forum (MEF) wants to grow it more and faster through its “Third Network” concept, which aims to formalize the mechanisms that would let global Ethernet create a subnetwork on which agile connection-layer services would then ride.  This is a good idea at many levels, but in the mission of being the underlayment of choice for virtual connection services, Ethernet has competition.

That competition comes from the third opportunity source—“virtual wire”.  On the surface the virtual-wire notion isn’t a lot different from using SDN for carrier Ethernet in a transport mission rather than as a retail service, but there are significant differences that could tip the scales.

SDN can fairly easily create featureless tunnels that mimic the behavior of a physical-layer or wire connection (hence the name).  If these virtual wires are used to build what is basically private physical-layer infrastructure, they could be used to groom optical bandwidth down to serviceable levels, to segment networks so that they were truly private, and, when supplemented with virtual router instances, to create VPNs or VLANs.  Those are the same missions that Ethernet could support, but because virtual wires have no protocol at all, they demand less control-plane behavior and are simpler to operate.

One of the battlegrounds for the two possible WAN missions is likely to be mobile infrastructure.  Everyone in the vendor WAN SDN space has been pushing mobile applications, and in particular the virtualization of the Evolved Packet Core (EPC).  The fact is that at this point the overwhelming majority of these solutions are very simplistic; they don’t really take full advantage of the agility of SDN.  That means that there’s still time for somebody to put a truly agile strategy out there, one that takes a top-down approach to network virtualization.

The other battleground is SD-WAN.  It’s unlikely that SD-WAN will be based on carrier Ethernet whatever the goals of the MEF’s Third Network, simply because Internet tunneling is a big part of any viable model.  If we were to see a truly organized virtual-overlay-network tunnel and virtual node management approach emerge, we could see virtual connection layers take off and pull SDN with them.  There are some signs this could happen, but it’s not yet a fully developed trend.

A common technical issue for these pathways to SDN to address is management, and one aspect of management is brought out by the Ethernet-versus-virtual-wire face-off.  Ethernet has management protocols that can exchange information on the L2 service state.  These mechanisms are lacking in virtual wires because there’s no real protocol.  Some would say that gives Ethernet an advantage, but the problem is that in order for an SDN implementation of Ethernet to deliver management data you’d have to derive it from something because the devices themselves (white boxes) are not intrinsically Ethernet devices.

Deriving management data in SDN is really a function of delivering service state from the only place where it’s known—the SDN controller.  In adaptive networks, devices exchange topology and status to manage forwarding, and those exchanges are explicitly targeted for elimination in SDN.  That’s fine in terms of making traffic engineering centralized and predictable, but it means that the topology/state data isn’t available for use in establishing the state of a service.  Sure, the central controller has even better data because it sees all and knows all, but that presents two questions.

First, does the controller really see and know all?  We don’t really have specifications to describe how service state, as a composite state derived from the conditions of multiple forwarding rules and perhaps paths, can be known or can be communicated.  In fact, many network problems might be expected to cut off portions of the network from the controller.  There are good solutions here, but they’re per-vendor.

Second, how is the data delivered to a user?  Today it’s done via control packets that are introduced at each access point, but is that also the strategy for the SDN age?  Simple control packets can’t just be forwarded back to the SDN controller for handling; you need their context, meaning at least the interface they originated from.  In any event, wouldn’t it be easier to have management data delivered from a centrally addressed repository?
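Here’s a rough sketch of that last alternative, where the controller derives a composite service state and posts it to a repository that users query instead of poking the boxes.  All of the names and the “every rule installed” test are hypothetical.

    # Illustrative: the SDN controller derives service state from forwarding-rule
    # status and posts it to a repository; users query the repository, not devices.
    class ServiceStateRepository:
        def __init__(self):
            self.state = {}

        def post(self, service_id, status, detail=""):
            self.state[service_id] = {"status": status, "detail": detail}

        def query(self, service_id):
            return self.state.get(service_id, {"status": "unknown"})

    repo = ServiceStateRepository()

    def controller_evaluates(service_id, rule_states):
        # A composite view: the service is "up" only if every forwarding rule is installed.
        if all(s == "installed" for s in rule_states):
            repo.post(service_id, "up")
        else:
            repo.post(service_id, "impaired", detail="one or more paths not installed")

    controller_evaluates("vpn-1234", ["installed", "installed", "installed"])
    print(repo.query("vpn-1234"))   # status: up

    controller_evaluates("vpn-1234", ["installed", "pending"])
    print(repo.query("vpn-1234"))   # status: impaired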

You can see from this summary that the big problem with deployment of SDN technology by network operators is the lack of a simple but compelling business case.  Yes, we can find missions for SDN (just as we can find missions for NFV) but the missions are complicated by the fact that they smear across a variety of technical zones, most of which aren’t part of the SDN specifications.  Thus, SDN has to be integrated into a complete network context and we can’t do that yet because some of the pieces (on the business side, in technology, or both) are missing.

I think we’re going to see SDN applications beyond carrier cloud take the lead here, in part because the NFV-carrier-cloud dimension is more complicated to architect and justify and so might take longer to mature.  My bet is on somebody doing a really strong virtual-EPC architecture and then telling the story well, only because mobile infrastructure is already being upgraded massively and 5G could make that upgrade even bigger.  It’s easiest to change things fundamentally when you’re already committed to making fundamental changes.

The Comcast/Icontrol Deal is Another Sign of IoT Reality

Comcast’s announcement that it’s buying a big chunk of the software assets of Icontrol (already used by Comcast in Xfinity’s home control) is another possible sign that the IoT space is getting sensible.  In fact, it’s two signs in one deal, and these could reinforce the trend toward a logical model of IoT that I blogged on earlier.  If that happens, we could finally see strong IoT progress.

We have to start with some facts.  There are over 600 million prospects for home and office security and control technology globally.  Today, those who have monitoring pay about $300 per year, which would mean that the total addressable market (TAM) is almost $200 billion.  It’s harder to say what the other segments of the prospective IoT market size up to be, but there’s not much doubt that security and control represents the largest single segment.  Further, it’s also the one that requires the least push to get going—because it already is.  That’s the problem with it, though.  “News” means “novelty” and the old market can’t make news.

Sad, given that truth doesn’t have much of a shot in the space either.  In my prior IoT blogs I’ve noted that “Internet of Things” has been taken too literally to be logical.  To put a bunch of sensors and controllers directly on the Internet begs all kinds of business model, security, and public policy questions.  A smarter model for the broad, public, IoT is the analytics-and-big-data approach, now being taken by about a half-dozen players of various sizes.

OK, but let’s circle back here.  The database model is great as an architecture for global IoT, but it doesn’t directly address two things.  First, how do you economically connect the literally millions of security and control devices already out there?  Second, how does the home security and control market, a giant piece of populist IoT, evolve?  The Comcast/Icontrol deal might offer insight to both.

Icontrol is one of a number of companies that offer sensor/control solutions based not on IP or the Internet but on lightweight protocols designed for short-range in-home (or business) connections between sensor/controller devices.  There are a number of protocols in this space, ranging from the venerable X.10 to Insteon’s proprietary protocol and to Zigbee and Z Wave, both of which are supported by Icontrol.  The first two of these are typically used over powerline wiring in a home or business and the latter two are wireless.

All these protocols support meshed direct connections, and all can also be used in a star configuration with a central controller, or a tree of controllers and devices for larger spaces.  The controller configurations have become more popular over time because they allow for programming of events based on time and sensor conditions.  They also provide, at least in most cases, for a link between the control network and the Internet, which lets people control things from their phones or tablets.  The sensors and controllers are not directly on the Internet, and if the gateway feature isn’t enabled there’s no Internet connection at all.  Many people use the networks this way for security reasons.

Icontrol’s service customers include Comcast and a host of other service providers and specialty or security companies; Comcast is the largest of them.  These providers offer a security-and-control-as-a-service model, a kind of PaaS, and the value proposition is simple—there are tens of millions of security systems out there that have minimal features, and probably as many as ten million with home control features that are obsolete.  Installing the complicated high-feature systems requires considerable skill and often even some programming, and finding devices that work with what you have is also complicated.  Companies like Comcast believe that people will pay for a service-centric ecosystem, and that seems to be true.  Remember, most good home alarms are connected to central monitoring and a monthly or annual fee is collected.  Often it’s more than a residential phone system’s fee, or even low-speed broadband.

These services are a kind of back door to IoT in the strict sense, if they’re IoT at all.  As I said, most of them don’t provide Internet access to the control networks at all, and where the access is provided it’s simple to add security/authentication the same way you’d do with an email account or secure website.  The sensors and controllers don’t have to bear the cost of direct Internet connectivity and security either.

These are the two signs of progress I mentioned.  First, Icontrol/Comcast represents a business model to evolve that which is already out there, the home/office control and security model that has worldwide well over a hundred million users already.  Second, it demonstrates a network technology that’s inherently secure and Internet-independent unless you take explicit steps to open it up.  While some of the control protocols (like X.10) could in theory be used from outside the home to trigger something within, others have explicit sensor/controller registration and the central hub would ignore a foreign device.

A service based on a platform like Icontrol could still perform many IoT-ish functions, though.  That’s particularly true if you can incorporate local WiFi devices, and even allow roaming phones and tablets to do things.  For example, a phone or tablet might register with the local hub and then be able to report its location, which could be helpful for parental control.  Similarly, a phone or tablet could be privileged based on its identified user to extract data from the control network or control devices.  And that’s without presuming that any newer sensors/controllers would be added to the mix.  Since Comcast’s rumored reasons for buying the Icontrol software assets include the desire to customize the stuff to incorporate new devices, Comcast might be thinking of expanding the scope of the control network beyond the ordinary gadgets now available and the Comcast-created custom devices and third-party elements it already supports.

Most of the home-network stuff is based on the simpler and shorter-range Zigbee technology, and this is what Comcast acquired.  Z Wave is usually professionally installed, and usually by companies that do home alarm and control systems.  However, it has about three times the range, and it might be that something like Z Wave would be a better option if the Icontrol model were to be applied to things like retail stores and even streets/intersections, things that people often think of when they think “IoT”.

The key point, though, is that the Icontrol model aims explicitly at supporting current sensor/controller technology as an upgrade.  You can add features (and devices) to many current systems or upgrade to something that is able to support that sort of incremental functional growth.  And because the security/control space is typically associated with a recurring payment for a service external to the home/office even today, it lends itself to an implementation based on cloud-hosted technology.

Once you pull security/control even a little bit into the cloud, you can start looking at those expanded IoT-like services.  You can implement the service using database and analytics technology, complex event processing, and all the other good stuff that IoT really has to be based on if it’s to succeed.  In short, Comcast could be on to something.

We already have cloud IoT, of course, as I said in my prior blog, but that was largely in the form of cloud-hosted services to augment IoT, not a cloud form of an IoT application.  Comcast is betting now on just that, and it might mean that other players who now offer only IoT features in the cloud (like Amazon) will start thinking platforms.  Amazon, recall, has a home controller already and hopes to expand its capabilities.  You can, in fact, link it to many popular home-control technologies already.  It would take only a little systemization for this to become a full IoT offering.

This is, to be sure, “IoT lite” in the eyes of those who think the whole world of sensors and controllers is going on the Internet, but that’s never been a practical option.  What we may be developing now is evolutionary not revolutionary, interesting not exciting, but real and not fiction.

Facing the Future: Inside Vendor Failures to Sell NFV Transformation

I blogged recently on the views of the “literati” of network operator planning for NFV, and one blog dealt with what they thought vendors were doing wrong.  Yesterday, Ray Le Maistre from Light Reading did a video piece on the OPNFV event and made one of the same points.  The literati said that vendors don’t present their complete solution, and the LR piece said that operators said that OSS/BSS integration was lacking as a part of NFV solutions.  The truth is that at least a dozen vendors can provide just what operators want and that Light Reading said was missing.  What’s up?

I’ve noticed that when the same vendors who I know have a complete solution either send me material or talk with me, they seem to omit the very thing operators are saying they need NFV to provide.  I’ve also noticed that the vendors’ salespeople tell me that they aren’t getting deals as fast as they’d like, which suggests that the omission of capabilities is showing up at the sales level too.  So it’s not all in the operators’ minds, and Ray at Light Reading isn’t misreading the market.  We’re just not singing the right song.

I’d like to be able to call on an objective outside resource like my literati to explain this, but of course you can’t expect somebody to tell you why someone isn’t telling them something. I have to present my own view of the vendor motives, and that’s what I’ll do here.

IMHO, the biggest problem is that vendors have been chasing media-generated NFV rainbows.  The NFV market is at this point so totally speculative that no responsible forecast for it could be done, but not only do we have forecasts, we have escalating ones.  The reason is that a story that says that NFV will sell five or ten or twenty or a hundred billion dollars in five or three or even one year is exciting.  People will click on URLs to read it.  For the forecasters, who buys a report that says there’s no market forecast for something?  Who buys one that builds a vision of an extravagant market?  In the first case, nobody.  In the second, every product manager responsible for the products for that extravagant market.

The natural consequence of heady forecasts is unrealistic expectations.  One large NFV hopeful confided that their quota for 2016 was probably going to end up larger than the total of NFV spending because that had been the case in 2015.  As pressure to perform against the forecasts mounts, companies look for quick deals and shorten selling cycles.  Nobody wants two-year projects to prove a business case here.  They don’t need it; after all, NFV is the way of the future and XYZ Analytics says so.

The second problem is that there is no PR benefit to having a full solution, so there’s no incentive to talk about one.  Everybody is an NFV player these days.  If you sell servers you’re an NFV infrastructure player, and those who offer Linux and middleware are virtualization and platform players.  Everyone who has anything that could be considered a network function can virtualize it (meaning make it cloud-hostable) and be an NFV virtual-function provider.  If you have OpenStack support or OpenDaylight you have “orchestration” so you have MANO, and of course if you provide any form of management you’re a shoo-in.  These are all conflated in a story or a press release, and we lose the distinction between somebody who can build a case for NFV deployment and those who expect to ride on the coattails of such a deployment.

NFV is really complicated.  Even the ETSI ISG, which declared a lot of the issues out of scope originally, is now admitting that we need things like service-level orchestration and federation across operators and implementations.  Add in these new requirements and you rule out almost everyone as an NFV leader, someone who can make the business case, and relegate them to NFV exploiters who hope that first somebody else makes that business case and, second, that they can horn in on it.

The next problem is related, which is that operator drive for an open solution means nobody gains much by developing the market on their own.  A vendor who comes up with the right approach is essentially mandated to share it.  Given the enormous cost of building a complete solution and productizing it, it’s not a surprise that vendors don’t want to bear that cost without having a leg up on reaping the benefits.  Open approaches mean that they probably can’t do that.

The fact is that open strategies are going to have to be funded by either the buyer or by those vendors who win almost no matter what.  An example of the former is the initial focus on standards and open-source, and now the operator-created architectures.  Intel is an example of a vendor who wins no matter what; whose chips are in all those boxes?

Problem number four is that sales organizations have no collateral to support the kind of operations-integrated sale that Ray from Light Reading heard operators want to see.  Some of the salespeople for the vendors who could do what operators want literally don’t know that they can.  More don’t know how to promote it or who within the operator organizations to promote it to.

Some of this problem is surely related to the issue of sales quotas and instant gratification, but that’s not the whole story.  This, as I said, is complicated, and sales organizations that are used to selling interchangeable boxes into an established demand aren’t prepared to build a business case and sell a complex solution to a complex problem.

And there is a final problem: organizational culture is holding back success.  Ray made that point in his video, and we’ve all heard that operators have to change their culture to make NFV succeed.  Well, perhaps there are some issues with the culture of operators, but the truth is that it’s the vendor culture, not the operator culture, that is the problem today.

Transformation to a new model of networking means, for vendors, that the easy, happy days of having a buyer come to you and say “give me a dozen routers” are gone.  Instead they’re coming with a request for you to make their services more agile and their operations more efficient, and by the way do that at a lower overall cost than they’d been presented with before.  You can’t really even make money selling professional services to accomplish these goals, because without an understanding of how to do what the buyer wants you can’t make a business case for doing anything at all.

All of this explains why NFV has failed to live up to heady expectations, but it doesn’t mean it never will.  There are benefits to NFV, and in many areas they’re compelling.  We tried to advance to NFV on a broad front, and we simplified ourselves into failure doing that.  Breadth is breadth in complexity as well as in opportunity.  Now we’re moving on multiple narrow fronts, constrained more by the fact that we’re not sure all our steps will add up to progress and not silos.  Operator architectures are guiding the framework for unification and federation, a framework we should have had from the first.

But we could still mess this up.  Will every operator do an ECOMP like AT&T, or even be able to adopt AT&T’s own model if they open-source it?  Benefits justify an architecture, architectures frame product elements, and product elements are what buyers purchase.  Making the connection at the top is what makes the world go ‘round, and more attention to that top-level process is essential, even and perhaps especially for the media.

How We Can Create a Framework for Open VNFs and Easy Onboarding

Earlier this week in my blog I talked about a model-driven open approach to NFV that focused on deployment elements, meaning infrastructure.  I introduced the notion of virtual function classes as the framework for VNF openness too, and I want to follow up on that today.

The general point of my earlier blog was that for openness to happen, you had to define a series of abstractions or functional models that represented the base class of capabilities you wanted, then define the various ways that those capabilities could actually be decomposed onto infrastructure.  You assemble functional models into services in a generic way, then commit functional models to infrastructure in whatever way the infrastructure demands.

This same approach can be applied to Virtual Network Functions.  The high-level functional model (“Firewall”) might be decomposed into one of two technology-based models—“RealDevice” and “FirewallVNF”.  The latter is the combination of the VNF software and the hosting of that software on infrastructure.  We talked about the hosting side yesterday, so it’s the VNF software we’re going to address today.

All firewall VNFs should be equivalent, meaning that all the ways of instantiating a virtual network function of a given functional class should be interchangeable and identical when looked at from the outside.  Clearly all the software that might be used for a firewall VNF doesn’t take the same parameters, expose the same interfaces, or have the same management capabilities, so the harmonization of those differences has to be framed in some way.

That harmonization has to start, as the whole modeling approach had to start, with an abstraction that represents the function class—“FirewallVNF” in this case.  It’s my view that this function class would be an extension (for software types, think of a Java class that “extends” a base class) of the base class of “AnyVNF”.  If we map this onto the ETSI ISG’s framework, the base class would define the scope of the necessary APIs that are exposed by the centralized part of the VNF Manager (VNFM).  This base class might expose a parameter port, a management port, a map of “internal ports” to interconnect elements, and a set of “service ports” that will connect outside.

The goal of onboarding a given VNF is now to map to that FirewallVNF template (and by inference to the AnyVNF set).  The responsibility for that, again using ETSI’s concepts loosely, devolves on the “distributed” part of the VNF manager.  You have a VNF that’s a firewall implementation.  You add to that VNF the necessary custom logic to make it appear in all respects as an instance of FirewallVNF, which means that all of the management, deployment, and functional characteristics of that implementation are mapped to the common model.  Once that’s done, you can deploy any of those implementations.
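To show what that might look like in software (using Python rather than Java purely for brevity, and with every class and method name being an illustrative assumption), the function-class idea reduces to a base class, an extension, and a vendor adapter that makes a specific implementation indistinguishable from any other member of its class:

    # Illustrative only: a function-class hierarchy for VNF onboarding.
    class AnyVNF:
        def deploy(self): raise NotImplementedError
        def set_parameters(self, params): raise NotImplementedError
        def get_status(self): raise NotImplementedError
        def service_ports(self): raise NotImplementedError

    class FirewallVNF(AnyVNF):
        def set_rules(self, rules): raise NotImplementedError

    class VendorXFirewallAdapter(FirewallVNF):
        """Wraps a hypothetical vendor firewall so it looks like any other FirewallVNF."""
        def __init__(self, vendor_api):
            self.api = vendor_api
        def deploy(self):
            self.api.boot_image("vendorx-fw-2.1")
        def set_parameters(self, params):
            self.api.push_config(params)
        def get_status(self):
            return "active" if self.api.is_alive() else "failed"
        def service_ports(self):
            return ["wan0", "lan0"]
        def set_rules(self, rules):
            self.api.push_config({"acl": rules})

    class FakeVendorAPI:
        """Stands in for the vendor's real management interface."""
        def boot_image(self, name): print(f"booting {name}")
        def push_config(self, cfg): print(f"pushing config: {cfg}")
        def is_alive(self): return True

    def onboard(candidate):
        """A miniature 'test jig': exercise the FirewallVNF contract end to end."""
        candidate.deploy()
        candidate.set_rules(["deny tcp any any eq 23"])
        assert candidate.get_status() in ("active", "activating", "failed")
        print("onboarding checks passed for", type(candidate).__name__)

    onboard(VendorXFirewallAdapter(FakeVendorAPI()))

The onboard() call at the end is, in miniature, the “test jig” idea that comes up below.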

There are five beauties to this approach, IMHO.  The first is that it solidifies what I think has been an incredibly fuzzy concept—the VNFM.  Second, it defines a very specific goal for openness—which is to harmonize something to a function class.  Third, it makes the validation of a given function’s adherence to the open model a simple matter to test.  You could build a test jig into which instances of the function class (implementations of FirewallVNF in our example) could be inserted and validated.   Fourth, all of the integration happens in matching the implementation to the function class, so it’s the responsibility of the VNF vendor or an agent thereof.  What’s handed off is a complete, open, useful, package.  Finally, anyone can run these tests in the same framework and get the same result.  Vendors and operators can do testing for themselves, and any “lab” can recreate the “experiment” just like science generally requires.  We don’t have to establish or accept some central body as responsible for testing.

Vendors should like this model because it makes their own labs and integration testing easier because it provides them an explicit target to hit—matching to the functional class.  Operators should like it because most of their own onboarding efforts would be expended for new functional classes; basic testing or certification of vendor labs would cover the specific integration issues.  VNF vendors should like it because it could simplify their own integration.

I emphasize the “could” here because the question of who defines the functional classes remains.  This would be a worthy activity for some group like the OMG or even the ETSI NFV ISG, with the cooperation of operators and vendors.  I suspect that at least the “AnyVNF” base class would have to be defined by the NFV ISG because its requirements relate to the interface between VNFs and the rest of the NFV world.  The management stuff would also have to be integrated here.

This might prove to be the big hole in the functional class approach for VNFs.  There are, IMHO, a lot of open questions on management for NFV.  I suggested in an earlier blog that the ISG needed to incorporate the notion of Infrastructure-as-Code from the cloud, and that notion has at least the potential for defining the use of events to link services and resources when linkage is required for service lifecycle management.  That poses two challenges: what events get generated under what conditions, and how do you reflect events in the ETSI model in the first place?

This issue can be finessed (and probably should be) in part by saying that resource events are directed to the functional models for the service and not to the VNFs.  If that’s done, then all we need to worry about is how to harmonize the failure of a real device versus its equivalent virtual function.  If a firewall device fails, we need to communicate that to the service level.  If a FirewallVNF fails, the same has to be true if we follow the precepts of our open model.  That means that part of the base class AnyVNF stuff has to be able to recognize the state of the VNF just as a management system would recognize the state of firewall hardware.  One path to that is to use the SNMP MIBsets for the basic “real” devices as the foundation data.  That would work even if we didn’t assume “event generation” to drive service changes and instead relied on testing the state of resources and VNFs.
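A rough illustration of that idea, with the status fields loosely modeled on the standard interfaces MIB and everything else assumed for the example:

    # Illustrative: a real firewall and a FirewallVNF both report through the same
    # MIB-like status record, so the functional model above can't tell them apart.
    def poll_hardware_firewall():
        # In reality this would be an SNMP GET against the device's MIB.
        return {"sysUpTime": 86400, "ifOperStatus": "up", "source": "hardware"}

    def poll_firewall_vnf():
        # The AnyVNF machinery synthesizes the same record from VM/container state.
        return {"sysUpTime": 3600, "ifOperStatus": "down", "source": "vnf"}

    def functional_state(record):
        # The service level sees only "working" or "failed"; the source is irrelevant.
        return "working" if record["ifOperStatus"] == "up" else "failed"

    for poll in (poll_hardware_firewall, poll_firewall_vnf):
        record = poll()
        print(record["source"], "->", functional_state(record))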

I think all of this demonstrates that the techniques for creating open VNFs, or facilitating the integration and onboarding of VNFs, have to be based on the same notion of a functional-class model set that can be composed from above and instantiated below.  That way, it integrates with resource allocation during deployment, redeployment, and other lifecycle stages.  In VNF onboarding and integration, as in network infrastructure accommodation, we’re trying to resolve detailed issues without a suitable architecture framework to guide us.

This is where I think vendors and operators need to guide things.  The architecture models for next-gen networks promulgated by operators in various parts of the world demonstrate (and in some cases explicitly say) that the ISG framework for NFV and the ONF framework for SDN aren’t sufficient.  You can’t just cobble together standards at a low level and expect them to add up to something optimal, or even useful, at a high level.  Harmonizing the architecture of VNFs is critical and I’d love to see vendors present their approaches, and base them more on the cloud than on abstract interfaces.  A good one could form the basis for broad deployment.

Achieving Openness in NFV

What operators want most from their next-gen infrastructure (whether SDN or NFV or the cloud) is openness.  They feel, with some justification, that equipment vendors want to lock them in and force them down migration paths that help the vendor and compromise operator goals.  Networks in the past, built on long-established standards that defined the role of devices, were considered fairly open.  It’s far from clear that goal can be achieved with the next generation.

You can make anything “open” if you’re prepared to spend a boatload of cash on professional services and suffer long delays when you commission a piece of hardware or software.  That all flies in the face of a transformation that’s supposed to be driven by efficiency and agility.  What you need is for pieces to fit because they were designed to fit, and specifications that ensure you realize the goal of “fitting” that you’ve designed for.

Everyone I’ve talked to in the operator community (CxOs and “literati” alike) believes that an open environment has to be based on three things.  First, an architecture model that makes the business case for the migration.  Second, a series of functional models that define open elements that can then be made interchangeable by vendors/operators through “onboarding”.  Finally, validation through testing, plug fests, etc.

The problem we have today in realizing openness is that we don’t have either of the first two of these, and without them there’s little value in validating an approach because there’s no useful standard.  There doesn’t seem to be much of a chance that a standards group or even an open-source activity is going to develop either of the missing pieces.  Vendors, even the half-dozen who actually have a complete model, don’t seem to be promoting their architectures effectively, so what we’re now seeing is a set of operator-driven architecture initiatives that might result in a converging set of models, or might not.  Fortunately, we can learn something from them, and in particular learn why that second requirement for openness is so critical.

“Open”, in IT and networking, means “admitting to the substitution of components without impacting the functionality of the whole.”  That almost demands a series of abstractions that represent classes of components, and a requirement that any component or set of components representing such a class be interchangeable with any other within that class.  I think that this divides what’s come to be called “orchestration” and “modeling” into two distinct areas.  One area builds from these functional models or component classes, and the other implements the classes based on any useful collection of technology.

Let’s return now to the bidirectional view of these functional models.  Above, you recall, they’re assembled to create services that meet an operator’s business needs.  Below, they’re decomposed into infrastructure-specific implementations.  With this approach, a service that’s defined as a set of functions (“network functions” perhaps in NFV terms) could be deployed on anything that could properly decompose those functions.  If infrastructure changes, a change to the lower-layer decomposition would update the service—no changes would be needed at the service level.

The service structure could be defined using TOSCA, where my functions are analogous to high-level application descriptions.  It could also be defined using the TMF’s SID, where my network functions would be analogous to either customer-facing or resource-facing services.  That means it should be largely accommodating to OSS/BSS as long as we frame the role of OSS/BSS to be the management of the CFS and RFS and not of “virtual devices” or real ones.

Decomposing a function requires a bit more attention.  Networks and services are often multi-domain or multi-jurisdictional.  That means that the first step in decomposing a function is to make a jurisdictional separation, and that’s complicated so let’s use a VPN as an example.

Let’s say I have a North American VPN that’s supported by AT&T in the US, Bell Canada in Canada, and TelMex in Mexico.  My first-level decomposition would be to define three administrative VPNs, one for each area, and assign sites to each based on geography.  I’d then define the interconnection among providers, either as a gateway point they had in common or a series thereof.  In the complex case I’d have six definitions (three area VPNs and three gateways), and these are then network functions too.

For each of these network functions, I’d then decompose further.  If a given operator had a single management API from which all the endpoints in their geography could be provisioned, I’d simply exercise that API.  If there were multiple domains, technology or otherwise, inside one of these second-level functions, I’d then have to decompose first to identify the proper domain(s) and then decompose within each to deployment instructions.
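Here’s how that example might look as a model structure, purely as a sketch; the per-operator decomposition choices are invented for illustration.

    # Illustrative model of the North American VPN example: three area VPNs plus
    # three gateways, each decomposed according to what that domain actually offers.
    north_american_vpn = {
        "NF: NA-VPN": {
            "VPN-US (AT&T)":       "single management API covers all US endpoints",
            "VPN-Canada (Bell)":   {"Ontario sites (SDN)":    "push flows via SDN controller",
                                    "Quebec sites (routers)": "provision via router EMS"},
            "VPN-Mexico (TelMex)": "single management API covers all Mexico endpoints",
            "GW: US-Canada":       "provision interconnect gateway ports",
            "GW: US-Mexico":       "provision interconnect gateway ports",
            "GW: Canada-Mexico":   "provision interconnect gateway ports",
        }
    }

    def decompose(node, depth=0):
        # Walk the model: dict values decompose further, string values are the point
        # where a real management/control interface gets exercised.
        for name, child in node.items():
            print("  " * depth + name)
            if isinstance(child, dict):
                decompose(child, depth + 1)
            else:
                print("  " * (depth + 1) + "-> " + child)

    decompose(north_american_vpn)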

This description exposes three points.  First, there’s a fuzzy zone of network function decomposition between the top “function” level and the decomposition into resource-specific deployment instructions.  Is my administrative separation, for example, a “service” function or a “resource” function?  It could be either or both.  Second, it’s not particularly easy to map this kind of layered decomposition to the ETSI processes or even to traditional SDN.  Third, the operator architectures like AT&T’s and in particular Verizon’s call out this middle layer of decomposition but treat it as a “model” and not specifically as a potentially n-layer model structure.

All of which says that we’re not there yet, but it gets a bit easier if we look at this now from the bottom or resource side.

A network’s goal is to provide a set of services.  In virtual, complex, infrastructure these resource-side services are not the same as the retail services—think the TMF Resource-Facing Services as an example.  I’ve called these intrinsic cooperative network-layer structures behaviors because they’re how the infrastructure behaves intrinsically or as you’ve set it up.  SDN, NFV, and legacy management APIs all create behaviors, and behaviors are then composed upward into network functions (and of course the reverse).

Put this way, you can see that for example I could get a “VPN” behavior in one of three ways—as a management-driven cooperative behavior of a system of routers, as an explicit deployment of forwarding paths via an SDN controller, and by deploying the associated virtual functions with NFV.  In fact, my middle option could subdivide—OpenDaylight could control a white-box OpenFlow switch or a traditional router via the proper “southbound API”.

The point here is that open implementations of network functions depend on connecting the functions to a set of behaviors that are exposed from the infrastructure below.  To the extent that functions can be standardized by some body (like the OMG) using intent-model principles you could then assemble and disassemble them as described here.  If we could also define “behaviors” as standard classes, we could carry that assemble/decomposition down a layer.

For example, a behavior called “HostVNF” might represent the ability to deploy a VNF in a virtual machine or container and provide the necessary local connections.  That behavior could be a part of any higher-layer behavior that’s composed into a service—“Firewall” or even “VPN”.  Anything that can provide HostVNF can host any VNF in the catalog, let’s say.
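A sketch of what “behaviors as standard classes” might look like; “HostVNF” is the example from above, and the rest of the structure is my assumption.

    # Illustrative behavior classes: infrastructure exposes behaviors, and higher-layer
    # functions are composed from whatever behaviors the infrastructure can offer.
    class Behavior:
        def realize(self, **kwargs): raise NotImplementedError

    class HostVNF(Behavior):
        """Deploy a VNF image in a VM/container and wire up its local connections."""
        def realize(self, image, connections):
            print(f"deploying {image}, connecting {connections}")

    class L2Path(Behavior):
        """Provide a point-to-point L2 connection between two service points."""
        def realize(self, a, b):
            print(f"building L2 path {a} <-> {b}")

    class FirewallFunction:
        """A higher-layer function composed from exposed behaviors."""
        def __init__(self, hosting, access):
            self.hosting, self.access = hosting, access
        def realize(self, site):
            self.hosting.realize(image="firewall-vnf", connections=[site, "core"])
            self.access.realize(a=site, b="firewall-vnf")

    # Any infrastructure that exposes HostVNF and L2Path can realize this Firewall.
    FirewallFunction(HostVNF(), L2Path()).realize(site="branch-12")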

The notion of functional behaviors is the foundation for the notion of an open VNF framework too.  All virtual network functions, grouped into “network function types”, would be interchangeable if all of them were to be required to implement the common model of the function type they represented.  It would be the responsibility of the VNF provider or the NFV software framework provider to offer the tools that would support this, which is a topic I’ll address more in a later blog.

Openness at the infrastructure level, the equipment level, is the most critical openness requirement for NFV for the simple reason that this is where most of the money will get spent as well as where most of the undepreciated assets are found today.  We can secure that level of openness without sacrificing either efficiency or agility, simply by extending what we already know from networking, IT, and the cloud.

SD-WAN’s Potential as a Game-Changer is Growing

Software-defined WAN is one of those terms that’s vague enough to be applicable to a lot of things, but the core reality is a fairly classic “overlay model” of networking.  An overlay network is a layer on top of the “network protocol” of a real network, an overlay that provides the connectivity service to the user and uses the real network protocol (or protocols) as a kind of physical layer or virtual wire.
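
Here’s a toy illustration of the overlay principle, with made-up field names: the overlay carries its own addressing end to end, while the underlay simply hauls an opaque payload between its own endpoints.

    # A minimal sketch of overlay-over-underlay encapsulation; all field names are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class OverlayFrame:
        overlay_src: str   # address in the SD-WAN's own connectivity service
        overlay_dst: str
        payload: bytes

    @dataclass
    class UnderlayFrame:
        underlay_src: str  # address on the "real" network (MPLS, Internet, etc.)
        underlay_dst: str
        payload: bytes     # the serialized overlay frame, opaque to the underlay

    def encapsulate(frame: OverlayFrame, tunnel: tuple[str, str]) -> UnderlayFrame:
        inner = f"{frame.overlay_src}|{frame.overlay_dst}|".encode() + frame.payload
        return UnderlayFrame(underlay_src=tunnel[0], underlay_dst=tunnel[1], payload=inner)

    # The underlay only ever sees 203.0.113.1 -> 198.51.100.7; the overlay addressing rides inside.
    wire = encapsulate(OverlayFrame("site-a:10.0.0.5", "site-b:10.0.1.9", b"hello"),
                       tunnel=("203.0.113.1", "198.51.100.7"))
    print(wire)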

We’ve had SD-WAN overlay concepts around for many years.  In a very limited sense, many IP networks are overlays that add an IP layer to Ethernet or another Level 2 technology.  The whole OSI model in fact is a series of overlays.  But the SD-WAN concept’s real birth was probably the work done by Nicira, the cloud-network player that was bought by VMware.  Nicira recognized that you could use overlay networks to build connectivity for cloud data centers without the constraints that exist in “real” network protocols and without the need to involve physical network devices in what was “virtual connectivity”.  SD-WAN technology extends this model, one of the forms of software-defined networking or SDN, to the WAN.

The early SD-WAN products aim at one or more of three major goals.  First, they create a very agile connection-layer service that can easily build a virtual private network without the headaches and costs of something like MPLS.  Second, they can build a unified virtual network across locations that don’t share a common real-network connection.  Users like this because they can use traditional MPLS VPNs in major sites and add in minor sites or even transient locations through an SD-WAN that supports both underlayments.  Finally, they can use the Internet as an adjunct to private VPNs or to create a wider pipe for a period of time.

SD-WAN has carved out a nice but not enormous market in these areas, and while all of them are valuable it’s not likely that these three drivers alone would result in explosive growth.  That doesn’t mean that SD-WAN doesn’t have potential—it may end up being as important as, or more important than, SDN or NFV, and in fact be a critical enabler of both.

One obvious mission for SD-WAN is the creation of agile virtual networks over a combination of traditional L2/L3 technology and SDN.  Using SD-WAN I can build a VPN by creating SD-WAN tunnel meshes of all my endpoints over a common, evolving, or disparate underlayment.  If I use NFV to deploy interior router instances, I can create a virtual topology of nodes and trunks (both virtual, of course) that aggregates traffic and eliminates the potential inefficiency or lack of route control that meshing could create.
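
A simple sketch of that mesh-building, with hypothetical sites and underlayments: endpoints that share an underlayment get a direct tunnel, and pairs that don’t would be stitched together through a hosted router instance.

    # A sketch of tunnel-mesh construction over disparate underlayments; data is hypothetical.
    from itertools import combinations

    endpoints = {
        "hq":      {"mpls", "internet"},
        "branch1": {"mpls"},
        "branch2": {"internet"},
        "popup":   {"internet"},
    }

    def tunnel_mesh(sites: dict[str, set[str]]) -> list[tuple[str, str, str]]:
        mesh = []
        for a, b in combinations(sites, 2):
            shared = sites[a] & sites[b]
            if shared:
                mesh.append((a, b, sorted(shared)[0]))      # pick any common underlayment
            else:
                # No common underlayment: an NFV-hosted router instance would bridge these sites.
                mesh.append((a, b, "via-hosted-router"))
        return mesh

    for link in tunnel_mesh(endpoints):
        print(link)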

The use of SD-WAN could mean that connectivity, meaning the service connection layer, resides only in endpoints or in hosted instances that are dedicated to a service or user.  This would make provisioning new endpoints or changing services trivial; no real hardware would even have to know.  If my underlayment offered grades of service representing QoS and availability options, I could offer various SLAs from this, and change QoS and SLAs easily without interrupting anything.

If I used SDN to build only virtual wire tunnels, I could build services entirely from software elements.  This would clearly scale to support VPN/VLAN services and probably also to support content delivery networks, mobile Evolved Packet Core, and even IoT.  With some reasonable way of managing Internet services as predictable sessions, I could probably support most of the demanding Internet applications for content delivery and ecommerce.

In both SDN and NFV and in the cloud, you have a presumptive “virtual network” built in data centers and across data center interconnect (DCI) trunks.  In the cloud, these virtual networks are most often subnetworks linked to the Internet or a VPN, meaning that they’re a homogeneous part of a larger network.  In NFV, the presumption is that these networks are truly private, with only selected ports adapted into a broader address space.  SDN and SD-WAN could create a hybrid of these models.

Suppose I build an SDN network to link application components in a data center complex.  Yes, I can gate these to a larger address space as always, but suppose that instead I consider them to be a series of separate, independent networks.  Now suppose that I have a bunch of users and organizations out there.  With the proper software (SD-WAN, in some form) I link users or groups of users (themselves formed into a separate network) with applications selectively.  I don’t have a uniform address space for every user; I have a composed address space.

Composed address spaces may seem hokey, but they go back a long way conceptually.  When switched-virtual-circuit services like frame relay and ATM came along, a number of initiatives were launched to exploit them (if they became widespread, which they did not) for IP.  One of the protocols invented for this was called the “next-hop resolution protocol” or NHRP.  With NHRP you surrounded an SVC network of any sort (the standard calls these networks “NBMAs”, meaning non-broadcast multi-access, to show that traditional IP subnet processes won’t work) with a kind of IP ring, where each ring element maintained a table that showed the NBMA address for each remote subnet.  When traffic arrived, the ring station simply looked to see whether the ring station associated with the destination subnet was already connected, and if not, connected it.  SD-WAN composed address spaces could be similar, except that they translate a remote application address in its own space to the address space being composed.
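
A rough Python sketch of that NHRP-flavored process, using invented addresses: a table maps remote prefixes to their attachment points, a connection is set up only on first use, and the remote application address is translated into the composed space.

    # A sketch of on-demand attachment plus address composition; all addresses are invented.
    attachment_for_prefix = {     # remote prefix -> attachment point on the SVC/SD-WAN fabric
        "10.20.": "nbma-endpoint-7",
        "10.30.": "nbma-endpoint-9",
    }
    composed_map = {              # app address in its own space -> address in the composed space
        "10.20.1.15": "172.16.5.15",
        "10.30.2.40": "172.16.6.40",
    }
    active_connections: set[str] = set()

    def deliver(app_address: str) -> str:
        # Find the attachment point for the destination's subnet, NHRP-style.
        attachment = next((a for p, a in attachment_for_prefix.items()
                           if app_address.startswith(p)), None)
        if attachment is None:
            raise LookupError(f"no attachment known for {app_address}")
        if attachment not in active_connections:
            print(f"setting up connection to {attachment}")   # connect only on first use
            active_connections.add(attachment)
        # Translate the remote application address into the composed address space.
        return composed_map.get(app_address, app_address)

    print(deliver("10.20.1.15"))   # connects nbma-endpoint-7, returns 172.16.5.15
    print(deliver("10.20.1.15"))   # already connected; just translates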

These features all make SD-WAN the perfect companion to SDN and NFV because they illustrate that SD-WAN can virtualize the service layer independently of what’s happening in the layers below.  That’s the value of overlay technology.  The agility benefits, and the ability to decouple service-layer connectivity from transport infrastructure, are so profound that I think it’s fair to say that SD-WAN technology is the best option for VPN/VLAN services.  In addition, its linkage to cloud data center connectivity means that it could revolutionize NFV managed services, cloud computing service delivery, and even (by creating explicit connectivity versus permissive networking) security.

Obviously SD-WAN is a threat to established service-layer paradigms, particularly box-based IP, and for that reason it’s going to be heavily opposed by the big-name players like Cisco.  The question is whether those who could benefit from it, including obviously the SD-WAN vendors but less obviously companies like Brocade, which has world-class software routers, and Nokia, whose Nuage SDN architecture has all the right interior components, will position their assets better.  Complementing SD-WAN, or (in the case of Nuage) offering an alternative implementation model, could be a big win for them, and ultimately for the market.

The big question will be whether the network operators or managed service providers jump on SD-WAN technology in a big way.  Verizon has launched an SD-WAN service, and MetTel has a competitive-carrier SD-WAN service aimed at the MPLS VPN space.  Neither of these has so far created a serious threat to incumbent VPN technology, but the potential is there, and a big shift could come literally at any moment.  SD-WAN could even be a first-deployed technology, paving the way for adoption of SDN and NFV by disconnecting the service layer from infrastructure.  We’ll have to watch for developments, and I’ll blog on the important ones as they occur.

What Operator Experts Think is Wrong with Vendor NFV Strategy

If you are a company with aspirations in the SDN or NFV markets, then operators themselves say you have a problem.  In fact, you probably have more than one problem, and those problems are hurting your ability to engage customers and build revenue.  This is a message from those same literati I talked “tech-turkey” with last week, and again it’s interesting that their views, my views, and vendor views of the issues are both congruent and different.

I’ve noted in past blogs that SDN and NFV salespeople have complained to me that their markets are not moving, their companies are not making their sales goals, and they’re frustrated by what they see as the intransigence of buyers.  The key phrase from their emails is “The buyer won’t…” as though salespeople had either the right or the ability to simply expect that buyers would adopt a frame of reference that’s convenient to the sales process.  What do operators, speaking through the literati, see?

The number one problem with vendors, according to operator literati, is “they act like SDN and NFV are decisions already made and all they have to do is differentiate themselves from the competition.”  This, when 100% of the CFOs I’ve talked with or surveyed say that they still can’t make a broad-based SDN or NFV business case and so there is no commitment as yet to either technology.

I’ve done sales/marketing consulting for a lot of years, and one point I’ve always made is that there are three message elements in positioning your offering.  The first and most important are the enablers, meaning the features and value propositions that can make the business case for a deal.  Second are the differentiators that make you stand out from others who can “enable” too, and last are the objection management statements that can put to rest mild issues of resistance or credibility.  What the literati are saying is that vendors aren’t enabling SDN or NFV, and so there’s not really much of a market to compete for.

There’s unanimity among the literati on this first problem, but not on what the next one is.  About half the literati say that problem two is that vendors don’t address the complexity of getting support for their projects from all the relevant buyer constituencies.  “The CTO doesn’t have a deployment budget and can’t make a deployment decision,” says one of the literati who happens to work in the CTO organization.  The other half say that vendor views of the market are set by the media, who in turn are setting their views from vendors.  “Most of what these salespeople tell us is what they read somewhere, and at the same time they know that their own company is promoting analysts and writers to say that very stuff.”

It’s not going to surprise you to hear that I believe that the media processes in the tech industry took a turn over two decades ago when subscription publications were replaced by ad-sponsored controlled-circulation pubs.  At one point, I had an opportunity to review the reader service cards for a mainstream network rag, and the total value of purchases the respondents said they made decisions for was at least triple the total market.  Processes took another turn with the online shift, because an online ad is served pretty much as soon as you click a URL, whereas you have a better chance of seeing a print ad the longer you’re on the page.  At any rate, what we see and hear and read is increasingly set by vendors.  Even if you assumed it was all true (which obviously it is not), it makes no sense for salespeople to simply mouth the same story.  Why should a buyer even bother to take a sales call if that’s what happens?

The engagement issue is probably the longest-standing problem, and it’s related to another issue the literati brought up, which is that “vendors don’t understand anything about my business.”  I remember consulting with a switch/router vendor two decades ago and pointing out to them that the diagram they were showing for network evolution by US operators was in fact a violation of the regulatory framework that governs the industry.  Operators used to send me moaning emails making that same point, and they saw it as an indication that their vendors didn’t take the trouble to understand the customer.

The thing is, there’s more than one customer.  A transformation like the one SDN or NFV would bring has to be lab-tested, network-operations-tested, CIO/OSS/BSS-tested, pass CFO muster, and get CEO and executive committee approval.  All these constituencies have to buy in, all will do so if their own issues are addressed, and vendors tend to expect their own sales contacts to run the ball internally, which in most cases can’t be done because the internal operator groups probably know less about each other than the vendors do.

And the literati say vendors don’t present their total solution either.  SDN and NFV are not monolithic.  Generally speaking, you have a combination of IT infrastructure on which stuff will be deployed, facilitating software that handles the virtualization and deployment processes, and operations and management tools and processes that manage the commercial offerings and the sales/support processes.  At all of the SDN or NFV vendors I’ve talked with, these three pieces of tech transformation are either different profit centers or not even present (the buyer, seller, or integrator would have to add in stuff from outside).

How many cars would an auto giant sell if you had to talk to a showroom salesperson about the car, another about the engine, yet another about tires and the seats?  “Infrastructure” isn’t seamless but it has to be cohesive.  Yet I’ve listened to vendors who won’t talk about NFV orchestration because they want to sell servers and platforms, and others who won’t talk OSS/BSS because that’s either another business unit or it’s a partner company.

There seems to be a “vendors are from Mars and operators from Venus” thing going on here.  I think part of the reason is that vendors are looking for profits in the next couple of quarters while transformation is seen by telcos as a three-to-five-year process.  Another part is that vendors are used to selling equipment into what could be called an established paradigm, not to working to invent one and then sell into it.  Finally, they are used to the telco’s own internal processes taking “successful” trials into production, whereas today the trials don’t have a broad enough scope to make the business case.

One of the operator literati made what might be the definitive comment on all of this, relating to the tendency of vendors to go after tactical, service-specific NFV and SDN projects.  “These services that they’re talking about, if you presumed they were 100% converted to NFV hosting, and assuming they delivered on the benefit case promised, would make a difference for us that’s a rounding error on our bottom line.”

You can’t easily creep into NFV in small steps because none of those steps makes a visible difference in profits.  Somehow, vendors have to convince operators that they can do more than creep.  The disconnect vendors face now makes that hard, but far from impossible, because the literati say the operators want vendors, and NFV, to succeed.

Should NFV Adopt “Infrastructure as Code” from the Cloud?

From the first, it was (or should have been) clear that NFV was a cloud application.  Despite this, we aren’t seeing what should then have been clear signs of technologies and concepts migrating from the cloud space into NFV.  One obvious example is TOSCA, the OASIS Topology and Orchestration Specification for Cloud Applications, which has been quietly increasing its profile in the NFV space despite a lack (until recently) of even recognition in the ETSI NFV activity.  But I’ve talked about TOSCA before; today I want to look at “Infrastructure as Code” or IaC.

IaC is a development in the DevOps space that, at first glance, is actually kind of hard to distinguish from DevOps.  Puppet and Chef both talk about it, and Amazon has picked the notion up (along with Chef) in its OpsWorks stuff.  The explanation of just why IaC and DevOps exist independently is not only useful in understanding IaC, it’s also instructive in how NFV’s own management and orchestration should be expected to work.

Any virtualized environment is a combination of abstraction and instantiation.  You have an abstract something, like an application or virtual function, and you instantiate it, meaning that you commit it to resources.  In software, “DevOps” or “Development/Operations” described an initiative to transfer deployment and later lifecycle management information from the application development process forward into data center operations.  Because both virtualization and DevOps end up deploying or committing resources, the similarity at that level overwhelms an underlying difference—one is really a layer on the other.

DevOps is about the logical organization of complex (multi-element) applications.  But if you’re doing virtualized resources, the resources have a life separate from that of applications.  A server pool is a server pool, and some VMs within it are the targets of traditional DevOps deployment.  But not only does the existence of a resource pool versus a specific resource complicate the deployment, it also separates the management of resources as a collection from the management of applications.

The DevOps people, especially market leaders Chef and Puppet, were among the first to see this and to reflect it in their products through the addition of resource descriptions.  You could describe what was needed to commission a resource in a pool or independently, just as you could describe what was needed to commission an application.  Rather than trying to tie the two tightly, these evolving changes reflected an interdependence.  They created another side to the DevOps coin, and that other side became known as IaC, to reflect the fact that DevOps-like tools were to be used to commission resources and to handle their lifecycle management.
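
To illustrate the two sides of the coin (this is deliberately generic Python, not Chef or Puppet syntax), the resource description and the application description below are separate declarations with separate “apply” paths, bound only loosely by the name of the pool.

    # A generic sketch of separating resource commissioning (IaC) from application deployment (DevOps).
    resource_description = {
        "pool": "edge-servers",
        "nodes": 4,
        "hypervisor": "kvm",
        "network": {"mgmt_vlan": 101},
    }

    application_description = {
        "app": "order-portal",
        "components": ["web", "api", "db"],
        "placement": {"pool": "edge-servers"},   # binds to the pool only by name
    }

    def commission_resources(desc: dict) -> None:
        """IaC side: bring the pool to its described state."""
        print(f"commission {desc['nodes']} nodes in pool {desc['pool']} ({desc['hypervisor']})")

    def deploy_application(desc: dict) -> None:
        """DevOps side: deploy components onto whatever pool the description names."""
        for component in desc["components"]:
            print(f"deploy {component} of {desc['app']} into pool {desc['placement']['pool']}")

    commission_resources(resource_description)   # can run on its own lifecycle
    deploy_application(application_description)  # and this one on its own

The point of the loose binding is that either side can change, or fail, on its own lifecycle without rewriting the other.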

It’s my view that what makes IaC a critical concept in DevOps and the cloud should also make it critical in NFV, and probably in SDN too.  Resources are always separate from what they’re resources for, meaning separate from the deployment of applications/components or the threading of connections.  The mission that commits them—which we could call a “service” or an “application”—is one deployment and management domain, and the resources themselves are another.  Logical, and perhaps even compelling, but not something we hear about in NFV.

It’s also interesting to note that what the DevOps community seems to be doing (or moving to do) is supporting the “interdependence” I talked about earlier by providing an event-based link between the DevOps and IaC processes.  The two are separate worlds when everything is going normally, but if the IaC operations activities are unable to sustain the resource lifecycle properly, then they have to trigger a DevOps-level activity.

An example here is that of a failed instance of a component or virtual function.  You might have a resource-level process that attempts to recover the lost component by simply reloading or restarting it, or by instantiating a new copy local to the original.  But if you need to spin that copy up in another data center, you need to make connections that are outside the domain of the resource control or IaC processes and you have to kick the problem to another level.
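
A minimal sketch of that escalation rule, with hypothetical names: the resource-level handler recovers locally when it can, and anything that needs placement or connections outside its own domain is raised as an event for a higher layer to handle.

    # A sketch of resource-level recovery with escalation; names and conditions are hypothetical.
    import queue

    orchestration_events: "queue.Queue[dict]" = queue.Queue()

    def recover_instance(instance_id: str, local_capacity_free: bool) -> None:
        if local_capacity_free:
            # Recovery stays inside the resource domain: restart or re-instantiate locally.
            print(f"restarting {instance_id} on local capacity")
            return
        # Recovery needs placement and connections outside this resource domain: escalate.
        orchestration_events.put({
            "event": "instance-recovery-needs-replacement",
            "instance": instance_id,
            "reason": "no local capacity; remote data center required",
        })

    recover_instance("vFirewall-3", local_capacity_free=True)
    recover_instance("vFirewall-7", local_capacity_free=False)
    print(orchestration_events.get())

The design point is that the resource layer never reaches upward synchronously; it emits an event and lets the layer above decide what to do.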

This shows a couple of critical points.  Obviously some resource conditions, like the failure of a resource not currently committed to anything, have to be handled at the “IaC level” by NFV, whether we have such a function or not.  You wouldn’t want to deal with that kind of failure only when you tried to deploy.  Second, there are some kinds of resource failures that could be handled at the resource level alone, and others that would require higher-level coordination because multiple resource types are involved.  There are also things, like a service change initiated by the customer, that could require high-level connection/coordination first, but might then require something at the resource level—setting up vCPE devices on prem, for example.

Virtualization introduces not two layers, potentially, but three.  We have services/applications and resources, but also “mapping”.  Resource management is responsible for maintaining the pool, services/applications for maintaining what’s been hosted on the pool, but the “virtualization mapping” or binding process is itself dynamic.  The trend with the cloud and IaC seems to be to presume that resource issues, including mapping issues, are reported as service/application events.  With NFV there is at least an indication of a different approach.

Arguably, NFV presents a three-layer model of “orchestration” (which, by the way, Verizon’s architecture makes explicit).  You have services, then MANO, then the Virtual Infrastructure Manager (VIM).  None of these three layers corresponds to IaC because pure resource management is out of scope.  Service-layer orchestration is recognized but not described either.  Presumably, in NFV, resource conditions/events that impact orchestrated deployments are reflected into the VNF Manager.  The MANO-level orchestration is where mapping/binding management is sustained, meaning that any “resource” problems that aren’t automatically remediated at the resource level are presumed to be handled by MANO.  IaC would then be “in” or “below” the VIM.

Why, you may be thinking, is the integration of IaC with NFV important?  My view is that lifecycle management has to be coordinated wherever there are layers of functionality.  Logically, if we have cloud-like IaC going on with NFV resources, then that IaC should be the source of “events” that signal for the attention of higher-layer lifecycle processes, be they MANO or service/OSS/BSS.  If I have an “issue” with resources, the IaC gets the first shot, then signals upward to (hypothetically) VNFM/MANO, and then upward to OSS/BSS or “super-MANO”.
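
Here’s a toy sketch of that event chain (the handler names are mine, not the ETSI interfaces): each layer handles what it can and passes the rest upward as events rather than as synchronous API calls.

    # A sketch of layered event escalation: IaC/VIM, then VNFM/MANO, then OSS/BSS.
    from typing import Callable

    Handler = Callable[[dict], bool]   # returns True if the event was fully handled

    def iac_layer(event: dict) -> bool:
        return event["kind"] == "idle-resource-failure"       # fixed locally; nothing above cares

    def mano_layer(event: dict) -> bool:
        return event["kind"] == "committed-resource-failure"  # remap/redeploy the binding

    def oss_bss_layer(event: dict) -> bool:
        print(f"service-level action needed: {event}")         # SLA, billing, customer notice
        return True

    LAYERS: list[Handler] = [iac_layer, mano_layer, oss_bss_layer]

    def raise_event(event: dict) -> None:
        for layer in LAYERS:
            if layer(event):
                return

    raise_event({"kind": "idle-resource-failure", "resource": "server-41"})
    raise_event({"kind": "service-change", "service": "vpn-1002"})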

The ETSI ISG has generally accepted the notion of a higher orchestration layer, which is good because the network operators are writing that into their approaches.  The only thing, as I’ve said before, is that if you have multiple orchestration layers you have to define how they communicate, and that should be somebody’s next step.  Defining APIs implies a non-event interface, and there is no question that all of the layers of orchestration create parallel, asynchronous processes that can’t communicate except through events.

More broadly, the IaC point is another example of what I opened with, the need to contextualize NFV in light of other industry developments in general, and the cloud in particular.  The cloud is much more advanced than NFV in both technology thinking and market acceptance.  It’s framing tomorrow’s issues for NFV today.  The NFV ISG has, like most standards groups, set narrow borders for itself to ensure it can make progress rather than “boil the ocean”.  That’s fine as long as the rest of the technology landscape can be harmonized at those borders, and I think IaC makes it clear that efforts to do that may be getting outrun by events.