Top Network Issues for 2016: My Personal View

We’re closing in on 2016, which will surely be a pivotal year for network operator strategy.  I’ve already blogged about the results of the operators’ fall planning cycle, and I think that offers a pretty clear view of their technology plans.  Because even the best of plans can be contaminated by internal biases (personal and company), I want to offer my own analysis of the year to come.

First and foremost, 2016 is a year when a combination of operations efficiency and new revenue opportunity will come to dominate network operator planning.  The industry has tended to focus on capital budgets, in no small part because Wall Street uses network operator capex as a means of assessing the opportunities in the network equipment space.  However, the Street’s own measure of network operator financials (EBITDA) omits capex, and in any event operators have known for at least two years that there are really no technology measures they could take that would effectively address capex.  That leaves new revenues or lower operations costs to address the revenue/cost-per-bit convergence.

In one sense, this shift in sentiment seems to offer some significant opportunities.  The fact is that the overwhelming majority of the stuff needed to either enhance revenues or reduce operations costs is concentrated in OSS/BSS systems under the CIO.  With one super-sponsor and one technology target, it’s far easier to build a consensus and implement it.  On this basis, you could assume progress would come quickly in 2016.

The problem is that CIOs are the most glacially paced of all operator executives.  Operations systems have greater inertia than even network equipment.  Thus, despite the fact that top-down planners of network transformation have long accepted the need to drive those top-down changes through OSS/BSS, very little has been done.  Everyone seems to get bogged down in details, meaning that even in OSS/BSS, where the top-down focus has been clear, people have started technology changes at the bottom and hoped for eventual convergence.

What seems to be driving changes to this practice in 2016 is the fact that operators are now committed to the “wing and a prayer” strategy of NFV deployment.  “I don’t know whether we understand what an NFV transformation would look like, as a company,” one CIO told me.  “We are committed to evolving from trials toward something, but I don’t really know what that ‘something’ is or exactly how we’ll approach it.”  Put in technology terms, operators are committed to building up from their trials and PoCs, which are overwhelmingly focused on a single service concept, and in the majority of cases on vCPE.  So, here’s the key point for 2016: service-specific silos are not only here to stay, they’re on the rise.

While no operator wants to admit that they’ve invented in NFV a new way to build silos, that’s pretty much what’s happened.  What a few operators are now recognizing is that silo convergence in the new age is probably a technical problem that OSS/BSS could solve.  If that is true, then the silo-convergence mission at the technical level might lead (accidentally?) to a pathway to implement the top-down strategy operators have been grasping for.  Where do you converge silos other than at their top?

The problem with this approach, for vendors at least, is that the big bucks in NFV benefits terms are only available at the silo-top.  If we’re relying on OSS/BSS to converge the silos and deliver the benefits, then the benefit delivery will occur largely after the silos are deployed.  That means well beyond 2016, even discounting the time it might take for CIOs to frame a silo-convergence strategy.

If nobody converges silos explicitly, then there’s only one other option to escape them, and I’ll call it the cult of the ginormous silo.  If NFV deployment is the success of a thousand Band-Aids, then one really big one could be a game-changer.  There are, as I’ve noted, only two opportunities for a ginormous silo—mobile/content and IoT.  Mobility (as we’ll see below) is a key factor in another technology shift, and IoT is a darling of the media, but neither of them has shaped a solid vision of how it might change operator costs or revenues decisively.  The best anyone has done is to say “sell 4/5G to every human, and when you run out sell to the gadgets directly.”  Presumably 3D printing would then combine with robots to create autoreproducing gadgets to continue the trend.  Not very insightful.

If nobody pushes the ginormous silo approach effectively, then vCPE will dominate early services.  Here we are likely to generate an interesting dynamic, because the great majority of credible vCPE opportunities are associated with business sites (the remainder are linked to content delivery, which then links back to mobile and social services).  Enterprises have their own technology plans, focused as always on enhancing worker productivity.  The technology vehicle for this is mobility, a transitioning of workers from a desktop-centric “bring work to your workplace” approach to a mobile-centric “take your workplace to where you’re working” model.

Mobility, particularly when viewed as something you’re transitioning to, almost demands a connection network or VPN that is independent of, rather than created with, physical network connectivity.  Read, then, a virtual network.  I would contend that social-based services that focus on the person and not on the network have this same need.  In both cases, the logical thing to do is to build an overlay VPN to accomplish what you need.  Unlike most of these “SDN” networks, this overlay would have to be supported on every device, and in parallel with the “normal” Internet connectivity.  You could tunnel over the Internet or create something alongside it—either would work—but you don’t want to be pushing ordinary Internet traffic onto the VPN only to have it egress somewhere else.
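
To make that concrete, here’s a minimal sketch of the split-forwarding decision such a device-resident overlay implies; the prefixes and the function are my own illustration, not any particular product’s behavior.

```python
import ipaddress

# Hypothetical overlay VPN prefixes; anything else is ordinary Internet traffic.
OVERLAY_PREFIXES = [ipaddress.ip_network("10.20.0.0/16"),
                    ipaddress.ip_network("172.31.0.0/16")]

def pick_egress(destination: str) -> str:
    """Return 'overlay' for VPN destinations and 'internet' for everything else,
    so Internet traffic never hairpins through the VPN to egress somewhere else."""
    addr = ipaddress.ip_address(destination)
    if any(addr in net for net in OVERLAY_PREFIXES):
        return "overlay"      # tunnel over the Internet or a parallel path
    return "internet"         # normal connectivity, untouched

print(pick_egress("10.20.5.9"))      # overlay
print(pick_egress("93.184.216.34"))  # internet
```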

This raises the last of my 2016 technology issues, which is the cloudification of the network.  You can already see that mobile services are impacted by the notion of personal digital assistants that would offer users answers to questions and not access to information.  IoT will develop and extend this trend by building an analytics framework to collate and interpret sensor contextual inputs and combine them with social and retail data.  The result will be an increasing shift from website delivery to answer delivery, and from search to agent handling.  That will subduct more and more of the Internet into the cloud, or rather put it behind a cloud edge.

This would create a profound shift in the dynamics of the Internet.  If you could sell the user an agent that answers all their questions, what value do anyone else’s search engines or ad placements have?  Everyone knows that nobody is going to replace Google or Yahoo, but suppose that’s not even the goal any longer?  Suppose that “over-the-top” is now under the covers?

There are network implications too.  A shift of traffic from website-to-user to website-to-agent wouldn’t eliminate access networking because bandwidth is still key for content delivery.  What it would do is create a massive intra-cloud traffic flow, one between data centers.  That could radically increase the high-capacity trunking needs of operators while making the access network more of a simple on-ramp to the cloud.  Big pipe, but a dumb subordinate.

So these are the things I think we should be watching next year.  I know I will be, and I promise to keep you posted on how I’m seeing things develop.

How the “Trial-Silo” NFV Strategy of Operators for 2016 Impacts Vendors

NFV trials are not exercises involving the same set of players, or reflecting the same issues or service goals.  I noted in a blog earlier this week that operators had decided (often by default) that they would pursue evolving PoCs toward services even though it would likely result in silos based on the large difference in players and focuses.  That dynamic could change the fortunes of some of the notable NFV players, and so I want to take a look at them.

First, as I’d noted, an uncoupled vision of NFV evolution would favor OSS/BSS vendors like Amdocs and Ericsson, who can couple silos together above the NFV level.  Neither of these two was a “full service” NFV player, but the evolution of operator planning would validate their above-the-fray stance.  The problem for this group of players is that it’s a weak position to sell.  Do operators know, or want to admit, that their NFV trials aren’t likely to promote a common strategy?  Do you, as a vendor, want to cram that truth down their throats?

Pure-play OSS companies like Amdocs seem to understand that their best approach is to focus not specifically on silo integration but rather on operations efficiency and service agility.  Most NFV stories are worse than “weak” in reaping these benefits, yet those benefits are the ones operators say have to drive NFV deployment in the long run.

Ericsson, on the other hand, is already the most-accepted integrator of the carrier space.  They see the disorder developing and know that eventually all NFV trials will have to converge on common operations and management practices or neither agility nor efficiency will be attained.  They know how to do it, not necessarily in an optimum way but certainly in a profitable way, so they can simply let things play out.

Some of the six vendors who can really do NFV (Alcatel-Lucent, Ciena, HPE, Huawei, Oracle, and Overture) have also taken positions (deliberately or by default) that the new silo model favors.  Huawei wins if NFV fails completely, because price concessions are the only other answer to the narrowing carrier profit margins that become critical in 2017.  Overture has always promoted a service-specific (Carrier Ethernet, MSP, and vCPE) vision of NFV but its technology could be used to spread a unifying ops layer down the line.  Oracle has taken an ops-centric position from the first, so if operations integration wins they get an automatic boost.

For the other vendors on the list, it’s more complicated, so let’s look at them in alphabetical order.

Alcatel-Lucent probably knows that its best approach is to accept silos because it could have the biggest one.  Mobile services and content delivery will be where most operator profit and cost focus will fall, because they’re the largest opportunity and largest capex target already.  Alcatel-Lucent can afford to hang back in other areas because they realize that on the scale of deployment alone, they’d win.  It would be easier and cheaper to fit something like vCPE into an NFV infrastructure justified and deployed based on mobile/content than the other way around.

Ciena has an extremely interesting opportunity arising from the trial-diversity-and-silo-creating strategies of operators.  They are the only vendor of the six that has actively embraced silo consolidation and federation as capabilities.  You could put Blue Planet in above other NFV silo implementations of services and use it to harmonize management and operations, and Ciena says so.  That latter point is important because operators tell me they want to exercise vendor capabilities that the vendors own up to in public, not those that are pronounced in whispers in smoke-filled rooms.  Since Ciena also has an optics-up SDN/NFV story, that gives them two shots at achieving a compelling NFV position.

HP (now HPE) has the strongest technical solution to NFV, but they’ve focused on a PoC-and-trial process that doesn’t let them demonstrate their scope effectively.  They may see the trial-and-PoC silo approach as a validation of their own strategy of standing tall in PoCs even if the PoCs don’t make a business case.  I know enough about their modeling and orchestration capability to know they could almost certainly orchestrate across diverse silo implementations, but they have not made that capability a fixture of their positioning.  That means they are at risk of being confined by their sales efforts to a mission far less capable and important than their software technology could attain.  But marketing assertions have little inertia; all HP has to do to fix their problems is assemble a nice orchestra and sing their hearts out.

The big question, one raised yet again by the Light Reading interoperability test, is whether silo NFV can make enough of a business case even when confined to sweet-spot services.  How long will a service trial for a single service take before operators can say it will pay back?  Answer: We don’t know, nor do the operators.  Some, particularly if you talk to the CFOs, aren’t sure that any silo service will pay back and aren’t sure that anything can unite the silos.

Operators didn’t make a proactive decision to pick at NFV opportunity scraps; they were offered no alternative.  None of the vendors who could tell a full story were willing to do that, probably because such a story would have been complicated to sell both in objective business case terms and in terms of operator political engagement.  At this point, based on what operators have told me about their fall planning results, I think it’s too late to make a broad case for NFV.  A trial process comprehensive enough to do that could not now be initiated and completed in 2016.

It’s probably also too late for a new camel to stick its nose in.  I noted in a prior blog that IoT was potentially the killer application for NFV, and interestingly companies like GE Digital have already launched cloud-IoT services that to me clearly demonstrate NFV’s value to the “right” IoT story.  But again, IoT is complicated and operators have gotten off on the wrong track (as Verizon did), misled by opportunism.  I don’t think that IoT/NFV trials would make even as much progress in 2016 as a full-solution operations-driven NFV trial would.

In the midst of this, OPNFV says it’s ready to take on MANO and Light Reading says the same, for its next phase of testing.  Too late.  You can’t really test the underlayment of something without knowing its higher-level requirements.  The ISG, to define a proper operations/management framework, would have to obsolete much of the work it’s already done.  OPNFV has to have something to implement from, and anything they do right at the MANO level will expose limitations in their NFVI and VIM models.  So we are not going to get a unifying standard in time to drive market consensus.

Which takes us back to the silos.  Operators at this point will continue with their PoCs and trials because they have no other options that will mature in time.  Even though these trials will create silos, they are committed to hoping these can somehow be unified.  That hope is now the opportunity every NFV vendor has to position for.

What Can We Learn from the Light Reading NFV Tests?

Light Reading has published the first of what they promise will be a series of NFV tests, run in combination with EANTC, and the results are summarized HERE.  I think there are some useful insights in the results, but I also think that there are fundamental limitations in the approach.  I’m not going to berate Light Reading for the latter, nor simply parrot the former.  Instead I’ll pick some main points for analysis.  First, I have to summarize what I perceive was done.

This first set of tests was targeted at interoperability at the deployment level, meaning VNFs deploying on NFVI.  Most of the functional focus, meaning software, would thus fall into the area of the Virtual Infrastructure Manager (VIM), which in all the tests was based on OpenStack.  I can’t find any indication of testing of MANO, nor of any program to test operations/management integration at a higher level.

This is the basis for my “fundamental limitation” point.  NFV is a technology, which means that you have to be able to make it functional, and the Light Reading test does make some good points on the functional side.  It’s also a business solution, though; something that has to address a problem or opportunity by delivering benefits and meeting ROI goals.  We cannot know from the test whether NFV could do that, and I contend that implementations that can’t make a business case are irrelevant no matter how well they perform against purely technical standards.

Now let’s get to the points made by the tests themselves.  As I said, I think some of the results were not only useful but highly evocative, though I don’t think what I saw as important matched Light Reading’s priorities.

The first point is that OpenStack is not a plug-and-play approach to deployment.  This is no surprise to me because we had issues of this sort in the CloudNFV project.  The problem is that a server platform plus OpenStack is a sea of middleware and options, any of which can derail deployment and operation.  The report quotes an EANTC source:  “There were tons of interop issues despite the fact that all NFVis were based on OpenStack.”

The lesson here isn’t that OpenStack isn’t “ready to play an interop role” (to quote the report) but that it’s not sufficient to guarantee interoperability.  That’s true of a lot of (most of, in my view) network-related middleware.  There are dependencies that have to be resolved, options that have to be picked, and none of this is something that operators or VNF vendors really want to worry about.

What we have here isn’t an OpenStack failure but an abstraction failure.  The VIM should represent the NFV Infrastructure (NFVI) no matter what is below, presenting a common set of features to support deployment and management.  Clearly that didn’t happen, and it’s a flaw not in OpenStack but in the ISG specifications.  All infrastructure should look the same “above” the VIM, converted by the VIM into an intent model that can represent anything on which VNFs can deploy.   The specifications are not sufficient for that to happen, and the notion of a fully abstract intent model is absent in any event.
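
To illustrate what “all infrastructure looks the same above the VIM” might mean in practice, here’s a minimal sketch of a fully abstract VIM interface.  The class and method names are my own assumptions, not anything from the ISG specs or the Light Reading test.

```python
from abc import ABC, abstractmethod

class VirtualInfrastructureManager(ABC):
    """Hypothetical intent-model facade: everything above this class sees
    the same deployment and management features, no matter what is below."""

    @abstractmethod
    def deploy(self, vnf_descriptor: dict) -> str:
        """Host a VNF described abstractly; return an opaque instance id."""

    @abstractmethod
    def connect(self, instance_id: str, interface: str, network: str) -> None:
        """Attach a VNF interface to a named virtual network."""

    @abstractmethod
    def status(self, instance_id: str) -> dict:
        """Return management state in a VIM-neutral form."""

class OpenStackVIM(VirtualInfrastructureManager):
    """One possible implementation; Nova/Neutron/Heat details stay inside the box."""
    def deploy(self, vnf_descriptor: dict) -> str:
        # translate the abstract descriptor into OpenStack calls here
        return "instance-001"
    def connect(self, instance_id: str, interface: str, network: str) -> None:
        pass  # port wiring hidden from the caller
    def status(self, instance_id: str) -> dict:
        return {"state": "ACTIVE", "cpu_load": 0.12}
```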

You can see another indication of this abstraction failure in the discussion of interoperability issues.  There are discussions of OpenStack Nova, Heat scripts, and so forth.  None of this should matter; a VNF should never “see” or be impacted by specifics of the implementation.  That’s what VIMs are supposed to cover, creating a universal platform for hosting.  It is unacceptable that this did not happen, period.

The next point is that NFV management is broken.  I’ve noted all along that the decision to create a hazy management framework that includes external or VNF-integrated VNF Managers (VNFMs) has negative ramifications.  The tests show that the decision has a ramification I didn’t predict: a dependence on a tie between VIM management and OpenStack that was never fully described and isn’t really present.  The VIM abstraction must represent NFVI management in a consistent way regardless of how the NFVI is implemented and how the VIM uses it.  The tests show that too much of OpenStack is exposed in the deployment and management processes, which makes all of the service lifecycle stages “brittle” or subject to failure if changes occur underneath.

The model that’s been adopted for VNFM almost guarantees that lifecycle management would have to be integrated tightly with the implementation of the VIM, and perhaps (reading between the lines of the report) even down to the actual OpenStack deployment details.  That means that it will be very difficult to port VNFs across implementations of NFVI.  The VNFMs would likely not port because they can’t exercise an intent-model-level set of management facilities.

I also have concerns that this level of coupling between VNFM and infrastructure will create major security and compliance problems.  If VNFMs have to run Heat scripts, then how do we insulate an OpenStack instance from incorrect or insecure practices?  Can we prevent one VNFM (which represents one vendor’s notion of a service for a single user) from diddling with stuff that impacts a broader range of services and users?

The third issue raised in the tests was that NFV spends too much time describing logical inter-VNF relationships and not enough time on describing how the VNFs themselves are expecting to be deployed on the network.  This is a problem that came up very early in ISG deliberation.  Every software-implemented network feature expects certain network connections; they’re written into the software itself.  What the Light Reading test showed is that nobody really thought about the network framework in which VNFs run, and that made it very difficult to properly connect the elements of VNFs or link them to the rest of the service.

The most common example of a VNF deployment framework would be an IP Subnet, which would be a Level 2 network (Ethernet) served by a DHCP server for address assignment, a DNS server for URL resolution, and a default gateway to reach the outside world.  The virtual function components could be connected within this by tunneling between ports or simply by parameterizing them to know where to send their own traffic.  To know that traffic is supposed to follow a chain A-B-C without knowing how any of these are actually connected does no good, and the testing showed that.
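
Here’s a purely illustrative sketch of the kind of deployment-framework description that would make “traffic follows A-B-C” meaningful; the field names and values are my own, not from any specification.

```python
# Hypothetical description of the IP-subnet framework a VNF chain deploys into.
deployment_framework = {
    "type": "ip-subnet",
    "l2": "ethernet",
    "cidr": "10.10.0.0/24",
    "dhcp_server": "10.10.0.2",      # address assignment for the VNF components
    "dns_server": "10.10.0.3",       # URL resolution inside the subnet
    "default_gateway": "10.10.0.1",  # reach the outside world
}

# The logical chain alone ("traffic follows A-B-C") is not enough; each hop
# also has to say HOW it is connected within the framework above.
service_chain = [
    {"vnf": "A", "ingress": "port", "egress": "tunnel:A-B"},
    {"vnf": "B", "ingress": "tunnel:A-B", "egress": "tunnel:B-C"},
    {"vnf": "C", "ingress": "tunnel:B-C", "egress": "default_gateway"},
]
```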

But this is only the tip of the iceberg.  As I’ve said in prior blogs, you need a specific virtual networking address and membership model for NFV, just as you need one for the cloud.  Amazon and Google have their own, and Google has described its approach in detail (Andromeda).  Without such a model you can’t define how management elements address each other, for example, and how NFV components are separated from service components.

All of this results from the bottom-up approach taken for NFV specifications.  Nobody would design software like that these days, and while nobody disputes that NFV implementation is going to come through software, we’ve somehow suspended software design principles.  What we are seeing is the inevitable “where does that happen?” problem that always arises when you build things from primitive low-level elements without a guiding set of principles that converge you on your own benefit cases.

So where does this leave us?  First, I think the Light Reading test found more useful things than I’d thought it might.  This was dissipated a bit by the fact that the most useful findings weren’t recognized or interpreted properly, but tests aren’t supposed to solve problems but rather to identify them.  Second, I think the test shows not that NFV is fairly interoperable (they say 64% of test combinations passed) but that we have not defined a real, deployable, model for NFV at all.  Truth be told, the tests show that NFV silos are inevitable at this point, because operators could never wait for the issues above to be resolved through a consensus process of any sort.

But this isn’t bad (surprisingly).  The fact is that the operators are largely reconciled to service-specific, trial-specific NFV silos that will likely be integrated by operations processes down the road.  The points of the test are helpful in identifying what those unifying operations processes will have to contend with.  However, I think that PoCs or trials are the real forums for validating functionality of anything, particularly NFV, and that these vehicles will have to show results for operators no matter what third-party processes like Light Reading’s tests might show.

The Relationship Between Content Delivery and SDN/NFV: It’s About Mobility

Operators have recognized from the first that video probably represents their largest incremental opportunity.  There’s also been a lot of hype around the video market, particularly focusing on the notion that Internet OTT delivery of video would displace all other video forms.  Like most popular notions, this is almost completely unsupported and even illogical, but it’s covering up the real issues and opportunities.  Let’s look at the truth instead.

Channelized material represents the great majority of video consumed today, and this material is delivered in what is often called “linear” or “linear RF” form, to a set-top box and to a TV through an HDMI cable or coax.  For material that is delivered on a fixed schedule this is a highly efficient model, and it’s my view that assertions that any company with either fiber to the home or CATV in place would abandon it are inaccurate.  Imagine the traffic generated by a million viewers of a popular TV show if all those views were IP-based.

Where IP is increasingly a factor is in delivery of video that is either off-schedule (on demand) or delivered to a mobile device or computer.  The popularity of mobile video has grown as people have become dependent on their smartphones and attuned to having entertainment access through them.  I think that the trend toward online delivery of traditional TV-on-demand reflects the fact that mobile use of video creates a platform with favorable exploitation costs—better than you could achieve by trying to build linear-RF solutions for TV on demand.

If mobile drives on-demand viewing trends, then it’s fair to ask how mobile changes video content delivery overall.  There are clearly “back-end” issues impacting content delivery networks (CDNs), but there are also viewing-habit changes that could have a profound impact on video overall.

Wireline content delivery is well understood.  The access network is an aggregation hierarchy, with consumer connections combined into central office trunks, and then further aggregated at the metro/regional level.  The goal for most CDNs was not to optimize this hierarchical structure, but to avoid the variable but often significant performance penalties that would arise were content streamed over Internet peering connections.  Instead, the CDN caches content close to the head-end point of an aggregation network.

With traditional CDNs, a user who clicks on a URL for video is redirected (via DNS) to a cache whose location is based on whatever quality of experience or infrastructure optimization strategies the network operator applies.  The content is then delivered from that point throughout the experience.  Cache points can be aggregated into a hierarchy too, with edge cache points refreshed by flow-through video and deeper ones filled in anticipation of need.
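
A minimal sketch of that DNS-style redirection logic, with an invented topology map and policy, might look like this:

```python
# Minimal sketch of DNS-style cache redirection; the topology and policy
# are invented for illustration, not taken from any operator's CDN.
EDGE_CACHES = {
    "metro-east": "cache-e1.cdn.example.net",
    "metro-west": "cache-w1.cdn.example.net",
}
DEEP_CACHE = "cache-core.cdn.example.net"

def resolve_content(url: str, subscriber_region: str, edge_has_content: bool) -> str:
    """Return the cache the DNS redirector should answer with."""
    if edge_has_content and subscriber_region in EDGE_CACHES:
        return EDGE_CACHES[subscriber_region]   # serve from the closest edge cache
    return DEEP_CACHE                           # fall back to a deeper cache point

print(resolve_content("http://video.example.net/show42", "metro-east", True))
```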

Mobile video changes this in two important ways.  First, the mobile user is mobile and so the optimum cache location is likely to change over time.  This is particularly important because mobile infrastructure has many aggregation points as towers aggregate into Evolved Packet Core elements and onward to gateways.  It’s also typically more expensive and more easily overconsumed than wireline.  Second, the mobile user is often highly socially driven in terms of consumption.  People don’t usually share their living rooms and view collectively, but almost every mobile user will be a part of an active social group (virtual or real) and the habits and behaviors of others in the group will impact what any member decides to view.

For both SDN and NFV, the dynamism inherent in mobile CDN operation presents an opportunity.  SDN could be a cheaper and faster way of defining transient paths between the cell sites supporting mobile users and their content cache points.  NFV could be used to spin up new cache points or even to build “video distributor” points that would repeat video streams to multiple users.  Some network operators have expressed interest in what could be called “scheduled on-demand” viewing, where a mobile user joins a community waiting for delivery of something at a specific time.  This would enable video delivery “forking” to be established.  Fast-forward, pause, or rewind would require special handling but operators say some market tests show users could accept a loss of these features for special pricing or special material (live streaming, for example).
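
As a simple illustration of the NFV side of this (the aggregation points and the threshold are invented), the spin-up decision is essentially a placement policy driven by viewer concentration:

```python
# Illustrative only: decide where dynamic cache points might be spun up.
viewers_per_aggregation_point = {
    "epc-gw-east": 1800,   # concurrent viewers of a popular stream
    "epc-gw-west": 240,
    "cell-agg-12": 95,
}
SPIN_UP_THRESHOLD = 500    # above this, hosting a local cache beats hauling streams

def cache_plan(viewer_map: dict, threshold: int) -> list:
    """Return the aggregation points where an NFV-hosted cache should be deployed."""
    return [site for site, viewers in viewer_map.items() if viewers >= threshold]

print(cache_plan(viewers_per_aggregation_point, SPIN_UP_THRESHOLD))  # ['epc-gw-east']
```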

Dynamic CDN is fine, but it’s the social-viewing changes that could have a profound impact.  Twitter is often used today to comment on TV shows, either among friends or in real time to the program source.  This model doesn’t require additional technical support, but with some new features the social integration of content could be enhanced.

Tweeting on scheduled TV relies on the schedule establishing an ad hoc community of users whose experience is synchronized by transmission.  The most useful enhanced social features/services would be associated with establishing simultaneous viewing for a community of interacting users (like the “forking” mentioned above), ensuring that they could all view the same thing at the same time.  Users could then interact with each other around the shared content experience, and could invite others to join in.

A variation on this social sharing would use metadata coding of the content.  Coding of scenes by an ID number or timestamp would allow a social user to refer to a scene and to synchronize others to that specific point.  It could also be used to jump from one video to another based on the coding—to find other scenes of a similar type, with the same actors/actresses, etc.  It would also be possible to rate scenes and find scenes based on the ratings of all users, of a demographic subset, or among a mobile user’s social contacts.
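
A trivial sketch of what scene-level metadata might enable could look like this; the fields, IDs, and ratings are invented for illustration.

```python
# Illustrative scene metadata; all values are invented.
scenes = [
    {"video": "drama-101", "scene_id": 7, "start_sec": 431, "actors": ["Smith"], "rating": 4.6},
    {"video": "drama-101", "scene_id": 8, "start_sec": 512, "actors": ["Jones"], "rating": 3.9},
    {"video": "comedy-22", "scene_id": 3, "start_sec": 120, "actors": ["Smith"], "rating": 4.8},
]

def sync_reference(video: str, scene_id: int) -> dict:
    """What a viewer would share so friends jump to the same point."""
    scene = next(s for s in scenes if s["video"] == video and s["scene_id"] == scene_id)
    return {"video": video, "seek_to": scene["start_sec"]}

def scenes_with_actor(actor: str, min_rating: float = 0.0) -> list:
    """Browse across videos by metadata rather than by title."""
    return [s for s in scenes if actor in s["actors"] and s["rating"] >= min_rating]

print(sync_reference("drama-101", 7))
print(scenes_with_actor("Smith", 4.5))
```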

You can see that social video viewing/sharing would change the dynamic of content delivery.  Most obviously, you’d probably want to spawn a process to fork such sharing experiences from a common cache if the people were together, and perhaps even to make each viewer a parallel controller of the experience—one person pauses for all, for example.  You might also want to create multiple independent video pipes to a viewer if they’re browsing by metadata, and you’d need a database application to actually do the browsing.

As video content viewing becomes more social, it starts to look a bit like videoconferencing, and I think that over time these two applications would combine.  A videoconference might be a multi-stream cache source with each participant a “sink” for whichever streams they wanted to view.  They could then rewind, pause, etc.  And there are already many applications in healthcare (as a vertical) and compliance (as a horizontal) where metadata coding of conference content would be highly valuable.

Video of this sort could become a powerful driver for both SDN and NFV, but I don’t think it would be easy to make it a killer app, the app that pulls through a deployment.  Consumer video is incredibly price-sensitive, and operators will be pressed to make a business case for mass deployment in a market with low margins like that.  Still, if I were a mobile and video vendor (like Alcatel-Lucent) I might be taking a serious look at this opportunity.  At the least it would guarantee engagement with the media and consumer population.

I think the video opportunity for SDN and NFV shows something important at a high level, which is that NFV and SDN are not driven as much by “virtualization” as by dynamism.  A virtual router in a fixed place in a network for five years doesn’t need SDN or NFV, but if you have to spin up one and connect it for a brief mission, service automation efficiency is the difference between being profitable and not.  That’s an important point to remember, because most of our current network missions are static, and fears of thinking outside the “box” in the sense of physical-network constraints and missions could compromise both SDN and NFV’s long-term benefits.

APIs for NFV Operation: A High-Level Vision

There are a lot of technical questions raised by NFV and even the cloud, questions that we cannot yet answer and that are not even widely recognized.  One of the most obvious is how the component elements are stitched together.  In NFV, it’s called “service chaining”.  Traditionally you’d link devices using communications services, but how to link software virtual devices or features isn’t necessarily obvious from past tradition.  I think we need to look at this problem, as we should look at all problems, from the top down.

A good generalization for network services is that every device operates at three levels.  First, it has a data plane, whose traffic it passes according to the functional commitments intrinsic to the device.  Second, it has a control/signaling plane that mediates pair-wise connections, and finally it has a management plane that controls its behavior and reports its status.

In NFV, I would contend that we must always have a management portal for every function we deploy, and also that every “connection” between functions must support the control/signaling interface.  A data-plane connection is required for functions that pass traffic, but is not a universal requirement.  Interesting, then, is the fact that we tend to think of service chaining only in terms of connecting the data paths of functions into a linear progression.

Because we have to be able to pass information for all three of our planes, we have to be able to support a network connection for whichever of the three are present.  This connection carries the information, but doesn’t define its structure, and that’s why the issue of application programming interfaces (APIs) is important.  An API defines the structure of the exchanges in “transactional” or request/response or notification form, more than it does the communications channel over which they are completed.

I believe that all management plane connections would be made via an API.  I also believe that all signaling/control connections should be made via APIs.  Data plane connections would not require an API, only a communications channel, but that channel would be mediated by a linked control interface.  Thus, we could draw a view of a “virtual function” as being a black box with a single management portal, and with a set of “interfaces” that would each have a control API port and an optional data port.  If the device recognized different types of interfaces (port and trunk, user and network, etc.) then we would have a grouping of interfaces by type.

Working through an example with this model might help.  Let’s suppose we have a function called “firewall” designed to pass traffic between Port and Trunk.  This function would then have a management port (Firewall-Management) with an API defined.  It would also have a Firewall-Port and Firewall-Trunk interface, each consisting of a control API and a data plane connection.
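
Here’s one way such a catalog entry might be represented; the structure, field names, and API identifiers are purely illustrative, not a proposed standard.

```python
# Hypothetical catalog entry for the "firewall" function described above.
firewall_descriptor = {
    "function_class": "DPC",          # data path chaining class (see below)
    "function_type": "FIREWALL",
    "management": {
        "api": "dpc-management/v1",   # class-wide management API, firewall extensions added
        "endpoint": None,             # filled in at deployment time
    },
    "interfaces": {
        "Port": {
            "control_api": "dpc-control/v1",
            "data_plane": {"layer": "IP"},
        },
        "Trunk": {
            "control_api": "dpc-control/v1",
            "data_plane": {"layer": "IP"},
        },
    },
}
```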

Let’s suppose we had such a thing in a catalog.  What would have to be known to let us stitch “firewall” into a service chain?  We’d need to know the API-level and connection-level information.  The latter would be a matter of knowing what OSI layer was supported for the data path (Ethernet, IP) and how we would address the ports to be connected, and this is a place where I think some foresight in design would be very helpful.

First, I think that logically we’d want to recognize specific classes of functions.  For example, we might say we have functions designed for data path chaining (DPC, I’ll call it), others to provide control services (CTLS), and so forth.  I’d contend that each function class should have two standards for APIs—one standard representing how that class is managed (the management portal) and one that defines the broad use of the control-plane API.  So our firewall function, as a DPC, would have management exchanges defined by a broad DPC format, with specificity added through an extension for “firewall” functions.  Think of this as being similar to how SNMP would work.

The management plane should also have some of the TMF MTOSI (Multi-Technology Operations Systems Interface) flavor, in that it should be possible to make an “inventory request” of a function of any type and receive a list of its available interfaces and a description of its capabilities.  So our firewall would report, if queried, that it is a DPC device of functional type “FIREWALL”, and has Port and Trunk interfaces, both of which are a control/data pairing supported via an IP address and port.

This to me argues for a hierarchy of definitions, where we first define function classes, then subclasses, and so forth.  All DPC functions and all CTLS functions would have a few common management API functions (to report status) but more functions would be specific to the type of function.  A given implementation of a function might also have an “extended API” that adds capabilities, each of which would have to be specified as optional or required so the function could be chained properly.
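
A minimal sketch of that hierarchy, including the MTOSI-flavored inventory response from the previous paragraph, might look like this; the class names, API identifiers, and the extended API are all my own illustration.

```python
class NetworkFunction:
    """Hypothetical base for all function classes."""
    function_class = "GENERIC"
    def report_status(self) -> dict:             # common to every class
        return {"state": "UNKNOWN"}
    def inventory(self) -> dict:
        """MTOSI-flavored self-description: class, type, and interfaces."""
        return {"class": self.function_class,
                "type": getattr(self, "function_type", None),
                "interfaces": getattr(self, "interfaces", {})}

class DPCFunction(NetworkFunction):
    """Data-path-chaining functions share one management/control API shape."""
    function_class = "DPC"

class Firewall(DPCFunction):
    function_type = "FIREWALL"
    interfaces = {
        "Port":  {"control": "dpc-control/v1", "data": "ip", "required": True},
        "Trunk": {"control": "dpc-control/v1", "data": "ip", "required": True},
    }
    # An "extended API" a particular implementation might add, flagged as optional
    extended_api = {"deep-packet-stats": "optional"}

print(Firewall().inventory())
```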

An important point here is that the management APIs should be aimed at making the function manageable, not at managing the service lifecycle or service-linked resources.  Experience has shown that pooled assets cannot be managed by independent processes without colliding on actions or allocations.  That’s long been recognized with things like OpenStack, for example.  We need to centralize, which means that we need to reflect the state of functions to a central manager and not reflect resource state to the functions.

To continue with the example of the firewall, let’s look at the initial deployment.  When we deploy a virtual function, we’d check the catalog to determine what specific VNFs were available to match the service requirements, then use the catalog data on the function (which would in my view match the MTOSI-like inventory) to pick one.  We’d then use the catalog information to deploy the function and make the necessary connections.  Each link in the chain would require connecting the control and data planes for the functions.
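
Here’s a rough sketch of that deployment flow; the catalog layout, the selection policy, and the vim object’s deploy/connect methods are all assumptions for this illustration, not a real orchestrator API.

```python
# Illustrative deployment flow only.
def deploy_service_chain(catalog: dict, required_functions: list, vim) -> list:
    """Pick VNFs from the catalog, deploy them, and connect adjacent chain links."""
    instances = []
    for function_type in required_functions:
        # catalog lookup: every entry is a descriptor like the firewall example above
        candidates = [d for d in catalog.values() if d["function_type"] == function_type]
        descriptor = candidates[0]                    # a real selection policy goes here
        instance_id = vim.deploy(descriptor)          # host the function via the VIM
        instances.append((instance_id, descriptor))
    # connect the control and data planes of each adjacent pair in the chain
    for (left, _), (right, _) in zip(instances, instances[1:]):
        link = f"link-{left}-{right}"
        vim.connect(left, "Trunk", link)              # left function's Trunk side
        vim.connect(right, "Port", link)              # right function's Port side
    return instances
```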

In our firewall, the control link on the PORT side would likely be something GUI-friendly (JSON, perhaps) while that on the TRUNK side would be designed to “talk” with the adjacent chained element, so that two functions could report their use of the interface or communicate their state to their partner.  We might envision this interface as JSON, as an XML payload exchange, etc. but there are potential issues that also could impact the management interface.

Most control and management interfaces are envisioned as RESTful in some sense, meaning that they are client-server in nature and stateless in operation.  The latter is fine, but the former raises the question of duplex operation.  A function needs to accept management commands, which in REST terms would make it a server/resource.  It also needs to push notifications to the management system, which would make it a client.  We’d need to define either a pair of logical ports, one in each direction, or use an interface capable of bidirectional operation.
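
One way to sketch the “pair of logical ports” option (the URLs, event names, and payload shapes are all invented) is a command endpoint on the function plus a registered notification callback:

```python
import json
import urllib.request

# Inbound direction: the function acts as a server for management commands.
FUNCTION_COMMAND_ENDPOINT = "http://firewall-1.example.net/mgmt/v1/commands"

def register_notification_sink(callback_url: str) -> dict:
    """What a function would record so it can push events as a client."""
    return {"notify_to": callback_url, "events": ["STATE_CHANGE", "ALARM"]}

def push_notification(sink: dict, event: dict) -> None:
    """Outbound direction: send an event to the manager's callback endpoint."""
    body = json.dumps(event).encode("utf-8")
    req = urllib.request.Request(sink["notify_to"], data=body,
                                 headers={"Content-Type": "application/json"})
    # urllib.request.urlopen(req)  # left commented out: no live endpoint in this sketch

sink = register_notification_sink("http://manager.example.net/nfv/v1/events")
push_notification(sink, {"event": "STATE_CHANGE", "new_state": "ACTIVE"})
```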

What interface fits the bill?  In my view, it’s not necessary or perhaps even useful to worry about that.  The key things that we have to support in any management or control API is a defined function set, a defined information model, and a consistent set of interactions.  We could use a large number of data models and APIs to accomplish the same goals, and that’s fine as long as we’re really working to the same goals.  To me, that mandates that our basic function classes (DPC and CTLS in this example) define states and events for their implementations, and that we map the events to API message exchanges that drive the process.

How might this work?  Once a function deploys on resources and is connected by the NFV management processes, we could say the function is in the DEPLOYED state.  In that state its management interface is looking for a command to INITIALIZE, which would trigger the function’s setup and parameterization, and might also result in the function sending a control message to its adjacent elements as it comes up.
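
A minimal sketch of that state/event mapping might look like this; the states, events, and transition table are illustrative, not drawn from the ISG specs.

```python
# Hypothetical lifecycle table for a DPC-class function.
TRANSITIONS = {
    ("DEPLOYED", "INITIALIZE"): "INITIALIZING",
    ("INITIALIZING", "SETUP_COMPLETE"): "ACTIVE",
    ("ACTIVE", "FAULT"): "DEGRADED",
    ("DEGRADED", "RECOVERED"): "ACTIVE",
}

def handle_event(state: str, event: str) -> str:
    """Map an API message (event) onto the next lifecycle state."""
    next_state = TRANSITIONS.get((state, event))
    if next_state is None:
        raise ValueError(f"event {event!r} not valid in state {state!r}")
    return next_state

state = "DEPLOYED"
state = handle_event(state, "INITIALIZE")       # management API command
state = handle_event(state, "SETUP_COMPLETE")   # the function reports it has come up
print(state)  # ACTIVE
```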

This sounds complicated, doesn’t it?  It is.  However, the complexity is necessary if we want to build services from a library of open, agile, selectable, functions.  The fact that we’ve done very little on this issue to date doesn’t mean that it’s not important, it just means we still have a lot of work to do in realizing the high-level goals set for NFV.

What’s Ahead for NFV in 2016?

The fall is an important time for operators, because they have traditionally embarked on a technical planning cycle starting mid-September and running forward to mid-November.  This is the stuff that will then be used to justify their budget proposals for the coming year.  We’ve now finished that cycle for the 2016 budget, and I’ve been getting feedback from operators on what they’ve decided with respect to NFV.

The good news for NFV is that over 90% of operators say that they will continue to fund NFV trials in 2016.  Very few are saying their current projects will be ending, even to be replaced by other activities.  That’s the bad news, though.  Only about 15% of the operators said they were confident they would deploy “NFV services” in 2016 and less than half thought they would get to field trials.

The operators who have made the most progress toward NFV have made it by tightly bounding their NFV missions.  Of the operators who said they would be deploying services, well over 90% said their target service was “vCPE”, with about two-thirds of these basing their deployment on premises hosting of agile features rather than in-the-cloud service chaining.  Managed service providers (MSPs) who add a management/integration layer to wholesaled transport from others make up about half the vCPE group.

The reason this is important is reflected in the next point.  Only about a fifth of the operators who say they plan deployments in 2016 say they have plans to extend their initial NFV commitment to other service areas—even beyond 2016.  Further, while nearly all this deploying-services group describe what they are doing as “NFV”, fewer than 1 in 10 base their deployment plans on a real, full, NFV implementation.

A Tier One that offers managed services gave the best description of what’s happening.  NFV, says the CFO of this operator, is attractive as a framework for portal-driven agile feature delivery to customers.  They see managed services as an opportunity to climb the value chain, but their vision of implementation is really more about a service catalog and portal linked to a simple CPE management system that can load features than about full NFV.

This operator does have further NFV plans, though.  They are looking at services in IoT and mobile content for the future, and they see something more like “real” NFV as a pathway to deploy services in both areas almost on a trial basis.  “It’s the fast-fail model,” the CFO says.  “We deploy generalized infrastructure and we use that to test service concepts the only way that really works…by trying them in the market.”  This is what that particular operator needs service agility for.

There is a question of whether even this is “real NFV” though, and even in my reference operator.  The CFO admits that the operations organization and CIO have only been recently engaged in the activity, and that they are still unsure as to how they’d operationalize an NFV infrastructure and its services.  Their early target services are those where either there is no specific SLA experience in the market to conform to, or where SLAs are largely met by managing the infrastructure (NFV and legacy) as a pool of resources.  “NFV management for us is data center management plus network management,” my CIO friend admits.  “We don’t have a strategy yet that can unify them.”

All of this seems to point to an NFV evolution that’s far from the traditional way services and infrastructure have evolved in the past.  We have not been able to make a broad business case for NFV yet, though at least half-a-dozen vendors could in my view deliver the necessary elements.  Given that, what has happened is that operators have effectively extended the early trials and PoCs that aim toward a specific service-of-opportunity, hoping to prove something useful out along the way.  But doesn’t this present that old bugaboo of the silo risk?  It sort of does, and here we have some interesting commentary.

“NFV silos don’t worry us as much as silos of legacy gear,” the CFO admits.  “NFV’s primary cost is the new data centers—the servers.  The secondary cost is operations integration.  We don’t have the primary cost at risk at all with NFV because the infrastructure isn’t service-specific.  We don’t have ops integration yet so that’s not a potential sunk cost.  So we have a risk-less silo.”

That’s not a long-term strategy, as well over 80% of operators with specific NFV plans for 2016 will admit.  They believe they will be evolving toward that utopian true NFV goal.  The problem, they say, is that they can’t reach it yet.

What?  Do operators think nobody can deliver on NFV as a whole?  That’s pretty much what CFOs and CIOs think, for sure.  When I tell operators that six specific vendors would be able to make such a business case, they smile at me and say “Tom, it’s great that you can tell us that and even name the vendors, but you have to understand that we can’t deploy based on your assurances.  The vendors have to make the claim and back it up, and they are not doing that.”

That was the key result of my discussions, I think.  Operators are saying that it doesn’t matter if you have a complete NFV strategy because you probably can’t or won’t sell it to them.  Everyone has gotten so focused on islands of NFV that the whole NFV process has evolved into letting clumps of stuff, driven by the wind and currents, aggregate into little islands that will eventually make up a significant land mass.

In a vendor sense, who then does the aggregating, the building of that full NFV?  Not the early vendors, and in fact probably not even vendors who have NFV solutions.  Operators are seeing this increasingly as an operations step taken after early per-service trials.  Because CIOs haven’t been fully engaged so far, we have operation-less NFV, and the CIOs want to add operations by adding OSS/BSS tools above whatever NFV stuff lives below in the service layer.

If all this holds in 2016 it could have profound impact on vendors.  Smaller players might hope to drive service-specific NFV solutions that could in time be collected at the OSS/BSS level.  Larger vendors, though, will have to present something much more OSS/BSS-intensive as their central element because only that story would appeal to the CIOs who will likely drive the next step—the step beyond 2017 when NFV silos are collected.

I don’t think this is the optimum way to do things.  I think it will string NFV deployment out for five years when it could have been done in three.  I also think that we’ll sacrifice a lot of efficiency and agility by failing to fully integrate infrastructure orchestration and management with operations orchestration.  I’d have bet the outcome would have been different, but I’d also have bet that the vendors who can do full NFV right now would have pushed their capabilities more aggressively.  Now, according to operators, the opportunity to have a MANO-driven merger of infrastructure and operations has passed.

This isn’t going to stop NFV.  The risk that operators would drop it and look for solutions elsewhere doesn’t appear to be significant according to my discussions with operators themselves.  But it will change it, and how those changes will work through NFV functions and standards, and how it will impact vendor and market dynamics, will be one of my blog focuses for the coming year.

The Access Revolution: What’s Driving It and How Do We Harness It?

All networking reduces to getting connected to your services, which means access.  In the past, and in a business sense, the “divestiture” and “privatization” trends first split access from long-haul, then recombined them.  The Internet has also changed access networking, creating several delivery models inexpensive enough to serve consumers.  Today, virtualization is creating its own set of changes.  So where is access going?

The first important point to make is that the notion that it’s “heading to multi-service” is false.  It’s been there for decades.  The evolution of “the local loop” from analog voice to multi-service voice/data/digital started back in the era of ISDN.  Cable companies have confronted voice/data/video for over two decades.  It’s less how many services than how services are separated.

What is true is that consumer data services based on the Web generated new interest in multi-service access because consumer data needs rapidly evolved to the point where voice traffic was a non sequitur on a broadband path.  “Convergence” meaning the elimination of service-specific silos is just good business.  And consumer Internet and VoIP were the dawn of the real changes in access.

Many of us, perhaps in many geographies most of us, use VoIP.  Most stream video as an alternative to channelized linear RF delivery.  The simple truth is that for the consumer market, the access revolution is a predictable consequence of the increased demand for data/Internet capacity.  The bandwidth available is exploitable under the Internet model by any service (hence Skype) and that drives a desire to consolidate services onto that new fat pipe, displacing service-specific access and its associated cost.

Business services haven’t moved as decisively.  In part, that is because businesses were always consumers of both voice (TDM) and data and there was no sudden change such as the one that drove consumer Internet demand.  Over time, though, companies increased their own data demand in support of worker productivity gains, and also put up ecommerce sites.

Where we are now with access relates to this business evolution.  Companies have typically retained service-specific access technology, though TDM voice access is rapidly being replaced by VoIP via SIP trunking.  At the same time, though, physical access media has begun to shift more to fiber, which means that we’ve seen first consolidation of access channels on the same fiber trunks, and more recently we’re starting to see access multiplexing climb the OSI stack toward L2/L3.

It’s this ride up the OSI ladder that’s being driven increasingly by virtualization.  Network virtualization, NaaS, or whatever you want to call it doesn’t have to be classic SDN; it could be based on tunnels, or MPLS, or whatever.  The point is that if you have a service at a given OSI level, you can use some form of multiplexing below that level to create ships-in-the-night parallel conduits that share the access media.  You can do this multiplexing/virtualization at L2 if you have L3 services, and so forth.  You have multi-service at the level of service consumption, but you may have any service from OSI Level 1 through 3 down below as the carrier.

Virtualization is a more dynamic solution than threading new fiber or copper, and the increased potential dynamism facilitates dynamic service delivery.  We all know that SDN, NFV, and the cloud all postulate ad hoc services, and if those services were indeed to grow in popularity and become significant consumers of bandwidth, they would tend to break the static-bandwidth access model of today.

Dynamism at the service level may drive access changes, but access changes then drive changes to services, even the basic connection services.  You can sell somebody extemporaneous capacity to ride through a period of heavy activity, but only if the access connection will support that incremental capacity.  Turning up the turbo-dial isn’t useful if you have to wait two weeks or more to get your access connection turned up.

Our vision of elastic bandwidth, so far at least, is surpassingly shortsighted.  I’ve surveyed enterprises about how they’d use it, and in almost nine of every ten cases their answer boils down to “Reduce cost!”  They expect to cut back on their typical bandwidth and burst when needed above it.  If that doesn’t turn out cheaper for them, forget elastic bandwidth.  That means that business service access changes probably have to be driven by truly new, higher-layer, services rather than tweaks to basic connectivity.
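
To see why, here’s the buyer’s arithmetic in its simplest form; the prices are entirely hypothetical and only illustrate the comparison the enterprise will make.

```python
# All prices are hypothetical, purely to illustrate the buyer's arithmetic.
FLAT_100M_PER_MONTH = 900.00          # flat-rate 100 Mbps service
BASE_50M_PER_MONTH = 520.00           # reduced committed rate
BURST_TO_100M_PER_HOUR = 2.50         # elastic top-up price

def elastic_is_cheaper(burst_hours_per_month: float) -> bool:
    elastic_total = BASE_50M_PER_MONTH + burst_hours_per_month * BURST_TO_100M_PER_HOUR
    return elastic_total < FLAT_100M_PER_MONTH

print(elastic_is_cheaper(100))   # True: 520 + 250 = 770, cheaper than 900
print(elastic_is_cheaper(200))   # False: 520 + 500 = 1020, forget elastic bandwidth
```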

Even new services like cloud-hosted applications or NFV-delivered virtual features can be inhibited by lack of access capacity.  If the new service or feature requires more bandwidth, it may be impossible to deliver suitable QoE over the same tired access path—unless the operator had the foresight to pre-deploy something faster.  Two operators, serving the same customer or even building, might compete as much on residual capacity as on “price” in a strict sense.  “Bandwidth in waiting”, waiting for something new to deliver, means it’s waiting for operators to exploit to gain new revenues.  This is the trend now driving access evolution for business services.

The business service flagship, Carrier Ethernet, shows this trend most clearly.  The MEF’s Third Network concept is an attempt to define, first and foremost, an access line as a pathway for multiple virtual networks, and that’s the end-game of the current trends.  The Third Network redefines Carrier Ethernet, but at the same time redefines what it carries.  As ad hoc provisioning of services becomes possible, services that benefit from it become business targets for operators.  Further, if resolving access limitations through virtualization is necessary to make ad hoc services work, it follows that the limitations virtualization cannot address—the basic capacity of the pipe—have to be minimized somehow, or service evolution suffers.

One thing this would do is drive much more aggressive fiber deployment to multi-tenant facilities.  Even the facility owners might want this sort of thing, and we already have some facilities in some areas served by multi-fiber bundles from several operators.  Imagine what will happen if we see more dynamic services, and if elastic bandwidth actually means something!

That means that “service multiplexing,” the pairing of ad hoc services with ad hoc capacity, is the key.  Cloud computing, offsite data storage, anything that has additional capacity requirements as an offshoot of the service delivery, is the only credible type of driver for access virtualization changes on a large scale.  Any of these could produce a carrier revenue model based on ad hoc service sales dependent on ad hoc capacity availability.  That implies oversupply of access capacity.

The question is whether the revenue model for pre-positioning access assets could be made to work.  Certainly at the physical media level, as with fiber deployment, it makes sense.  However, physical media doesn’t become an OSI layer without something to do the conversion.  We’d need to think about how to either up-speed connections dynamically or meter the effective bandwidth of a superfast pipe unless the buyer pays to have the gates opened up.  We also need to think about how to utilize “services” that appear ad hoc on an access line.  How do you distinguish them from an elaborate hacking attempt?

That’s the forgotten point about access evolution, I think.  More often than not, we have static device and interface facilities feeding a static connectivity vision.  We’ll have to work out a way to harness dynamism by converging changes in service delivery and service consumption to match our new flexibility.  Otherwise access evolution could be just another trend that falls short.

SDN/NFV: We Don’t Need Everything We Think, but We DO Need Some Things We’re Not Thinking Of

Revolutions have their ups and downs.  “These are the times that try men’s souls,” said Tom Paine during the American Revolution, and the current times are probably trying the souls of many an SDN or NFV advocate.  For several years, we heard nothing except glowing reports of progress in both areas, and now we seem to hear little except statements of deficiency.  Both these conditions fail the market because they don’t conform to reality, which is somewhere between the glow and the gloom.  Obviously it would be nice to find a technical path that takes us there.

It’s easy to make progress when you’ve defined what “progress” means.  SDN and NFV both took the easy way out, starting down in the technical dirt and picking a little incremental piece of a very large and complicated pie.  That was dumb, to be sure, but it’s hardly fatal.  Even if we leave aside the ongoing question of business case and look to technology issues, we’re not really too deep in the woods…yet.

The biggest problem both SDN and NFV now face is that an effective operational umbrella and a complete picture of infrastructure have yet to be offered.  Given that services have a variable dependency on both operations and infrastructure depending on their specific features and targets, that means that it’s very doubtful that a vision for “total SDN” or “total NFV” could emerge from PoCs or use cases.

Absent an architectural vision to unify all these efforts as they progress, we’re already seeing what one operator characterized as “The Death of Ten Thousand Successes”, meaning an explosion of silos built around vendor and emphasis differences.  In many cases these silos will also be layered differently, approaching EMS, NMS, SMS, OSS, BSS, and whatever-SS differently.  It would do no good at this point to define a single unified approach; it’s too late for that.  What we need instead is a way to unify all the approaches we already have.  Fortunately that way has already presented itself in the central principle of virtualization: abstraction.

An abstraction, a “black box” or “intent model”, is a functional concept or set of external behaviors that’s separated from its possible realizations through a representation model.  “Black box” is a great term for this because it reflects the essential truth: you can’t see in, so you know a black box only by its properties, its interfaces, and their relationships.

The reason this is nice is that such a black box can envelop anything, including any SDN or NFV silo, any interface, any network or IT element or service.  No matter what you do, or have done, it can be reduced to a black box.

Looking from the outside in, black boxes anonymize the implementation of something.  Looking from the inside out, they standardize it.  You can wrap something in a black box and transform how it looks, including from the management perspective.  It’s easiest to understand this by going to my description of an intent model from earlier blogs, to the notion that intent models or black boxes representing services have an associated SLA.  You could think of this SLA as a local data model, since everything that’s variable and knowable about a service should be thought of as being part of or contributing to an SLA.
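
To make that idea concrete, here’s a minimal sketch of an intent model, assuming my own invented field names rather than anything from a standard.  The point is that the SLA serves as the model’s local data model, while the realization stays hidden:

```python
# Minimal sketch of an intent model / "black box", using field names of my own
# invention.  The SLA is the model's local data model: everything knowable from
# the outside is part of, or contributes to, the SLA.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class IntentModel:
    name: str
    connection_points: List[str]                  # externally visible interfaces
    sla: Dict[str, float]                         # e.g. availability, latency
    realize: Optional[Callable[[], None]] = None  # hidden implementation detail

    def offered_properties(self) -> Dict[str, float]:
        # The outside world sees only the SLA, never the realization.
        return dict(self.sla)

branch_access = IntentModel(
    name="BranchAccess",
    connection_points=["uni-1"],
    sla={"availability": 0.999, "latency_ms": 30.0},
)
print(branch_access.offered_properties())
```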

Let’s take this principle and apply it at the top.  If we have an SDN controller or an NFV domain, we could wrap it in an intent model or black box.  Once inside, it’s just an implementation of something, so a “service” created by either SDN or NFV is a realization of intent, an implementation inside the black box.  A legacy solution to the same service would also be such an implementation, so we could say that black-box wrappings equalize the implementations.  And, if we assume that the black-box model of an SDN or NFV or legacy service has the same SLA (as it must, if it’s an implementation of the same abstraction), then we can say that the management variables of all these implementations are identical.

Now we get to the good part.  Because all of the implementations inside a black box are exposing the same management variables and features, they can all be managed in the same way.  Management is simply one of the properties abstracted by the black box or intent model.  It can also be “orchestrated” in the same way, meaning that everything that can be done to it on the outside can be done regardless of what’s inside.

Harmony at any level, within SDN or NFV, could be achieved using this approach.  If an SDN controller domain is an intent model or black box, then its management properties can be the same regardless of who implements it, or the boxes underneath.  They’re also the same as the properties asserted by real switches and routers or NFV-deployed virtual switches and routers, and all of these are the same regardless of the vendor.  If an NFV resource pool is represented by a series of intent models for the type of things they can do, then any set of hosting and connection features that can be combined to do anything useful can be represented, and the implementation can be anything that works.
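
Here’s an equally minimal sketch of the “equalization” point, again with invented names: three different realizations, legacy, SDN, and NFV, all exposing the same SLA variables, so the same management logic applies to each:

```python
# Sketch: one abstract service ("black box") with three interchangeable
# realizations.  Management sees only the shared SLA variables, so legacy,
# SDN, and NFV implementations are handled identically.  Names are illustrative.
from typing import Callable, Dict

REALIZERS: Dict[str, Callable[[], Dict[str, float]]] = {
    "legacy-router": lambda: {"availability": 0.9995, "latency_ms": 22.0},
    "sdn-domain":    lambda: {"availability": 0.9995, "latency_ms": 18.0},
    "nfv-vrouter":   lambda: {"availability": 0.9995, "latency_ms": 25.0},
}

SLA = {"availability": 0.999, "latency_ms": 30.0}   # what the black box promises

def manage(realizer_name: str) -> str:
    measured = REALIZERS[realizer_name]()   # same variables from every implementation
    ok = (measured["availability"] >= SLA["availability"]
          and measured["latency_ms"] <= SLA["latency_ms"])
    return f"{realizer_name}: {'SLA met' if ok else 'SLA violated'}"

for name in REALIZERS:
    print(manage(name))
```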

With this approach the arguments over what the “right” data model is for NFV are moot.  As long as you model intent, the way it’s expressed is a minor issue, one that software developers resolve regularly with transliteration processes for data and Adapter Design Patterns for interfaces.  The discussions on interfaces are moot too, because you can transform everything with these models.
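
For those who want to see how routine that transliteration step really is, here’s a hedged illustration (the field names are invented, not drawn from YANG, TOSCA, or any actual standard) of two silo data models adapted into one common view:

```python
# Sketch of the Adapter idea: two different data models describing the same
# service, each "transliterated" into one common intent-model SLA view.
# Field names are purely illustrative, not taken from any standard.

def from_model_a(record: dict) -> dict:
    return {"bandwidth_mbps": record["bw"], "latency_ms": record["delay_ms"]}

def from_model_b(record: dict) -> dict:
    return {"bandwidth_mbps": record["capacity"]["mbps"],
            "latency_ms": record["targets"]["latency"]}

# Two silos expressing the same intent differently...
silo_a = {"bw": 100, "delay_ms": 25}
silo_b = {"capacity": {"mbps": 100}, "targets": {"latency": 25}}

# ...look identical once adapted, so the layers above don't care which they get.
assert from_model_a(silo_a) == from_model_b(silo_b)
print(from_model_a(silo_a))
```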

What this represents for the SDN community, the NFV community, and the OSS/BSS community is profound.  You can pull this all together now, with a single stroke.  Further, if you focus orchestration on combining intent models into services and focus management and operationalization on managing and operationalizing intent models, you can have one OSS/BSS/NMS model for everything.  All you have to do is make sure that what is inside each intent model translates between that model’s SLA goals and the internal steps used to realize them.

We hear a lot about orchestration and modeling today, and it’s clear that having a single approach that crosses SDN/NFV and legacy, OSS/BSS and NMS, would have helped.  We can still have it.  You could build such a product and apply it at the top, the middle, the bottom.  Best of all, at least some of the products already built seem to have a lot of that capability.

I’ve always said there were six vendors who could make a business case for NFV.  These also incorporate at least some SDN and legacy vision, but it’s not always clear how these work in detail.  In particular, it’s not clear whether the modeling and orchestration meet black-box or intent-model standards.  Based on public documentation that I’ve seen, I can say that Ciena and HP are almost certainly capable of doing what’s needed.  Overture probably can, and I think Alcatel-Lucent, Huawei, and Oracle are at least looking at the approach.  While all these vendor-specific solutions are invitations to silos, that doesn’t hurt much in an intent-modeled, black-box-seeking world.

What does hurt is a lack of acceptance of intent-model principles lower down.  Both SDN and NFV need to look at their interfaces in those terms, and while there is some black-box momentum in the standards processes for both these areas, the thinking hasn’t yet fully percolated.  I’d sure like to see the notion move more quickly, because if it does we’ll be that much closer to a deployable model of next-gen network services.

Finally, an Actual IoT Offering

I admit that in past blogs I have vented about the state of insight being demonstrated on IoT.  It would be far easier to provide a list of dumb things said and offered in the space than a list of smart things.  In fact, up to late last week I couldn’t put anything on the “smart thing” list at all.  Fortunately, things have changed because of information I’ve received from and discussions I’ve had with GE Digital.

Everyone who lives in even a semi-industrial society has likely been touched by GE at some point; the company is an enormous industrial and consumer conglomerate.  They created a new unit, GE Digital, to handle their growing software business, and it’s GE Digital that’s offering Predix, which they call “a cloud platform for the industrial Internet”.  Yes, it is that, but under the covers Predix is how the IoT should have been visualized all along.

If you recall my blogs on IoT, particularly the most recent one, you know that I’ve advocated visualizing IoT not as some “Internet extension” or “LTE opportunity” or even as a “network” but as a big data and analytics application.  My model of three ovals, a big one in the middle with a smaller one on top and another at the bottom, reflects a view that real IoT will be a repository/analytics engine (the middle oval) fed by sensors and controllers (the bottom oval) and accessed by applications (the top).  This is essentially what Predix creates.

The central piece of Predix, the “Industrial Cloud”, is the repository and cloud platform plus a set of supporting applications that include analytics.  It’s fed from sensors and connected to controllers through a software element called a Predix Machine.  These can interface with (using proper adapters) any sensor/controller network technology, so this is my bottom oval.

You can have a hierarchy of Predix Machines, meaning that you can have one controlling a single device, a collection of them, a floor, or a whole factory.  Each machine can do local analytics and respond to events based on locally enforced policies.  They can also generate events up the line.  This structure keeps control loops short and reduces central processing load, while the central repository can be kept in the loop through events that are generated or passed through.
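
As an illustration of the short-control-loop idea (this is my own sketch of the concept, not GE’s code or APIs), a local node handles the events it has policies for and escalates the rest up the hierarchy:

```python
# Illustrative sketch (not Predix code): a hierarchy of "machines" where each
# level handles events it has a local policy for and escalates the rest upward.
from typing import Callable, Dict, Optional

class EdgeNode:
    def __init__(self, name: str, policies: Dict[str, Callable[[dict], str]],
                 parent: Optional["EdgeNode"] = None):
        self.name, self.policies, self.parent = name, policies, parent

    def handle(self, event: dict) -> str:
        action = self.policies.get(event["type"])
        if action:
            return f"{self.name}: {action(event)}"        # short local control loop
        if self.parent:
            return self.parent.handle(event)              # escalate up the line
        return f"{self.name}: logged {event['type']} for central analytics"

factory = EdgeNode("factory", {"power_sag": lambda e: "switch to backup feed"})
machine = EdgeNode("press-7", {"overheat": lambda e: "throttle spindle"}, parent=factory)

print(machine.handle({"type": "overheat", "temp_c": 95}))   # handled locally
print(machine.handle({"type": "power_sag", "volts": 190}))  # escalated to the factory node
print(machine.handle({"type": "vibration", "g": 1.2}))      # passed to central analytics
```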

Speaking of events, they could be generated by analytics operating on stored data, or on real-time streams through event recognition or correlation.  Events can change the repository and also change the state of the real systems, and in all cases they are processed by software that then decides what to do.  As I noted, some of that software can be strung along a Predix Machine hierarchy, some can be inside the Industrial Cloud, and some could be in external applications linked by APIs.

The top oval is a set of APIs available to GE Digital and developer partners to build either general or industry-specific applications.  There’s a Predix Design System that provides reusable components, developer frameworks to support specific developer types and goals, and a UI development environment based on Google’s Polymer (designed to build highly visual, interactive, and contextual user experiences).

Inside Predix there’s the concept they call the “Digital Twin”.  This is a kind of virtual object that’s a representation of a device, a system of functionally linked devices, a collection of devices of a given type, or whatever is convenient.  A model or manifest describes the elements and relationships among elements for complex structures, and the Digital Twin links (directly or through a hierarchy) to the sensors and controllers that actually represent and alter the real-world state the Digital Twin represents.  You can kind of relate a Digital Twin to social networks—you have individual profiles (Digital Twins of real humans or organizations that collect real humans) and you have any number of ad hoc collections of profiles representing things like demographic categories.  A profile would also be a network of “friends”, which means that it’s also representing a collection.

GE builds the “Digital Twin” of all its own technology, and you could build them for third-party elements or anything else as long as you provide the proper model data that describes what’s in the thing and how the innards relate to each other.  The Digital Twin provides a representation of the real world to Predix, collecting data, recording relationships, and providing control paths where appropriate.
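
To show what such a model or manifest might look like, here’s a hypothetical example of my own devising; the structure and field names are illustrative, not GE’s actual format:

```python
# Hypothetical manifest for a Digital Twin: the elements, how they relate, and
# the sensor/controller bindings that tie the twin to real-world state.
turbine_twin = {
    "twin_id": "turbine-0042",
    "class": "gas-turbine",              # lets the twin join its wider population
    "elements": ["compressor", "combustor", "power-turbine"],
    "relationships": [
        ("compressor", "feeds", "combustor"),
        ("combustor", "drives", "power-turbine"),
    ],
    "bindings": {
        "compressor":    {"sensors": ["inlet-temp", "inlet-pressure"], "controllers": []},
        "combustor":     {"sensors": ["flame-temp"], "controllers": ["fuel-valve"]},
        "power-turbine": {"sensors": ["shaft-rpm", "vibration"], "controllers": []},
    },
}

def sensors_for(twin: dict) -> list:
    """Flatten every sensor the twin needs to subscribe to."""
    return [s for binding in twin["bindings"].values() for s in binding["sensors"]]

print(sensors_for(turbine_twin))
```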

One of the benefits of this Digital Twin approach is that Predix understands relationships or object context explicitly, and also correlations among objects.  If you look at a given profile in social media, you can see who it relates to.  The same is true of Digital Twins, but in more dimensions.  A piece of an engine is part of that engine, and also part of the broader population of that same part installed in whatever other things use it.  You can then gather information both about that specific thing and about how its peers are behaving elsewhere, and predict what might happen based on the current state of the single real thing and on the behavior of what’s related to it.  GE Digital has a blog about this.

You can analyze things in a time series too, of course.  You can correlate across classes of things by following Digital Twin paths, project conditions from the general class to specific objects, and project the results into the future for essentially any period over which the asset you’re talking about has value.  The modeling used to define the Digital Twins lets you contextualize data, and policies let you define access and usage of information for security and governance.

Another interesting principle of Predix that directly relates to the popular vision of IoT is the notion of closed-loop operation.  The concept of “M2M” has been extended by IoT enthusiasts to apply to the idea that refrigerators talk to sinks, meaning that two machines could interact directly.  Even a cursory look at that notion should demonstrate it’s not practical; every device would have to understand how to interpret events sourced from another and how to act on them.  In Predix, closed-loop feedback from sensor to control is handled through a software process intermediary that does the analysis and applies policies.
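
A sketch makes the intermediary role obvious (again, my own illustration rather than Predix code): the sensor event goes to a software process that analyzes it, applies policy, and only then issues a control command:

```python
# Sketch of closed-loop operation through a software intermediary: sensor events
# never reach controllers directly; a process applies analysis and policy, then
# decides on the control action.  All names here are invented for illustration.

def analyze(event: dict) -> str:
    # Trivial "analytics": classify the condition the event represents.
    return "too_warm" if event.get("temp_c", 0) > 8 else "normal"

POLICY = {                     # condition -> (controller, command)
    "too_warm": ("compressor", "increase_duty_cycle"),
    "normal":   (None, None),
}

def closed_loop(event: dict) -> str:
    condition = analyze(event)
    controller, command = POLICY[condition]
    if controller is None:
        return "no action"
    return f"send '{command}' to {controller}"

print(closed_loop({"source": "fridge-temp-sensor", "temp_c": 9.5}))
print(closed_loop({"source": "fridge-temp-sensor", "temp_c": 4.0}))
```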

The notion of closed-loop feedback also introduces yet another Predix concept that I think should be fundamental to IoT, which is “Outcome as a Service”.  OaaS says that in the “thing systems” IoT would generate, the consumer of the system is looking for an outcome.  “Get me to the church on time!” is an example of an outcome, and it would be dissected into routes, traffic analysis, vehicle condition, size, and speed, driver proclivities based on past history, and so forth.  OaaS is probably the most useful single concept to come along in IoT because it takes the dialog up to the top, where the value to the user lives.

In an implementation sense, Predix is a cloud application.  Everything is implemented as microservices that combine to create an evolving PaaS that future elements (including those developed by third parties) can build on.  There are also DevOps tools to deploy applications and microservices, and “BizOps” tools to manage the cloud platform itself.  To say that Predix is an IoT cloud is no exaggeration.

Even in a blog over a thousand words long, I can’t do more than scratch the surface of Predix, and I don’t have any specific insight into what GE Digital might do to promote it widely or apply it to more general IoT problems.  Their specific application is the “Industrial Internet”, remember.  But this application, which includes transportation, healthcare, energy, and manufacturing, has enormous IoT potential and could generate enormous benefits (in fact, it already has for early customers).  All of that would make Predix a great on-ramp to a broad IoT success.

IoT is like a lot of other things in our age, including SDN and NFV.  You can nibble at little pieces and control your risk, cost, and effort, but the results will be little too.  The trick is to find early apps that are so beneficial they can justify significant infrastructure.  In the four key verticals GE Digital is targeting, you can see how a Predix deployment around the core (GE-supplied and third-party) technologies could build a lot of value and deploy a lot of technology.  The incremental cost of adding in peripheral (and yes, pedestrian) things like thermostats and motion and door sensors would be next to nothing.  These applications then don’t have to justify anything in the way of deployment, and they are all pulled into a common implementation framework that’s optimized for hardware and software reuse and for operations efficiency.

I think GE Digital under-positions Predix, and that the material is far too technical to be absorbed by the market overall.  This reflects the “industrial” flavor of the offering.  GE Digital is also seeing this more as a service than as a product, which would make it difficult to promote broadly—to network operators to offer, for example.  All these issues could be resolved, and most very easily, of course.  In any event, even the success of one rational IoT framework could change the dialog on the IoT topic.

We need that.  There might not be more IoT hype than there is for technologies like SDN or NFV, but there’s darn sure worse hype, and IoT is a newer concept.  The barrier to becoming a strident IoT crier is very low; anything that senses anything and communicates seems to qualify.  We’ve made a whole industry out of nonsense, when the real opportunity for IoT to reshape just about everything in our lives is quite large, and quite real.  I hope Predix will change things.

A Look at the MEF’s “Third Network”

There are a lot of ways to get to the network of the future, but I think they all share one common concept.  Services are in the eye of the beholder, meaning the buyer, and so all services should be viewed as abstractions that define the connectivity and SLA they offer but are realized based on operator policies and the relationship between service and resource topologies.  In the modeling sense this is the “intent model” I’ve blogged about, and in the service sense it’s Network-as-a-Service or NaaS.

SDN and NFV have focused on what might be called “revolutionary NaaS”, meaning the specification of new technologies that would change infrastructure in a fundamental way.  Last year, the Metro Ethernet Forum embarked on what it called the “Third Network”, and this defines “evolutionary NaaS”.  I noted it at the time but didn’t cover the details as they emerged because it wasn’t clear where the Third Network fit.  Today, with SDN and NFV groping for a business case, evolutionary NaaS sounds a lot better.  Some operators are looking at the approach again, so it’s worth taking some time now to have a look ourselves.

According to the MEF, the Third Network “combines the on-demand agility and ubiquity of the Internet with the performance and security assurances of Carrier Ethernet 2.0 (CE 2.0).  The Third Network will also enable services between not only physical service endpoints used today, such as Ethernet ports (UNIs), but also virtual service endpoints running on a blade server in the cloud to connect to Virtual Machines (VMs) or Virtual Network Functions (VNFs).”  The notion is actually somewhat broader, and in their white paper the MEF makes it clear that the Third Network could be used to create ad hoc connections even over or supplementing the Internet, for individuals.

Taking this goal/mission set as a starting point, I’d say that the MEF is describing a kind of virtual overlay that can connect physical ports, virtual ports, and IP (sockets?) and utilize as underlayment a broad combination of Ethernet, IP, SDN, and NFV elements.  The Third Network would be realized through a combination of operations/management elements that would orchestrate the cooperation of lower-level elements, and gateways that would provide for linkage among those underlayment pieces.

I mentioned orchestration above, and the MEF says that “embracing complete Lifecycle Service Orchestration” is the key to realizing the Third Network’s potential.  LSO is envisioned not as a singular “orchestrator” but as a hierarchy, with the operator who owns the retail relationship running a top-level LSO instance that then communicates with the LSO instances of the underlayment pieces.

This, in my view, is very much like the notion of an intent-model hierarchy of the kind I’ve been blogging about.  Each “service” an LSO level is working on is decomposed by that level into lower-level things (real deployments or other subordinate LSOs), and any LSO level above will integrate it as a subordinate resource.  There’s an SLA, connection points, and an implied service description, again like an intent model of NaaS.  That’s good, of course, because NaaS is what this is supposed to be.
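
In intent-model terms, that hierarchical decomposition could look something like the sketch below, which is my own illustration of the concept rather than anything from the MEF’s specifications; each level either realizes a service element in a domain it owns or delegates it to a subordinate orchestrator:

```python
# Illustrative sketch of hierarchical lifecycle orchestration: the retail-level
# orchestrator decomposes a service into elements it either realizes in its own
# domains or delegates to a subordinate orchestrator.  All names are invented.
from typing import Dict, List, Optional

class Orchestrator:
    def __init__(self, name: str, own_domains: List[str],
                 subordinates: Optional[Dict[str, "Orchestrator"]] = None):
        self.name = name
        self.own_domains = own_domains
        self.subordinates = subordinates or {}

    def realize(self, service: Dict[str, List[str]]) -> List[str]:
        actions = []
        for element, domains in service.items():
            for domain in domains:
                if domain in self.own_domains:
                    actions.append(f"{self.name} deploys {element} on {domain}")
                elif domain in self.subordinates:
                    actions.extend(self.subordinates[domain].realize({element: [domain]}))
                else:
                    actions.append(f"{self.name}: no path to {domain} for {element}")
        return actions

partner = Orchestrator("partner-LSO", own_domains=["partner-ethernet"])
retail = Orchestrator("retail-LSO", own_domains=["sdn-metro", "nfv-vcpe"],
                      subordinates={"partner-ethernet": partner})

service = {"vcpe": ["nfv-vcpe"], "access": ["sdn-metro"], "offnet-tail": ["partner-ethernet"]}
for step in retail.realize(service):
    print(step)
```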

It doesn’t take a lot of imagination to see that the Third Network could be the “orchestrator of orchestrators” or higher-level process that unifies SDN, NFV, and legacy technology and also operator, vendor, and administrative domains.  The LSO white paper shows this graphically, in fact.  From that, one might reasonably ask whether LSO has the potential of being that unifying concept that I’ve said SDN and NFV both need.  Yes…and no.

The Third Network’s approach is the approach that both the NFV ISG and the ONF should have taken to describe their own management and orchestration strategies.  That a service is made up of a hierarchy of models (intent models in my terminology) should have been fundamental, but it wasn’t.  The good news is that the Third Network now creates such a hierarchy, but the problem is first that it doesn’t go low enough, and second that it doesn’t go high enough.

The MEF can (and does) define LSO as a top-level service lifecycle orchestrator, and it can (and does) subordinate SDN and NFV implementations and legacy management to it.  But it can’t retroject the service model hierarchy into these other standards processes.  That means that in order for LSO to define a complete orchestration model for a complex service made up of all those technology pieces, it has to model the service entirely and pass only the bottom-level elements to the other standards’ implementations.  Otherwise whatever limits in terms of service structure those other standards had, they still have.

It’s possible that the MEF could take LSO to that level.  Their diagrams, after all, show LSOs talking to LSOs in a hierarchy, and there’s no reason why one of those subordinate LSOs might not itself be top of a hierarchy.  But it’s a lot more complicated to do that, and it extends the work way beyond the normal scope of a body like the MEF.

There’s a similar issue at the top, where the LSO connects with the OSS/BSS.  The diagrams the MEF provides show the LSO below the OSS/BSS, meaning that it would have to look to operations systems like a “virtual device”.  That’s not unique to the Third Network approach; most NFV and SDN implementations try to emulate a device to tie into management systems, but doing so can create issues.

A service lifecycle starts with service creation and ends with sustaining an operational service for a customer.  While there are events in a service lifecycle that have no direct OSS/BSS impact (a rerouting of traffic to correct congestion that threatens an SLA is one), many events do require operations system interaction.  The service lifecycle, after all, has to live where services live, which is really in the OSS/BSS.

It’s not clear to me from the Third Network material just how the MEF believes it could address the above/below orchestration points.  There is value in the kind of orchestration the MEF proposed, even if they don’t address the holistic orchestration model, because for business services like Carrier Ethernet, VLANs, and VPNs we have established management models.  However, if somebody does develop a full orchestration model, then the Third Network functions would duplicate some of the functions of that broader model.  It might then have to be treated as a “domain” to be integrated with OSS/BSS, SDN, and NFV orchestration through federation techniques.

LSO is good in concept, but it’s still in concept and so I can’t really say where the concept will go.  The MEF white papers outline a highly credible approach and even indicate the specific information models and specifications they plan to develop.  Even with a limited-scope target, this is going to be a formidable task for them.  It would be facilitated, of course, by a broad notion of how new-age operations and management looked from the real top, which is above current operations systems.

We really need a true vision of top-down services for next-gen networks.  You can see vendors and operators working on this piece and that, bits of technology or standards that have real utility, but only within a broad and operationalizable context we’re still groping to identify.  The main signals of real progress so far are in some of the TMF Catalyst demonstrations, which for the first time are starting to look at both the service realization below and the operations modernization above.  Hopefully the vendors involved will push their efforts earnestly, because there’s a lot riding on the results.