Commenting on Our Blog

I’m Tom Nolle, the author of CIMI Corporation’s blog and the president of CIMI Corp.  I’ve been asked from time to time how someone can register to comment on the blog posts, and I want to answer that here.

You cannot reply on the blog; comments are disabled and will remain so.  All the blog posts are posted to LinkedIn, and you can follow me to see them all.  The posts are also reposted to relevant LinkedIn groups.  In either case, comments and questions can be added to the LinkedIn posts.

A word of caution.  My blog is to present my views, not your promotions or commercials.  Do not add comments to my LinkedIn posts that are simply commercials for your company.  If you do, I’ll provide you with a warning not to post that kind of comment.  If you ignore it, I’ll never respond to anything you comment on, and I’ll contact LinkedIn to see if I can report the post.

The ONF May Be Onto Something…Or Not

Sometimes it seems like industry consortia have longer lives than the stuff they work on.  Part of that is because the smarter ones keep moving to keep pace with industry needs and media interest.  The Open Networking Foundation (ONF) is reinventing itself, launching a new strategic plan, and if you combine this with some of the other recent innovations, Stratum in particular, it shows the ONF may be on to something.

The sense of the new ONF mission is to expand “openness” beyond “OpenFlow” to the full scope of what’s needed for operators to adopt open solutions to their problems, transformation-related and otherwise.  Four new reference designs have been delivered to support the new mission.  The first is an SDN-enabled broadband access strategy, the second a data center function-host fabric, the third a P4-leveraging data plane programmability model, and the final one an open model for multi-vendor optical networks.

The ONF process is defined by a Technical Leadership Team (TLT) that determines the priorities and steers things overall.  The project flow envisioned is that you start with a set of open-source components and a set of reference designs, and these flow into a series of Exemplar Platforms (applications of the reference designs), from which you then go to solutions, trials, and deployments.  Central to all of this is the ONF CORD project.

The latest documents show CORD to mean “Cloud Optimized Remote Datacenter”, which might or might not be an acronym rebranding from the original “Central Office Rearchitected as a Datacenter”; the ONF site uses both decodes of the acronym.  Whatever it means, CORD is the basis for multiple ONF missions, providing an architectural reference that is fairly cloud-centric.  There’s a CORD for enterprises (E-CORD), 5G mobile (M-CORD), and one for residential broadband (R-CORD).  CORD is the basis for executing all four of the new reference designs.

CORD isn’t the only acronym that might be getting a subtle definition shift in the ONF work.  For example, when the ONF material says “SDN” they don’t mean just OpenFlow, but rather network services controlled through the ONOS Controller, which could be any mixture of legacy and OpenFlow SDN.  They also include P4-programmable forwarding, the Stratum project I’ve already mentioned in a number of blogs.  They also talk about “NFV” and “VNFs” in reference designs, but they seem to take the broader view that a VNF is any kind of hosted function, and NFV any framework for using hosted functions.  That’s broader than the strict ETSI ISG definition, but it would include it.

I think that if the ONF is trying to create something more broadly useful for itself to attack, and something more relevant to current market needs, it’s doing a decent job at both.  There are only two potential issues, in fact.  The first is whether the underlying CORD model, even with some cloudification applied, is too device-centric to be the best way to approach architecting a software-centric future.  The second is whether “standardization” in the traditional sense is even useful.

If you look at the OpenCORD site, the implementation of CORD is described as “A reference implementation of CORD combines commodity servers, white-box switches, and disaggregated access technologies with open source software to provide an extensible service delivery platform. This gives network operators the means to configure, control, and extend CORD to meet their operational and business objectives. The reference implementation is sufficiently complete to support field trials.”  In the original CORD material, as well as in this paragraph, there is a very explicit notion of a new infrastructure model.  Not evolutionary, but new.

In my view, a revolutionary infrastructure is valuable only if you can evolve to it, and that means both in terms of transitioning to the new technology model to avoid fork-lift write-downs, and transitioning operations practices.  Both these require a layer of abstraction that harmonizes old and new in both the capex and opex dimensions.  It doesn’t appear that the ONF intends to do that, which means they end up depending on the same deus ex machina intervention of some outside process as both SDN and NFV have, to their disadvantage.

In a related point, the original CORD concept of rearchitecting a central office fits nicely with the notion that services are created in a central office.  It’s very likely that future services will draw on features that are totally distributed.  Is that distributability, the notion of a true cloud, properly integrated into the basic approach?  That kind of integration requires service modeling for distributed operations and management.

To me, this point raises the same kinds of questions I raised with respect to service modeling and ZTA earlier this week.  If we can’t do lifecycle management at scale without a service model structure that roughly tracks the old TMF NGOSS Contract strategy, can we model a “service office” without referring to that kind of structure?

The second point, which is whether the services of the future should or even can be guided by standards, is both an extension of the first point and perhaps the overriding issue for bodies like the ONF.  Service infrastructure that builds services from a host of composable, independent elements may look in application terms like a collection of controllers that handle things like content delivery and monitoring, but these features wouldn’t likely reside in one place and might not even exist except as a collection of functions integrated by a data model.  We’ve seen in the past (particularly with NFV) the danger of taking a perfectly reasonable functional view of something and translating it literally into a software structure.  You tend to propagate the “box and interface” picture of the past into a virtual future.

Related to this point is the question of whether completely composable cloud-hosted services have to be described explicitly at all.  What you really need to describe is how the composition works, which takes us back to the service model.  If you deploy containerized applications, the container system doesn’t know anything about the logic of the application, only the logic of the deployment and redeployment.
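To make that concrete, here’s a minimal sketch (in Python, with invented names) of what a deployment system actually “sees”: a replica count, an image, and a restart policy.  Nothing in it describes the application’s own logic, which is exactly why the composition itself still has to be described somewhere else.

```python
# Hypothetical deployment descriptor: this is all the deployment system
# "knows" about the application -- counts, an image, restart behavior.
deployment = {
    "name": "content-delivery",       # illustrative service name
    "replicas": 3,                    # how many copies to keep running
    "image": "example/content:1.4",   # what to run, not what it does
    "restart_policy": "always",       # redeploy on failure
}

def reconcile(desired, observed_replicas):
    """Deployment logic only: start or stop copies to match the descriptor."""
    delta = desired["replicas"] - observed_replicas
    if delta > 0:
        return [f"start {desired['image']}"] * delta
    if delta < 0:
        return [f"stop one copy of {desired['name']}"] * (-delta)
    return []

print(reconcile(deployment, observed_replicas=1))   # two more copies needed
```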

The risk here is clear, I think.  We have a tendency to create standards that constrain what we are trying to standardize, to the point where the whole value of software-centric networks could be compromised.  Static structures are the opposite of agile, but when you try to draw out traditional functional models of what are supposed to be dynamic things, you end up with static structures.

The ONF knows it needs to make its stuff more cloud-centric, and I think that shows in the nature of the projects and the way they’re sliding around a bit on terminology.  I hope that means they know they need to avoid the other pitfalls of traditional standardization, the biases of the old CORD and the fundamental problem with creating a non-constraining description of a dynamic system.  If they do, they have the right scope and mission set to make themselves truly relevant in the quest for a transformational future infrastructure.  If they don’t, they’ll join the ranks of bodies left behind.

5G as a Driver for Modernizing Infrastructure

Since we are pretty much done with 5G standards, it’s a good time to take a realistic look at the way 5G might actually deploy and how it might impact metro infrastructure and the carrier cloud.  I want to stress that operators are internally divided on 5G and its impacts, so I’ve necessarily had to use my own judgment as a filter where competing views have to be harmonized.

Most operators agree that 5G non-stand-alone (the 4G/5G New Radio hybrid) and 5G/FTTN millimeter-wave applications in “wireline” broadband will deploy.  We can expect to see both get out of the trial stage in 2019.  Operators cite a mixture of reasons for the deployment, but the consensus is that it’s really driven by changes in the mobile and video markets, and that the 5G connection is more a matter of using the latest standard for next-generation deployment than of 5G itself driving those changes.

At the root of all these drivers is video streaming.  Mobile usage is growing every year, and in particular mobile content viewing, including live TV.  Cord-cutting is also exploding, reducing the appetite for linear TV.  AT&T recently introduced its WatchTV service, which streams 31 channels (36, eventually) for fifteen bucks a month, free to AT&T mobile customers on unlimited plans.  Obviously, this means that more mobile capacity will be needed, and it makes sense to build out 5G rather than to try to make 4G work.

The 5G/FTTN hybrid is related to this.  Pure FTTH is practical in areas where demand density (roughly, GDP per square mile) is high, but with the decline in the value of linear TV it’s harder to build a business case for FTTH even where demand density is good.  DSL doesn’t work well for high-speed broadband, despite advances in the standard, so operators like Verizon are very interested in using 5G mm wave as a tail circuit to fiber to the node.  Most people agree that the combination is great for broadband but could deliver TV only in streaming form.

Video demand and 5G, in either NSA or mm-wave form, combine to drive a considerably larger requirement for fiber in the access/metro network.  In truth, fiber will be the big infrastructure winner in what’s usually seen as a 5G-driven infrastructure revolution.  No matter what kind of RAN you use, you need more capacity per unit area to deliver more video traffic, so we can expect to see a lot of new fiber runs to new nodes or base stations, and more capacity per fiber connection as well.  That will accommodate (at least accommodate better) the larger number of users viewing streaming video content and the larger number of hours spent per user doing that.

5G NSA, the mobile form of 5G likely to deploy first, depends on standardization because it depends on compatibility between handsets and RAN.  Operators are always reluctant to say that standards aren’t needed or useful in any space, but many think that standardization of 5G/FTTN isn’t nearly as important because it won’t demand a lot of interoperable elements.  The operator will control both ends of the radio connection, and the relationship will be fairly static.  However, there is at least some thought about having “node convergence” between 5G NSA and mm-wave technology, since it makes sense to leverage fiber runs wherever you decide to do them.

Estimates of how far and fast this first phase will go vary significantly, depending on factors like demand density, competition, and residual capacity in the 4G infrastructure of the area.  However, most operators don’t think there will be widespread 5G NR even in 2020, and they expect 5G to outstrip 4G in deployment only by about 2022.  The 5G/FTTN stuff is likely to roll out faster, and fastest of all where cable TV (CATV cable delivery) is a competitor.

The timing of this first phase has a lot to do with the deployment of 5G beyond NSA, meaning 5G Core.  This part of the 5G story is where there’s the greatest difference in viewpoint among and within operators.  At the core, perhaps, lies the fact that many senior operator types still think in public-utility terms.  You design a service, deploy it, and people use it.  This view is rarely held by the CFO organization, however, and that is a big part of the division on the topic.

If you do not see future services as much more than expanded video-centric mobile, then the only driver to 5G Core is whatever impetus MVNO relationships might provide to network slicing.  Things like roaming between WiFi and cellular and even satellite could be provided without 5G core, and many operators are unsure whether that capability really benefits the incumbents.  Some also wonder whether a lot of MVNO interest wouldn’t just cut into their retail revenue to feed a smaller wholesale stream.

If you think future services are more than video, you probably think they relate to IoT.  Unfortunately, I don’t think there’s any topic in networking where fewer people have a realistic vision of progress.  Not only that, operators are probably lagging even the (pathetic) level of overall industry realism.  Saddest of all, even IoT doesn’t drive a convincing demand for 5G Core.  The problem is that everyone knows that “real-time” means “low latency”, but they don’t know the answer to the “how low can you go?” question, rephrased as “how low do you have to go?”  Most IoT applications could easily tolerate latency measured in the low hundreds of milliseconds, and that could be provided without 5G core (or even 5G NSA).

The credibility of IoT as a 5G driver, or indeed as a driver of anything, is still a question.  An article in Light Reading describes an initiative to come up with 5G justifications, and three were identified: augmented reality, M2M, and medical.  Gosh, I remember when “medical imaging” was the justification for practically any new network technology, so I guess that one’s no surprise.  To me, these three are tenuous enough to make IoT look good.

I think IoT will emerge, not as a sensor-connect 5G technology but as a cloud-service technology.  Because not even that point is accepted, I’m not expecting much from it in the near term.  My model says that we can’t expect to see IoT emerge as a significant driver of carrier cloud until after 2022, which means that even when 5G NSA and 5G/FTTN are deployed at significant levels, IoT isn’t doing much to move things beyond those two phase-one initiatives.  It’s not until 2024 that IoT drives carrier cloud as strongly as video and advertising delivery to streaming users do.

That’s bad news for the media and pundits who think otherwise, and perhaps for some of the mobile-specific network vendor giants, but in an overall infrastructure sense it really won’t hurt much.  Access and metro fiber will be driven by video far more than anything else.  Carrier cloud, likewise.  If video demand is fully exploited by operators, most of the edge hosting, fiber deployments, and even function hosting that people dream of will still come along, just for different reasons and in subtly different forms.  “Carrier cloud”, for example, will see “NFV” in the strict sense focused at the edge and on business services, but “cloud hosting” of network features will be driven by advertising and video.

I think 5G, even 5G Core, will deploy eventually.  The important truth is that while it will, it’s not a direct driver of the near-term changes in infrastructure, just a driver of a new RAN and a new last-mile broadband delivery mechanism.  Some operators will retain a strong 5G linkage because it’s convenient for them to use that to build consensus internally and link to something with good ink, but there are plenty of opportunities for vendors having little or nothing to do with 5G to play a big role and make big gains in the next three years.

Defining (and Understanding) Service Models

What is a service model?  I get comments on my blogs on LinkedIn and directly from clients and contacts via email, and that’s a question I’ve been getting.  I’ve dealt with modeled services for almost 15 years now, and perhaps that’s demystified them to me when they’re still mysterious to others.  I want to try to correct that here and explain why the concept is critical, critical to the point where a lot of what we expect from transformation can’t happen without it.

A network service or cloud application could well have a dozen or more explicit components, and each of these might have a dozen “intrinsic” components.  If you host a function somewhere, it’s a part of a higher-level service relationship, and it’s also dependent on a bunch of hosting and connection resources, each of which may involve several components or steps.  You could prove this out for yourself by drawing a simple application or service and taking care to detail every dependent piece.

The difficulty that multi-component services pose is the complexity they introduce.  If you look at the NFV concept of service chaining, for example, you’re proposing to replace a simple appliance with a chain of virtual machines, each linked with a network connection and each requiring the same level of management as any cloud component, or more.  If you have a goal of agility and composability of services from logical elements, the variations possible only add to the complexity.  Get enough of this stuff wrapped up in a retail offering and it becomes almost impossible to keep it running without software assistance.

Service lifecycle automation, or “zero-touch automation”, is about being able to set up services/applications, sustain them when they’re in operation, and tear them down when needed.  Think about that in the context of your own service/application diagram.  There are a lot of steps to take, a lot of possible things that could break, a lot of responses to problems that could be deemed proper.  How do these things get done?

The best way to start is by looking at the high-touch, manual way.  An operator in a network operations center (NOC) would use a service order/description and take steps to provision everything according to the order.  Everyone knows that this approach is expensive and error-prone, and more so as the number of elements and steps increases.  In the IT world, human steps were replaced by scripts a long time ago.  A script is essentially a record of manual steps, something like a batch file.  From that, scripting evolved into the imperative model of DevOps tools: “Do this!”  Scripting handles things like deployment fairly easily.
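As a concrete illustration of the imperative style, here’s a minimal sketch in Python (the step names are invented, not drawn from any real tool): a recorded sequence of steps with no model of the goal behind them.

```python
# A script is essentially a recorded list of manual steps -- purely imperative.
# Step names here are hypothetical, for illustration only.

def run_step(step):
    print(f"doing: {step}")          # stand-in for real provisioning work

def deploy_service_edge():
    steps = [
        "allocate VM on host-A",
        "attach VM to customer VLAN",
        "install firewall image",
        "start firewall and verify",
    ]
    # "Do this, then this, then this."  The script knows the sequence,
    # not the goal, so if a step fails it has no way to pick another path.
    for step in steps:
        run_step(step)

deploy_service_edge()
```

That inability to pick another path is exactly the exception-handling problem described next.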

Where scripting falls down is in handling abnormal conditions.  Suppose that something breaks or is unavailable when it’s called for in the script?  You’d have to write in exception conditions, which is a pain even if you assume that there’s a parallel resource you could commit.  If a whole approach to, for example, deploying a software element is invalidated because some specific thing can’t be done, you can’t replace the specific thing, you have to replace the approach.  That means going backward, in effect.

It’s even worse if something breaks during operation.  Now you have a broken piece, somewhere, that was supposed to do something.  The context of its use has to determine the nature of the remedy, and it could be as simple as using a parallel available element or as complicated as starting over again.  That’s where service modeling comes in.  Service models are declarative, meaning that they don’t describe steps, they describe states.

A service model is a representation of the relationships of elements in a service.  Each element in the model represents a corresponding functional element of the service.  The collection of elements creates a functional hierarchy, a representation of how the overall service breaks down into pieces, eventually pieces that can be deployed and managed.

With a functional hierarchy, a service or application is defined at the top as an object with a specific set of properties.  That object is then related to its highest-level functional pieces, and each of them is decomposed in turn into their highest-level functional pieces.  At some point, this decomposition reaches the level where the pieces are not themselves decomposable objects, but rather specific resource commitments.  A simple example of a top layer of a functional hierarchy is that “Service” consists of “Access” and “Core”.
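A minimal sketch of that hierarchy, using the “Service/Access/Core” example and purely illustrative resource names, might look like this:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ModelObject:
    """One element of the functional hierarchy."""
    name: str
    children: List["ModelObject"] = field(default_factory=list)
    resource: Optional[str] = None   # set only at the bottom, where resources are committed

    def is_decomposable(self) -> bool:
        return bool(self.children)

# "Service" decomposes into "Access" and "Core"; each of those eventually
# reaches objects that represent specific resource commitments.
service = ModelObject("Service", children=[
    ModelObject("Access", children=[
        ModelObject("AccessLine", resource="uCPE plus local loop"),
    ]),
    ModelObject("Core", children=[
        ModelObject("CoreVPN", resource="MPLS or SD-WAN tunnel set"),
    ]),
])
```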

This model is interesting, but it’s not in itself an answer to ZTA.  When we were talking about service deployments and problems, we were talking about events.  The notion of events is critical to automation because it’s events that automation is supposed to be handling.  A long time ago, the TMF came up with a then-revolutionary (and still insightful) vision, which was that a service contract (which is a model of a service) is the conduit that steers events to processes.

Events and event-handling are absolutely critical to the success of ZTA.  What happens in the real world is asynchronous, meaning that everything is running in parallel and everything that happens is also in parallel.  It’s possible to queue events up for a serialized monolithic process, but if you do there’s a good chance that when you process Event A, you don’t know that a related event or two occurred later on, and you’re now working out of sync with reality.  It’s not enough to be able to understand what an event means in context if your context is wrong.

OK, so let’s suppose that something in our “Access” component of our “Service” breaks.  The fault event is directed to the “Access” data model.  That data model is built around a state/event engine or table, which says that every event in every possible functional state (orderable, ordering, active, fault, etc.) has a target process.  When the event is received, that process is run.
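Here’s a minimal sketch of that state/event dispatch, assuming hypothetical state names and a single illustrative process:

```python
# Each model object carries a state/event table: every (state, event) pair
# names a target process.  States and processes here are illustrative only.

def fault_process(obj):
    print(f"{obj['name']}: running FaultProcess, attempting internal repair")
    obj["state"] = "repairing"

STATE_EVENT_TABLE = {
    ("active", "fault"): fault_process,
    # ...entries for (orderable, order), (ordering, complete), and so on
}

def handle_event(obj, event):
    process = STATE_EVENT_TABLE.get((obj["state"], event))
    if process:
        process(obj)

access = {"name": "Access", "state": "active"}
handle_event(access, "fault")     # dispatches the fault event to fault_process
```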

There are a lot of things an activated process might do, but they break down into three main categories.  It can do something “functional” (like changing a parameter or even initiating a tear-down of a resource commitment), it can do something “signaling” (generate an event), and it can do something “stateful” (changing its own state).  Usually it will do several of these things, sometimes all.

If a model object like “Access” gets an event it can handle within its own scope, it would run a process and perhaps it would set a state indicating it was waiting for that process to do the handling.  When the completion was signaled (by another event), it would then restore its state to “active”.  If a model object cannot handle the thing an event is signaling, it might signal down the chain to its subordinates.  That would happen if a service change were processed.  Finally, if a model object cannot handle the event, or pass it down, then it has to report a fault up the chain of objects to its superior, again via an event.

Example: “Access” faults, and the internal repair is attempted by FaultProcess, after setting the “Repairing” state.  After the process runs, the repair is not successful.  The object then sets the “fault” state and reports a “fault” event to “Service”.  Service might then try to replace the whole access element, essentially treating the fault as a signal to recommit resources.  If that can’t work, then “Service” enters a “fault” state and signals a “fault” event to the user.
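A hedged sketch of that escalation sequence, with the repair and recommit steps stubbed out to fail as they do in the example:

```python
# Illustrative only: how a fault climbs the model hierarchy when local
# handling fails.  Names, states, and stub results mirror the example above.

def attempt_repair(obj):
    return False          # assume the internal repair fails, as in the example

def recommit_access(obj):
    return False          # assume recommitting the access resources also fails

def on_event(obj, event):
    if obj["name"] == "Access" and event == "fault":
        obj["state"] = "repairing"
        if not attempt_repair(obj):
            obj["state"] = "fault"
            on_event(obj["parent"], "fault")     # report the fault upward
    elif obj["name"] == "Service" and event == "fault":
        if not recommit_access(obj):
            obj["state"] = "fault"
            print("Service: reporting a fault event to the user")

service = {"name": "Service", "state": "active"}
access = {"name": "Access", "state": "active", "parent": service}
on_event(access, "fault")
```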

The “models” here are dualistic.  In one sense they’re an abstraction of the process set they represent; the sum of what can be done to resources to fulfill the functional mission.  That makes them an intent model.  In another sense, they are a blueprint in a data form.  If I have a model of “Service”, I have everything needed to do anything that can be done to or for it.  That means that any process set can handle any event for the service.  I could spin up a new process for each time I had an event, or I could send the event to a process already set up.  The ability to have parallel processes handling parallel events is critical to scaling, and also to keeping the context of your service elements up to date.

This was my notion of the “Service Factory” in the first ExperiaSphere project I did in association with the TMF’s Service Delivery Framework work.  You have a blueprint, you send it to a Service Factory, and that factory can do whatever is needed because it has the blueprint.  Models give you a pathway to ZTA, but not just that.  They give you a way to exercise full scalability of process resources, because any instance of the “FaultProcess” in our example could handle the fault.  There is no single monolithic application to queue events to, or even a fixed set of them.  The model mediates its own process scheduling for any number of processes and instances.
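A minimal sketch of the Service Factory idea, again with purely illustrative names: because the blueprint carries everything needed, any factory instance can handle any event, and instances can run in parallel.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

def service_factory(blueprint, event):
    # The blueprint (model) carries everything needed; the factory keeps no
    # private state between events, so any instance can do the work.
    model = copy.deepcopy(blueprint)
    return f"handled '{event}' for {model['name']}"

blueprint = {"name": "Service", "elements": ["Access", "Core"]}
events = ["fault:Access", "change:Core", "fault:Core"]

# Parallel factory instances, one per event -- no single monolithic queue.
with ThreadPoolExecutor(max_workers=3) as pool:
    for result in pool.map(lambda e: service_factory(blueprint, e), events):
        print(result)
```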

There are two things that I think emerge from understanding this model stuff.  The first is that it’s very difficult to see how you could respond to events in a service lifecycle any other way.  State/event processing is a fundamental element in real-time design, proven for literally decades.  The second is that without the intrinsic process scalability and “Service Factory” approach, you end up with a lifecycle manager that has fixed capacity, meaning that some combination of events will likely overrun it.  It doesn’t scale.

A deeper insight from the model approach is that the functions of “orchestration” and “management” are intrinsic to the state/event sequence and not explicit in a process sense.  What you have in a model-driven system is a series of interconnected state/event systems, and their collective behavior creates all the lifecycle responses, from deployment to tear-down.  There is orchestration, but it’s a set of model-driven state/event processes.  Same with management.

This is why I object to things like the NFV ISG approach.  You cannot have a real interface between abstractions.  If you define an interface to the concept of “orchestration”, you are nailing the abstraction of orchestration to an actual component that the interface can interface to.  There is, or should be, no such thing.  Real-time event-driven systems have only one interface, an event interface.  They don’t have explicit, individual elements, they have collective, model-driven, behaviors.  You can do anything with them that a state/event process-model system can describe, which is pretty much anything.

All this depends on building the models right, which is what SDN, NFV, ZTA, IoT, and everything else we’re talking about in real-world services and applications should have done from the start.  None of them have done so, and so we are at the point where we either have to accept that we’ve done a lot of stuff totally wrong and start over, or go forward with a path that’s not going to lead to an optimum solution, and perhaps not even to a workable one.  I’m at the point where I won’t talk to anyone about a service lifecycle automation/ZTA approach unless we can talk modeling in detail and up front.  Anything else is a waste of my time.

The great irony is that there’s nothing in service modeling that’s rocket science.  There have been service automation projects based on modeling for five years or more.  I’ve detailed the specific application of hierarchical modeling to network services in six annotated slide presentations in my latest ExperiaSphere project, and everything there is freely contributed to the public domain except the use of the trademark “ExperiaSphere”.  You don’t even have to acknowledge the contribution if you use part of or all of the ideas.  Anyone with some real-time programming education or experience could do this sort of thing.

Whether the importance of the approach I’ve outlined will be recognized in time to save the concept is another matter.  I’ve been able to make every software type I’ve talked with understand the structure and issues.  I’ve also run through the modeling with operator standards types, and while they seemed to like the approach, they didn’t insist on its adoption for SDN, NFV, or ZTA.  Even within the OSS/BSS community, where TMF types may recognize the genesis of modeling in the NGOSS Contract event steering from a decade ago, it’s hard to get an endorsement.  The reason may be that real-time event programming isn’t something that non-programmers are tuned into.  Many can understand that it works, but most apparently don’t see it as the optimum approach.  Because it’s an event-driven approach, and because lifecycle management is an event-driven process, I think it is.

How the HPE/Aruba Software Defined Branch Measures Up

I saw an interesting article on the HPE/Aruba “Software-Defined Branch” concept, and it raises yet again a question we’ve grappled with in tech for decades.  The model is an expansion of the classic-of-today SD-WAN, but it’s an expansion in that other pieces are added/integrated to form a package.  Is this the right path for SD-WAN?  Do you create a winning strategy by applying a technology to a niche set of needs, or by generalizing to support the broad market trends?

There are a number of industry trends that are combining to create the need for a “service network” that’s a whole new higher layer of networking, not Level 4 of the OSI model, but rather a kind of super-Level-3 approach.  “Super” in the sense of being “above”, and also in the sense of being “better than”.  SD-WAN and some SDN products have been gradually morphing into this sort of mission, and I think that’s also what a software-defined branch should be.

There’s a natural connection between SD-WAN and branch networking, because an increasing number of branch or secondary business locations cannot be connected directly to a corporate VPN based on MPLS, either because it would be too costly or because the service isn’t available in the areas needed.  SD-WAN supplements VPN connectivity using the Internet.  It’s good that software-defined branches can do that, but it’s table stakes.  My readers know that I favor a “logical networking” model as a superset of SD-WAN, SDN, and also branch networking, and it’s that standard I have to apply to Aruba.

The tag line for the story is that “Aruba’s Software-Defined Branch (SD-Branch) solution is designed to help customers modernize branch networks for evolving cloud, IoT and mobility requirements.”  In terms of logical networking, this is kind of a “see your three bucks and raise you one” thinking.  Branch networking is creating problems for enterprises, as I’ve noted, which is the first dollar.  Cloud computing demands a whole new dimension of application component agility, so that’s the second dollar.  Mobility creates agility on the user side, which is dollar number three, and the raise is IoT.  What more can you ask?  The story says that this “marks a significant advancement beyond pure-play SD-WAN offerings”, and it certainly marks a significantly different positioning and packaging.

What Aruba is doing is melding cloud-hosted management with a combination of SD-WAN and other features, including wireless, by packaging multiple elements for co-located installation.  The goal is to create a service demarcation for branch locations, one that offers enterprises all the local communications-related features they need in a single piece, and then blends it with management.

Pulling a bunch of stuff into a single package isn’t exactly a strategic revolution.  It’s clear that we’re evolving to a new model of “service network” or “network-as-a-service”, so is it coincidental that HPE/Aruba is moving after competitors like VMware/Dell and Cisco have already made announcements in the same logical networking space?  Hardly likely.

I’m not knocking HPE/Aruba for providing more advanced branch connectivity.  Where I get a little uncomfortable is in the “marks a significant advancement beyond pure-play SD-WAN offerings” statement.  As I noted in my “stages of SD-WAN” blog, the SD-WAN space is evolving quickly, and the state of the art for SD-WAN is actually well ahead of the Aruba stuff.  This is particularly clear when you look at the specific point the Aruba story makes about support for the cloud.

For years, cloud computing to the enterprise was hosted server consolidation.  You needed the ability to access an application hosted in the cloud, so you needed a convenient way of linking that application to the corporate VPN.  Cloud support equals a software SD-WAN client, in other words.

The problem is that hosted server consolidation isn’t where it’s at.  The cloud is now the hybrid cloud, the multi-cloud, the elastic cloud, the failover cloud, the event cloud, and more.  You need more than application awareness in the sense of being able to recognize certain IP addresses and ports so you can prioritize traffic.  You need the logical network layer to be truly independent, mapping to physical connectivity only to the extent needed to deliver logical paths to a physical network service access point.

Independent logical networking means that you know who every user is, what every application is, what components make it up, how to follow components that are redeployed or scaled, how to decide who’s allowed to access what.  This knowledge exists not at the IP VPN or Internet level, but in the logical overlay network.  That means that the cloud’s agility, the user’s mobility, and everyone’s security is not an integrated function but an intrinsic element of the logical network.  I don’t have to make a logical network secure; it is secure by how it’s defined.  It’s mobile and agile for the same reason.
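A minimal sketch of what that implies, with invented identities and policy (this is an illustration of the principle, not any vendor’s implementation): in the logical overlay, connectivity is decided by who and what, not by IP reachability, and anything not explicitly allowed simply doesn’t exist as a path.

```python
# Illustration only: logical-overlay connectivity decided by identity and
# policy.  Users, applications, and the policy table are invented.
POLICY = {
    ("sales-user", "crm-app"): True,
    ("guest",      "crm-app"): False,
}

def may_connect(user_identity, app_identity):
    """Default-deny: a logical path exists only if policy explicitly allows it."""
    return POLICY.get((user_identity, app_identity), False)

print(may_connect("sales-user", "crm-app"))   # True
print(may_connect("guest", "crm-app"))        # False: no path, not just "blocked"
```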

SD-WAN is a central piece, a valuable and even critical piece, of the Aruba story, but it’s not what it should be in terms of features.  SD-WAN in its futuristic logical-network form should be the heart of the story, the thing from which everything else extends.  If a network has to be retrospectively made aware of users and applications, then it’s not a user or application network, and that’s where we are going with SD-WAN, SDN, cloud networking, branch networking, and everything else.

I don’t think Aruba is going to stop its software-defined branch evolution with this announcement, but I wish they’d built a more solid foundation before jumping into the IoT story.  What exactly does IoT demand from a branch network?  Wireless?  That’s hardly enough to make something an IoT story, and I really don’t see much here that addresses “real” IoT requirements like event-handling.  Do they propose to use traffic prioritization to improve latency?  Not enough, in my view.

I think the competitive announcements in this space almost surely caused Aruba and HPE to rush things a bit, responding quickly to initiatives from rivals.  That’s too bad, because as I’ve said in the past, neither VMware nor Cisco really nailed down the key value propositions either.  There was an opportunity here to leapfrog competitors, and HPE/Aruba didn’t take it.

Is There Any Such Thing as uCPE?

You have to love the notion that we have “universal CPE” that somehow seems designed to serve one mission.  The great majority of uCPE products seem directed at the NFV-related “virtual CPE” or vCPE goal of having a box on premises that can be remotely augmented with endpoint features like firewall, encryption, and perhaps SD-WAN.  I never thought vCPE was an application with earth-shaking impact, so as the total mission for uCPE it seems a yawn.  Are there other missions, and if so what would uCPE end up looking like?

CPE edge devices serve two broad missions: one is to terminate a connection service and host service-related features, and the other is to provide local processing that coordinates in some way with deeper processing capabilities, including perhaps in a cloud.  To be “universal”, a uCPE strategy has to cover both areas.

Connection-point services are divided less by the service than by the nature of the customer.  Business users with carrier Ethernet connections typically have a need for rich service management tools and a number of features associated with the service itself—firewall is the most popular example, but some users (SMBs in particular) would like things like email servers with anti-virus, DNS/DHCP, encryption, SD-WAN, and more.  Consumers actually want similar capabilities, but in the consumer space there is an almost automatic presumption that WiFi will be provided by the box.  That means that the box itself has to be “real” not virtual, and since user demands for connection-point features are fairly static, there’s less value in having dynamic feature-loading capability.

The enterprises I’ve talked with are interested in uCPE for connection-point services only to the extent that they lower TCO.  A significant reduction would be required (25% minimum) to induce a change, and only displaceable assets would be targeted.  On the average, these enterprises have about two years of useful life remaining in their current service-edge strategy, so that limits the benefits to be reaped from uCPE replacement.

Operators are mixed on the value of uCPE for consumer termination.  Some operators like the idea of having the ability to remotely update features/firmware, but others see that as a source of complaints and perhaps even lawsuits.  In established market areas, the prevailing view is that consumers aren’t a target of opportunity at all.  For consumerish uCPE, the best target might be branch office locations that utilize consumer-like broadband technology for connection.  Since this space is the target of SD-WAN vendors, it’s likely that uCPE’s future for connection services will depend on SD-WAN.

The requirements for uCPE from connection-point services are easy to define; they would logically fit a mixture of P4 programming for flow-handling missions and more traditional function execution, where general-purpose hardware and software would work.  You can make Linux or embedded OSs work for that.  For symbiotic edge-of-cloud missions it’s harder, because you have to not only understand what the broad mission is, but also how separating some features off to be run at the edge would benefit the application overall.  Only then can you say what features would be useful.

If we go beyond connection-point services, things get more complicated rather than getting simpler.  We’ve known for some time, because of things that cloud providers like Amazon and Microsoft have done, that there is a role for devices that act as a kind of on-premises agent for cloud services.  Amazon’s Greengrass IoT function hosting is a good example of this, but IoT is far from the only example.

IoT is an event-driven service, and event processing is probably the most credible application for edge computing in terms of feature requirements and future potential, but it’s not a strong near-term driver.  That doesn’t mean it won’t impact uCPE requirements, though.  The useful life of a uCPE box is likely to be three to five years, and by that time IoT could be the largest single driver.  The most significant difference with IoT is that you probably don’t need a general-purpose processor and traditional operating system, and if you could trade those two against device cost or event-specialized features, it would likely be worth it.

A nearer-term example of a useful edge mission for uCPE, one applicable in the consumer space, is video and ad caching.  We’re moving now from streaming as a library function to streaming as a substitute for linear TV.  The latter places a significant restriction on cache delays because if video is held up, or if ads can’t be started on time and kept within the allocated slot, you either drop something or you overrun into the next program.  Caching, compression, pre-staging of ads, and other stuff suitable for uCPE support could address these issues.  We already have streaming devices to link to these services; could these devices evolve to a form of uCPE?

That may be the real issue with “universalism”.  If we look across the consumer and business spaces, the working and entertainment missions, the connection-point and event-based edge-of-cloud roles, we quickly see a very large range of CPE types.  If we define a single box targeted at one of perhaps two dozen possible combinations of missions and roles and types, we’re hardly universal.  That undermines the accuracy of the term, but worse it undermines the credibility of the whole concept.  I have a universal box for you, dear operator, but you’ll need it and twelve other forms of CPE to cover your needs.  Not a good sales positioning.

It’s doubtful whether we could shoehorn a streaming-stick consumer device and a universal carrier Ethernet termination into the same box without making one or both of the applications a waste of time for cost or functionality reasons.  We might see growth in the scope of missions, though.  Might consumer devices, for example, absorb things like personal agent features, email and IM clients, caching for video and even video guide features?  Clearly that could happen.  Might they also absorb IoT features for the smart home?  Even more likely.

We have all this stuff already, of course, spread around in different boxes or software elements.  We did that not because we didn’t have universal CPE, but because we didn’t have universal requirements to converge on a single model that would be cost- and feature-effective in all the missions.  We did purpose-built stuff because people typically had one purpose in mind at a time.  That’s not true now, and so we have to juggle the timing with which uCPE opportunities will emerge to decide what’s best.

Some Sayings That Sum Up the Industry

I tend to use colorful phrases and examples in my work, because it makes important points easier to communicate and remember.  A couple of long-time clients recently and coincidentally sent me emails with some of the things they remember from our sessions.  Since they bear on the state of the industry today and are still points I try to make, I thought I’d share them here.

The first is that the best competitive position is helpless against a failure of the business case.  My research on buyer behavior goes back over 30 years.  In that time, I’ve learned that there are three kinds of positioning statements that matter, but they matter only under specific conditions.  One position is differentiation, which compares your stuff (presumably favorably) with competitors.  The second is objection management, which addresses specific likely push-backs, and the final one is enablers.  These are critical because they represent the tie to the business case, the thing that enables the purchase in the first place.

You need to own enablers.  The only time when differentiation or objection management matter is when enablers are contested among competitors.  Today, buyers tell me that vendors spend their time explaining why they’re different, but never get to why the buyer cares in the first place.  As a result, a vendor can get “traction” but never get any money.  The seller who controls the enablers controls the deal, because they control the business case.

Another is that editorial presence sells website visits, website visits sell sales calls, and sales calls sell your product or service.  This falls out of what I call “trajectory management”.  You can’t sell anything to a buyer who doesn’t know you exist.  Generally, buyers say they learn about a company from seeing its name in the media, which is what I call “editorial presence” because they’re in a story.  When a buyer sees a company in an article, it’s usually associated with a product or service they’re interested in.  They then go to the company’s website, and if they find information they think is useful (which might mean website or additional downloaded collateral) they’ll contact the company for a sales call.

Many companies don’t understand this simple progression.  They try to get their sales message into an article, which in the first place is not going to work (the reporter won’t run it) and in the second place tries to induce a buying decision without ever making the buyer visible to the seller (which everyone in sales and marketing knows is dumb).  You can’t take a sales message into a marketing channel.  You get visible on a technology news site, but you sell the usual way.

A corollary to that one is “News” means “novelty”, not “truth”.  For those interested, “Pravda” means truth, which is its own exercise in positioning.  When you talk with reporters and analysts you have to remember what their interest is.  Generally, they have to be read to be paid, so if your story is dull as dishwater, they have to find something more exciting to say or they’ve wasted their time talking with you.  Something that might or might not be favorable to you, but certainly won’t be what you wanted out there.

I understand the problem of getting media attention, and the value of editorial presence as part of the trajectory I’ve talked about already.  I don’t personally approve of falsehood as a means of promotion, but I can’t tell my clients that it doesn’t work or isn’t sometimes the best approach.  I’ve also said that the truth is a convenient place to start a marketing story, and I tend to call press stories “fables” because they convey a message without getting too bogged down in reality.  If you’re a reader of the media, keep all this in mind.

Here’s another media-related saying:  Everyone is the same size in a ten-point font.  Positioning for the media is about being exciting, interesting, newsworthy.  You don’t have to be a big company to get “good ink” as the saying goes, but you do need to be interesting.  “Interesting” means both a good story and one that’s not been told to exhaustion.  If Concept X is already in the news, don’t expect to be able to announce it and get headline positioning.  Imagine a story headline like “Vendor is the Twentieth to Announce NFV!”  You have to be the company getting that concept into the news in the first place if you want the best outcome.  If you’re repeating old news, you have to make it sound newer.

Sometimes being contrarian is a good way to get into an already-established story line, but if you really want to sell Concept X, it doesn’t make sense to make news by dissing it.  More often the key is to take a different slant on the story—Concept X is good but not for the reasons others have told you.  A good way to look at it is to ask whether what you want to say is a true insight.  If not, if it’s old news, stay away from it.

On the PR side, another phrase:  An “expert” is somebody who knows an editor.  There are a lot of very bright people in every part of technology, most of whom you’ve never heard of and will never hear from.  The reason is that they don’t have access to the media.  If editorial presence sells website visits, it does that by making a reader/buyer aware of something or someone.  That’s also true of analysts and other resources used for quotes and background in news articles.

Most media people will admit that they aren’t technologists, which means that they probably can’t pick an “expert” based on their own knowledge.  Some people get the title “expert” by being around a long time, or by being able to explain things well, both of which are good criteria.  Some get the title by being accessible, or by offering pithy quotes, or by being willing to fit a quote into a story the reporter/editor is trying to salvage.  Those aren’t so good, criteria-wise.  This is another thing for readers to keep in mind.

For user perspective, we have this one:  To a user, “the network” is whatever isn’t on their desk or in their device.  This one cuts a lot of ways.  One is to show that most users aren’t all that aware of the structure of the services they use.  “The Internet is down” means that they can’t reach it.  It may be that what’s wrong has nothing whatsoever to do with the Internet.  The user may still call their ISP to complain loudly, and the issues with customer support have exploded as broadband Internet has spread network services to people with little network knowledge.

Another important point here is that good management of the device at the network edge is critical for everyone.  The edge device sees both the service and the service delivery into the user’s own network environment.  You can really do SLA enforcement only where everyone can see the same thing, and the most important place in problem determination is the demarcation point between service and user.  Get things right here, from either a user or service provider perspective, and you’re off to a great start.

How about this one for a fitting close?  There’s no substitute for knowing what you’re doing.  I’m always hearing comments like “I thought…” or “Joe told me…” or “Everyone knows…”, and they’re almost always a substitute for the admission that the speaker didn’t really know what they were doing.  If you are dependent on technology in any form, you need to understand enough to play your role.  You can manage a team of professionals if you know enough to recognize a professional when you see one, and if you can tell whether they’re doing roughly the right thing.

Businesses today are dependent on technology to an extent that no one would have believed possible twenty or thirty years ago.  How many do you suppose are really able to understand the technologies they depend on, even enough to be an intelligent consumer?  Remember caveat emptor?  A small office or home office needs some tech knowledge to use tech optimally, and an enterprise needs a lot of it.

For service providers and vendors, this is especially important in the emerging era of virtualization and software-defined network functionality.  I know hundreds of network professionals who know nothing about software, virtualization, the cloud, orchestration, and so forth.  Some of them are involved in standards activities that will shape the future of the industry and determine the fate of their company.  That’s not going to work.

Tech is glamorous, prosperous, exciting.  It’s also difficult, sometimes almost impossibly so.  No matter how smart you are, you need to be educated/exposed to tech in the specific areas of your responsibility, to do your job and ensure others do theirs.

This is a complicated industry, complicated in the business goals of all the players that intertwine and somehow have to be reconciled, and in the technologies that do the same thing.  Cute sayings like the ones contained here are helpful in communicating something quickly, but in the long run everyone has to take more than a moment to contemplate their place in the ecosystem.  To recapitulate the last point, there’s no substitute for knowing what you’re doing.

Can the White-Box Server Players Find the Right Platform Software?

I blogged last week about the white-box revolution and “platform compatibility” for white-box servers.  Since I had a number of interesting emails from operators (and vendors) on the blog, I wanted to use this blog to expand on the whole issue of platforms for white boxes, and on what “compatibility”, openness, and other attributes of platforms really mean to the buyers and the market.

Hosted features and functions, including everything that would come out of NFV, OTT-like services, and other carrier cloud missions, are all “applications” or “software components”, which I’ll call “components” in this blog.  They run on something, and the “something” they run on is a combination of hardware and platform software, meaning an operating system and middleware tools.

Components match platform software first and foremost, because they access the APIs that software exposes.  Components can also be hardware-specific if details of the CPU instruction set are built into the component when it’s compiled.  However, the hardware-specificity can be resolved in most cases by recompilation to avoid those details.  Platform software dependencies are much harder to resolve, so you have to be pretty careful in framing your platform software details.

On the operating system side, Linux is the de facto standard.  There are specialized operating systems (called “embedded systems” in most cases) that are designed for use on devices with a limited software mission, but even these often use Linux (actually, the POSIX standard) APIs.  The big difference across possible platforms, the difference most likely to impact components, is the middleware.

Middleware tends to divide into three broad categories.  The first is “utility middleware” that represents services that are likely to be used by any component.  This would include some network features and advanced database features.  Category two is “special-function middleware” designed to provide a standard implementation for things like event-handling, GUI support, message queuing, etc.  This stuff isn’t always used, but there are benefits (as we’ll see) for having a single approach to the functions when they are needed.  The final category is “management middleware”, which supports the deployment and lifecycle management of the components and the service/application ecosystem to which they belong.

Today, almost every possible piece of middleware has at least a half-dozen competing implementations, both open-source and proprietary.  Thus, there are literally thousands of combinations of middleware that could be selected to be a part of a given platform for cloud or carrier cloud hosting.  If one development team picks Combination A and another Combination E, there will be differences that will likely mean that some components will not be portable between the two environments, and that programming and operations practices will not be compatible.

The purpose of “platform compatibility” as I’ve used the term is to define a platform that works across the full range of services and applications to be deployed.  That means anything runs everywhere and is managed in the same way.  As long as the underlying hardware runs the platform, and as long as components don’t directly access hardware/CPU features not universally available in the server farm, you have a universal framework.

This isn’t as easy to achieve as it sounds, as I can say from personal experience.  There are at least a dozen major Linux distributions (“distros”), and packages like OpenStack have dependencies on operating system versions and features.  Things that use these packages have dependencies on the packages themselves, and it can take weeks to resolve all these dependencies and get something that works together.

One of the advantages of broad-based open-source providers like Red Hat is that their stuff is all designed to work together, giving users a break from the often-daunting responsibilities of being their own software integrator.  However, it’s also possible for others, acting more as traditional integrators, to play that role.  The plus there is that many network operators and even cloud providers have already made a decision favoring a particular version of Linux.

This naturally raises the question of how you’d achieve an open framework when even the synchronization of open-source elements in a white-box hosting world is hard to achieve.  One possibility is to first work to frame out a set of basic function-hosting features, and then to map those to all of the popular Linux platforms, perhaps with the notion of a release level.  We might have Function 1.1, for example, which would consist of a specific set of tools and versions, and Function 1.2, which would have a more advanced set.  Platform vendors and integrators could then advertise themselves as compliant with a given version level when they could deliver that set of tools/versions as a package.
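To illustrate the idea, here’s a minimal sketch; the “Function 1.1” profile contents are invented for the example, not a real specification.  A release level is just a named set of tools and versions, and compliance is the ability to deliver exactly that set as a package.

```python
# Hypothetical "Function 1.1" profile: a named set of tools and versions.
# The contents are invented for illustration, not a real specification.
FUNCTION_1_1 = {
    "linux-kernel": "4.14",
    "openstack": "queens",
    "dpdk": "17.11",
}

def is_compliant(profile, platform_inventory):
    """A platform is compliant if it delivers every tool at the profile's version."""
    return all(
        platform_inventory.get(tool) == version
        for tool, version in profile.items()
    )

platform = {"linux-kernel": "4.14", "openstack": "queens", "dpdk": "17.11"}
print(is_compliant(FUNCTION_1_1, platform))   # True: this platform can claim "Function 1.1"
```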

All of this would probably be workable for carrier cloud hosting from the central office inward, but when you address the actual edge, the so-called “universal CPE” or uCPE, it gets more complicated.  One reason is that the features and functions you think you need for uCPE have to be balanced against the cost of hosting them.  You could surely give someone a complete cloud-like server framework as uCPE, but to do so might well make uCPE more expensive than proprietary solutions targeted at specific service-edge missions like firewall.  Yet if you specialize to support those missions, you probably leave behind the tool/version combinations that work in a server farm, thus making feature migration difficult.

Another issue is the entirety of the service lifecycle management framework.  The ETSI Network Functions Virtualization (NFV) framework is way too complicated to justify if what you’re really doing is just shoveling feature elements into a premises box.  Service chaining inside a box?  The simplest, cheapest, mechanism with the lowest operations cost would probably look nothing like NFV and not much like cloud computing either.  That doesn’t mean, though, that some of the lifecycle management elements that would be workable for NFV in its broad mission couldn’t be applied to the uCPE framework.  The ETSI ISG isn’t doing much to create lifecycle management, nor in my view is the ETSI zero-touch initiative, but rational management could be defined to cover all the options.

The most critical challenge white-box platform technology faces is lifecycle management.  Vendors are already framing packages of platform tools that work together and combining management tools and features into the package.  These initiatives haven’t gone far enough to make deployment, redeployment, and scaling of components efficient and error-free, but there’s no question that vendors have an advantage in making those things happen.  After all, they have a direct financial incentive, a lot of resources, and a lot of the components are under their control.  Open platforms of any sort lack this kind of sponsorship and ownership.

NFV has failed to meet its objectives because it didn’t consider operations efficiencies to be in-scope, and the decision to rely on their being developed elsewhere proved an error; five years after NFV launched, we still don’t have zero-touch automation (ZTA).  Can white-box platforms somehow assemble the right approach?  In my view, it would be possible, but I’m not seeing much progress in that area so far.  History may repeat itself.

White Boxes: What, Where, and How Much?

One of the most important questions in network-building today is the role “white-box” technology will play.  It’s a fundamental precept of operators and even many enterprises that they’re being gouged and locked in by network vendors.  The obvious solution is to adopt an “open” technology, meaning something that doesn’t include proprietary elements and is subject to the pricing pressure inherent in a competitive market.  How good an idea is “white box”, and is it something that’s limited to network devices or inclusive even of large-scale server systems?

The fundamental basis for the white-box movement, whatever kind of box we’re talking about, is the notion that there is a baseline hardware architecture available that could serve as the basis for creating an open device.  This can be true only if there are no proprietary hardware enhancements that would significantly augment value to the user, and if there’s some source of standards that could be expected to drive competitive solutions that stay interchangeable.

In the PC space of old, the IBM PC created such an architecture, and we have had many desktop and laptop products based on that architecture, all of which are open and fairly equivalent.  The PC space, and later the server space, introduced a second, more generally useful architectural reference: “platform compatibility”.  The purpose of these open boxes is increasingly tied to running open software, since software creates the real features of any device technology.

Platform compatibility means not so much that a device (switch/router or server) has a cookie-cutter-identical hardware configuration, but that it offers a software platform (operating system and middleware) that adapts the traditional platform APIs (like Linux) to the hardware, erasing any visible differences.  Hardware plus platform equals white box.

In the white-box switch and routing space, there are a number of embedded operating systems designed to host switch/router feature software.  The problem with a pure hardware model is illustrated by this multiplicity.  It’s like having a standard PC hardware framework with a half-dozen different operating systems, each with their own applications.  We know from industry history that doesn’t work out.

Fortunately, we have an emerging platform-compatibility framework based on the P4 language.  If you can host a P4 interpreter or “virtual machine”, you can run forwarding programs written in P4.  I think that P4-based stuff, including the Linux Foundation’s DANOS (evolved from AT&T’s dNOS) and the ONF’s Stratum, is likely to lead in the white-box forwarding device space.
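
To show what “forwarding programs” mean in practice, here is a small Python sketch of the match-action model that P4 expresses.  This is not P4 code; a real P4 program defines packet parsers and match-action tables that get compiled to the device’s forwarding pipeline, and the prefixes and port names below are invented for illustration.

import ipaddress

# Illustrative table, ordered most-specific first: each entry pairs a match
# prefix with an action and an action parameter.
FORWARDING_TABLE = [
    ("10.1.0.0/16", ("forward", "port2")),
    ("0.0.0.0/0",   ("drop", None)),
]

def apply_pipeline(dst_ip):
    """Return the (action, parameter) chosen for a destination IP address."""
    addr = ipaddress.ip_address(dst_ip)
    for prefix, (action, param) in FORWARDING_TABLE:
        if addr in ipaddress.ip_network(prefix):
            return (action, param)
    return ("drop", None)

print(apply_pipeline("10.1.2.3"))    # ('forward', 'port2')
print(apply_pipeline("192.0.2.1"))   # ('drop', None)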

In all the furor of white-boxing network devices, we kind of forgot the server side.  NFV proposed the substitution of “commercial off-the-shelf servers” (COTS) hosting virtual functions for proprietary network devices.  At the time, most of the COTS focus was on Linux platform-compatible systems, the stuff already widely used in enterprise and cloud provider data centers.  These systems are “commercial” and even “open”, but are they really conforming to the spirit of white-box?  No.

More than a quarter of all servers sold today are commodity white-box implementations of a de facto Linux platform-compatible architecture.  Many are built around (or from) a reference architecture that was defined by the Open Compute Project, but my definition of platform compatibility doesn’t require that hardware be identical, only that it be provided with a platform software kit that has a standard set of APIs for the mission it’s targeting.  For white-box switch/routers, I think that platform is P4.  For servers, it’s Linux…and perhaps more.

One truth about today’s white-box stuff is that it would be rare to see no differences whatsoever in hardware, because of the wide range of missions the devices are expected to support.  We know, for example, that some processor chips (Intel’s in particular) are valuable where third-party software compatibility is critical.  There are better options in price/performance terms when the goal is simply pushing bits around, and of course things like GPUs are critical for high-performance video operations and even some encryption.  All of these can be accommodated providing the software platform makes the hardware differences invisible to the applications/functions being run.

Platform compatibility is taking over, in my view, because it addresses these issues and the real goals of openness—no lock-in, compatibility of application/feature software, and compatibility of management tools and practices.  We are likely to find things like content-addressable memories and GPUs as optional features on white boxes, and where these are provided there should be a standard interface to the hardware through an open API.  It would be helpful to have a standard API set to represent all of the platform and hardware features, with an emulator if there were no hardware support for a given capability.
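
As a sketch of what that “standard API with an emulator” might look like, the Python below shows one open API for a hardware-assisted feature, with a software fallback used on boxes that lack the accelerator.  The class name, the method, and the XOR stand-in for real cryptography are all hypothetical, purely to illustrate the pattern.

class CryptoAPI:
    """One open API for an acceleration feature, with or without the hardware."""
    def __init__(self, hw_accelerator=None):
        self._hw = hw_accelerator          # None on boxes without the accelerator

    def encrypt(self, data, key):
        if self._hw is not None:
            return self._hw.encrypt(data, key)   # hardware path
        # Emulation path; a trivial XOR stands in for real cryptography here.
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Application code looks the same on either class of white box.
api = CryptoAPI()
print(api.encrypt(b"hello", b"k"))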

White-box servers and OCP got their start with Facebook, who decided it would be a lot cheaper for them to build their own servers than to buy a commercial product.  Most users won’t find that to be true, obviously, and so we have already seen white-box companies offering both bare metal (switch/routers and servers) and platform-equipped devices.  These are the kinds of boxes that most white-box prospects would look for; they’re cheaper than big-name stuff and support open functional software of various types.

It seems inevitable to me that there’s going to be a kind of two-level shakeout in the white box space.  The first level shakes out a small number (probably two or three at most in both the switch/router and server categories) of players based on architecture.  Did they pick the right hardware features and configurations and the right platform software to attract the most useful applications?  If “No!” then they die off.  If “Yes!” then they will compete on price and service with others who also made the architecture cut.

Will this eventually shake the big-name providers?  Surely, at least where those providers are selling to buyers like network operators, cloud providers, and large enterprises.  Those buyers can afford the technical planning and support staff needed to deploy and maintain products that, let’s face it, aren’t going to have the vendor and community support of a switch/router or server giant.

The support angle is driving a trend toward a “white box ecosystem” rather than an a la carte approach.  One example I like is SYMKLOUD from Kontron, which offers both bare metal devices/servers and open platforms, and that also supplies open data center switches.  With a data center package in white-box form, buyers have fewer integration worries, which promotes the white box concept even to enterprise buyers.

Because network operators are such big buyers, they’re an early focus for the white-box crowd.  I think that initiatives like VMware’s Virtual Cloud Network are aimed at operators because they are the ones most likely to move aggressively to escape proprietary hardware/software strategies.  So far, white box players have been slow to see how networking has to change to adapt to the cloud.  Many are simply talking about supporting SDN and NFV, which any open model would likely do automatically.

Which white-box trend, switch/routers or servers, will have the biggest impact?  In the long run it may well be that servers will, because virtualization and hosted functions make no sense if you assume you’re transitioning from a proprietary appliance to a proprietary server/cloud platform.  If openness is good, it’s good everywhere.

Carrier cloud is the promised land for anyone chasing the big operators, and it’s surely going to be worth the effort.  In the near term, it’s also going to spread a lot of confusion and insecurity on the part of both sellers and buyers.  There are just too many drivers to carrier cloud, operating in different ways in different areas, and at different adoption rates.  SDN, NFV, cloud computing, video and ad optimization, IoT, 5G…the list goes on.  It would be easy for a white-box vendor to position to the wrong driver, and just as easy for their big-name competitors to do the same.  Right now, none of the white, black, gray, or branded boxes seem to be chasing the optimum story.  There’s still time for one camp to take the lead.

Reading The Tea Leaves of Cisco’s “Think Differently About Networks” Comments

Cisco has been a powerhouse in networking for decades, so when CEO Robbins says something at a big Cisco event, you have to pay attention.  Most of the Cisco Live event seemed to focus on partnerships and opening of APIs for its Digital Network Architecture and intent-based networking initiatives, but there was a parallel move that I think might be more significant in the long term.  It’s embodied in a quote by the Cisco CEO.

What Robbins said was that “We have to think differently about how we build networks”.  Interestingly, the focus of Cisco Live seemed to be cloud networking, containers, and Kubernetes orchestration for hybrid clouds.  It’s hard to escape the conclusion that somehow cloud and containers frame the “different” way Cisco (and presumably the industry) need to think.  How, exactly?

There are two ways that Robbins’ statement could be taken.  One is that the cloud and containers will play a role in building the network itself.  The second is that the cloud and containers will somehow redefine network requirements in a very significant way, enough to change network services and practices.  You could find truth in both statements, and Cisco may be admitting to both truths, but each would have a different impact on the industry.

We already have various initiatives, formal standards and open-source projects included, to use hosted features in networks instead of custom devices.  NFV, which I’m sure all my readers know is something I don’t think will fully realize its potential, has been focused mostly on virtual CPE missions that are usually dependent on premises hosting of functions in universal, open devices.  In that mission, the difference between containers and VMs would be smaller, and frankly a totally different approach based on an embedded operating system would be even smarter.

Another mission for hosting of functions is linked with things like 5G, IMS/EPC, CDN, and so forth.  All these are multi-tenant service elements and therefore would be easier to deploy and manage if they were treated simply as cloud applications, without any NFV veneer.  In these missions, containers could indeed be useful if the functions weren’t too performance-centric.  My personal view is that hosting in virtual machines creates too high an overhead to be optimum; container hosting would be more efficient and thus offer better price points and profits.

Despite the fact that container hosting in the cloud could well play a role in building future networks, I’m not convinced that’s what Robbins was talking about.  First off, Cisco hardly benefits from a transition to hosting open features on an open platform.  Second, I don’t think that the change is profound enough to say that you’re going to “think differently” about network-building.  Fundamental shifts in how you build something usually come from the missions you’re supporting, which is the second of our possible meanings.

My conclusion is that Robbins is talking about the revolution in service requirements created by the cloud age.  There is no question that virtualization in any form disconnects resources from features by introducing an intermediary abstraction.  That’s what it’s supposed to do.  There shouldn’t have been any question that the virtualization process, undertaken to generate agility and flexibility, would have an impact on the connectivity requirements.  A component instance in the cloud, if it’s redeployed, is somewhere other than where it started, but its address was assigned to its original location.  We have to make addressing dynamic, or the complexity of keeping network connections to agile components working will kill operations efficiency.

This thought seems to fit the partnership with Google on using Kubernetes orchestration in a hybrid cloud (consisting of private clouds and Google’s cloud, of course).  Hybrid clouds (and multi-cloud) are places where routine redeployment and scaling can mess up connectivity because of the intrinsic conflict in IP addresses between what something is and where it is.
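
A tiny Python sketch can illustrate the indirection involved.  This isn’t Kubernetes code; it simply mimics what a Kubernetes Service or any service registry provides, separating the logical name of a component (“what it is”) from its current address (“where it is”).  The service name and addresses are made up.

# Logical name -> current location; the registry is the only thing that changes.
registry = {"billing-frontend": "10.0.1.17:8080"}

def connect(service_name):
    """Resolve a logical service name to wherever the component currently runs."""
    return registry[service_name]

print(connect("billing-frontend"))      # 10.0.1.17:8080

# After a redeployment or scaling event, only the registry entry moves;
# callers keep using the same logical name.
registry["billing-frontend"] = "192.168.3.40:8080"
print(connect("billing-frontend"))      # 192.168.3.40:8080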

I think that Cisco’s cloud-and-containers strategy is related to a number of market truths.  One is that in the public cloud service space, Microsoft is the player to beat when it comes to direct cloud service sales to enterprises.  Amazon does really well in cloud services to startups and also as a provider of cloud resources to network operators, but because Microsoft has a foot in many data centers, it has been able to grow its base among the enterprises.  Google, now decisively behind in the cloud overall, needs to catch up, and orchestration tools like Kubernetes can facilitate adoption of Google’s cloud.

Another market truth is that this can benefit Cisco too.  Rival VMware made its own announcements, advancing its Virtual Cloud Network story.  Cisco can use the Google deal to leverage hybrid cloud symbiosis with Google’s cloud and pull through its own private cloud solutions to enterprises.  Google has been improving its “package platform” solutions for cloud computing steadily, adding new software and options.  A hybrid cloud network model might counter VMware and make Cisco’s DNA the de facto leader in a more virtualization-friendly networking model.

That, of course, raises the question of what such a model would look like, and then the question of how it might be implemented.  I’ve long been a supporter of the idea that the service network of the future would be a service-layer overlay, built as a virtual layer on top of current L2/L3 technology.  This model was first suggested by a vendor (as far as I can recall) about a decade ago, when Huawei promoted its idea of a “Next-Generation Service Overlay Network” or NGSON at a meeting of the IPsphere Forum.  That model was subsequently taken to the IEEE for standardization work, and never really got much traction.  Today, overlay networks are based on things like SDN or SD-WAN.

The mechanics of a service-overlay network are well understood.  You have an underlay network service that’s connective, meaning it touches all the endpoint sites where you need to provide service.  At each site, there’s a function (a hosted one or a device) that processes an additional header.  That function is a destination on the underlay network, and it creates a hub from which overlay connectivity is then extended.
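
For readers who want those mechanics spelled out, here is a minimal Python sketch of the encapsulation step.  The header fields and addresses are invented; the point is only that the underlay routes on the outer header while the overlay edge functions route on the inner one.

def encapsulate(payload, overlay_dst, underlay_dst):
    # The outer header is what the L2/L3 underlay routes on; the inner header
    # is what the service-layer overlay routes on.
    return {
        "underlay_header": {"dst": underlay_dst},
        "overlay_header": {"dst": overlay_dst},
        "payload": payload,
    }

def decapsulate(frame):
    """The receiving edge strips the outer header and delivers by overlay address."""
    return frame["overlay_header"]["dst"], frame["payload"]

frame = encapsulate(b"app data", overlay_dst="site-B:erp", underlay_dst="203.0.113.9")
print(decapsulate(frame))    # ('site-B:erp', b'app data')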

The question that Cisco’s remarks raise is whether, given the presumptive role of the cloud and containers in driving different network thinking, there’s more to it than this service-overlay-101 trivia.  While there are benefits to simple overlay networks, the specific mission of supporting the kind of agility that virtualization and cloud computing bring needs more support than that.  Basic-think, meaning connectivity-driven service overlays, doesn’t move the ball enough.

I’ve suggested that “logical networking” is what’s needed, and I’ve defined in other blogs (and in a CIMI Corporation report HERE) what that would mean.  Could it be that Cisco intends to augment DNA to include the essential logical-network features?  Perhaps, but it’s not Cisco’s style to present a revolutionary solution to something; its style is to present a revolutionary set of claims in an extravagant PR rollout and then ease into the actual delivery.  Cisco likes to be a “fast follower”.

It might be time for fast-follower moves, though.  As I’ve already noted, VMware’s Virtual Cloud Network already has the very mission that Robbins talked about, and certainly represents a different way of thinking about networking.  Cisco has long been wary of VMware, and it can’t afford to let a new battlefront open in the networking space with VMware leading the charge.  Since VMware left a lot on the table in its announcement (it went only somewhat beyond the basic overlay connectivity features we’ve had all along), Cisco could do a lot here by adding only a little.

So could a lot of other players.  HPE, who competes with both Cisco and Dell in the server space, has no overlay model it can call its own, and doesn’t make virtualization-friendly networking a fixture of its positioning.  The MEF, who proposes to define a service overlay layer (SD-WAN) in its MEF 3.0 model, could take the steps necessary to make at least a credible start to that initiative.  Juniper could do something with its Contrail SDN solution, and any of several SD-WAN vendors could lock down the critical elements of the virtualization-friendly overlay service before Cisco gets around to the details (then, perhaps, get themselves bought, even by Cisco).

Positioning for a virtualization-and-cloud-centric future overlay service network has been tentative at best, up to now, but there does seem to be momentum developing.  A big player like Cisco, prepared to do the heavy lifting in the market, could definitely move the ball.  Based on what seems more a partnership focus, I don’t think Cisco will go all-out at this, unless they decide that the market is on the edge of becoming critical and they need to overhang it a bit.  If Cisco had to develop the market in the first place, they’d be the one making it critical, so doing nothing would be easier.  That puts things back in the court of Cisco competitors, and in particular VMware.

The Virtual Cloud Network story is absolutely critical for VMware, and Robbins’ comment wasn’t much more than a nice PR spin for Cisco.  VMware cannot afford to have it turn out to be something Cisco really takes seriously, unless VMware goes further, faster.  Of course, the faster VMware runs, the faster Cisco runs as well.  We’ve created market races, and markets, with less than that.  Maybe we’ll do that with logical networking too.