Unraveling the Cisco SD-WAN-in-Router Move

Cisco always signals important market moves, sometimes with tangible changes and sometimes just by erecting an attractive billboard aimed at the media.  The one they announced last week, the integration of the Viptela SD-WAN software with Cisco routers, is surely in the first category.  Still, does this signal a problem for vendors in the space, an opportunity, a shift in the market dynamic, a change in SD-WAN technology, or perhaps all of the above?

The SD-WAN market is complicated enough as it is.  I’ve noted in prior blogs that there are really two visions of SD-WAN.  The first is based on the simple extension of MPLS VPNs to sites where MPLS isn’t economically viable.  The second is based on an “overlay” of existing networks with an SD-WAN/SDN virtualization model that defines connectivity much like “network as a service” or NaaS.  There are also two principal sales models, one selling SD-WAN directly to the enterprise and the other offering SD-WAN services through network operators or managed service providers (MSPs).  It’s into this multi-dimensional mix that we thrust Cisco’s own strategy.

Cisco has two SD-WAN products, both acquired through M&A.  One is Meraki, a basic small-site offering that’s probably best for SMBs, and the other is Viptela, which Cisco has appeared to target primarily at the enterprise rather than at service providers.  It’s Viptela that’s the focus of the recent announcement.

Viptela is among the SD-WAN market-share leaders, in part because they were doing well when acquired and in part because of Cisco’s influence thereafter.  They are fairly conventional in terms of features, offering what I’ve called the “basic” extension model of SD-WAN.  I’ve said from the first that I believed SD-WAN’s future was “logical networking” as the basis for network-as-a-service (NaaS), and I don’t think Viptela has those features yet.  Of course, we’re not in the future yet, and so the immediate question is what this will do today.  The next question is whether Cisco will push a broader logical-networking position later on, and even the answer to the immediate question may depend on how competitors see that second question being answered.

The Viptela enterprise-centric positioning is a bit off-center with respect to current market trends.  MSPs are the primary channel for SD-WAN to business, and network operators the fastest-growing channel.  Even the notion of adding SD-WAN to a Cisco router as a deployment option is strange if you assume the classic mission of SD-WAN (extending the company VPN to small, thinly connected sites) is the driver.  Those sites wouldn’t likely have a router in place, and it would be way cheaper to add a minimalist SD-WAN appliance or a software instance.

Cisco’s enterprise focus might have left many service-provider-centric SD-WAN competitors feeling pretty safe, and certainly the enterprise targeting of the current announcement could be seen as extending that focus.  Given all that, this latest SD-WAN-in-the-router play might not seem to matter much, but it does.

The first reason is simple.  If Cisco is going to push SD-WAN aggressively to enterprises, it makes operator or MSP sales of SD-WAN more competitive and difficult.  Cisco has enormous influence in the big enterprise accounts; it has the highest strategic influence of any company in our most recent surveys.  The influence means Cisco people can to a degree control the pace of SD-WAN consideration and establish the mindset of the buyers.  It used to be called “wiring the RFP.”  Making SD-WAN a part of a Cisco router moves it into the mainstream of Cisco sales efforts, and any competitor who isn’t afraid of Cisco sales is delusional.

The second reason is more complicated.  Small sites typical of SD-WAN deployment are not likely targets for routers at all, as we’ve noted.  Cisco still has the old Viptela options for those sites, but this new announcement has the potential to get SD-WAN into all sites with routers.  It’s the sites that have MPLS VPN connectivity that are now easier to serve with SD-WAN, not to enable users to switch to Internet transport (though some sites could) but to provide SD-WAN agility everywhere.  That moves us from what we called, in our SD-WAN tutorial, the “extend” to the “overlay” model of SD-WAN, a model where SD-WAN has complete control over enterprise connectivity.

The primary drivers of the overlay model of SD-WAN have been, up to now, desire for complete management visibility and exploitation of “logical-networking” identity-based connection policy management.  The overlay model requires an SD-WAN node in every location, and if Cisco can easily put one there, it means that Cisco might then migrate to the logical networking model to exploit their position in the network.  Just that possibility would then push SD-WAN vendors faster toward logical-networking features.
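
To make “logical networking” a bit more concrete, here’s a minimal sketch of identity-based connection policy, the feature set I keep referring to.  This is my own illustration, not Cisco’s or any vendor’s implementation; the identities, services, and policy table are all hypothetical.

```python
# Hypothetical identity-based connection policy: connectivity follows who or
# what is asking, not IP addresses.  Default-deny for anything not listed.
POLICIES = {
    # (source identity, destination identity): allowed?
    ("branch-pos-terminal", "payments-service"): True,
    ("guest-wifi-user", "payments-service"): False,
    ("field-engineer", "telemetry-service"): True,
}

def connection_allowed(source_id, dest_id):
    """Permit a connection only if policy explicitly allows it."""
    return POLICIES.get((source_id, dest_id), False)

# An SD-WAN node at every site would consult something like this before
# forwarding traffic, which is why the overlay model needs a node everywhere.
print(connection_allowed("guest-wifi-user", "payments-service"))       # False
print(connection_allowed("branch-pos-terminal", "payments-service"))   # True
```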

They’d need those features anyway.  You don’t want to take an undifferentiated sales position against the Cisco sales machine.  It will quickly become more difficult for SD-WAN competitors to avoid introducing new and valuable features, because the alternative is loss of market relevance.  MSPs with an interest in enterprise services will be the first, and most, affected, because Cisco hits them directly from the enterprise side.  In any case, Cisco could confound everyone by taking the logical-networking step itself, preemptively.

Is there anything that would encourage Cisco to shift toward a logical-networking positioning for SD-WAN?  Sure is, and it’s a good one.  Cisco knows security is perhaps the hottest area in networking when it comes to new product decisions, and nothing puts an incumbent at risk like new-think on the part of buyers.  SD-WAN, universally deployed and equipped with strong logical-networking powers, could revolutionize network security.  That’s true whether it’s integrated with an edge router or deployed as a software instance or appliance, and whether it’s done by Cisco or by somebody else.  Hence, Cisco has to think seriously about doing it, or it admits another vendor onto every site in its accounts.

Thinking seriously isn’t the same thing as committing, and keeping other vendors out of Cisco accounts could also be achieved simply by getting the Cisco SD-WAN in place and letting the feature wars go on without Cisco’s participation.  That might seem short-sighted on the surface, but remember that very few SD-WAN vendors (or other vendors) offer logical networking.  Cisco was criticized for taking a back seat on the OpenFlow SDN model, but its strategy of policy-based software-defined networking was more than enough to stave off any SDN threat.  The same could happen again here.

That’s particularly true given that the SD-WAN space is hardly hopping with stories about logical networking or buzzing with RFPs that demand it.  Most buyers and vendors are still quite happy to be supporting that old MPLS-VPN-extension model, and it will probably take some major market force to drive things in a more logical-network-oriented direction.  If that force doesn’t act, then Cisco is perfectly safe with a conservative enterprise-focused position.  The “major market force” can’t come from just anywhere, either.  Only two competing SD-WAN vendors have the overall market strength to push things against Cisco’s approach on their buyer influence alone. 

One company is VMware, which kind-of-promised a unified logical-networking story with its Virtual Cloud Network but isn’t quite there at this point.  Its VeloCloud SD-WAN is used by Windstream, which just claimed to have the largest SD-WAN service base, and VMware has just announced it’s picking up Dell EMC’s service assurance suite in a move aimed (so they say in the release) at network operator 5G evolution.  I’d rate them the largest of the vendor threats.

The other is Nokia/Nuage, who has an exceptionally good relationship with the network operators.  They also have what I think is the top SDN product, and their SDN and SD-WAN approaches are integrated.  The problem Nokia has is largely one of articulation, but in a sales war with Cisco that’s enough of a problem to give pause.

Another possible “major market force” would be the adoption of a logical-networking story by one of the major network operators or MSPs.  Whoever supplied the technology, the source of the service would then put the same pressure on Cisco to be more aggressive with logical networking features.  It’s impossible to say at this point whether this is going to happen soon, or at all.

Another impact this might have is on management integration.  Cisco intends to integrate the Viptela management into its overall management platform, and that could offer users a significant benefit.  In turn, it could make other SD-WAN vendors focus more on management and management integration.  I expect this to be a longer-term impact, though, not only because management integration isn’t there yet for Cisco’s SD-WAN strategy but because Cisco’s enterprise targeting yields different management requirements than exist for vendors who target network operators and MSPs.

SD-WAN is obviously the populist solution to VPNs.  There’s an enormous market benefit to offering SD-WAN as a managed service to SMBs, and similarly to smaller enterprise sites.  That pressure operates on both network operators and MSPs.  There’s also a significant benefit to having enterprises extend VPNs via SD-WAN, either through managed services or direct SD-WAN product purchases.  I think SD-WAN is going to end up as the de facto VPN strategy fairly quickly, and Cisco has to stay level on the SD-WAN playing field or the trend threatens its account control.

Thus, the law of unintended consequences.  Cisco has, on the surface, reinforced its enterprise focus for SD-WAN, but by linking SD-WAN into an edge router, it has also made it easier for enterprises to adopt an overlay SD-WAN that would give SD-WANs total connectivity control.  Rivals like VMware and Nokia/Nuage, who have a more service-provider focus for their sales, could see Cisco’s move as a broad threat, and turn to logical networking features to help differentiate their stuff and support their customers, who are SD-WAN service providers.  Anyone who blinks in the direction of a feature war in SD-WAN promotes logical networking, as the obvious end-game for features.

Could a “typical” SD-WAN vendor push logical networking, and thus push Cisco?  It would be far more difficult to do that now, given that SD-WAN is hardly a new concept and the media hates to write recapitulation or repositioning stories about something already covered.  Difficult, but not impossible, and if Cisco really locks up the larger enterprises with its SD-WAN-in-a-router model, many of the current SD-WAN players might have to take an aggressive swing before it’s too late.

Modern Network Software and the API

As a former software engineer, architect, and head of some high-tech programming groups, I love APIs.  I also have to admit that APIs are getting to be a lot more complicated than they used to be, and things like edge computing, event processing, and microservices are promising to make them even more complex over time.  For those who think they’re just starting to get their heads around the API concept, this isn’t exactly good news.  It’s also bad that APIs are becoming a kind of left-handed proof of progress.  I’m great; I have an API!  There’s a lot more to it than that, which is why many of the APIs we’ve seen devised have failed in their mission.

The term “API” stands for “application program interface”, which isn’t really what APIs are turning into.  The original API concept was pulled from operating systems and middleware that exposed certain features to application developers through (you guessed it!) an API.  Today, we use the term to describe any interface between software components.  APIs are the equivalent of hardware interfaces like your Ethernet connectors, except for software.

There are two things an API represents: a function and an application model.  The function is the thing an API does, the relationship between its inputs and outputs.  The application model is the way the API provides for component integration.  Two components are running merrily along, coupled by an API.  The way the API works will constrain how the components can work, or they’d be unable to use the API to communicate.

Unless the function of an API has utility, the API is worthless.  A decade ago, we saw all manner of initiatives by network operators to “expose their features” via API.  Most of those features related to voice calling, which operators were then pricing based on unlimited usage.  How valuable can something be if you are giving it away, and how successful will you be in attracting third-party developers to exploit your “features”?  Even today, operators talk about APIs as though they were somehow the gateway to new revenues and new services.  They typically expect third-party developers to take the risks, and that’s a bit naïve when you consider that the operators themselves would be the best people to be working out new service features.

Most APIs today are aimed not so much at third-party developers as at integration of applications into more complex ecosystems.  You can see examples of these APIs in the work done by the ETSI NFV ISG, where APIs provide the connection between the functional components of the architecture, like VNF Managers (VNFMs), Management and Orchestration (MANO), or Virtual Infrastructure Manager (VIM).  These same ETSI definitions illustrate the other problem with APIs, which in the current age may be the most serious.  APIs codify someone’s visualization of an application model, and it may not be the best model, or even a valid one.

If you have an API between MANO and the VIM, for example, then that API defines the relationship between the two.  If one sends something and expects a reply through the same API as a “return”, that’s a call relationship.  That kind of thing is common in legacy applications, but it doesn’t fit nearly as well with more modern applications designed to handle events.

Most network applications, including service lifecycle management, are logically event-driven, meaning that they are designed to process a bunch of asynchronous conditions that pop up here and there, both from planned sources like moves, adds, and changes to services and from unplanned sources like faults.  As these events come in, they have to be handled in order, and they will kick off processes in response that may themselves generate events.  The in-order part is a matter of queues and “serialization”, but the problem of contextualizing events to what the management system is trying to do overall has to be solved by recognizing state.  A state is a particular context of operation.  Normally there’s an “Active” state that’s the goal of a system, and there are various states representing the progression of steps toward that goal, or the restoration of that goal if something breaks.  The combination of events and states has long been recognized as the key to handling asynchronous stimulus in any application, which essentially means it’s key everywhere in networking.

Modern event-driven systems are what are known in software terms as “finite state machines”, meaning systems that respond to events by doing things and changing their context of event interpretation as they do so.  A simple example is that a service starts in the “orderable” state and, responding to the “order” event, transitions to the “active” state.  An event has an association with something, like a connection or service or device.  There’s a data model for each thing that’s being event-driven, and that model not only records the current state, it defines, for each possible state and event, what’s to be done at the intersection of the two.
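
To make the state/event idea concrete, here’s a minimal sketch of a state/event table driving a service data model.  It’s my own illustration, not drawn from any particular management system; the states, events, and handlers are hypothetical.

```python
# Minimal state/event table for a service lifecycle (illustrative only).
def activate(svc): print(f"activating {svc['name']}")
def restore(svc):  print(f"restoring {svc['name']}")
def ignore(svc):   pass

# (current state, event) -> (process to run, next state)
STATE_EVENT_TABLE = {
    ("orderable", "order"):  (activate, "activating"),
    ("activating", "ready"): (ignore,   "active"),
    ("active", "fault"):     (restore,  "restoring"),
    ("restoring", "ready"):  (ignore,   "active"),
}

def handle_event(service, event):
    """Apply one event to the service's data model, in the context of its state."""
    key = (service["state"], event)
    if key not in STATE_EVENT_TABLE:
        return  # the event has no meaning in the current context
    process, next_state = STATE_EVENT_TABLE[key]
    process(service)
    service["state"] = next_state

svc = {"name": "vpn-site-12", "state": "orderable"}
handle_event(svc, "order")   # prints "activating vpn-site-12", state -> activating
handle_event(svc, "ready")   # state -> active
```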

If you want to expose the features of an event-driven system, how do you do it?  Answer: by generating an event, which means that you’d probably need only a single API, called perhaps “Inject-Event”.  Why all the API discussion, then?  Most of the APIs people talk about are really not related directly to network software; they’re related to management systems.  You don’t build networks on management systems, you manage networks with them…via humans.
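
If that single “Inject-Event” API sounds abstract, here’s a small sketch of what it could look like.  “Inject-Event” is the hypothetical name used above, not a real product API; the queue-based serialization mirrors the in-order handling described earlier.

```python
# Sketch of a single event-injection entry point (illustrative only).
import queue

event_queue = queue.Queue()

def inject_event(target_id, event, detail=None):
    """The one externally exposed API: everything becomes an event."""
    event_queue.put((target_id, event, detail or {}))

def run_events(handle_event):
    """Drain the queue in arrival order; handle_event applies the
    state/event table for the object named by target_id."""
    while not event_queue.empty():
        target_id, event, detail = event_queue.get()
        handle_event(target_id, event, detail)

# A management-system "order" front-end would ultimately just call:
inject_event("service-1234", "order", {"sites": 5})
run_events(lambda target, event, detail: print(target, event, detail))
```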

Since most management systems are designed to (at some point) present something to a human (a customer service rep, a network operations engineer), these systems have been “humanized”, meaning that a specific window has been created into the network’s operation.  At some point, at least some of what can be seen and done through that window may have to be turned into events (if the network is itself event-driven), but what happens underneath isn’t visible through the window.  An order in an event-driven service management system has to generate an event.  If we have an order API, then somewhere down the line, it has to generate an event if it front-ends a modern system.  But because the system can’t be seen through the API “window” we don’t know if it does.

This is why the API story can be dangerous.  We need to think of network software, even network management software, in an event-driven way.  You can’t do lifecycle automation, for example, without having the software be aware of the state of the elements of the network as it attempts to drive the network into a given behavior.  At the least, that goal behavior is itself a “state”, and so is every possible step of progress toward, or away, from it.  But API discussions don’t impose state/event thinking below, and in fact usually hide whatever thinking is going on.

Data-modeled, event-driven network software is the answer to the challenges we face today in service lifecycle automation, resilience in network elements created through software instances of former appliance-driven processes, and elasticity in capacity to do stuff.  If we have that, then we have a true event-driven system, and we also have the only API we need at the network level, the “Inject-Event” API.

To my mind, the easiest and best way to do APIs is to start with the functional architecture of what you want to do—the division of features and capabilities.  You then factor in the application’s “model”, the way that components relate to each other and to outside users, and what you get is a structural picture of the application at the top level.  This is where APIs should be considered first, and hardest, but it’s a picture we rarely hear about, and judge even less often.

When someone talks about APIs alone, they’re not telling you (or me) enough to be able to judge the value of what’s being exposed.  Is it 1980s technology, monolithic and cumbersome, or is it 2020 technology with events, microservices, and so forth?  There is a huge difference, and those who don’t accept that now may find out the truth the hard way when they try to deploy their wares broadly.

The Role of the Portal in Managed Services

I want to continue my comments on managed services with some insights from the service buyers.  I’ve been watching the buyer side of the space for about five years now, in part to see what might be required of NFV in support of managed service deployments.  Since then, I’ve kept in touch with some of the early and later adopters of managed services, and when I tallied them for this piece I was surprised by the consistency they showed.

A “portal” is the gateway for information to pass between a managed service provider (MSP) and the service buyer.  It’s the constant, visible, and (hopefully) highly visual presentation of the MSP’s presence, and of their commitment to the buyer.  No matter how complicated the management systems used by an MSP might be, the portal is the way the user wants to see it all.

That’s the main point I got from users; managed service providers are seen through the eyes of their portals.  Whatever brought users to the MSP decision, they quickly identified their service with the management portal through which they obtained status.  In fact, by the end of the second year of managed service use, almost all users lost awareness of what went on behind those portals.  Clearly, from the user perspective, presentation is nearly everything.

I don’t want to leave the impression that users didn’t care about what a managed service did; that isn’t true.  However, users framed their view of the service and its provider from what the portal showed.  If the portal was very good, meaning it conveyed a lot of useful information, they believed the service was equally good.  If the portal was less than ideal, then the service was likewise.  Portal problems were among the top two problems cited by every user who later abandoned managed services or changed providers.  That there were problems at all was a bit surprising to me, because what users wanted from MSP portals was pretty straightforward.

First and foremost, users wanted their portals to be the single point of contact between them and their provider.  They didn’t want to interact with multiple systems, call someone on the phone, or rely on a visit from a representative.  “Show me the portal!” might well have been an appropriate battle-cry for them.  They wanted to be able to get instant service status and drill down to identify problems and the state of resolution.  They wanted to be able to do all service ordering, and check on order status.  They wanted to be able to contact key people involved in their service and its SLA directly through the portal, and they wanted the portal to keep tabs on what they’d done in past contacts.

Incomplete scope of portal capability was one of the top two problems users cited as a reason for leaving an MSP.  This was particularly true for users who gave senior (CxO) level people access to a portal to track the state of a far-flung operation.  Even though these users didn’t typically do problem isolation or moves, adds, and changes in configuration, they ended up expecting someone down the chain of command to be able to do that, perhaps from their exposure to a portal at a higher level.  “Why do you have to call them?” one user reported a CIO as asking, when told that a set of changes would require talking to a technician, “Do it online!”

Users also wanted to be able to customize their portal display and features.  This was something that came up only in about a quarter of pre-deployment requirements but was later named as critical by almost all the managed service users.  The easiest way to think of what users wanted in this space was to visualize the portal as a web front-end to a business application (or applications).  Users wanted to be able to diddle with widgets or other simple graphic tools to set up their portal display and they wanted to be able to do this on a per-user basis.

User customization, meaning framing the portal to the specific users/roles that would be expected to use it, comes out clearly as a requirement as users mature in their exposure to managed services.  Most of the users identified at least three roles of personnel who were expected to interact with portals, and there were distinctly different requirements for what the portal showed and even what the users were allowed to do.  The more roles a user assigned to MSP-related interactions, the more likely the user was to expect the portal to restrict the set of features that could be exercised by some of those roles.  Usually they wanted only one role/person making any kind of configuration change, and the same with problem resolution, but they wanted everyone to be able to see the current state of the network and the state of all problems still considered open, or even recently closed.

It’s important to see how MSP portals relate to things like management consoles, which may seem to be the same thing.  The major difference between a user’s view of a managed service portal and their view of network or service management overall is the relentless focus on the SLA.  Here, SLA means “whatever the MSP has committed to managing”.  An SLA is a contract, and that’s what buyers understand how to enforce.  It starts with monitoring to determine whether it’s being violated, of course, and so that’s where the portal has to start.  Almost 95% of managed service users say they want the home screen on their management portal to show them the state of their SLA.

Things like problem diagnosis and determination, in managed services and portal terms, really mean deciding whether the problem is the MSP’s or their own.  Users are not interested in pursuing further problem isolation at the portal level; the SLA defines the boundaries of the contract and the functional limits of the portal.  Unless the buyer of the service has contracted for desktop-to-the-world managed service support, the MSP portal is the primary tool in making what’s the biggest decision of all—is it “us” or “them?”  The user will go to other tools for further refinement of the stuff in the former category.

Linking portals to user management systems is a complication, of course.  If a given managed service user is using managed services as little more than a piece of a complex network, then it’s likely that user will want to be able to link the MSP portal into their own management system.  That suggests a portal structure that starts with policy filtering of what can be seen and done, and then a layer of presentation that might end up in a display or app, or might end up in an API used to link information to the service user’s primary network operations tools.
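
As an illustration of that structure, here’s a minimal sketch of the policy-filtering layer sitting under a portal’s presentation layer or outbound API.  The roles, fields, and actions are hypothetical; no MSP’s actual portal is being described.

```python
# Illustrative role-based policy filter for an MSP portal front-end.
ROLE_POLICY = {
    # role: (fields the role may see, actions the role may take)
    "executive":  ({"sla_status", "open_problems"},           set()),
    "operations": ({"sla_status", "open_problems", "orders"}, {"open_ticket"}),
    "admin":      ({"sla_status", "open_problems", "orders"}, {"open_ticket", "change_config"}),
}

def filter_view(role, service_data):
    """Return only the fields this role is allowed to see."""
    visible, _ = ROLE_POLICY[role]
    return {k: v for k, v in service_data.items() if k in visible}

def action_allowed(role, action):
    """Check whether this role may take a given action."""
    _, actions = ROLE_POLICY[role]
    return action in actions

service_data = {"sla_status": "met", "open_problems": 1, "orders": 3}
print(filter_view("executive", service_data))        # SLA state and problems only
print(action_allowed("executive", "change_config"))  # False
```

The same filtered output could feed a widget-based display for one role and an API link into the buyer’s own management tools for another.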

Managed service users’ portal obsession may be a good thing for MSPs.  First, it indicates that the service user is focused where an MSP is focused, which is providing an SLA.  Second, portals are logical front-ends to any number of management systems, converging them on a common user view.

One of the challenges that MSPs face is the lack of unified management practices for the stuff their SLA covers.  Over-managers like OpenView or Tivoli have been a popular strategy not only for framing the administrative functions of network management (which include SLA management, of course) but also for unifying a somewhat disconnected set of element management tools.  Even some of the more modern takes on this software level (like Pandora) are probably overkill for MSPs, but there are plenty of open-source tools that could be collected under a portal-front-end umbrella.

I want to close with a recap of an earlier point.  Portals are not a substitute for effective management tools and practices, but they can elevate the user’s perception of a managed service, unify the users’ view with MSP management, and even frame MSP thinking around the right points—including the SLA.  There are major benefits in getting the portal story right.

Balancing the Future and the Quarter

Sometimes you just seem to have the wrong people.  Not bad, not incompetent, but just not right for the times or their role.  In the last couple of weeks, we’ve had IBM release another lackluster quarter, and GE rumored to be selling its GE Digital division, once seen as the point of light in the company’s transformation.  I’ve tracked IBM for ages, and followed GE Digital’s once-promising foray into IoT, and the common thread is bad execution.  It’s not a problem limited to these companies, and it may not be their “fault” in a sense.  On one hand, we live in tumultuous times, difficult times.  On the other, leadership is charged with working well within the current environment.

I’ve surveyed companies for three decades, and through most of that period IBM has been a standout at something I’ve called strategic influence.  Selling technology isn’t like selling apples; it’s something users have to plan for just to get to a point where they have a rational thing to do with the technology, and only then can they undertake to buy it.  Vendors who have the credibility and influence to guide the planning can make sure the user gets to a buying decision, and then of course steer management thinking toward something the vendor can sell.  Some call it “account control”, and IBM had it…for a while.

Starting in early 2011, my surveys showed that IBM’s strategic influence, which had risen almost steadily for 20 years, suddenly started to decline.  Since then, while its decline has stabilized for a year or so here and there, it’s never again gained influence.  That’s almost unique in my survey, and I think it says a lot about IBM’s challenges, but not everything.

IBM of old had control over accounts representing 85% or more of IT purchasing at its peak, and its influence with these accounts is what kept IBM on track.  From about 2000 onward, there was a slow decline not in IBM’s influence but in how many it was influencing.  It wasn’t that companies were rejecting IBM’s advice, but that they were leaving IBM because they’d picked a different computing strategy.  In most cases, it was the growth in Linux and servers versus mainframes that caused the shift.  In any event, IBM was able to influence, by 2005, less than two-thirds of IT purchasing, and by 2010 it was down to less than half.

According to my surveys, this problem was created because IBM did not promote its own server-farm approach to computing as much as it tried to defend against the shift.  They saw the new paradigm as a threat and so tried to lead customers back to the True Way.  When that failed, they lost the customer.  The problem of loss to another paradigm was compounded by the fact that nobody was likely to be a “new mainframe prospect”, and IBM could hardly market the alternative approach while trying to sell the old way into accounts.  Thus, IBM’s marketing became disconnected, then itself went into decline.

What happened in 2011 was just an extension of this.  IBM’s customers took IBM’s advice for their classic applications and went somewhere else for the rest of the stuff, and that residual part grew over time, until there was so much that wasn’t under IBM’s influence that their net influence fell.  The market changed underneath IBM, proving that while you can build seawalls to stave off high tides, eventually plate tectonics will do you in.

IBM today markets primarily to whom it already sells.  It’s not an evangelistic, inspirational firm any longer, and that means it can’t easily make up its losses.  If that doesn’t change, IBM will never regain its old glory and may have a difficult time avoiding becoming a victim of consolidation.  All because inside IBM, they’d gotten stuck in the culture of old, the people of old.

What about GE?  Once this was one of the most admired American companies, and when GE Digital was formed many believed they’d revolutionize software.  This seemed to be coming along when, in November of 2015, I looked at the GE Digital IoT framework, Predix.  It seemed to be the absolute best approach to IoT that I’d seen, the first realistic architecture.  A lot of GE Digital people who read the blog post I referenced really liked it, and one said they should post it on their homepage.  They didn’t, and in fact didn’t do much at all with Predix.

Why?  Because when GE Digital said “IoT” they really meant industrial IoT.  They came up with a grand architecture that they applied to a mission so specialized that probably 99% of businesses could have done nothing with it.  Who could?  GE’s current industrial customers, at most, so we’re again seeing targeting so narrow that no lost account could hope to be replaced, and everyone loses a few.  Not to mention that the barrier to adopting IoT was perhaps the very highest in the target areas GE identified.  That spells a very complex value proposition, meaning a long sales cycle.  Salespeople generally make about six or eight calls on a technology to see if they get any traction, then start calling on new prospects for employment.

Some of the key GE Digital people on Predix were, frankly, insufferable snits.  They understood process automation, and if everyone didn’t need it, that was their loss.  Some also admitted that they had taken Predix into traditional GE markets to “avoid making people nervous”.  Is that worse than making them unemployed?  Most of all, few in the organization had any broad vision of IoT, and the view of management was narrower still.

This isn’t a universal problem in tech, in my experience, but it’s probably a factor in many firms.  Wall Street’s “one quarter, right or wrong” approach to business planning doesn’t encourage risk-taking, and risk avoidance is easy to turn into product planning and positioning inertia.  Cisco’s approach, always called “fast follower” to indicate a wait-and-see attitude about new tech ideas, is actually smart as a business planning approach, but it’s easy to morph that to mean “reluctant follower”.  If you are prepared to follow fast, you’re prepared to go in the right direction at the drop of a hat.  If you’re reluctant, you probably don’t do much of anything till the market signals are clear, by which time it may be too late.

OK, how do you solve the problem?  There are, today, two very distinct constituencies in any company.  One is the financial mavens, the CFOs and their direct reports, and the other is the R&D or product management people.  I think that too many companies try to harmonize these two groups by creating a single collective behavior.  The best approach would be to let each group do their thing instead.

Somebody in any organization has to plan for disruption.  What are the most disruptive factors that the market could experience over the next five years?  How will my company, my products or services, respond?  All tech companies should have a five-year plan for technology evolution, and they should also have taken the necessary developmental steps to ensure they can execute on one of their trend-plans in a year’s time.

Somebody also has to plan for the next quarter.  Just because R&D and product management have the next big thing in mind doesn’t mean you immediately overhang the market with it.  You need to be positioning for transformation, not necessarily driving it, but just how aggressive you are depends on how much you have at stake in the status quo.  Transformation equals revolution, which equals a churning of the industry power structure.  Those on the bottom might expect a shot at a better position, while the one at the top has nowhere to go but down.

This latter factor is the only thing that seems to be operating in tech today.  We see the incumbents papering over old technology that’s long past prime, and we see the innovators talking up things that couldn’t possibly be accepted in the near term.  Instead of separating companies into progressive and regressive, we need to separate teams within the companies, and let each company push the limits of the current market.  IBM used to do that well, and so did others like Cisco.  I’d like to see them get their mojo back.

The Multi-Dimensional Rush to Managed Services

We are definitely hearing more about managed services these days.  I’ve talked about some technology developments linked to managed services (notably SD-WAN) but I’ve been going through user surveys and comments to decode the driving forces.  Since those forces will shape the market far more than technology will, I want to spend some time on them, and also on some of the things I think those forces will promote.

In managed services as in other technology sectors, one size (or one driver) doesn’t fit all.  There are two broad classes of technology buyers—“enterprises” and small-to-medium-sized businesses (SMBs).  Both these groups have a strong and growing interest in managed services, but different things are behind them, or at least different mixes of some common factors.

The issue that influences managed services most is complexity.  Networks are getting more complicated, and I mean a lot more complicated, and the pace of complexity is accelerating.  That means the skills needed to build or expand networks and sustain network operations are expanding too.  Most companies have some difficulty acquiring and retaining qualified network professionals, and with more and more dependence on networks and more and more complexity to deal with, the skill problem just gets worse.

For SMBs, this is the big problem, particularly for mid-sized companies who are unusually rich in professional or other high-unit-value-of-labor people.  The services these companies offer (healthcare, financial planning, engineering, etc.) don’t allow for a lot of interruptions and delays, and the workers themselves can’t be diverted to fix problems with networks.  In surveys, I’ve found that the median number of network professionals in these companies is two.  That doesn’t leave much room for illness, for late-night operations, or for retirement or resignations.  It also means there’s probably no career path for the professionals within the company, so you can see the problems these firms face.

Almost two-thirds of SMBs have told me that they would prefer a managed service deal for WAN services to one where the user was responsible for the network, and almost as many said they would really like to have managed services “right from the desktop”, meaning on-premises too.  The problem is less willingness to pay (say the SMBs) than a lack of familiarity with offerings, or even knowledge of their availability.

Providers say that the problem with the SMB space is the difficulty in selling them, combined with the low return if you succeed.  One managed service provider salesperson told me “It takes me twice as long to sell to a mid-sized business as it does to a big enterprise, and the commission I end up getting is ten times as large with the enterprise, or even more.”  Even network operators, who already have a relationship with SMBs, say that it’s difficult and usually unrewarding to sell managed services to them.

I think the real problem here is the classic problem of marketing versus sales, and what I’ve always called “trajectory management”.  When you have a product or service that can’t really be sold efficiently by cold-calling, you need leads, which means you need a chain of events that ends with a prospect identifying themselves as such.  Often that starts with articles on the topic, read by users because they are network-dependent and know it.  That leads to website visits, then to sales calls.  That happy sequence hasn’t worked well with managed services, though.

Part of the reason is the other side of the picture, the enterprise.  Enterprises are worth more to sellers, to advertisers, to practically everyone.  Their professional service situation is very different from that of the SMB; they have a larger staff, fewer retention problems, and most of all the people who might be making a managed-service decision may see their own jobs at risk in the deal.  You need to couch managed service positioning carefully to avoid entangling yourself in job-fear pushbacks.

Actually, enterprises see their reason for adopting managed services as being more one of containment of professional staff needs than a desire to reduce staff.  Right now, the biggest problem enterprises face (according to them) is that of scope of operation compared to scope of services and support.  Big businesses have people everywhere, and so they need network services everywhere.  In today’s world, that means they need IP connectivity everywhere because all meaningful current applications depend specifically on IP connectivity.  It’s one thing to drop a data line in a remote area (or country) to connect a financial terminal, and another to sustain a workgroup that has to be connected to your corporate VPN.

The problem enterprises say they have is in remote locations where VPNs can’t be directly extended.  These locations likely have different service providers supporting access, and fewer employees who can be pressed into service to do local diagnostics and take remedial steps that are necessarily local and hands-on.  It’s this problem set that has become more serious as companies shift from providing simple device or local-server data connections to trying to add these locations to the corporate network.

What buyers say they need from managed services is more uniform.  They want a single point of contact, proactive monitoring and problem notification, a formal and detailed service-level agreement (SLA), an escalation plan if a problem isn’t resolved, a strategy for applying remedies locally, and a partitioned management system approach that doesn’t require them to contact the MSP for things like adding a system or changing a security policy.  They don’t want to lose management visibility completely, and they don’t want a mandatory local “geek-squad” partner that does all the local work even where someone on site is considered qualified, or per-incident pricing that makes it impossible to budget for the costs.

Then, of course, there’s pricing.  There used to be a rule of thumb that said total cost of ownership for a network was twice service costs, but that was in a day when people built their own networks from circuits and devices.  An “average” midsize business today, one with 5 sites to connect, can make the connection with services costing about $6,000 per year, and would require two network techs at a burdened labor cost of about $160,000 per year.  Obviously managed services could save this kind of buyer a hefty chunk.  Enterprises are different; the average TCO for enterprises in my survey ran 1.58 times service costs.  Both these figures omit capex.  Obviously, there’s a huge range of possible value propositions, possible prices, and possible drivers.

What the MSP has to do in order to meet buyer requirements and realize the managed service opportunity at a profit can be reduced to a phrase: management economy of scale.  Managed services displace in-house technical costs, so to be profitable they have to be able to deliver at least acceptable or equivalent support, at a price that buyers say should be about 30% lower than doing it in-house.  Getting to that breaks down into four specific points of action.
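
A quick back-of-envelope check, using the survey figures quoted above, shows why that 30% target matters.  I’m reading the $160,000 as the total burdened labor cost for the in-house techs, and capex is excluded throughout, so treat this as a rough sketch rather than a pricing model.

```python
# Rough cost comparison for the "average" 5-site midsize business above.
service_cost = 6_000     # annual WAN service cost
labor_cost = 160_000     # annual burdened in-house network labor (assumed total)
in_house_tco = service_cost + labor_cost

# Buyers say a managed service should come in roughly 30% below in-house cost.
target_managed_price = 0.70 * in_house_tco

print(f"In-house TCO:          ${in_house_tco:,}")              # $166,000
print(f"Target managed price:  ${target_managed_price:,.0f}")   # $116,200
```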

First, the MSP must have an excellent network operations center, and that means not only having a powerful and resilient management system but also a staff who has the experience to run things properly and deal with problems.  In any managed service, while the NOC isn’t the only thing needed, it’s the heart of the process and the place where the other stuff has to come together.

The second thing the MSP must have is an organized point of management visibility and control at the service boundary.  The “organized” part means that since a managed service is a form of virtual service, whatever collection of network stuff is needed to create the functional boundary point between WAN and LAN on a site must be abstracted into something singular and easy for all parties to interpret.  The border works, or it doesn’t.  You can’t have people digging through subtle components to figure that out; the user can’t and the NOC shouldn’t be taking the time.

The third point is that the location of the visibility boundary and the “policy boundary” determine the scope of the service level agreement.  This boundary point then serves as the default point of visibility and control for the MSP.  To extend it deeper means not only extending the responsibility of the MSP, but also the ability to see and control behavior at the new location of the boundary.

Point number four is orchestration and automation of the service lifecycle within the visibility/control boundary defined by the SLA.  You can’t have efficient management and management economy of scale without a toolkit to reduce human intervention.  Ideally, this toolkit should be one that can also be deployed by the user, perhaps offered as an add-on by the MSP, to facilitate problem detection, isolation, and remediation end-to-end.

Users say, by an almost 90% margin, that they don’t see this from MSPs today.  Interestingly, that statistic is nearly the same for enterprises and SMBs.  I think enterprises are seeing the real issues with MSP management platforms because they have the technical skill to assess those issues.  For the SMBs, I think the problem is more “subjective frustration”.  Nobody is talking with them about these points, and when they hear them they believe that the silence is because the MSP doesn’t offer the stuff or isn’t doing it right.  In either case, I don’t think the problems would be difficult to solve.

The product implications of this are harder to pin down.  If the virtual visibility/control boundary is made up of a number of “real” products or connections, some kind of management aggregation of the interfaces will be required.  If something tangible sits at the boundary, like an SD-WAN node or some network agent process, then that element may be able to provide a complete picture of the network.  Since most branch locations will be a single IP subnetwork, the visibility into that subnet is significantly improved if the boundary element provides DNS and DHCP services.

A final managed service driver that creates its own technology issues is security.  While managed service prospects don’t rate security as their primary driver (that’s support issues and costs), it places a close second and it’s becoming a driver for the primary issue.  Lack of technical support skills is one of the three biggest reasons for security problems for SMBs, and one of the four biggest for enterprises.  The contribution networks can make to security is primarily in the area of connection validation, a topic we include in “logical networking”, and Cisco recently purchased Duo Security for its ability to provide a broad strategy for access authentication.  However, security raises the problem of management system partitioning, because many companies don’t want to cede security policy control to a third party.  A strong managed service offering with good connection security could be a “killer app” for the whole space.

Another possible challenge for security as the basis for managed services is that security issues tend to be proportional to the number of user-to-resource relationships a company sustains in normal operation, which is only very loosely linked to the number of sites.  Managed services today, of course, are valued in proportion to the number of sites, so there’s a bit of “orthogonality” going on.  Security is a broader issue, and some MSPs might want to lead with it and then slip in managed services where appropriate, rather than try to redefine managed services in a more security-centric way.

Because of that, security may also redefine managed-service platform strategies.  As I’ve pointed out in a number of places (including our SD-WAN tutorial), security is a natural feature of “logical networking” and thus of advanced SD-WAN implementations.  However, you can also argue that SD-WAN is a natural feature of connection-validation security solutions, and we already see some companies (Fortinet, for example) making the move from the security space to a broader SD-WAN space.  That positioning, with security taking the lead, might be more palatable if security becomes the primary driver for managed services.

The final point here is that managed services don’t necessarily have to be simply a managed version of a traditional service.  MSPs in the SD-WAN space (the most active space at the moment) already include cloud hosting of elements, transport trunks that are combined with Internet connectivity, and other native-network features.  This is likely to expand, both because MSPs want to differentiate their offerings and because of technologies that abstract a variety of network underlays into a common service model.

There is a lot of interest in standardizing various aspects of managed service, management itself not being the least of the targets.  These efforts, in the next three years or so, are almost certain to lag behind the new developments in the space, so we can expect managed services to get more diffuse and (some would say) disorderly in the near term.  That may be a good thing, because a lot of varied offerings are the best way for the market to vote on what’s really important.

Do We Have (Finally) a Real Cloud-Native Technology?

I’m a big fan of things like open source, Kubernetes, and now the Istio project, which many have described as an extension of Kubernetes.  It’s hard for non-programmers to digest software architectures, and certainly nobody involved with Istio has made it any easier, but I think the portrayal of Istio as a Kubernetes extension project falls way short of the mark.  At the same time, I wonder if Google and IBM, the primary backers of the project, are perhaps expecting too much too soon.

Istio is really a microservice ecosystem, aimed at providing four things.  First is the connection, meaning integration, of the APIs involved in a distributed microservice deployment.  Second is security of the distributed microservice complex, and third is the control of policies and resources.  Finally, Istio provides for observation, the monitoring of behavior that forms the basis for making any software system stable and responsive.  Kubernetes really doesn’t do this stuff, so in that sense, Istio is an extension, but there are a lot of microservice tools and design patterns that don’t do any of this either.  Istio is in fact a step in a broader and essential thrust toward cloud-native behavior.

Work distribution is essential to multi-component applications, and in a sense, work distribution tools anchor applications within the model they were designed to support.  The idea behind Istio is that applications, particularly those designed natively for virtualization and the cloud, aren’t the familiar linear-bus-like processes of older models like SOA, but rather a mesh of services connected through API calls.  Workflows are more dynamic, more diverse, in this kind of structure, and the complexity of integration and management for the service mesh that Istio creates demands a new platform, which is really what Istio is.

This point frames a reality for Istio, which is that it may well be essential for the future but (for many, or even most), it may be difficult to justify in the present.  Microservice-based application models are well-suited for things like event processing and perhaps even some mobile front-end components, but most enterprise applications are transaction processing, and for these decomposition into a mesh of microservices is likely to require complete rewriting and introduce a lot of network delay, for limited benefit.

Another point is that not only is Istio not designed to extend Kubernetes, it doesn’t even require it.  In fact, it doesn’t really require any orchestration tool at all, but Istio itself can be extended by linking it to orchestration, including Kubernetes but also Mesos, Marathon, and so forth.  That’s because orchestration tools tend to create an implicit application model, and such a model provides a focus for the kind of operational integration Istio offers.  Not to mention that architected deployment and redeployment facilitates our four Istio goals.  Istio, then, is a part of this distributed-microservice, cloud-native, future evolution, not a complete toolkit to support it.

The principles of Istio are fascinating.  If you look at the Istio model functionally, Istio has the typical data/control plane separation that defines its basic architecture.  The core of Istio is a proxy-based distributed data bus that creates the Istio mesh architecture.  What that means is that there is an adjacent component of this bus (which uses Envoy as its basis) that is a “sidecar” to a microservice.  The microservices interact (meaning connect) via the Envoy sidecar, and the control plane of Istio does its monitoring and traffic management through control of the sidecars.

You can look at this as being the same basic principle as SD-WAN for branch offices or cloud subnets.  In effect, Envoy provides an overlay to the “real” network, and the “nodes” that create that overlay (the Envoy sidecars) are then able to abstract the real network services, control access via policy, and provide management information that would likely be unavailable if some point of monitoring/observing were not added.

Data traffic between microservices never flows directly; it always goes through Envoy proxies, and that’s the separate data plane piece of Istio.  The control plane is (at least in how I view it) created by another component, Mixer, which talks to the Envoy proxies, receives information on the flow of work among components, and distributes policies that govern how the data plane works and how microservices can use it to connect.  This is why I said that Istio was in effect a distributed data bus, a logical form of the old Enterprise Service Bus (ESB) technology.  It’s also why it looks (as I’ve said above) something like an SD-WAN framework: it creates an abstract connection model that overlays the real network.
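
To show the pattern in miniature, here’s a conceptual sketch of the sidecar idea as I’ve described it: service-to-service calls pass through a per-service proxy, while a control-plane component collects telemetry and enforces connection policy.  This is my own illustration of the concept, not Istio, Envoy, or Mixer code.

```python
# Conceptual sidecar/control-plane sketch (illustrative only).
class ControlPlane:
    """Stands in for the control-plane role: policy plus telemetry collection."""
    def __init__(self):
        self.policies = {("frontend", "orders"): True}  # allowed service pairs
        self.telemetry = []

    def allowed(self, src, dst):
        return self.policies.get((src, dst), False)

    def report(self, src, dst, ok):
        self.telemetry.append((src, dst, ok))

class Sidecar:
    """Proxy attached to one service; all its outbound calls go through here."""
    def __init__(self, service_name, control_plane):
        self.name = service_name
        self.cp = control_plane

    def call(self, dest_name, dest_func, payload):
        if not self.cp.allowed(self.name, dest_name):
            self.cp.report(self.name, dest_name, False)
            raise PermissionError(f"{self.name} -> {dest_name} blocked by policy")
        result = dest_func(payload)          # the actual data-plane hop
        self.cp.report(self.name, dest_name, True)
        return result

cp = ControlPlane()
frontend_proxy = Sidecar("frontend", cp)
orders_service = lambda request: {"order_id": 42, **request}
print(frontend_proxy.call("orders", orders_service, {"item": "router"}))
print(cp.telemetry)   # [("frontend", "orders", True)]
```

The point is that neither service knows anything about policy or monitoring; the proxies and the control plane do, which is exactly the overlay-style abstraction described above.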

The data-bus association is important because distributed microservice meshes don’t have the kind of central control of work and policies that something like an ESB would offer.  If you have an application or set of applications that are distributed because that’s the right way to build them (event-driven applications are the easiest example), then something like Istio would be the right approach to building them.  If you wanted to modernize an existing application that’s componentized using an architecture like SOA and ESB, Istio would be the right way to frame the modernization.

But obviously the benefit of a distributed mesh depends on how much distributing you’re doing.  For most virtualization and cloud users, the Istio approach would be overkill without some reason to do a lot of distributed microservices.  The primary reason would be to support a highly scalable framework of components where new instances were regularly spawned and where load-balancing was therefore almost mandatory.  If this spreads across a hybrid cloud or multi-cloud, the need is even greater.

Google announced Istio (in conjunction with IBM) as a part of its evolving cloud strategy, and this is one reason I said in a prior blog that Google was working harder to change culture than to support current business needs.  Companies don’t typically run Istio-justifying applications today.  They would have to be taught to frame the way they think about “transactions” and “worker productivity” in a very different way in order to justify the microservice-mesh model that Istio is so good at supporting.

There’s been some talk about who supports this approach (like Google and IBM) and who hasn’t yet jumped on the bandwagon (Amazon, Microsoft, and Red Hat), but I think that’s also missing the key point.  Cloud-native is a class of applications that are designed for the cloud.  Istio is for sure a cloud-native idea, but a mesh of microservices is only one of many different possible cloud-native application architectures within the cloud-native class.  Further, as I’ve said, Istio isn’t a complete cloud-native framework.  I like it a lot, but I think there’s another step needed here before things like Istio can really take off, and it’s a step I’ve advocated before.

Application architectures are frameworks for building applications, and to be useful they have to be embodied in a toolkit that’s sometimes called “middleware”.  The middleware for the kind of application that runs only on virtual components, virtual hosts, virtual connections, and so forth, needs to be supported by what’s effectively a “Platform-as-a-Service” framework.  Istio should be a strong candidate for inclusion in that framework, but so should Google’s Cloud Spanner distributed RDBMS, and so should things like SD-WAN connectivity.  In short, to create true cloud-native stuff you need to abstract everything that an application sees, from its data through its hosting and connection, to even its users.  Istio doesn’t try to do that, but it needs somebody to do what it doesn’t.

We need service-layer abstractions to represent network connectivity today; that’s what I’ve been harping about with my “logical networking” theme.  We need a way to provide the cloud with the kind of workflow tools that we’ve had in monolithic data centers.  We need to think of virtualization in terms of how one writes applications to virtual abstractions of hosting and networking.  Istio might be a step to that, but it might also be just a tad too geeky to be accepted in the near term.  The business problem that Istio solves may be, for most enterprises, too far in the future to worry about.

Is Istio going to be a success, at least as a pathway to that cloud-native future?  I don’t know, because the biggest problems any new technology can face are that it requires a culture shift to be understood at the business level, and that it’s too much of a stretch for most to grasp at the technical level.  Istio has both problems, and if backers like IBM and Google want it to revolutionize the cloud world, they’ll have to make it approachable first and comprehensive second.  That’s not happening yet.

As a former programmer and software architect, I really like Istio and what it represents, which is a step toward that elusive concept of a toolkit that programs the cloud as though it were a single server, one that starts with an abstraction of resources and communications and builds that into an application platform, just as the traditional combination of OS and middleware does, except for the virtual/cloud world.  I’d like to see Istio take off, and I think that the concept will eventually rule the programming world.  Just not in the way, or at the speed, that Google and IBM hope.

The problem is that while software is essential to today’s world, and developers are essential to software, the process of buying and building software is still driven by business considerations first and foremost.  Google, who values culture change above all else, and IBM, who at least used to understand business, may be able to average out to something that addresses how tools like Istio can be made business-valuable and not just developer-interesting.

Two Steps to the Network of the Future

It is becoming clear that business networking is undergoing a transformation.  Like all “transformations” it’s built on a combination of revolutionary but still-emerging technologies and problems that have been growing for decades.  The mixture creates unpredictable combinations and results, so we can’t yet say for sure where we’re going to end up, other than that it won’t be where we are now.

The root issue for business networks is the cost-benefit challenge that has been developing since the 1980s.  For a time, computing-driven enhancements to productivity were limited by the network’s ability to connect the new information-intensive applications to workers.  The golden age of networking came along because networking was the limiting factor in applying new productivity options.  That problem had largely solved itself by 2013, according to users I’ve surveyed.

It solved itself for larger businesses, that is.  Virtual LANs and virtual private networks (VPNs) came along to replace the private trunks and routers/switches, reducing both capex and opex for businesses big enough to consume them.  At some point, though, you’ve reaped the incremental pent-up productivity benefits, and now buyers only want lower prices for the services.  Lower prices and even less technical complexity are also the only pathway to getting small-to-midsized businesses on board.  One path toward increasing network utility is to get everything connected, so these smaller sites are important not only as new potential service revenue sources, but also as a step toward getting network-driven productivity enhancement on a broader track.

SMBs may be the key to the future of business networking, and the reason is simple: business networking requires business sites.  In the US, for example, there are about 5.8 million businesses and about 7.5 million business sites, which says that about 1.7 million sites are secondary sites, the real targets for multi-site networking.  Of that group, about 1.2 million are satellite sites of large businesses (about 180,000 businesses fit in this category), and half a million are sites of SMBs.  The satellite sites of large businesses are, according to my surveys, over 93% connected today, while those of SMBs are less than 25% connected.  Less than half of the connected sites are part of a company-wide “virtual private network” of any sort, meaning one on which their sites, users, and resources all appear on a common IP network.
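
The arithmetic behind those figures is simple enough to restate; the numbers below are just the survey-based estimates from the paragraph above, rounded.

```python
# Simple restatement of the site arithmetic above (figures are the cited
# survey-based estimates, rounded; nothing here is new data).
businesses_us = 5.8e6
sites_us = 7.5e6

secondary_sites = sites_us - businesses_us          # sites beyond a business's first
large_business_satellites = 1.2e6                   # across roughly 180,000 businesses
smb_secondary_sites = secondary_sites - large_business_satellites

print(f"secondary sites: {secondary_sites / 1e6:.1f} million")          # ~1.7 million
print(f"SMB secondary sites: {smb_secondary_sites / 1e6:.1f} million")  # ~0.5 million
```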

Even large businesses often have small sites.  Today, business VPNs connect only about a third of the connected large-business sites, while the rest rely on a variety of other technologies.  Just short of 60% of these large-business satellite locations are part of a unified company IP network.  Sites outside the US are a third as likely as US sites to have a business VPN connection, and less than 30% are on a company IP network.

Network operators worldwide agree that further growth in business VPN services based on Ethernet access is unlikely without price reductions that would lower operators’ net revenues, because the business case for those connections quickly becomes untenable as the sites get smaller.  Remember that, in general, the benefit of networking a site is proportional to the number of workers there.  The smaller sites, whether they belong to SMBs or enterprises, average fewer than 20 workers each (there are about 800,000 sites of this size in the US).

The big problem with business networking is the cost and availability of a common access technology.  Business broadband connections cost several thousand dollars per month for less bandwidth than a consumer could obtain for under a hundred.  It’s been possible to get “business forms of residential broadband” for a decade or more, usually at a 50% to 100% premium over residential pricing, which would still make it an order of magnitude cheaper than business broadband.  However, since it’s currently not possible to extend MPLS VPN technology to these sites (and because the operations costs would be high even if you could), we’ve not been able to provide a VPN strategy that covered everyone, everywhere.
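
The “order of magnitude” point is easy to check against the ranges just cited; the dollar figures below are illustrative stand-ins for “several thousand” and “less than a hundred,” not quotes from any operator.

```python
# Rough cost comparison using illustrative figures drawn from the ranges above.
consumer_broadband = 100          # dollars/month, upper end of "less than a hundred"
business_broadband = 3000         # dollars/month, a stand-in for "several thousand"

# "Business forms of residential broadband" at a 50-100% premium over residential:
business_grade_residential = [consumer_broadband * 1.5, consumer_broadband * 2.0]

print(business_grade_residential)                           # [150.0, 200.0]
print(business_broadband / business_grade_residential[1])   # ~15x: still an order of magnitude
```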

All of this demonstrates that growth in business networking, which is obviously directly linked to growth in the number of business sites being networked, means having lower connection costs for at least one option for small-site connectivity.  Operators have jumped on SD-WAN technology because it can make business site connectivity a viable option for sites with as few as four or five workers.  And, of course, because MSPs can offer that extension of the VPN if the operators don’t.

SD-WAN is just the best-known of what are probably a half-dozen different approaches to building a universal company VPN where a single business service can’t provide it at tolerable price points.  The majority of these either don’t extend the company VPN or use a more complex technology that requires more expensive equipment and a higher level of technical support.  What SD-WAN does could be done, in some way, by many SDN and tunneling technologies too, provided that they were absorbed into a managed service so that supporting the remote sites was easier.  However, SD-WAN is an architected solution to a problem previously solved by custom integration, and that’s a very big step.

Today’s applications expect to run on an IP network, meaning that they expect all the features of an IP network to be available.  If you build company connectivity on a mixture of technologies that don’t add up to a unified IP network, things aren’t going to work the way users want.  It’s deadly to expose differences in connection technology at the application level.  Given that, the number one mission of any business networking strategy has to be to create a company IP network as your VPN.

The fact that the Internet is IP and is also ubiquitous may be why the Internet has already emerged as the logical way to extend VPNs to more sites.  Where traditional wireline Internet technology doesn’t reach, you could in theory ride on Internet-over-cellular, Internet-over-satellite, or whatever.  The Internet is indeed what some people said it would be: the dial-tone of the data age.  There is a significant support-cost advantage to having everything harmonized on Internet delivery except where true business access is justified, and of course the point where Internet delivery becomes the preferred option is shifting in the Internet’s favor as consumer and consumer-like broadband improves.

Whatever the technology used, all of this combines to create a shift of business networks toward a more “abstraction-friendly” model, treating a VPN as an abstraction that rides on top of various transport/connection options.  Services need to be agile at a time when transport networks are still stuck in place by long depreciation cycles, and often by longer standards processes.  Whatever the network of the future is, it has to be more independent of the services of the future, or we’ll miss a lot of opportunity.
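
Here’s a small Python sketch of that abstraction-friendly model: a company VPN presented as a single service-layer abstraction, with transport selection hidden beneath it.  The transport names and the preference policy are purely illustrative; no vendor’s SD-WAN implementation is implied.

```python
# Hypothetical sketch of a VPN treated as an abstraction over multiple transports.
# Sites attach to "the VPN"; which transport carries them is a policy detail.
from abc import ABC, abstractmethod


class Transport(ABC):
    name: str

    @abstractmethod
    def available(self, site: str) -> bool: ...


class MplsAccess(Transport):
    name = "MPLS"
    def available(self, site):
        return site in {"HQ", "regional-dc"}   # only the big sites justify it


class BroadbandInternet(Transport):
    name = "Internet"
    def available(self, site):
        return True                             # the near-universal dial-tone


class CompanyVpn:
    """The service-layer abstraction: sites see one VPN, not the transports under it."""

    def __init__(self, transports):
        self._transports = transports           # ordered by preference/policy

    def attach(self, site: str) -> str:
        for transport in self._transports:
            if transport.available(site):
                return f"{site} attached via {transport.name}"
        raise RuntimeError(f"no transport reaches {site}")


if __name__ == "__main__":
    vpn = CompanyVpn([MplsAccess(), BroadbandInternet()])
    for site in ("HQ", "branch-17", "kiosk-204"):
        print(vpn.attach(site))
```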

It can be that, whatever technology happens to take the critical steps.  My talk about “logical networking” is an attempt to predict what that service layer of the future will be, based on how needs are evolving.  I happen to think that SD-WAN is the closest thing to an architected approach to that logical-network-based service layer, but I admit that most SD-WAN implementations haven’t made that same jump.  We could still see other paths to the future emerging.

We also need to address the reality that eventually the “easy” (relatively speaking) task of connecting all the business sites into a unified network model will be completed.  My model says that by 2023, we’ll have connected three-quarters of qualified sites in the US, for example.  At that point, we’ll need to find new ways for networking to enhance productivity, new benefits to drive future network growth, if we want to prevent network services from commoditizing.