Does Cisco’s Blog on Pathways to NFV Really Lead Anywhere?

Cisco is arguably the technology powerhouse of networking, not to mention the marketing gorilla and the emerging leader in the efforts of network vendors to broaden their base in response to declining capex.  Network functions virtualization (NFV) is arguably the natural fusion of IT and networking, and a logical place for a vendor with broader server/switch/router aspirations to look to.  So when Cisco blogs on the topic, with “Three Pathways to NFV Success…or How to Tame the NFV/SDN Monster”, you just have to read it.  In fact, I recommend that you read it now, and read the full report available on Cisco’s site, before we look into the details.

I’ve surveyed operators since 1989, and I’ve been involved in NFV longer than the NFV ISG has existed.  I was chief architect of the first NFV PoC approved, and was named one of the most influential players in NFV by Light Reading in the first summer of NFV’s work.  I’m citing all of this to show that I have decent NFV credentials and contacts, and thus to validate my own views of the problem Cisco is addressing.

If there’s an NFV monster to tame, it’s that NFV adoption has been way more difficult than expected.  Cisco presents three pathways to NFV success (technology transformation, service-based transformation, and cultural transformation), and they provide survey data to back up their view.  My own surveys suggest that the Cisco results can and should be interpreted differently, but that the alternate interpretation can also be a pathway to NFV success.

What Cisco sees as three pathways to NFV is really more like three stages of operator denial.  Everyone started their NFV exploration based on the 2012 “Call for Action” white paper, which suggested that NFV would succeed by substituting commodity hosting and inexpensive software for costly proprietary appliances.  Within a year, the same operators who had promoted that first vision had accepted that there wouldn’t be enough capex savings, and that NFV would have to address opex.  In another year, they realized that they needed to address revenues too, “service agility”.  So, when Cisco’s survey found nearly 80% of respondents thought “Agility” was the prime justification for NFV, they were just saying that nearly all operators had followed the attitude evolution and were now in the end state…well, end state except one.

The last of the pathways, cultural transformation, really came about not from the operator side but from vendors and the media.  Even at the end of 2013 I was getting complaints from NFV vendor sales types, saying something like “The problem with NFV is that buyers don’t understand that they have a Holy Mandate to buy the stuff, and they keep wanting a business case!”  The media had their own problem, created by the fact that only sensational headlines get URL clicks.  I always said in conference speeches that “To the media, everything is either the single-handed savior of Western Culture, or the last bastion of International Communism.”  When NFV wasn’t taking off, the media picked up the “culture” story.

The facts are a lot simpler.  Even Cisco’s survey shows that NFV projects are owned in two-thirds of operators by the technology organization, under the CTO.  CTOs don’t make buying decisions, they validate technology.  From the first, the operators’ real moneypeople, the CFOs, said that they were not seeing any convincing business case for NFV, and they still say that by a large margin.  There are operators who have special interests, primarily managed services, and they have been able to justify a virtual CPE decision, but the truth is that today’s virtual CPE is more like the Verizon agile white-box strategy (you can load features into a CPE device) than like formal NFV.  About 90% of “NFV” stuff now in progress falls into this category, yet only about 16% of respondents to the Cisco survey said that lack of a business case was the problem.

Why so few?  Because they’re not really trying yet.  The majority of NFV projects are still in planning or PoC, and this after four years of NFV progress.  Proving the concept of NFV couldn’t possibly take that many PoCs and that long a period, unless you’re really not proving concepts but groping for an overall concept to prove.  My own interpretation of where things are?  Operators are trying to balance between a concept of NFV with very narrow boundaries, directly related to getting virtual functions hosted, and a justification of NFV that demands automation of the complete service lifecycle across all infrastructure in order to achieve agility and opex efficiency.  Right now, the first of the two is winning, and as long as that’s true, NFV won’t move forward.

Cisco’s own survey data shows that the “concept” view of NFV still dominates, too.  If operators recognized service lifecycle automation as the critical goal, we would expect to see them focusing on OSS/BSS and CIO issues as important.  OSS/BSS gets only 4% recognition.  Same with orchestration.  In fact, only AT&T (with ECOMP) has an NFV model with the scope necessary to actually make a broad business case, and while ECOMP has champions, it’s clearly not sweeping all alternative visions of NFV before it.  And even ECOMP isn’t complete in terms of its lifecycle management automation support.

Cisco’s three-pathways blog, then, doesn’t really address the issues that created the “NFV Monster” its title cites.  That’s particularly troubling given that Cisco, which has servers and server aspirations central to transforming its own business model, presumably sees NFV as a pathway toward those carrier cloud data centers.  Perhaps the biggest problem with Cisco’s strategy is that anything they try now has risks.  Follow an open approach like ECOMP and you’re one vendor among many.  Try to promote a proprietary model of NFV and you’ve taken on a project that AT&T says involves millions of lines of code.  And, of course, there is simply no time to push such a model now.

What I’d love to see Cisco do is to frame a top-to-bottom service lifecycle management problem in service model terms, using YANG.  Cisco bought the technology with Tail-f, and it’s time to either demonstrate that it can do the whole job, or admit it can’t and fit it into some broader service lifecycle management story.  No technical evolution, or service evolution, or cultural evolution, to NFV can possibly succeed without making a business case, and no business case can be made without complete service lifecycle management automation.

You can dissect the “monster-driving” issues and say the problem is the greed of the VNF vendors, the lack of interoperability at the VNF level, or the lack of specific standards for key interfaces and the management metrics that could drive an SLA.  All of that is just noise unless you have a strategy that can make a broad business case, and that strategy is not there today except, in prototype form, in ECOMP.  The very fact that everyone hasn’t already jumped on ECOMP is proof enough that NFV, at heart, is still just a science project.  Unless Cisco is prepared to move NFV beyond that, their blog is just PR.

Is Vyatta the Dross of Brocade’s Breakup, or the Hidden Gem?

I mentioned Brocade and its Vyatta virtual router yesterday, as a proof point that perhaps we have too simplistic a view of the value of “virtualization” in operator transformation.  I want to go into the story in more detail now, because there’s a lot of important stuff to be learned.

One of the most important metrics for a vendor is their ability to influence the strategic planning of their prospect and customer base.  In the early part of this decade, Brocade’s influence had been shrinking noticeably—by almost half, in fact, between 2010 and early 2013.  Then along came their Vyatta positioning and suddenly Brocade’s influence more than doubled in a single half-year period.  In the following four years, though, it fell back to slightly below its pre-Vyatta level.  This sequence is a story in itself.

Both operators and enterprises were initially enthralled by the notion of a software router.  In that critical spring, I met with a couple of Tier One operators and listened as they laid out their plans to adopt the Vyatta router.  I even heard Brocade’s story that NFV wasn’t really about networking but about the data center.  That story, as much as the virtual router, resonated with the network operators, but the NFV position ended up creating some poisoning down the line, as we’ll see.

What happened then?  In the fall of 2013, Brocade had already lost 13% of their spring strategic influence, according to my surveys.  They lost another 20% in the following spring survey, and you can see the trend.  Brocade never recovered from this decline.  At one level, what this says is that Brocade no longer had the influence to drive decisions and thus couldn’t exploit the insights it had communicated.  That’s true, but it raises the question of how they lost the influence in the first place.

Suppose you see a news story title that grabs you.  You click on it and get a nonsense piece.  You don’t stick around to read drivel just because it’s there; you hit the back button and go on browsing.  Apply this behavior to operator buyers and you see that it’s easy to generate interest in something, but the follow-on to that interest has to be at least as insightful as the jazzy headline that created it.  Brocade, in that critical year after their insight-driven success, had no credible follow-up.

What do you do with a virtual router?  Answer: Route.  You replace a physical box, in short.  That could be an attractive concept to operators that were first trying to control spending because of eroding profit per bit, and second were looking for a new and agile infrastructure model.  What they expected from Brocade was a viable story on how virtual routers would result in a net reduction in network spending, and how they could create that agile infrastructure.  In short, they wanted a credible virtual network architecture that could be costed out, analyzed, tested, and adopted.  They didn’t get it, for two reasons.

The first and most cynical reason is that even then we were in the era of hype.  The goal of most marketing is to sell engagement by selling editorial mentions in publications.  PR sells sales calls; sales calls sell products.  The problem is that if the PR is about some enormous, even tectonic, shift in technology direction, the selling cycle could be long indeed.  Vendors wanted instant gratification, something to book in the next quarter.  That creates an incentive to sell tests and trials and avoid the complex stuff, which in Brocade’s case diluted the whole strategic story.

The second, technical, reason is that most vendors didn’t have any idea what a good deal would look like to their customers.  Vendors couldn’t tell operators how their current capex was distributed, so even if they truly understood how NFV (or SDN) would impact some specific devices, they couldn’t project the impact on the bottom line.  I also had operator after operator complaining to me that vendors were telling them how much opex they could save, when that same vendor had no idea what their current opex was or how it was distributed!  In any event, you can’t plot out future operations costs if you don’t have a complete model of your new operations processes associated with a service lifecycle.

The effect of all of this was to create a climate of “claim-jumping”, not to find gold but to find a set of “benefits” that hadn’t yet been discredited.  The further things went afield from the simple notion of a hosted router instance, the harder it was to collect valid information on what the total cost of ownership might be, because the whole ecosystem was now changing.

Vyatta needed to be positioned as a cloud router, and it has two possible applications.  First, it can be a multi-tenant device, like a real router.  You don’t instantiate it with a service order; it has the same relation to services as physical routers do—they share it.  Second, it can be a per-tenant router if we create service partitions below the router level, with virtual wires built from a combination of agile optics and SDN.  Then we don’t need massive multi-tenant routers, because every tenant has its own router instances.  In the first mission, it has nothing to do with NFV.  In the second, it depends on infrastructure transformation to let you virtualize Level 3, and even then it’s really still a cloud router and not a traditional NFV virtual function.  Because Brocade never had a story of its own, having lost the PR and influence wars, it was caught in the shifting tide of NFV, and Vyatta didn’t (and doesn’t) belong there.

This all may answer a question asked by SDxCentral, which is “What If a Service Provider Snagged Brocade’s Virtual Router?”  They’d probably have to make it open-source rather than “own” it, but even if they managed to dodge the anti-trust-and-collusion issues, would a service provider have the answers to the questions that Brocade couldn’t answer?  It gets back to the problem of creating a strategic model for the network of the future.  Operators are used to building networks from products and not to creating models on their own.  Add to this the fact that they can’t really collect in an operators-only group to work out the plans (back to anti-trust and collusion), and you see the problem.

Vyatta is a great virtual router, perhaps the best ever developed.  That doesn’t mean it’s useful, or, more important, transformational, if it’s not built into that universal strategic model of the future network, the model we don’t have yet.  And that’s why Vyatta was left at the church after the Brocade break-up.  And why it may languish now.

There is a transformational model available for Vyatta; several, in fact.  I’ve noted before that if you framed networks on a series of agile electro-optical virtual wires with resilience and tenant segmentation, you could transform networking into islands of separate IP behavior, each of which would be well within the capabilities of a hosted router instance.  Operators aren’t going to drive this kind of transformation; only vendors can.  Perhaps only optical vendors.

You don’t need the virtual-router and virtual-wire mission to validate the carrier cloud.  In fact, my models don’t suggest that it would add materially to the number of data centers operators deployed.  You do need that Vyatta-like mission to totally transform networks and services, though, and if you’re a vendor you need to understand that a new network model will inevitably emerge.  Eventually, Vyatta will be valuable, and somebody might be smart enough now to snap it up, and wait.

Some More Evidence that Consumer Opportunity Drives the Carrier Cloud

If the Internet is the source of technology revolution, perhaps we should remember that consumerism is the source of the Internet’s importance.  Revolutionizing communications wouldn’t have happened had the Internet stayed a haven for researchers and university eggheads.  Given this, we should look to what is or could happen with Internet consumerism to see what might happen in networking and tech.

Functionally, the thing we call “the Internet” is a three-level structure.  The first level is Internet access, which is what ISPs and mobile operators provide us.  The Internet’s second level is the content and experiences that the user is seeking, and the third level is the communications facilities that tie all that content/experience hosting into a common network that users then access.  Up to now, most of what we could call “innovation” in the Internet has been in that second content-and-experiences layer.  Innovation is going to continue there, but it’s also going to crop up elsewhere.

You are, Internet-wise, what your DNS decodes you to.  Users know Internet resources by URL, and the URL decodes into an IP address.  It doesn’t matter where that IP address terminates, and that has allowed content delivery networks (CDNs) to jump into the market to provide close-to-access-edge hosting of content to deliver a better quality of experience.  With CDNs, a URL is translated to an IP address of the best cache, which means that the DNS decoding is optimized for the experience.

Logically, this same approach could be applied to the next-generation “contextual” experiences that users want supported.  A good example is the classic auto-GPS application.  You want to know how to get to your destination, but also to have that route optimized for conditions and presented to you in useful turn-by-turn directions.  When you ask “what’s ahead,” either explicitly or implicitly (via route optimizing), your “ahead” is contextual.  You could envision the processing of such a request as decoding a URL like what_is_ahead, with the question then directed to the answer process for your current location.  That would eliminate the need for you to find sensors based on where you are, or to wade through a zillion intersections to get results that were actually useful.
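To make the idea concrete, here’s a minimal sketch of how a contextual URL like what_is_ahead might resolve to the answer process cached nearest the requester, much as a CDN resolves a content URL to the best cache.  This is purely my own illustration; the hosting points, coordinates, and resolver logic are all invented.

```python
# Hypothetical sketch: resolve a contextual query URL to the process
# instance "cached" nearest the requester, the way a CDN resolves a
# content URL to the best cache.  All names and data are illustrative.

from math import radians, sin, cos, asin, sqrt

# Hosting points where the "what_is_ahead" answer process is cached.
PROCESS_CACHE = {
    "what_is_ahead": [
        {"host": "10.0.1.10", "lat": 40.75, "lon": -73.99},   # NYC edge
        {"host": "10.0.2.10", "lat": 41.88, "lon": -87.63},   # Chicago edge
        {"host": "10.0.3.10", "lat": 34.05, "lon": -118.24},  # LA edge
    ]
}

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def resolve(query_url, user_lat, user_lon):
    """Return the hosting point for the query closest to the requester."""
    candidates = PROCESS_CACHE[query_url]
    return min(candidates, key=lambda c: distance_km(user_lat, user_lon, c["lat"], c["lon"]))

# A driver near Newark asking "what's ahead?" lands on the NYC edge process.
print(resolve("what_is_ahead", 40.73, -74.17)["host"])   # -> 10.0.1.10
```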

Another example of a contextual request is the “where can we meet for lunch?” question, which is contextual because you aren’t likely to want to pick a location it would take ten hours to reach.  In this case, our where_can_we_meet URL would likely decode to a processor for the city in which the user resides, but that processor would have to consider the locations of the others implicit in the “we” part of the question.

The point here is that contextual enhancements to services would likely be handled by a set of processes that would obtain focus, and likely even hosting position, based on the geography of the people involved with the request.  Visualize a set of naked processes hanging about, to be instantiated where they are needed—cached, in effect, like content already is.

IoT creates a similar need set.  If you have sensors, they have first and foremost the property of location, meaning they are somewhere and it’s the surrounds of that somewhere that the sensors can represent as status.  If we wanted to interpret sensors to get insight into conditions, we’d probably think of the condition we wanted to review first (“what’s ahead?”) and expect it to be contextually decoded.

All of this suggests that future services will likely rely on cached processes to provide contextually relevant answers to users, which means there will have to be places to cache them, which means a migration of interpretive resources toward the edge to ensure that the answer a user gets is still relevant when it’s delivered.

If the purpose of caching of content or processes is to get things close to the user, it naturally follows that experiences, features and content might be considered a kind of elastic set of properties that could be hosted anywhere based on QoE needs.  In some cases, the properties might get pushed all the way to a user device.  In others, they might live inside the Internet, even away from the edge.

An example of a migratory process is easy to find; look at Comcast’s upcoming xFi service capability.  Comcast proposes to create a more personalized and elastic Internet experience, one that exploits modules of connectivity (“pods”) that extend WiFi range, and that manages which repeater a user exploits based on where they are in the home.  It doesn’t take much thinking to see that you could extend pods to host content and processes, and thus use them to build a multi-layer service like home control.  Think of a “control pod” and “sensor pods”.

All this could be based on a nice easy-to-use drag-and-drop home control programming system that’s hosted in the network.  The user inputs a floor plan (by scanning or building it with online tools) and locates sensors, controllers for lights and appliances, thermostats, sprinkler systems, and so forth.  The user builds the program to do what they want, and can test it by clicking on a sensor to see what effect it has on the program and the controllers.

Increasingly, the notion of a controllable mapping between logical/virtual destinations (my process URLs) and real hosting points seems certain to create a new Internet model where access and interior connectivity are pretty much plumbing, and the experiences, processes, and content float around within a hosting layer that extends from what’s literally in your hand to what might be a world away.

You can see that, as it matures, this model creates a different kind of Internet.  There are two parallel worlds created.  One is the world of process-host caching, where the URL the user clicks (actually or implicitly) decodes to a hosted, optimized process.  The other is where the URL decodes to a persistent host that’s on the Internet but not part of its infrastructure.  We have this division informally for cached content today, but in the future it will probably matter most for the contextual and IoT processes.

This is the kind of thing that will drive “carrier cloud”.  NFV doesn’t have the same potential for the simple reason that having independent processes hosted for individual services doesn’t scale to consumer levels.  The most credible NFV application is virtual CPE, and yet we know from announcements like that of Comcast that WiFi is the most important element of a home broadband hub.  We can’t build a cloud-hosted WiFi hub, and now Comcast is extending what goes on premises with pods.

Another data point is that the Brocade break-up has left a conspicuous orphan in Vyatta, their virtual router property.  Why, if there was tremendous potential for virtualization of routing, would that be the piece left at the dance?  The answer is that multi-tenant services like IP routing are built around static locations sized for aggregate traffic.  If I’m going to have a box in Place A to serve all traffic from all users there, what’s the value of making it a virtual box rather than a real physical router?

There are a half-dozen drivers of carrier cloud, and we need all of them—including NFV—to get to the best possible place.  We’re not going to get them by ignoring that common process-hosting model, and every day we get more evidence of that.  Time to listen!

Taking the Practical Path to Transformation in Networking and IT

Transformation is hard.  That’s perhaps a simplistic way of summarizing all the things I’ve learned and heard over the last two months, but it certainly reflects operator views.  Transformation based on technology changes is so hard, in fact, that a growing number of operators (at the CxO level) aren’t convinced any more that it’s even possible.  Vendors, as I suggested in my blog last week, seem to have abandoned the role of “driving” transformation and adopted instead a position of “if you do it, I can sell you the products.”  We’re in for some confusing times, for sure.

The word “transformation” means “a thorough or dramatic change.”  It doesn’t mean gradual evolution, it means revolution, and a revolution is expensive, particularly when the value of incumbent infrastructure is enormous.  If we look at the issue from a financial perspective first, we can get an idea of why things like cloud computing, SDN, and NFV have fallen far short of early expectations.  Then perhaps we can understand how the problem could be fixed, or at least alleviated.

Let’s say we have $100 billion in investment in information technology and networking.  Presuming a 5-year average expectation of useful life and an even distribution of purchasing over time, we have $20 billion per year of assets that are written down.  To achieve a “thorough” change, we could reasonably expect to have to impact at least 51% of our infrastructure, which would mean two and a half times as much replacement as normal write-downs would allow.  The extra $31 billion would have to be offset in some way by benefits.
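A quick back-of-the-envelope check of that arithmetic, using only the illustrative figures above:

```python
# Back-of-the-envelope check of the replacement-gap arithmetic above.
installed_base = 100.0          # $B of installed infrastructure
useful_life_years = 5
annual_writedown = installed_base / useful_life_years   # $20B/year rolls off naturally

transform_share = 0.51          # a "thorough" change touches at least 51% of the base
replacement_needed = installed_base * transform_share   # $51B to replace

print(replacement_needed / annual_writedown)   # ~2.55x the normal refresh rate
print(replacement_needed - annual_writedown)   # ~$31B that benefits must cover
```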

One CFO told me five years ago that “There’s nothing cheaper than what you’ve already bought.”  Savings in capital cost, which is what many “revolutionary” technologies offer, means nothing if you’re not expecting to incur capital cost because you already have something installed.  So, the target market for a “new” technology is, as a baseline, only the 20% of infrastructure value that’s passed useful life.  If we penetrated that by, say, 10%, we’re penetrating the real infrastructure by only 2%.

A two-percent change would take 25 years to reach revolutionary proportions, which hardly qualifies as a revolution or transformation.  This is why capex-driven arguments for the cloud, or SDN, or NFV are by themselves almost always doomed to fail.
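Carrying the same illustrative numbers one step further shows why capex-only justifications stall:

```python
# Capex-driven penetration: a new technology competes only for the
# already-written-down slice of the installed base.  Illustrative numbers
# from the text above, not survey data.
addressable_share = 0.20        # infrastructure past its useful life each year
new_tech_win_rate = 0.10        # share of that slice a new technology captures
annual_penetration = addressable_share * new_tech_win_rate   # 0.02 -> 2% per year

transform_threshold = 0.51      # the "thorough change" bar used earlier
print(transform_threshold / annual_penetration)   # ~25 years to get there
```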

So now, given this, let’s look at the periods where “transformation” actually worked.  We’ve had three past periods where the rate of growth in IT spending exceeded the rate of growth in GDP by a significant margin.  They were from 1950 through 1968, from 1982 through 1989, and from 1992 through 2000.  In all these periods, you can tie the happy IT outcome to a significant change in the relationship between computing/networking and workers/consumers.  The explosion in computing and the data center was the first wave, the second was distributed personal computing, and the third the Internet.  During these waves of positivity, we exceeded the current rate of growth in IT spending versus GDP by an average of 40%.

We’ve not had a wave of this happy kind since 2001, and part of the reason is that in the past we were “underempowered”.  We had, up to the PC, no personal IT at all.  We had, up until the Internet, no model for a consumer data service at all.  That underempowerment meant underinvestment, which means that the risk of trying a new (and transformational) idea was lower because you didn’t have to displace non-depreciated infrastructure.  I believe we could still promote a transformation on the demand-benefits side, but it will be harder.

Harder, but not impossible.  Let’s take a space I’ve studied in detail, and so have decent numbers to work with.  A network operator spends between 18 cents and 22 cents of every revenue dollar on capex.  They spend about 18 cents on profits returned to shareholders, and the remainder is operations and administration.  Within that, operators currently spend about 29 cents on “process opex” meaning the direct costs of service operations, marketing, and support.  Suppose we could cut that process opex cost in half.  The savings could be 14.5 cents per revenue dollar, which is between 66% and 81% of total capex.  The savings could allow operators to increase their capital budgets by at least two-thirds without changing their profits.
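Again, the arithmetic is easy to check against the revenue-dollar splits quoted above (these are the rounded ranges from my surveys, not a precise operator model):

```python
# Revenue-dollar arithmetic from the paragraph above (illustrative splits).
capex_low, capex_high = 0.18, 0.22      # capex per revenue dollar
process_opex = 0.29                      # direct service ops, marketing, support
opex_saving = process_opex / 2           # cut process opex in half -> 0.145

print(opex_saving / capex_high)          # ~0.66: savings vs. the high capex case
print(opex_saving / capex_low)           # ~0.81: savings vs. the low capex case
# Either way, the savings could fund roughly a two-thirds-or-greater capex boost.
```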

What this all demonstrates is that the only way to transform networking and IT is to start by doing something profound to the cost of operations.  Application lifecycle management for enterprises, or service lifecycle management for network operators, should be a primary automation target.  Not only would that cover a much higher rate of capital spending, which would make infrastructure transformation feasible, it would also reduce or eliminate the risks associated with adopting a new technology, by automating the way it’s operationalized.  Amazon and Microsoft both had cloud failures recently that could be directly linked to an operations error.  The right lifecycle management could have prevented that.

This is perhaps the most important lifecycle management benefit.  The ability to deploy new services and applications quickly, and sustain them through normal and abnormal conditions with far fewer errors, is fundamental to rapid introduction and stable operations.  This is also where the carrier network space is helping advance technology overall.  DevOps, the deployment automation practice developed for enterprises, has not caught on nearly as well as it should have.  Perhaps that’s because DevOps tools are more about deployment than full lifecycle management.  With NFV in particular, the carrier space has advanced the concept of orchestration, meaning a model-based handling of everything in a lifecycle.  What’s needed now is to make orchestration, in its fullest sense, universal.
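To show what “model-based handling of everything in a lifecycle” could mean in practice, here’s a minimal sketch of the idea.  It is my own illustration, not ECOMP, any NFV orchestrator, or a DevOps tool, and every name in it is invented.

```python
# Minimal sketch of model-driven lifecycle orchestration: a service model
# declares its pieces, and each lifecycle event is dispatched to an
# automated handler instead of a manual procedure.  Purely illustrative.

SERVICE_MODEL = {
    "name": "business-vpn",
    "elements": ["access-edge", "core-transport", "vpn-gateway"],
}

def deploy(element):   print(f"deploying {element}")
def heal(element):     print(f"redeploying {element} after fault")
def scale(element):    print(f"scaling {element} to meet load")
def retire(element):   print(f"retiring {element}")

LIFECYCLE_HANDLERS = {
    "order":    deploy,
    "fault":    heal,
    "overload": scale,
    "cancel":   retire,
}

def orchestrate(model, event, element):
    """Dispatch a lifecycle event against an element of the service model."""
    if element not in model["elements"]:
        raise ValueError(f"{element} is not part of {model['name']}")
    LIFECYCLE_HANDLERS[event](element)

orchestrate(SERVICE_MODEL, "order", "vpn-gateway")
orchestrate(SERVICE_MODEL, "fault", "access-edge")
```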

Logically, you can start universal orchestration either at the top—with management systems—or at the bottom, with the transport infrastructure.  Top-down orchestration has the advantage of generating a lot of opex savings with a very low capital investment.  In fact, top-down orchestration could achieve ROIs up to 50 times those of SDN- or NFV-driven transformation.  That doesn’t mean you should never do infrastructure transformation, only that you could pay for more of it, faster.  But starting at the optical level has the advantage of fundamentally changing the way networking is done.

Every layer of networking builds on the layers below, which means that if you were to take care of a lot of issues at the bottom, they’d disappear from requirements above.  Vendors tell me that about 80% of router code is associated with resiliency under load and failure conditions.  Suppose you dramatically reduced those issues by handling them at the optical layer?  One thing that hasn’t been considered widely is that SDN’s central control paradigm could be extended a lot further if you could assume that the physical layer of the network didn’t break often, or at all, and that path loading was instantly and invisibly handled by agile transport.

The transport or bottom-up model may have the best chance of success, because there is still a lot of room for vendor support.  The notion of transforming networking by diminishing the role of L2/L3 technology is understandably unappealing to vendors who make their money there.  For computer and white-box vendors it’s a different story, and for fiber players like Ciena or Infinera, the new model could be a big boon.  In contrast, it’s hard to see any network or IT vendor changing course to promote top-down service lifecycle automation at this stage, and the OSS/BSS vendors who might have the interest (and even, in some cases, some products) have to fight the division between operations-driven IT and network equipment that prevails among operators.  But so far neither group has done much, and so it’s possible that open-source lifecycle management will spread without vendor support.

You certainly don’t hear about lifecycle automation from SDN vendors, or NFV players, or even optical players.  Perhaps that’s why we’re not already in a golden age of network transformation.  Perhaps that’s why The Economist has sort-of-postulated the notion of Amazon becoming the telco of the future.  They call it “cloudification,” but IMHO that’s just editorializing.  Most of networking can’t be hosted, or shouldn’t be.  What makes Amazon (or, more likely, Google) a potential winner in a future war with telcos is that they’re planning their networks around services, and presuming that services will evolve quickly.  They don’t have a trillion dollars’ worth of infrastructure to depreciate before they can fund their innovations.  But the telcos could step out, not by adopting cloud-hosted everything but by adopting a cloud-centric vision of service lifecycle automation.

This could extend to the enterprise, too.  There is little difference between deploying a service and deploying an application, or between the lifecycle requirements of the two.  The same scheme used for service lifecycle management could automate cloud deployment and application lifecycle management, and do the same even in the data center.  DevOps has always been about deployment, with only small recent innovations to address full lifecycle management.  Orchestration/automation of the full lifecycle is possible.

I think lifecycle automation could fix most of the tech problems we have and open a new wave of innovation.  But it’s not just a matter of selling the same crap and pushing for simple clicks on jazzy headline URLs.  We’re going to have to work at this, and transformation is hard, as I said at the opening of this blog.  Failure is harder, though.

Are We (Finally) on the Verge of Realizing SD-WAN/Overlay Network Benefits?

The modern view of a virtual private network is clearly trying to balance the “virtual” part and the “private” part.  Private networks based on dedicated per-tenant facilities were the rule up to sometime in the 1980s, when IP VPNs came on the scene and introduced shared-tenant VPNs.  Now the VPN space seems to be moving toward the use of the Internet as the shared resource, the “software-defined WAN” or SD-WAN.  Cisco’s buy of Viptela, an SD-WAN player started by former Cisco employees, is perhaps a critical validation of the space, and at a critical time.

SD-WAN technology is based on the notion of an overlay network, a layer higher than Level 3 (IP).  Each user gets a box (or software element) at each site, and that box terminates “tunnels” or overlay connections over which the user’s VPN traffic is carried.  In most SD-WAN implementations today, there is no presumption of internal “nodes”, and in effect the sites are fully meshed using these overlay tunnels.
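A toy example makes the full-mesh point concrete (the site names are invented): with no interior nodes, the tunnel count grows with the square of the site count.

```python
# Toy illustration of the SD-WAN full-mesh point: with no interior nodes,
# every pair of sites gets its own overlay tunnel, so the tunnel count is
# n*(n-1)/2.  Site names are invented for the example.
from itertools import combinations

sites = ["hq", "branch-east", "branch-west", "dc", "cloud-vpc"]
tunnels = list(combinations(sites, 2))

print(len(tunnels))        # 10 tunnels for 5 sites; 100 sites would need 4,950
for a, b in tunnels[:3]:
    print(f"tunnel {a} <-> {b}")
```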

For enterprises, SD-WAN can be a boon.  Internet connectivity is far less expensive than IP VPNs, and SD-WANs are also far easier to use and manage than MPLS VPNs that require implementation of BGP.  You can extend an SD-WAN to anywhere that Internet connectivity is available, and you can even make sites “portable”.  Since SD-WAN VPNs can be sold by anyone, not just by MPLS network operators, it’s inevitable that users will be exposed to them and consider them, so even some network operators are offering SD-WAN today.  Most use it to supplement their MPLS VPN offerings, but there’s a trend toward SD-WAN-only options.

The big problem with SD-WAN Internet overlay VPNs is the lack of QoS (or, in buyer terms, lack of a good service level agreement).  The Internet is a best-efforts service, which makes senior managers cringe when they consider that “best” efforts at any point might equate to “no effort.”  Some buyers also worry about security, given that Internet overlay networks can be attacked from the Internet using DDoS techniques because the endpoints are all addressable.

Mainstream equipment vendors haven’t been thrilled by SD-WAN, in part because they saw it as a threat to their carrier and even premises router business.  However, refusing to sell a given product doesn’t mean the buyer won’t get it, only that they won’t get it from you.  “Better to overhang your own product than to let someone else do it!” is how one vendor put it years ago.  Still, it does seem odd that Cisco would jump into the space now.  Why would they?

One reason is that operators and enterprises are already slow-rolling capital spending on networking, and seeking lower-cost options.  Cisco has never been a price leader, and price pressure either hurts their margins or (gasp!) shifts the deal to Huawei.  SD-WAN gear could offer Cisco a way of supporting VPNs at a lower cost than before, which makes their gear more attractive in a capex-constrained market.

Another factor is the explosion in the use of the Internet as a front-end.  Companies reach their customers almost exclusively through the Internet, and more of them use the Internet every day to reach workers, particularly mobile workers.  With the growth in enterprise commitment to the Internet, CIOs and senior managers have accepted much of the risk of Internet best-efforts service and even security problems.  SD-WAN is less threatening.

But there’s a new issue on the table now, that of “Internet QoS”.  While senior management tends to rate lack of an Internet SLA and lack of Internet security almost equally important in preferring another VPN technology over SD-WAN, CIOs say it’s lack of an SLA by almost five to one.  Internet QoS, in regulatory terms, means two things—paid prioritization of traffic and settlement among ISPs for premium handling.  In the US, both these QoS requirements were off the table—until recently.  Now FCC Chairman Pai seems headed toward eliminating the FCC’s classification of Internet services under Title II (common carrier).  That would, based on my reading of the court opinions on prior orders, eliminate the FCC’s authority to regulate either paid prioritization or settlement practices.

Settlement and paid prioritization on the Internet could create an SD-WAN boom to end all booms.  It would almost guarantee that network operators would adopt SD-WAN on their own, and create a whole new managed service industry around third-party SD-WAN services.  Cloud providers like Google, Amazon, Microsoft, Oracle, and (now that IBM has bought Verizon’s cloud business) IBM, would probably offer SD-WAN in conjunction with their cloud services.  Given that the overall market trends favor SD-WAN anyway, the regulatory shift would only make things better and faster, and that could well be why Cisco is moving in now.

There’s also talk that an SD-WAN boom created by incumbent network providers or cloud providers could boost virtual CPE sales, which could boost NFV.  The only bright spot for NFV so far has been vCPE, but it’s not a huge opportunity given that most of the value would be in enterprises, and most enterprises already have their own solutions for vCPE features like firewall.  What vCPE needs is a camel’s nose, something that is new and thus not already fulfilled by customer equipment on the premises.  SD-WAN, anyone?

Internet-overlay SD-WANs could also be much more tactical, since the only special network feature needed is the paid prioritization.  You could spin them up, expand and contract them, and change their QoS and capacity pretty much at will, because all of them would draw on an enormous reservoir of Internet bandwidth.  You could also implement 5G network slicing using SD-WAN, and better support cloud services and content delivery.

Perhaps the most interesting thing about this SD-WAN trend is the impact it could have on global telecom infrastructure.  No more specialized VPN technology, clearly, but there are two seemingly contradictory offshoots of the trend.  First, the Internet could become a truly universal data dialtone service and eventually be the only “transport” service that’s available.  Second, all services, including the Internet, could become overlays on any mixture of transport that happened to be available.  Think of the MEF’s Third Network.  Which of these might happen would likely depend on the pace of regulatory reform globally, the success of optical vendors in defining virtual-wire services, and the insight SD-WAN vendors show in promoting the notion.

The notion of an overlay-driven service network isn’t new.  I encountered it almost ten years ago when Huawei proposed the “Next-Generation Service Overlay Network” concept to the IPsphere group, and subsequently got it moving in the IEEE.  We’ve never quite gotten the idea over the finish line in terms of fulfilling its potential, but it just might now be taking that critical step.  The question may well be whether Cisco is actually going to promote SD-WAN, or is just creating a nice home for some former engineers.

The NFV Torch Has Passed, Forever, to Open Source

Look at the quarterly reports from both vendors and operators and you see all the signs that the traditional model of telco and Internet services is slipping.  This, after more than a decade of supposed progress on technologies that would change that, most recently SDN and NFV.  So here are the questions we have to ask now: have key transformation vendors given up on transformation?  Is the wave of SDN and NFV now dead, or are we transitioning to a new vision, with perhaps new players behind it?

Commoditization would seem to be the driving force today; vendors are suffering continued drops in revenue as operators place limits on capital spending.  Operators have long hoped to avoid this commoditization by “transforming” infrastructure, operations, and services.  Transformation, if it meets operator goals, could at least slow the negativity, but making transformation happen has been more difficult than most expected.  Let’s be honest—SDN and NFV have both undershot expectations.  It takes more than waving a magic SDN/NFV wand, apparently, and there are signs that vendors won’t be taking as big a role as vendors, operators, and the media all expected.

SDN and NFV have been the at-the-point technologies for transforming operators’ business models.  The initial notion was that they could radically impact capital spending, both by simplifying technology through software hosting of features and by substituting white-box switching for proprietary devices.  From the first, vendors seemed to offer only lukewarm support for the capex-driven flavor of transformation, because operator benefits would come at the vendors’ expense.

Starting last year, we saw a shift in focus for SDN and NFV, from reducing capex (which meant reducing the total addressable market for network vendors to go after) to reducing opex or improving revenues.  Operations efficiency and service agility were still framed as SDN and NFV benefits, but the opex-and-services value proposition needs a lot more moving technology parts to fulfill.  From where?  Network equipment vendors, understandably unimpressed with strategies to lower their own TAM, should have liked the opex-and-services focus better.  Not so; network vendors have still held back, even with the TAM threat off the table.  Now even vendors who don’t have incumbent network businesses to defend seem to be pulling back.

The news that HPE was closing its OpenSDN unit (which, interestingly, came from the ConteXtream acquisition, a company active in the NFV ISG at the time) is the most obvious signal of change, but it’s only the latest.  Operators tell me that in 2017 there has been a decided shift in vendor positioning.  You can read it in the media, where SDN/NFV stories have either disappeared or turned into tales of disappointment.  Vendors would still be happy with SDN/NFV success, and perhaps even happier with transformation overall, but they are now looking to sell their own products into solutions that the operators themselves are driving.  The idea that vendors would offer pre-packaged products to drive things forward seems to have vanished.

Many vendors knew from the first that “transformation” was the right approach, or at least their deep thinkers did.  The challenge was that transformation is an enormous task, one that would involve years of effort at the sales level, considerable customer education, a broad product portfolio, and so forth.  None of this looked like it would add up to sales in the current quarter, so even the vendors who saw the opportunity were reluctant to grasp it.  I think some found reasons to believe that they could take a tactical approach.  Nobody really faced the transformation issue head-on.

Even operators tended to think of transformation in technology terms.  NFV in particular was launched to solve the problem of declining profit per bit, or at least to mitigate it.  It attracted enormous operator enthusiasm, and as a result it came to appear like a technology commitment rather than a possible vehicle to ride to transformation goals.  I think the big problem both SDN and NFV fell prey to was that they became the goal rather than the path to the goal.  Not surprisingly, most of the vendors in the NFV ISG believed at some level that NFV was going to be purchased, and that it was just a matter of who would win at the vendor level.  This was about products already validated, not about solutions that needed to be proven out.  Buyer and seller, then, both forgot where the transformation goal line was.

But the NFV ISG made the critical contribution of our time—orchestration.  Service lifecycle management has to be automated to make transformation work, and orchestration is the overall process to do that.  Had NFV orchestration been made more inclusive, had it morphed into service lifecycle management and adopted full end-to-end automation as its goal, it could have become the centerpiece for transformation.  I actually tried to do that with my CloudNFV project and my (constant, and yes sometimes nagging) protests on scope on the NFV message boards.  In any event, it didn’t happen.  Nor did it happen for SDN (I did a presentation to the ONF leadership on the same points, which also got nowhere).  Instead, both SDN and NFV focused on their own incremental issues, and that’s what’s got us to where we are today.

Where we are is this: vendors didn’t push broad, benefit-rich service lifecycle management, and that forced operators to take a greater role.  AT&T’s ECOMP is the perfect example.  Since operators can’t form exclusive groups to develop things for the broad interest of the market (I saw two situations where lawyers told the operators they were violating anti-trust rules in the EU and the US by doing just that), they turned to open source.  So now the benefits that will drive the future are going to be realized by open source, which means that the vendors have played a major role in their own commoditization.

Once operators realize an open-source-driven service lifecycle management transformation, everything that SDN and NFV are and do gets subducted into that open-source effort.  It makes no sense, then, for vendors to keep pushing the technologies—why not just get on the open-source-driven transformation bandwagon?  HPE’s dropping of SDN may presage a reduction in its NFV-specific efforts too.  The success of SDN and NFV will be decided “above” the two technologies themselves, and whatever benefit vendors reap will probably come simply from selling into the opportunity created by the operators’ open-source initiatives.

We had a half-dozen vendors who could have made a complete NFV business case by 2014, including HPE who arguably had one of the most complete stories out there.  There is absolutely no technical reason why at least these half-dozen vendors couldn’t have stepped up to drive transformation forward, so what we’re seeing is the perfect example of a victory of tactical thinking over strategic planning.  Computer players like HPE had everything to gain by pushing hard for transformation of infrastructure that favored hosted features and software automation of service lifecycles.  Instead, they focused on very limited applications, and now that decision can’t be rolled back.

For years, I’ve favored the notion that vendor competition for NFV excellence would drive the industry toward the transformation goal.  Now, reluctantly, I have to admit that the vendor option is off the table.  Open source will win things, or lose them.

Don’t think this is a victory for operators, or for the market.  We’re a long way from even a useful answer to transformation in any open-source SDN or NFV project, and there’s a good chance that we’ll never get to optimality.  All along, what SDN and NFV have needed is competitive pressure driving multiple vendor solutions.  That’s why multiple open-source initiatives in the transformation and orchestration space don’t bother me; survival of the fittest could be a good thing about now.  But evolutionary forces are like geological ones, they take a long time to operate.  We could have gotten this right, from the very first.  I think a lot of vendors, and operators, will be sorry that didn’t happen.

In Search of the Founding Principles of the New Information Age

We need a new way to look at information technology, given the number of different things that are driving change.  Categories like “hardware” and “software” don’t suit a virtual world where it’s actually rather difficult to tell what or where something is.  Talking about “computing power” is complicated when the computer is virtual and is made up of variable, distributed resources.  And consider “serverless” computing (see my blog of yesterday).  Is the network indeed the computer?  Is the computer actually the network?  We’re groping around in definitions, often rooted in the past, as we try to come to terms with where we are and where we’re going.

I propose an information-centric approach, one that takes the general concept of information and divides it into three dimensions.  First is the what-we-know dimension, meaning the scope of information that we have to process.  This is expanded by contextual analysis, event-processing, IoT, or whatever you like.  Second is the where-we-know-it dimension, meaning the place where information technology is applied to information.  Think edge, core, cloud, fog, etc.  Finally, we have the information-to-insight dimension, which is the analytics and AI dimension.  Our challenge is that all of these information dimensions are in a period of major change, and the combinatory possibilities are endless.  We’re not good at dealing with endlessness, especially for tech and business planning.  That’s why we keep trying to classify things, to evolve things we know already.

The what-we-know space is growing in no small part because of the increased portability of technology.  Our phones, for example, know a lot about where we have been, how fast we’re moving, and even in some cases whether we have our “normal” gait or a different one.  This information might have been available in a subjective way in the past, but recording it in detail and transcribing it into a technology-useful form could have taken all the hours of the day.  Imagine if you had to accurately trace the walk you’re taking in terms of route, elevation gain and loss, pace, and so forth.  Because we have tech devices with us, and in things, we have IT access to a bunch of stuff that we never could have had, in a practical way, before.

The where-we-know-it dimension is essentially defined by the fact that the Internet and its associated broadband and mobile technology build a kind of parallel universe that is truly parallel.  It spreads to wherever we are, and wherever we go.  Because portable technology is usually “online”, we now have a parallel IT universe that extends across the globe (and of course beyond, though most of us aren’t going that far).  Those with portable devices live in the usual real world and also in this parallel online world, and because information is being generated all over that online world, it makes sense to presume that the same ubiquitous online-ness would be used to distribute work-processing more broadly.  Do we want to send information about us on the east or west coast to St. Louis for processing?  Why not process it where we are?

The final information-to-insight dimension is about making use of the information we now have.  You can look at this dimension in two ways—by mission or by tools.  The tool-oriented perspective is one of analytics and AI, which tells you how you make the conversion from raw data to something that has personal or business utility.  The mission perspective is about how you apply information to create insight, a perspective which can then guide tool selection.  The best example of this perspective is what I’ve called contextual relationships.  I ask “What’s that?” and I have a question that can be answered only in context.

If context is so important, then it may be that we can add some meat to the requirements for the future of information technology by starting there.  We have five physical senses (sight, hearing, smell, taste, touch), and we use them to judge our place in the world.  Ideally, contextual processing would allow us to transport IT-equivalent information into our online world.  Since vision is our strongest sense, it’s almost imperative that we be able to recognize where we are and what we’re seeing, but in the online world.  We already see software that can identify landmarks from pictures, so the technology here is available and only needs enhancement in accuracy and speed of recognition.

The other senses have their own fairly clear paths forward.  Sound recognition, in the form of speech, is evolving quickly but still needs work.  Beyond speech, we have music recognition but we have only limited capability to use either in a typical social setting.  We have technology today that can analyze “smells”.  We could envision wearable “gloves” that could convey a sense of touch into the online world, and there are examples of at least the beginning of this technology in play today.

It’s easy to see (no pun intended) the impact of sensory-driven context augmentation.  In fact, TV commercials have illustrated the not-quite-possible-yet scenarios of “looking” via augmented reality at a street and “seeing” business names appended to the shops.  Google has an app that will (sometimes) identify landmarks.  Applying this to the workplace opens the door to having devices or glasses that let a worker find the right switch or valve, compare what’s there to what should be, etc.

One of the big questions is where sensory-translation happens.  Right now, we do voice and image processing more centrally, but the location of sensory translation depends on where it’s to be used, and for what purpose.  I think it’s likely that over time, as “contextual” starts to become more actionable in real time, things will migrate toward the edge.

IoT is a good example of this, though not one that (apparently) everyone accepts these days.  Raw sensor data is very unlikely to be made available or consumed directly, for a whole variety of practical, social, and regulatory reasons.  Instead, sensor information would likely be framed into a variety of contexts and made available as contextual information.  Imagine trying to decide which sensors represent information about an intersection you’re approaching, interpreting them, and then guiding a self-driving vehicle!  More logically, you would have “intersection slots” into which vehicles could be slotted, developed by “intersection processes” that interpret the necessary data.
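Here’s a rough sketch of what an “intersection process” could look like; the sensors, lanes, and data structures are all invented for illustration.

```python
# Illustrative sketch of an "intersection process": raw sensor readings are
# interpreted once for that intersection, and vehicles consume a contextual
# summary ("slots") instead of the raw feeds.  All names are invented.

RAW_SENSOR_FEED = [
    {"sensor": "loop-17", "lane": "northbound", "occupied": True},
    {"sensor": "loop-18", "lane": "northbound", "occupied": False},
    {"sensor": "cam-02",  "lane": "eastbound",  "occupied": True},
]

def intersection_process(readings):
    """Turn raw per-sensor readings into per-lane 'slot' availability."""
    slots = {}
    for r in readings:
        lane = r["lane"]
        slots.setdefault(lane, {"free": 0, "occupied": 0})
        slots[lane]["occupied" if r["occupied"] else "free"] += 1
    return slots

# A vehicle asks the intersection process, not the individual sensors:
print(intersection_process(RAW_SENSOR_FEED))
# {'northbound': {'free': 1, 'occupied': 1}, 'eastbound': {'free': 0, 'occupied': 1}}
```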

The three dimensions I’ve talked about here are all interdependent, which perhaps is why we have so much difficulty defining what a “revolution” in IT would mean, or how it would come about.  Supply and demand combine to make a market, and so things like cloud computing will fall short of their potential as long as we don’t frame our needs in a cloud-friendly way.  But what reason would we have to postulate new information relationships absent any way to fulfill them?  The groping process that’s underway now is all part of our new age.

There is a lesson here, of course.  We are talking about very fundamental changes in how we use information, how we get it, and where we host and process it.  Any of those things alone would be difficult to get everyone’s heads around, and in combination they create a new model of IT.  Everyone is going to struggle with it, but those who manage to get onboard quickly may have an enormous competitive advantage in the future.

Serverless Computing, the “No Machine”, and the Cloud/Network Relationship

What is “cloud computing?”  There have been two implicit, competing, contradictory, definitions up to now.  The first is that it’s “traditional computing hosted in the cloud”.  That implies that the value the cloud brings is largely cost displacement.  The other is that it’s “a computing paradigm designed to support virtual, dynamic, processes using a pool of virtual resources hosted anywhere, on anything.”  That implies that computing in the cloud has to be made for the cloud.  From the inception of the cloud, the first definition prevailed.  This year, it’s losing its hold quickly, and by the end of the year that old notion will be gone.

This is great if you’re a writer/editor, because it gives the whole cloud thing a new lease on life.  Nothing is worse to a reporter than having to rehash the same tired old points because nothing new is coming along.  “News”, after all, means “novelty”.  For the technologists on the seller and buyer side, though, the new cloud definition poses major problems in realization.  Eventually even the reporters and editors will have to contend with that issue, and nobody is going to find it easy.

“The cloud”, in our second definition, defines a truly virtual future.  Applications are made up of virtual processes or components.  Things run on virtual resources.  The mappings between the abstractions we manipulate to build and run things essentially connect smoke to fog.  What we do today to build and run applications connects real things, entities that are somewhere specific and are addressed by where they are.

In an application sense, up to now, virtualization has focused on making something abstract look real by making the connection to it agile enough to follow the abstraction to where it happens to be.  We have a whole series of location-independent routing concepts that accomplish that goal.  The true cloud model would have to think differently, to identify things by what they do and not where they are.  Call it “functional mapping”.  We see a very simple example of functional mapping in content delivery networks.  The user wants Video A.  They have a URL that represents it, but when they click the URL they actually get the IP address of a cache point that’s optimal for their particular location.

A more generalized approach is offered by Linkerd, a concept that’s been described as Twitter-scale operability for microservice applications.  Linkerd (pronounced “Linker-Dee”) was the focus of a recent announcement by startup Buoyant, and it provides what Buoyant calls a “service mesh”.  The idea is to provide a process communications service that acts as a proxy between the request for a microservice/process and the fulfillment.  Instead of just doing API management, Buoyant/Linkerd adds policy control and load balancing to the picture.

By integrating itself with instance-control or scalability APIs in the container or cloud software stack, Buoyant/Linkerd can allow an application to scale up and down, while distributing work to the various instances based on a variety of strategies.  The load-balancing process also lets an application recover from a lost instance, and since the technology came from Twitter, it’s proven to be efficient and scalable to massive numbers of users.
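A rough sketch of the service-mesh idea follows.  To be clear, this is a generic illustration of per-request proxying with load balancing, retry, and an instance-control hook, not Linkerd’s or Buoyant’s actual API; every class and address here is invented.

```python
# Generic sketch of a "service mesh" proxy: pick an instance, balance load,
# retry elsewhere on failure, and let an orchestrator update the live set.
import random

class ServiceMeshProxy:
    def __init__(self, instances):
        self.instances = list(instances)    # addresses of live service instances

    def call(self, request, send, retries=2):
        """Route a request to some healthy instance, retrying elsewhere on failure."""
        candidates = random.sample(self.instances, k=len(self.instances))
        for instance in candidates[: retries + 1]:
            try:
                return send(instance, request)
            except ConnectionError:
                continue                     # try the next instance
        raise RuntimeError("no healthy instance could handle the request")

    def scale_to(self, instances):
        """Instance-control hook: the orchestrator updates the live set."""
        self.instances = list(instances)

# Usage: the caller never names a specific host, only the logical service.
proxy = ServiceMeshProxy(["10.1.0.5:8080", "10.1.0.6:8080"])
print(proxy.call("GET /quote", send=lambda inst, req: f"{inst} handled {req}"))
```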

There are still a few issues, though.  At this point, the Linkerd model doesn’t address the critical question of state or context.  Many software components are written to support multi-message exchanges where the entire sequence has to be processed for any given message to be handled correctly.  These components can’t be meshed unless they’re designed not to save state information internally.  That opens the question of “functional programming,” which doesn’t allow for saved state and thus can support service meshing and the instantiation or replacement of services without any context issues.

Both Microsoft and Amazon have functional programming support in their clouds; Amazon’s is called “Lambda” because the programming term for a function that doesn’t save state is “lambda function.”  You can run a Lambda to process an event, and it can run anywhere and be replicated any number of times because it’s stateless.  Amazon charges for the activation and the duration of the execution, plus the memory slot needed, and not for a “server,” which gives rise to the notion of serverless computing.
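In Python terms, the unit of work is just a stateless handler invoked once per event.  The sketch below uses the standard Lambda handler shape; the event fields themselves are invented for illustration.

```python
# Minimal AWS Lambda-style handler: stateless, invoked once per event,
# billed by invocation count, duration, and memory size rather than by
# server.  The event fields used here are invented for illustration.

def lambda_handler(event, context):
    # No state is kept between invocations; everything needed arrives
    # in the event (or is fetched from an external store).
    order_id = event.get("order_id")
    amount = event.get("amount", 0)
    return {
        "statusCode": 200,
        "body": f"processed order {order_id} for {amount}",
    }

# Local test harness (outside AWS, context can be None):
if __name__ == "__main__":
    print(lambda_handler({"order_id": "A-123", "amount": 42}, None))
```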

It’s easy to get wrapped around the notion of serverless computing because it’s a different pricing paradigm, but that’s not the news in my view.  I think “serverlessness” is a key attribute of the future cloud because it breaks the notion that software has to be assigned to a server, even a virtual one.  Amazon may think so too, because it’s recently announced wrapping Lambda in a whole ecosystem of features designed to make application hosting in the cloud serverless and at the same time deal with the nagging problem of state.  The future of the cloud is less “virtual machines” than no machines.

Amazon’s Serverless Computing has about nine major elements, including Lambda, orchestration and state management, fast data sources, event sources, developer support, security, and integration with traditional back-end processes.  There’s an API proxy there too, and new integration with the Alexa speech and AI platform.  It’s fair to say that if you want to build true future-cloud apps Amazon provides the whole platform.

You can see where this is heading, and probably some of the issues as well.  We really do need some new compute models for this, and one possibility is a variation on the notion of source routing and domain routing that came along (of all places!) with ATM.  Envision this lonely event popping into the process domain somewhere.  A lambda function picks it up because it’s associated with the event type, and based on some categorization it prepends a rough process header that perhaps says “edit-log-process”.  It then pushes the event (and header) to another “edit-front-end” lambda.  There, the “edit” step is popped off and a series of specific steps, each represented by a lambda, are pushed on instead.  This goes on until the final microstep of the last “process” has completed, which might generate a reply.

In this approach, the “application” is really a set of steps that are roughed out at the event entry point and refined by a gatekeeper step for each of the major rough steps.  Nobody knows where anything is, only that there are events and processes that are swirling around out there in the aether, or waiting to be instantiated on something in the Great Somewhere when needed.  Function-source-routing, perhaps?
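Here’s a small sketch of that flow.  The step names and lambdas are invented, and the “routing header” is just a list used as a stack of steps, so this is an illustration of the idea rather than any real framework.

```python
# Sketch of the "function-source-routing" idea described above: an event
# carries a stack of process steps; a gatekeeper step expands a rough step
# into specific lambdas until the list is exhausted.  Entirely illustrative.

def edit_front_end(event):
    # Gatekeeper: replace the rough "edit" step with its specific microsteps.
    event["steps"] = ["validate", "normalize"] + event["steps"]
    return event

def validate(event):  event["valid"] = True;  return event
def normalize(event): event["payload"] = event["payload"].strip(); return event
def log(event):       print("logged:", event["payload"]); return event
def process(event):   event["result"] = event["payload"].upper(); return event

LAMBDAS = {"edit": edit_front_end, "validate": validate,
           "normalize": normalize, "log": log, "process": process}

def route(event):
    """Pop the next step off the header and hand the event to that lambda."""
    while event["steps"]:
        step = event["steps"].pop(0)
        event = LAMBDAS[step](event)
    return event

# Entry point prepends the rough header "edit-log-process" and lets it unwind.
incoming = {"payload": "  hello cloud  ", "steps": ["edit", "log", "process"]}
print(route(incoming)["result"])   # -> "HELLO CLOUD"
```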

We are a long way from understanding and utilizing this sort of approach fully, but clearly not far from dabbling in it to the point where it generates enough opportunity to interest the cloud providers like Amazon.  You can use the model as a front-end to traditional applications, to link them to mobile empowerment or IoT for example.  That will probably be the first exposure of most companies to the “new cloud” model.  Over time, highly transactional or event-driven applications will migrate more to the model, and then we’ll start decomposing legacy models into event-driven, lambda-fulfilled, steps.

This is where the real money comes in, not because of the technology itself but because of what the technology can do.  Event-driven stuff is the ultimate in productivity enhancement, consumer marketing, contextual help, and everything else.  We, meaning humans, are essentially event processors.  When we can say the same thing of the cloud, we’ll have finally gotten computing onto our own page.