Is There a Value in a “Software-Defined Internet?”

How personal should a network be?  The vast majority of things I could find on the Internet, I never want to see.  The vast majority of people who could reach me, or whom I could reach, are people I never want to talk with.  Enterprises tell me that the great majority of the possible user-to-application or worker-to-data relationships their networks make possible are barred for security/compliance reasons.  Spam is defeating the utility of email for many, and search advertising is making finding useful stuff almost impossible.  Are we doing the right thing here?  Is there an alternative?

How flexible should a network be?  We surely have applications today that are fine with best-efforts services.  We surely have applications that demand some fairly rigorous SLA.  Can we build an efficient infrastructure to satisfy both these goals?  Is the extremely low cost of Internet bandwidth creating a kind of destructive competition for better-grade services, and preventing them from developing?

I’ve been looking over enterprise responses to questions on the Internet, email and messaging, and virtual and private networks, and it’s interesting to see what the decision-makers think.  It’s also interesting that they respond differently to issues and questions, depending on whether they are talking as representatives of their business or as consumers.  The differences themselves may tell us a lot about the future.

As consumers, decision-makers are concerned about loss of privacy and what they see as the distortion created by ad sponsorship.  Every decision-maker thinks that too much of their personal information has been captured.  Do-not-track doesn’t work, they say.  Most cite examples that I can identify with; I do a search on a camera that I happen to own and next time I go to a news website I’ll see an ad for the camera, do-not-track notwithstanding.

As decision-makers, their big problem is bias in information.  While nearly everyone agrees that there is more information on the Internet than they’d ever have had access to before, most also believe that the information can be trusted less.  Back in 1991 when I started surveying what influenced buyers of technology, there were at least two technical publications that were in virtually everyone’s top five.  There are none there today.  People believe that online opinions, even consumer reviews, are bought and paid for.

Of course, the same people who worry as consumers about privacy are eager to exploit online advertising on behalf of their own companies, and most defend paying in some way or another for editorial mentions or analyst opinions.  They also say today that it’s smarter to spend to promote what you have than to pay to figure out, then do, the right thing in the market.

It’s obvious that you could make a sociological thesis about this topic, but that’s probably not helpful to technologists who read my blog.  Two tech questions suggest themselves: have consumerism and the Internet contaminated our whole model of communication and information dissemination to the point where it has to be fixed, and what might a fix look like?

Skype has an interesting approach to communication that might offer a starting point.  While you can set up your Skype account to permit calls or messages from anyone, you can also say that you’ll accept only contacts from someone in your contact list.  That forces others to request to join your list as a condition of communication.  LinkedIn lets people you’ve connected with send you messages but limits what others can send you.  Explicit communication, based on what is in effect an invitation or closed user group, has been around for a long time.

One fair question to ask is whether systems like this should be used for email, or at least be made available.  Yes, you can block email except from a safe senders list, but how does somebody get added to that if they can’t contact you?  It’s obviously possible to do better at controlling email access, and were that done it’s possible that email would be less of a risk and an intrusion than it is now.
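To make the idea concrete, here’s a minimal sketch (in Java, with class and method names I’ve invented purely for illustration, not any real mail API) of one way a closed-user-group mail filter could combine a safe-senders list with an explicit join-request path, so that an unknown sender can ask for admission without being able to deliver arbitrary mail:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: admission-controlled messaging, not a real mail system.
public class ClosedGroupInbox {
    private final Set<String> safeSenders = new HashSet<>();
    private final Set<String> pendingRequests = new HashSet<>();

    enum Disposition { DELIVERED, JOIN_REQUEST_QUEUED, DROPPED }

    // Only admitted senders can deliver; everyone else can do exactly one
    // thing: ask to be admitted.  The ask itself carries no message body.
    public Disposition receive(String sender, String body, boolean isJoinRequest) {
        if (safeSenders.contains(sender)) {
            deliver(sender, body);
            return Disposition.DELIVERED;
        }
        if (isJoinRequest) {
            pendingRequests.add(sender);          // owner reviews these later
            return Disposition.JOIN_REQUEST_QUEUED;
        }
        return Disposition.DROPPED;               // unknown sender, no request made
    }

    // The owner explicitly admits (or ignores) each pending requester.
    public void admit(String sender) {
        if (pendingRequests.remove(sender)) {
            safeSenders.add(sender);
        }
    }

    private void deliver(String sender, String body) {
        System.out.println("Mail from " + sender + ": " + body);
    }

    public static void main(String[] args) {
        ClosedGroupInbox inbox = new ClosedGroupInbox();
        System.out.println(inbox.receive("stranger@example.com", "buy now!", false)); // DROPPED
        System.out.println(inbox.receive("colleague@example.com", "", true));         // JOIN_REQUEST_QUEUED
        inbox.admit("colleague@example.com");
        System.out.println(inbox.receive("colleague@example.com", "lunch?", false));  // DELIVERED
    }
}
```

The point of the sketch is that the “how does somebody get added?” question has an answer: the request to join is itself a constrained, content-free form of contact.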

On the network side, there are both subtle and obvious questions.  In the latter category we have the question of whether virtual networks should be composable on a personal level.  Could I, for example, build a virtual-Internet-of-one that contains only the sites I admit?  Could I then, based on a search, find other sites and elect to admit them?

The subtle question, which also relates to virtual networking, is whether the fact that the Internet is a low-cost and ubiquitous underlayment for virtual-network services is effectively limiting the virtual-network space by creating what amounts to a polarized option set.  You can pay little and get an Internet overlay, or you can pay a whole lot to get a true private network.  In the former case you get best-efforts services, you still have DDoS issues, etc.  In the latter you can have a real SLA and more security.  Wouldn’t it be nice to have a more graduated set of options, opening more-than-best-efforts to a larger community?

There’s obviously no technical barrier to offering SLAs on the Internet, since we can do SLAs on private IP.  The problem is one of public policy, which relates to my opening question of whether our consumer vision for the Internet is impacting our overall vision of networking.  Settlement and “paid prioritization” are seen as being anti-consumer, but they’re mandatory if the Internet as a ubiquitous data dialtone is going to be meaningful.

Operators tell me that the biggest problem in profit compression is the Internet.  Internet bandwidth is low-margin to begin with, and it’s also broadly available as the foundation for virtual network services and SD-WAN.  This means that it becomes more difficult to develop an independent QoS-capable network against the magnetic pull of the Internet’s low costs.  It’s also difficult to personalize the Internet because that would, to many, smack of censorship even if the users themselves implemented the subsetting.  If we presume that the technical pathway to a true IP dialtone lies in the expansion of Internet infrastructure to be IP-dialtone infrastructure, the barriers are probably insurmountable.

Should we be allowed to “subset” the Internet both in terms of virtual subnetworking and in terms of QoS?  Should the fabric of the Internet support any valid business mission, and the application of that fabric then conform to public-policy goals?  The only way to make everything work is to make the Internet into a virtual network too, into a “software-defined Internet” or SDI.

An SD-WAN is an overlay network with virtual endpoints set as needed.  SDI would be the same thing, and the underlayment could then be either a global IP network or the MEF’s Third Network…or any combination of underlayment that offers you the scope and QoS that you want.  Since the Internet is defined by who’s on it and how it’s addressed rather than by the technology used, this would let it continue to conform to consumer-driven regulatory policy and even offer only best-efforts services.  But this approach would also let you personalize your view of the Internet, and other virtual-network services for business could coexist on the underlayment.

There has actually been a project to address this vision, started by Huawei almost a decade ago and codified by the IEEE in p1903, Next-Generation Service Overlay Network or NGSON.  The architecture for NGSON is described here, and the project is still active, though I’ve not seen much publicity on the concept.  What NGSON seeks to do, technically, is to create an overlay that can bind applications, underlayment features, and user/provider policies into a single element that can then serve as an exchange point for all of these components and stakeholders.

NGSON joins the MEF’s Third Network as a kind of generalized overlay model, and there are a half-dozen IETF proposals that introduce virtualization concepts to bind an overlay and IP underlayment, obviously to the benefit of both the IP router vendors and those with large investments in routers.  I think that in theory any of these could be used to build an SDI, but the mechanism for market adoption would be difficult.

Regulatory policy on consumer networks has shifted to a more consumeristic bias over the last five years in both the US and Europe.  The current picture would appear to put operators in a difficult position were they to adopt an overlay/underlay model that explicitly allowed for parallelism of consumer services and “the Internet”.  That’s certainly true in the US, for example.  In addition, a transformation to an SDI presents some major issues in terms of sunk costs and evolution.

I think it’s clear that the Internet isn’t going to serve all of our network needs, and that the Internet as currently structured forces unfavorable privacy trade-offs and also limits service quality.  However, transforming it directly would demand a major shift in policy, something that’s not likely to gain support in a polarized political environment.  What might have to happen is for SDN to transform networks from the bottom, and implementation of an overlay model could then evolve within that transformation.

Making “Digital Transformation” Real

Brocade did an interesting paper on the topic of digital transformation, something you’ll recall was also a fixture of the Netcracker launch of its Agile Virtualization Platform.  Reading it, it occurred to me that the concept of “digital transformation” is widely accepted and not usefully defined.  That was one of my conclusions on the Netcracker report that also used the term, you’ll recall, and it’s clearly demonstrated in the Brocade survey.  It also occurred to me that some statistical market research I did several years ago might point to a definition, and perhaps even a path forward to achieving the goal.

The Brocade paper documents a global survey that, to me, shows IT organizations groping with the critical question of relevance.  They’re facing budget constraints, “shadow IT” and other things that demonstrate that line departments and senior management want something from IT, something that IT can’t deliver.  Is that something “digital transformation?”  Perhaps it is, if we define the concept appropriately.

Facts and industry history might be a good place to start our quest.  What I found in my research is that the rate of change in IT spending relative to changes in GDP over the last 50 years, when graphed, forms a sine wave.  If you then lay in key IT innovations along the timeline, you see that the peaks of the wave (when IT spending grows faster than GDP) came when a paradigm shift in IT opened new benefit sources for exploitation.  The curve then rose for four to eight years as businesses absorbed the benefits of the new paradigm, then dipped down as things fell into cost consolidation when all the gains had been realized.  Then they picked back up again in the next cycle.  There were three major IT cycles prior to about the year 2000.  There have been none since.

I thought when I uncovered this relationship that we were now awaiting the next cycle, a cycle that would reignite benefit-driven IT.  I realize now that’s not the answer.  What we’re looking for is an agile IT model that doesn’t demand big technical paradigm shifts at all.  Every year, we get more invested in technology, and cyclical transformations that involve new computing models (minicomputers, PCs) are simply not practical.  What’s needed from “digital transformation” is a restructuring of IT to allow it to couple continuously with business operations through all the twists and turns.  A four- or eight-year realization cycle?  Forget it.  Digital transformation has to facilitate almost instant adaptation, and then re-adaptation.  Nothing singular and simple is going to generate it.  It’s a whole new IT and networking model.

That starts at the top.  If you want “digital transformation” you have to transform the relationship between applications and the business, which starts by redefining how you conceptualize the role of applications in supporting workers.  In the past, we’ve used software to create a framework for production, and built human processes to fit the software model.  That’s why the advent of minicomputers, personal computers, and online services has been transformative—they let us build the IT model “closer” to the worker.

The line departments think this “framework” has become a cage, though.  Because software processes are highly integrated with each other and increasingly just touch the worker through a GUI, they aren’t particularly responsive to changes or opportunities that demand a more fundamental shift.  We can see this in the way businesses struggle to support mobile work.  Mobile-worker productivity support is effectively contextual, personalized IT.  That eliminates an application-driven flow in favor of linking IT support to worker-recognized events.

It’s not happening today.  In my most recent exchange with enterprises, I found that almost three out of four said their mobile strategy was to make their applications work on a mobile device.  It should have been focused on defining the right information relationship to support a mobile worker.  But how?  By making applications completely event-driven.

Making something event-driven means taking the workflow structure out of applications, and instead looking at any activity as a series of events or “happenings” that suggest their own process support tasks at the application level.  This, you’ll undoubtedly recognize, is an enterprise version of the event-driven OSS/BSS trend in the operator space.  The realization of digital transformation must be to first create an event-driven software structure, then create an agile platform to host it, and finally to create a set of network services that properly connect it.

Microservices, meaning small functional atoms that use web-like (“RESTful”) interfaces to obtain the data they operate on, are generally considered to be the key to making OSS/BSS event-driven, and that’s also true for enterprise software.  However, it’s not a simple matter to change software to create microservices.  Many software components are “stateful,” meaning that they store data across multiple interactions, and that forces them to be used in a flow-driven rather than event-driven way.  Even so, it’s almost certain that microservices would be the first step in supporting a realistic model for digital transformation.
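As a rough illustration of the difference, here’s a sketch in recent Java (records need Java 16 or later; every name here is invented) of an event handler written as a stateless microservice: everything it needs arrives with the event, and anything it must remember goes to an external store rather than into the component itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an event-driven, stateless handler.
public class EventDrivenSketch {

    // An event carries everything the handler needs: type, subject, payload.
    record WorkerEvent(String type, String workerId, String payload) {}

    // State lives outside the handler (here a map standing in for a database),
    // so any instance of the handler can process any event.
    interface StateStore {
        String get(String key);
        void put(String key, String value);
    }

    static class InMemoryStore implements StateStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        public String get(String key) { return data.get(key); }
        public void put(String key, String value) { data.put(key, value); }
    }

    // The "microservice": no fields, no accumulated state, just event in, result out.
    static String handle(WorkerEvent event, StateStore store) {
        switch (event.type()) {
            case "ORDER_SCANNED":
                store.put(event.workerId() + ":lastOrder", event.payload());
                return "Recorded order " + event.payload();
            case "STATUS_QUERY":
                return "Last order was " + store.get(event.workerId() + ":lastOrder");
            default:
                return "Ignored " + event.type();
        }
    }

    public static void main(String[] args) {
        StateStore store = new InMemoryStore();
        System.out.println(handle(new WorkerEvent("ORDER_SCANNED", "w42", "PO-1001"), store));
        System.out.println(handle(new WorkerEvent("STATUS_QUERY", "w42", ""), store));
    }
}
```

Because the handler holds nothing between events, it can be replicated, relocated, or invoked out of sequence—exactly the flexibility a flow-driven, stateful component gives up.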

Underneath this first step there are a few things that could be done at the developer level.  One is embodied, in Java, in the Open Services Gateway initiative, or OSGi.  OSGi lets components be developed and deployed either as locally connected pieces of something or as distributed elements.  If you were to develop all software components using OSGi principles (in whatever language you like), you could host stuff where it makes sense, meaning either as network-connected or locally bound processes.  This is critical because you can’t build efficient software by putting every little function on the network as a separate microservice, but what can be made network-hosted and what for efficiency reasons has to be local would vary depending on the specifics of the application.
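For what it’s worth, here’s the flavor of the approach: a minimal bundle activator that registers a service against an interface, so callers bind to the interface and never care whether the implementation is local or remote.  This is only a sketch against the standard org.osgi.framework API and needs an OSGi runtime (Equinox, Felix, and so forth) to actually deploy; the GreetingService names are my own inventions.

```java
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

// Hypothetical service contract; consumers depend only on this interface.
interface GreetingService {
    String greet(String name);
}

// Local implementation; a network-connected proxy could be registered
// under the same interface without the consumer changing at all.
class LocalGreetingService implements GreetingService {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// The bundle activator publishes the service into the OSGi registry on start.
public class GreetingActivator implements BundleActivator {
    private ServiceRegistration<GreetingService> registration;

    public void start(BundleContext context) {
        registration = context.registerService(
                GreetingService.class, new LocalGreetingService(), null);
    }

    public void stop(BundleContext context) {
        if (registration != null) {
            registration.unregister();
        }
    }
}
```

The consumer looks the service up by interface, so relocating the implementation—same process, same host, or across the network—doesn’t ripple into the callers.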

Another essential transformation in software development is functional programming.  Functional programming basically means that software elements (“functions” in development terms) take all their arguments from the thing that invoked them and deliver all their results back the same way.  Everything is “passed” explicitly, which means you don’t collect stuff internally and can’t accidentally become stateful.  Microsoft and Oracle (with C# and Java 8, respectively) support this today.  Functional programs could easily be shifted from internally connected to network-connected, and they’re also much more easily adapted to changing business conditions.
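A tiny Java 8 example of the style: each function takes everything it needs as arguments and returns its result, keeping nothing internally, so the same logic could run in-process today and behind a network interface tomorrow.  The pricing scenario is purely illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FunctionalStyleSketch {

    // Pure functions: results depend only on the arguments, nothing is stored.
    static double applyDiscount(double price, double rate) {
        return price * (1.0 - rate);
    }

    static double addTax(double price, double taxRate) {
        return price * (1.0 + taxRate);
    }

    public static void main(String[] args) {
        List<Double> listPrices = Arrays.asList(100.0, 250.0, 19.99);

        // Functions compose explicitly; the pipeline itself is a value that
        // could be handed to a local thread or a remote worker unchanged.
        Function<Double, Double> pricing =
                ((Function<Double, Double>) p -> applyDiscount(p, 0.10))
                        .andThen(p -> addTax(p, 0.08));

        List<Double> finalPrices = listPrices.stream()
                .map(pricing)
                .collect(Collectors.toList());

        System.out.println(finalPrices);
    }
}
```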

Hosting this framework is what justifies containers versus virtual machines.  The more you componentize and microservicize something, the more inefficient it is to host a dedicated version of OS and middleware for every software instance you deploy.  In fact, in this framework it’s really important to understand the transit delay budget for a given worker’s support and then distribute the components to ensure that you can meet it.  Data center design and connection is paramount, and data center interconnect is a close second.  Networking has to create and preserve a distributed and yet responsive virtual platform to run our agile microservices.
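A back-of-the-envelope sketch of the placement test this implies: given a per-interaction delay budget for a worker, add up the hops a candidate component distribution would introduce and reject placements that blow the budget.  The latency numbers and location names below are illustrative assumptions, not measurements.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: checks whether a candidate component placement fits
// within a worker's round-trip delay budget.
public class DelayBudgetSketch {

    // Assumed one-way latencies in milliseconds between hosting locations.
    static final Map<String, Integer> LINK_MS = Map.of(
            "device->edgeDC", 5,
            "edgeDC->regionalDC", 12,
            "regionalDC->centralDC", 30);

    static int pathDelay(List<String> hops) {
        // Round trip: traverse the listed hops out and back.
        return 2 * hops.stream().mapToInt(hop -> LINK_MS.get(hop)).sum();
    }

    public static void main(String[] args) {
        int budgetMs = 60;   // assumed per-interaction budget for this worker

        List<String> edgeOnly = List.of("device->edgeDC");
        List<String> deepPath = List.of("device->edgeDC", "edgeDC->regionalDC",
                                        "regionalDC->centralDC");

        System.out.println("Edge placement: " + pathDelay(edgeOnly)
                + " ms, fits=" + (pathDelay(edgeOnly) <= budgetMs));
        System.out.println("Central placement: " + pathDelay(deepPath)
                + " ms, fits=" + (pathDelay(deepPath) <= budgetMs));
    }
}
```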

All of these technical shifts are recognized today.  Most developers know about OSGi and functional programming.  Everyone knows about containers and microservices.  What seems to be lacking is an architecture, or even a philosophy, that connects them all.  Could it be that the concept of digital transformation is shrouded in definitional shadows at the high level, and obscured by technical specialization below?  The pieces of digital transformation are all laid out before us, but “us” is organizationally divided and unaware of the high-level goal.  That’s why we can’t even define what “digital transformation” means.

Knowledge is power, they say, but only if it’s a precursor to an informed decision.  Surveys can tell us there isn’t a consensus on digital transformation today, but since there isn’t one to know, we can’t survey to find it.  I think event-driven, agile IT that supports rather than defines work is what digital transformation has to be.

Presuming my definition here is the right one, it’s a compelling goal but it has to be justified for each company by a business case and not just by consensus, and that business case has to be decomposed into specific steps that can be approved and adopted.  Surveys show commitment and confusion, but confusion has to be resolved if investment is to come.  Now that we know the buyers want something, we have to define and then take the appropriate steps.  If Netcracker or Brocade (or Cisco, Dell, IBM, HPE, Microsoft, Oracle, Red Hat…) can create that chain from concept to benefits to technology steps, then they’ve started the march along the path to digital transformation, to their benefit and ultimately to the benefit of the industry.

Does Google’s New Personal/Home Assistant Change the OTT Game?

Google opened a lot of interesting topics with its developer conference this week, and I think there’s a common theme here that aligns with other industry moves and foretells something even more important.  We are moving closer to the concept of the digital assistant as our window on the world, and that could open a big pathway to a much greater use of cloud computing.  In fact, it could become the largest cloud application.

The mobile apps phenomenon was built, perhaps primarily, on the basic truth that a mobile user does things differently than a desktop user.  Not only are they restricted in terms of interaction real estate, they are more tactical in their information needs.  They want answers and not information, and while that goal may have been formulated in mobility, it’s maturing even in the home.  Amazon’s Echo and Alexa seem the inspiration for Google’s assistant and Home products.

Home control based on buttons and dials is self-limiting simply because there’s a limit to the number of these direct controls a user can keep track of.  Voice control is a much more intuitive way of handling something—tell your digital assistant to turn off a light or ask it what conditions are outside.  The number of things you can control may expand, but your pathway to exercising control is always that singular assistant.

Google seems to have drawn inspiration from another source—Facebook—for the second big move, which is the Allo app.  Allo is a chat-driven agent, similar to Facebook’s enhanced Messenger.  What seems important in Allo’s positioning is the enhanced notion of context, which you can see in the quote on Google’s blog, “Because the assistant understands your world….”

Context is critical in exploiting the interest in wearable technology, the support for which is also being increased according to Google.  Wearables offer a closer-to-life interface than a phone or tablet, which means that they naturally expect a more contextual level of support.  The notion of suggested responses to texts, for example, demonstrates contextual value and at the same time makes a watch accessory more useful.

Google’s emphasis on these points isn’t new, either to Google or to the market, but I think it makes it clear that a contextual personal agent war is going to happen, involving Amazon, Apple, Google, and Microsoft.  That’s going to accentuate both context and personal agency, and that’s what could be a game-changer, in terms of network and cloud infrastructure and even in terms of the OTT revenue model.

Logically speaking, both contextual input collection and personal agent analytics would be more effective if they were hosted locally to the user.  The most demanding contextual analysis is surely based on geography, both of the specific user and the user’s social and retail frame of reference.  IoT is an example of a contextual resource, and if you’re going to analyze the conditions in New York it makes sense to do that in New York because otherwise you’re hauling telemetry a very long way.  Similarly, if you have a New York user asking questions they’re probably relating to that user’s home, work, or immediate environment.

All of this argues for a wider distribution of cloud resources, and I think this is magnified by any context-and-agent wars among vendors.  Google probably has greater geographic scope than the others, so wouldn’t they want to play on that benefit?  And if there’s wider distribution of cloud resources then there are more resources local to any user and any mission, which could encourage competition among cloud users for “local” facilities whose propagation delays are smaller.

The hosting of agent and contextual processes is clearly a cloud opportunity, but it also has implications in networking.  If we assumed that every user went to an agent approach, then search-related and even casual web access might all migrate there, which would mean that non-content traffic could be short-circuited into a cloud agent rather than sent to a user/device.  While content traffic is the great majority of traffic overall and the largest source of traffic growth, most content traffic really doesn’t transit the Internet, but lives inside content delivery networks.  Something that frames web flows differently might have a very large impact on how “the Internet” really gets connected.

One of the major challenges that this happy transformative trend has to face is that it potentially undermines the whole OTT model.  An agent, at least a good one, doesn’t deliver search results.  Obviously a verbal request demands a terse verbal response.  That means that the market model shifts away from search, which means that Google in particular has to be thinking about how to become a leader in delivering “contextual advertising”.  This is yet another incentive for context and agency to expand greatly in importance.

I’ve said before that neither contextual communications nor personal agency can work well if you assume all the data is collected and correlated inside a mobile device.  What has to happen is a combination of two things.  One, a personal agent has to evolve to become a personal cloud process that’s doing the analyzing and responding, through either a mobile device or home-control portal.  Second, contextual information has to be collected and subject to ad hoc analytics to generate useful insights into the meaning of a request, based on social, environmental, and retail conditions.  These notions can be expanded to provide ad support.

Contextual services of any sort could be presented through an API for use, and access could be either paid or based on ad sponsorship.  Environmental and geographic context readily adapt to this approach; you can imagine merchants promoting their sales as part of an application that guides someone to a location or along a path, particularly if they’re walking.  Even the personal agent processes could be sponsored or paid on a subscription basis.

Google may be at a disadvantage versus the other players (Amazon, Apple, and Microsoft) who might address personal agents and contextual programming, simply because the shift we’re discussing would directly impact their primary revenue stream.  The other players have a more retail-direct model that relies not on sponsorship but on user payment for the services, directly or through purchase of a device.  However, because these trends are largely promoted by the increased dependence on mobile devices, it’s hard to see how Google could simply refuse to go along.

Facebook might be the determinant here.  Context is social as well as geographic, and Facebook has a direct link to the social context of most mobile users.  They know who’s connected to whom, what’s happening, and increasingly they can tell who’s communicating directly.  Adding geographic context to this is probably easier than adding social context to geography, at least in the simple sense of locating the user(s) in a social relationship.  IoT would be a difficult addition for any player.  Could Facebook launch an IoT approach?  It seems outlandish, but who would have thought they’d have launched Open Compute?  And Facebook’s core application isn’t subject to disintermediation like search is.

Not all face-offs end up in active conflict, so this contextual-agency thing may or may not mature to the point where it could really impact the market.  If it does, though, it’s going to bring about an almost-tectonic shift in the business model of OTT and a truly tectonic shift in the organization and connection of online information.

Can Second-Tier Vendors Win in a DCI-Centric Model of Infrastructure Evolution?

Juniper had a Wall Street event earlier this week and analysts used terms like “constructive” and “realistic” to describe what the company said.  The central focus in a technical sense was SDN and the cloud, not separately as much as in combination.  Juniper’s estimates for growth through 2019 were slightly ahead of Street consensus, so the question is whether the characterizations of Street analysts are justified, not only for Juniper but for other wannabe network vendors still chasing the market leaders.

Juniper isn’t alone in saying that the cloud, and the SDN changes likely to accompany it, are going to drive more revenue.  The view of most network vendors is that sure, “old” model spending is under pressure, but “new” model spending will make up for it.  There are of course variations in what constitutes the old and new, but I think it’s fair to say that most vendors think the cloud and SDN will be a growth engine.  How much of an engine it will be, IMHO, depends on how effectively vendors address the drivers that would have to underpin the change.

Let’s start with the optimistic view.  If my model is correct, carrier cloud (driven by SDN and NFV) would add over 100,000 new data centers globally.  All of these data centers would be highly connected via fiber, and obviously they’d be distributed in areas of heavy population and traffic.  If we were to see these data centers as deployed purely for cloud computing, that in itself would generate a decent opportunity for data center interconnect (DCI).

If we presumed these data centers were driven more by SDN and NFV it could get even better.  For example, if all wireline broadband were based on cloud-deployed vCPE, then all wireline access traffic would be aggregated into a data center, which means that it would be logical to assume that almost everything in wireline metro aggregation would become a DCI application.  And given that mobile infrastructure, meaning IMS and EPC, would also be SDN/NFV-based, the same would be true on the mobile side.  All of that would combine to create the granddaddy of all DCI opportunities, to the point where most other transport fiber missions except access would be unimportant.

If I were a vendor like Juniper with a commitment to SDN and the cloud, this is the opportunity I’d salivate over, and shortly after having cleaned myself up, I’d be looking at ways to promote this outcome in infrastructure/market evolution terms.  It’s here that problems arise, for Juniper and anyone else who wants to see a DCI-centric future.

The media loves the cloud, but the fact is that even Amazon’s cloud growth hasn’t been able to get cloud computing much above the level of statistical significance in terms of total global IT spending.  We still have a lot of headroom for growth, but if we assume that enterprises’ own estimates of cloud penetration are accurate, we would probably not see cloud computing generating even a fifth of that hundred thousand data centers.  Most significantly, cloud computing doesn’t drive the edge-focused deployment of data centers that SDN and NFV do, and thus doesn’t compel the same level of interconnection.  You have fewer bigger data centers instead.

There is nothing a network equipment vendor can do to promote “traditional” enterprise cloud computing either.  That growth arises from the transfer of applications that fit the cloud profile out of the data center, and how a network vendor could influence that is unclear, to say the least.  For network vendors, in fact, the best way to promote cloud computing growth would be to get behind a cloud-centric (versus mobile-connection-centric) vision of IoT.  Network vendors don’t seem psychologically capable of doing that, so I think we’d have to put encouraging cloud computing as the driver for our DCI explosion off the table.

SDN as a driver, then?  Surely SDN and the cloud seem to go together, but the connection isn’t causal, as they say, in both directions.  If you have cloud you will absolutely have SDN, but you can’t promote cloud computing just by deploying SDN unless you use SDN to frame a more agile virtual-network model.

This is a place where Juniper could do something.  Nokia’s Nuage SDN architecture is in my view the best in the industry as a unified SDN-connection-layer model, but Juniper’s Contrail could be the second-best.  Juniper even has controller federation capability to allow for domain interconnection.  The problem for both vendors seems to be that SDN used this way would transform networks away from traditional switching/routing, and so it could hasten the demise of legacy network revenues.  Would SDN make it up?  Perhaps it would change market leaders, but it’s hard to say why operators would adopt SDN on a large scale as a replacement for traditional L2/L3 if it were more expensive.

Which gets us to NFV.  NFV as a means of creating agile mobile infrastructure is the most credible of the evolutionary-NFV applications.  The challenge is whether a vendor who isn’t a mobile-infrastructure player can drive the deployment, especially given that Ericsson, Huawei, and Nokia all have NFV stories to tell.  Obviously, any major mobile-infrastructure NFV deployment could create an explosion in the number of cloud data centers and drive DCI, but fortune would likely favor vendors who were actually driving the deployment.

The big thing about NFV data centers is the potential that they’d be widely distributed and that they’d be natural focus points for service traffic.  That, as I said up front, is what would make them revolutionary in DCI terms.  The obvious question is whether the mobile-infrastructure players who could drive the change would benefit enough from it—data centers would house servers after all, and DCI replacement of traditional metro infrastructure would impact most of the big vendors by cutting switching/routing spending even faster (and further).

Ericsson and Cisco would seem to have an edge here because they have a server and data center strategy that would give them an opportunity to gain revenue from a shift to hosted, DCI-centric, metro infrastructure.  Ericsson has also been a strong player in professional services, and Cisco’s quarterly call this week showed they had a significant gain in professional services and that they are stressing data center (UCS and the cloud) infrastructure in their profit planning.  In fact, Cisco is making a point of saying they are shifting to a software and subscription revenue model even for security.

Conceptually, smaller players in an industry should have first-mover advantages, but in networking in general (and with Juniper in particular) the smaller players have been at least as resistant to change as the giants.  Juniper actually launched a software-centric strategy at a time when Cisco was in denial with respect to just about every network change—they recognized transformation and the cloud at least two years earlier than the industry at large, and they had some product features (like separation of control and data plane) that could have given them an edge.  They just didn’t have the market mass or insight to make good on their own thought leadership.

That’s what will make the DCI opportunity difficult for any second-tier vendor.  The drivers of the opportunity are massive market shifts, shifts that will take positioning skill, product planning, and just plain guts to address.  Especially now, because the giants in the space have awoken to the same opportunity.

Netcracker’s AVP: Is This the Right Approach to SDN and NFV?

I had an opportunity this week to look over some material from Netcracker on their notion of a “digital service provider”, part of the documentation that relates to their Agile Virtualization Platform concept.  I also reviewed what was available on the technology and architecture of AVP.  I find the technology fascinating and the research and even terminology a little confusing.

Netcracker is an OSS/BSS player, of course, and as such they represent an interesting perspective on the transformation game.  My surveys say that the OSS/BSS vendors are obviously more engaged with the CIO, but they are also better-connected with the CFO and CEO and less connected with the COO and CTO.  That makes them almost the opposite of the network equipment vendors, and that alone means that their vision could be helpful in understanding what’s going on.  It’s also a chance to compare their views with what I’ve learned in my own contacts with operators, so let’s start at the top.

What exactly is a “Digital Service Provider” or “digital transformation,” to use the TMF term?  Obviously not just a provider of digital services, because “digital” in a strict sense is all there is these days.  I think what both concepts are getting at is that operators need to be able to create a wider variety of services more efficiently and quicker, which means that software and IT have to play a larger role—perhaps even the dominant role.  So the notion of AVP is to facilitate that.

What drives operators to want a digital transformation, says the material, is almost completely reactive.  Customers demand it, revenue gains depend on it, competition is doing it…these are indications of a market driven by outside forces rather than one trying to get ahead of the curve.  It’s not that operators are being dragged kicking and screaming into the transformation, perhaps, but they are surely not romping off of their own accord.

The barriers to achieving the transformation are equally interesting, or at least one point is.  Operators told Netcracker that technical factors like operations and integration were the most important inhibitor only about a quarter of the time.  Factors like staffing and skills and culture were far more important in the survey, and perhaps most interesting of all was the fact that only about 15% of operators seemed to be groping for solutions—the rest said they either had transformed or were well on their way.

I have to confess I have a bit of a problem with these points, for two reasons.  First, it would seem the survey shows that AVP is too late and doesn’t address the main issue set, which is skills and culture and not technology.  Second, it’s hard to see how Netcracker or anyone else would have much of a shot at solving market problems if 85% of the buyers don’t need a new approach.

My own surveys have yielded different responses.  The overwhelming majority of operators tell me that their driver for change is profit compression for connection-oriented services.  Only a small percentage (and almost all of them Tier Ones or MSPs) have an approach lined up, and an even smaller percentage says they’ve made substantial progress implementing one.  Thus, my own data seems to make Netcracker’s case for opportunity more strongly.

Interestingly, a different Netcracker document, the AVP brochure, frames it differently.  There the big problem is network resources and configuration, with staff and culture second and third, and cost and operations processes and systems trailing.  This brochure also lays out three reasons for the “slow process” (recall that the other one says only 15% are lagging).  These are commercialization uncertainty, operational complexity, and organizational misalignment.  The last of these corresponds to the staff/culture point, and I’d say that the other two are different perspectives on the “resources and configuration”, “cost”, and “operations processes and systems” points.  I don’t think the inconsistencies here are fatal issues, but they do create a bit of confusion.

My surveys say that operators are generally committed to a two-prong approach.  In the near term, they believe that they have to make operations processes more efficient and agile, and they believe this has to be done by introducing a lot of software-driven automation.  In the longer term, they believe that they need to find revenue beyond connection-based services.

AVP is interesting from a technology perspective, perhaps even compelling.  Netcracker says it’s made up of composable microservices, and that sounds like the approach that I think is essential to making OSS/BSS “event-driven”.  Unfortunately, there aren’t enough details provided in any of the material for me to assess the approach or speculate on how complete it might be.  For the record, I requested a complete slide deck from them and I’ve not received one.

AVP is a Netcracker roadmap that has many of the characteristics (and probably all of the goals) that operators’ own architectures (AT&T and Verizon’s recent announcements for example) embody.  Their chart seems to show four primary elements—a professional services and knowledge-exchange practice, enhanced event-driven operations processes, a cloud deployment framework that would host both operations/management software and service elements, and the more-or-less expected SDN/NFV operations processes.  Netcracker does have its own E2E orchestration product, but the details on the modeling it uses and how it links to the rest of the operations/management framework aren’t in the online material.

If operators’ visions of a next-gen architecture are valid (and after all the operators should be the benchmark for validity) then the Netcracker model is likewise, but it does have some challenges when it’s presented by a vendor and without specific reference to support for operator models.  My surveys say that the big problems are the state of SDN/NFV and the political gap that’s inherent in the model itself.

Remember who OSS/BSS vendors call on?  The CIO is surely a critical player in the network of the future, and might even be the critical player in both the operations-efficiency and revenue-agility goals.  However, they aren’t the ones that have been pushing SDN and NFV—that’s been primarily the CTO gang.  Operators are generally of the view that if there is any such thing as a “digital transformation” of infrastructure, SDN and NFV are the roots of it.  Interestingly they are also of the view that the standards for SDN and NFV don’t cover the space needed to make a business case—meaning fulfill either the cost-control or revenue goal I’ve already cited.  So we have CIOs who have the potential to be the play-makers in both benefits, the OSS/BSS vendors (including Netcracker) who could engage them…and then across a gap the CTOs who are driving infrastructure change.

Properly framed, the Netcracker model could link the layers of humans and technology that have to be linked to produce a unified vision of the network of the future.  Properly framed, it could even harmonize SDN and NFV management from within, and then with operations management.  It’s easier for me to see this being done from the top, from the OSS/BSS side, than from the bottom.  But it’s not going to happen by itself.  Vendors, operators, and even bodies like the TMF, at whose event Netcracker made its presentation, need to take the process a little more seriously.  Absent a unified, credible approach from benefits to networks, operator budgets will just continue their slow decline under profit pressure.

I think OSS/BSS vendors have a great opportunity.  My research and modeling shows that an operations-centric evolution of network services could produce most of the gains in efficiency and agility that have been claimed for SDN and NFV.  Without, of course, the fork-lift change in infrastructure.  That story should be very appealing to operators and of course to the OSS/BSS types, but what seems to be happening is a kind of stratification in messaging and in project management.  Operations vendors sing to CIOs, network equipment vendors to CTOs, and nobody coordinates.  Maybe they all need to take an orchestration lesson themselves.

Service Assurance in the Network of the Future

One of the persistent questions with both SDN and NFV is how the service management or lifecycle management processes would work.  Any time that a network service requires cooperative behavior among functional elements, the presumption is that all the elements have to be functioning.  Even with standard services, meaning services over legacy networks, that can be a challenge.  It’s even more complicated with SDN and NFV.

Today’s networks are multi-tenant in nature, meaning that they share transmission/connection facilities to at least some degree.  Further, today’s networks are based on protocols that discover state and topology through adaptive exchanges, so routing is dynamic and it’s often not possible to know just where a particular user’s flows are going.  In most cases these days, the functional state of the network is determined by the adaptive processes—users “see” in some way the results of the status/topology exchanges and can determine if a connection has been lost.  Or they simply don’t see connectivity.

QoS is particularly fuzzy.  Unless you have a mechanism for measuring it end-to-end, there’s little chance that you can determine exactly what’s happening with respect to delay or packet loss.  Most operator guarantees of QoS are based on performance management through traffic engineering, and on capacity planning.  You design a network to offer users a given QoS, and you assume that if nothing is reporting a fault the users are getting it.

It’s tempting to look at this process as being incredibly disorderly, particularly when you contrast it with TDM services, which, because they dedicated resources to the user, could define state and QoS with great precision at any point.  However, it’s not fair to SDN or NFV to expect that they will do better than the current state of management, particularly if users expect lower prices down the line, and operators lower opex.

The basic challenge posed by SDN in at least replicating current management knowledge is the fact that by design you’re saying that adaptive exchanges don’t determine routes, and in fact don’t happen.  If that’s the case, then there is no way of knowing what the state of the devices is unless the central controller or some other central management element knows the state.  Which, of course, means that the devices have to provide that state.  An SDN controller has to know network topology and has to know the state of the nodes and trunks under its control.  If this is true, then the controller can construct the same knowledge of overall network conditions that the network acquired through adaptive exchanges, and you could replicate management data and SLAs.
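To illustrate the point, here is a toy sketch of what “the controller knows” might look like: devices report node and trunk status to the center, the controller keeps a topology map, and from that map it can answer the same reachability question an adaptive protocol would have answered by flooding.  All names and structures here are invented for illustration, not taken from any real controller.

```java
import java.util.*;

// Toy central-controller view: topology plus reported status yields
// the same reachability knowledge adaptive routing would discover.
public class ControllerViewSketch {

    final Map<String, Boolean> nodeUp = new HashMap<>();
    final Map<String, List<String>> adjacency = new HashMap<>();

    void addTrunk(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Devices push status to the controller instead of flooding it to peers.
    void reportNodeState(String node, boolean up) {
        nodeUp.put(node, up);
    }

    // Breadth-first search over the controller's map of healthy nodes.
    boolean reachable(String from, String to) {
        if (!nodeUp.getOrDefault(from, false) || !nodeUp.getOrDefault(to, false)) return false;
        Deque<String> queue = new ArrayDeque<>(List.of(from));
        Set<String> seen = new HashSet<>(List.of(from));
        while (!queue.isEmpty()) {
            String n = queue.poll();
            if (n.equals(to)) return true;
            for (String next : adjacency.getOrDefault(n, List.of())) {
                if (nodeUp.getOrDefault(next, false) && seen.add(next)) queue.add(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ControllerViewSketch c = new ControllerViewSketch();
        c.addTrunk("A", "B"); c.addTrunk("B", "C");
        c.reportNodeState("A", true); c.reportNodeState("B", true); c.reportNodeState("C", true);
        System.out.println("A->C reachable: " + c.reachable("A", "C"));   // true
        c.reportNodeState("B", false);                                     // transit node fails
        System.out.println("A->C reachable: " + c.reachable("A", "C"));   // false
    }
}
```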

NFV creates a different set of problems.  With NFV the service depends in part on functions hosted on resource pools, and these are expected to offer at least some level of “automatic recovery” from faults, whether that happens by instantiating a replacement copy, moving something, reconnecting something, or scaling something under load.  This self-repair means that a fault might exist at the atomic function level but you don’t want to recover from it at the service level till whatever’s happening internally has been completed.

The self-remediation model of NFV has, in the NFV ISG and IMHO, led to a presumption that lifecycle management is the responsibility of the individual virtual network functions.  The functions contain a local instance of a VNF management process and this would presumably act as a bridge between the state of resources and their management and the state of the VNFs.  The problem of course is that the service consists of stuff other than that single VNF, and the state of the service still has to be composited.

The operators’ architectures for NFV and SDN deployment, now emerging in some detail, illustrate that operators are presuming that there is in the network (or at least in every domain) a centralized service assurance function.  This function collects management information from the real stuff, and also provides a means of correlating the data with service state and generating (in some way) the notifications of faults to the service processes.  It seems that this approach is going to dominate real SDN and NFV deployment, but the exact structure and features of service assurance aren’t fully described yet.

What seems to have emerged is that service assurance is a combination of three functional elements: aggregation of resource status, service correlation, and event generation.  In the first of these, management data is collected from the things that directly generate it, and in some cases at least the data is stored/cached.  An analytics process operates on this data to drive what are essentially two parallel processes—resource management and service management.  The resource management process is aimed at remedying the problems with physical elements like devices, servers, and trunks.  The service management process is designed to address SLA faults, and so it could just as easily replace a resource in a service as require it be fixed—in fact, that would be the normal course.
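A skeletal version of that three-part flow might look like the sketch below: collected resource status feeds a correlation step that maps resources to the services using them, and an SLA-impacting change generates an event toward the service-level processes.  The structure and names are purely illustrative assumptions on my part.

```java
import java.util.*;

// Illustrative pipeline: aggregate resource status, correlate to services,
// generate events for the services whose SLA may be affected.
public class ServiceAssuranceSketch {

    // Aggregation: latest reported state per resource (device, server, trunk).
    final Map<String, String> resourceState = new HashMap<>();

    // Correlation data: which services depend on which resources.
    final Map<String, Set<String>> resourceToServices = new HashMap<>();

    void bind(String resource, String service) {
        resourceToServices.computeIfAbsent(resource, k -> new HashSet<>()).add(service);
    }

    // Event generation: a state change fans out to the impacted services.
    List<String> reportStatus(String resource, String newState) {
        String old = resourceState.put(resource, newState);
        List<String> events = new ArrayList<>();
        if (!"UP".equals(newState) && !Objects.equals(old, newState)) {
            for (String svc : resourceToServices.getOrDefault(resource, Set.of())) {
                events.add("SLA_AT_RISK service=" + svc + " cause=" + resource);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        ServiceAssuranceSketch sa = new ServiceAssuranceSketch();
        sa.bind("server-17", "vpn-customer-A");
        sa.bind("server-17", "vcpe-customer-B");
        System.out.println(sa.reportStatus("server-17", "UP"));     // no events
        System.out.println(sa.reportStatus("server-17", "FAILED")); // two SLA_AT_RISK events
    }
}
```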

Service management in both SDN and NFV is analogous to end-to-end adaptive recovery as found in legacy networks.  You are going to “fix” a problem by reconfiguration of the service topology and not by actually repairing something.  If something is broken, that becomes a job for the resource management processes.

Resource management doesn’t appear to present any unusual challenges.  You have device state for everything, and so if something breaks you can fix it.  It’s service management that poses a problem because you have to know what to reconfigure and how to reconfigure it.

The easiest way to determine whether a service has faulted is to presume that something internal to the service is doing it, or that the service users are reporting it.  Again this may seem primitive but it’s not really a major shift from what happens now.  If this approach is taken, then the only requirement is that there be a problem analysis process to establish not what specifically has happened but what can be done to remedy the fault by reconfiguration.  The alternative is to assume that the service assurance function can identify the thing that’s broken and the services that are impacted.

Both these options seem to end up in the same place.  We need to have some way of knowing when a virtual function or SDN route has failed.  We need to have a recovery process that’s aimed at the replacement of that which has broken (and perhaps a dispatch task to send a tech to fix a real problem).  We need a notification process that gives the user a signal of conditions comparable to what they’d get in a legacy network service.  That frames the reality of service assurance.

I think that the failing of both SDN and NFV management to date lies in this requirement set.  How, if internal network behavior is not determined by adaptive exchange, does the service user find out about reachability and state?  If SDN replaces a switch/router network, who generates the management data that each device would normally exchange?  In NFV how do we reflect a virtual function failure when the user may not be directly connected to the function, but somewhere earlier/later in the service chain?

The big question, though, is one of service configuration and reconfiguration.  We cannot assume that every failed white box or server hosting a VNF can be recovered locally.  What happens when we have to change the configuration of the service enough that the elements outside the failing domain have to be changed to reflect the reconfiguration?  If we move a VNF to another data center, don’t we have to reconnect the WAN paths?  Same with SDN domains.  This is why the issue of recovery is more than one of event generation or standardization.  You have to be able to interpret faults, yes, but you also have to be able to direct the event to a point where knowledge of the service topology exists, so that automated processes can reconnect everything.  Where is that point?

In the service model, or it’s not anywhere.  Lifecycle management is really a form of DevOps, and in particular of the declarative model where the end-state of a service is maintained and compared with the current state.  This is why we need to focus quickly on how a service is modeled end-to-end and integrate that model with service assurance, for both legacy and “new” technologies.
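In code terms, the declarative pattern is just “diff the goal state against the reported state and emit the actions that close the gap,” roughly as in this sketch (the model contents are invented to match the VNF-relocation example above).

```java
import java.util.*;

// Declarative lifecycle sketch: the service model records where each element
// *should* be; remediation is whatever closes the gap with where it *is*.
public class DeclarativeModelSketch {

    static List<String> reconcile(Map<String, String> desired, Map<String, String> actual) {
        List<String> actions = new ArrayList<>();
        for (Map.Entry<String, String> e : desired.entrySet()) {
            String element = e.getKey();
            String goal = e.getValue();
            String current = actual.get(element);
            if (current == null) {
                actions.add("DEPLOY " + element + " at " + goal);
            } else if (!current.equals(goal)) {
                actions.add("RECONFIGURE " + element + " from " + current + " to " + goal);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        // The model was updated when the VNF was recovered in another data center...
        Map<String, String> desired = Map.of(
                "firewall-vnf", "dc-east-2",
                "wan-path",     "dc-east-2<->customer-site");
        // ...but the WAN path still points at the old location.
        Map<String, String> actual = Map.of(
                "firewall-vnf", "dc-east-2",
                "wan-path",     "dc-east-1<->customer-site");

        // The diff shows the WAN path must be reconnected to follow the VNF.
        reconcile(desired, actual).forEach(System.out::println);
    }
}
```

The service model is the one place that knows both the end-state and the current state of everything in the service, which is why fault events have to land there.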

Overlay/Underlay Networking and the Future of Services

Overlay networks have been a topic for this blog fairly often recently, but given that more operators (including, recently, Comcast) have come out in favor of them, I think it’s time to look at how overlay technology might impact network investment overall.  After all, if overlay networking becomes mainstream, something of that magnitude would have to impact what these networks get overlaid onto.

Overlay networks are virtual networks built by adding what’s essentially another connection layer on top of prevailing L2/L3 technology.  Unlike traditional “virtual networks” the overlay networks are invisible to the lower layers; devices down there treat them as traffic.  That could radically simplify the creation of virtual networks by eliminating the need to manage the connectivity in a “real” network device, but there are other impacts that could be even more important.  To understand them we should start at the top.

There are two basic models of overlay network—the nodal model and the mesh model.  In the nodal model, the overlay includes interior elements that perform the same functions that network nodes normally perform in real networks—switching/routing.  In the mesh model, there are no interior nodes to act as concentrators/distributors of traffic.  Instead each edge element is connected to all the others via some sort of tunnel or lower-level service.

The determinant in the “best” model will in most cases be simply the number of endpoints.  Both endpoints and nodes have “routing tables”, and as is the case with traditional routing, the tables don’t have to include every distinct endpoint address, but rather only the portion of an address needed to make a forwarding decision.  However, if the endpoints are meshed then the forwarding decision has to be made for each, which means the endpoint routing tables get large and expensive to process.

Interior node points can simplify the routing tables, particularly since the address space used in an overlay network need not in any way relate to the underlying network address space.  A geographic/hierarchical addressing scheme could be used to divide a network into areas, each of which might have a collecting/distributing node.  Node points can also be used to force traffic along certain paths by putting a node there, and that would be helpful for traffic management.
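A toy example of why nodes shrink the tables: with a hierarchical overlay address scheme, an edge element only needs one entry per area prefix, and the area’s node resolves the rest.  The addressing scheme below is invented purely to show the mechanics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy longest-prefix forwarding over an invented hierarchical overlay
// address space ("region.area.site").  Edge tables hold area prefixes,
// not every endpoint, which is what keeps them small.
public class OverlayForwardingSketch {

    // Insertion order matters here: longest (most specific) prefixes first.
    static final Map<String, String> EDGE_TABLE = new LinkedHashMap<>();
    static {
        EDGE_TABLE.put("us-east.nyc.", "tunnel-to-nyc-node");     // one local area in detail
        EDGE_TABLE.put("us-east.",     "tunnel-to-useast-node");
        EDGE_TABLE.put("",             "tunnel-to-default-node"); // everything else
    }

    static String nextHop(String overlayAddress) {
        for (Map.Entry<String, String> e : EDGE_TABLE.entrySet()) {
            if (overlayAddress.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "drop";
    }

    public static void main(String[] args) {
        System.out.println(nextHop("us-east.nyc.site42"));   // tunnel-to-nyc-node
        System.out.println(nextHop("us-east.bos.site7"));    // tunnel-to-useast-node
        System.out.println(nextHop("eu-west.lon.site1"));    // tunnel-to-default-node
    }
}
```

A fully meshed edge, by contrast, would need an entry (and a tunnel) per remote endpoint, which is exactly the table growth described above.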

The notion of an overlay-based virtual network service clearly empowers endpoints, and if the optimization of nodal locations is based on sophisticated traffic and geography factors, it would also favor virtual-node deployments in the network interior.  Thus, overlay networks could directly promote (or be promoted by) NFV.  One of the two “revolutionary elements” of future networking is thus a player here.

So is the other.  If tunnels are the goal, then SDN is a logical way to fulfill that goal.  The advantage SDN offers is that the forwarding chain created through OpenFlow by central command could pass wherever it’s best assigned, and each flow supported by such a chain is truly a ship in the night relative to others in terms of addressability.  If central management can provide proper traffic planning and thus QoS, then all the SDN flows are pretty darn independent.

The big question for SDN has always been domain federation.  We know that SDN controllers work, but we can be pretty sure that a single enormous controller could never hope to control a global network.  Instead we have to be able to meld SDN domains, to provide a means for those forwarded flows to cross a domain boundary without being elevated and reconstituted.  If that capability existed, it would make SDN a better platform for overlay networks than even Ethernet with all its enhancements.

The nature of the overlay process and the nature of the underlayment combine to create a whole series of potential service models.  SD-WAN, for example, is an edge-steered tunnel process that often provides multiple parallel connection options for some or even all of the service points.  Virtual switching (vSwitch) provides what’s normally an Ethernet-like overlay on top of an Ethernet underlayment, but still separates the connection plane from the transport process, which is why it’s a good multi-tenant approach for the cloud.  It’s fair to say that there is neither a need to standardize on a single overlay protocol or architecture, nor even a value to doing so.  If service-specific overlay competition arises and enriches the market, so much the better.

Where there obviously is a need for some logic and order is in the underlayment.  Here, we can define some basic truths that would have a major impact on the efficiency of traffic management and operations.

The first point is that the more overlays you have the more important it is to control traffic and availability below the overlay.  You don’t want to recover from a million service faults when one common trunk/tunnel has failed.  This is why the notion of virtual wires is so important, though I want to stress that any of the three major connection models (LINE, LAN, TREE) would be fine as a tunnel model.  The point is that you want all possible management directed here.  This is where agile optics, SDN pipes, and so forth, would live, and where augmentation of current network infrastructure to be more overlay-efficient could be very helpful.

The second point, which I hinted at above, is that you need to define domain gateways that can carry the overlays among domains without forcing you to terminate and reestablish the overlays, meaning host a bunch of nodes at the boundary.  Ideally, the same overlay connection models should be valid for all the interconnected domains so a single process could define all the underlayment pathways.  As I noted earlier, this means domain federation has to be provided no matter what technology you use for the underlayment.

The third point is that the underlay network has to expose QoS or class of service capabilities as options to the overlay.  You can’t create QoS or manage traffic in an overlay, so you have to be able to communicate between the overlay and underlay with respect to the SLA you need, and you have to then enforce it below.
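The overlay/underlay contract this implies could be as thin as an interface like the one sketched here: the overlay asks for a tunnel with a class of service, and the underlay either commits to it or refuses.  The names and service classes are illustrative assumptions, not a proposed standard.

```java
// Illustrative contract between an overlay and its underlayment: the overlay
// can only *request* a class of service; enforcement happens below.
public class OverlayUnderlaySketch {

    enum ServiceClass { BEST_EFFORT, LOW_LATENCY, GUARANTEED_BANDWIDTH }

    interface Underlay {
        // Returns a tunnel identifier if the SLA can be committed, null if not.
        String requestTunnel(String fromEdge, String toEdge, ServiceClass cos);
    }

    // A stand-in underlay that only commits what it could actually enforce.
    static class DemoUnderlay implements Underlay {
        public String requestTunnel(String fromEdge, String toEdge, ServiceClass cos) {
            if (cos == ServiceClass.GUARANTEED_BANDWIDTH) {
                return null;   // pretend no capacity is available for a hard guarantee
            }
            return "tunnel:" + fromEdge + "->" + toEdge + ":" + cos;
        }
    }

    public static void main(String[] args) {
        Underlay underlay = new DemoUnderlay();
        System.out.println(underlay.requestTunnel("siteA", "siteB", ServiceClass.LOW_LATENCY));
        System.out.println(underlay.requestTunnel("siteA", "siteB", ServiceClass.GUARANTEED_BANDWIDTH));
    }
}
```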

The final point is universality and evolution.  The overlay/underlay relationship should never depend on technology/implementation of either layer.  The old OSI model was right; the layers have to see each other only as a set of exposed services.  In modern terms, that means that both layers are intent models with regard to the other, and the overlay is an intent model to its user.  The evolution point means that it’s important to map network capabilities in the overlay to legacy underlayment implementations, because otherwise you probably won’t get the scope of implementation you need.

You might wonder at this point why, if overlay networking is so powerful a concept, operators haven’t fallen over themselves to implement it.  One reason, I think, is that the concept of overlay networks is explicitly an OTT concept.  It establishes the notion of a network service in a different way, a way that could admit new competitors.  If this is the primary reason, though, it may be losing steam, because SD-WAN technology is already creating OTT competition without any formal overlay/underlay structure.  The fact that anyone can do an overlay means nobody can really suppress the concept.  If it’s good and powerful, it will catch on.

Can We Apply the Lessons of NFV to the Emerging IoT Opportunity?

I blogged yesterday about the OPNFV Event Streams project and the need to take a broad view of event-driven software as a precursor to exploring the best way to standardize event coding and exchange.  It occurred to me that we’re facing the same sort of problem with IoT: we’re focusing on pieces that would matter more if we first had a broader, top-down conception of the requirements of the space.  Let me use the same method to examine IoT as I used for the Event Streams announcement: examples.

Let’s suppose that I have a city that’s been equipped with those nice new IoT sensors, sitting directly on the Internet using some sort of cellular or microcellular technology.  It’s 4:40 PM and I left work early to get a jump on traffic.  So did a half-million others.  I decide that I’m going to use my nice IoT app to find me a route home that’s off the beaten path, so to speak.  I activate my app, and what happens?

What I’m hoping for, remember, is a route to my destination that’s not already crowded with others, or won’t shortly become crowded.  That information, the IoT advocates would say, is exactly what IoT can provide me.  But how, exactly?  If the sensors count cars, I could assume that car counts would be a measure of traffic, but a car counter doesn’t count the cars that are there; it counts the cars passing it.  If the traffic is at a standstill, how many cars are passing?  Zero, which looks exactly like an empty street, so I get steered straight into the jam.
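
A toy calculation makes the ambiguity obvious; the numbers below are made up, but the relationship (flow is roughly density times speed) is the standard traffic-engineering one.

    # Toy illustration of the car-counter problem: a counter measures flow
    # (cars passing per hour), and flow ~= density * speed, so it reads zero
    # both for an empty street and for a street that's jammed solid.
    def cars_per_hour(density_cars_per_mile, speed_mph):
        return density_cars_per_mile * speed_mph

    print(cars_per_hour(0, 35))     # empty street   -> 0
    print(cars_per_hour(200, 0))    # total gridlock -> 0 as well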

However, it may not be that bad, because I may never see the data in the first place.  Remember, I have a half-million people sharing the road with me, and most of them probably want to get home early too.  So what are they doing?  Answer: hitting their IoT apps to find a route.  If those apps are querying those sensors, then I’ve got a half-million apps vying for access to a device that might be the size of a paperback book.  We’ve seen websites taken down by DDoS attacks of that size or smaller, and those sites are supported by big pipes and powerful servers.  My little sensor is going to weather the storm?  Not likely.

But even if I got through, would I understand the data?  I could presume that the sensors would be based on a basic HTTP exchange like the one that fetches a web page.  Certainly I could get something like an XML or JSON payload delivered that way, but what’s the format?  Does the car sensor give me the number of cars in the last second, or minute, or hour, or what?  Interpreting the data starts with understanding what data is actually being presented, after all.
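
To show how much the format matters, here’s a purely hypothetical sensor payload; none of these field names come from any real IoT standard, and that’s exactly the problem.

    import json

    # An invented sensor reply; nothing here reflects an actual standard.
    raw = '{"sensor": "cnt-1138", "count": 14, "window_s": 60, "units": "vehicles"}'

    reading = json.loads(raw)
    # Without "window_s" you can't tell 14 cars per second from 14 cars per hour,
    # and without agreed units you can't compare two vendors' sensors at all.
    rate_per_minute = reading["count"] * 60 / reading["window_s"]
    print(f'{reading["sensor"]}: {rate_per_minute:.1f} {reading["units"]}/minute')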

But suppose somehow all this worked out.  I’ve made the first turn on my route, and so have my half-million road-sharing companions.  Every second, conditions change.  How do I know about the changes?  Does my app have to keep running the same decision process over and over, or does the sensor somehow signal me?  If the latter is the case, how would a specific sensor know 1) who I was, 2) what I wanted and 3) what I had to know about to get what I wanted?

OK, you say, this is stupid.  What I’d really do is go to my “iIoT service” on my iPhone, where Apple would cheerfully absorb all this sensor data and give me answers without all these risks.  Well, OK, but that raises the question of why a city-full of those IoT sensors got deployed when they’re nothing but a resource for Apple to exploit.  Did Apple pay for them?  Ask Tim Cook that on the next shareholder call.  If Apple is just accessing them on “the Internet” because after all this is IoT, then are Apple and others expected to pay for the access?  If not, why were the sensors ever deployed?  If so, how does that cheap little sensor know who Apple is versus some shameless exploiter of its data?

Maybe, you speculate, we solve some of our problems with the device that started them, our phone.  Instead of counting cars, we sense the phones that are nearby.  Now we know the difference between an empty street and gridlock.  Great.  But now we have thousands of low-lifes tracking women and children.  Prevent that with access controls and policies, you say?  OK, but remember, this is a cheap little sensor that you’ve already had to give the horsepower of a superserver to.  Now it has to analyze policies and detect bad intent too?

Or how about this: a bunch of black hats decides it would be fun to deploy a couple hundred “sensors” of their own, feed in false data, and turn a bad traffic situation into gridlock severe enough that even emergency fire and rescue can’t get through.  Or they’re a gang of jewel thieves with an IoT getaway strategy.  How do these false-flag sensors get detected?

Sometimes insight comes in small steps.  For example, the Event Streams project talks about Agents that get events and Collectors that store them in a database.  This kind of structure is logical to keep primary event generators from being swamped by all the processes that need to know the state of resources.  Isn’t it logical to assume that this same sort of decoupling would be done in IoT?  The project seeks to harmonize the structure of event records; isn’t it logical to assume that sensor outputs would similarly have to be harmonized?  Resource information in NFV and sensor data in IoT both require what are essentially highly variable and disorderly sources to be loosely coupled with highly variable and disorderly sets of processes that interpret the stuff.  The issues raised by each would then be comparable.

Once we presume that we need common coding for event analysis and some sort of database buffering to decouple the sensors from the processes in IoT, we can resolve most of these other questions, because we no longer have a sensor-network problem; we have a database problem, and we know how to address all the concerns raised above in that context.  But just as Event Streams have to trigger an awareness of the need for contextual event processing, the existence of a database where sensor data is collected, and from which it’s distributed, raises the question of what the apps do and how public policy goals are maintained.
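
Here’s a minimal sketch of that decoupling, with every name invented for the purpose: sensors post readings to a collector, the collector stores them, and apps query the store, so the half-million route requests never touch the sensor at all.

    from collections import defaultdict
    import time

    class Collector:
        """Hypothetical agent/collector collapsed into one class: it ingests
        sensor events and answers application queries from its own store."""
        def __init__(self):
            self.readings = defaultdict(list)   # sensor_id -> [(timestamp, value)]

        def ingest(self, sensor_id, value):     # called on the sensor's schedule
            self.readings[sensor_id].append((time.time(), value))

        def latest(self, sensor_id):            # called by apps, never by sensors
            history = self.readings.get(sensor_id)
            return history[-1] if history else None

    hub = Collector()
    hub.ingest("cnt-1138", 14)          # one cheap sensor, one upload
    for _ in range(500_000):            # half a million route queries...
        hub.latest("cnt-1138")          # ...all land on the database, not the sensor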

We’re not there yet with IoT.  Even the IT vendors who make IoT announcements are still fussing over low-power protocols and transmitters and not worrying about any of the real issues.  And these are the vendors who already sell the databases, the analytics products, and the clouds of servers, who have the technology to present a realistic model.

Way back in 2013, in CloudNFV, I outlined the set of issues that NFV would have to address, and everyone who was involved in that process knows how hard I tried to convince both vendors and operators that key issues were being ignored.  It’s now 2016, and we’re only just starting to address them.  Could we have a complete NFV implementation to deploy today if we’d accepted those issues in 2013 when they were first raised?  My point isn’t to aggrandize my own judgment; plenty of others said much the same thing.  Many are saying it now about IoT.  Will we insist on following the same myopic path, overlooking the same kind of issues, for that technology?  If so, we’re throwing away a lot of opportunity.

Is the New OPNFV Event Streams Project the Start of the Right Management Model?

One of those who comment regularly on my blog brought a news item to my attention.  The OPNFV project has a new activity, introduced by AT&T, called “Event Streams” and defined HERE.  The purpose of the project is to create a standard format for sending event data from the Service Assurance component of NFV to the management process for lifecycle management.  I’ve been very critical of NFV management, so the question now is whether Event Streams will address my concerns.  The short answer is “possibly, partly.”

The notion of events and event processing goes way back.  All protocol handlers treat messages as events, for example, and you can argue that even transaction processing is about “events” that represent things like bank deposits or inventory changes.  At the software level, the notion of an “event” is the basis for one form of exchanging information between processes, something sometimes called a “triggered” process.  The other popular form is called a “polled” process because in that form a software element isn’t signaled that something is happening; it checks to see whether it is.
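
As a minimal, generic illustration of the two styles (this is just standard Python, not anything from the OPNFV code):

    import queue, time

    events = queue.Queue()

    def triggered(handler):
        # Trigger style: block until an event arrives, then react to it.
        handler(events.get())

    def polled(check, handler, interval=0.1, attempts=5):
        # Polled style: wake up on a timer and go look for work.
        for _ in range(attempts):
            item = check()
            if item is not None:
                handler(item)
            time.sleep(interval)

    events.put({"type": "LinkDown", "port": 7})
    triggered(lambda ev: print("reacting to", ev))
    polled(lambda: None, print)    # nothing is found on any of the five polls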

Many of the traditional management and operations activities of networks have been more polled than triggered because provisioning was considered to be a linear process.  As networks got more complicated, more and more experts started talking about “event-driven” operations, meaning something that was triggered by conditions rather than written as a flow that checked on stuff.  So Event Streams could be a step in that direction.

A step far enough?  There are actually three things you need to make event-driven management work.  One, obviously, is the events.  The second is the concept of state and the third is a way to address the natural hierarchy of the service itself.  If we can find all those things in NFV, we can be event-driven.  Events we now have, but what about the rest?

Let’s start with “state”.  State is an indication of context.  Suppose you and I are conversing, and I’m asking you questions that you answer.  If there’s a delay or if you don’t hear me, you might miss a question and I might ask the next one.  Your answer, correct in the context you had, is now incorrect.  But if you and I each have a recognized “state” like “Asking”, “ConfirmHearing”, and “Answering” then we can synchronize through difficulties.

In network operations and management, state defines where we are in a lifecycle.  We might be “Ordered”, or “Activating” or “Operating”, and events mean different things in each state.  If I get an “Activate” in the “Ordered” state, it’s the trigger for the normal next step of deployment.  If I get one in the “Operating” state, it’s an indication of a lack of synchronicity between the OSS/BSS and the NFV processes.  It is, that is, if I have a state defined.
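
In software terms, “meaning different things in each state” is just a state/event table; here’s a tiny sketch with states and handlers invented purely for illustration.

    # Hypothetical state/event table: the same "Activate" event is routine in
    # one state and a synchronization alarm in another.
    def start_deployment(element):  print(element, "deploying")
    def flag_out_of_sync(element):  print(element, "OSS/BSS out of sync with NFV state")
    def ignore(element):            pass

    STATE_EVENT_TABLE = {
        ("Ordered",   "Activate"):  start_deployment,
        ("Operating", "Activate"):  flag_out_of_sync,
        ("Operating", "Heartbeat"): ignore,
    }

    def handle(element, state, event):
        STATE_EVENT_TABLE[(state, event)](element)

    handle("service-42", "Ordered",   "Activate")   # the normal next step
    handle("service-42", "Operating", "Activate")   # context says something is wrong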

Let’s look now at a simple “service” consisting of a “VPN” component and a series of “Access” components.  The service won’t work if all the components aren’t working, so we could say that the service is in the “Operating” state when all the components are.  Logically, what should happen then is that when all the components are in the “Ordered” state, we’d send an “Activate” to the top-level “Service object”, and it would in turn generate an event to the subordinates to “Activate”.  When each had reported it was “Operating”, the service would enter the “Operating” state and generate an event to the OSS/BSS.

So what we have here is a whole series of event-driven elements, contextualized (state and relationship) by some sort of object model that defines how stuff is related.  It’s not just one state/event process (what software nerds call “finite-state machines”) but a whole collection of such processes, event-coupled so that the behaviors are synchronized.

This concept is incredibly important, though it’s not always obvious why.  Here’s an example.  Suppose that a single VNF inside an Access element fails and is going to redeploy.  That Access element would have to enter a new state, let’s call it “Recovering”, and the VNF that failed would have to signal the change with an event.  Does that Access element go non-operational immediately, or does it give the VNF some time?  Does it report even the recovery attempt to the service level via an event, or does it wait until it determines that the failure can’t be remedied?  All of this would normally be defined in state/event tables for each service element.  In the real world of SDN and NFV, every VNF deployed and every set of connections could be an element, so the model we’re talking about could be multiple layers deep.
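
Here’s a compressed sketch of that kind of coupled hierarchy; all the class and state names are invented, and a real data model would carry far more than this, but it shows subordinate elements reporting upward through events and the parent deriving its own state from theirs.

    class ServiceElement:
        """Toy model object: it holds its own state and reports changes upward."""
        def __init__(self, name, parent=None):
            self.name, self.parent, self.children = name, parent, []
            self.state = "Ordered"
            if parent:
                parent.children.append(self)

        def on_event(self, event):
            if event == "Activate" and self.state == "Ordered":
                self.state = "Activating"
                if self.children:
                    for child in self.children:
                        child.on_event("Activate")   # cascade the intent downward
                else:
                    self.report("Operating")         # a leaf activates at once here
            elif event == "ChildOperating" and all(c.state == "Operating"
                                                   for c in self.children):
                self.report("Operating")             # all subordinates have reported in
            elif event == "ChildRecovering":
                self.state = "Recovering"            # give the child time, or escalate

        def report(self, new_state):
            self.state = new_state
            if self.parent:
                self.parent.on_event("Child" + new_state)

    svc = ServiceElement("service")
    vpn = ServiceElement("VPN", svc)
    acc = ServiceElement("Access-1", svc)
    svc.on_event("Activate")
    print(svc.state)    # "Operating" once both subordinates have reported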

This has implications for building services.  If you have a three- or four-layer service model you’re building, every element in the model has to be able to communicate with the stuff above and below it through events, which means that they have to understand the same events and have to be able to respond as expected.  So what we really have to know about service elements in SDN or NFV is how their state/event processing works.

Obviously we don’t know that today, because until now we didn’t even have a consistent model of event exchange, which is what Event Streams would define.  But the project doesn’t define states, nor does it define state/event tables or standardized responses.  Without those definitions an architect couldn’t assemble a service from pieces, because they couldn’t be sure that all the pieces would talk the same event language or interpret the context of lifecycles the same way.

The net of this is that Event Streams are enormously important to NFV, but they’re a necessary condition, not a sufficient one.  We still don’t have the right framework for service modeling, a framework in which every functional component of a service is represented by a model “object” that stores its state and the table that maps events to their handling in every possible state.

The question is whether we need that, or whether we could make VNF Managers perform the function.  Could we send them events?  There’s no current mandate that a VNFM process events at all, much less process some standard set of events.  If a VNFM contains state/event knowledge, then the “place” of the associated VNF in a service would have to be consistent or the state/event interpretation wouldn’t be right.  That means that our VNF inside an access element might not be portable to another access element because that element wanted to report “Recovering” or “Faulting” under different conditions.  IMHO, this stuff has to be in the model, not in the software, or the software won’t be truly composable.

I’m not trying to minimize the value of Event Streams here.  The project is very important, provided that it provokes a complete discussion of state/event handling in network operations.  If it doesn’t, then it’s going to lead to a dead end.

Will Operators Avoid the Same Mistakes They Say Vendors Make in Transformation?

Operators want open-source software and they want OCP hardware, or so they say.  It would seem that the trend overall is to stamp out vendors, but of course neither of these things really stamps out vendor relationships.  They might have an impact on the buyer/seller relationship, though, and on the way that operators buy and build networks.  If the model crosses over into the enterprise space, which is very likely, then it could have a profound impact on the market overall.

If there are multiple stages of grief, operators have experienced their own flavor in their relationship with their vendors.  Twenty years ago, operators tended to favor big vendors who could provide complete solutions, because that eliminated integration issues and finger-pointing.  Ten years ago, operators were starting to feel they were being taken advantage of, and many created “procurement zones” to compartmentalize vendors and prevent any one of them from owning the whole network.  Now, as I said, they’re moving to what is really commodity, vendor-independent procurement.  That’s a pretty dramatic evolution, and one driven (say almost all operators) by a growing concern that vendors aren’t supporting operator goals, but their own.

What created the divergence of goals can be charted.  Up until 2001, technology spending by both network operators and enterprises was driven by a cyclical transformation of opportunity that, roughly every fifteen years, introduced a new set of benefits that drove spending growth.  New opportunities normally demand new features from technology products, new differentiators appear, and operators gain new revenue to offset costs.  In that climate, buyers look for ways to realize the new opportunity quickly, and vendors are their friends.

In 2001, the cycle stalled and has not restarted.  If you go back to the surveys and articles of the time, you can see that this inaugurated a new kind of tech market, one where the only thing that mattered was reducing costs.  That’s a logical response to a static benefit environment; you can improve financials only by realizing those benefits more cheaply.  But cost management usually starts with price pressure on purchases, and that’s what launched the “us-versus-them” mindset between operators and vendors.

The old saying “we have met the enemy and they are us” might then apply now.  Both open-source software and COTS hardware are very different from traditional products because they are commodities.  They present operators with a new problem, which is that vendor support for innovation depends on profit from that support.  Absent a differentiable opportunity, nobody will support transformation unless operators pay for professional services.  Ericsson probably foresaw this shift, and as a result focused more on professional services in its own business model.  The shift to commodity network elements has developed more slowly than Ericsson may have hoped, but it’s still clearly underway.

But OK, operators want open source and COTS.  Will they be able to get them, and if so, who will win out?

If you look at transformation as operators do, you see that their primary goal is what could be called “top-end engagement”.  They have benefits to capture (opex and capex reduction and revenue augmentation) and they need an approach that engages those benefits directly.  Traditional technology specifications and standards, starting as they normally do at the bottom, don’t even get close to the business layer where all these benefits have to be realized.  That’s why the operator approaches seem to focus “above” the standards we know.

So the most important point here is to somehow get architectures to tie back to the benefits that, while they were expanding, drove industry innovation.  A friend of mine, John Reilly, wrote a nice book (available from Amazon in hard copy or Kindle form) called “The Value Fabric: A Guide to Doing Business in the Digital World” that might be helpful in framing business benefits for multiple stakeholders.  It’s a way of describing how a series of “digital bridges” can establish the relationships among stakeholders in a complex market.  It’s based on a TMF notion, which means it would be applicable to operator ecosystems, and vendors could explore the notion of providing digital bridges to link the stakeholders in an operator ecosystem together.  Advertisers, content providers, operator partners, hosting providers, and even software providers who license their stuff are examples of stakeholders that could be linked into a value fabric.

But all of this good stuff is likely to be software, and however valuable software is in the end, it’s not going to cost as much as a couple hundred thousand routers.  In fact, it’s reducing that hardware cost that’s the goal for operators.  Network vendors are not going to embrace being cost-reduced till they vanish to a point.  And if COTS servers and open-source software are the vehicle for diminishing network vendor influence, who’s incented to take their place in the innovation game?  In the architectures that operators are promoting, no major vendor is a long-term winner.

I’m not saying this is bad, or even that vendors don’t deserve some angst.  I’ve certainly told enough of them where things were heading, and tried often enough to get them to address the issues while there was still time, that I’ve lost some sympathy for them.  What I am concerned about is how the industry progresses.  Is networking to become purely a game of advertising and marketing at the operator level, and of cheap fabrication of white boxes at the hardware level?  If so, are the operators now prepared to drive the bus by investing some of their savings in technologists who can do the deep thinking that will be needed even more in the future?

The network of the future is going somewhere other than the simple substitution game that operators envision.  You can see, for example, that personal agents are going to transform networks.  So will IoT, when we finally accept it’s not about making every sensor into a fifty-buck-a-month LTE subscriber.  The limitation in vendor SDN/NFV architectures is that they try to conserve the legacy structure that progress demands we sacrifice.  The limitation in operator architectures is that they constrain their vision of services to the current network, and so forego the longer-term revenue-driven benefits that have funded all our innovations so far.

What sits above commodity services?  Revolutionary services, we would hope.  So let’s see some operators step up and show more determination to innovate than the vendors they’re spurning.  Let’s find values to build a fabric around.