How to Make Services Agile

Everyone in NFV is jumping on “service agility” as the key benefit, and I don’t disagree that the value NFV could bring to launching new services could be the best justification for deploying it.  Wishing won’t make it so, though, and I don’t think we’ve had enough dialog on how one makes a service “agile”.  So…I propose to start one here.

The first point about service agility is that it’s going to be a lot like software agility, and in particular what I’ll call “functional” or “app” programming.  Traditional software is written by programmers who write specific instructions.  Modular software, a trend that goes back over 40 years, initiated the concept of reusable “modules” that were introduced into a program to perform a regularly used function.  This was enhanced about 20 years ago by the notion that a software function could be visualized as a “service” to be consumed, and that was the dawn of the Service-Oriented Architecture or SOA.  Today’s web-service and PaaS models (and many SaaS models) are another variant on this.

In all these approaches, we get back to the notion of abstraction.  A programmer consumes a service without knowing anything about it other than the API (application program interface, meaning the inputs and outputs) and functionality.  The service is a black box, and the fact that all the details are hidden from the programmer means that these services make it easy to do very complicated things.

To me, this is a critical point because it exposes the biggest truth about service creation in an NFV sense.  That truth is that there are two different “creations” going on.  One is the creation of the services themselves, which, if we follow software trends, are built by assembling lower-level services.  The other is the generation of those lower-level services/abstractions from whatever primitives we have available.  I’ve categorized this in role terms as “service architect” and “resource architect”.

An agile service, I think, is created first by identifying or building those lower-level services/abstractions from the resources upward.  A VPN or a VLAN is an example of an abstraction, but so is “DNS” or “firewall”, or even “HSS”.  Once we have an inventory of this good stuff, we can let service architects assemble them into the cooperative functional systems that we call “services”.

There are a lot of possible slip-ups that can happen here, though.  I’ll illustrate one.  Suppose I need to deploy virtual CPE but I can’t do it everywhere I offer service, so I have “real” CPE as well.  I have two options.  One is to define a low-level service called “CPE” and let that service sort out the difference between virtual and real.  The other is to expose separate “virtualCPE” and “realCPE” services.  Let’s see how that plays out.

If I have a CPE service, then the decision of whether to use cloud principles to host and connect software elements is invisible to the service architect.  The service definition includes only CPE services, and service architects don’t care about the difference because the underlying service logic will sort out the provisioning.  On the other hand, if I have virtualCPE and realCPE, the service definition has to know which to use, which means that the details of infrastructure differences by geography are pushed upward to the service level.  That means a much more complicated process of service creation, which I contend means a much less agile one.
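To make the distinction concrete, here’s a minimal sketch of a “CPE” abstraction that hides the virtual-versus-real decision below the service layer.  All the names and site data are hypothetical illustrations, not any real orchestration API:

```python
# Illustrative sketch: a single "CPE" abstraction whose decomposition into
# virtual or real CPE is invisible to the service architect.

# Hypothetical data: sites where we can host virtual CPE.
VCPE_CAPABLE_SITES = {"metro-east", "metro-west"}

def deploy_virtual_cpe(site, params):
    # In practice this would hand off to cloud orchestration (e.g. OpenStack).
    return f"virtualCPE hosted at {site}"

def deploy_real_cpe(site, params):
    # In practice this would provision a physical appliance.
    return f"realCPE provisioned at {site}"

def deploy_cpe(site: str, params: dict) -> str:
    """The one abstraction service architects see; the virtual-vs-real
    decision is made here, below the service definition."""
    if site in VCPE_CAPABLE_SITES:
        return deploy_virtual_cpe(site, params)
    return deploy_real_cpe(site, params)

print(deploy_cpe("metro-east", {}))   # virtual hosting chosen
print(deploy_cpe("rural-north", {}))  # falls back to a real device
```

The service definition calls `deploy_cpe` either way; changing which sites support hosting changes only the table, never the service.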

But even my virtualCPE and realCPE abstractions have value over the alternative, which is to define the services all the way from top to bottom, to the deployment level.  If I have a pair of abstractions I will have to reflect the decision on which to use into the service orchestration process, but the details of how it’s done will stay hidden.  I can provision different CPE, deploy on different virtual platforms, without changing the service.  That means that changes in real devices or virtual infrastructure are hidden from the service orchestration process.  If I don’t have those abstractions then any change in what I need to do to deploy (other than simple parameter changes) would have to be propagated up to the service definition, which means the change would change all my service order templates.  No agility there, folks.

The point here is that an agile service has to be agile through the whole lifecycle or it’s not really agile at all.  I cannot achieve that universality without following the same principles that software architects have learned to follow in today’s service-driven world.

If you map this to the current ETSI work and to other NFV activities you see that it means that things like OpenStack are not enough.  They can (and will) be used to decide how to deploy “virtualCPE”, but I still have to decompose my service into requests for realCPE and virtualCPE somewhere.  Further, if I decide to get smart and abstract two things that are functionally identical into “CPE”, I have created a level of decomposition that’s outside what OpenStack is designed to do.  Could or should I shoehorn that decomposition into OpenStack anyway?  I think not.

Network resources, technologies, and vendors create a pool of hardware and software—functionality-in-waiting we might say.  An operator might elect to harness some of this functionality for use by services.  If they don’t then service definitions will have to dive down to hardware detail, and that creates a service structure that will be a long way from agile, and will also be exceptionally “brittle”, meaning subject to changes based on nitty-gritty implementation details below.

Do we want to have every change in infrastructure obsolete service definitions that reference that infrastructure?  Do we want every service created to do direct provisioning of resources, probably in different ways with different consequences in terms of management properties?  Do we want horizontal scaling or failover to be mediated independently by every service that uses it?  Well, maybe some people do but if that’s the case they’ve kissed service agility goodbye.

And likely operations efficiency as well.  Abstraction of the type I’ve described here also creates consistency, which is the partner of efficiency.  If all “CPE” is deployed and managed based on a common definition, then it’s going to be a lot easier to manage the process, and a lot cheaper.

Next time you talk with a purported NFV provider, ask them to show you the service modeling process from top to bottom.  That exercise will tell you whether the vendor has really thought through NFV and can deliver on the benefits NFV promises.  If they can’t do it, or if their model doesn’t abstract enough, then they’re a science project and not an NFV story.

Does the Oracle/Intel Demonstration Move the NFV Ball?

Oracle has started demoing their new NFV/orchestration stuff, and anything Oracle does in the space is important because the company represents a slightly different constituency in the NFV vendor spectrum.  They’re definitely not a network equipment player so NFV isn’t a risk to their core business.  They do sell servers, but that’s not their primary focus.  They are a software player and with their NFV announcement earlier they became the biggest “official” software company in NFV.

The big focus of the Oracle announcement was a partnership with Intel on the Open Network Platform (ONP) initiative.  This is aimed at expanding what can be done with NFV by facilitating the hosting of functions on hardware with the right features.  The demo shows that you can create “sub-pools” within NFVI that have the memory, CPU, or other hardware features that certain types of VNF need.  Oracle’s orchestration software then assigns the VNFs to the right pools to ensure that everything is optimally matched with hardware.
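The steering idea itself is simple enough to sketch.  This is an assumed illustration of matching VNF needs to sub-pool features, not Oracle’s actual placement logic; the pool names and feature tags are invented:

```python
# Illustrative sketch of VNF-to-sub-pool steering: pick the first sub-pool
# whose advertised hardware features cover the VNF's declared needs.

POOLS = {
    # Listed most-general first so feature-less VNFs land in the cheap pool.
    "general":    frozenset(),
    "dpdk-pool":  frozenset({"dpdk", "sriov"}),
    "himem-pool": frozenset({"large-memory"}),
}

def place_vnf(needs: set) -> str:
    """Return the name of a sub-pool whose features are a superset of needs."""
    for pool, features in POOLS.items():
        if needs <= features:
            return pool
    raise LookupError(f"no sub-pool offers {sorted(needs)}")

print(place_vnf(set()))             # plain VNF -> general pool
print(place_vnf({"dpdk"}))          # accelerated data plane -> dpdk-pool
print(place_vnf({"large-memory"}))  # memory-hungry VNF -> himem-pool
```

The interesting questions are in the policy, not the mechanism: how finely you subdivide, and what you do when no pool qualifies.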

There’s no question that you’d like to have as much flexibility as possible running functions as VNFs instead of as physical appliances, but I’m not sure that the impact is as great as Oracle might like everyone to believe.  There are a number of reasons, ranging from tactical to strategic.

Reason one is that this is hardly an exclusive relationship between Oracle and Intel.  Intel’s ONP is available to any vendor, and Wind River’s Titanium platform supports it.  HP, a rival of Oracle’s for NFV traction, is a (or THE) Intel ONP partner, in fact.  I doubt that any Intel-server-based NFV implementation would forgo ONP.

Reason two is that the NFV ISG has called for VNF steering to servers based on a combination of the VNFs’ needs and servers’ capabilities for ages.  It’s part of the ETSI spec, and that means that implementations of MANO that want to conform to the spec have to provide for the steering.

Reason three is that right now the big issue with NFV is likely to be getting started, and in early NFV deployment resource pools will not be large.  Subdividing them extensively enough to require VNF hosting be steered to specialized sub-pools is likely to reduce resource efficiency.  Operators I’ve talked to suggest that early on they would probably elect to deploy servers that had all the features that any significant VNF population needed rather than specialize, just to ensure good resource pool efficiency.
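The efficiency penalty of subdividing a pool can be illustrated with the classic Erlang-B blocking formula (the numbers here are purely illustrative, not operator data):

```python
# Why subdividing a resource pool hurts efficiency: one shared pool of 40
# servers blocks less traffic than two specialized pools of 20 carrying the
# same total load.  Classic trunking economics, applied to NFVI.

def erlang_b(servers: int, offered_load: float) -> float:
    """Erlang-B blocking probability, via the standard recurrence
    B(n) = A*B(n-1) / (n + A*B(n-1)), with B(0) = 1."""
    b = 1.0
    for n in range(1, servers + 1):
        b = (offered_load * b) / (n + offered_load * b)
    return b

load = 30.0                       # total offered load in Erlangs
shared = erlang_b(40, load)       # one pool of 40 servers
split = erlang_b(20, load / 2)    # each specialized half-pool sees half the load

print(f"shared pool blocking: {shared:.4f}")
print(f"split pools blocking: {split:.4f}")  # always higher than shared
```

The bigger pool wins at any load level, which is the operators’ point: specialization costs you the statistical efficiency that made pooling attractive in the first place.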

Then we have the big strategic reason.  What kind of VNF is going to need specialized hardware for performance?  I’d contend that this would likely be things like big virtual routers, pieces of EPC or IMS or CDN.  These functions are really not “VNFs” in the traditional sense because they are persistent.  I commented in an earlier blog that the more a software function was likely to require high performance, higher-cost hardware, the less likely it was to be dynamic.  You don’t spin up a multi-gigabit virtual router for an hour’s excursion, you plant it somewhere and leave it there unless something breaks.  That makes this kind of application more like cloud computing than like NFV.

I asked an operator recently if they believed that they would host EPC, virtual edge routers, virtual core switches, etc. on generalized server pools and they said they would not.  The operator thought that these critical elements would be “placed” rather than orchestrated, which again suggests a more cloud-like than NFV-like approach.  Given that, it may not matter much whether you can “orchestrate” these elements.

Then there’s the opex efficiency point, which I think is a question of how many such situations arise.  Every user doesn’t get their own IMS/EPC/CDN; they share a common one, generally per metro.  Given that limited deployment, any operations efficiencies generated would be confined to a small number of functional components, so it’s not clear to me how much you could drive the NFV business case on ONP alone.

And service agility?  Valuable services that operators want to deploy quickly are almost certain to be personalized services.  What exactly can we do as part of a truly dynamic service that is first personalized for a user and second, so demanding of server resources that we have to specialize what we host it on?  Even for the business market I think this is a doubtful situation, and for the consumer market that makes up most of where operators are now losing money, there is virtually no chance that supersized resources would be used because they couldn’t be cost-justified.

Don’t get me wrong; ONP is important.  It’s just not transformative in an NFV sense.  I’ve shared my view of the network of the future with all of you who read my blog.  It’s an agile optical base, cloud data centers at the top, and a bunch of service- and user-specific hosted virtual networks in between.  These networks will have high-performance elements to be sure, elements that need ONP.  They’ll be multi-tenant, though, and not the sort of thing that NFV has to spin up and tear down.  They’ll probably move more than real routers do, but not often enough to make orchestration and pool selection a big factor.

I am watching Oracle’s NFV progress eagerly because I do think they could take a giant step forward with NFV and drive the market because they do have such enormous credibility and potential.  I just don’t think that this is such a step.  “Ford announces automobiles with engines!” isn’t really all that dramatic, and IMHO ONP or ONP-like features are table stakes.  What I’m looking for from Oracle is something forward-looking, not retrospective.

In their recent NFV announcement, Oracle presented the most OSS/BSS-centric vision for NFV that any major vendor has articulated.  There is absolutely no question that every single NFV mission or service must have, as its strongest underpinning, a way of achieving exceptionally good operations efficiency.  Virtualization increases complexity and complexity normally increases management costs.  We need to reduce them, in every case, or capex reductions and service agility benefits won’t matter because they’ll either be offset or impossible to achieve.  Oracle’s biggest contribution to NFV would be to articulate the details of OSS/BSS integration.  That would truly be a revolutionary change.

As an industry, I think we have a tendency to conflate everything that’s even related to a hot media topic into that topic.  Cloud computing is based on virtualization of servers yet every virtualized server isn’t cloud computing.  Every hosted function isn’t NFV.  I think that NFV principles and even NFV software could play a role in all public cloud services and carrier virtualization of even persistent functions, but I also think we have to understand that these kinds of things are on one side of the requirements spectrum and things like service chaining are on the other.  I’d like to see focus where it belongs, which is where it can nail down the unique NFV benefits.

Sub-Service Management as a Long-Term SDN/NFV Strategy

For my last topic in the exploration of operator lessons from early SDN/NFV activity, I want to pursue one of vendors’ favorite trite topics: “customer experience”.  I watched a Cisco video on the topic from the New IP conference, and while it didn’t IMHO demonstrate much insight, it does illustrate the truth that customer experience matters.  I just wish people did more than pay lip service to it.

Customer experience management in a service sense is a superset of what used to be called SLA management, and it reflects the fact that most information delivered these days isn’t subject to a formal SLA at all.  What we have instead is this fuzzy and elastic conception of quality of experience, which is the classic “I-know-it-when-I-see-it” concept.  Obviously you can’t manage for subjectivism, so we need to put some boundaries on the notion and also frame concepts to manage what we find.

QoE is different from SLAs not only in that it’s usually not based on an enforceable contract (which, if it were, would transition us to SLA management) but in that it’s more statistical.  People typically manage for SLA and engineer for QoE.  Most practical customer experience management approaches are based on analytics, and the goal is to sustain operation in a statistical zone where customers are unlikely to abandon their operator because they’re unhappy.  That’s a very soft concept, depending on a bunch of factors that include whether the customer was upset before the latest issue and whether the customer sees a practical alternative that can be easily realized.

Sprint and T-Mobile have launched campaigns that illustrate the QoE challenge.  If I believe that some significant percentage of my competitors’ customers (and likely my own as well) are dissatisfied with service but unwilling to go through the financial and procedural hassle of changing, then I’ll make it easy for competitors’ customers to change—even give them an incentive.  Competition is the goad behind customer experience management programs; if your competitor can induce churn then you have a problem, whatever your absolute measurements say.

Operators recognize that services like Carrier Ethernet are usually based on recognizable resource commitments, which means that you can monitor the resources associated with the service and not just guess in a probabilistic sense what experience a user has based on gross resource behavior.  In consumer services there are no fixed commitments, and so you have to do things differently and manage the pool.

NFV, according to operators, has collided with both practice sets.  For business services, dynamic resource assignment and automated operations are great, but they introduce new variables into the picture.  With business services, NFV is mostly about deriving service state from virtual resource state.  That’s a problem that can be solved fairly easily if you look at it correctly.  The consumer problem is different because we have no specific virtual resource state to derive from.

What operators would like to avoid is “whack-a-mole” management where they diddle with resource pool behavior to achieve the smallest number of complaints.  That sort of thing might work if you could converge on your optimum answer quickly, and if resource state was then stable enough that you didn’t have to keep revisiting your numbers.  Neither is likely true.

One possible answer that operators are looking at, but have not yet been able to validate in a full trial, is correlating service and resource analytics.  If you have a quirky blip on your resource analytics dashboard, you could presume with fairly low risk of error that service issues at that time were correlated with the blip.  Thus, you could work to remedy the service problems by remediating the resource blip, even if you didn’t understand the full causal relationships.  The barrier to this mechanism is that it’s not easy to test the correlations today, and it’s not even easy to gather the service-side analytics.  Measurement of QoE, you’ll recall from earlier comments, is like measuring “windiness”.  It’s in the eye of the beholder.
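The correlation mechanism itself is easy to sketch, even if gathering the inputs isn’t.  This is an illustrative toy, with invented timestamps standing in for real analytics feeds:

```python
# Sketch of service/resource analytics correlation: if complaints cluster
# inside a resource anomaly's time window, treat the blip as the probable
# cause and remediate it, even without a full causal model.

# Hypothetical data: resource anomaly windows and complaint times, in seconds.
resource_blips = [(100, 130), (400, 420)]
complaints = [105, 112, 128, 250, 405, 410, 600]

def complaints_in_window(window, events):
    """Return the complaint timestamps falling inside an anomaly window."""
    start, end = window
    return [t for t in events if start <= t <= end]

for blip in resource_blips:
    hits = complaints_in_window(blip, complaints)
    if hits:
        print(f"blip {blip}: {len(hits)} correlated complaints -> remediate")
```

Note the complaint at t=250 correlates with nothing; that’s the residue this approach can’t explain, and part of why operators haven’t validated it yet.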

Most of the operators I’ve talked with are now of the view that NFV management, SDN management, and probably management overall are being driven by the same substitutions (QoE for SLA, multi-tenant for dedicated, virtual for real) down the same path, and that they need a new approach.  A few of the “literati” are now looking at what I’ll call “sub-service management”.

Sub-service management says that a “service” is a collection of logical functions/behaviors that are individually held to at least a loose performance standard.  The responsibility of service automation is to get each functional element to conform to its expectations.  Each element is also responsible for contributing a “management view” in the direction of the user, perhaps in the simple form of a gauge that swings from red to green as the element moves from missing its specifications to beating them.

If something goes wrong with a sub-service function we launch automated processes to remediate, and at the same time we look at the service through the user-side management viewer to see if something visible has gone bad.  If so, we treat this as a QoE issue.  We don’t try to associate user service processes with resource remediation processes.
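Here’s a minimal sketch of that separation, with all class and metric names invented for illustration: each sub-service polices itself and contributes a gauge, and the user-side view is derived from the gauges rather than from the remediation machinery:

```python
# Sketch of sub-service management: each functional element checks itself
# against a loose target and exposes a red/green gauge toward the user.
# Remediation and the user-side view are deliberately decoupled.

class SubService:
    def __init__(self, name: str, target_latency_ms: float):
        self.name = name
        self.target = target_latency_ms
        self.measured = 0.0   # latest analytics measurement

    def gauge(self) -> str:
        """User-facing management view: green = meeting spec, red = not."""
        return "green" if self.measured <= self.target else "red"

    def remediate(self):
        # Placeholder for automated resource-side processes (rehome, scale...).
        print(f"remediating {self.name}")

def run_service_automation(subservices) -> str:
    for s in subservices:
        if s.gauge() == "red":
            s.remediate()     # resource-side process, invisible to the user
    # The user-side view is read from the gauges, not the remediation state.
    return "red" if any(s.gauge() == "red" for s in subservices) else "green"

parts = [SubService("access", 20), SubService("vpn-core", 50)]
parts[1].measured = 80        # the core element is out of spec
print(run_service_automation(parts))
```

Notice that nothing maps user complaints back to resources; the red gauge tells us there’s a QoE issue while remediation proceeds independently, which is exactly the decoupling the approach argues for.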

The insight of sub-service management is that if you aren’t going to have fixed, dedicated, resource-to-service connections with clear fault transmission from resource to service, then you can’t work backwards from service faults to find resource problems.  The correlation may be barely possible for business services but it’s not possible for consumer services because the costs won’t scale.

There are barriers to sub-service management, though.  One is that we don’t have a clear notion of a service as a combination of functional atoms.  ETSI conflates low- and high-level structuring of resources and so makes it difficult to take a service like “content delivery” and pick out functional pieces that are then composed to create services.  And because only functionality can ever be meaningful to a service user, that makes it hard to present a user management view.  Another is that there is no real notion of “derived operations”, meaning the generation of high-level management state through an expression-based combination of lower-level resource states.
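“Derived operations” is less exotic than it sounds.  The sketch below (function names, resources, and expressions are all hypothetical) shows the idea: each functional atom declares an expression over the resource states it depends on, and the user-facing view is computed from those expressions:

```python
# Sketch of "derived operations": high-level management state generated by
# evaluating per-function expressions over low-level resource states.

# Hypothetical raw resource states, as analytics might report them.
resource_state = {"vm1": "up", "vm2": "degraded", "link_a": "up"}

# Each functional atom declares how its state derives from resources.
derivations = {
    "firewall-function": lambda s: "up" if s["vm1"] == "up" else "down",
    "vpn-function": lambda s: ("degraded"
                               if "degraded" in (s["vm2"], s["link_a"])
                               else "up"),
}

# The user management view shows function states, never raw resources.
service_view = {fn: expr(resource_state) for fn, expr in derivations.items()}
print(service_view)
```

The point is that the mapping is declared once, per function, so resource changes propagate into a functional view automatically instead of through hand-built management correlations.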

I don’t think that it will be difficult to address any of these points, and I think the only reason why we’ve not done that so far is that we’ve focused on testing the mechanisms of NFV rather than testing the benefit realization.  As I’ve said in earlier blogs, the focus of PoCs and trials is now shifting and we’re looking at the right areas.  It’s just a matter of who will come up with an elegant solution first.

What Operators Think about Service-Event versus Infrastructure-Event Automation

I’m continuing to work through information I’ve been getting from operators worldwide on the lessons they’re learning from SDN and NFV trials and PoCs.  The focus of today is the relationship between OSS/BSS and these new technologies.  Despite the fact that operators say they are still not satisfied with the level of operations integration into early trials, they are getting some useful information.

One interesting point clear from the first is that operators see two different broad OSS/BSS-to-NFV (and SDN) relationships emerging.  In the first, the operations systems are primarily handling what we would call service-level activities.  The OSS/BSS has to accept orders, initiate deployment, and field changes in network state that would have an impact on service state.  In the second, we see OSS/BSS actually getting involved in lower-level provisioning and fault management.

There doesn’t seem to be a strong correlation between which model an operator thinks will win out and the size or location of the operator.  There’s even considerable debate in larger operators as to which is best, though everyone said they had currently adopted one approach and nearly everyone thought they’d stay with it for the next three years.  All this suggests to me that the current operations model evolved into existence based on tactical moves, rather than having been planned and deployed.

There is a loose correlation between which model an operator selects and the extent to which that operator sees seismic changes in operations as good and necessary.  In particular, I find that operators who have pure service-level OSS/BSS models today are most likely to be concerned about making their systems more event-driven.  Three-quarters of all the operators in the service-based-operations camp think that’s necessary.  Interestingly, those who don’t think so seem to be following a “Cisco model” of SDN and NFV, where functional APIs and policy management regulate infrastructure.  That suggests that Cisco’s approach is working, both in setting market expectations and in fulfilling early needs.

The issue of making operations event-driven seems to be the technical step that epitomizes the whole “virtual-infrastructure transition”.  Everyone accepts that future services will be supported with more automated tools.  The question seems to be how these tools relate to operations, which means how much orchestration is pulled into OSS/BSS versus put somewhere else (below the operations systems).  It also depends on what you think an “event” is.

Most operations systems today are workflow-based systems, meaning that they structure a linear process flow that roughly maps to the way “provisioning” of a service is done.  While nobody depends on manual processes any longer, they do still tend to see the process of creating and deploying a service to be a series of interrupted steps, with the interruption representing some activity that has to signal its completion.  What you might call a “service-level event” represents a service-significant change of status, and since these happen rarely it’s not proved difficult to take care of them within the current OSS/BSS model.
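The contrast between a linear workflow and an event-driven model is easiest to see as a state/event table.  This is an illustrative toy, with invented states and event names, not any OSS/BSS product’s model:

```python
# Sketch of event-driven service operations: rather than a fixed linear
# workflow, any event is dispatched against the service's current state.

# (state, event) -> next_state; anything not listed is ignored in that state.
TRANSITIONS = {
    ("ordering", "order-complete"):   "deploying",
    ("deploying", "deploy-complete"): "active",
    ("active", "fault"):              "remediating",
    ("remediating", "fault-cleared"): "active",
}

def handle_event(state: str, event: str) -> str:
    next_state = TRANSITIONS.get((state, event))
    if next_state is None:
        return state   # event not service-significant in this state
    print(f"{state} --{event}--> {next_state}")
    return next_state

s = "ordering"
for ev in ["order-complete", "deploy-complete", "fault", "fault-cleared"]:
    s = handle_event(s, ev)
```

A workflow engine walks the top row of this table once, left to right; an event-driven system can field a “fault” at any point in the lifecycle, which is exactly the capability the new service automation tasks demand.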

The challenge, at least as far as the “event-driven” school of operations people is concerned, lies in the extension of software tools to automatic remediation of issues.  One operator was clear:  “I can demonstrate OSS/BSS integration at the high level of the service lifecycle, but I’m not sure how fault management is handled.  Maybe it isn’t.”  That reflects the core question; do you make operations event-driven and dynamic enough to envelop the new service automation tasks associated with things like NFV and SDN, or do you perform those tasks outside the OSS/BSS?

This is where I think the operators’ view of Cisco’s approach is interesting.  In Cisco’s ACI model, you set policies to represent what you want.  Those policies then guide how infrastructure is managed and traffic or problems are accommodated.  Analytics reports an objective policy failure, and that triggers an operations response more likely to look like trouble-ticket management or billing credits than like automatic remediation.  It’s not, the operators say, that Cisco doesn’t or can’t remediate, but that resource management is orthogonal to service management, and the “new” NFV or SDN events that have to be software-handled are all fielded in the resource domain.

Most operators think that this approach is contrary to the vision that NFV at least articulates, and in fact it’s NFV that poses the largest risk of change.  It’s clear that NFV envisions a future where software processes not only control connectivity and transport parameters to change routes or service behavior, but also install, move, and scale service functionality that’s hosted, not embedded.  This means that to these operators, NFV doesn’t fit in either a “service-event” model or a “resource-based-event-handling” model.  You really do need something new in play, which raises the question of where to put it.

The service-event-driven OSS/BSS planners think the answer to that is easy; you build NFV MANO below the OSS/BSS and you field and dispatch service-layer events to coordinate operations processes and infrastructure events.  This does not demand a major change in operations.  The remainder of the planners think that somehow either operations has to field infrastructure events and host MANO functions, or that MANO has to orchestrate both operations and infrastructure-management tasks together, creating a single service model top to bottom.

I’ve always advocated that view, so I’d love to tell you that there’s a groundswell of support arising for it.  That’s not the case.  Of all the operators I’ve talked with, only five seem to have any recognition of the value of this coordinated operations/infrastructure event orchestration, and only one seems to have grasped its benefits and how to achieve them.

What this means is that the PoCs and tests and trials underway now are just starting to dip a toe in the main issue pool, which is not how you make OSS/BSS launch NFV deployment or command NFV to tear down a service, but how you integrate all the other infrastructure-level automated management tasks with operations/service management.  This is what I think should be the focus of trials and tests for the second half of 2015.  We know that “NFV works” in that we know that you can deploy virtual functions and connect them to create services.  What we have to find out is whether we can fit those capabilities into the rest of the service lifecycle, which is partly supported by non-NFV elements and overlaid entirely by OSS/BSS processes that are not directly linked with MANO’s notion of a service lifecycle.

I think we may be close to this, and though “close” doesn’t mean “real close”, I think that the inertia of OSS/BSS is working in favor of keeping service events and infrastructure events separated and handling the latter outside OSS/BSS.  Since that’s what most are doing now, this might be a case where the status quo isn’t too bad a thing.  The only issue will be codifying how below-the-OSS orchestration and the OSS/BSS processes link with each other in a way broad and flexible enough to address all the service options we’re hoping to target with NFV.

Feature Balance, Power Balance, and Revolutionary Technologies

Networking and IT have always been “frenemies”.  They often compete for budget in the enterprise, and they have certainly competed for power in the CIO organization.  One of the interesting charts I used to draw in trade show presentations tracked how the two areas were competing for “feature opportunity”.  By the year 2010, my model showed, IT would have convincingly claimed about 17% of the total feature opportunity, networking 28%, and 55% would still be up for grabs.  Since no market wants to differentiate on price alone, a feature-opportunity win would be a big boost for the winning technologies, vendors, and associated political constituencies in the enterprise.

That forecast largely came true in 2010, and networking did gain strength and relevance.  Since then things have been changing.  In 2014, the model said that if you looked at the totality of feature opportunity, networking and IT had cemented about 19% each, and everything else was yet to be committed.  What changed things certainly included the combination of the Internet and the cloud, but these two forces don’t tell the whole story.

The Internet demonstrates that resources can be turned into network abstractions.  All forms of cloud computing tend to make things more network-like for the simple reason that they promote network access to abstract IT features.  On that basis the cloud trend should have promoted networking over IT, but that flies in the face of the shift actually seen.  What made the difference goes back to abstraction, and the details might explain why John Chambers seems to be saying “white boxes win”, why IBM might (as reported on SDxCentral) be investing more in SDN, and why EMC might want to buy a network company.

Even before SDN came along, we were seeing a trend toward the abstraction of network behavior, “virtual” networks like VPNs and VLANs.  This trend has tended to reduce differentiation among network vendors by creating a user-level, functional, definition for services at L2 and L3.  Sure, users building their own networks could appreciate the nuances of implementation, but functionality drives the benefit case and thus enables consumption.

SDN takes virtualization of networks in a new direction.  By providing abstractions of devices and not just services, SDN makes it more difficult to differentiate even at the level of building networks.  If we assumed that SDN in its pure form went forward and dominated, then “white box” is inevitable, at least in a functional sense.  Only what could be specified by OpenFlow could be used to build services.  That’s the ultimate in abstraction.

NFV takes another, perhaps more significant, step, along with cloud-management APIs like OpenStack’s Neutron.  If you have a means of creating applications and services that consume network abstractions, then anything that realizes these abstractions is as good as anything else.  That’s the explicit goal of NFV, after all.  Properly applied, NFV says “You can resolve our abstractions of network services using SDN, but also using anything else that’s handy”.  It embraces the legacy elements, which limits how much network incumbents can do to stave off commoditization by bucking evolution to new models like SDN.

The interesting thing here is that networking, despite having a lead before, lost ground between 2010 and 2015.  Not lost in terms of investment but lost in terms of feature-value leadership.  Perhaps even more interesting is that IT didn’t gain, it also lost.  The gainer was the “in-between”, and I think that’s the most important lesson to learn here.

Virtualization is the general trend at work here.  It’s a combination of abstraction and instantiation, intended in large part to promote resource independence.  Abstraction reduces everything to functionality.  Functionality is a slave to demand, not to supply, and abstraction’s very goal of resource independence shouts “Hardware doesn’t matter!”  The important thing my modeling shows is that both IT and networking are losing, and nobody grabs for a lifeboat like a drowning man.  Thus, it’s abstraction that I think is behind the news items I cited.

EMC, whose VMware unit acquired Nicira, is in a position to abstract everything in physical networking.  A virtual overlay doesn’t care what the underlayment is.  The problem that they have is that even if the “undernet” is anonymous, it still has to be something.  So it makes sense for EMC to think about buying a network company to get some real gear.  If they don’t, then a vendor who offers real equipment might well offer virtual-overlay software too.  A vendor like Cisco.  Chambers knows that overlay wins, but he dares not say “overlay” because everyone will then think VMware/EMC.  So he says “white box”, and commits his own version of abstraction.

For a vendor on the IT side like IBM, the smart play is to abstract the network stuff as completely as possible.  So IBM is an OpenDaylight champ, and it continues to develop OpenDaylight even though it seems to have no clear SDN story or mission of its own.  It doesn’t need one to win; it only has to make sure that a network abstraction wins.

Making network abstraction win means making sure software wins, because ultimately IBM is now a software company.  Hardware, network or IT, is more of a risk than anything else.  A giant hardware player can still hold its own against a giant software player because you can’t run software in thin air, or overlay nonexistent infrastructure.  So IBM has to fight not only Cisco but also EMC and HP, more perhaps even than it has to fight Oracle and Microsoft.  Why?  Because software plus hardware will beat software alone, mostly because the majority of spending will still be on the platform and not on the software.  A company that has both can sustain sales presence and control, at least in the near term.

Even in the long term, how commoditized can hardware be?  We’ve had standard-platform x86 “COTS” for decades and while all the vendors would love to see better margins and less competition, there are still competitors and we’re seeing commoditization and consolidation rather than the collapse of the hardware space.  Chambers’ view of a white-box future may be likened to a story about trolls to keep kids out of the woods.  He may be afraid that he can’t dynamite Cisco into a more software-centric stance without making it clear that they can’t just hunker down on the hardware forever.  Whatever he says, though, you still need vendors to make white boxes and to integrate and support the function/host combination.

The interesting thing is that while Cisco and EMC and IBM have been in the stories, they’re not the players I think will decide the issues.  Those are HP and Oracle.  HP is perhaps the last real “server” vendor left and Oracle is the real “software” player, if one focuses both categories on the abstraction, SDN, and NFV battleground.  Both HP and Oracle are looking for a strong NFV story.  Both have good middleware credentials, but Oracle has the advantage in middleware.  HP has the advantage with servers, networking and SDN.  Neither of the two has fully leveraged its assets.  If and when they do they may finally decide who gets to shape the future.

In Search of a Paradigm for Virtual Testing and Monitoring

Virtualization changes a lot of stuff in IT and in networking, for two principal reasons.  One is that it breaks the traditional ties between functionality (in the form of software) and resources (both servers and associated connection-network elements).  The other is that it creates resource relationships that don’t map to physical links or paths.  The end result of virtualization is something highly flexible and agile, but also significantly more complicated.

When SDN and NFV came along, one of the things I marveled at was the way that test and monitoring players approached it.  The big question they asked me was “What new protocols are going to be used?”  as if you could understand NFV by intercepting the MANO-to-VIM interface.  The real question was how you could gain some understanding of network behavior when all the network elements and pathways were agile, virtual.

Back in the summer of 2013, when I was Chief Architect for the CloudNFV initiative, I prepared a document on a model for testing/monitoring as a service.  The approach was aimed at leveraging the concept of “derived operations,” the primary outgrowth of the original ExperiaSphere project and the associated TMF presentations, to answer the real question: “How do you test/monitor a virtual network?”  There was never a partner for that phase and so the document was never released, but I think the basic principles are valid and serve as a primer in at least one way of approaching the problem.

Like ExperiaSphere, CloudNFV was based on “repository-based management” where all management data was collected in a repository to be delivered through management proxies and queries against that database, in whatever form was helpful.  A server or switch, for example, would have its MIB polled by an agent that would then store the data (including time-stamp) in the repository.  When somebody wanted to look at switch state, they’d query the repository and get the relevant information.

What made this “derived” operations was the idea that a service model described a set of objects that represented functionality—atomic like a device or VNFC, or collective like a subnetwork.  Each object in the model could describe a set of management variables whose values derived from subordinate object variables using any expression that was useful.  In this way, the critical pieces of a service model—the “nodes”—could be managed as though they were real, which is good because in a virtual world, the abstraction (the service model) is the “realest” thing there is.
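The repository-and-derivation idea described above can be sketched in a few lines of code.  This is a hedged illustration, not the CloudNFV implementation; all class and variable names (Repository, ServiceObject, the “status” variable) are invented for the example.

```python
import time

class Repository:
    """Time-stamped store for polled management data (e.g., MIB variables)."""
    def __init__(self):
        self.records = {}          # (element, variable) -> (timestamp, value)

    def post(self, element, variable, value):
        self.records[(element, variable)] = (time.time(), value)

    def query(self, element, variable):
        return self.records.get((element, variable))

class ServiceObject:
    """A node in the service model: atomic (device/VNFC) or collective."""
    def __init__(self, name, children=None, derivations=None):
        self.name = name
        self.children = children or []
        self.derivations = derivations or {}   # variable -> fn(child values)

    def variable(self, repo, var):
        if not self.children:
            # Atomic objects read their "real" status from the repository.
            rec = repo.query(self.name, var)
            return rec[1] if rec else None
        # Collective objects derive their value from subordinate objects.
        child_vals = [c.variable(repo, var) for c in self.children]
        fn = self.derivations.get(var, lambda vals: vals)
        return fn(child_vals)

# Usage: a "subnet" object whose status derives from two switches.
repo = Repository()
repo.post("switch-1", "status", "up")
repo.post("switch-2", "status", "up")
subnet = ServiceObject(
    "subnet-A",
    children=[ServiceObject("switch-1"), ServiceObject("switch-2")],
    derivations={"status": lambda vals:
                 "up" if all(v == "up" for v in vals) else "degraded"},
)
print(subnet.variable(repo, "status"))  # -> up
```

Note that a query against “subnet-A” never touches a device directly; the abstraction is managed as if it were real, which is the point of the model.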

The real solution to monitoring virtual networks is to take advantage of this concept.  With derived operations, a “probe” that can report on traffic conditions or other state information is simply a contributor to the repository like anything else that has real status.  You “read” a probe by doing a query.  The trick lies in knowing what probe to read, and I think the solution to that problem exposes some interesting points about NFV management in general.

When an abstract “function” is assigned to a real resource, we call that “deployment” and we call the collective decision set that deploys stuff in NFV “orchestration”.  It follows that orchestration builds resource bindings, and that at the time of deployment we “know” where the abstraction’s resources are—because we just put them there.  The core concept of derived operations is to record the bindings when you create them.  We know, then, that a given object has “real” management relationships with certain resources.
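The binding-recording idea is simple enough to show directly.  This is a hypothetical sketch, with all names invented; the point is only that orchestration writes the binding down at the moment it makes the deployment decision, so nothing has to be rediscovered later.

```python
# service object name -> list of real resources committed to it
bindings = {}

def deploy(service_object, resource_pool):
    """Commit a resource to an abstraction and record the binding."""
    resource = resource_pool.pop(0)      # trivially: first available resource
    bindings.setdefault(service_object, []).append(resource)
    return resource

pool = ["server-17", "server-22"]
deploy("vFirewall-instance-1", pool)
print(bindings["vFirewall-instance-1"])  # -> ['server-17']
```

A management query against “vFirewall-instance-1” can now follow the recorded binding to server-17 without asking the infrastructure anything.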

Monitoring is a little different, or it could be.  One approach to monitoring would be to build probes into service descriptions.  If we have places where we can read traffic using RMON or DPI or something, we can exercise those capabilities like they were any other “function”.  A probe can be what (or one of the things that) a service object actually deploys.  A subnet can include a probe, or a tunnel, or a machine image.  Modeled with the service, the probe contributes management data like anything else.  What we’d be doing if we used this model is similar to routing traffic through a conventional probe point.

The thing is, you could do even more.  In a virtual world, why not virtual probes?  We could scatter probes through real infrastructure or designate points where a probe could be loaded.  When somebody wanted to look at traffic, they’d do the virtual equivalent of attaching a data line monitor to a real connection.

To make virtual probes work, we need to understand probe-to-service relationships, because in a virtual world we can’t allow service users to see foundation resources directly, or they could see other users’ traffic.  So what we’d have to do is to follow the resource bindings to find real probe points we could see, and then use a “probe viewer” that was limited to querying the repository for traffic data associated with the service involved.

One of the things that’s helpful in making this work is the notion of modeling resources in a way similar to that used for modeling services.  An operator’s resource pool is an object that “advertises” bindings to the service objects, each representing some functional element of a service for which it has a recipe for deployment and management.  When a service is created, the service object “asks” for a binding from the resource model, and gets the binding that matches functionality and other policy constraints, like location.  That’s how, in the best of all possible worlds, we can deploy a 20-site VPN with firewall and DHCP support when some sites can use hosted VNF service chains and others have or need real CPE.  The service architect can’t be asked to know that stuff, but the deployment process has to reflect it.  The service/resource model binding is where the physical constraints of infrastructure match the functional constraints of services.
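The advertise-and-bind behavior described above can be sketched as a simple recipe lookup.  Everything here is illustrative: the recipe fields, the city names, and the realization methods are assumptions made for the example, not part of any NFV specification.

```python
# Recipes the resource model "advertises": what function each can realize,
# and the policy constraints (here, just location) under which it applies.
ADVERTISED = [
    {"function": "firewall", "method": "vnf-chain", "locations": {"city-B"}},
    {"function": "firewall", "method": "real-cpe",  "locations": {"city-A"}},
]

def bind(function, location):
    """Service object 'asks' for a binding matching function and policy."""
    for recipe in ADVERTISED:
        if recipe["function"] == function and location in recipe["locations"]:
            return recipe["method"]
    raise LookupError(f"no recipe for {function} at {location}")

# The same functional request resolves differently per site:
print(bind("firewall", "city-A"))  # -> real-cpe
print(bind("firewall", "city-B"))  # -> vnf-chain
```

The service architect asks for “firewall” in both cases; only the binding step knows that one site has real CPE and the other hosts a VNF chain.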

And so it is with monitoring.  Infrastructure can “advertise” monitoring and even test data injection points, and a service object or monitoring-and-testing-as-a-service could then bind to the correct probe point.  IMHO, this is how you have to make testing and monitoring work in a virtual world.  I think the fact that the vendors aren’t supporting this kind of model is in no small part due to the fact that we’ve not codified “derived operations” and repository-based management data delivery, so the mechanisms (those resource bindings and the management derivation expressions) aren’t available to exploit.

I think that this whole virtual-monitoring and monitoring-as-a-service thing proves an important point, which is that if you start something off with a high-level vision and work down to implementation in a logical way, then everything that has to be done can be done logically.  That’s going to be important to NFV and SDN networks in the future, because network operators and users are not going to forego the tools they depend on today just because they’ve moved to a virtual world.

Fixing the Conflated-and-Find-Out Interpretation of MANO/VIM

I blogged recently about the importance of creating NFV services based on an agile markup-like model rather than based on static explicit data models.  My digging through NFV PoCs and implementations has opened up other issues that can also impact the success of an NFV deployment, and I want to address two of them today.  I’m pairing them up because they both relate to the critical Management/Orchestration or MANO element.

The essential concept of NFV is that a “service” somehow described in a data model is converted into a set of cooperating committed resources through the MANO element.  One point I noted in the earlier blog is that if this data model is highly service-specific, then the logic of MANO necessarily has to accommodate all the possible services or those services are ruled out.  That, in turn, would mean that MANO could become enormously complicated and unwieldy.  This is a serious issue but it’s not the only one.

MANO acts through an Infrastructure Manager, which in ETSI is limited to managing Virtual Infrastructure and so is called a VIM.  The VIM represents “resources” and MANO the service models to be created.  If you look at the typical implementations of NFV you find that MANO is expected to drive specific aspects of VNF deployment and parameterization, meaning that MANO uses the VIM almost like OpenStack would use Neutron or Nova.  In fact, I think that this model was explicitly or unconsciously adopted for the relationship, which I think is problematic.

The first problem that’s created by this approach is what I’ll call the conflation problem.  A software architect approaching a service deployment problem would almost certainly divide the problem into two groupings—definition of the “functions” part of virtual functions and descriptions/recipes on how to virtualize them.  The former would view a VNF implementation of “firewall” and a legacy implementation of the same thing as equivalent, not to mention two VNF implementations based on different software.  The latter would realize the function on the available resources.

If you take this approach, then VIMs essentially advertise recipes and constraints on when (and where) they can be used.  MANO has to “bind” a recipe to a function, but once a recipe is identified it’s up to the VIM/chef to cook the dish.

In a conflated model, MANO has to deploy something directly through the VIM, understanding tenant VMs, servers, and parameters.  The obvious effect of this is to make MANO a lot more complicated because it now has to know about the details of infrastructure.  That also means that the service model has to have that level of detail, which as I’ve pointed out in the past means that services could easily become brittle if infrastructure changes underneath.

The second issue that the current MANO/VIM approach creates is the remember-versus-find-out dichotomy.  If MANO has to know about tenant VMs and move a VIM through a deployment process, then (as somebody pointed out in response to my earlier blog on this) MANO has to be stateful.  A service that deploys half a dozen virtual machines and VNFCs has a half-dozen “threads” of activity going at any point in time.  For a VNF that is a combination of VNFCs to be “ready”, each VNFC has to be assigned a VM, loaded, parameterized, and connected.  MANO then becomes a huge state/event application that has to know all about the state progression of everything down below, and has to guide that progression.  And not only that, it has to do that for every service—perhaps many at one time.

Somebody has to know something.  You either have to remember where you are in a complex deployment or constantly ask what state things are in.  Even if you accept that as an option, you’d not know what state you should be in unless you remembered stuff.  Who then does the remembering?  In the original ExperiaSphere project, I demonstrated (to the TMF among others) that you could build a software “factory” for a given service by assembling Java objects.  Each service built with the factory could be described with a data model based on the service object structure, and any suitable factory could be given a data model for a compatible service at any state of lifecycle progression and it could process events for it.  In other words, a data model could remember everything about a service so that an event or condition in the lifecycle could be handled by any copy of a process.  In this situation, the orchestration isn’t complicated or stateful; the service model that describes the service remembers everything needed, because it’s all recorded.
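The “model remembers, process forgets” principle can be sketched as a state/event table driven by the stored service model.  The states, events, and transitions below are invented for illustration; the point is that the handler is a pure function of recorded state plus event, so any process copy can handle any event.

```python
# Lifecycle transitions: (current state, event) -> next state.
TRANSITIONS = {
    ("deploying", "vm-ready"):   "parameterizing",
    ("parameterizing", "done"):  "connecting",
    ("connecting", "done"):      "active",
    ("active", "fault"):         "failed",
    ("failed", "repaired"):      "active",
}

def handle_event(model, event):
    """Stateless handler: outcome depends only on the persisted model."""
    key = (model["state"], event)
    if key in TRANSITIONS:
        model["state"] = TRANSITIONS[key]
    return model

service = {"name": "vpn-42", "state": "deploying"}   # the persisted data model
for ev in ["vm-ready", "done", "done"]:
    service = handle_event(service, ev)
print(service["state"])  # -> active
```

Because the model carries the state, the orchestration software itself holds no per-service memory between events, which is what keeps it from becoming the “huge state/event application” described above.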

There are other issues with the “finding-out” process.  Worldwide, few operators build services without some partner contributions somewhere in the process.  Most services for enterprises span multiple operators, and so one operates as a prime contractor.  With today’s conflated-and-find-out model of MANO/VIM, a considerable amount of information has to be sent from a partner back to the prime contractor, and the prime contractor is actually committing resources (via a VIM) from the partner.  Most operators won’t provide that kind of direct visibility and control even to partners.  If we look at a US service model where a service might include access (now Title II or common-carrier regulated) and information (unregulated), separate subsidiaries at arm’s length have to provide the pieces.  Is a highly centralized and integrated MANO/VIM suitable for that?

I’m also of the view that the conflated-find-out approach to MANO contributes to the management uncertainty.  Any rational service management system has to be based on a state/event process.  If I am in the operating state and I get a report of a failure, I do something to initiate recovery and I enter the “failed” state until I get a report that the failure has been corrected.  In a service with a half-dozen or more interdependent elements, that can best be handled through finite-state machine (state/event) processing.  But however you think you handle it, it should be clear that the process of fixing something and of deploying something are integral, and that MANO and VNFM should not be separated at all.  Both, in fact, should exist as processes that are invoked by a service model as its objects interdependently progress through their lifecycle state/event transitions.

If you’re going to run MANO and VNFM processes based on state/event transitions, then why not integrate external NMS and OSS/BSS processes that way?  We’re wasting enormous numbers of cycles trying to figure out how to integrate operations tasks when if we do MANO/VNFM right the answer falls right out of the basic approach with no additional work or complexity.

Same with horizontal integration across legacy elements.  If a “function” is virtualized to a real device instead of to a VNF, and if we incorporate management processes to map either VNF and host state on one hand and legacy device state on the other to a common set of conditions, then we can integrate management status across any mix of technology, which is pretty important in the evolution of NFV.

If we accept the notion that the ETSI ISG’s work is a functional specification, then these issues can be addressed readily by simply adopting a model-based description of management and orchestration.  That’s another mission for OPNFV, or for vendors who are willing to look beyond the limited scope of PoCs and examine the question of how their model could serve a future with millions of customers and services.

Parcel Delivery Teaches NFV a Lesson

Here’s a riddle for you.  What do FedEx and NFV have in common?  Answer:  Maybe nothing, and that’s a problem.  A review of some NFV trials and implementations, and even some work in the NFV ISG, is demonstrating that we’re not always getting the “agility” we need, and for a single common reason.

I had to ship something yesterday, so I packed it up in a box I had handy and took it to the shipper.  I didn’t have a specialized box for this item.  When I got there, they took a measurement, weighed it, asked me for my insurance needs, and then labeled it and charged me.  Suppose that instead of this, shippers had a specific process with specific boxing and handling for every single type of item you’d ship.  Nobody would be able to afford shipping anything.

How is this related to NFV?  Well, logically speaking what we’d like to have in service creation for NFV is a simple process of providing a couple of parameters that define the service—like weight and measurements on a package—and from those invoke a standard process set.  If you look at how most NFV trials and implementations are defined, though, you have a highly specialized process to drive deployment and management.

Let me give an example.  Suppose we have a business service that’s based in part on some form of virtual CPE for DNS, DHCP, NAT, firewall, VPN, etc.  In some cases we host all the functions in the cloud and in others on premises.  Obviously part of deployment is to launch the necessary feature software as virtual network functions, parameterize them, and then activate the service.  The parameters needed by a given VNF and what’s needed to deploy it will vary depending on the software.  But this can’t be reflected in how the service is created or we’re shipping red hats in red boxes and white hats in white boxes.  Specialization will kill agility and efficiency.

What NFV needs is data-driven processes but also process-independent data.  The parameter string needed to set up VNF “A” doesn’t have to be defined as a set of fields in a data model.  In fact, it shouldn’t be, because any software guy knows that if you have a specific data structure for a specific function, the function has to be specialized to the structure.  VNF “A” has to understand its parameters, but the only thing NFV software has to do is get the variables the user can set, and then send everything to the VNF.
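Opaque parameter pass-through is easy to show concretely.  In this hedged sketch (all template contents and names are invented), the NFV-side code knows only which variables the user may set; it never interprets the parameters, it just forwards them.

```python
import json

def collect_and_send(template, user_values, send):
    """template declares user-settable variables with defaults; the NFV
    layer merges user input and forwards the whole set uninterpreted."""
    params = {k: user_values.get(k, default) for k, default in template.items()}
    send(json.dumps(params))        # passed through as an opaque blob

# A VNF "A" parameter template, opaque to the NFV software:
vnf_a_template = {"mode": "strict", "log-level": "info"}
sent = []
collect_and_send(vnf_a_template, {"log-level": "debug"}, sent.append)
print(sent[0])  # -> {"mode": "strict", "log-level": "debug"}
```

Adding a new parameter to VNF “A” changes only the template; the NFV-side code above is untouched, which is exactly the decoupling the shipping analogy argues for.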

The biggest reason why this important point is getting missed is that we are conflating two totally different notions of orchestration into one.  Any respectable process for building services or software works on a functional-black-box level.  If you want “routing” you insert something that provides the properties you’re looking for.  When that insertion has been made at a given point, you then have to instantiate the behavior in some way—by deploying a VNF that does virtual routing or by parameterizing something real that’s already there.  The assembly of functions like routing to make a service is one step, a step that an operator’s service architect would take today but that in the future might be supported even by a customer service portal.  The next step, marshaling the resources to make the function available, is another step and it has to be separated.

In an agile NFV world, we build services like we build web pages.  We have high-level navigation and we have lower-level navigation.  People building sites manipulate generic templates and these templates build pages with specific content as needed.  Just like we don’t have packages and shipping customized for every item, we don’t have a web template for every page, just for every different page.  Functional, in short.  We navigate by shifting among functions, so we are performing what’s fairly called “functional orchestration” of the pages.  When we hit a page we have to display it by decoding its instructions.  That’s “structural” orchestration.  The web browser and even the page-building software don’t have to know the difference between content pieces, only between different content handling.

I’ve been listening to a lot of discussions on how we’re going to support a given VNF in NFV.  Most often these discussions are including a definition of all of the data elements needed.  Do we think we can go through this for every new feature or service and still be agile and efficient?  What would the Internet be like if every time a news article changed, we had to redefine the data model and change all the browsers in the world to handle the new data elements?

NFV has to start with the idea that you model services and you also model service instantiation and management.  You don’t write a program to do a VPN and change it to add a firewall or NAT.  You author a template to define a VPN, and you combine that with a “Firewall” or “NAT” template to add those features.  For each of these “functional templates” you have a series of “structural” ones that tell you, for a particular point in the network, how that function is to be realized.  NFV doesn’t have to know about the parameters or the specific data elements, only how to process the templates, just like a browser would.  Think of the functional templates as the web page and the structural ones as CSS element definitions.  You need both, but you separate them.
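The functional/structural split described above can be sketched as two layers of templates.  Everything here is an invented illustration: the template contents, the city names, and the realization actions are assumptions, and a real system would use a modeling language (the text suggests TOSCA-like templates) rather than Python dictionaries.

```python
# Structural templates: how a given function is realized at a given point.
STRUCTURAL = {
    ("vpn", "city-A"):      "provision-mpls",
    ("vpn", "city-B"):      "provision-mpls",
    ("firewall", "city-A"): "configure-real-cpe",   # legacy CPE here
    ("firewall", "city-B"): "deploy-vnf-image",     # virtual CPE here
}

def realize(functions, site):
    """Generic 'browser': walks the functional templates and resolves each
    to its structural realization, without knowing what's inside either."""
    return [STRUCTURAL[(fn, site)] for fn in functions]

# One functional service definition: a VPN template combined with a
# Firewall template.  The SAME definition deploys differently per site.
service = ["vpn", "firewall"]
print(realize(service, "city-A"))  # -> ['provision-mpls', 'configure-real-cpe']
print(realize(service, "city-B"))  # -> ['provision-mpls', 'deploy-vnf-image']
```

If City A later gets virtual CPE, only the structural entry for (“firewall”, “city-A”) changes; the service definition itself is untouched, which is the agility the section argues for.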

I’d love to be able to offer a list here of the NFV implementations that follow this approach, but I couldn’t create a comprehensive list because this level of detail is simply not offered by most vendors.  But as far as I can determine from talking with operators, most of them are providing only those structural templates.  If we have only structural definitions then we have to retroject what should be “infrastructure details” into services because we have to define a service in part based on where we expect to deploy it.  If we have a legacy CPE element in City A and virtual CPE in City B, we’d have to define two different services and pick one based on the infrastructure of the city we’re deploying in.  Does this sound agile?  Especially considering the fact that if we then deploy VCPE in City A, we now have to change all the service definitions there.

Then there’s management.  How do you manage our “CPE” or “vCPE”?  Do we have to define a different data model for every edge router, for every implementation of a virtual router?  If we change a user from one to the other, do all the management practices change, both for the NOC and for the user?

This is silly, people.  Not only that, it’s unworkable.  We built the web around a markup language.  We have service markup languages now; the Unified Service Description Language or USDL is an example.  We have template-based approaches to modeling structures and functions, in TOSCA and elsewhere.  We need to have these in NFV too, which means that we have to work somewhere (perhaps in OPNFV) on getting that structure in place, and we have to start demanding vendors explain how their “NFV” actually works.  Otherwise we should assume it doesn’t.

How Buyers See New Network Choices

Networking is changing, in part because of demand-side forces and in part because of technologies.  The question is whether technology changes alone can have an impact, and for that one I went to some buyers to get answers on how they viewed some of the most popular new technology options of our time.  The results are interesting.

One of the most interesting and exciting (at least to some) of the SDN stories is the “white box” concept.  Give an enterprise or a service provider an open-source SDN controller, OpenFlow, and a bunch of “white box” generic/commodity switches and you have the network of the future.  Since “news” means “novelty” rather than “truth” it’s easy to see why this angle would generate a lot of editorial comment.  The questions are first, “Is it true?” and second, “What would it actually mean?”

The white-box craze is underpinned by two precepts.  First, that an open-source controller could create services using white-box switches that would replicate IP or Ethernet services of today.  Second, that those white-box switches would offer sufficiently lower total cost of ownership versus traditional solutions to induce buyers to make the switch.  The concept could deliver, but it’s not a sure thing.

Buyers tell me that in the data center the white-box concept isn’t hard to prove out.  Any of the open-source controllers tried by enterprises and operators were able to deliver Ethernet switching services using white-box foundation switches.  This was true for data centers ranging from several dozen to as many as several thousand servers.

However, buyers were mixed on whether the savings were sufficient.  Operators said that their TCO advantage averaged 18%, which, they said, was less than needed to make a compelling white-box business case if there was already an installed base of legacy devices.  Most said it was sufficient to justify white-box SDN in new builds.  Enterprises reported TCO benefits that varied widely, from as little as 9% to as much as 31%.  The problem for enterprises was that they had little expectation of new builds and most set a “risk premium” of about 25% on major technology changes.  Thus most enterprises indicated that they couldn’t make the business case in the data center.

Outside the data center it was even more negative.  Only 8% of operators’ projects outside the data center were able to even match the data center’s 18% TCO benefit, and operators expressed concerns that white-box technology was “unproven” (by a 2:1 margin) or offered too low a level of performance (by 60:40) to be useful at all, savings notwithstanding.

Interestingly, virtual switching/routing fares a lot better outside the data center.  Almost 70% of operators thought that virtual switching/routing could, if hosted on optimal servers, deliver at least 20% TCO benefit relative to legacy devices.  For enterprises the number was just over 75%.  Inside the data center, both operators and enterprises believed vSwitch technology could reduce their need to augment data center switching substantially (offering savings nearly 40% in TCO) but they didn’t see it displacing current switches or eliminating the need for new switches if new servers were added.  The consensus was that vSwitches were good for VMs, not for servers.

Operators believe that agile optics can supplement vSwitch technology and selected white-box deployments to displace as much as 70% of L2/L3 spending by 2025.  This suggests that white-box SDN and virtual switching/routing is best employed to supplement optical advances.  They see white-box data centers emerging more from NFV deployments, interestingly, than they do from directly driven SDN opportunities.  The reason seems to be that they believe NFV will generate a lot of new but smaller data centers where white-box and virtual technology is seen as suitable in performance.

Buyers are in general not particularly enthusiastic about white-box support or vendor credibility.  Three out of four enterprises and almost 90% of operators think their legacy vendors are more trustworthy and offer more credible support.  Virtually 100% of both groups think that they would want “more contractual assurances” from white-box vendors to counter their concerns about reputation and historicity.

What about white-box devices from legacy vendors?  Almost half of both buyer groups think that will “never happen,” meaning no chance for at least five years.  Everyone saw legacy vendors entering the white-box space in earnest only when there was no option other than to lose business to others.  Nobody saw them as being leaders, though almost all buyers say that they can get SDN control for legacy devices from their current vendors.

Another option that generates mixed reviews is the overlay SDN model popularized by Nicira (now part of VMware).  While nearly all network operators and two-thirds of enterprises see benefits in overlay-based SDN, they’re struggling to assign an economic value to their sentiment.  The most experienced/sophisticated buyers in both groups (what I call the “literati”) believe the winning model combines virtual-overlay technology with white-box basic physical switching in the LAN and agile optics in the WAN.  They say that the potential benefits are not promoted by vendors, however.

Interestingly, both network operators and enterprises are more hopeful about the Open Compute switching model than about white-box products based on SDN.  Almost 80% of enterprises say they would purchase OCP switches from “any reputable vendor” and almost 70% say they would buy commodity versions of these products.  Operators run slightly lower in both categories.  The difference, say buyers, is that OCP is a “legacy switch in a commodity form” where white-box SDN switches are based on a “new and less proven” technology combination.

What I get from all of this is that buyers need a more holistic statement of a new switch/routing paradigm than they’re getting.  It would seem that a combination of white-box physical switching and overlay SDN might be very attractive, but in the main buyers don’t see that being offered as a combination and they see do-it-yourself integration of two less-than-proven technologies as unattractive.  They’d love to see a major computer vendor (HP or IBM) field that combination; they’re not convinced that network giants will do that, and they’re still a little leery of startups, though less so than they’d been in the past.

The lesson is that there’s no such thing as a “point revolution”.  We have to expect rather significant and widespread change if we’re going to see much change at all, and users need a lot of reassurance about new technologies…and new partners.

A Deep Look at a Disappointing Neutrality Order

The FCC finally released its neutrality order, causing such a run on the website that it crashed the document delivery portion.  Generally, the order is consistent with the preliminary statement on its contents that was released earlier, but now that the full text is available it’s possible to pin down some of the issues I had to hedge on before.

First, the reference.  The official document is FCC 15-24, “REPORT AND ORDER ON REMAND, DECLARATORY RULING, AND ORDER” issued March 12, 2015.  Not surprisingly in our current politicized age, it was passed on a 3:2 partisan vote.  It’s 400 pages long in PDF form, so be prepared for a lot of reading if you intend to browse it fully.

This order was necessitated by the fact that the previous 2010 order was largely set aside by the DC Court of Appeals.  The problem the FCC had stemmed from the Telecom Act of 1996, which never mentioned the Internet at all and was woefully inadequate to serve as guidance in what was the dawn of the broadband era.  I won’t rehash all the past points, but in summary we spent about seven years trying to come up with a compromise reading of the Act that would let broadband investment continue but at the same time provide some regulatory clarity on the Internet itself.  The formula the FCC arrived at was that the Internet was “an information service with a telecommunications component.”  That exempted it from common-carrier regulation, which is defined by Title II of the Communications Act.

When in 2010 the FCC tried to address some of the emerging neutrality issues, they were trapped by their own pronouncement.  If the ISPs were common carriers there was no question the FCC could do what it wanted, but the FCC had said they were not.  The order of 2010 was largely an attempt to salvage jurisdiction from that mess, and it failed—that’s what the Court of Appeals said.  So the fact is that unless you wanted no neutrality order at all, the FCC had no option but Title II regulation.  Fortunately for the FCC, it is not bound legally by its own precedent, which means it can simply change its mind.  It did.

The essence of the 2015 order is simple.  The FCC declares the ISPs to be common carriers with respect to broadband Internet service, making them subject to Title II.  They then exercise the once-famous-now-forgotten provision of the Telecom Act, Section 706, which allows the FCC to “forbear” from applying provisions of the act to assure the availability of Internet services to all.  In this basic sense, the order is following the recipe that the DC Court of Appeals offered in its opinion on the 2010 order, and so this part of the order is fairly bulletproof.

What the FCC proposes to do with the authority it has under Title II is a bit more complicated.  At a high level, the goal of the order is to draw what the FCC calls a “bright line”, a kind of classic line-in-the-sand that would tell everyone where they can’t go.  The basic principles of that bright line are:

  • No blocking of lawful traffic, services, devices, or applications.
  • No throttling of said traffic, except for clear network management purposes.
  • No paid prioritization.

Unlike the order of 2010, the FCC applies these rules to both wireless and wireline.  They exempt services, IP-based or otherwise, that are separate from the Internet, including VoIP, IPTV, and hosting and business data services.  I interpret the exemptions as including cloud computing services as well.  The key point is that an exempt service is one that does not provide access to the Internet overall, and uses facilities that are separate from those of broadband Internet access.

The last point is important to note.  Broadband Internet access is a Title II service.  The Internet itself is not.  However, the FCC does reserve for itself with this order the right to intervene on interconnect issues, though it declines to do that at present.  The order says that regulators lack a history of dealing with Internet interconnect issues, and that the FCC is not comfortable prescribing rules without further data and experience.  Thus, the order neither affirms nor rules out paid settlement among ISPs of the Netflix-Comcast type.

A point that cuts across all of these other issues is that of transparency.  The FCC wants broadband Internet providers to say what they mean and then do what they say.  My interpretation of this means for example that a mobile provider can’t offer “unlimited” data and then limit it by blocking or throttling or by adding hidden charges based on incremental usage.

To me, the order has one critical impact, perhaps not what the FCC intended.  Operators want to make a favorable return on investment.  If they don’t have a pathway to that through paid prioritization, then it is unlikely that the Internet, as a service, will ever be truly profitable to them.  The best they could hope for would be to earn enough to cover the losses by selling other over-the-top services.  That’s a problem because the OTTs themselves wouldn’t have those losses to cover, and so could likely undercut operators on price.  Thus, the operators may look to “special services” instead, and I think that works against everything the FCC says it wants.

The order gives the distinct impression that the FCC believes the distinguishing point about the Internet is its ubiquity.  A “special service” has the defining criterion of not giving access to all of the Internet.  You can use IP and deliver some specific thing, not Internet access, and call it a special service, immune from regulations.  Certainly the universality of Internet access is a valid criterion, but in an investment sense the fact is that most paying services travel very short distances—less than 40 miles—and involve largely content delivery or (increasingly) cloud computing.  Does the order allow operators to separate out the profitable stuff—even encourage them to?  Already it’s clear that profitable services are largely special services, and the prohibition on paid prioritization guarantees that will be truer in the future.

Video is delivered both on- and off-Internet today, but channelized viewing is a special service.  Most for-fee operator VoIP is also a special service.  Business data services are special services.  Were there paid QoS on the Internet, might there be pressure to move these special services back onto the Internet?  Might the FCC even be able to take the position that they should be combined?  As it is, I see no chance of that happening, and in fact every chance that operators will look to special services, off the Internet, to ensure they get reasonable returns.  Home monitoring, cloud computing, IoT—everything we talk about as being a future Internet application could, without paid prioritization, end up off the Internet, not on it.

We might, with paid prioritization and a chance for Internet profit, see VC investment in the Internet as a network instead of in other things that increase traffic and complicate the ISP business model.  Certainly we’d give traditional L2/L3 devices a new lease on life.  The order, if it stands, is likely to put an end to those chances and accelerate the evolution toward virtual L2/L3 and minimization of “access” investment.

Will it stand?  The FCC has broad powers on Title II services; do they have the power to say that some commercially viable options cannot be presented, or that operators have to provide services with limits on their features?  I don’t know the answer to that one, but I suspect that there will be pressure now for Congress to step in.  In this day and age that’s a doubtful benefit, but there’s plenty of doubt in what we have now.

The problem here is that we don’t have a real middle ground in play.  Compromise, even when it’s politically possible, is easier to achieve if there is a position between the extremes.  With neutrality we’ve largely killed off moderation, leaving the best position one that none of the partisan advocacy groups occupies.  There is then no constituency on which to build a compromise, because a middle-ground view simply offends all the players.

The “Internet” is an information network on top of a telecommunications service.  We have to treat the latter like all such services, meaning we have to regulate it and apply rules to settlement and interconnect.  We have to include QoS (where have we ever had a commercial service without SLAs?).  I think that Chairman Wheeler was on the right track with neutrality before the Administration intervened.  Sadly, we can’t take that back, and sadly, Congressional intervention will only create the opposite extreme view.  Now, I guess, we’ll have to wait until some symptoms develop and rational views can prevail—or so we can hope.