Some Truths About Net Neutrality

The headline Bloomberg published on Wednesday was “FCC Chief Sets Up Clash With Call to Repeal Net Neutrality”, certainly one to generate clicks.  Most of the TV and online coverage of the current FCC action followed the same path.  But is it the right path?  You have to do a little history-dipping to decide.

The FCC, now under Chairman Pai since the change in administration, has struggled with Internet regulation for literally decades.  As an FCC watcher for even longer, I’ve seen two basic issues come to the fore.  First, how do you regulate consumer data services, meaning the Internet?  Second, how do you secure fair regulatory treatment (meaning equivalent treatment) for the same service (the Internet) when multiple communications delivery options (wireline telco, cable, wireless) exist?

The logical answer to these questions should never have been left to the FCC to start with.  Telecom regulations fall under the Communications Act of 1934 as revised by the Telecommunications Act of 1996.  Neither of these pieces of legislation even mentions the Internet.  Thus, the FCC has been trying to regulate what has become the most important network service of all time, using legislative standards that were devised without giving that service a thought.  You have to keep this in mind as we assess what the “new” FCC might now do.

A fundamental principle of law is jurisdiction.  Federal commissions like the FCC are essentially courts.  Federal law classifies them as “quasi-judicial agencies” because they are charged by law with the court-of-fact responsibility for a specialized area—communications, in the FCC’s case.  What jurisdiction means to the whole question of Internet regulation is simple.  In order for the FCC to do anything with respect to the Internet, it must establish jurisdiction.

The long, frustrating, argumentative, contradictory, and downright ugly history of what we call “net neutrality” today arises from false starts created by the jurisdiction issue.  Specifically, both where we are now and what Chairman Pai proposes to do trace back to that question.

Every neutrality order the FCC passed up to the most recent was overturned by the courts because the FCC failed to establish it had jurisdiction to issue the order.  Thus, we’ve operated as an industry with no effective order at all, until the Wheeler FCC finally took the step that the courts had suggested they might have to take.  They declared that Internet services were telecommunications services and thus regulated under Title II of the Communications Act, which established FCC jurisdiction over virtually all aspects of the services.  Absent Title II, there was no jurisdiction, no order, no neutrality.

Title II, of course, would let the FCC do all manner of things.  Many people objected to the classification for that reason, which was a bit unfair given that absent the classification the courts were saying there was no authority to regulate neutrality at all.  However, while all the jurisdictional jostling was going on, the FCC was adding things to “neutrality” (which legally didn’t amount to much up to the last order).  In addition to the necessary ruling that ISPs couldn’t block sites or throttle competitors, the final Wheeler order added a prohibition against paid prioritization of traffic or settlement among ISPs for carriage.  That’s what’s in force today.

The Pai position from the first was that Title II should not have been used because the Internet wasn’t the kind of monopoly industry that the Bell System was back in 1934.  What Pai proposes to do now is to withdraw the Title II classification (the FCC is not bound by its own precedents so it can change its position pretty much at will).  That would effectively make it impossible for the FCC to regulate the Internet in the traditional sense.  Would that kill neutrality?  Let’s see.

The Republican commissioners on the FCC have taken the position that no ISP would dare to throttle traffic of a competitor or block access to a website.  Public pressure alone would prevent it.  Comcast actually did interfere with some site traffic and was ordered to cease, but the courts overturned the ruling.  However, Comcast did cease the practices because of bad publicity, which could argue in favor of the public-pressure-works position.  I don’t think that “basic” neutrality would be impacted with or without a Title II ruling.

Where public pressure almost certainly would not work is in the paid-prioritization or settlement side, and I think this is what Pai has in mind.  The notion of “neutrality” has broadened significantly over the last decade, from protecting access to sites to protecting the current Internet business model.  You might have argued (I would, for example) that the FCC could have achieved basic neutrality regulation even with the Internet declared an “information service”.  I think there is no question that extending the neutrality notion to prioritization and settlement payments can’t be done except through common-carrier (Title II) regulation.

What I believe Pai wants is for the Internet to be able to experiment with a more varied model for financing than the current bill-and-keep, and a broader model than best-efforts services.  That was something that Genachowski, the FCC Chairman prior to Wheeler and himself a Democratic appointee, was willing to consider—his neutrality order would not have foreclosed paid prioritization or settlement.  Those of you who read my blog regularly know that I’ve favored both paid prioritization and settlement all along.

If you believe, as I do, that we should let the prioritization and settlement issues at least attempt to resolve themselves in the open market before we close off the options, then what’s going to happen in the next neutrality order shouldn’t bother you.  If you think there’s a problem here, then you have to accept something that this industry seems unable to accept—it’s not the FCC’s problem to solve.  This whole, frankly stupid, debate is coming because people want the FCC to do something it is not legally empowered to do, something it’s been told it can’t do without taking the step of Title II.  And while the courts have said that the FCC can declare the Internet to be regulated under Title II, it probably shouldn’t be, because the Internet is clearly not a “communications service” but an information service.

Everyone wants to believe, or at least to say, that without neutrality regulations the Internet as we know it will die.  The Internet as we know it has operated without regulations longer, far longer, than with them because none of the orders but the most recent was ever upheld on appeal.  Do we need to talk about Internet regulatory policy?  Surely, but with Congress.  We should have regulations that at least mention what we’re regulating, right?  Only Congress can do that.  What we should not have is a debate over what the FCC should do instead of Title II; the courts have already answered that question.  Nothing.

Network Slicing and its SDN/NFV Impact: A Real Issue or the Raising of Old Ones?

One of the 5G features that gets lots of attention is network slicing.  Most of the press on the topic has been at least positive if not gushing, but operators have some specific questions about the business case, and more about technology issues.  The big question on the technology side is how network slicing might play with SDN and NFV evolution.  It’s often said that it will promote both technologies, but it might actually limit them in the near term.

The basic idea of network slicing in 5G is to create service or even VNO partitions on a network to separate things that need either different resource policies or separation of service and tenant controls.  Each slice would operate (perhaps) like it was a totally independent network, with its own virtual elements (perhaps).  The reason for all the qualification here is that the specific nature of slicing and separation isn’t fully accepted at this point, and even some of the early notions have issues that might result in their being dropped down the line.

Slicing is a form of network virtualization, and like all virtualization-based technologies it consists of a real-resource layer, a mapping layer, and a set of abstractions that are the virtual equivalents of things in the real world below.  Virtualization is inherently “multi-tenant” or sliced, if it’s done right, so at one level the 5G slicing initiative is valuable because addressing network virtualization in wireless infrastructure in an explicit way would be a step toward getting it right.

The virtualization definition above points out both the benefits and the potential risks with network slicing.  You want to create a set of abstractions that represent what a service, an MVNO, or some other administration might see as their own services or resources, living independently.  You would then want to map the collection of those independent service/resource abstractions to a common real world.  Two fundamental questions emerge.  First, what exactly is in the “resource layer”?  Is it just connectivity or do we have higher-layer features, like IMS and mobility management?  Second, how does the mapping work?  Is it relatively static partitioning, policy-based division, or something else?
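
To make the layering concrete, here’s a minimal Python sketch of the three-layer virtualization model described above, with the two open questions flagged in comments.  The class and field names (SliceAbstraction, MappingLayer, and so on) are illustrative assumptions of mine, not terms drawn from the 5G or NFV specifications.

    # Minimal sketch of the three virtualization layers: abstractions,
    # a mapping layer, and a real-resource layer.  All names are illustrative.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class ResourceLayer:
        # Open question 1: is this just connectivity, or does it also
        # include higher-layer features like IMS and mobility management?
        connectivity: List[str] = field(default_factory=list)
        higher_layer: List[str] = field(default_factory=list)

    @dataclass
    class SliceAbstraction:
        name: str
        owner: str        # an MVNO, a service class, another administration
        sla_class: str    # e.g. "low-latency" or "massive-IoT"

    @dataclass
    class MappingLayer:
        # Open question 2: static partitioning versus policy-based division.
        mode: str                                   # "static" or "policy"
        assignments: Dict[str, List[str]] = field(default_factory=dict)

        def map_slice(self, s: SliceAbstraction, resources: List[str]) -> None:
            # Record which real resources back a given slice abstraction.
            self.assignments[s.name] = resources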

Let’s start with a basic question: how “independent” are the slices?  Does each slice represent a truly independent network, meaning that the slicing creates virtual-wire low-layer technology on which each slice-owner builds their own L2/L3?  Does each slice own its own hosting pools, or do they perhaps share some or all their hosting resources in some mediated way?  These two questions relate to the way that 5G and SDN/NFV would then relate.

Independent slices mean that the connectivity layers (L2 and L3) of the slice networks are created on totally independent elements, and that as far as each slice is concerned the others are totally separate networks.  This is analogous to networks that are built on separate (real or virtual) Level 1 elements.  You might also build slices on tunnel technology over IP or Ethernet, which is less “independent” since there are higher-layer devices shared among the slices.  Finally, you could build slices by partitioning the service-layer devices themselves.  This is the place where 5G evolution meets SDN and NFV.

If the slices are totally independent, then L2/L3 technology is duplicated for each slice, and each slice is able to manage its devices (real or virtual) in its own way.  Any traffic management or resource allocation is made below, by allocating L1 resources to the slices.  SDN and NFV adoption in this model would be little impacted by 5G; every slice owner could do their own thing in their own way, and any interconnection of slices or access by slices to common facilities like the Internet would be handled the same way as they would have been in an independent network.

If the slices share an underlying device network—IP/MPLS tunnels, Ethernet “Third Network” technology, or even SDN to create virtual wires—then the tenant slices are dependent on these common facilities, and they might compete for resources there.  This could make the behavior of the slices less deterministic and it might mean that management state and even some management/control processes would have to be coupled between the slices and the shared resources.  However, most SDN/NFV and L2/L3 processes at the service layer could still be based on independent real devices or virtual elements, and only minimal SDN/NFV impact would be likely.

If slices are based on partitioning at the service layer (L2/L3) or by a single SDN infrastructure complex, then we are dealing with slices as rather tightly connected tenants rather than as fully independent ones.  Service control within a slice would be a subset of service control overall, which means that isolation of tenants/slices and assurance of slice SLAs is now a service management function exercised not episodically (by allocating L1 resources) but continuously as connectivity and transport needs change.

The resource layer is one place where slices have to somehow converge, but another place is the device set, which in most cases means (at least to a degree) the access part of the network.  As with the resource layer, what happens will depend on just how we define a “slice”, and here too we have several options.

The first option is to partition the access network itself.  5G wireline connections, or fixed 5G wireless tail connections off wireline fiber, could be considered hard partitions of access.  Thus, an access slice plus a resource slice equals a network slice, and all these slices are based on independent technology elements.  This is a simple approach, but for it to work, each slice owner would have to provide their own subscriber and mobility management elements.

The opposite model is one where the access infrastructure is not sliced per se, but rather is shared based on subscriber management principles applied in a common subscriber and mobility management framework.  Once a subscriber is “admitted” they’re assigned slice resources.  There is no replication of subscriber or mobility management here.
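
A minimal sketch of that shared-admission model, assuming a hypothetical policy table that maps service profiles to slices; the profile and slice names are invented for illustration:

    # One common subscriber/mobility manager; slice assignment happens at
    # admission, with no per-slice replication of subscriber management.
    SLICE_POLICY = {
        "enterprise-vpn": "slice-ent",
        "consumer-video": "slice-video",
        "iot-sensor": "slice-iot",
    }

    def admit_subscriber(imsi: str, service_profile: str) -> dict:
        # Authenticate/admit in the shared framework, then pick the slice
        # whose resources this subscriber will use.
        slice_id = SLICE_POLICY.get(service_profile, "slice-default")
        return {"imsi": imsi, "slice": slice_id, "state": "admitted"}

    print(admit_subscriber("001010123456789", "consumer-video"))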

Where we have a “shared” element of service connectivity or subscriber/mobility management, we obviously have to design the element to support multi-slice/multi-tenant use.  In theory, at least in my view, different network slices are pretty much like different users/services for both SDN and NFV.  That means that service multi-tenancy processes for both technologies would probably serve slicing if they served their primary mission.  I don’t think they do, or at least they don’t do so provably.  We don’t have enough detail on SDN/NFV service lifecycle management to understand how strictly resources and service processes are partitioned.

Where there is no explicit need for sharing of elements—where we have “independent” slices in a true sense—there could still be a need to think about the SDN/NFV impact of 5G slicing.  First, it’s likely that an MVNO or “slice customer” would want to avail themselves of some sort of service structure within the slices, so they didn’t have to capitalize that part of their network any more than they wanted to capitalize the main part.  Too much “independence” defeats the purpose of slicing.  Second, even if they wanted to add technology to their slices, they could hardly haul traffic out of the 5G network owner’s infrastructure to route it between users and then stuff it back.  The virtual network of the slice would probably have to map to the topology of the 5G owner.  That would argue that the slice customer might want to lease resource capacity, and hosting virtual elements on the 5G owner resource pool would be a very logical strategy.

The most likely overall impact of network slicing is on resource multi-tenancy.  The next-most-likely is on “federation” or coordination of multiple service and resource domains to create a large-scale cohesive retail/wholesale offering.  Thus, what we need to be looking at in 5G in terms of SDN and NFV impact isn’t new at all, it’s something that’s been needed all along.  Hopefully, network slicing in 5G will make it harder to ignore.

What Now Gets NFV On Track? Open Source? Standards? Testing?

We are again seeing stories and comments around “what’s wrong with NFV”.  That’s a good thing in that it at least shows awareness that NFV has not met the expectations of most who eagerly supported it four years ago.  It’s a bad thing because most of the diagnoses, and therefore the explicit or implied remedies, are themselves wrong.

Before I get into this I want to note something I’ve said before when talking about NFV.  I got involved with the ETSI activity in March of 2013 and I’ve remained active in monitoring (and occasionally commenting on) the work since then.  I have a lot of respect for the people who have been involved with the effort, but I’ve been honest from the first in my disagreement with elements of the process, and therefore with some of the results.  I have to be just as honest with those who read this blog, and so I will be.

The first thing that’s wrong is less with NFV than with expectations.  We cover technology in such a way as to almost guarantee escalation of claims.  If you go back to the first white papers, or attended the early meetings, you can see that NFV’s intended scope was never revolutionary, and could never have supported the transformational aspirations of most of its supporters.  NFV was, from the first, focused on network appliances that operated above Level 2/3, meaning that it wasn’t intended to replace traditional switching and routing.  Much of the specialized equipment associated with mobile services, higher-layer services, and content delivery was a prime target.  The reason this targeting is important is that these devices collectively amount to only about 17% of capex overall.  NFV in its original conception could never have been a revolution.

The second thing that’s wrong is that NFV’s scope (in no small part because of its appliance focus) didn’t include operations integration.  Nobody should even think about questioning the basic truth that a virtual function set, hosted on cloud infrastructure in data centers and chained together with service tunnels, is more complicated than an equivalent physical function in a box.  Yet the E2E diagrams of NFV propose that we manage virtual functions with the same general approach we use for physical ones.  There has been from the first a very explicit dependence of NFV on the operations and management model associated with virtual function lifecycles, but the details were kept out of scope.  Given that “process opex”, the operations cost directly related to service fulfillment, already accounts for 50% more cost than capex, and that unbridled issues with virtual function complexity could make things even worse, that decision is very hard to justify, or to overcome.

The third issue with NFV is that it was about identifying standards and not setting them.  On the surface this is completely sensible; the last thing the industry needs is yet another redundant and potentially contradictory standards process.  The problem it caused with NFV is that identification of standards demands a clear holistic vision of the entire service process, or you have no mechanism with which to make your selection from the overall standards inventory.  What’s a good candidate standard, other than the one best able to achieve the overall business goal?  But what, exactly, is that goal?  How do standards get molded into an ecosystem to achieve it?  If you had to write standards, the scope of what you did and the omissions or failures would be fairly obvious.  If you’re only picking things, it’s harder to know whether the process is on track or not.

So what fixes this?  Not “servers capable of replacing switches and routers”, first because a broader role for NFV tends to exacerbate the other issues I pointed out, and second because you don’t really need NFV to deploy static multi-tenant network elements like infrastructure components.  You don’t really even need cloud computing.  “Standards” or “interoperability” or “onboarding” are all reasonable requirements, but we’ve had them all along and have somehow failed to exploit them.  What, then?

First you have to decide what “fixing” means.  If you’re happy with the original goals of the papers, the above-the-network missions in virtual CPE and so forth, then you need to envelop NFV in a management/operations model, which the ETSI ISG declared out of scope.  There’s nothing wrong with the declaration, as long as you recognize that declaring it out of scope doesn’t mean it isn’t critically necessary.  If you do want service and infrastructure revolution, it’s even easier.  Forget NFV except as a technical alternative to physical devices and focus entirely on automating service lifecycle management.  That can’t be done within the scope of the ETSI work—not at this point.

This is where open-source comes in.  In fact, there are two ways that open source could go here.  One is to follow the NFV specifications, in which case it will inherit all of the ills of the current process and perhaps add in some new ones associated with the way that open-source projects work.  The other is to essentially blow a kiss or two at the ETSI specs and proceed to do the right thing regardless of what the specs say.  Both these approaches are represented in the world of NFV today.

The specs as they exist will not describe an NFV that can make a business case.  The specs as they exist today are incomplete in describing how software components could be combined to build NFV-based service lifecycle management, or how NFV software could scale and operate in production networks.  That is my view, true, but I am absolutely certain it is accurate.  This is not to say that the issues couldn’t be resolved, and in many cases resolved easily.  It’s not to say that the ETSI effort was wasted, because the original functional model is valid as far as it goes, and it illustrates what the correct model would be even if it doesn’t describe it explicitly.  What it does say is that these issues have to be resolved, and if open source jumps off into the Great NFV Void and does the work again, they can get it right or they can get it wrong.  If the latter, they can make the same mistakes, or new ones.

The automation of a service lifecycle is a software task, so it has to be done as a software project to be done right.  We did not develop NFV specifications with software projects in mind, and they are not going to be optimal in guiding a project for that reason.  The best channel for progress is open source, because it’s the channel that has the best chance of overcoming the lack of scope and systemic vision that came about (quite accidentally, I think) in the ETSI NFV efforts.  The AT&T ECOMP project, now combined into the ONAP project (with Open-O), offers what I think is the best chance for success because it does have the necessary scope, and also has operator support.

Some people are upset because we have multiple projects that seem to compete.  I’m not, because we need a bit of natural selection here.  If we had full, adequate, systemic specifications for the whole service lifecycle management process we could insist on having a unified and compatible approach.  We don’t have those specs, so we are essentially creating competitive strategies to find the best path forward.  That’s not bad, it’s critically necessary if we’re to go forward at all.

The big problem we have with open-source-dominated NFV isn’t lack of consistency, it’s lack of relevance.  If open-source solves the problems of service lifecycle automation, and if it has the scope to support legacy and cloud, operator and federation, then it will succeed and NFV will succeed along with it.  But NFV was never the solution to service lifecycle automation; it declared most of the issues out of scope.  That means that for NFV, “success” won’t mean dominating transformation, it will simply mean playing its truthfully limited role.

Most network technology will never be function-hosted, but most operator profits will increasingly depend on new higher-layer cloud elements.  Right now, NFV isn’t even needed there.  If I were promoting NFV, and I wanted it to be more dominant, I’d look to the cloud side. There’s plenty of opportunity there for everyone, and the cloud shows us that there’s nothing wrong with open-source, or with multiple parallel projects.  It’s fulfilling the mission that counts, as it should always be.

Is Verizon Behind in the Telco Race?

Verizon certainly raised a ruckus in the industry with their views on consolidation.  The sense of their CEO’s comments was that Verizon was open to a merger that offered them content ownership, and that says a lot about the industry overall.  Here we have a giant telco saying that without content ownership their position is at risk, and there’s some support for that negative view in their most recent quarter.  So why is that, is it true, and what does it mean for us all?

To set the business stage, Verizon had subscriber loss issues in the mobile space—over three hundred thousand according to their quarterly report.   The company lost revenue in wireline, and FiOS video net losses were about 13,000 connections.  While Verizon is seen as having the “best” wireless and FiOS is seen as the best wireline Internet and video, the company faces competitive pressure on all fronts, and it’s increasingly doubtful that buyers will pay for premium service.

The core issue facing Verizon (and other operators) is the explosion in video delivery to mobile devices.  It’s not that this represents a massive shift away from channelized real-time TV viewing, but it does demonstrate a shift in video behavior, two in fact.  First, mobile broadband gives people video access when they have no opportunity to use traditional tethered TV.  Second, viewers are much more into time-shifting than before and if you’re not going to watch what’s on when it’s on, you are open to watching it differently, on a different device.

Mobile video has been a problem for operators because competitive pressure prevents them from usage pricing in a way that would realize much incremental revenue from the shift.  They’re stuck with another reason for revenue per bit to decline, sinking into the realm of dumb, cheap, plumbing.  And, of course, if the road is becoming free, then you have to make money on what’s traveling the road, which is video content.  To make things more complicated, TV advertisers want strong mobile video presence.

With market trends challenging enough, rival AT&T has messed things up further for Verizon, in two ways.  First, it’s been offering video from its DIRECTV property to its mobile customers, without having the viewing count against usage, a move Verizon had to follow.  Second, it’s done much better at moving its services and infrastructure toward that elusive telco goal of “transformation.”  Verizon had two natural advantages over AT&T at the start of this decade, and now they’re far less relevant.  We’ll look at Verizon’s lost edge first, then at the two ways AT&T helped them lose it.

What I call “demand density” was the first of Verizon’s natural assets.  The value of network infrastructure, its ability to return on investment, is proportional to the economic value of the homes and businesses that infrastructure passes.  My own modeling showed decades ago that this was very roughly related to GDP per inhabitable square mile, which I called “demand density”, and by that measure Verizon had nearly a 7x advantage over AT&T.  That’s why Verizon could jump on FTTH for at least a good-sized chunk of its market, and AT&T had to be satisfied with a hokey IPTV-over-DSL approach.
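
To illustrate the metric (with invented numbers, not actual Verizon or AT&T territory figures), the calculation is just economic value divided by inhabitable area:

    # Illustrative only: "demand density" computed with made-up figures.
    def demand_density(territory_gdp_usd: float, inhabitable_sq_miles: float) -> float:
        return territory_gdp_usd / inhabitable_sq_miles

    dense_footprint = demand_density(3.5e12, 50_000)     # compact, high-GDP territory
    sparse_footprint = demand_density(4.0e12, 400_000)   # larger, sparser territory
    print(dense_footprint / sparse_footprint)            # prints 7.0, the kind of gap described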

The second of Verizon’s natural assets was on the business side.  The easiest place to make a business sale of telco services is at the corporate HQ.  Verizon had more corporate HQs than AT&T, a lot more in fact.  Their edge has eroded because of a general shift of industries from the northeast to California and Texas, but they still hold a lead.  The problem is that business services are under incredible pricing pressure, and in that kind of race to the bottom the lowest-cost provider always wins.

The mobile-broadband focus of the market gave AT&T an opportunity to focus its “video” on a combination of satellite TV and their my-content-isn’t-counted-against-usage approach to mobile video.  Mobile services in general bypass the demand density issue because you’re not stringing wire, and the no-usage-charge video model promotes synergy between their satellite TV approach and mobile services.  You can see from Verizon’s price hikes on FiOS TV (and customers squatting in unexpected numbers in their cheapest offerings) that demand density and FiOS aren’t guaranteeing victory any more.

The transformation win AT&T is now posting is even more troublesome.  Process opex, the operations costs directly attributable to network services, accounts for about 28 cents per revenue dollar today across operators of all types, and if left unchecked that will grow to over 33 cents in five years.  Transformation, in theory, could reduce process opex by fifty percent or more.  With a cost advantage that large, AT&T could kill a non-transformed competitor—like Verizon.  AT&T’s ECOMP is becoming a de facto model for operators globally, but one Verizon can’t easily adopt for competitive reasons, and so Verizon is grappling with the question of how to counter ECOMP.

The AT&T proposal to buy Time Warner is the potential nail in the coffin.  Here’s AT&T already pushing hard against Verizon’s market advantages.  Then Comcast jumps in, first to buy NBCU and then to announce their own MVNO service that, no surprise, will deliver Comcast’s own video to mobile users without usage charges.  Then AT&T wants to buy a content company, giving it even more power in the mobile video space and better margins “above the network” to offset declines in profit per bit.  Given that AT&T could transform itself out of an immediate profit-per-bit problem, this is not good news.

Hence, the Verizon CEO’s comments on selling out to Disney or someone like them.  This is more than “Hey, Mom, every kid on the block is buying a content company!”  Verizon probably knows it would be difficult for it to actually do an acquisition of a good one, and harder to get regulatory approval.  Getting acquired by a content company might be easier, though regulatory approvals might still depend on the more permissive view of regulations held by the current administration.

Regulatory policy may hold the big wild card here.  Recall that the original neutrality order promulgated by the FCC under Genachowski didn’t close the door on settlement or paid prioritization.  Those were added by the Wheeler FCC.  Might the current Chairman, Pai, revert to a more ISP-friendly view and open the door to one or both?  That could open the possibility of Verizon obtaining revenue from OTT vendors, and even to new Internet-based services.

The question here is whether, even if regulatory relief were granted, it would be sufficient to redress the negative long-term issues Verizon faces.  Cost advantages held by competitors who are more aggressive with automated service lifecycle management wouldn’t be wiped out; regulatory change would just open new areas to compete in.  And if you can charge for priority content delivery and your competitors who own content assets elect not to, how do you respond other than by not charging?

Verizon seems to be doubling down on fiber deployment, based on its Corning deal, and that could suggest either a doubling-down on “quality” service or a hope that their cost base could be improved by shifting capex more to transport.  It may also hope that somehow buying OTTs (Yahoo, recently) will give it a leg up in that space.  Perhaps all of this will help, but I really think that the weak spot for Verizon isn’t content ownership, but service management automation.  They need to outdo competitors like AT&T in that space, and I don’t see any clear signs that they’re on the road to doing that.

IBM: Is It a Problem or a Symptom?

We are now confronted with the need to talk about IBM, not for the first time.  The company beat estimates on EPS slightly, with a set of one-time moves that the Street didn’t like much.  They missed yet again on revenue, and their shares took a hit (again) as a result.  Here is a company that has weathered more technology storms than any other and pulled victories out of every defeat.  Why is it unable to do that this time?  What should be done now?  I don’t want to reprise past blogs on how they got into their current mess, but focus instead on getting out.

The best place from which to launch a recovery is usually a place where you have your greatest strengths.  In IBM’s case, that greatest strength is strategic influence.  Back in 2013, IBM was a runaway winner in strategic influence on enterprise buyers, scoring between double and triple its rival computer vendors.  Since then, IBM’s influence has fallen by half, but so has the influence of its competitors and so IBM still leads the IT pack in terms of its influence on enterprise buyers.  The question is how to play that card effectively.

I hinted at what’s probably the best answer to this a couple of blogs back.  IBM has the direct account influence needed to launch that elusive next productivity wave that could create an IT investment explosion the like of which we haven’t seen for decades.  The problem, I think, is that IBM doesn’t seem to have any better idea of what might drive that wave than anyone else, or perhaps has no ability to communicate what it knows.

Watson, or AI-linked analytics, isn’t the answer—it’s too indirect.  Knowledge may be power eventually, but worker empowerment gets you to the finish line immediately.  Watson holds out to senior management the promise that somehow getting better information will make them successful.  Empowerment of workers makes you successful, period.  IBM actually took some steps in that direction with its deal with Apple, but if there was a new paradigm in the deal I didn’t see it, nor did the enterprise buyers I’ve talked with since.

If there is a company that has all the tools needed to create contextual point-of-activity empowerment of workers through exploiting mobile broadband, IBM is that company.  Given that they also (still) have the influence, I think it’s clear that this should be IBM’s strategic priority.  Watson is useful only in that context.

The next answer in how to play their assets comes from the financial community’s criticisms of IBM.  According to the Street, IBM is in trouble because of mainframe exposure and cloud impact.  The implication is that cloud computing is eating up IBM’s sales of mainframes.  In point of fact, mainframes have been a sweet spot for IBM and mainframe accounts are the places where IBM retains the most strategic influence.  Further, the cloud has had less than a 4% impact on IT spending overall.  Most cloud revenue comes from web startups and web-front-ends to current applications, so it’s money that was never spent in house.  The cloud, in fact, could be an incremental asset to any vendor at this point, because it could be the largest source of new server deployments.  How does IBM get the benefit of that shift?

Acknowledging it might be helpful.  IBM hasn’t articulated a differentiated vision of the cloud.  If we were to totally fulfill cloud potential across all possible markets, we would raise net IT spending by almost 90%.  Just getting on the right path to achieving full cloud transformation of business could, in the next five years, raise spending by over 30%.  You need to have three things to get that money—position, an ecosystemic story, and a product set that realizes the goals.  IBM has two of the three; what it lacks is the story.  Well, gosh, IBM—how much inertia does a story represent?  How long would it have taken the IBM of the past to sing a pretty song about the future?

That brings us to the final point in playing IBM’s cards effectively: marketing.  If you’ve read my past blogs on IBM, you know that I’ve painted their current problem as arising largely from a lack of marketing.  I know from many of my friends and contacts in IBM that there’s support for that view internally.  OK, then, if you needed marketing in the past and didn’t have it, need it in the present and future and still don’t have it, what do you now need?  Answer: marketing.  I would bet that IBM has people who know exactly what to say to enterprise buyers about that next wave of productivity improvement.  They are just not allowed to say it.

Marketing isn’t about telling a simplistic story to get the attention of a reporter or a click on a URL.  It’s about telling a compelling story.  Compelling enough to force reporters to pay attention, and to induce buyers to read about it even if it’s not packaged in 140 pithy characters or a 500-word article.  Does anyone out there seriously believe that a company who found a strategy that could boost their productivity by thirty or forty percent would refuse to learn it because it was too wordy or complicated?

Selling is all about trajectory management, as my research has shown for thirty years or so.  Media mention sells web site visits, web site visits sell sales calls, sales calls sell products—a trajectory.  IBM could surely package a compelling story to start that flow off, but remember that they have account presence in the largest enterprises, who spend the most money.  They can shortstop the trajectory early on, and build on that success to then generate new stories that engage those IBM isn’t directly influencing.

There are almost certainly people in IBM who see all of this, but somehow they don’t seem to get things moving in the right direction.  Is IBM content to focus on cost-cutting, admitting it will never turn revenue around?  Does IBM think that the opportunities and technologies will somehow come together on their own?  Or does IBM think that the Street rewards the current quarter, not the long haul?  It may be that some people think all of these things, and that the combined disorder is enough to stall meaningful progress.

Progress is clearly needed.  IBM got beat up rather badly in the markets after their quarterly earnings were announced.  Certainly you can’t justify taking a short-term view to please the Street when the result isn’t Street-pleasing.  It’s also hard to keep hoping that somehow natural forces will fix all your problems, when that hasn’t happened up to now, for almost three years.  That leaves accepting failure.  The IBM I’ve known would never do that.  Will they now?  I don’t know.

But there’s another option, even worse.  Is IBM simply the leading edge of a negative IT trend?  Have we, as an industry, broken those cycles of productivity-driven IT investment forever, and we’re now doomed to commoditization?  I think there’s a risk that will happen, and the most important lesson we may be learning from IBM’s problems is what happens to anyone in an industry that’s lost its mojo.

New SLAs and New Management Paradigms for the Software-Defined Era

There is no shortage of things we inherit from legacy telecom.  An increasing number of them are millstones around the neck of transformation, and many of those that are drags are related to management and SLA practices.  Those who hanker for the stringent days of TDM SLAs should consider going back in time, but remember that 50 Mbps or so of access bandwidth used to cost almost ten thousand dollars a month.  For those with more price sensitivity than that, it’s better to consider more modern approaches to management and SLAs, particularly if you’re looking ahead to transformation.

All management boils down to running infrastructure to meet commitments.  When infrastructure and services were based on time-division multiplexing or fixed and dedicated capacity, the management processes were focused on finding capacity to allocate and then ensuring that your allocation met the specified error-and-availability goals.  Packet networking, which came along in the ‘60s, started the shift to a different model, because packet networks were based on statistical multiplexing of traffic.  Data traffic doesn’t have a 100% duty cycle, so you can fit peaks of one flow into valleys in another.  That can multiply capacity considerably, but it breaks the notion of “allocation” of resources because nobody really gets a guarantee.  There’s no resource exclusivity.

A packet SLA has to reflect this by abandoning what you could call “instantaneous state”, the notion that at every moment there’s a guarantee, in favor of “average conditions”.  Over a period of time (a day, a week, a month), you can expect that effective capacity planning will deliver to a given packet user a dependable level of performance and availability.  At any given moment, it may (and probably will) not.

TDM-style SLAs have to be based on measurement of current conditions because it’s those conditions being guaranteed.  Packet SLAs have to be based on analysis of network conditions and traffic trends versus the capacity plan.  It’s more about analytics than about measurement, strictly speaking, because measurements are simply an input into the analysis aimed at answering the Great Question of Packet SLAs: “Will the service perform, on the average and over the committed period, to the guarantees?”  Remediation isn’t so much about “fixing the problem” at the SLA level as about bringing the trend back into line.
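
A minimal sketch of that averaged-SLA test, with illustrative thresholds; the point is that the guarantee applies to the period average, not to any single measurement:

    # Compare measurements averaged over the committed period against the
    # guarantee, rather than checking instantaneous state.  Thresholds and
    # sample values are illustrative.
    from statistics import mean

    def sla_met(latency_samples_ms, loss_samples_pct,
                max_avg_latency_ms=80.0, max_avg_loss_pct=0.5):
        # Measurements are just inputs to the analysis; the guarantee
        # applies to the average over the period.
        return (mean(latency_samples_ms) <= max_avg_latency_ms and
                mean(loss_samples_pct) <= max_avg_loss_pct)

    # Individual samples may violate the targets while the period still passes.
    print(sla_met([40, 60, 150, 55], [0.1, 0.9, 0.2, 0.1]))  # True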

Another management issue that has evolved with packet network maturity is that of “stateful versus stateless core” behavior.  Packet protocols have been offered in both “connectionless” and “connection-oriented” modes.  Connection-oriented packet networks, including frame relay and ATM, offer SLA behavior that sits somewhere between TDM and IP, which is connectionless.  When a “connection” is made in a packet network, the connection reserves resources along the path.

The problem is that if a core network element breaks, that breaks all the connections associated with it, resulting in a tear-down backward toward the endpoints and restoration of the connections using a different route.  In the core, there could be tens of thousands of such connections.  Connectionless protocols don’t reserve resources that way, and there’s no stateful behavior in the core.  Arguably, one reason why IP has dominated is that in a connection-oriented core, a fault creates a fairly awesome flood of related management issues and breaks a lot of SLAs, some simply because nodes are overloaded with connection tear-down and setup.

We’ve had packet SLAs for ages; nobody writes TDM SLAs for IP or Ethernet networks.  Yet we seem stuck on TDM notions like “five nines”, a concept that at the packet service level is hardly relevant because it’s hardly likely to be achieved unless you define what those nines apply to rather generously.  We’ve learned from the Internet to write applications that tolerate variable rates of packet loss and a range of latencies.  It was that tolerance that let operators pass on connection-oriented packet protocols so as to avoid the issues of stateful core networks, issues that could have included a flood-of-problems-induced collapse of parts of the network and a worse and more widespread SLA violation.

We now have to think about managing networks evolving to central management (SDN) of traffic, and hosted device instances that replace physical devices.  There, it’s particularly dangerous to apply the older TDM concepts, because service management almost has to be presented to the service buyer in familiar device-centric terms, and many faults and conditions in evolved networks won’t relate to those devices at all.  We need, at this point, a decision to break the remaining bonds between service management and service SLAs on one side, and the explicit state of specific resources underneath the services on the other.

In the best of all possible worlds, if you want management to be the easiest and service operations costs the lowest, you’d build infrastructure according to a capacity plan, exercise basic admission control to keep things running within the design parameters, and as long as you were meeting your capacity plan goals, nobody would see an SLA violation at all.  That happy situation would be highly desirable in transformed infrastructure, because it’s far easier than trying to link specific services, specific resources, and specific conditions to form an SLA.  As I pointed out yesterday, though, there are issues.

Adaptive IP networks do what their name suggests, which is adapt.  When you have a network that’s centrally traffic managed like SDN, or you have resources in NFV that have to be deployed and scaled on demand, you have a resource commitment to make.  Where real resources are committed—whether they’re SDN routes or NFV hosting points—you have a consumable that’s getting explicitly consumed.  You can’t have multiple activities grabbing the same thing at the same time or things are going to get ugly.  That forces serialization of the requests for this sort of change in infrastructure state, which creates single points of processing that can become bottlenecks in responding to network conditions or customer requests.  These same points can be single points of failure.
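
Here’s a simple sketch of that serialization point, assuming a single queue and worker thread standing in for the resource-commitment process; the pool names and capacities are invented:

    # Requests that commit real resources are funneled through one queue so
    # two activities can't grab the same consumable at once.  This is also
    # why that point can become a bottleneck or a single point of failure.
    import queue, threading

    change_requests = queue.Queue()
    free_capacity = {"vm-pool-a": 4}   # illustrative consumable

    def resource_worker():
        while True:
            req = change_requests.get()          # one request at a time
            pool, units = req["pool"], req["units"]
            if free_capacity.get(pool, 0) >= units:
                free_capacity[pool] -= units     # safe: no concurrent grabs
                req["result"] = "committed"
            else:
                req["result"] = "rejected"
            change_requests.task_done()

    threading.Thread(target=resource_worker, daemon=True).start()
    change_requests.put({"pool": "vm-pool-a", "units": 2})  # queued, not applied directly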

In the long run, we need to work out a good strategy for making more of the SDN and NFV control processes scalable and resilient.  For now, we can try to narrow the scope of control for a given OpenStack instance or SDN controller, and “federate” them through a modeling structure that can divide up the work to ensure things operate at the pace needed.  As SDN and NFV mature, we’re likely to need to rethink how we build controllers and OpenStack instances so that they are themselves built from components that adhere to cloud principles.

If you’d tried to sell a two-nines service to business thirty years ago, you’d have tanked.  Today, almost all large companies rely heavily on services with about that level of quality.  We had a packet revolution.  Now we’re proposing a software-centric revolution, and it’s time we recognized that constraining services to the standards of even the recent past (much less falling back to “five nines”) is no more likely to be a good strategy now than it was at the time of the TDM/packet transition.  This time, the incentive to change may well be to improve operations efficiency, and given that process opex is approaching 30 cents per revenue dollar, that should be incentive enough.

A Transformed Service Infrastructure from Portal to Resources

Transformation, for the network operators, is a long-standing if somewhat vague goal.  It means, to most, getting beyond the straitjacket of revenue dependence on connection services and moving higher on the food chain.  Yet, for all the aspirations, the fact is that operators are still looking more at somehow revitalizing connection services than transforming much of anything.  The reasons for this have been debated/discussed for a long time, including here in my blog, so I don’t want to dig further into them.  Instead I want to look at the technology elements that real transformation would require.

I’ve said in the past that there were two primary pieces to transformation technology—a portal system that exposes service status and ordering directly to customers, and a service lifecycle management system that could automate the fulfillment of not only today’s connection services and their successors, but also those elusive higher-layer services that operators in their hearts know they need.  This two-piece model is valid, but perhaps insufficient to guide vendor/product selection.  I want to dig further.

We do have long-standing validation of the basic two-piece approach.  Jorge Cardoso did a visionary project that combined LinkedUSDL, OpenTOSCA, and SugarCRM to produce a transformed service delivery framework.  It had those two pieces—TOSCA orchestration of service lifecycle management and SugarCRM portal technology, bound by LinkedUSDL.  This was a research project, a proof of concept, and it needs a bit of generalizing before it could become a commercial framework capable of supporting transformation.

While there are two major functional elements in the transformative model we’ve been talking about, each of these elements is made up of distinct pieces.  To really address transformation, we have to make all these pieces fit, and make sure that each performs its own mission as independently as possible, to prevent “brittle” or silo implementations.  That’s possible, but not easy.

The front-end of any portal, we know from decades of experience, should be a web-like process, based on RESTful APIs and designed to deliver information to any kind of device—from a supercomputer data center to a smartphone.  This web front-end hosts what we could call the “retail APIs”, meaning the APIs that support customer, partner, and customer-service processes.  To the greatest extent possible, these should be as general as a web server is, because most of the changes we’re going to see in new service applications will focus on this layer.
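
As a sketch of what such a retail-API front end might look like (using Flask, with endpoint paths and the hand-off function invented for illustration):

    # The front end only manages the interface and delegates everything
    # else to the cloud-support layer behind it.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/services/<service_id>/status", methods=["GET"])
    def service_status(service_id):
        # Status inquiries dominate the traffic; hand them off rather than
        # doing any real work here.
        return jsonify(forward_to_cloud_layer("status", service_id))

    @app.route("/services", methods=["POST"])
    def order_service():
        return jsonify(forward_to_cloud_layer("order", request.get_json()))

    def forward_to_cloud_layer(operation, payload):
        # Placeholder for the hand-off to the cloud-support layer.
        return {"operation": operation, "payload": payload, "state": "queued"}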

Behind the web-process piece of the portal is what we could call the cloud-support layer.  You want the front-end to be totally about managing the user interface, so any editing, validation, and rights brokerage should be pulled into something like a cloud process.  I’m calling this “cloud” to mean that the components here should be designed for scaling and replacement, either by being stateless or by using back-end (database or data model) state control.  This is particularly important for portal functions that are inquiries—management status—rather than service orders or updates.  That’s because historically there are more of these status-related requests, because users expect quick responses from such requests, and finally because there are no long-cycle database updates at the end of the flow to limit how much QoE improvement scalable processes up front can make.
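
A minimal sketch of a stateless cloud-support component, with a plain dictionary standing in for the back-end store that actually holds the state:

    # No instance-local state is read or written, so any replica can serve
    # any request, and instances can be scaled or replaced freely.
    SHARED_STORE = {}   # stand-in for the real back-end state repository

    def handle_status_inquiry(service_id: str) -> dict:
        record = SHARED_STORE.get(service_id, {"state": "unknown"})
        return {"service": service_id, "status": record.get("state")}

    def handle_order_update(service_id: str, fields: dict) -> dict:
        record = SHARED_STORE.setdefault(service_id, {})
        record.update(fields)
        return {"service": service_id, "status": record.get("state", "pending")}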

For the entire portal flow, from web-side to cloud-side, it’s more important to have an agile architecture than to have a specific product.  You should be able to adapt any web-based process to be a portal front-end, and you should be wary of selecting a cloud layer that’s too specific in terms of what it does, because the demands of future services are difficult to predict.  It’s also important to remember that the greatest innovations in terms of creating responsive and resilient clouds—microservices and functional (Lambda) computing—are only now rolling out, and most front-end products won’t include them.  Think architecture here!

The “back end” of the portal process is the linkage into the service lifecycle management system, and how this linkage works will depend of course on how the service lifecycle management process has been automated.  My own recommendation has always been that it be based on service modeling and state/event processing, which means that the linkage with the portal will be made by teeing up a service model (either a template for a new service or an instance representing an old one) and generating an event.  This is a useful model even for obtaining current service status; a “service event” could propagate through the service model and record the state/parameters of each element.
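
A sketch of that model-plus-event hand-off, with states, events, and process names that are purely illustrative:

    # The portal back end tees up a service model (template or existing
    # instance) and injects an event; a state/event table picks the
    # process to run.
    STATE_EVENT_TABLE = {
        ("ordered", "activate"): "deploy_service",
        ("active", "status"):    "collect_status",
        ("active", "fault"):     "remediate",
    }

    def portal_backend(model: dict, event: str) -> str:
        handler = STATE_EVENT_TABLE.get((model["state"], event))
        if handler is None:
            return "event ignored in state " + model["state"]
        # Any available process instance could run the handler, because the
        # service's state and parameters live in the model, not the process.
        return handler

    order = {"name": "vpn-123", "state": "ordered"}
    print(portal_backend(order, "activate"))   # -> deploy_service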

If a service model is properly defined (see my Service Lifecycle Management 101 blog series), then any instance of a process can handle it, which means that the structure is scalable as needed to handle the work.  This is important because it does little good to have a front-end portal process that’s elastic and resilient and then hook it to a single-thread provisioning system.  In effect, the very front end of the service lifecycle management system is inherently cloud-ready, which of course is what should be done.

As you dive deeper into service lifecycle management, though, you inevitably end up hitting the border of resource-bound processes.  You can have a thousand different service lifecycle flows in the service layer of the model, where the state and parameters of a given service are always recorded in the data model itself.  Deeper in, you hit the point where resources set their own intrinsic state.  I can allocate something only if it’s not already allocated to the maximum, which means that parallel processes that maintain their state in a service model are now constrained by “real” state.

The problem of “serialization” of requests to prevent collisions in allocation of inherently stateful resource elements is greatest where specific resources are being allocated.  As an example, you can have a “cloud process” that commands something be deployed on a resource pool, and that process may well be parallel-ready because the pool is sized so as to prevent collision of requests.  But at some point, a general request to “Host My Stuff” will have to be made specific to a server/VM/container, and at that point you have to serialize.

The only good solution to this problem is to divide the administration of pooled resources so that resource-specific processes like loading an app into a VM are distributed among dozens of agents (running, for example, OpenStack) rather than concentrated in a single agent that supports the pool at large.  That means decomposing resource-layer requests to the level of “administrative zone” first, then within the zone to the specific server.
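
A sketch of that zone decomposition, with invented zone names and a deliberately simple selection rule; a real implementation would pick zones from topology and capacity policy:

    # A general "host my stuff" request is first resolved to an
    # administrative zone; only that zone's own agent (an OpenStack
    # instance, for example) serializes the final server/VM choice.
    import hashlib

    ZONES = ["zone-east", "zone-central", "zone-west"]

    def pick_zone(service_id: str) -> str:
        # Deterministic spread of requests across zones, for illustration.
        digest = int(hashlib.sha256(service_id.encode()).hexdigest(), 16)
        return ZONES[digest % len(ZONES)]

    def host_request(service_id: str, image: str) -> dict:
        zone = pick_zone(service_id)
        # Contention is limited to this zone's agent rather than a single
        # agent serializing placements for the whole pool.
        return {"zone": zone, "image": image, "action": "submit-to-zone-agent"}

    print(host_request("vpn-123", "vFirewall"))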

I’ve seen about 20 operator visions on the structure of retail portals that feed service lifecycle management, and I have to honestly say that I’ve not yet seen one that addresses all of these issues.  Most of the operators are saying that they’re only doing early prototypes, and that they’ll evolve to a more sophisticated approach down the line.  Perhaps, but if you base your early prototypes on a model that can’t do the sophisticated stuff, your evolution relies on that model evolving in the right direction.  If it doesn’t, you’re stuck in limited functionality, or you start over.

The thing is, all of this can be done right today.  There are no voids in the inventory of functional capabilities needed to do exactly what I’ve described here.  If you pick the right stuff up front, then your evolution to a full-blown, transformed, system will be seamless.  I argue that it won’t take significantly longer to start off right today, and it darn sure will take a lot longer to turn around a project that can’t be taken to a full level of transformation because the early pieces don’t fit the ultimate mission.

What Will it Take to Drive Tech Transformation for Operators and Enterprises?

We tend to think of transformation as something that network operators, particularly telcos, have to go through.  In point of fact, transformation, meaning technology transformation, is going to happen to everyone, buyers and sellers, operators and enterprises.  That truth leaves two questions—what kind of transformation will happen, and how will the players respond to the challenges and opportunities created.  Most of us realize that tech is far from its “golden age”, that we seem to be focused more on reducing cost than enhancing capabilities.  There’s a reason for that, and a resolution is also possible once we understand what’s behind the unfavorable trend.

Technology is like everything else that acts on society and economies as a force.  It changes things either because it takes on a new mission, or because it gets a lot cheaper and more pervasive.  If we look at information technology from its infancy in the early 1950s to today, we see signs of both forces.  Certainly, IT has gotten cheaper.  When I learned to program, the computer I learned on had 16 kilobytes of RAM, worked at a speed measured in milliseconds, and was the size of about ten filing cabinets.  You can get wearables today with better performance.  The computer I learned on also cost several hundred thousand dollars, and today a good laptop might cost one thousand, and many are half that price.

The plummeting price/performance ratio for computing allowed it to spread.  We used to have data centers in which giant systems lived, tended by operations personnel who fed them records of activity that had been keypunched.  These would be turned into “reports”.  As computing got cheaper and more powerful, we started doing online transaction processing, then distributed computing, personal computers, tablets and smartphones, the Internet.  All of these things can be linked to the exploitation of the lower cost of computing.

If cost were the only factor at play here, we could expect to see IT transforming things in a fairly linear way over time, as reducing cost expanded the things you could do with computing technology, but that’s not the case.  If you examine public data on IT spending, it’s clear that we have waves of IT investment (we’ve had three so far).  These represent the new mission piece of the puzzle.  As IT things got cheaper, we found new ways to use them that boosted productivity for workers, and that justified faster growth in spending—30% to 40% more than the average rates of growth.  When we’d fully capitalized the new paradigm, IT spending fell to perhaps 80% of average levels.

This poses the transformation question for everyone.  What would happen if IT commoditization (price/performance) continued to drop, and nothing came along to generate a new mission?  The value of the IT tool would be static.  Every generation, replacing it would get cheaper and so spending would decline.  We’d end up with low-cost, no-differentiation, products.  Eventually, cost alone wouldn’t be enough to drive new applications.  Just because hammers get cheaper, you don’t drive more nails.  Price will transform IT into a commodity space.  New missions will generate new benefits, new demand for new features, new business cases, and raise spending.  That’s the reality for buyers and sellers, operators and enterprises, even consumers.

I’ve suggested in past blogs that the new mission likely to drive future IT growth would be one of two things—mobile broadband empowerment and “contextual processes” of the Internet of Things (IoT).  I still think that one or both of these will have to anchor the engine of mission growth.  Exploiting them is the pathway out of commoditization, but exploiting them isn’t a matter of inventing some dazzling new tech element.  We already have the hardware and software tools needed to make more than just a good start at both these new missions.  What is it we lack, then?

Call it “vision” or “insight”.  Say that we’re mired in quarter-at-a-time tactical thinking when new missions obviously demand a strategic sense.  Say we’ve lost our capacity to communicate complex things, or lost a way to monetize them in their early phase.  Say any of these things and I think you’d be right.  The last time we had a mission-driven cycle, the application of technology to the new mission was very clear.  Personal computers did stuff that we could immediately see, and they did it in a way that didn’t demand a lot of insight to understand.  The Internet gave us a universal market for consumer data services, and it was clear almost instantly, from the launching of the Web, what could be done with it.

Contrast that with our thinking about mobile broadband or IoT.  How many companies see “mobile empowerment” as giving the user a mobile-readable screen from the same applications they ran all along?  Is giving someone a different view of the same data going to revolutionize their productivity?  Does a worker with a powerful information appliance in their pocket, moving about doing their job, work the same way as one chained to a desk?  Why would deskbound applications anticipate the things a mobile worker might want, or do?  And IoT?  We focus only on getting sensors and controllers “on the Internet” and not on what incremental value they’d create once there (or how we’d pay for their deployment, secure them, protect against privacy intrusions, and so forth).

So who fixes this?  Probably, eventually, we’ll blunder into a path that gets us to another wave of mission-driven IT investment.  I’m frankly surprised that hasn’t happened already, given that it’s been fifteen years since we had an IT growth cycle, far longer than we’ve ever waited for one before.  Given that unhappy bit of history, we probably can’t count on spontaneous resolution.

What do we then count on?  Some vendor could jump-start the process.  If we look back at prior tech cycles, we can see that they happened because technology filled a specific niche in business, a niche that vendors themselves saw and exploited.  Talking with both service providers and enterprises has given me some insight into what buyers think would be necessary for this happy niche-filling to happen again.

First, it’s about account presence.  Almost everyone in the buyer space believes that transformational technology solutions couldn’t be propagated except by direct sales contact.  “It’s a matter of trust,” one said.  “If I’m going to put myself on the line for some big shift, I want the vendor who’s promoting it to be there holding my hand.”  The contacts need to be at a high level, too.  Well over three-quarters of buyers say that there has to be solid, ongoing CIO-level contact, and slightly fewer said that truly transformational technology shifts would require COO and CEO engagement.  All this speaks to a long-standing account presence, which only the largest vendors can hope to have.

The second requirement is a clear solution ecosystem.  Nobody wants to piece together transformation by summing the parts.  A few buyers who have tried, or even done, just that say that if they had to do it again, they’d demand a holistic architecture up front.  It’s not that a vendor would be expected to supply every piece of the puzzle, just that they offer it all and take responsibility for it.  This means that integrators could in theory be players in transformation, but buyers also indicated that they were fairly skeptical of pure integrators.  They think that the vendor should have enough product skin in the game to be able to draw profit from their contribution.  Otherwise, buyers think they’re paying all the product retail margins, and integration besides.

The third requirement is an open approach.  Buyers are a bit embarrassed by this requirement; they know that it’s a contradiction to expect a vendor to have everything, sell everything, support everything, and at the same time preserve them from vendor lock-in, but hey, who says you have to be reasonable?  The point is that the approach that transformation takes has to include a framework into which a buyer can integrate the products they already have and such future products as their business needs dictate.  As a practical matter, this means a clear architecture, no proprietary interfaces or flows, and explicit provision for supplemental functionality.

I’m not surprised by what buyers say they want here.  In fact, it’s not really a massive change from what they’ve always wanted.  The challenge of transformation is that this is a big initiative, and the more that’s required the greater the risk and the more stringently buyers hold to their requirements to mitigate those risks.  I think the barrier can be broken, but I think that it’s going to take a vendor faced with incredible profit pressure from current technologies.  We may not be quite at the tipping point in that regard, in which case we can expect a year or two (or three) of doldrum spending till someone sees the elephant.

Google Yet Again Teaches Operators a Lesson

The network of the future might have to evolve from the network of today, but it has to evolve and not take root there.  Google has consistently looked first at the future, in network terms in particular, and only then worried about linking the future back to the present.  Their latest announced concept, Expresso, has been in use for a couple of years now, and I hope that operators and vendors alike will learn some lessons from it.

“Early on, we realized that the network we needed to support our services did not exist and could not be bought.”  That’s a Google quote from their blog on Expresso, and it’s absolutely profound.  It’s first and foremost an indictment of the notion that the goal of evolution in networking is creating incremental, safe-feeling change rather than facing the future.  I want to sell something, and networking as it exists will constrain my ability to do that.  Therefore, I will change networking.  How many network operators could have and should have done that?

Expresso is an evolution of a concept that Google introduced to allow its data centers to be linked with a highly efficient SDN core while at the same time maintaining the look of an IP network.  SDN allows for very deterministic traffic management, and in its pure form it eliminates the adaptive routing that’s been a fixture of the Internet for ages.  Google’s goal was to make its data-center-interconnect SDN (B4) and its data center SDN (Jupiter) work inside an abstraction, something that looked like an IP core network but wasn’t.  Expresso is what does that.

A company like Google connects with a lot of the Internet; they say they peer with ISPs in over 70 metro areas.  At this “peering surface” Google has to look like an IP network, because the ISPs themselves don’t implement B4, they implement IP/BGP.  What Expresso does is create a kind of abstract giant virtual BGP router around Google’s entire peering surface, with the core SDN network and data center structure inside.  All of Google’s services appear to be peered with all its partners at each of the peering points.  Inside, SDN links user traffic to services without a lot of intermediary router hops that generate latency.  This is what makes it possible for Google to offer low-latency services like its “Hey, Google” voice assistant.
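To make the idea concrete, here’s a minimal sketch (purely illustrative, not Google’s code) of the “one big virtual BGP router” notion: every peering point advertises the same set of service prefixes, while an internal resolver maps arriving traffic onto SDN paths rather than hop-by-hop IP routes.  The class names, the resolver, and the prefix are assumptions for the example.

```python
# Minimal sketch of a peering-surface abstraction: uniform BGP advertisement
# outside, SDN path resolution inside.

class PeeringSurface:
    def __init__(self, service_prefixes, sdn_resolver):
        self.service_prefixes = service_prefixes   # prefixes advertised via BGP at every peer
        self.sdn_resolver = sdn_resolver           # stand-in for internal path computation

    def advertise(self, peering_point):
        """Every peering point sees the full set of service prefixes."""
        return {"peer": peering_point, "prefixes": list(self.service_prefixes)}

    def ingress(self, peering_point, dest_prefix):
        """Traffic entering the surface is mapped straight onto an SDN path."""
        if dest_prefix not in self.service_prefixes:
            raise ValueError("not a service prefix we advertise")
        return self.sdn_resolver(peering_point, dest_prefix)

# The resolver here is just a placeholder for whatever computes internal paths.
surface = PeeringSurface({"203.0.113.0/24"},
                         lambda peer, pfx: f"sdn-path({peer}->{pfx})")
print(surface.advertise("metro-nyc-01"))
print(surface.ingress("metro-nyc-01", "203.0.113.0/24"))
```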

Expresso creates what’s effectively a two-tier Internet.  The “slow lane” of the Internet is the stuff that’s based on IP end-to-end, and it might take a half-dozen hops to get through the BGP area in which a service is offered and be linked to the server that actually provides it.  With Expresso, once you pass through the peering surface you’re in the SDN “fast lane”.

Expresso also does some “process caching” in effect.  A user can be linked to the service point that offers the best performance without changing the IP address the user sees as the service URL.  Think of it as providing services a “process IP address” that Google then maps to the best place to run that process.
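A minimal sketch of that mapping idea, again with invented names rather than anything Google has published: the address the user sees stays constant, and a selection step behind it picks whichever service point currently performs best.

```python
# "Process caching" sketch: a stable service address is bound, per request,
# to the best-performing service point.

def best_service_point(service_addr, candidates):
    """candidates: list of (service_point, measured_latency_ms) tuples."""
    # service_addr is only here to emphasize that the user-visible address never changes.
    point, _latency = min(candidates, key=lambda c: c[1])
    return point

print(best_service_point("voice.assistant.example",
                         [("pop-nyc", 12.0), ("pop-chi", 9.5), ("pop-sfo", 31.0)]))
# -> pop-chi; the visible address stayed the same, only the binding behind it moved
```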

Traffic management is based on the pure-SDN notion of centralization.  There is no adaptive routing, no “convergence time” for the network to adapt to changing conditions.  A central process makes route decisions based on a broad set of utilization, availability, and QoE metrics, and that same process coordinates traffic on every trunk, every route.  The result is a very high level of utilization combined with very deterministic performance, a combination difficult to achieve in a real IP network except where utilization is held to 50% or so.  In effect, Google doubles the capacity of its trunks.
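Here’s a small sketch of what centralized, non-adaptive route selection looks like in principle: one controller scores every candidate route from global utilization, availability, and QoE data and installs the winner, instead of letting routers converge on their own.  The metrics and scoring weights are my assumptions, not Google’s.

```python
# Central route selection sketch: global visibility, one decision point,
# no per-router adaptive convergence.

def pick_route(routes):
    """routes: dicts with 'path', 'utilization' (0-1), 'available', 'qoe' (0-1)."""
    usable = [r for r in routes if r["available"]]
    # Prefer high QoE, penalize heavily loaded trunks (weights are illustrative).
    return max(usable, key=lambda r: r["qoe"] - 0.5 * r["utilization"])["path"]

print(pick_route([
    {"path": "A-B-D", "utilization": 0.85, "available": True,  "qoe": 0.90},
    {"path": "A-C-D", "utilization": 0.40, "available": True,  "qoe": 0.80},
    {"path": "A-E-D", "utilization": 0.10, "available": False, "qoe": 0.95},
]))  # -> A-C-D
```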

Taken as a whole, Expresso demonstrates how SDN should have been considered, which is also how NFV should have been considered, and how everything should be.  Google was goal-oriented, top-down, in their approach.  That let Google define the way Expresso had to work based on how their B4 data center interconnect (DCI) worked, and how they wanted their services to work.  What they came up with is abstraction, the notion of making a cooperative system of any given technology or set of technologies look like a single virtual piece of a different technology.  BGP in effect creates an “intent model” of an Internet area, inside which the property that’s visible is the property BGP makes visible.  How the property is fulfilled is nobody’s business but Google’s.

Another interesting aspect of Expresso is that it could be pushed ‘way out toward the network edge.  Google is already metro-peering.  As access networks and operator metro infrastructure change, it’s easy to see Expresso sitting right inside the access edge, grabbing Google traffic and giving it the best possible performance in support of Google’s services.  The further forward Expresso gets, the more useful it is, because more of the traditional, inefficient, higher-latency IP routing is displaced by SDN.

So suppose that you had an “Expresso Agent” right at the edge itself?  Suppose you could tap off Google traffic and tunnel it right into B4?  One of the lovely properties of an abstraction like the one Expresso creates is that it doesn’t have to be the only face that Google shows to the sun.  You could take the same set of Google networks and features and push them through a custom SDN abstraction, one more aware of services and less location-specific.  Could Google then define not only the core of the future, but the service edge of the future?

Perhaps, but promoting any new edge protocol would be a challenge in an age where all anyone knows about is IP.  The big value Expresso reveals is the notion of the abstract proxy.  You don’t have to proxy IP/BGP, you could proxy any convenient protocol with an Expresso-like approach.  Stick Expresso-like technology in an edge router or an SD-WAN box and you can connect to anything you like inside.  You can transform the way the network works while leaving the services intact.

That’s what operators need to be thinking about, particularly for things like 5G.  Why wouldn’t the goals of 5G be best satisfied with an inside/outside Expresso model?  Here we have something carrying an estimated 25% of all the traffic of the Internet, so it’s surely proved itself.  We should have been paying more attention to Google all along, as I’ve said in prior blogs.  We darn sure should be paying attention now.

Service Lifecycle Management 101: Integrating with Management Processes

One of the questions certain to arise from discussions of service lifecycle management is how VNFs are managed.  The pat answer to this is “similar to the way that the physical network functions (PNFs) that the VNFs replace were managed.”  Actually, it’s not so pat a response.  It is very desirable that management practices already in place be altered by NFV, or any new technology, only when the alteration is a significant net benefit.  Let’s then start with the management of the PNFs.

PNFs, meaning devices or systems of devices, are represented in the service model by a low-level (meaning close-to-resource) intent model.  That model exposes a set of parameters, some of which might relate to an SLA and others simply to the state of the operation the model represents.  The general rule of model hierarchies is that these parameters are populated from data exposed by the stuff within/below.  In the case of our hypothetical PNF-related intent model, the stuff below is the device or device system and the set of management parameters it offers, presumably in a MIB.
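A minimal sketch of that relationship, with a hypothetical parameter mapping: the low-level intent model exposes a small parameter set and populates it from the MIB variables the device underneath actually offers.

```python
# Low-level (close-to-resource) intent model sketch: exposed parameters are
# populated from the device MIB below it.

class PnfIntentModel:
    # Which exposed parameters map to which MIB variables (assumed mapping).
    PARAMETER_MAP = {"availability": "ifOperStatus", "throughput_mbps": "ifSpeed"}

    def __init__(self, mib_reader):
        self.mib_reader = mib_reader   # callable returning the device MIB as a dict
        self.parameters = {}

    def refresh(self):
        """Populate the exposed parameter set from what the device exposes below."""
        mib = self.mib_reader()
        self.parameters = {p: mib.get(var) for p, var in self.PARAMETER_MAP.items()}
        return self.parameters

pnf = PnfIntentModel(lambda: {"ifOperStatus": "up", "ifSpeed": 10000})
print(pnf.refresh())   # {'availability': 'up', 'throughput_mbps': 10000}
```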

Every intent-model element or object in the service model instance has a parameter set, and each can expose an interface through which that parameter set can be viewed or changed.  This mechanism would allow a service management process to alter the behavior of the PNF that was embedded in the object we’re using to represent it.  Presumably the PNF’s own MIB is still accessible as it would normally be, though, and this raises the risk of collision of activities.

One way to prevent PNF management from colliding with service management is to presume that the PNF isn’t “managed” in an active sense by the service management processes.  That would mean that the PNF asserts an SLA and either meets it or has failed.  The PNF management system, running underneath and hidden from service management, does what’s required to keep things working according to the SLA and to restore operation if the PNF does break.

This isn’t a bad approach; you could call it “probabilistic management” because service management doesn’t explicitly restore operation at the level we’re talking about.  Instead, there is a capacity-planned SLA and invisible under-the-model remediation.  For a growing number of services, it’s the most efficient way to assure operation.

If you don’t want to do stuff under the covers, so to speak, then you have to actively mediate the management requests to ensure that you don’t have destructive collisions.  The easiest way to do that is to require that the PNF’s EMS/NMS work not with the actual interfaces but through the same intent model as the service management system.  That model would then have to serialize management changes as needed to ensure stable operation.

The serializing could be done in two ways—directly via the intent model, or at the process level.  Process-level serialization means that the intent model asserts a management API (by referencing its process) and that API is a talker on a bus that the real management process listens to.  All the requests to that API are serialized.  The intent-model-level approach says that management requests are events, and that the management event is generated by whatever is trying to manage.  Events have to be queued in any case because they’re asynchronous, so this is an easier approach.  Event-based management also lets you change how you handle management commands based on the state of the object—you could ignore them if you’re in a state that indicates you’re waiting for something specific.
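A minimal sketch of the intent-model-level approach, with illustrative states and event types: requests become events on a queue, they’re handled one at a time, and the object’s current state decides whether an event is acted on or ignored.

```python
# Event-based, serialized management sketch: a queue imposes ordering, and
# state gates which events are honored.

from collections import deque

class ManagedElement:
    def __init__(self):
        self.state = "ACTIVE"       # e.g. ACTIVE, or RESYNCING while awaiting something
        self.events = deque()       # queueing serializes asynchronous requests

    def submit(self, event):
        self.events.append(event)

    def run(self):
        while self.events:
            event = self.events.popleft()
            # Ignore parameter changes while waiting for something specific.
            if self.state == "RESYNCING" and event["type"] == "set-parameter":
                continue
            self.handle(event)

    def handle(self, event):
        print(f"handled {event['type']} in state {self.state}")

elem = ManagedElement()
elem.submit({"type": "set-parameter", "name": "sla-target", "value": 99.99})
elem.run()   # handled set-parameter in state ACTIVE
```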

All of this is fine, providing that we have an EMS/NMS that’s managing the PNF.  When we translate the PNF to a VNF, what happens?  It’s complicated.

A VNF has two layers of management: the management of the function itself (which should look much like managing the PNF from which the VNF was derived) and the management of the virtualization of the function.  There are some questions with the first layer, and a lot with the second.

Arguably it’s inconvenient in any management framework to have differences in management properties depending on the vendor or device itself.  For automated management in any form, the inconvenience turns into risk because it might not be easy to harmonize the automated practices across the spectrum of devices.  Thus, it would certainly aid the cause of service lifecycle management if we had uniform VNF functional management.  That could be accomplished simply by translating all the different PNF MIBs into a single MIB via an “Adapter” design pattern.
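A minimal sketch of that Adapter idea, with hypothetical vendor field names: each vendor-specific MIB is translated into one uniform parameter set, so automated lifecycle logic sees the same thing regardless of whose VNF it is.

```python
# Adapter-pattern sketch: vendor-specific MIBs are presented as one uniform MIB.

class VendorAAdapter:
    """Adapts a hypothetical vendor-A MIB to the uniform parameter set."""
    def __init__(self, raw_mib):
        self.raw = raw_mib
    def read(self):
        return {"oper_status": self.raw["opState"],
                "throughput_mbps": self.raw["thruKbps"] / 1000}

class VendorBAdapter:
    """Adapts a hypothetical vendor-B MIB to the same uniform parameter set."""
    def __init__(self, raw_mib):
        self.raw = raw_mib
    def read(self):
        return {"oper_status": self.raw["status"],
                "throughput_mbps": self.raw["rate_mbps"]}

# Lifecycle automation only ever sees the uniform view, whoever made the VNF.
for adapter in (VendorAAdapter({"opState": "up", "thruKbps": 950_000}),
                VendorBAdapter({"status": "up", "rate_mbps": 980})):
    print(adapter.read())
```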

For the virtualization side of VNF management we have to think differently, because PNFs were never hosted in clouds, and service chaining of functions replaces having them live in a common box.  We cannot expose virtualization parameters and conditions to management systems that don’t know what a host is, or why we’re addressing subnets and threading tunnels.

The convenient way to address this all is to think of VNF management as being a set of objects/elements.  The top one is the function part, and the bottom the virtualization part.  It’s my view that the boundary between these (the abstraction) should separate two autonomous management frameworks that are working to a mutual SLA.  So in effect, the function is an intent model and the virtual realization another.  In that second model, we always presume that the management process is working under the covers to sustain the SLA, not exposing its behavior or components to what’s above.  That means that what the NFV ISG calls “MANO” is largely invisible to the higher level of service lifecycle management, just as a YANG model of device control would be invisible—both are inside an intent model.
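To illustrate the two-layer view, here’s a small sketch with invented names and a single latency number standing in for a real SLA: the function layer sees only whether the mutual SLA is being met, while the virtualization layer remediates under the covers without exposing how.

```python
# Two-layer sketch: a function-level intent model and a virtualization-level
# intent model cooperating through a mutual SLA.

class VirtualizationLayer:
    """Keeps the mutual SLA under the covers; how it does so isn't exposed above."""
    def __init__(self, sla_latency_ms):
        self.sla_latency_ms = sla_latency_ms

    def sla_met(self, measured_latency_ms):
        if measured_latency_ms > self.sla_latency_ms:
            self.remediate()            # e.g. rehost or rescale, invisibly
            return False
        return True

    def remediate(self):
        print("virtualization layer: rehosting/rescaling, not visible above")

class FunctionLayer:
    """Sees only whether the SLA is met, never the virtualization details."""
    def __init__(self, virtualization):
        self.virtualization = virtualization

    def status(self, measured_latency_ms):
        return "OK" if self.virtualization.sla_met(measured_latency_ms) else "SLA-FAULT"

fn = FunctionLayer(VirtualizationLayer(sla_latency_ms=20))
print(fn.status(12))   # OK
print(fn.status(35))   # triggers hidden remediation, reports SLA-FAULT
```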

The whole of the vast, disorderly, often-criticized VNF onboarding process can be viewed as connecting the VNF to this two-level model of a lifecycle element.  You need to define the state/event handling at the top layer, and some mechanism to coordinate the MANO behavior in the virtual part.  You could create a “stub” of those Adapter design patterns in the specialized, VNF-resident piece of the VNF Manager, to be accessed by a central management process that builds the connection.

You “could” do that, but should you?  I’m concerned that literal adherence to the ETSI model would actually tend to defeat service lifecycle management principles and make software automation and VNF onboarding more difficult.  The only purpose of a “stub” cohabiting with the VNF should be to adapt the management interfaces to a standard structure for the generation of events.  The service model should define the states of the related service elements and how they integrate events with processes.  That way, the service model defines the service, period.  If you have management logic inside a VNF, or if you have a global management process outside the VNF that is shared across VNFs, then you have a traditional transactional structure, one that has a fixed capacity to process things.  That’s kind of anachronistic when one of the goals of NFV is to provide scalable processes that replace non-scalable physical devices.

Functionally, there’s nothing wrong with a model that says that there are a set of boxes inside NFV that connect with abstract interfaces.  Literally, meaning at the software level, that can lead you to implementations that won’t in the long run satisfy market needs.  Automated service lifecycle management is what is needed for NFV to work.  We can get there using proven principles, even proven models, and I’m confident that somebody is going to get it right.  I just wish it would go a bit faster, and exposing the issues is the best way I know to advance progress.