Golden Globes, Nielsen, Net Neutrality, OTT Video…and NGN

We’re not there yet, not in the age of online video where TV networks are dinosaurs and Netflix reigns supreme.  Even though (not surprisingly) Netflix’s CEO thinks that TV as we know it will be extinct in five years or so.  But we are clearly in a period of change, driven by a bunch of factors, and what we’re changing to, video-wise, will do at least something to shape the future of SDN and NFV.

We can start our journey of assessment of video’s future with the Golden Globe awards, where the impressive showing by Transparent and House of Cards demonstrated that OTT video players (Amazon and Netflix, respectively) are capable of producing content that means something.  So the question is how much it will end up meaning, and what exactly the meaning will be.

TV is ad-sponsored broadcast-linear distribution, and that still makes up the great majority of viewing.  Nielsen’s surveys say that people actually watch more hours of TV now; OTT video is filling in time when they don’t have access to traditional channelized video and giving people something to watch when “nothing’s on”.  However, TV is its own worst enemy for two reasons: the law of averages and the law of diminishing returns.

The law of averages in TV means that you have to appeal to an increasingly “average” audience in order to justify the programming to advertisers.  How many great shows have turned to crap?  It’s not an accident, it’s demographics.  You want a mass market and it has to be made up of the masses.  This means that a good segment of the viewing population is in a stage of successive disenchantment as one show after the other becomes banal pap for the masses.  These people are targets for VoD and OTT.

The law of diminishing returns says that when you add commercial time to shows to increase revenue you drive viewers away, which means you have to add even more time and drive away even more viewers.  ABC’s “Galavant” seemed to me to have about a third of its time dedicated to commercials, making a decent show hard to tolerate.  People disgusted with commercials can turn to VoD and OTT.

Both these points bring out another, because both relate to advertising—commercials.  Ad sponsorship’s problem is that there’s only a finite and shrinking pie to slice and an ever-growing number of people slicing it.  For every Facebook gain there’s a Google or TV network loss.  One thing that Netflix and Amazon Prime have going for them is simple—retail.  You sell stuff and people pay you.  How much depends on your perceived value to them, and it is already clear that there’s a lot more money in selling services than in selling ads.

So more OTT video is in the cards, which means more traffic that ISPs don’t make money on.  In fact, many of those ISPs (all the cable companies and many telcos) are going to lose money if broadcast video dies off.  And to make matters worse, political pressure is converting regulators to the view that settlement for carriage of video on the Internet is non-neutral because voters don’t like it.  Never mind whether the ISPs can be profitable enough to build and sustain infrastructure.

The most likely result of this combination is the increased investment in off-Internet services.  In the US, we already have an exemption from regulation for CDNs and IPTV.  In Germany, there’s a suggestion that there be two “Internets”, one neutral enough to satisfy advocates and the other where commercial reality guides settlement.  I leave it to you to guess which would survive, and Cisco is already opposing the idea because it violates the Cisco principle that operators have a divine mandate to buy (Cisco) routers to carry all presented traffic, profits be hung.

Profit-driven IP, as opposed to the Internet, would be a totally metro affair, with access connected to cloud-hosted services.  Given the strong cloud affinity, it’s very likely that we’d build this Profit Internet with SDN and NFV (another reason for Cisco not to like the idea!) and virtual technology in general.  We’d use some electrical grooming to separate it from the Internet, and gradually “OTT video” would move to it, complete with settlement for carriage and QoS and all the things regulators say can’t be part of the Internet scene.

Why SDN and NFV?  First, you have to keep the regular Internet and the Profit Internet separate or regulators will balk.  Second, you have to constrain costs because you can’t overcharge content providers or you kill off the content.  Third, full connectivity isn’t even useful in the Profit Internet because everyone is trying to get to a video cache or the cloud and not each other.  You don’t need most of routing here.

Advertising can still have a place here, particularly on the “Internet” where there’s no settlement or QoS.  Best efforts is fine for free stuff.  But if the two principal TV faults are insurmountable in the context of broadcast TV, as Netflix thinks, then broadcast itself is at risk.  You’d have “VoD commercials” like you have in on-demand TV services from cable or telco providers today.

NGN was going to consume SDN and NFV no matter what, and we were going to end up with some virtualized-L2/L3 model of NGN even if we didn’t succeed with SDN and NFV.  Metro was always going to be where most money was invested, too.  So it’s fair to say that shifts in OTT video won’t drive all these changes, but they could accelerate them.  In particular, a decision by ISPs to partition profitable services outside the Internet (or create two Internets as has been proposed in Germany) would funnel more money to NGN faster, which could not only accelerate SDN and NFV but put the focus more on service agility.

But remember, the cheapest way to get a new show in front of 60 million viewers is still linear RF.  The thing is, we don’t get an audience of 60 million viewers if we’ve trained people to demand just exactly what they want and not what’s the best thing actually on at a given point in time.  VoD and DVR are conditioning audiences to be consumers of what they want when they want it, a mission linear RF broadcasting will never be able to support.

I watched a lot of TV as a kid.  I still watch a lot of TV.  When I was a kid, I had two networks to choose from, and I picked the best of an often-bad lot.  Today I have fifty networks or more to choose from, but the “lot” is still getting bad.  The very success Nielsen points out is working against the TV industry because the more I view, the more shows move into the “I’ve seen that” category.  The ones I wanted to see are being averaged out into drivel.  That drives me to other viewing resources.

It’s not all beer and roses for the OTT video people, though.  At the root of the video food chain is the producer of the video, who has to buy rights, hire cast and crew, and produce the stuff.  Amazon and Netflix proved they can produce content, but not that they can produce enough.  The big question on the video revolution is where we can get non-revolting video, and we don’t have an answer to that yet.  OTT can still fall on its face, and if it does a lot of my childhood shows may become new again.

A New Policy Managed Model for SDN (and NFV?)

One of the challenges that packet networks faced from the first is the question of “services”.  Unlike TDM which dedicates resources to services, packet networks multiplex traffic and thus rely more on what could be called “statistical guarantees” or “grade of service” than on specific SLAs.  Today, it’s fair to say that there are two different service management strategies in play, one that is based on the older and more explicit SLAs and one based on the packet norm of “statisticalism”.  The latter group has focused on policy management as a specific mechanism.

One reason this management debate has emerged recently is SDN (NFV has some impact on this too, which I’ll get to in a minute).  From the first it’s been clear that SDN promises to displace traditional switching and routing in its “purist” white-box-and-OpenFlow form.  It’s also been clear that software could define networks in a number of lower-touch ways, and that if you considered a network as my classic black box controlled from the outside, you’d be unable to tell how the actual networking was implemented—pure SDN or “adapted Ethernet/IP”.  Cisco’s whole SDN strategy has been based on providing API-level control without transitioning away from the usual switches and routers.

Policy management is a way to achieve that.  We’ve had that for years, in the form of the Policy Control Point and Policy Enforcement Point (PCP/PEP) combination.  The notion is that you establish service policies that the PCP then parses to direct network behavior to conform to service goals.  Cisco, not surprisingly, has jumped on policy management as a kind of intermediary between the best-efforts service native to packet networks and the explicit QoS and traffic management that SDN/OpenFlow promises.  Their OpFlex open policy protocol is a centerpiece in their approach.  It offers what Cisco likes in SDN, which is “declarative control”.  That means that you tell the network what you want to happen and the network takes care of it.

How does this really fit in the modern world?  Well, it depends.

First, policy management isn’t necessarily a long way from purist SDN/OpenFlow.  With OpenFlow you have a controller element that manages forwarding.  While it’s not common to express the service goals the controller recognizes as formal policies, you could certainly see a controller as a PCP or as a PEP depending on your perspective.  It “enforces” policies by translating service goals to forwarding instructions, but it also “controls” policies by pushing changes in forwarding down to the physical devices that shuffle the bits.  If applications talked to services via APIs that set goals, you could map those APIs to either approach.
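
To make that equivalence concrete, here’s a minimal Python sketch (every class and method name here is my own invention, not anyone’s actual API) of a single goal-setting API being realized either by an explicit forwarding controller or by a policy control point handing policies to enforcement points:

    from abc import ABC, abstractmethod


    class ServiceGoalHandler(ABC):
        # Whatever sits behind the goal-setting API; the caller can't tell which.
        @abstractmethod
        def apply_goal(self, service_id: str, goal: dict) -> None:
            ...


    class OpenFlowStyleController(ServiceGoalHandler):
        # "Purist" model: translate the goal directly into forwarding entries.
        def apply_goal(self, service_id: str, goal: dict) -> None:
            for switch, rule in self._compute_forwarding(goal):
                print(f"{service_id}: install on {switch}: {rule}")

        def _compute_forwarding(self, goal: dict):
            # Placeholder route computation, for illustration only.
            return [("switch-1", {"match": goal["endpoints"], "action": "forward"})]


    class PolicyEnforcementPoint:
        def __init__(self, name: str):
            self.name = name

        def enforce(self, policy: dict) -> None:
            print(f"{self.name} enforcing {policy}")


    class PolicyControlPoint(ServiceGoalHandler):
        # Policy model: parse the goal into policies and hand them to PEPs.
        def __init__(self, peps):
            self.peps = peps

        def apply_goal(self, service_id: str, goal: dict) -> None:
            policy = {"service": service_id, "qos": goal.get("qos", "best-effort")}
            for pep in self.peps:
                pep.enforce(policy)


    # The application code is identical either way.
    for handler in (OpenFlowStyleController(),
                    PolicyControlPoint([PolicyEnforcementPoint("edge-pep")])):
        handler.apply_goal("vpn-42", {"endpoints": ["siteA", "siteB"], "qos": "gold"})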

The obvious difference between the purist model and the policy model as we usually see it is that the purist model presumes we have explicit control of devices, while the policy model presumes we have coercive control.  We can make the network bend to our will in any number of ways, including just exercising admission control to keep utilization at levels needed to sustain SLAs.  That’s our second point of difference, and it leads to something that could be significant.

With explicit control, we have a direct link between resources and services.  Even though the control process may not be aware of individual services, it is aware of individual resources because it has to direct forwarding by nodes over trunks.  With coercive control, we know we’ve asked for some behavior or another, but how our desired behavior was obtained is normally opaque.  That’s a virtue in that it creates a nice black-box abstraction that can simplify service fulfillment, but it’s a vice in a management sense because it isolates management from services.

In an ordinary policy-managed process you have network services and you have offered services, with a policy controller making a translation between the two.  Your actual “network management” manages network services so your management tendrils extend “horizontally” out of the network to operations processes.  Your offered services are consumers of network services, but it’s often not possible to know whether an offered service is working or not, or if a network service breaks whether that’s broken some or all of the offered services.

What separates network and offered services is the lack of a mapping to specific topology that can relate one to the other.  One possible solution to the problem is to provide topology maps and use them not only to make decisions on management and create management visibility but also to facilitate control.  A recent initiative by Huawei (primary) and Juniper called SUPA (Shared Unified Policy Automation) is an interesting way of providing this topology coordination.

SUPA works by having three graphs (YANG models).  The lowest-level one models the actual network at the protocol level.  The highest one graphs the service abstractly as a connectivity relationship, and the middle one is a VPN/VLAN graph that relates network services to the physical topology.  The beauty of this is that you could envision something like SUPA mapping to legacy elements like VPN and VLAN but also to purist OpenFlow/SDN elements as well.  You could also, in theory, extend SUPA to support new services whose forwarding and other behaviors are very different from those we have in Ethernet and IP networks, simply by creating a new middle-level model to augment the current ones.
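
To illustrate the idea (and only the idea; this is a hypothetical Python sketch, not SUPA’s actual YANG), here’s how the three levels might bind together so that a change at the bottom becomes visible at the top:

    from dataclasses import dataclass, field
    from typing import Dict, List


    @dataclass
    class PhysicalTopology:
        # Lowest graph: the network at the protocol/device level.
        links: Dict[str, List[str]] = field(default_factory=dict)  # node -> neighbors


    @dataclass
    class NetworkServiceGraph:
        # Middle graph: a VPN/VLAN-style abstraction mapped onto physical elements.
        name: str
        member_nodes: List[str]
        topology: PhysicalTopology

        def is_intact(self) -> bool:
            # A real model would walk the graph; here we only check that members still exist.
            return all(n in self.topology.links for n in self.member_nodes)


    @dataclass
    class OfferedService:
        # Highest graph: connectivity among endpoints, realized by network services.
        endpoints: List[str]
        realized_by: List[NetworkServiceGraph]

        def status(self) -> str:
            # Because the binding is explicit, a change below is visible above.
            return "up" if all(ns.is_intact() for ns in self.realized_by) else "degraded"


    topo = PhysicalTopology({"pe1": ["p1"], "pe2": ["p1"], "p1": ["pe1", "pe2"]})
    vpn = NetworkServiceGraph("vpn-blue", ["pe1", "pe2"], topo)
    svc = OfferedService(["siteA", "siteB"], [vpn])
    print(svc.status())            # "up"
    del topo.links["pe2"]          # a physical change...
    print(svc.status())            # ...shows up at the service level: "degraded"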

Obviously, management coordination between services and networks demands that somebody understand the association.  In SUPA, the high-level controller binds an offered service to a topology, and that binding exposes the management-level detail in that it exposes the graph of the elements.  If the underlying structure changes because something had to be rerouted, the change in the graph is propagated upward.  The graph then provides what’s needed to associate management state on the elements of the service with the service itself.

This is an interesting approach, and it’s somewhat related to my own proposed structure for an “SDN”.  Recall that I had a high-level service model, a low level topology model, and a place where the two combined.  It can also be related to an NFV management option because you could say that the middle-level graph was “provisioned” and formed the binding between service and resources that you need to have in order to relate network conditions to service conditions.

I’m not saying that this is the final answer; SUPA is still in its early stages.  It is a hopeful sign that we’re looking for a more abstract and thus more generalizable solution to the management problem of SDN.  I’d like to see less specificity on the middle-layer graphs—a network service should be any realizable relationship between endpoints and not just IP VPNs, VLANs, or tunnels.  I’d like to see the notion of hierarchy be explicit—a low-level element (an Application Based Policy Decision or ABPD) should be decomposable into its own “tree” of Network Service Agent and ABPD and each NSA should offer one or more abstract services.  I’d also like to see an explicit way of defining management of a service through a hierarchy of related graphs.

We’re not there yet, but I think all this could be done, and so I think SUPA is worth watching.

What Could the Net Neutrality Proposals do to SDN and NFV?

With all the ink net neutrality is getting, I feel like I just have to say something about it.  Regulatory policy, after all, is perhaps the largest single factor in setting business policies for network operators and so one of the largest factors in setting capex policies too.  Since I’ve blogged before on the broad question of what might and should happen, there’s no point revisiting those points.  Instead, let’s take this first Friday of 2015 to ask what the impact of the “current proposal” on neutrality (by FCC chairman Wheeler) might be on SDN and NFV.

The first impact area is the broadest.  The operators are of the view that classifying ISPs as common carriers, even if the FCC uses Section 706 “forbearance” to shield infrastructure from wholesale requirements and other provisions of the Telecom Act, is eventually going to end up with just that—sharing infrastructure.  There is little doubt that the immediate impact would be to focus operator planners on how to invest without exposing their assets to the sharing risk.  It’s probable that this will have a negative impact on capex in 2015 but little impact on SDN/NFV because the final state of public policy won’t be known until at the earliest the end of the year.

Title II constraints will apply only to Internet services, and not to IP overall.  The second impact area is that operators will be looking at what might fit in the second category rather than the first, and at how to move stuff between the two to shield investments.  Here we find a possible impact on SDN and NFV.

It’s difficult to say whether regulators would accept an argument that Service A and Service B were both delivered on IP packets across the same access infrastructure, mingled together and indistinguishable in a technical sense, but were virtually/logically separated.  That would suggest that IP-level separation of services could be a risk to operators, which might then encourage them to adopt some form of “deep grooming” of infrastructure.  We know that you can separate Internet flows from video or voice flows at a level below that of IP and claim service separation with some confidence.  SDN might be a mechanism for doing that.

Under the FCC’s prior rules, Internet access was subject to neutrality but content delivery, CDNs, and (implicitly) cloud computing were exempt.  I think it likely that Wheeler’s intentions would be to sustain this immunity because this stuff has no place being considered a common carrier service.  That would mean that operators would be induced to consider NFV enhancements to services part of a cloud offering rather than part of a carrier service offering.  Service chaining and Carrier Ethernet are a good example.  Title II fears would suggest that operators not augment Carrier Ethernet with hosted elements, but rather offer cloud-hosting of some high-level features and connect the Carrier Ethernet service through them.

This promotes what could be called a “common-carrier-service-is-access-only” view.  Your goal as an operator is to terminate the access line in a cloud element and offer managed services through a Title II access connection.  The Internet, Carrier Ethernet, and all business services could be subsumed into a cloud service and hidden from Title II.  The shift in policy could present challenges for NFV for two reasons.  First, it could undermine the current activity by forcing it toward a separate subsidiary (required to avoid Title II).  That would be a delay, perhaps into 2016.  Second, it could complicate the way that NFV MANO works by requiring that all services other than best-efforts Internet (which isn’t an NFV target) have integrated regulated access and unregulated hosted elements, offered by different business units.  Every deployment might then be a “federation” issue and all management might have to cross business boundaries.  Neither of these is supported in current standards or work.

The third point is that once you decide on a Title II avoidance strategy, you end up with a strong reason to virtualize everything at Level 2 and 3.  Mixing regulated and unregulated services on a common router begs for an appeal to a US District Court by a hopeful CLEC-type reseller of services.  Why not have virtual switches and routers deployed separately by both regulated and unregulated entities?  Or why not have the unregulated entity deploy all the stuff besides the access?  This shift would make current operations groups into access players and shift more and more capex to an unregulated entity that might be a set of new decision-makers adhering to new infrastructure policies.

The regulated entity (or the about-to-be-regulated one) has also paid for all the legacy stuff.  How would operators reinvest in current infrastructure without sinking more money into the Title II pit?  The question of how current infrastructure is handled if the FCC (or Congress) applies Title II has to be a factor.  Logically the assets should be redistributed, but if everything is kept in the now-Title II entity then nobody is going to add to those assets.  Uncertainty about this is obviously going to curtail spending plans while things are resolved.

If all this comes about, then the majority of incremental capex would go inside the cloud, for data centers to host cloud computing and NFV, and to fiber trunks and agile optics to create interior lines of communication.  We’d see mobile services built increasingly through the use of personal agents that disintermediated access and devices from application functionality.  We’d see IoT applications evolving to look more like big data.

Interestingly, none of this would necessarily exclude the OTTs and others from the new cloud-driven market.  Their trap is more subtle.  For the decade of the ‘90s even into the middle of the next decade (when the FCC finally wrote a CLEC order that made sense and set ISPs on the road to non-regulated operation) there were no barriers to actually investing in the next generation except the desire for a free lunch.  OTTs will face that risk again, because there will be an enormous temptation to wait to see if they can ride a Title II-created gravy train.  All that will do is to force investment even more inward, inside the cloud where regulators won’t go.

For SDN and NFV the net of all of this is short-term delay and longer-term success.  For the vendors in both spaces the decision to move to Title II and in particular to bar settlement and paid prioritization will hurt the network players and help the IT people.  It’s not that there will be a windfall of SDN/NFV spending (I’ve already noted the likelihood of a short-term deferral of major projects) but that IT players have little exposure to the market now and won’t be hurt by the hiccup that will certainly hurt the network giants.  Ironically, paid prioritization and settlement—which Internet supporters generally dislike—might be the thing that would save the current business model.  Clearly that isn’t going to happen now, so hold onto your hats and watch while the country gets another lesson in why regulations matter.

NFV’s “Performance Problem” Isn’t NFV’s Problem

PR often follows a very predictable cycle.  When you launch a new technology, the novelty (which is what “news” means) drives a wave of interest, hype, and exaggeration.  Whatever this new thing is, it becomes the singlehanded savior of western culture, perhaps life as we know it.  Eventually all the positive story lines run out, and you start to get the opposite.  No, it’s not going to save western culture, it’s actually a bastion of international communism or something.  You’re hot, or you’re not, and you can see a bit of that with NFV and performance concerns.  These concerns are valid, but not necessarily in the way we’re saying they are.

We could take a nice multi-core server with a multi-tasking OS and load it with all of the applications that a Fortune 500 company runs.  They’d run very badly.  We could take the same server, convert it into virtual machines, and then run the same mix.  It would run worse.  We could then turn it into a cloud server and get even worse performance.  The point here is that all forms of virtualization are means of dealing with under-utilization.  They don’t create CPU resources or I/O bandwidth, and in fact the process of subdividing resources takes resources, so adding layers of software to do that will reduce what’s available for the applications themselves.

The point here is that NFV can exploit virtualization only to the extent that the virtual functions we’re assigning are single-tenant software components that don’t fully utilize a bare-metal server.  Where either of those two conditions isn’t true, NFV’s core concept of hosting on VMs (or in containers, or whatever) isn’t going to hold water.

An application component that serves a single user and consumes a whole server has to recover that server’s cost (capex and opex) in pricing the service it supports.  A multi-tenant application spreads its cost across all the tenant users/services, and so has less to be concerned about, efficiency-wise.  Thus, something like IMS which is inherently multi-tenant can’t be expected to gain a lot by sticking it onto a VM.  We’re not going to give every cellular customer their own IMS VM, after all, and it’s hard to see how an IMS application couldn’t consume a single server easily.

No matter how you overload a server, you’ll degrade its performance.  In many cases, the stuff we’re talking about as NFV applications won’t wash if we see transparent virtualization-based multi-tenancy as the justification.  They’re already multi-tenant, and we would expect to size their servers according to the traffic load when they run on conventional platforms.  The same is true with NFV; we can’t create a set of VMs whose applications collectively consume more resources than the host offers.

What we do have to be concerned about are cases where virtualization efficiency is inhibited not by the actual application resource requirements but by resources lost to the virtualization process itself.  Early on in my CloudNFV activity, I recruited 6WIND to deal with data-plane performance on virtualized applications, which their software handled very effectively.  But even data plane acceleration isn’t going to make every application suitable for virtual-machine hosting on NFV.  We are going to need some bare metal servers for applications that demand a lot of resources.

Our real problem here is that we’re not thinking.  Virtualization, cloud computing, even multi-tasking, are all ways of dealing with inefficient use of hardware.  We seem to believe that moving everything to the cloud would be justified by hardware efficiencies, and yet the majority of mission-critical applications run today are not inefficient in resource usage.  That’s true with the cloud and it will be true with NFV.  Virtualization is the cure for low utilization.

So what does this mean?  That NFV is nonsense?  No, what it means is that (as usual) we’re trapped in oversimplification of a value proposition.  We are moving to services that are made up as much or more (value-speaking) of hosted components as of transport/connection components.  You need to host “hosted components” on something and so you need to manage efficiency of resource usage.  Where we’re missing a point is that managing efficiency means dealing with all the levels of inefficiency from “none” to “a lot”.  In totally inefficient situations we’re going to want lightweight options like Docker that impose less overhead to manage large numbers of components per server.  In totally efficient application scenarios, we want bare metal.  And in between, we want all possible flexibility.

NFV’s value has to come not from simply shoehorning more apps onto a server (which it can’t do, it can only support what the underlying virtualization technology can support).  It has to come from managing the deployment of service components, including both connection and hosted applications or content, that make up the experiences people will pay for.  MANO should have been seen not as a mechanism to achieve hosting, but as the goal of the whole process.  We could justify MANO, done right, even if we didn’t gain anything from virtualization at all.

IMS and EPC are applications that, as I’ve said, gain little or nothing from the efficiency-management mechanisms of virtualization.  They could gain a lot from the elasticity benefits of componentization and horizontal scaling.  Virtual routing is easiest to apply on a dedicated server; it’s hard to see why we’d want to virtualize a router to share a server with another virtualized router unless the router was handling a relatively low traffic level.  But again, elastic router positioning and capacity could be valuable in the network of the future.

It’s unfair to suggest that NFV has resource issues; whatever issues it has were inherited from virtualization and the cloud, whose resource issues we’re not talking about.  Even data plane processing is not uniquely an NFV issue.  Any transactional application, any content server, has to worry about the data plane.  Even web servers do, which is why at some point you stop sharing the hosting of websites on a single server and go to a dedicated server for a site, or even several per site.  But it is fair to say that the “resource problem” discussion that’s arising is a demonstration of the fact that the simplistic view of how NFV will work and save money has feet of clay.  We can justify NFV easily if we focus on how it actually can benefit us.  If we invent easy-to-explain benefits, we invent things that won’t stand up to close examination.

There are two important things here.  First, componentized applications and features are more agile in addressing both new market opportunities and problems with infrastructure.  Second, componentization creates an exponential increase in complexity that if left untreated will destroy any possible benefit case.  NFV is the operationalization of componentized service features.  Ultimately it will morph into the operationalization of the cloud, joining up with things like DevOps and application lifecycle management (ALM) to form a new application architecture.  And that architecture, like every one before it, will put software to hardware mindful of the fact that capacity is what it is, and you have to manage it thoughtfully whether you virtualize it or not.

Ten Truths for the Future of SDN/NFV

One of the ongoing themes in both SDN and NFV is that operators need these technologies to compete with the OTT players.  We also hear that operators need a cultural transformation to do that, or that they need to form subsidiaries or buy somebody.  We could almost claim to have a cottage industry here, people sitting around explaining what operators have to do to “compete with the OTT players”.  It’s not that simple.  Let’s lay out some basic truths.

Truth one is that most of the OTTs are in a less-than-zero-sum industry.  Ad spending globally is declining, and at some point what can be made available to earn OTT profits would have to come out of what’s available to produce content or do something else essential.  Operators are at least in a market where there’s a willingness to pay for services.  A good Tier One still earns almost as much revenue as the OTT industry does.

Truth two is that operators don’t need new technology to compete with OTTs in a direct sense.  Google and Netflix didn’t get to where they are through SDN or NFV.  “Over the top” is what it says it is—on the network and not in it.  Hosting content or providing ad-sponsored services doesn’t mean inventing new standards, particularly at the network level.

So what we have here, in the net, is that most of the presumptions of what operators should do are based on a couple false premises.  Is staying the course then the right approach?  No, and here are some further truths to prove it.

Truth three is that there is absolutely no way to sustain a business in transport/connection over the long term without public utility protection.  Bits have been getting cheaper and cheaper in a marginal cost sense, and while that trend is certain to slow, it’s not going to slow enough to avoid constricting capital budgets.  The network pricing model of today is eating its own future, and something has to turn that around.  Revenues have to go up, costs have to go down, or both.

Which brings us to truth four; the network must always be profitable if operators are going to continue to invest in it.  That means that even if services “over the top” of the network come along to provide relief, they can’t just subsidize an unprofitable network foundation.  The operators, in that situation, would be competing against players who didn’t have to do that subsidizing and they’d never be able to match pricing.

Add these points up, and we can at least see what SDN and NFV have to do for us.  It is critical that the cost of transport/connection networking be managed better than we’re managing it now.  A couple decades ago, network TCO was a quarter opex and three-quarters capex; now that’s heading toward being reversed.  So more truths.

Truth five is that neither SDN nor NFV is explicitly aimed at solving the problem of operations cost bloat.  We have not proved at this point that SDN has a significant impact on opex because we’ve not proved we know how to apply it across the full scope of current services.  The NFV people have effectively declared opex out of scope by limiting their focus to the hosting of service elements that are above transport/connection.

As to where, then, opex could be managed, we come to truth number six; we can’t make revolutionary changes to opex without revolutionizing something at the technology level.  Our whole notion of network operations, from EMSs up to OSS/BSS, has to be framed in a modern context because we’ve seen opex bloat as a result of a growing disconnect between technology and business realities within the network and the support systems we’ve used.

Truth seven is that no transformation of OSS/BSS is going to work here, nor is any transformation in NMS or EMS or SMS or any other xMS.  What we need to do is to define a management model that’s attuned to the evolution of both services and infrastructure.  That model has to arise almost completely unfettered by concerns about how we’re going to get there.  We have to embrace utopia, and then worry about achieving it as efficiently as possible.  Otherwise we’ll take a series of expensive side-trips and end up in a couple years having spent more to achieve next to nothing.

How does this help the operators, though?  More truths.

Truth eight is that there is a natural base of service opportunity awaiting exploitation, and it’s almost totally owned by the operators.  They have the cell sites that support both mobile users and any realistic IoT models.  They have knowledge of the movement of users, individually and en masse.  They have, if they design their utopian operations mechanisms correctly, the best possible cost base and could produce the best cost/performance/availability combination.  I did a presentation on this issue years ago, and the theme was simple.  The network knows.  Operators could leverage what the network knows, through operations practices honed to do the near-impossible task of keeping transport/connection services profitable.  Do that and they cut their costs low enough that VCs will move on to something else and stop funding OTTs.

And truth nine is that all these future services will have to move toward fee-for-service and away from ad sponsorship to dodge that less-than-zero-sum-game problem.  Operators are already there, in a position where their customers expect to pay for something.  Would it be easier for operators to charge for services, or Google?  If operators can take the simple step of finding stuff people will pay for, they can win.

Which brings us to the final truth.  Services are an app.  Experiences are an app.  The cloud is an app.  Everything that we expect users to consume, however it’s paid for, is an app.  The notion of an app is the notion of a simple storefront hiding a massive factory and warehouse process behind.  It’s critical that we frame our future in app terms because people consume them.  It’s critical that we make the app substantive by exploiting what the network knows, that we link it to point-of-activity services like social and location-aware contextual responses to questions, because those are things that operators can do but that OTTs can do too.  Operators have to gain some leadership there.  But all this has to be done while making the network profitable.

SDN and NFV won’t change the world unless we change what we expect them to support.  It’s not what they do, but what they enable that matters, and we have to get more direct links between SDN and NFV promises and the whole set of truths I’ve identified here if we want to move either or both technologies forward.

And in doing that, it’s not technology that’s the key, it’s operations.  The future will be more complicated than the present.  That’s always meant more expensive, and that cannot happen or we’ll never get to that future at all.

How to Solve Two Problems–With One Model

There are, as you’re all aware at this point, a lot of open questions in the SDN and NFV world.  Recently I covered one of them, the operations impact of both technologies.  While that’s undoubtedly the most significant open issue, it’s not the only one.  Another is the broad question of how networks and network operators are bound together.  Protocols have been layered for longer than most network engineers have been alive, and operators have interconnected their networks for even longer.  We have to be able to understand how protocol layers and internetworking work in any successor network architecture because we’re not going to eliminate either one.

Layered protocols presume that a given layer (Level 2 or 3 in a practical sense) consumes the services of the layer below, and those services abstract the protocol stack at all lower layers.  If we look at how this process works today, we find an important point.  A protocol layer either does what its service says it will do, or it reports a failure.  Implicitly that means that a protocol layer will attempt to recover from a problem in its own domain and will report a failure when it cannot.

In the good old days of data link protocols like HDLC, this meant that if a data link problem occurred the link protocol would ask for a retransmission and keep that up until it either got a successful packet transfer or hit a limit on the number of attempts.  For SDN at any given layer we’d have to assume that this practice was followed, meaning that it’s the responsibility of the lower layers to do their own thing and report a failure upward only when it can’t be fixed.  That’s logical because we typically aggregate traffic as we go downward, and we’d not want to have a bunch of high-level recovery processes trying to fix a problem that really exists below.

This could conflict with some highly integrated models of SDN control, though.  If an SDN controller is managing a whole stack of layers, then the question is whether it recognizes the natural layer relationships that we’ve always built protocols on.  Let’s look at an example.  Suppose we have a three-layer SDN stack, optical, L2, and L3.  We have an optical failure, which obviously faults all of the L2 paths over it, and all the L3 paths over those L2s.  What should happen is that the optical layer recovers if possible, so if there’s a spare lambda or whatever that can be pressed into service, we’d like to see the traffic routed over it at the optical level.  Since the low-level path that the higher layers expect has been fixed there are no faults above (assuming reasonable responsiveness).

But will the controller, which has built all the forwarding rules at all the levels, sustain all the presumptions of the old OSI stack?  If not, then it’s possible that the optical fault would be actioned by other layers, even by multiple layers at once.  That’s not necessarily a crisis, but it’s harder to come up with failure modes for networks if you presume there’s no formal layered structure.  Where SDN controllers reach across layers, we should require that the layer-to-layer relationships be modeled as before.  Otherwise we have to rethink a lot of stuff in fault handling that we’ve taken for granted since the mid-70s.
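
Here’s a tiny sketch, with invented names, of the recovery discipline I’m describing: each layer tries to repair its own faults and reports upward only when it can’t.

    class Layer:
        def __init__(self, name, can_self_heal):
            self.name = name
            self.can_self_heal = can_self_heal

        def recover(self):
            # A real layer would try spare lambdas, alternate paths, and so on.
            return self.can_self_heal


    def handle_fault(stack, failed_index):
        # stack is ordered bottom-up, e.g. [optical, L2, L3].
        for layer in stack[failed_index:]:
            if layer.recover():
                print(f"{layer.name} repaired the fault; nothing is reported upward")
                return
            print(f"{layer.name} could not repair; reporting the failure upward")
        print("no layer could recover; the fault surfaces as a service failure")


    # An optical failure with a spare lambda available never reaches L2 or L3.
    handle_fault([Layer("optical", True), Layer("L2", False), Layer("L3", False)], 0)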

If layers are the big issue for SDN, then the big issue for NFV is those relationships between providers.  Telecom evolved within national boundaries, and even today there are no operators who could say that they could deliver their own connections over their own infrastructure anywhere in the world.  We routinely interconnect among operators to deliver networking on a large geographic scale, and when we add in cloud computing and NFV feature hosting, we add the complication of perhaps sharing resources beyond transport/connection.

So suppose we have a simple service, a VPN that includes five sites in the US and five in Europe.  Even presuming that every site on each continent can be connected by a single operator, we have a minimum of two operators that have to be interconnected.  We also have to ask the question whether the “VPN” part of the service is provided by one operator with sites from the other backhauled to the VPN, or whether we have two interconnected VPNs.  All of this would have to be accommodated in any automated service provisioning.

Now we add a firewall and NAT element at all 10 sites.  Do we host these proximate to the access points, in which case we have two different NFV hosting frameworks?  Do we host them inside the “VPN operator” if we’ve decided there’s only one VPN and two access/backhaul providers?  Does the provider who offers the hosting also offer the VNFs for the two NFV-supported elements, or does one provider “sell” the VNFs into hosting resources provided by the other?  All of this complicates the question of deployment, and if this sort of cross-provider relationship is routine, then it’s hard to claim we understand NFV requirements if we don’t address it (which, at present, we do not).

But this isn’t the only issue.  What happens now if there’s a problem with the “VPN service?”  The operator who sold the service doesn’t own all of the assets used to fulfill it.  How does that operator respond to a problem when they can’t see the assets?  But would another operator provide visibility into their network or cloud to see them?

There is one common element here, which is that the service of a network has to be viewed as a black-box product exported to its user, and that product must include an SLA and a means of reporting against it.  The first point, in my view, argues against a lot of vertical integration of an SDN protocol stack, and the latter says that the user of a service manages the SLA.  The owner of the resources manages the resources, what’s inside the black box.

Making this work gets back to the notion of formal abstractions.  A “service” has to be composed by manipulating models that represent service/feature abstractions.  Each model has to define how it’s created (deployment) and also how it’s managed.  This approach is explicit in TOSCA, for example, which is why I like it as a starting point for management/orchestration, but you can do the same thing in virtually any descriptive model, including ordinary XML.  If we take this approach, then layers of protocol can be organized as successive (vertically aligned) black boxes and inter-provider connections represented simply as horizontal structures.  The “function” of access or the “function” of transport is independent of its realization at the consumer level, so we don’t have to care about who produces it.
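
As a rough illustration (hypothetical structures, not TOSCA syntax), the principle looks something like this: every element is a black box that exports only its deployment behavior and its SLA status, and boxes compose vertically into layers or horizontally across providers.

    from dataclasses import dataclass, field
    from typing import Callable, List


    @dataclass
    class ServiceElement:
        # A black box: only its deployment and SLA-reporting behavior is exported.
        name: str
        deploy: Callable[[], None]
        sla_status: Callable[[], str]
        children: List["ServiceElement"] = field(default_factory=list)  # layers or partner providers

        def deploy_all(self) -> None:
            for child in self.children:
                child.deploy_all()
            self.deploy()

        def status(self) -> str:
            # The consumer sees only SLA state, never the resources inside the box.
            states = [c.status() for c in self.children] + [self.sla_status()]
            return "ok" if all(s == "ok" for s in states) else "violated"


    # Example: a retail VPN composed of two providers' black boxes plus an interconnect.
    us_vpn = ServiceElement("us-vpn", lambda: print("deploy US VPN"), lambda: "ok")
    eu_vpn = ServiceElement("eu-vpn", lambda: print("deploy EU VPN"), lambda: "ok")
    retail = ServiceElement("global-vpn", lambda: print("bind interconnect"),
                            lambda: "ok", children=[us_vpn, eu_vpn])
    retail.deploy_all()
    print(retail.status())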

I think we’ve missed this notion in both SDN and NFV because we’ve started at the bottom, and that sort of thing encourages too much vertical integration because we’re trying to build upward from details to principles.  While the specific problems SDN and NFV face because of this issue differ, the solution would be the same—it’s all in the modeling.

Is There Substance in the “Fog?”

Cloud computing is probably the most important concept of our time, but also likely the most misunderstood.  It will never do what proponents say it will—displace private IT.  In fact, it’s not likely it will displace more than about a quarter of today’s IT spending.  However, it will generate new spending and in the end will end up defining how all of IT is used, and done.

The biggest problem we have with the cloud is that we think it’s an IT strategy and it’s in fact an application architecture.  Cloud computing is based on the fact that as you lower the price of network connectivity you eliminate the barriers to distribution of intelligence.  Componentization of software, already a reality for reasons of development efficiency, marries to this trend of distributability to create a highly elastic model of computing where information processing and information migrate around inside a fabric of connected resources.

What we’ve seen of cloud computing up to now has been primarily not cloud computing at all but the application of hosted IT to the problems of server consolidation.  Application development lagged the revolution at the resource end of the equation, and it still does.  That gap is disappearing quickly, though, as companies and vendors alike come to realize what can be done.  However, we’re stuck with the notion that the cloud is a replacement for the data center and that notion will take time to overcome.

Cisco might be seeing something here, though.  Most of Cisco’s marketing hype innovations, like the “Internet of Everything”, don’t do anything but generate media coverage—their intended purpose both on Cisco’s part and on the part of the media.  Cisco’s attempt to jump one step beyond the cloud (as the Internet of Everything was an attempt to jump beyond the Internet of Things), “fog computing”, may actually have a justification, a value.  If we assigned the notion of “the cloud” to the original hosted-consolidation mission, perhaps “the fog” as the name for where we’re headed might be helpful in making people realize that we’re not going to change the world by hosting stuff, but by re-architecting IT around a new model of the network.  But “fog” isn’t the most helpful descriptor we could have, obviously.  We’re light on the details that might tell us whether Cisco’s “fog” was describing or obscuring something, so we’ll have to look deeper.

The drive to the future of the cloud is really linked to two trends, the first being the continued (and inevitable) reduction in cost per bit and the second being mobility.  As I noted earlier in this piece, lower transport costs reduce the penalty for distribution of processes.  In a net-neutral world, it’s possible to protect low-cost transport resources inside a ring of data centers because these interior elements aren’t part of the network.  Thus, we have a situation that encourages operators to think of “interior lines of communication” because they don’t always cannibalize their current service revenues.  And mobile users?  They are changing the industry’s goal from “knowledge networking” to “answer networking”.

I’ve noted in prior blogs that it’s easiest to visualize the mobile future as being made up of users who are, in a virtual sense, moving through a series of context fabrics based on factors like their location, social framework, mission, etc.  These fabrics represent information available about the specific thing they represent, but not available like the classic notion of IoT sensors to be read.  They’re analytic frameworks instead.  You could visualize a person walking down the street and as they move transiting an LBS fabric, their own social fabric, a shopping fabric, a dining fabric, even a work fabric.

The information from these fabrics is assimilated not by the user’s smartphone but by an agent process that represents the user, hosted in the cloud.  This process will dip into the available fabric processes for information as needed, and these processes will be a part of the “mobile service” the user sees.  Some of them will be provided as features by the user’s mobile carrier and others by third parties.

The cellular network and the mobile service network are now separated into two layers.  One is the user-to-agent connection, which would look pretty much like mobile services look today, but with the primary traffic anchor point being not the public network gateway but the on-ramp to the cloud where agent processes are hosted.  The second layer is the inter-process links that allow agent processes to contact fabric processes for services.

Many of these fabric processes will be multi-tenant server-based applications that look a lot like the Internet elements of today, and many will be dedicated cloud components that are customized to the user.  Some of these fabric processes, like video delivery, will be able to utilize the connection to the user directly—they are brokered by the user’s agent but the agent doesn’t stand in the data path—while others will deliver information to the agent for analysis, digestion, and delivery to the user.  We could call the two classes of fabric processes Internet processes much like those of today, and Undernet processes that are structured more like back-end applications.

Things like the IoT are important in networking not because they’ll somehow multiply the number of devices on the Internet.  We all know, if we think about it, that IoT devices won’t be directly on the Internet—the process would be insecure and the notion of a bunch of anonymous sensors being queried for information is a processing nightmare.  We’ll have an IoT fabric process, probably several from different sources.  What’s important is that the IoT could be a driver of the Undernet model, which would create a form of intra-cloud connection whose character is different from that of traditional networking.  We don’t have users in the traditional sense, we have members that are application pools running in data centers.  It’s likely that these data centers will be fiber-linked and that we’ll then have a service layer created by virtualized/SDN technology on top.

Business productivity support, in the form of corporate cloud applications hosted in the public cloud or the data center, creates fabric processes in the Undernet.  Things like the SOA versus REST debate, even things like NFV and orchestration, become general questions for the binding of elements of intelligence that give a mobile user/worker what they need when they need it.  We lose the question of the at-home worker to the notion of the worker who’s equally at work wherever they are.  Collaboration becomes a special case of social fabric behavior, marketing becomes a location-fabric and mission-fabric intersection.  Service features and application components become simply software atoms that can be mixed and matched as needed.

Security will be different here too.  The key to a secure framework is for the Undernet to be accessible only to agent processes that are validated, and it’s feasible to think about how to do that validating because we’re not inheriting the Internet model of open connectivity.  You have to be authenticated, which means you have to be a proven identity and have a proven business connection to the framework so your transactions can be settled.

All of this is very different from what we have, so in one sense you can say that Cisco is right for giving it a different name.  On the other hand, it’s a major shift—major to the point where it is very possible that incumbency in the current network/Internet model won’t be too helpful in the Undernet model of the future.  We’ll still have access networks like we do now, still have server farms, but we’ll have a network of agents and not a network of devices.  So the question for Cisco, and Cisco rivals, is whether the giant understands where we’re headed and is prepared to move decisively to get there first.  Are the PR events the first signs of Cisco staking out a position, or a smoke screen to hide the fact that they aren’t prepared to admit to a change that might cost them incumbency?  We’ll probably get the answer to that late in 2015.

SDN Management: As Forgotten as NFV Management

I’ve talked quite a bit in 2014 about the operations issues of NFV, but much less about the issues associated with SDN.  We’re at the end of the year now, so I’d like to rectify that, at least in part, by addressing SDN operationalization.  There are links to NFV, of course, but also SDN-specific issues.

To open this up, we have to acknowledge the three primary “models” of SDN.  We have the traditionalist OpenFlow ONF model, the overlay “Nicira” model, and the “API-driven” Cisco model.  Each of these has its own issues, and we’ll address them separately, but some common management points will emerge.

OpenFlow SDN presumes a series of virtual or white-box elements whose forwarding is controlled by an SDN Controller component.  Network paths have to be set up by that controller, which means that there’s a little bit of a chicken-and-egg issue with respect to a “boot from bare metal” startup.  You have to establish the paths either adaptively (by having a switch forward an unknown header to the controller for instructions) or explicitly based on a controller-preferred routing plan.  In either case, you have to deal with the fact that in a “cold start”, there is no controller path except where a controller happens to be directly connected to a switch.  So you have to build the paths to control the path-building, starting where you have access and moving outward.

In an operations sense, this means that telemetry from the management information bases of the real or virtual devices involved has to work its way along like anything else.  There will be a settling period after startup, but presumably that will end when all paths are established and this should include management paths.  However, when there’s a problem it will be necessary for the controller to prioritize getting the paths from “devices” to controller established, followed by paths to and from the management ports.  How different the latter would be from the establishing of “service paths” depends on whether we’re seeing SDN being used simply to replicate IP network connectivity or being used to partition things in some way.
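
One simple way to picture that bootstrapping (an illustrative sketch only, not anything taken from the OpenFlow spec) is a breadth-first walk outward from the switches the controller can already reach:

    from collections import deque


    def build_control_paths(topology, controller_attached):
        # topology: switch -> list of neighbor switches
        # controller_attached: switches with a direct link to the controller
        # Returns the order in which control paths can be established,
        # starting where the controller already has access and moving outward.
        reachable = set(controller_attached)
        queue = deque(controller_attached)
        order = []
        while queue:
            switch = queue.popleft()
            order.append(switch)
            for neighbor in topology.get(switch, []):
                if neighbor not in reachable:
                    # The control path to 'neighbor' rides through 'switch'.
                    reachable.add(neighbor)
                    queue.append(neighbor)
        return order


    topo = {"s1": ["s2", "s3"], "s2": ["s1", "s4"], "s3": ["s1"], "s4": ["s2"]}
    print(build_control_paths(topo, ["s1"]))  # ['s1', 's2', 's3', 's4']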

However this is done, there are issues related to the controller-to-device-to-management-model coordination.  Remember that the controller is charged with the responsibility for restoration of service, which means the controller should be a consumer of management data.  If a node fails or a trunk fails, it would be best if the controller knew and could respond by initiating a failure-mode forwarding topology.  You don’t want the management system and the controller stepping on each other, so I think it’s logical to assume that the management systems would view SDN networks through the controller.  In my own SDN model, the bottom layer or “Topologizer” is responsible for sustaining an operational/management view of the SDN infrastructure.  This is consumed by “SDN Central” to create services but could also be consumed on the side by an OSS/BSS/NMS interface.

The boundary between the Topologizer and SDN Central in my model, the alignment of services with resources, is also a useful place for SDN management connections to be supported.  Service management requires that a customer-facing vision align with a resource-facing vision (to use old TMF terms) to get meaningful service status.  So if we took the model I reference in the YouTube video link already provided, you could fulfill operations needs by taking management links from the bottom two layers and pulling them into a contextual processing element that would look a lot like how I’ve portrayed NFV management—“derived operations”.
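
In rough terms (again a hypothetical sketch, not anyone’s implementation), “derived operations” amounts to recording which resources realize each service and rolling resource status up through that binding:

    # Hypothetical sketch of deriving service state from recorded resource bindings.
    resource_status = {"node-a": "up", "node-b": "degraded", "trunk-1": "up"}

    service_bindings = {
        "vpn-101": ["node-a", "trunk-1"],
        "vpn-102": ["node-b", "trunk-1"],
    }


    def derived_service_status(service_id):
        troubled = [r for r in service_bindings[service_id] if resource_status[r] != "up"]
        return "meeting SLA" if not troubled else "at risk: " + ", ".join(troubled)


    for svc in service_bindings:
        print(svc, "->", derived_service_status(svc))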

If we look at overlay SDN we see that the challenge here is the one I just mentioned—aligning the customer- and resource-facing visions.  Overlay SDN simply rides on underlying switching/routing as traffic.  There is a logical overlay-network topology that can be aligned with the physical network by knowing where the logical elements of the overlay are hosted.  However, that doesn’t tell us anything about the network paths.

Logically, overlay SDN should be managed differently (because it probably has to be).  It’s easier if you presume the “real” network is a black box that asserts service connections with associated SLAs.  You manage that black box to meet the SLAs but you don’t try to associate a specific service failure to a specific network failure; you assume your capacity management or SLA management processes will address everything that can be fixed.

If we assumed an SDN connection overlay on top of an OpenFlow, centrally controlled SDN transport network, we could presume we had the tools needed to do service-to-network coordination, provided a service model created the overlay network and my SDN “Cloudifier” layer associated it with a “physical SDN” service.  This suggests that even though an overlay SDN network is semi-independent of the underlying network in a topology sense, you may have to make it more dependent by associating the overlay connection topology with the underlying routes, or you’ll lose your ability to do management fault correlation or even effective traffic engineering by moving overlay routes.  Keep this point in mind when we move to our last model of SDN.

The “API model” SDN picture is easier in some sense, and harder in others.  Here the presumption is that “services” are policy-induced behavior sets applied through a chain of controllers/enforcers down to the device level.  This is in effect a black-box model because the service is essentially a set of policy invocations that are used to then drive lower-and-lower elements as appropriate.  It’s like “distributed central control” in that the policy control is central but dissection and enforcement are distributed.  When you want to know the state of something you’d have to plumb the depth of policy, so to speak.

Presumably, management variables would be introduced into this distributed-policy system at an appropriate (meaning local) level.  Presumably failures at a given level would create something that rose up to the higher level so alternatives could be picked, since the “failure” should have been handled using alternative resources at the original level had that been possible.  The problem obviously is all this presumption.  We really need to have a better understanding of how policy distribution, status distribution, and device and service state are related.  Until we do, we can’t say much about management here, but we can assume it would follow the general model of a “topology map”, a “service map”, and an intersection of the two to define management targets from which we have to obtain status.
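
For illustration only, here’s a toy version of that policy chain, with made-up class and method names: policies are dissected downward, and a failure that can’t be absorbed locally is escalated back up so a higher level can pick alternatives.

# Toy model of "distributed central control": policy flows down the chain of
# controllers/enforcers, status and unhandled failures flow back up.

class PolicyNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent:
            parent.children.append(self)

    def enforce(self, policy):
        """Dissect the policy and push the pieces down one level."""
        print(f"{self.name}: enforcing {policy}")
        for child in self.children:
            child.enforce(f"{policy}/{child.name}")

    def report_failure(self, detail):
        """Local recovery failed; escalate so a higher level can pick alternatives."""
        if self.parent:
            print(f"{self.name}: escalating '{detail}' to {self.parent.name}")
            self.parent.report_failure(detail)
        else:
            print(f"{self.name}: selecting an alternative for '{detail}'")

root = PolicyNode("domain-controller")
region = PolicyNode("region-east", parent=root)
device = PolicyNode("switch-12", parent=region)

root.enforce("gold-service")
device.report_failure("port 3 down, no local spare")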

The common thread here is that all the “SDN” mechanisms (not surprisingly) abstract services away from resources.  So, of course, does traditional switching/routing.  But remember that one of the goals of SDN was to create greater determinism, and that goal could end up being met in name only if we lose the management connections between the services and the resources that “determine” service behavior.  We’ve underplayed SDN management, perhaps even more than we’ve done for NFV management.

NFV management principles could save things, though.  I believe that the “derived operations” principle, which synthesizes a service-to-resource management connection by recording the binding made when the abstraction/virtualization of a service is realized, could be applied to SDN just as easily as to NFV.  The problem, for now at least, is that nobody on either the SDN or NFV side is doing this, and I think that getting this bridge built will be the most important single issue for SDN to address in 2015.
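
Here’s a minimal sketch of the derived-operations principle as it might apply to SDN, recording the binding at the moment an abstraction is realized so status can later be derived from the record rather than rediscovered.  The names are mine, not anything defined by the NFV ISG or the ONF.

# Whatever realizes an abstraction also records what it bound to; management
# can then walk the recorded bindings instead of probing devices blindly.

import time

binding_records = []

def realize(abstract_element, chosen_resources):
    """Record the service-to-resource binding at realization time."""
    record = {"element": abstract_element,
              "resources": list(chosen_resources),
              "realized_at": time.time()}
    binding_records.append(record)
    return record

def resources_for(abstract_element):
    """Derived operations: answer 'what realizes this?' from the records."""
    return [r for rec in binding_records if rec["element"] == abstract_element
            for r in rec["resources"]]

realize("service:vpn-1001/path-east", ["flow:sw1:tbl0:42", "flow:sw4:tbl0:17"])
print(resources_for("service:vpn-1001/path-east"))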

I wish you all a very happy and prosperous New Year!

Illusion is the Enemy of Progress

There are a lot of illusions in life, and thus not surprisingly in networking.  One of the illusions is that there is great opportunity to be had by generating on-demand high-capacity services.  Another, possibly related to the first, is that there’s a high value to vertically integrating the operation of networks from the service layer all the way down to optics.  Offer fiber bits for a day and it will make you rich.

Frankly, I like illusions.  I wish most of the fables I’ve heard were true.  That’s not going to make it happen, though, and these two example illusions are no exception.  We have to look at why that is, and what might happen instead.  In the process, we’ll expose the biggest illusion of all.

High-capacity connections exist because there are concentrations of users, resources, or both and they need to be connected.  Most such concentrations occur in buildings because neither people nor servers do well out in the open.  Buildings don’t come and go extemporaneously, and so the need to connect them doesn’t either.  Businesses network facilities because that’s where their assets are, and those facilities are persistent because they’re expensive, big, and largely nailed down.

I’ve surveyed users for decades on their communications needs.  One thing that’s been clear all along is that site network capacity is a game of averages.  You get enough to support needs and even bursts on the average, and you make do.  If you were offered extemporaneous bandwidth like many propose, what you’d do is run the math on continuing to play the averages game versus buying only what you need when you need it.  If you found that averages were cheaper, you’d stay the course.  Which means that on-demand capacity is not a revenue opportunity but a revenue-loss opportunity.  If you get lucky, they don’t buy it.  Does that sound like a good business plan to you?

There are situations where burst capacity is useful, but for the network operator these are found inside existing or emerging services and not at the retail level.  CDNs can always use a burst here and there, and the same will be increasingly true in evolving cloud-based services (what I’ve called point-of-activity empowerment).  In these examples, though, we’re seeing operators deciding to augment capacity and not selling on-demand services, so there’s no need to couple service-layer facilities to transport.

The service-to-transport coupling thing seems to be an invention of SDN proponents.  The sad truth is that our industry is almost entirely PR-driven.  People say what they need to say to get “good ink”, and so when something gets publicity there’s a bandwagon effect.  SDN and OpenFlow were really aimed at L2/L3 handling, but we spent a lot of time shoehorning OpenFlow principles into optics.  Think about it; here’s an architecture based on packet-level forwarding being applied to an opaque trunk or wavelength.

The worst thing is that it’s not needed and is in fact counterproductive.  As you go lower in the protocol stack you move away from “connecting” into “aggregating”, which means that your facilities are shared.  So here we are, proposing that service buyers could spin up some optics to give themselves a burst of speed?  How does that impact the capacity plans on which optical networks are based?

Every layer of a network needs to obtain service and an SLA from the layer below.  The responsibility for reconfiguring connectivity or realigning resources at a layer has to rest within the layer itself, where the capacity plan under which the SLA is valid is known.  You don’t vertically integrate control of network resources.  You integrate a stack of services that create a retail SLA by enforcing capacity plans where aggregation takes place.  Letting services control multi-tenant resources is anarchy; everyone will want to go to the head of the line.
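
A small sketch of that layering argument, with purely illustrative classes of my own: each layer buys a service with an SLA from the layer below and keeps resource realignment inside itself, where its own capacity plan is known.

# Layers consume each other as services; reconfiguration stays inside the
# layer that owns the capacity plan, so upper layers see only the SLA.

class NetworkLayer:
    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower   # the layer we buy service from, if any

    def request_service(self, sla):
        if self.lower:
            # We consume the lower layer as a service, never its raw resources.
            self.lower.request_service(sla="aggregate demand")
        print(f"{self.name}: committing to SLA '{sla}' within its own capacity plan")

    def handle_fault(self):
        # Realignment is handled where the capacity plan is known.
        print(f"{self.name}: realigning its own resources; upper layers see only the SLA")

optical = NetworkLayer("optical")
grooming = NetworkLayer("grooming", lower=optical)
services = NetworkLayer("service", lower=grooming)

services.request_service(sla="retail Ethernet, 99.99%")
optical.handle_fault()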

Why then, if so much of the view of future services is nonsense, do we hold to it?  The answer lies in that great central illusion I mentioned earlier.  The problem is that we are conditioned to think of the future of services and networks in terms of the present.  Our vision of NGN is that there isn’t going to be any—that the future network will simply be a bigger version of the current one.  We’ll use capacity exactly as we do now, but more of it.  Crap.

As bandwidth cheapens, it becomes useless except as a carrier of services.  Higher capacity introduces the potential for agility, to be sure, but we don’t have agile locations.  What changes things is mobility.  If we forget networking sites and start networking people, we get a completely different demand profile.  A single person moving about in a mobile world and asking their “personal digital assistant” for help with this or that is a very bursty user of information technology, unlike a group of fixed people in an office.  Their service could look different.

Not different in terms of “communications”, though.  Sitting or standing or walking, they have the same voice and the same texting fingers.  What’s different is their information needs and the optimum way to satisfy them.  What shapes the future into something new and different and interesting is that workers and consumers will be empowered individually at the points of their activity.  That new model encourages us to view not networking but IT differently.

In the future, computing and information are part of a fabric that supports movement of processing and capacity like the tides.  When people want to know something, we assemble what’s needed to answer their question and then send them the answer.  All the work is done in the IT fabric; all the connection dynamism that exists is within that fabric.  Old-fashioned connection services don’t change much, but that’s a good thing, because a steep decline in price per bit can’t be survived if you’re selling bits but is essential if you’re selling a fabric of knowledge and decision support.

The cloud, as we conceive it today, is another stupid illusion, not because cloud computing is stupid but because we’re thinking stupidly about it.  This is not about utility computing; there’s no profit in it in the long run.  IaaS is hosted server consolidation, and the only reason anyone would “move an application to the cloud” is to get a lower price.  You shrink markets to a vanishing point with that kind of thinking.  You certainly don’t drive the massive investments that change the nature of computing or networking.  The cloud will be transformational because we’ll build it with a transformational mission, and it’s that mission that will transform networking.  Not by transforming the services we now buy, but by giving us new service models at the information and knowledge level that we can only glimpse today.

Whenever somebody talks about new service opportunities, ask them how the same bit-pushing with a different pricing model is “new”.  Tell them you know what “new” really means.  Break some illusions, break some vendor hearts, and break out into the future.

Summing Up the Vendorscape For NGN

We’ve now looked at all of the classes of players who might be transformative influences on the road to NGN.  I hope that the exploration brought out some critical points, but in case they didn’t this is a good time to sum them up.  We’re heading into 2015, the year that just might be the most critical in network evolution since the ‘90s.  We won’t get to our NGN destination in 2015, but we’ll almost certainly set the path we’ll take.

Operators have told me many times that while they’re interested in evolutionary visions from people like me, they’re dependent on evolutionary visions from people with real products to change the shape of their networks.  This creates an almost-love-hate relationship between vendors and operators, a relationship you see at work in standards groups.  Operators think vendors are trying to drive them to a future model of dependency, and vendors think operators are mired in useless intellectual debates when there’s a need to do something—either advance or stop trying and make do.

All of this is taking place in the climate of steady erosion in revenue per bit.  The Internet and its all-you-can-eat pricing model have created a rich growth medium for businesses that rely on bits for delivery.  There is no question that consumers have benefitted from the result, and no question that startups and VCs have benefitted even more.  We’ve paid a price in loss of control of personal information, security problems, and so forth.  Who lost, though?  Mostly the operators, who have historically sold bits and watched that process become less profitable over time.

As I said in introducing this NGN series, things can’t go on this way and they won’t.  We have two possible ways to mitigate falling returns on infrastructure investment besides the obvious one of curtailing that investment.  One is to find a better way to reduce costs, so the revenue/cost curves don’t converge as fast, or at all.  The other is to find new services whose revenues aren’t explicitly generated by bit-pushing, services that offer a better ROI.

I think both of these remedies are going to be explored.  The question is the extent to which we’d rely on each, and how the differences between the approaches would impact the network, now and in the future.

If we were to find a magic service touchstone that could add billions and billions to the carrier pie, we could delay or even eliminate the crossover between revenue and cost.  Had this been considered and supported properly ten years ago, it’s very possible that there would be little pressure on capex even with the current network equipment model.  Operators had “transformation” projects back then, but they were 100% dissatisfied with vendor support of them.  The “Why” is important because it’s still influencing things today.

The fact is that network equipment vendors don’t want transformation from a bit-driven revenue model to something else.  They believe the “something else”, which even then was known to be about software and servers, would simply dilute their influence.  It’s my view that we’re past the point where an “integrated” strategy would work.  We are now committed to having layered infrastructure, with a cloud layer, a grooming layer, and an optical layer.

This structure is important from a cost-management perspective.  Cost management changes, of course, are easiest to make at the point where you’re making incremental investment or where you’re able to produce savings to cover the cost of the changes.  Were we to try to control network costs entirely with bits, we’d have to build upward from the optical layer to create a very efficient grooming layer that could then marry with the service layer.  Optical players would have to push SDN strategies far above optics to create that marriage, and vendors like Alcatel-Lucent, with both optical products and strong SDN positions, would be favored.

It’s easier to see the transformation happening at the top and moving downward.  Comparatively, we’re under-invested in the cloud layer today.  Most of the services we have today can be visualized as application- or service-specific subnetworks that unite cloud-hosted features with cloud-hosted application and service segments.  This model could create not only a more efficient service network but also one that’s more secure and more agile in producing new revenues.  And all the agility everyone wants generates more variability in the mapping of features to resources.  We’ve proven with both NFV and SDN that we don’t even have the current level of resource-mapping agility properly operationalized.

The challenge that has to be faced here, no matter where you start, is that of operations efficiency.  Hosted assembled stuff is more complex than boxes, and that means more costly to run.  The difficulty you run into starting at the bottom is that you’re exposed to operational burdens of complexity slowly (which is good) but you’re also exposed to the benefits slowly (which is bad).  Evolving upward from optics is hard for operators.  They need to see benefits to offset any major changes in infrastructure.  It’s difficult for optical vendors, who have to promote a skyscraper-sized change from the basement of the building.  A requirement to support too much buyer education to make a sale makes salespeople unproductive.

So what this says, IMHO, is that operations changes really have to start with vendors in the cloud or service layer, which in the language of modern technology trends means “SDN” or “NFV.”  There’s been little progress in the details of SDN operations, and no indication that SDN operations practices could grow to envelop cloud elements.  This is why I put so much emphasis on an NFV strategy for vendors who aspire to NGN success.  However, NFV has focused so far only on the deployment differences between virtual and real infrastructure, when the real problem is that we’re not managing even real infrastructure optimally, and when ongoing management is more important than deployment.

Operations changes that start at the top can deliver not only improved costs for current services and infrastructure but also improved agility, and thus theoretically improved revenues over time.  The ideal situation is that you come up with an operations approach that spreads over everything and immediately improves operations efficiency.  That would generate quick relief from the revenue/cost convergence, so much so that it could fund some of your evolutionary infrastructure and service steps.

For 2015, we need to look at the higher layers of the network—the cloud and the virtual binding between the cloud and “the network” in its legacy sense.  All of the vendors I’ve mentioned could do something here, but obviously the ones to watch the most are those who either have a natural position in that critical juncture (IT giants) or who have some elements of the solution at hand and are simply searching for a business model to promote them.  I think we’ll see progress, perhaps as early as the end of Q1.