Is NaaS a Good Conceptual Framework for SDN/NFV?

I’ve commented in prior blogs that the output of what we call “SDN” or “NFV” would, in a functional sense, be a kind of network-as-a-service or NaaS.  NaaS implies that services are created through some API process (the service API) rather than simply being in place as an attribute of cooperative device behaviors.  Since you have to set up an SDN or NFV service, it’s fair to characterize that service as “NaaS”, and in theory something designed for NaaS could have both SDN and NFV implications.

It’s been hard to judge just how valuable NaaS could be as a conceptual unifier of SDN and NFV because it’s not presented that way.  That may be changing; Anuta Networks seems to be articulating a general NaaS approach to SDN and NFV.  It’s a bit rough at this point, but you can see the direction things might take.

If you wanted to idealize a NaaS model you’d start with the notion of an abstraction, a model that represents the functionality and properties of the service.  Building these abstractions might require a tool.  Another NaaS tool would then convert that abstraction into a set of resource commitments through the process you’d normally call “orchestration”.  Finally, the abstraction would describe (and by reference commit) management resources to engage with the service during the lifecycle to support operations processes.
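
To make that three-step picture concrete, here is a minimal Python sketch of the idealized flow: an abstraction that models the service, an orchestration step that turns it into resource commitments via per-element recipes, and a management binding for the lifecycle.  Every name in it is an illustrative assumption of mine, not something drawn from any product or standard.

from dataclasses import dataclass, field

@dataclass
class ServiceAbstraction:
    # The model: the functional elements in a service and the connections between them.
    name: str
    elements: list = field(default_factory=list)      # e.g. ["vpn", "firewall"]
    connections: list = field(default_factory=list)   # e.g. [("vpn", "firewall")]

def orchestrate(abstraction, recipes):
    # Turn the abstraction into resource commitments using a recipe per element.
    # A recipe decides whether the element is a real device to parameterize or a
    # virtual function to host.
    return {element: recipes[element]() for element in abstraction.elements}

def bind_management(abstraction, commitments, monitor):
    # Attach the management side so operations processes can engage during the lifecycle.
    for element, resource in commitments.items():
        monitor(abstraction.name, element, resource)

recipes = {
    "vpn": lambda: "parameterized VPN on existing edge routers",
    "firewall": lambda: "firewall VNF hosted on a local server",
}
svc = ServiceAbstraction("branch-service", ["vpn", "firewall"], [("vpn", "firewall")])
commitments = orchestrate(svc, recipes)
bind_management(svc, commitments, lambda s, e, r: print(s + "/" + e + " -> " + r))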

The Anuta approach mirrors this NaaS framework.  The NCX architecture has a controller and agent element that cooperate to push centrally defined models/policies/parameters to places where network behavior is created.  The models or abstractions are built using a Service Design Engine drag-and-drop interface that lets an architect pull in functional elements and snap connections between them.  The architect can then specify the way each element is to be realized on infrastructure, including whether to use virtual or real components and how to set up various vendor devices.  The functional elements can represent connection networks (level 2 or 3) and also higher-level components like firewalls.

When you have a functional map of the logical structure of a service, it becomes orderable by a user/customer, and when it’s ordered the Service Orchestration Engine of the NCX architecture will build and connect the functional components.  This element operates through a series of low-level “plugins” that adapt the setup recipes to specific vendors and technology options, including whether the element is to be created by parameterizing a real device or by deploying a virtual component.  A Service Management Engine then monitors and enforces SLAs and a Capacity Management Engine manages the infrastructure or resource pool.
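
That plugin layer is where the abstract model meets real infrastructure, and the general pattern is easy to sketch.  The Python fragment below uses wholly hypothetical class names and does not reflect Anuta’s actual interfaces; it just shows adapters keyed by element and realization option, each of which knows how to realize the same function either by configuring a real device or by deploying a virtual one.

class RealizationPlugin:
    # Base class: a plugin realizes one functional element on one kind of infrastructure.
    def realize(self, element, params):
        raise NotImplementedError

class LegacyDevicePlugin(RealizationPlugin):
    def realize(self, element, params):
        # Parameterize an existing physical device (e.g. via CLI or a management protocol).
        return "configured " + element + " on " + params["device"]

class VirtualFunctionPlugin(RealizationPlugin):
    def realize(self, element, params):
        # Deploy a hosted (virtual) version of the same function.
        return "deployed " + element + " VNF on " + params["host"]

PLUGINS = {
    ("firewall", "vendor-x-appliance"): LegacyDevicePlugin(),
    ("firewall", "virtual"): VirtualFunctionPlugin(),
}

def realize(element, option, params):
    # The orchestration engine picks the plugin matching the option the architect chose.
    return PLUGINS[(element, option)].realize(element, params)

print(realize("firewall", "virtual", {"host": "edge-server-1"}))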

If you take this approach in the SDN direction, the interesting outcome is that you can build NaaS/SDN services using whatever infrastructure you have (you do need vendor-specific recipes, which I’ll get to).  You can also use an SDN controller and OpenFlow and you can use virtual elements, presumably either as vSwitch-like components or NFV virtual network functions.  Thus, Anuta is supporting an almost-Cisco-like vision of SDN as something that’s driven from the top, the service layer, and that can then exploit both legacy infrastructure and SDN-specific stuff.

In the NFV direction, it’s easy to see that applications like service chaining fit this model exactly, but it’s harder to say whether the full, broad, generalized goals of function virtualization would fit as well.  The issue is how the functional atoms that can be assembled are built in the first place.  If I decided I wanted to have a function called “widget” that I could assign to a virtual-hostable element, NFV principles would say that I could define it and sell services based on it.  For that to work in the Anuta NaaS model, they’d need a “widget-builder” tool that would define the detailed properties of a new object.  The company says they have such a tool for internal use, and it’s very possible this will be made public as they pursue NFV more actively.

Just what Anuta would elect to support in such a widget-builder element might be their biggest question, the doorway to their big opportunity, and not just with NFV.  It’s not desirable that a NaaS software vendor be a choke point in introducing services, but if there aren’t tools to build/onboard new elements the service scope will be limited.  Obviously Anuta has a complete tool available for internal use, and if they expose all the features and build in some safety checks, they’d enable at least partners and more savvy users to create widgets for composition into NaaS.  That would be the smart option in my view but we’ll have to wait to see how Anuta evolves their toolkit.

I think this NaaS-Anuta model demonstrates the value of taking a top-down approach.  A “service” is made up of a set of “functions” that are in turn orchestrated into resource commitments.  From this perspective it’s easy to see each of those resource commitments as a laundry list of optional directions based on topology and policy, which means that it would naturally support both “coerced functionality” from real network devices and “hosted/virtual functionality” from a software component.  This makes mixing legacy and SDN/NFV elements natural and almost automatic, which is how it should be.  If our widget tool were to allow operators to onboard legacy service functionality and then compose it, the NaaS concept could even be applied to networks with neither SDN nor NFV elements—just to improve operations.  Anuta is still weak here; they haven’t announced an explicit model for OSS/BSS integration overall, much less formalized the widget tool.

The NaaS model also has the advantage of being a fit for both enterprises and operators, and being fairly elastic in terms of how big an enterprise you’d have to be to justify it.  NFV isn’t exactly a populist notion at this point, and there’s no movement in the standards process to embrace enterprise applications except by accident.  NaaS, in contrast, has wide utility but it’s a bit formless today—you need to embrace something specific (like NFV) to realize it, and the specificity then limits the scope of buyer interest.  An unfettered NaaS vision creates a larger target market and potentially more benefit to doing orchestration and management right, which could be a step toward making the right kind of orchestration and management features available.

The challenge for a NaaS-centric view of SDN and NFV may be that it carries the unusual disadvantage of little media and market traction.  Those limited and specific things, SDN and NFV, have grabbed the glory but failed to address the full value proposition.  It may also be a challenge for Anuta to present a NaaS vision, for many of the same reasons.  The company’s visibility in the media and in our most recent SDN/NFV survey was very limited, and given that, it may be at risk should a bigger and more articulate player (Cisco comes to mind) step up and tell the story better.

The Battle in, and for the Cloud

IBM and Google both reported their quarterly numbers, with both companies beating on the revenue line but coming up short on profits.  The markets this morning took divergent views; IBM’s shares were off pre-market and Google’s were up, though both movements were modest.

Google’s revenues beat slightly, with decent guidance, and this is probably what the Street likes.  Nobody wants to see the end of the free-stuff ad-driven model for online services, but it seems pretty clear from the call that Google is seeing the need to step beyond the traditional ad business for profit growth.  There was a lot of talk about Android and about Google Shopping, and the only real focus on search dealt with app search more than traditional search.  The main point here is that whatever economic recovery we’re having, it’s not driving huge improvements in Google’s numbers.

IBM also beat on the revenue line but full-year guidance fell short of expectations.  On the call, IBM talked a lot about the cloud, about the new Apple deal for enterprise mobility, and about R&D to do things like bringing Watson to the enterprise.  I think the Street is looking at these things as futures, which they almost certainly are, and therefore more likely to pull down current profits than to help.  And remember, we live quarter by quarter on Wall Street.  Tech, they say, is recovering, but again we’re not seeing an explosion of good stuff for IBM, certainly nothing to match Intel’s or even HP’s performance.

With Google, I think most people who use the Internet recognize that we’re seeing a very significant shift in the quality and utility of search.  Part of it is driven by the mobility shift—people don’t search the same way when they’re mobile.  Part is driven by the fact that too much SEO and ad word placement makes it much harder to see relevant results, which means that people may either search less or learn to filter out what are clearly commercial intrusions into their search process.  In either case there’s only a limited upside.

It’s not that Google’s numbers show an immediate problem.  Own-site revenues were up strongly and cost of traffic was down, but so was revenue per click, which suggests that the online ad market continues to commoditize despite all the talk about how better targeting should make clicks more valuable.  Google is doing fairly well and it’s also investing in the future, with some near-term opportunities to grow revenue as well as some possible ramps for longer-term gains.

Android and shopping obviously represent near-term stuff.  Google seems to have an Android revenue model based not on licensing the software but on using the platform to draw traffic to both search and retail sites.  This seems a good approach; it keeps handset vendors from building their own operating systems, a development that would hurt Google in the short term by fragmenting the app space and in the long term by decoupling handsets from an orderly evolution to cloud-augmented apps.

The long term, in my view, is really defined more by a financial truth than by a technical announcement.  Google’s internal rate of return is slowly falling—they’re becoming more a “traditional” company (like Cisco in that respect).  As IRR lowers, companies can start looking at big-ticket cash-cow opportunities that have lower ROI.  I don’t think Google is close to making fiber to the home really pay, but it seems likely that their TV and fiber projects might be attempting a convergence on a future when online video could really displace broadcast.  Sadly for Google, no research suggests that’s going to happen any time soon.

Some insiders tell me that Google’s big jump in uncategorized revenue was due to cloud service growth, but Google is not yet a cloud company in the sense it could be, and that may have to change because of the directions of the other company that reported—IBM.

I think what’s differentiating IBM and Google is that IBM is talking about the long term without offering much real hope in the short term.  The Street likes the Apple deal but it’s not ready to take it to the bank.  Apparently IBM isn’t either because guidance was soft.  The big question for IBM, then, is the cloud, and I want to frame the cloud in terms of strategic influence and overall IT opportunity.

My surveys have consistently shown that both information technology spending and network spending are rooted in data center evolution.  IBM was the unchallenged leader in influence there for most of my own professional career, but it started to slip in the middle of the last decade and has never stopped that slow slide.  The cloud, rightly or not, is seen by most CIOs as the embodiment of data center change, and IBM rates below both Cisco and HP in terms of cloud influence.  It’s not so much that IBM doesn’t have a cloud strategy, perhaps even a great one, as it is that the strategy is positioned so badly that it doesn’t drive the influence gains it could.

Here’s a basic truth, one that IBM in IT and Juniper in networking both need to face.  You cannot drive a revolution in your space without a very strong cloud position, because buyers don’t accept that there’s any other revolutionary change out there.  The Cisco deal with Microsoft in the data center may be more trouble for IBM than for anyone else, because Microsoft is a contender in terms of data center strategic influence, and if it were combined with Cisco it would lead the pack.  If Cisco/Microsoft were to get their partnership right, they could create a momentum that would be nearly impossible for even smart and well-positioned moves from IBM to counter.  Even HP, now building cloud influence, would have a problem.

Positioning is IBM’s only hope, as I’ve been saying for over a year.  The cloud revolution, like all revolutions, is 90% slogans and 10% programs.  IBM wants to get the plumbing right while somebody else is doing the trim and bric-a-brac.  They’ll never sell the house that way.  The cloud, and the cloud-related revolutions of SDN and NFV, is in its very early phases and the drivers that will take the cloud to fruition are not yet visible.  That’s good news for outlying players—you can still grab the high ground, if you dare.  IBM needs to dare, because I think Cisco is going to make its own bigger cloud move shortly, perhaps as soon as this fall.

IBM, Apple, Cisco, Microsoft, and Google all have a stake in the intersection of mobility and computing, something that could be the primary mover of cloud technology into the ultimate, right, path.  Nobody has a clear handle on this yet, and for other companies the time to counter the moves of the giants and preserve market influence is definitely passing.

Can SDN and NFV Work for Mobile Video?

According to practically everyone, video is the fastest-growing and most disruptive of all the Internet traffic sources.  Were it not for video we’d probably not have neutrality debates, and most of the return-on-infrastructure issues operators are bemoaning would disappear.  And despite the fact that surveys in both the US and Australia have recently demonstrated that online video is not reducing broadcast viewing, we’re going to have more video going forward.  That means handling video might well be the largest issue in networking, and it’s therefore surprising we don’t hear more about using SDN or NFV to improve it.

Part of the video problem is neutrality regulations, which have generally discouraged Internet settlement and special handling.  In the US, the previous FCC Chairman was a fan of the OTT community and reflected his view in a neutrality order that allowed only customer-pays models.  Given that customers demonstrably won’t pay for priority, that has moved video delivery off the table in an economic opportunity sense.

Interestingly, even the original neutrality order didn’t forbid video optimization via SDN or NFV.  Content delivery networks (CDNs) are explicitly not covered by the order, and in any case the DC Court of Appeals vacated the key points because the FCC didn’t have the jurisdiction to apply common carrier rulemaking to a group of players (the ISPs) who they’d previously declared were not common carriers.  The ins and outs of public policy with respect to neutrality are too opaque to judge at this point, but it seems unlikely that a new order would go further than the old one, and so we could reasonably look at how video handling within a CDN might be facilitated by SDN, NFV, or both.

One key point in the discussion is the intersection between OTT video and mobility.  Most of the video growth is driven by mobile devices, and it’s the use of these devices to watch video when a TV isn’t available that’s responsible for the “broadcast-plus” model of consumption.  Mobile networking has promoted a different model of CDN deployment as well as of video viewing.  Classic CDNs peer with access providers (ISPs) and deliver content without all the intermediate handling that traditional Internet delivery would involve.  Most of these are owned by third parties like market leader Akamai.  Mobile CDNs are emerging as operator-driven accommodations to the challenges of delivering mobile video and at the same time preserving mobile bandwidth for other applications.

The technical difference is that mobile CDNs have to be closer to the handset or too much of the mobile infrastructure, particularly critical EPC facilities, is used up before you even get to a cache point.  It’s this deeper mobile CDN caching that could be a target for SDN or NFV.

“Could” is the operative word, because there’s not a whole lot going on to make either technology helpful.  The challenge is the nature of mobile broadband—the user moves around in the course of a video relationship and in IP networks the address of the user determines the point of delivery of the packet.  If a user roams into a cell, the pipeline for that user has to redirect to the cell location or the user’s video is delivered to where they were, not where they are.  That means that in order to make even mobile CDNs work you have to make mobility work with IP.

Evolved Packet Core, or EPC, is what handles this today.  In simple terms, EPC anchors all the mobile addresses for an area in a packet data network gateway (PGW) that then initiates a tunnel to the cells, one end of which is moved by EPC’s mobility management machinery to reflect user movement among the cells.  One way to bring SDN and NFV to bear on video would be to implement EPC with them, which of course many claim to do.  The difficulty is that they do it by simply executing EPC using hosted technology (calling it NFV) or explicit forwarding paths (calling it SDN), not by re-framing the whole EPC notion using the new technology.

We could imagine SDN defining a new service model, the “whip”, which is a tunnel that’s anchored at one end and that moves under SDN control at the other to whip among cells to where the user is actually roaming.  We could imagine this model being controlled by hosted functionality.  That could solve the mobility problem.  We could also imagine the dual-stack location-address-and-user-address model proposed by some for the Internet doing the heavy lifting here.  In either case, we’d be decomposing EPC into functions and then optimally implementing them in new technologies, not simply doing a rehash of the old EPC structure using hosted components or explicit paths.

This same sort of thing could help video.  We can put a cache anywhere, but as a practical matter it has to be outside the EPC or it can’t be linked to the mobile user.  If we had NFV-hosted functionality to drive SDN-based “user address space assembly” we could run tunnels from every cache point to the cells directly and have them “join up” with tunnels (“whips”) that represent EPC-controlled sessions.  That would let us cache deeper and more efficiently.
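
A tiny sketch may make the “whip” idea less abstract.  The Python fragment below is purely hypothetical controller logic, not a description of any real EPC, CDN, or SDN product: the anchored end of each tunnel stays fixed at a cache point while hosted control logic re-points the far end to whatever cell the user is currently attached to.

class Whip:
    # A tunnel anchored at one end (a cache or gateway) whose far end follows the user.
    def __init__(self, anchor):
        self.anchor = anchor
        self.cell = None
    def repoint(self, cell):
        # In a real network this would become forwarding-rule updates pushed by an
        # SDN controller; here we just record where the far end now terminates.
        self.cell = cell
        return self.anchor + " -> " + cell

class MobilityController:
    # Hosted control logic (an NFV candidate) reacting to cell-attachment events.
    def __init__(self, nearest_cache):
        self.nearest_cache = nearest_cache   # map of cell -> preferred cache point (assumed input)
        self.whips = {}
    def attach(self, user, cell):
        # Anchor the user's whip at the cache nearest the first cell, then keep
        # re-pointing the far end as the user roams.
        whip = self.whips.setdefault(user, Whip(anchor=self.nearest_cache[cell]))
        return whip.repoint(cell)

ctl = MobilityController({"cell-17": "cache-east", "cell-18": "cache-east"})
print(ctl.attach("user-1", "cell-17"))   # cache-east -> cell-17
print(ctl.attach("user-1", "cell-18"))   # cache-east -> cell-18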

The lesson here is that the challenge of video could be better addressed by accepting that if SDN and NFV are truly revolutionary, we need to stop applying them simply to recreate pre-SDN-and-NFV structures.  We need to rethink the problem at the functional level, then solve that problem optimally in the context of our most modern technology choices.

This is another area where NFV could take a lead in the technology evolution of networking.  The challenges for this more modern version of EPC or CDN are more operational than data-plane-technical.  It’s not hard to describe what we want to happen, but it’s challenging to manage the complexity in such a way as to control costs.  Without cost control, it’s doubtful we could apply these new mechanisms because the operators wouldn’t see the return or the users wouldn’t pay the price.

It’s hard, even dangerous, to stick your neck out and suggest that we do something different.  It makes you an enemy of most incumbents in a space, who view change as risk.  We can alienate mobile operators, CDN operators, network vendors, content providers…practically everyone.  But maybe some current alienation is a prerequisite to future progress.

Apple, IBM, the PC, and the Cloud

We have a couple of important developments on the business side of networking that taken together could signal something significant—but not unexpected.  Intel reported numbers that were just consistently good, and Apple and IBM are entering into a mobility partnership.  The common thread offers some insight into what’s really happening with the cloud, the PC, and mobility.

Classical wisdom says that enterprise computing is going completely to the cloud, that tablets are killing PCs off, and that Apple is the trend-setter and insight leader in the market.  All of these beliefs are now called into question, or should be.  What’s likely really going on is in some ways more interesting and in all ways more complicated.

Let’s start with the cloud.  Cloud computing, so they say, is exploding.  Given that the value proposition offered for the cloud is that it’s more efficient in resource usage than enterprise IT, cloud gains would necessarily hit the classes of computers that were the targets of cloud migration.  Intel is clearly seeing net strength in servers and less margin pressure to boot.  That’s hard to square with a view that the cloud is sucking the life out of data centers.

This doesn’t mean the cloud is a total fraud, though.  It’s reasonable to expect that in the very early phases of the market, cloud servers sold to providers to populate their resource pools would create a bump in buying.  The dip on the server side would come a bit later.  However, we are not seeing indications of wholesale flight from internal IT to the cloud at the chip level, and that should be visible by now.  Analysts are saying that corporate “refresh” is well underway and responsible for the uptick on the PC side, so why would corporations not be demonstrating their refresh-phase buying habits right now?  Cloud migration should peak with refresh because buyers have a choice between reinvestment and resettling applications in the cloud.  We’re not seeing it.

Speaking of PCs, Intel’s numbers also demonstrate that the death of the PC has been greatly exaggerated.  That’s always been the case; you don’t get your name in the trade rags by saying that everything is going pretty much as usual and so there’s no news to report, no ads to serve.  But we are seeing strength on the corporate side and weakness on the consumer side, PC-wise, which says that there are significant differences in the value proposition in the two places.

One difference is that most consumers really didn’t need a PC to begin with, which is why tablets took off so strongly on the consumer side.  Corporate users are a different matter, and we’re having some difficulty as an industry working through the whole notion of empowerment and productivity.  We want to think of telework and mobile work, but that’s simplistic.

First, we’re in an age where the service industry is our largest growth sector.  I may be able to write blogs and do presentations from home, even collaborate with others using web tools, but I can’t sling a burger at you that way, or rotate your tires.  Most employees don’t have a large information content to their jobs (burgers and tires, remember?) so they don’t need either PCs or tablets, so we’re fighting over the third of the market that does have strong information content, or could have.

That 30% tends to be chained to fixed work locations; our models have consistently shown that two-thirds of information-empowered workers would still have largely fixed work locations.  Less than half would have a need to take information technology with them to their actual work activity—meaning mobility.  Obviously if you’re not mobile you may find the value of PCs outweighs their lack of support for mobility.

This introduces the Apple/IBM deal, which has to be one of the great ironies of the market.  IBM has been the absolute leader in driving change in IT, but they booted it when it came to mobility.  Apple has transformed the lives of nearly everyone in a mobile-broadband device sense, but they’ve booted it with respect to the enterprise.  Now two Titanic survivors are clinging to each other and some random flotsam trying to stay warm.  And my cynicism notwithstanding, it’s a smart move for both.

I’ve blogged for some time now that mobility was inevitably the driver of new services because it introduces the notion of context.  Contextual services are therefore the linchpin of hopes for new mobile profits and also for new paradigms to empower workers.  Even though only about 20% of workers have mobile-information needs, these workers make up the top quintile in unit value of labor, so enhancing their productivity is disproportionately useful.  If you are IBM you need to be driving the bus on the contextual services space, which is hard to do without a handset or tablet.  So you partner with the primo mobile device provider.

Apple needs the cloud.  There is no credible path for mobility-driven service revolutions or consumer behavior change that doesn’t depend on a ubiquitous device window on your contextual world, and also on a bunch of servers and big data engines lurking over the horizon.  It’s extremely hard for Apple to advance their utility without a leading cloud position but they have what’s arguably the most insipid cloud strategy of anyone in the industry.  So they partner with somebody who gets the cloud, and who can offer it in every imaginable way.

One of those ways is through the enterprise.  Right now consumer services are expected to be free or very low cost, which makes it hard to build out behemoth data centers to support them without looking like a CFO’s nightmare.  But point-of-activity empowerment in the enterprise could generate almost $880 billion in service revenues, and you could build a lot of data centers for a piece of that action.  Once you’ve got all that iron deployed, leveraging it for the broader market is far easier to do and far less risky from a first-cost perspective.

The deal’s early target is a bunch of enterprise apps for the i-devices, but I think this is going to change as both parties exploit the obvious opportunity.  In fact, if we don’t see some substantive announcements around Apple/IBM partnerships in the cloud, I think the deal has little chance of changing the fortunes of either company.  Right now, developing cloud-optimized applications for iOS is only a vague goal.  If it really is vague, then these guys are in trouble.  If it’s not vague, then the alliance is going to provoke a truly massive response from Google.  Then, let the games begin.

SDN is a Spectator Sport

If you’re a poetry fan like I am, you may be familiar with a poem by John Milton, “On His Blindness”.  The poem ends with the phrase “They also serve who only stand and wait”, and I wonder if we could apply Milton’s theory to SDN by changing “serve” to “win”.  The prevailing SDN model seems to involve more standing and waiting than it does revolution, even for the vendors who see themselves more as the downtrodden masses than as the establishment.

When SDN got started, it quickly became identified with a kind of network-buyer “throw the bums out” vision where hordes of white-box revolutionaries were carried on the shoulders of the crowd as they trampled incumbents.  In reality, there is no indication that SDN is really revolutionizing much at this point, and if you believe the latest Wall Street analysis of buyer sentiment, we’re going to see an uptick in traditional network spending.  Why is all of this happening?

Part of our problem with SDN is shared with other “revolutions” like the cloud and NFV, and I’ve mentioned it before.  It’s the tendency for the media and analyst community to pick up the most dramatic possible outcome of anything and say that it’s not only inevitable but looming.  Tell a reporter that SDN will make a difference in three years and they’ll look for a source that says it will take only two.  Write an analyst report saying SDN is a ten billion dollar market in 2018 and buyers of research will cast about for a bid of $20 billion in 2017.

All this cynicism notwithstanding, history tells us that we can really have revolutions.  The IBM PC spawned one in the early 1980s and the Internet followed a decade later.  That means that it is possible to shake market foundations, and even to underestimate markets instead of hyping them out of all proportion.  It’s just not easy, and for any given revolution the odds are good that it’s exaggerated to the point of being a fantasy.  It’s also true that the players in the best position to force changes are the ones least likely to benefit from it.  Somebody wins with the status quo, by definition.  And that has created a new sort of SDN strategy, one of SDN as a spectator sport.

Observational SDN differs from participatory SDN in that the goal is not to drive change, but to let natural market forces kill all hopes of change, and perhaps even to help those forces along a bit.  Assume that buyers have a trillion or so dollars sunk in incumbent network architectures.  Assume that they have expectations of value set by the bullshit bidding war I’ve described here, and assume that these expectations will not be met in any early trials.  Every year (for carriers, every day) they are forced to buy something to sustain current operations, and so they stay the course.  Five years from now, they’ve evolved to a new network one router or switch at a time, and it looks like the old one with successor model numbers stuck on the front of the same boxes.  Who wins?  The establishment, so sitting things out is downright smart for an incumbent.

This raises the question of what could be done to promote a “revolution” effectively, to change the dynamic.  Are we going to call every change a revolution and be right for the two in fifty years that actually qualify, or are we going to make the most of every change and let market Darwinism decide which ones can make the leap into “revolution” status?  If we do the former, then there will be no SDN revolution and current incumbents will win—mostly Cisco.  That’s what Cisco is banking on.  How then could we do the latter?

The barrier to SDN revolution is incrementalism.  You can’t readily value SDN by replacing one router at a time.  You have to displace a bunch of stuff to create a value in networking, and to do that you need significant benefits—new things.  Why?  Because if I have a data center that would benefit from SDN, even one where I might save 24% (the max users report as being possible) in capital costs, that savings could accrue only if I were buying everything at once, and I didn’t buy it that way.  On the average, data center switches installed today have 21 months of residual useful life.  Toss them and you write that off, and it represents over half of the original cost.  So how does trashing 57% of my current investment to save 24% make sense?  It doesn’t.
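
For what it’s worth, those numbers hang together if you assume a 36-month straight-line depreciation schedule.  That schedule is my assumption, not something buyers reported, but the quick check below reproduces the figures in the paragraph above.

# Back-of-the-envelope check, assuming 36-month straight-line depreciation (my assumption).
depreciation_months = 36
residual_months = 21
write_off = residual_months / depreciation_months   # fraction of original cost still on the books
capex_savings = 0.24                                 # the maximum savings users report
print("write-off: {:.0%} of original cost, savings: {:.0%}".format(write_off, capex_savings))
# prints roughly "write-off: 58% of original cost, savings: 24%", close to the 57% quoted above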

Cheapness will never fund an SDN revolution (or a cloud or NFV one for that matter) because we have technology we’re trying to displace with our revolution and the cheapest option is always the stuff you’ve already purchased.  You need a driver from above, a set of significant new benefits, that can fund a significant change-out of existing gear.  You need something to justify the write-down.

What?  Forbes recently reported on a survey by Harvard Business Review Analytic Services where buyers said that “business agility” was that high-level driver for the cloud.  Given that cloud deployment is likely the driver of SDN, agility is the benefit to watch for SDN too.  But as Forbes points out, it’s pretty hard to quantify “agility” and even harder to say exactly how SDN contributes to it.  Remember, we have connective networks today—anything can talk to anything else.  What more could SDN do than that?  Anticipatory connectivity for that server you don’t have yet?  Businesses don’t run on networks, they run over them, and what generates the traffic and the business benefits are applications.

Whatever creates agility at the network level has to be supported at the application level, which makes things even more complicated.  Agile networks supporting monolithic applications don’t accomplish anything, which is probably why we’re having a hard time coming to terms with a real SDN value proposition.  In this regard, NFV would have an easier time because it combines hosting of stuff and connecting stuff under one roof.  Maybe that’s why Dan Pitt said NFV was a flash in the pan; it might be too bright a flash to compete with.  But maybe it’s also why Cisco started its own SDN mission at the top, with the APIs and the notion of application-centricity.  Forget “carpe diem” and think “carpe beneficium” (“seize the benefit,” for those who didn’t study Latin).

The trouble is that the revolutionaries are sitting in their own coffee shops, not building the barricades.  Look at the SDN strategies from Alcatel-Lucent, Cisco, and Juniper and you’d expect to see an SDN spectator and two bomb-throwing SDN radicals, but that’s not what you see.  I like Alcatel-Lucent’s and Juniper’s SDN technology, but I don’t think their positioning is going to make that connection with application and business dynamism, and that leaves Cisco free to build SDN one router at a time, as only an incumbent can do.

Without a strong application story, nobody can make an SDN revolution work.  If Cisco can present such a story for legacy routing, it’s going to be hard to beat the combination.  Competitors are playing into Cisco’s hands by wrestling in the value-proposition mud while Cisco watches from the stands.  They still can’t tap enough benefits to put Cisco at risk, and we know that because Cisco is still “spectating” and not driving a radical approach.  They don’t need to get their hands dirty; they’re winning.

NFV-at-the-Edge: Is it a Business?

Last week, Netsocket announced its “Virtual Partner Program”, which is hardly the first ecosystemic announcement made by vendors in the next-gen services and SDN/NFV space.  The program’s value raises some interesting points about network evolution and also about ecosystems and partner programs in general.

I blogged about Netsocket’s approach to NFV when they announced it.  In brief, it’s based on deployment of MicroCloud Servers to the customer edge, where they provide not only the “normal” service edge features but also a platform on which value-added features can be hosted.  When a customer wants a feature, you simply load it into the edge device and run with it.

The first interesting point here is the whole notion of virtual edges based on server platforms.  There are a number of vendors (Overture and RAD most recently) who have provided smart edge devices into which you can load something on demand, but these devices are custom hardware platforms—essentially edge switch/routers with some server intelligence.  Netsocket proposes the opposite: make servers into edge devices, not the other way around.

If you believe that servers can play any role in hosting features, you have to believe in this edge mission.  The service edge would almost never create demands on handling and performance that a properly designed but still standard server couldn’t meet.  Hosting stuff at the edge might seem to fly in the face of the notion of economy of scale for NFV, but there are plenty of reasons why that’s not the case, worth examining here.

The first reason is that edge-based NFV lets you deploy features agilely and at the same time limit the cost of prepositioning a resource pool in the hope you can fill it properly.  One sale, one box, no first-cost headaches.  The value proposition is particularly compelling where the functions you’re deploying are likely to be in place for a protracted period, and most business service edge functions like firewall are going to get turned on up front and left in place, except for bug fixes and enhancements, thereafter.  You don’t have a lot of dynamism to justify complex resource allocation.

The second reason is management cost and complexity.  I’ve noted before that when you sell users features that add up to a “virtual device” you have to have two-faced management systems; the customer sees the virtual device but the NOC has to see the real resources.  Virtualizing management views isn’t something we hear much about today (and we don’t hear about it from Netsocket either), and it’s easier to deal with the management challenge when the resources a virtual feature uses are still dedicated to a single customer.  If you’re going to do that, it makes sense to stick the features in the edge device, where they also create a logical service demarc.
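
Here is a minimal sketch of what that two-faced view might look like in code.  The structure and names are mine, purely to illustrate deriving a customer-facing virtual-device status from the real resources the NOC sees; no product works exactly this way.

def noc_view(resources):
    # The NOC sees every real resource with its own status.
    return dict(resources)

def customer_view(service_name, resources):
    # The customer sees one "virtual device"; its state is derived from the
    # resources that realize it, which are never exposed individually.
    degraded = [name for name, status in resources.items() if status != "up"]
    return {"service": service_name, "status": "up" if not degraded else "degraded"}

edge_resources = {"firewall-vnf": "up", "router-function": "up", "host-cpu": "up"}
print(noc_view(edge_resources))
print(customer_view("branch-office-edge", edge_resources))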

Of course the value of edge hosting is proportional to the value of the stuff that you have available to host there.  Netsocket has always been a promoter of being able to run pretty much anything at the edge, which theoretically gives buyers the opportunity to stick any software/network application from any open platform (including open source) into the MicroCloud Server and sell it as a feature.  Obviously the new partner program is designed to facilitate this, but if the platform is really able to run most applications (for Linux at least) why have a specific program?

This is where we come to a fork in the road on these kinds of programs.  Most partner ecosystem programs are a combination of a PR device to build buzz (a cheering section) and a strategy to broaden sales channels by involving more parties who presumably have more prospects.  While there’s nothing wrong with opportunism, it’s not strategic; in fact it carries risks of having the partner inertia impede strategic changes needed to address market trends and opportunities.  I think Netsocket is trying to walk the edge here in yet another way—they want to do something a bit more than the classic Vonnegut “granfalloon” but not get themselves into the really deep issues.  So where’s that line?

It’s certification.  For many users and even operators in an NFV world, to say that you can run everything is to say you’re not sure you can run anything.  Operators want their services to be made up of certified elements, meaning that they want to be sure that a given edge function can be dumped into the box and it will run symbiotically with what’s there and with the rest of the service elements.  If the vendor can certify stuff for edge hosting, the operator can be sure it will work.  That’s important for all the operators, but especially for Tier Two and Three operators who don’t have the resources to staff a certification lab of their own.

The issue for Netsocket and others is that the value of certification is proportional to the difficulties associated with integration.  A full-bore NFV implementation, with management and orchestration done the way it needs to be done in the long run, would make it fairly easy to deploy new features and would also provide a better management linkage.  That could mean that the certification strategy could broaden to cover all of NFV—even pooled resources in data centers.  Right now the edge guys like Netsocket have an advantage in NFV because they can deliver at least service agility without the burdens of deploying pools of resources and complex management/orchestration tools.  But those tools will be needed eventually.

The edge is just another place to host stuff in the long run.  We can’t expect edge-only approaches to survive unless the bigger players who deploy a general model for hosting functions decide for some unknown reason to foreclose the use of their approach in edge devices.  I can’t think of any NFV candidate who would do that.

So why would Netsocket take a narrow approach?  I think we can draw on the lesson of Overture.  When they announced their own edge-box NFV strategy, Overture backed it with what’s still the most complete commercially available MANO platform out there.  They can do a lot with it.  But it doesn’t sell itself and it’s not clear what the monetization model for it would be.  If it’s going to take me a year to sell a software platform for MANO and if everyone really wants one that’s open-source anyway, how much could Overture—or Netsocket—make by being a full-service NFV player?  Why not just sell boxes?

Netsocket is doing the right thing in the near term.  The question is how long the “near term” will be relevant, and who specifically will come up with the broader answer.  Niche players’ biggest risk is a mass market.

The Path to “Service Agility” and “Operations Efficiency”

Over the last year, we’ve seen a significant transition in expectations for things like SDN and NFV that are aimed at transforming networks.  Where once it was believed that moving to white-box switches and functionality hosted on cheap servers was a major driver of change, it’s now broadly accepted that something else has to drive us forward.  The things most often cited now are “service agility” and “operations efficiency”.  I happen to agree, but it’s more complicated than it seems.

First, my surveys and work with operators strongly suggest that we’ve attained our heightened understanding of drivers of change through the somewhat (at least) cynical path of having exhausted the easier-to-sell options.  Every salesperson, everyone who has tried to sell “management” on an idea, knows that the best starting point is something that’s easily understood and difficult to disprove.  Capex savings fit that bill, so that’s where we started.  Since we haven’t completed any useful research on how agile SDN or NFV would be, or how much operational savings they could generate, we’re not rushing (armed with stunning insights) to the right answer.  We’re just moving on past what we know won’t work to the easiest thing that might.

You can see that from the fact that “service agility” is the top requirement today, rather than operational efficiency.  People love service agility because they can postulate nearly any upside they like (“Hey, we could add 50% in revenues if we could address market trends faster!”) and because there are a bunch of easy, harmless things that can be said to address agility needs.  A good example is the notion that we could shorten service turn-on times by two (or maybe four, or even six) weeks.  First, all the service automation you can name won’t string a wire, so the things you can do quickly through an automated process are things that augment wires, not create them.  Second, the actual benefit of shortening the time before you can bill for a feature added to a wire is a one-time gain.  Operators tell me that shortening turn-on will likely add no more than three tenths of one percent in service revenues in any given year.  Over time, as feature commitments inevitably stabilize (once you get a firewall you tend to keep it), there’s nothing left to be agile with.
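
The three-tenths-of-a-percent figure is the operators’ own estimate, not something you can derive exactly, but a back-of-the-envelope check shows why it lands in that range.  Both inputs in the snippet below are assumptions of mine, not survey data.

# Rough check on the scale of the turn-on benefit, using assumed inputs.
feature_share_of_bill = 0.08   # assumption: the added feature is ~8% of the monthly service bill
weeks_gained = 2               # assumption: automation pulls turn-on in by two weeks
first_year_gain = feature_share_of_bill * (weeks_gained / 52.0)
print("incremental revenue in the year of the sale: {:.2%}".format(first_year_gain))   # ~0.31%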

I think that the admittedly more complicated truth here is that service agility and operations efficiencies go hand in hand.  Once we’ve addressed the low-hanging apples in agility (and found them little and maybe a bit sour) we have to start looking at the complicated agility issues.  Nearly any service that could be conceptualized and sold today could be built today.  The problem isn’t lack of services, it’s lack of profits.  If something can only be delivered at 148% of the buyer’s current willingness to pay, then it’s darn sure not going to sell enough to make anyone care about it, or how agile you were in getting it turned up.  The fact is that more valuable services are more complicated services, and complexity always costs in operations terms.  Thus, we really need to be conceptualizing operations efficiencies first, and that poses two specific and significant challenges.

The first challenge is that any new technology will necessarily make up a relatively small part of the infrastructure pool we’re building from.  We worry about how to operationalize SDN or NFV when that’s not the problem.  We have to operationalize legacy because that’s what we’re starting with in any service roll-out.  Who among our operator friends will deploy an enormous NFV cloud to secure some opex benefits, given that the NFV pieces are little nubbins of functionality buried in a sea of traditional technology?

SDN’s biggest problem, in its OpenFlow purist form at least, is this point.  We have no credible proof you can replace everything in the Internet with SDN and most people don’t believe we can.  Yet without replacing at least a whole heck of a lot, we can’t make any major change to operations costs, and in the early stages of deployment the new and different SDN practices are going to be more expensive to unify with legacy tools and practices.  So we prove a negative benefit and hope that people believe it will magically get positive?  Good luck with that.

The second challenge is that we don’t know what the agility barriers are, because we don’t know what our service opportunity targets will be.  When anyone who touts agility talks, they are forced into a pedestrianism that generates boredom in the hearts of our media friends, and we get nowhere because nobody even knows we’re trying.  We have to be able to solve agility operationalization challenges inside a framework of service creation that could address any credible service target.  That is a very big order for a marketplace obsessed with staring at its feet and the next quarter rather than at the horizon and the opportunity to become the next IBM or Cisco.

The fact is that what we should be doing now isn’t directly related to operations efficiency or service modeling, it’s something that Diego Lopez from Telefonica talked about at the Light Reading network event recently.  It’s abstraction.  We need to be able to create services by manipulating abstractions.  We then need to be able to manage the result.  Service agility and operations efficiencies, translated into practical terms, mean model-driven service creation/composition and the linkage of service operations processes to service/network/IT events based on a service-instance-driven model of how resources and functionality are related.

This is the reason I like TOSCA but don’t yet love it.  TOSCA (Topology and Orchestration Specification for Cloud Applications) is a model-based approach to defining how functional atoms of a service would decompose into resources deployed in the cloud.  Since deployment of software-based features has got to be the core of any credible future service, that aligns the tool with the primary new problem.  What I don’t love is the fact that TOSCA is still “cloudy” both in that it’s aimed at the cloud and in that it’s an emerging spec with little current practical history of modeling services in the broad sense.  I think you can make TOSCA into the right answer, at first by augmenting it and later by enhancing it, but I think you have to define that as a goal, and the first step in proving your effectiveness is to take a network that has no SDN or NFV in it and prove you can model and operationalize services.  Because it’s not whether “agility” or “operations efficiency” will drive SDN or NFV, but whether there’s a way to get both and get them from day one in our evolution to the future.
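
To illustrate what “manipulating abstractions” might mean in practice, here is a highly simplified, TOSCA-flavored service model expressed as a Python dictionary.  Real TOSCA templates are YAML documents with a formal type system; the node type names and the tiny “decomposer” below are my own stand-ins, not TOSCA itself.

service_template = {
    "description": "business access with a hosted firewall",
    "node_templates": {
        "access-connection": {"type": "assumed.nodes.L3Connection",
                              "properties": {"bandwidth_mbps": 50}},
        "firewall": {"type": "assumed.nodes.HostedFunction",
                     "properties": {"image": "fw-image"},
                     "requirements": [{"connects_to": "access-connection"}]},
    },
}

def decompose(template, deployers):
    # Walk the model and hand each node to whatever knows how to realize its type.
    results = {}
    for name, node in template["node_templates"].items():
        results[name] = deployers[node["type"]](name, node.get("properties", {}))
    return results

deployers = {
    "assumed.nodes.L3Connection": lambda n, p: n + ": provisioned " + str(p["bandwidth_mbps"]) + " Mbps",
    "assumed.nodes.HostedFunction": lambda n, p: n + ": deployed " + p["image"] + " in the cloud",
}
print(decompose(service_template, deployers))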

Making Spaghetti Out of SDN, NFV, and the Cloud

Everyone surely knows the old saw about the difficulty in pushing spaghetti uphill, and it occurs to me that we’re in danger of trying to do something very like that with changes created by SDN and NFV, perhaps even the cloud.  The expression is usually applied to tasks that are made difficult not by the intrinsic goal but by the fact that execution is starting at the wrong point.  You probably understand why I might think that’s an issue with our trio of revolutions.

“Services” in an IT, network, or cloud sense are really retail services.  A seat is not a car, and while you surely need seats in cars for them to be useful, you can’t talk about seat improvements as though consumers ran down to their dealerships to buy seats every couple years.  If you focus on the seat and not the car, you miss the dynamics that actually drive the market.

When we say that SDN or NFV or the cloud will improve services, and at the same time present them as alternatives to doing what we already do, we’re saying that the “improvements” are non-functional.  If Car A differs from Car B for non-functional reasons, then the only valid differentiator the buyer would likely recognize is cost.  But if we’re saying there is no benefit to these technology shifts other than cost, then adopting them will simply reduce revenues to the sellers, and downward through the food chain.  So, arguably, the success of all three will depend on generating some of those functional improvements, increasing the utility of networking or IT to the point where people will spend more on services, not less, because they get something that justifies that change.

For the cloud, that means that we have to stop talking about IaaS.  IaaS is hosted virtualization; virtualization is multi-tenant server consolidation.  All of that stuff is about cost management.  Something isn’t expected to run better, or even differently, in the IaaS cloud, only cheaper.  Amazon’s recent focus on augmenting EC2 with new web service features to deliver functional value to cloud applications is a step in the right direction.  IaaS is going to grow up, and we need to not only accept but embrace that maturation, by thinking about good stuff that can be presented as a functional service element and not just a cheaper service platform.

For SDN, I think the key is the notion of the service model.  SDN manipulates forwarding processes per device to create an overall service/connection model.  If the model we create is identical to that of Ethernet or IP, then we’ve added no functional utility through SDN and we can only justify it based on the assumption it’s cheaper—a lot cheaper in fact because of the risk premium associated with a new technology.  We should be talking about service models created northbound from SDN, models that would bring different features to network users.

NFV is the classic case.  On the face of it, NFV proposes to take a box and turn it into a chain of VM-hosted, network-connected elements.  The result would do what the box did, and the allegation is that it would be cheaper.  Well, how many VMs do we need?  What’s the cost of the VM if we amortize server costs, data center power and cooling, and (most of all) operations costs?  How would creating something from four distributed hosted elements be easier to manage than the box?

We got into this mess by spaghetti-pushing.  A car buyer wouldn’t evaluate a car by looking at how well the wheels turned or how the seat moved or how bolt A fit into nut B.  They’d talk about “how it looks” and “how it drives”, meaning they’d apply very high-level functional utility standards.  Car-builders, at least those who want to live to sell another day, would similarly start with the primary aesthetic issues that buyers respond to.  Imagine your excitement as your dealer shows you how your car is assembled from a pile of parts.  Would you stand around to see what was going to emerge, or more significantly would you buy based on the promise of the process and not the characteristics of the final result?  If you say “Yes” I’d love to sell you a car.

NFV, SDN, the cloud, all mean nothing if they don’t expose new benefits.  Vendors will not invest to contract their markets, nor will service providers build out to lose money faster.  Service creation is the process of functional assembly.  Yet we don’t talk about functions at all when we talk about SDN or NFV or the cloud; we don’t talk about the high-level impact, because we’re not prepared to propose what the impact might be or how we might achieve it.

Orchestration is the poster-child for all of this.  Ask an “NFV” provider about it and they’ll say they do it—with OpenStack.  Flash: OpenStack is not orchestration.  It’s cloud component deployment.  It’s screwing the seat onto the frame, not building a car.  Run screaming from those who say OpenStack is the solution to NFV, or SDN, or the cloud.  Shun them.  They’re pushing spaghetti uphill and they want you to join them in their futile endeavor.

We have nothing in the way of standards to drive us forward toward functional-driven orchestration, nothing to connect all our low-level bluster with the buyers’ wallets.  We have no hope that a new standards process could achieve anything but generating Platinum memberships at this point; we’re out of time.  The buyer is standing in the showroom, checkbook in hand, and looking at our pile of parts.  By the time we assemble a notion of what we expect to do with them, that buyer will be standing somewhere else.  Someone is going to have to take a stand, build a car.  Unless there’s no automotive market, unless buyer functional requirements mean nothing, that someone is going to have a big payday.

SDN, NFV, and “Higher Layers”

I read a piece this morning, a paper from a vendor, promising that Levels 4-7 won’t get commoditized just because SDN comes along.  We’re suffering here again from a victory of hype and popular usage over technical accuracy, and so it’s time we took a look at the whole “higher-layer” or Level 4-7 picture to sort out what SDN and NFV might mean there.

Back in the late 1970s, work began on the “Basic Reference Model for Open Systems Interconnection”, shortened to the “OSI Model”.  This model established a seven-layer structure based on the notion that a given layer saw the network only through the services of the layer below, and that each layer was simply a user of that lower one and in turn was used by the one above.  The progression of the first three layers is simple: physical media, data link, and network.  Above them are the transport, session, presentation, and application layers.

The first stumble we find in popular discussions is that only Levels 1-3 are present in the network; all the higher layers are end-to-end.  Thus, fairly speaking, there could never be any impact of SDN on higher layers unless it stepped outside the network.  The facts never constrained a good marketing yarn, of course, but in this case there’s a reason why we’ve taken to implicitly violating the very model we reference.

The bigger problem for the 4-7 story is that your chances of encountering 7-layer traffic in the real world are probably about the same as being struck by lightning while kissing a Hollywood star/starlet.  The Internet protocol suite never really established formal structure above Level 4 (TCP and UDP) and nothing based on the OSI suite really deployed at all.  But it’s the reaction to this reality that has created the current higher-layer mythology.  Two specific developments, in fact.

Development number one was that the lack of a formal higher-layer structure made it difficult to distinguish between application flows to the same transport port.  As it became clear that certain applications demanded different QoS, the lack of a fixed structure of headers to look at meant that you had to inspect the packet payloads to try to classify what you had.  Hence, DPI.  DPI builds application awareness into the network, not through “higher layers” but by accommodating, as best it can, the fact that there aren’t any.

The second development was that some network vendors added their own layers above the TCP (Level 4) process.  These “overlay networks” do impose a formal structure and add formal headers, so traffic within one can actually be manipulated by higher-layer processes.  However, there really aren’t three additional layers up there; there’s typically only one, and in theory you could stack them up like playing cards as deep as you liked, presuming you could tolerate the header overhead.

DPI and overlays are different potential solutions to the common problem of application-specific handling.  With overlays, the endpoints can classify traffic and tag the stuff with a header so that everything that has to be recognized can be.  Without them, you dig into the packet content to look for structural patterns that would identify traffic.

SDN is different from “OSI” networking in that in theory you could define forwarding rules to accommodate as many layers as you like, and you can make the network sensitive to the higher layers even though OSI says that’s a no-no.  Thus, far from commoditizing higher layers, SDN could actually empower them.  However, it’s not enough in itself.

If you have traffic with a uniform header structure, a packet that conforms to the layer model, you know where to look for the signals that divide application traffic types.  If you have any traffic at all that doesn’t conform to the header structure, you have the risk that the data payload of those packets will appear to be headers, and you’ll mishandle them as a result.  Thus, the most useful presumption for higher-layer traffic is that you create some sort of overlay network and tag all the packets with the proper headers.

Tagging packets is easy if they’re your packets, and if you know they need to be tagged.  Obviously we have a zillion apps out there that don’t know anything about higher layers and so won’t do the tagging.  A good approach, then, would be to apply DPI at the very edge, examine packets meticulously to be sure you identify them correctly, and then tag them so you don’t have to repeat this potentially complicated and delay-generating process at every point in the network where you want to provide discriminating handling.  You can apply edge-DPI-based tagging to any stream that has enough information to reliably classify the traffic.
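
Putting those pieces together, a hedged sketch of the edge process (reusing the toy classifier and the invented overlay header from above): run the expensive payload inspection once at ingress, stamp the verdict into the tag, and let every downstream node read only the tag.

# Map DPI verdicts onto overlay traffic classes; the values are illustrative only.
CLASS_MAP = {"web": 1, "tls": 2, "rtp-ish": 7, "unknown": 0}

def ingress(payload: bytes) -> bytes:
    verdict = classify_payload(payload)              # costly inspection, done once
    return encapsulate(payload, CLASS_MAP[verdict])  # cheap tag read everywhere else

frame = ingress(b"GET /video HTTP/1.1\r\n")
print(decapsulate(frame)[0])   # -> 1; no further DPI needed downstream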

The point here is that you can’t offer higher-layer handling without some explicit process of classification, which means that the data sources have to be equipped or augmented to provide that before the network can use it.  Otherwise you have to continually examine traffic at every point, which wastes far too much time and too many resources.

NFV enters into this picture not so much as a handling option (an NFV-based switch or router would have the same basic constraints on headers and DPI as anything else) but as a way to classify and as a source of classified traffic.  You could host DPI VNFs in useful places to classify traffic when you needed it, and you could also use overlay networks to separate service-plane traffic from the intra-function traffic that should never be exposed to direct user interaction.  NFV helps higher-layer handling, and it could also help justify it.

So the point here is that higher-layer handling could be done very flexibly using DPI and SDN, better than we can do it today.  The question is whether we can create a “service model” to describe how this kind of handling would work so it could be done cooperatively through a network.  Even in SDN, you can’t forward packets unless all the handling nodes have a cohesive idea of how the whole thing is supposed to hang together.  That model, sadly, is another application of those missing “northbound APIs”.

 

Broadcom’s Hybrid Hardware: Right Idea but Wrong Target?

According to a decent interview/story in Light Reading, Broadcom has an NFV strategy (no surprise) and it’s one that favors the notion of a hybrid technology approach, augmenting COTS with specialized technology to enhance performance (no surprise there either; they make the stuff).  But just because you’re opportunistic doesn’t mean you’re wrong, and there are a number of factors that complicate the seemingly simple notion of NFV savings through off-the-shelf hardware.  Specialization of hardware, hybridization as Broadcom would put it, is one, but Broadcom will need to think about all the others too.

Factor number one is that capex reduction isn’t really what carriers want these days.  Most of the operators who have delved into NFV seriously have already decided that capital savings won’t make enough difference to help them.  NFV targets primarily specialized appliances, and there aren’t enough of them in a network to drive a big shift in cost.  Not only that, most operators agree that capital savings wouldn’t likely exceed 25%, and most of that could easily be eaten up in operations costs.  Replacing a single branch access box with three or four containers linked by service chaining generates more components, thus more complexity, thus more cost.  That means that first and foremost, an NFV strategy has to address the totality of the business issues, primarily operational efficiency and service agility.
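
To see why a 25% capex ceiling gets eaten so easily, here’s a back-of-the-envelope calculation; every number in it is an illustrative assumption of mine, not survey data.

# Illustrative TCO arithmetic; all three inputs are assumptions.
capex_share = 0.30    # assume capex is 30% of total cost of ownership
capex_saving = 0.25   # the "25% max" capital saving
opex_penalty = 0.10   # assume added components push opex up 10%

net_tco_change = capex_share * capex_saving - (1 - capex_share) * opex_penalty
print(f"net TCO change: {net_tco_change:+.1%}")   # about +0.5%, essentially a wash

Change the assumptions however you like; the structural point is that a modest saving on a minority cost component is easily swamped by even a small operations penalty.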

Factor number two is that many of the NFV targets are really not even NFV applications.  IMS and EPC, as I’ve pointed out, are more like simple cloud hosting, which we can already do.  Service chaining, assuming you really want to approach it right, is more likely to require an agile edge box that you can load with the proper functionality, as a number of vendors have already noted.  The good news for Broadcom is that it’s easier to justify specialized hardware in a custom device, even one that’s service-edge-based.

You can probably see a point here already.  Yes, it is true that many possible applications of servers in networking would benefit from specialized hardware—and software for that matter.  We should be thinking about what an optimized NFV platform would look like overall, and then recognizing that COTS may be giving up too much for too little gain.  But we have to note that if capex reduction isn’t really the target, then optimizing hardware to support the real NFV mission demands that we know what that mission is.

We need good data plane performance for NFV, to be sure, and there are probably applications that would demand specialized capabilities like content-addressable memory or high-speed arithmetic processing.  Having these things might be essential in creating an NFV server to fulfill our mission.  But they won’t make the mission’s business case.  For that, we need to manage the complexity of networks and services—not only for NFV but for SDN and everything else.

Remember my comments yesterday on functional agility and the movement of service logic toward an almost-transactional model?  Yes, you need high-performance data and control paths to make that work at scale, but you also have to stitch together functional components ad hoc to meet service demands.  You’re not “provisioning” something in the traditional sense, because you’ve made many of the key service components multi-tenant so they’re there all the time.  Yes, you’ll need horizontal scaling and load balancing just like you do in a realistic NFV IMS/EPC implementation, but you also need to be able to find components quickly, steer work reliably, and do all of the stuff that distributed OLTP systems have to do.

We don’t get to this end-game immediately, but we do start along the path as soon as we admit that service chains and multi-VNF services are more complicated, and less easily operationalized, than a service made up of a couple of shared or dedicated boxes.  Even if the new configuration costs less, capex-wise, it’s not likely to demonstrate any stellar TCO benefit unless we can control that complexity.  Service automation isn’t about doing what we do today, only with software.  It’s about doing what needs to be done for the next generation, stuff that will kill the business case if we don’t do it right.
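
For a flavor of what “find components quickly and steer work reliably” means in code, here’s a minimal sketch of a component registry with round-robin steering; the structure is assumed for illustration and isn’t any NFV MANO or OSS interface.

import itertools

# Registry of live, multi-tenant component instances, keyed by function name.
registry = {
    "firewall": ["fw-a", "fw-b"],
    "nat": ["nat-a"],
}

# One round-robin cursor per function so work is spread across instances.
cursors = {name: itertools.cycle(instances) for name, instances in registry.items()}

def steer(function: str) -> str:
    # Pick the next live instance; a real system adds health checks and scaling.
    if function not in cursors:
        raise LookupError(f"no instances registered for {function}")
    return next(cursors[function])

print([steer("firewall") for _ in range(3)])   # -> ['fw-a', 'fw-b', 'fw-a']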

What this means for Broadcom’s hybrid approach is that it could be totally correct, even smart, even essential, but still not be sufficient.  There has to be NFV deployment before you worry about how efficient it is.  If you can identify a solid driver for NFV deployment, you have a specific business case with specific dollar benefits, against which you can apply the cost for COTS, for augmented servers, or whatever.  If you can’t identify the driver, then it doesn’t matter whether you have hybrid hardware or plain COTS.

Of course, Broadcom could define mechanisms for its augmented/hybrid hardware approach to reduce complexity.  There are credible missions for hardware augmentation in operations and workflow, for OLTP-like stuff.  The question is whether Broadcom is looking for these missions.  In the article, the Broadcom CTO uses switches as an example of a mission that justifies special hardware rather than COTS, which is probably true.  But recall that NFV wasn’t targeted at switching/routing, though there will certainly be a need for switches/routers in virtual networks of virtual functions.  In those missions, which are more contained than transit switch/routing missions, it may or may not be true that special hardware is needed.  Same with the agility/operations missions, the ones Broadcom needs to have proved out—by themselves or by somebody else—before their hybrid argument takes hold.

So here we are again, dancing with NFV missions, which is probably as frustrating for the operators as it is for me (or you, my reader).  It would be nice if we could lay out the totality of NFV, the business drivers, quantified benefits, requirements to be met for each benefit/driver, and do the math.  That’s not likely to happen right away, but eventually it’s inevitable.  My spring survey said that operators believed that their NFV trials were proving technology but not making the business case.  NFV, they say, is feasible technically, but the extent to which it can deploy depends on those hazy operations savings and service agility benefits, things that we still have not explored much in trials.

The ISG is looking for a mission for Phase Two; here’s one.  We need to look at NFV in context, in service applications that are credible and that have service revenues and cost targets.  We need to understand how various assumptions about servers, augmented hardware, specialized software, and edge versus central hosting will impact the benefit case, and how far we need to extend the concept of “management and orchestration” beyond virtual functions to capture enough costs and complexity to be meaningful.