What Would Cisco, IBM, or Others Have to Do to Win at the IT/Network Boundary?

Yesterday, in the wake of earnings calls from both Cisco and IBM, I blogged that IBM was at least working to build fundamental demand for its stuff by engaging with Apple to enhance mobile productivity for enterprises.  I then commented that the challenge would be in converting this kind of relationship into some structured “middleware” that could then be leveraged across multiple business applications.  My closing point was that almost half of the total feature value of new middleware was up for grabs, something that could reside in the network or in IT.  It’s time to dig into that point a bit more.

If you look at normal application deployment, you see what's totally an IT process.  Even multiple-component applications are normally deployed inside a static data center network configuration, and so it's possible to frame networking and IT as separate business-support tasks, cooperating and interdependent but still separate.  While most companies unite IT and networking under a CIO, they typically still have a network head and an IT head.

The cloud, SDN, and NFV potentially change this dynamic.  OpenStack has an API set (Nova) to deploy compute instances and another (Neutron) to connect them.  At least some of the SDN models propose to move network functionality into central servers, and NFV is all about hosting network features.  The broad topic of “network-as-a-service” or NaaS reflects the goal of businesses to make networks respond directly to application and user needs, making them in a true sense subservient to IT.  If you apply virtual switches and components and overlay technology (like VMware) then you can create a world where applications grab all the glory in an overlay component and networking is all about plumbing.
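The Nova/Neutron split described above can be sketched in a few lines.  These classes are illustrative stand-ins, not the real OpenStack clients; the point is the division of labor, with one API set owning compute and a separate one delivering connectivity as a service.

```python
# Minimal sketch of the OpenStack split described above: one API set
# (Nova-like) deploys compute instances, another (Neutron-like) connects
# them.  Stand-in classes only, not the actual OpenStack APIs.

class Nova:
    """Compute service: owns instance lifecycle."""
    def __init__(self):
        self.instances = {}
    def boot(self, name, image):
        self.instances[name] = {"image": image, "ports": []}
        return name

class Neutron:
    """Network service: owns connectivity, separate from compute."""
    def __init__(self):
        self.networks = {}
    def create_network(self, net):
        self.networks[net] = []
    def attach(self, nova, instance, net):
        # Connectivity is requested as a service, not configured box-by-box.
        self.networks[net].append(instance)
        nova.instances[instance]["ports"].append(net)

nova, neutron = Nova(), Neutron()
neutron.create_network("app-net")
for vm in ("web", "db"):
    nova.boot(vm, image="ubuntu")
    neutron.attach(nova, vm, "app-net")

print(neutron.networks["app-net"])  # both instances joined to the NaaS-style network
```

The separation is what makes "network-as-a-service" thinkable: the application side asks for connectivity in abstract terms and never touches the underlying switching.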

The question, of course, is how NaaS is provided.  NaaS is like any other kind of networking: you have to be able to model what you want, deploy it, and manage it.  Clearly you can't manage virtual resources alone; real bits get pushed by underlying switches even in the VMware model.  Furthermore, Nova could in theory offer "hosting as a service" and be as disruptive to the current data center model as Neutron and NaaS would be to networking.  The point is that there's a big functional chunk around this "as-a-service" stuff.

And it’s up for grabs.  Virtually no network operators and few enterprises believe that the new model of the cloud is mature and well-understood.  If you focus on the subset of enterprises who are looking for those compelling new productivity benefits—the ones that could drive new tech spending—then no statistically significant portion of the base believes they’re ready to deploy this new model.

The closest our tech evolutions have come to reality so far is with the cloud and its relationship with NFV.  Cloud computing for enterprises has been mostly about server consolidation; users tend to deploy fairly static application models to the cloud.  While this is helpful to a point, most enterprises agree that point-of-activity empowerment through the marriage of device mobility and information agility is the best hope for new benefit drivers.  This kind of stuff is far more dynamic, which is where NFV could come in.

Service features can also be static, as most of the "service chaining" proofs of concept and hype demonstrate.  A company that buys a VPN is likely to need a pretty stable set of connection adjunct tools (firewall, NAT, DHCP, DNS), and even if they buy some incremental service they're likely to keep it for a macro time once they decide.  Thus, a lot of the NFV stuff isn't really much different from server consolidation; it's a low apple.  The question is whether you can make something dynamic deployable and manageable.  The Siri-like example I've used, and the question "What's that?", illustrate information dynamism, and you could apply the question to a worker in front of a pipe manifold or electrical panel or to a consumer walking down a commercial boulevard.
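A static service chain of the kind those proofs of concept demonstrate is easy to picture as an ordered list of functions that a packet traverses.  This is a toy sketch with made-up function behaviors, just to show why a chain deployed once and rarely changed looks more like server consolidation than like dynamism.

```python
# Toy sketch of a static service chain: an ordered set of connection-
# adjunct functions a VPN buyer would keep stable for a long time.
# Function bodies are illustrative, not real VNF logic.

def firewall(pkt):  pkt["filtered"] = True;  return pkt
def nat(pkt):       pkt["src"] = "public-ip"; return pkt
def dhcp_dns(pkt):  pkt["resolved"] = True;  return pkt

SERVICE_CHAIN = [firewall, nat, dhcp_dns]  # deployed once, rarely changed

def traverse(pkt, chain):
    # Each packet passes through the chained functions in order.
    for vnf in chain:
        pkt = vnf(pkt)
    return pkt

print(traverse({"src": "10.0.0.5"}, SERVICE_CHAIN))
```

The dynamism question is whether the chain itself (its membership and ordering) can change on demand and still be deployed and managed automatically.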

My point on all of this is that the essential element in agile cloud or NFV deployment is highly effective management/orchestration, or MANO.  IBM's answer to MANO in the cloud is its "SmartCloud Orchestrator", which is, as far as I know, the only commercial MANO tool based on TOSCA, the standard I believe is the best framework for orchestrating the cloud, SDN, and NFV.  Some inside IBM tell me that they're looking at a "Service Orchestrator" application of this tool for NFV, and that it's also possible that NFV and the cloud will both be subsumed into a single product, likely to remain with its current name.

So here’s IBM, explicitly targeting productivity enhancement and having the best current core tool for agile-component MANO.  You see why I say that Cisco has to get on the ball.  It’s far from certain that IBM actually plans to broaden SmartCloud Orchestrator to target the full SDN/NFV/cloud universe, or that they could be successful if they did.  After all, most of you reading this have probably never heard of the product.

Cisco’s ACI is an SDN strategy that says that the current network can be made agile through the addition of APIs that allow applications to manipulate services more effectively and create them from infrastructure.  It’s a more logical approach for most enterprises and even operators because it protects a rather large asset base, but the VMware approach and in particular the partnership with Arista demonstrates there’s another way.  All you have to do is build an overlay network and couple it downward to cheaper boxes.  You get agility immediately, and as you age out your current technology you can replace it with plumbing.  If you connect IBM’s SmartCloud approach to this, you get something that could answer a lot of buyer questions in the cloud, SDN, and NFV.

The big bugaboo for IBM here, and for VMware and Arista and Cisco and everyone else, is the management part.  We still, as an industry, don’t have a validated model for managing transient virtual assets and the transient services or applications created from them.  We are thus asking everyone to dumb down to best efforts at the very moment we’re asking workers to rely on point-of-activity empowerment to make them more productive.

This makes management the boundary point that IT and networking have to vie to reach.  For IBM, coming from the potential strong base of productivity apps designed for tablet/cloud and with the best MANO offering available from a big player, success could be little more than delivering what’s promised.  For Cisco, it’s a matter of creating a complete solution for agile applications and resources that’s credible, not just Chicken-Little-the-Sky-is-Falling PR about how video traffic is destined to swamp networks if everyone doesn’t invest in more bits.  And of course, somebody else might step up.  We’re in the early stages of the future here and there’s plenty of maneuvering room.

The Fight at the Network/IT Border–and the Fighters

Anyone who believes in “cyclical spending” or “refresh cycles” or “secular recovery” in tech should take another look at the numbers after both Cisco and IBM reported yesterday.  We are obviously in a general economic recovery and yet tech is stagnant.  As it happens, though, the same two companies’ reports offer some insights into what comes next.  We have people who are thinking strategically, and those who are not.  We have people singing their positioning song too low, and others too stridently.  It’s like a soap opera.

The significant thing about both Cisco and IBM is that both companies are stuck in revenue neutral at best.  IBM has suffered revenue losses for nine quarters, and Cisco also continued its year-over-year decline in the revenue line.  Given that these companies are the market leaders in their respective spaces, there's really only one possible conclusion: buyers are trying to cut costs, and technology is one place they're cutting.  That shouldn't surprise these guys; they see themselves cutting (Cisco plans to slash another 6,000 jobs), but somehow the message doesn't seem to register.

There is nothing complicated about technology deployment.  A company these days has not one tech budget but two.  The first is the money that simply sustains their current capabilities.  The second is money that is added to the pot because there are new business benefits compelling enough to meet the company’s ROI targets—the “project budget”.  It’s this budget that creates longer-term revenue growth because it adds to the pool of deployed technology.  The sustaining budget is always under pressure—do more for less.  Historically, in good times, the project budget is bigger than the sustaining budget.  For the last six years, companies have reported their project budgets shrinking, and now we’re almost at the 60:40 sustaining versus project level.

I know that at least some people in both IBM and Cisco know this, because I’ve had conversations about it.  The interesting thing is that the two companies, facing the same problem, are responding very differently.

IBM’s theory is simple.  We have to reignite long-term technology commitment, and that’s what their focus on mobility is designed to do.  The theory is that mobile empowerment is the largest single opportunity to gain worker productivity, so it brings the largest possible benefit case to the table.  IBM wants to be the leader there, largely by embracing Apple’s aspirations in the enterprise tablet space and combining them with IBM’s software and cloud goals.

This is going to take a while.  The facts about project budgets have been known for a long time, so you have to ask why everyone hasn’t jumped on this.  The reason is that it’s much harder to drive a productivity-justified new project than just to replace an old server or router.  IBM is committing to a shift that will likely take two or three years to play out.  They should have started sooner (the signs have been there for almost a year and a half) but at least they’re starting.

Where IBM is going wrong here is in their positioning.  If you are doing something new, something that nobody else is doing, something that’s probably too hard for others to do easily, you don’t sit with your hands over your mouth, you sing like a bird.  IBM should be touting their new story from the rooftops, but they haven’t managed to get even savvy media to grok what they’re up to.  As usual, IBM is relying on sales channels to drive their story into the market, and that’s not good enough.  The salesforce needs a strong marketing backstop to be productive, and IBM continues to demonstrate it’s lost its game there.

Cisco?  Well here we have almost the opposite situation.  Cisco simply does not want to admit that new benefit paradigms are needed.  They want us to believe that a bunch of teenagers who are downloading and viewing content at marginal returns that are falling by half year over year should be supported in their appetites no matter what the ROI is.  They want us to believe that all our household appliances and all our business devices are little R2D2s, eager to connect with each other in a vast new network with perhaps the chance of taking over from us humans, but with little else in the way of specific benefits to drive it.  Cisco thinks traffic sucks.  It sucks dollars from their buyers' wallets into Cisco's coffers.  All you have to do is demonstrate traffic, and Cisco wins.  Nonsense.  In fact, Cisco's biggest problem now is that it's expended so much time positioning drivel that it may be hard to make anyone believe they have something substantive.

To be fair to Cisco, they have a fundamental problem that IBM doesn't have.  Worker productivity and even network services are driven by experiences now largely created by software at the application layer.  IBM understands applications, and Cisco has never been there.  The comments that came out recently that Cisco needs to embrace software are valid, but not valid where they were aimed.  It's not about software-defined networks; it's about software that's doing the defining.  Cisco has confused the two, and now its fear of the first is barring it from the second.

No vendor is going to invest money or PR to shrink its own market.  SDN and NFV and the cloud—our trio of modern tech revolutions—are all about market shrinkage because they’re all about cost savings.  They’re less-than-zero-sum games, unless you target the revolutions at doing something better and not cheaper.

Cisco wants to be the next IBM, which raises the question of what happens to the current IBM.  IBM has weathered more market storms than any tech company; Cisco is an infant by comparison.  For Cisco to really take over here, they have to take advantage of IBM weakness, which they can't do by doubling down on their own.  Think software, Cisco, in the real sense.  You have, or had, as many credentials in the mobile space as IBM.  Why didn't you realize that SDN and NFV and the cloud were going to create opportunities for new benefits, services, and experiences that would drive up the total "R" and thus justify higher "I"?

Cisco has aligned with Microsoft, as IBM has aligned with Apple.  Microsoft is a solid Clydesdale against Apple’s Thoroughbred in terms of market sizzle, and they have the same problem of being locked out of emerging benefits as Cisco does.  But Cisco could still use the Microsoft deal to lock up the middleware and cloud models that would validate mobile empowerment and suck them down into the network layer.

That's the key here for the whole IT and networking space.  About a quarter of all the value of new technology that new benefits could drive is explicitly going to IT, and another quarter to networking.  The remaining half is up for grabs, clustered around the soft boundary between the "data center" and "the network".  If IBM can grab the real benefit case and support it fully with both IT and network technology, it can move that boundary downward and devalue Cisco's incumbency and its engagement model.  If Cisco can grab it, they can move the boundary up.  One of them's singing a sweet but dumb tune, and the other is playing a great tune in their own mind.  Who fixes the problem first, wins all.


Is it Time to Consider Private-Cloud-as-a-Service?

Despite the fact that every vendor, editor, and reporter likely thinks that media attention to a concept should be sufficient to drive hockey-stick deployment, in the real world a bit more is needed.  One of the major challenges that all of our current technology revolutions (the cloud, SDN, and NFV) face is operations cost creep.  Savings in capital costs, which are the primary focus of these technology changes, are all too easily consumed by increases in operations costs caused by growing complexity or simple unfamiliarity.  That can poison a business case to the point where the status quo is the only possibility.

Yesterday, a startup called Platform9 came out of stealth with a SaaS cloud-based offering that manages “private clouds” using principles broadly inherited from Amazon’s public cloud.  I use the term “private cloud” in quotes here to suggest that you should also be able to apply the Platform9 tools to virtualized data centers, which are inherently technology ancestors to most private cloud deployments.  The primary target for the company, in fact, seems to be businesses who have adopted virtualization and want to take the next step.  There also seem to be other potential enhancements the company could make that further exploit the flexibility of the term “private cloud”, which I’ll get to presently.

At a high level, Platform9 is a management overlay on top of hypervisor resources that builds on OpenStack, but it is designed to be an operationally more effective and complete way of viewing resources/infrastructure, applications, and users.  Details of the virtualization framework, ranging from containers to VMware, are harmonized through the tools so that users get the same management interface regardless of infrastructure.  That's helpful to large companies in particular because many have evolved into virtualization in a disconnected way and now have multiple incompatible frameworks to deal with.

The Platform9 services provide for resource registration through a set of infrastructure views, and this is what an IT type would use to build up the private cloud from various virtualization pools.  Application or Enterprise Architects or even end users could then use a self-service portal to obtain compute and storage resources for the stuff they need to run.  The IT side (or anyone else, for that matter) can use panels to get the status of resources and instances allocated.

I’m not totally clear on where Platform9 fits with respect to DevOps tools.  Logically they should be a part of the “inside” processes at the IT level, and the assertion that the APIs are OpenStack compatible suggests that as well.  Presumably higher-level application deployment automation could exercise the user self-service interface, which might provide a second-level orchestration option that I think is the right answer for complex application and service deployment.
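That second-level orchestration idea can be sketched simply: a DevOps step drives a self-service portal interface rather than talking to hypervisors directly.  The class and method names here are my own assumptions for illustration, not Platform9's actual API.

```python
# Hypothetical sketch of second-level orchestration: a DevOps process
# exercising a self-service portal.  Names and semantics are assumed,
# not taken from Platform9's product.

class SelfServicePortal:
    """Stand-in for an OpenStack-compatible self-service interface."""
    def __init__(self, capacity):
        self.capacity = capacity      # vCPUs available in the pool
        self.allocations = []
    def request(self, app, vcpus):
        if vcpus > self.capacity:
            raise RuntimeError("insufficient resources")
        self.capacity -= vcpus
        self.allocations.append((app, vcpus))
        return {"app": app, "vcpus": vcpus}

def deploy_application(portal, app, tiers):
    # Higher-level DevOps orchestration: one request per application tier,
    # all through the portal, so IT keeps a single view of allocations.
    return [portal.request(f"{app}/{tier}", vcpus) for tier, vcpus in tiers.items()]

portal = SelfServicePortal(capacity=16)
deploy_application(portal, "crm", {"web": 4, "db": 8})
print(portal.capacity)  # vCPUs left in the pool after deployment
```

The value of routing deployment through the portal is that resource accounting and status views come for free; the DevOps tool never needs to know which hypervisor framework sits underneath.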

The goal here, obviously, is to make it possible for enterprise IT to deploy virtualized resources as private cloud services that have the same convenience as public cloud services would.  Certainly the Platform9 mechanism is likely to be considerably easier than gluing OpenStack onto current virtualized resource pools, and that could facilitate adoption of a true private cloud framework.  I think you could even assume that Platform9 would reduce the operations cost of virtualization, at least where there was a significant level of dynamism in terms of how machine images were assigned to resources.  After all, the boundary between virtualized data centers and private clouds is a bit arbitrary.

There are APIs and developer programs to support third-party extension of the platform, and obviously Platform9 intends to add functionality.  Some of the features I've cited here (container and VMware support, particularly) are future enhancements not available in the current beta release, and I'm sure the company expects to have to enhance the platform over time.  They'll need to, because there are likely to be a lot of other approaches to the same problem.

As I noted earlier, I think the company should look at further generalization of that "private cloud" term to broaden the range of IT environments it can accommodate.  To offer IT on a self-service basis, it's probably not optimal to think that all of it is deployed on a private cloud.  Obviously some is likely deployed on a public cloud or would be cloudburst there or failed over.  Equally obviously, some IT operations are neither based on virtualization nor on cloud computing; they're the old business-as-usual multi-tasking server apps.  The point is that it is very unlikely that everyone will be all private cloud in the strict, explicit sense, and so the benefits of Platform9 would be limited unless it extends itself to cover the range of hosting options actually used.  This kind of expansion could let Platform9 provision PaaS and SaaS cloud services, hybrid cloud, and pretty much cover the opportunity space.

Another area I'd like to see the company address is that of operationalizing the infrastructure itself.  Cloud adoption is a combination of deployment and lifecycle management.  Some of the cost of private cloud is associated with the registration of the available resources and the commitment of those resources, but some is also associated with sustaining those resources using automated tools.  I suspect that Platform9 believes that third parties can enhance its offerings with automated lifecycle management, and if that's the case I'd prefer they be explicit about that goal and also talk a bit about the APIs and the progress they're making in having partners use them for this important task.  The company may also have some plans of its own in this area; it lists SLA management as a future feature.

I think that the Platform9 approach is interesting (obviously or I’d not have blogged about it).  It demonstrates that there’s more to the cloud than capex reduction, and that in fact operational issues can be profound problems for cloud adoption.  It demonstrates that there’s value to abstraction of “the cloud” so that users are more insulated from the technical details of cloud software.  If the company evolves their offering correctly, they have the potential to be successful.

This also demonstrates that the whole opex thing is perhaps one of those in-for-a-penny issues.  Ideally, private cloud deployment shouldn’t be exclusively private, or exclusively cloud, or even exclusively deployment.  It should be cradle-to-grave application lifecycle management, both in the traditional sense of ALM and in the more cloud-specific sense of managing the application’s resources to fulfill the expectations of the users.  We’ve had a tendency in our industry to talk about “opex” in a sort-of-two-faced way.  On the one hand, we say that it’s likely a larger cost than capex, which is true if we count the totality of operations lifecycle management.  On the other, we tend to grab only a piece of that large problem set.  Platform9’s real value will be known only when we know just how far they intend to go.

The timing of this is interesting.  We have clearly plucked most of the low-hanging fruit, in terms of cloud opportunity.  Absent an almost suicidal downward spiraling of costs driven by competition among providers, the IaaS cloud has to draw more on operations efficiency or it will stall.  We will likely see enhancements to cloud stack software to accommodate this, improved management/orchestration coming out of the NFV space, and additional commercial offerings.  All that is good because we need to optimize the benefit case for the cloud or face disappointment down the line.

How a Little Generalizing Could Harmonize SDN, NFV, and NGN

I've done a couple of blogs on SDN topics, but one of the important questions facing everyone who's considering SDN is how it would fit in the context of NFV.  For network operators, NFV may well be the senior partner issue-wise, since it's explicitly aimed at improving capital efficiency, service agility, and operations efficiency and (at least for now) it doesn't seem to be advocating a fork-lift for the network overall.  But what is the relationship?  It's complicated.

The NFV ISG is at this point largely silent about the role of SDN in support of NFV.  In large part, this is because the ISG made a decision to contain the scope of its activities to the specifics of deploying virtual functions.  At some point this will have to spread into how these virtual functions are connected, but the details on that particular process haven’t been released by the body.  Still, we may be able to draw some useful parallels between the way that NFV MANO exercises virtual function deployment processes and how it might exercise SDN.

In the current spec, MANO drives changes to the NFV Infrastructure through the agency of a Virtual Infrastructure Manager or VIM.  In a sense, the VIM is a handler, a manager, that would presumably harmonize different cloud deployment APIs with a standard interface to MANO so that orchestration wouldn’t have to know about the details of the underlying resource pool.  Presumably OpenStack would be one of these options, and presumably things like Neutron would be exercised through a VIM.

The first question here is how the capabilities of resources (the cooperative behaviors of functional systems) can be represented.  What does a network of any sort use to describe something like a VPN or a VLAN?  In OpenStack, this is done by referencing a "model" that (while the implementation has evolved from the old Quantum networking to Neutron) is a logical structure with known properties and that, through custom plugins, can be realized on a given infrastructure.  The Neutron approach, then, is to have some high-level abstraction set representing network behaviors, and then provide a plugin to implement them on specific gear using specific interfaces or APIs.
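The model-plus-plugin pattern can be shown in miniature.  This sketch is an illustration of the pattern, not Neutron's actual plugin machinery; the plugin functions and their behaviors are invented for the example.

```python
# Sketch of the model-plus-plugin pattern: a high-level abstraction
# ("vlan" here) is registered with one or more plugins that realize it
# on specific infrastructure.  All names are illustrative.

class NetworkModel:
    registry = {}

    @classmethod
    def register(cls, model_name, plugin):
        cls.registry.setdefault(model_name, []).append(plugin)

    @classmethod
    def realize(cls, model_name, **params):
        # Any plugin that implements the model can realize it; the layer
        # above need not know which gear or protocol sits underneath.
        plugin = cls.registry[model_name][0]
        return plugin(**params)

def openflow_vlan(vlan_id):
    return f"vlan {vlan_id} via OpenFlow flow rules"

def legacy_cli_vlan(vlan_id):
    return f"vlan {vlan_id} via device CLI"

NetworkModel.register("vlan", openflow_vlan)
NetworkModel.register("vlan", legacy_cli_vlan)
print(NetworkModel.realize("vlan", vlan_id=42))
```

The key property is that the abstraction ("vlan") is stable while the realizations vary, which is exactly what makes the approach a candidate for representing arbitrary network behaviors to MANO.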

My view is that these models are the key to creating a useful representation of SDN for NFV.  If we assume that a "model" is anything for which we have at least one plugin available and which has some utility at the MANO level, then this approach allows us to define any arbitrary set of network behaviors as models, which unfetters SDN from its current limitation of being seen as just another way of creating Ethernet or IP networks.

The question is where to apply it.  NFV has an explicit requirement for inter-VNF connectivity, just as any cloud deployment architecture does.  If we think of SDN at the most basic NFV level we’d think of it as a way of connecting the VNFs, which would make SDN logically subordinate to the VIM, just as Nova and Neutron are subordinate to OpenStack.  I think many in the ISG (perhaps most) think this way, but in my view there are two problems with the notion.  One is that it doesn’t offer a solution to end-to-end networking and so can’t address the full benefit case operators are tagging as an NFV target.  The other is that applying it would tend to make NFV into nothing more than OpenStack, in which case the effort to date wouldn’t really move the ball much.

The alternative is to presume that there’s a handler, like a VIM, that handles network services.  A VIM could be a specific case of a general Infrastructure Manager (IM) that is responsible for harmonizing various APIs that control resources with a common interface or model that’s manipulated by MANO.  This approach has been suggested in the ISG already, though not fully finalized.  We could still invoke the “Network-as-a-Service” IM from inside a VIM for connectivity among VNFs, but we could also orchestrate the non-NFV service elements likely to surround NFV features in a real network.
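The generalization argued above (a VIM as one case of a generic Infrastructure Manager) is easy to sketch.  Class names and behaviors here are my own assumptions for illustration, not the ISG's specification.

```python
# Sketch of the generalization: a VIM is one specialization of a generic
# Infrastructure Manager (IM), and MANO manipulates every IM through the
# same interface.  All names are illustrative assumptions.

from abc import ABC, abstractmethod

class InfrastructureManager(ABC):
    @abstractmethod
    def deploy(self, model, params): ...

class VirtualInfrastructureManager(InfrastructureManager):
    """The VIM case: hosts virtual functions on a compute pool."""
    def deploy(self, model, params):
        return f"hosted {model} on compute pool ({params})"

class NetworkInfrastructureManager(InfrastructureManager):
    """A NaaS-style IM: provisions connectivity, VNF or not."""
    def deploy(self, model, params):
        return f"connected {model} across WAN ({params})"

def mano_deploy(service, managers):
    # MANO sees only (model, IM) pairs, so VNFs and non-NFV service
    # elements orchestrate through the same mechanism.
    return [managers[im].deploy(model, params) for model, im, params in service]

managers = {"vim": VirtualInfrastructureManager(),
            "nim": NetworkInfrastructureManager()}
service = [("firewall-vnf", "vim", {"vcpus": 2}),
           ("vpn", "nim", {"sites": 3})]
print(mano_deploy(service, managers))
```

Note that nothing in `mano_deploy` is NFV-specific: that is the point of the generalization, since it lets the same orchestration reach the non-NFV elements that surround NFV features in a real network.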

This defines a challenge for the ISG, one that has existed from the very first.  There is logically a need to automate service deployment and management overall.  That need has to be filled by something that can orchestrate any service elements into a cooperative relationship, not just VNFs.  If the ISG defines, in its approach to MANO, something that can be generalized to support this high-level super-MANO, then it defines a total solution to service agility and operations efficiency.  It also defines the IM and the model(s) that IM represents as the way that SDN and NFV relate.  If the ISG doesn’t take that bold step, then it cannot define an NFV/SDN role because it doesn’t cover all the places the two technologies have to complement each other.

All this implies that there may be two levels of MANO, one aimed at combining logical service elements and one aimed at coordinating the resources associated with deploying each of those elements.  The same technology could be used for both—the same modeling and object structure could define MANO at all levels—or you could define a different model “below” the boundary between logical service elements and service resource control.  I’m sure you realize that I’m an advocate of a single model, something that works for NFV but works so independently of infrastructure (through the model abstractions of IMs) that it could deploy and manage a service that contained no VNFs at all.
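The single-model idea above can be made concrete with a small recursive sketch: a service is a tree whose interior nodes are compositions and whose leaves bind to an Infrastructure Manager, so one orchestrator serves both levels.  The structure and handler names are assumptions for illustration.

```python
# Sketch of "one model at all levels": interior nodes compose logical
# service elements; leaves delegate to an Infrastructure Manager.  A
# single recursive orchestrator covers both MANO levels.  Illustrative
# names throughout.

def orchestrate(node, handlers):
    if "children" in node:
        # Logical service element: compose the orchestrated parts.
        return {node["name"]: [orchestrate(c, handlers) for c in node["children"]]}
    # Leaf: delegate to the responsible Infrastructure Manager.
    return handlers[node["im"]](node["name"])

handlers = {
    "vim": lambda n: f"{n}: deployed as VNF",
    "netconf": lambda n: f"{n}: provisioned on legacy gear",
}

service = {
    "name": "business-vpn",
    "children": [
        {"name": "vfirewall", "im": "vim"},
        {"name": "mpls-core", "im": "netconf"},  # no VNFs needed for this leaf
    ],
}
print(orchestrate(service, handlers))
```

Because the leaf handlers are opaque to the recursion, the same structure could deploy and manage a service that contained no VNFs at all, which is the infrastructure-independence the single-model argument demands.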

You probably see the dilemma here, and also the fact that this particular ISG dilemma is one that’s shared with other bodies, including the ONF.  There’s a tremendous tendency to use scope control as a means of assuring that the specific needs of a process can be met, but that can create a situation where you know how to do something limited, but can’t address enough of the problem set to develop a compelling benefit case.  No standard is helpful if it solves a problem but can’t develop a business case to justify its own deployment.  Sometimes you have to think bigger or think “irrelevant”.

The ISG contributed what might well be the seminal concept of NGN, which is MANO.  It also contributed the notion of a “Manager” that represents infrastructure behaviors at the service level and allows service architects to build services without pulling infrastructure details into their composition.  What it now has to do is to fully exploit its own contributions.  Unlike SDN work, NFV work is arguably already above the network where services are focused.  If NFV can grow down, via generalizing its handlers and exploiting its notion of models fully, then it could not only drive its own business case, it could drive SDN deployment too.

At the end of the day, there’s only one network.  Somebody has to orchestrate and manage it.

What the VMware/Arista Deal May Mean to SDN and Networking

The deal between Arista and VMware may turn out to be one of the pivotal developments in SDN, and one of the pivotal steps in the evolution of networking.  Just how far it will take us is, at this point, not clear because there's the usual mixture of issues, both tactical and strategic, to consider.  And, as my use of the word "may" in the first sentence shows, it's still possible this will be a flash in the pan.

Everything that’s happening in enterprise networking and a lot of what’s happening in service provider networking is linked to data center evolution.  A big part of that is the notion of multi-tenancy, but for the enterprise the most important driver is the continued use of virtual resources to leverage gains in physical server power and increased componentization of software.  The point is that for a decade now, everything important in enterprise networking has been driven from the data center, and that means the data center is a point of focus for vendor power struggles.

IBM has been the historical giant in data center evolution, but for the whole of the time that the data center has been getting more important, IBM has been losing strategic influence.  This can be attributed to an early withdrawal from networking (which took IBM out of the main event in connectivity), lagging virtualization and cloud positioning (IBM has struggled to be a "fast follower" there), and most recently a proposed withdrawal from x86 servers.  IBM's loss here put the critical data center space more up-for-grabs than would have been the case normally.

Cisco and VMware have been the two trying hardest to do the grabbing, with HP a close third.  My surveys of enterprises have shown that it's these three companies who are driving the bus in terms of both tactical and strategic evolution of the data center.  Of the three, obviously, only Cisco really takes things from a pure network perspective, and interestingly Cisco has been the one gaining strategic influence the most.  Cisco can be said to have established a physical ecosystem strategy for the data center, countering the logical ecosystem strategy espoused by VMware.  The conflict between these approaches is at the heart of the Cisco/VMware falling out.

The challenge for VMware, though, is that virtual/logical networking won’t move real packets.  You have to be able to actually connect stuff using copper and fiber, and even the early Nicira white papers always made it clear that there was a real switching network underneath the virtual software-based SDN they promoted.  VMware was leaving Cisco’s camel’s nose free to enter the tent, and I think that’s where Arista comes in.  Arista is both a physical network buffer against Cisco’s so-far success in the data center, and the representative of a position that Cisco doesn’t want to take—that networks are dependent on software even when they’re physical networks.

What all this means is that VMware and Arista will surely become the most significant challenge to Cisco’s continued gains in strategic influence.  If we see Cisco’s numbers fall short this week, it will likely be in part because Cisco has been unable to push a pure-hardware vision for the data center against even the limited VMware/Arista partnership we’ve had up to now.  Expect a full-court press from the pair in coming quarters.

The strategic question here relates to another of my blog points last week.  The best approach for SDN is likely to be a hybrid of physical and logical networking, an overlay network constructed on a more malleable model of physical networking.  The Street thinks that one of the goals of the expanded relationship between VMware and Arista is to create this explicit hybridization.  That’s bad for Cisco because it would validate the software vision of Arista and the hybrid model of SDN that has (IMHO) always been the greatest threat to incumbents.

What VMware/Arista could do is take advantage of the fact that building cloud or virtual data centers tends to build application networks.  In an enterprise, an application is kind of like a cloud tenant in that applications are deployed separately, often through their own ALM/DevOps processes.  Because at the core the networks are application-specific, the network has the potential to gain specific knowledge of application network policies without any additional steps.  You can figure out what an application’s traffic is by using DPI to pull it from a vast formless flow, but if you’ve already deployed that application using specific tools on what could easily be application-specific subnets, you already know all that you need to know.

The partnership between application deployment and virtual networking, and the extension of that partnership down into the physical layer, is what’s important here.  Because VMware has a handle on application deployment in a way Cisco does not, the alliance forces Cisco to think more aggressively about its ACI positioning.  It also means that we could see other vendors who recognize that logical/physical network hybrids are likely the focus of the biggest marketing contest in the industry take their own shot at the space.

All of this is happening as the ONF is trying to push for accelerated deployment of SDN, and they may get their wish in that sense.  However, there aren’t standards in place to create what the Arista/VMware hybrid can produce.  Accelerating “SDN” in the broad sense may well change the focus of SDN to higher-layer functionality and away from diddling with device forwarding one entry at a time.  That would be good for the industry if the change of focus can be accommodated quickly, but it would be bad if what happens is an ad hoc logical/physical dynamic created by competition.  That would almost certainly reduce the chances the next generation of network devices would be truly interoperable, at least in the systemic sense.

That’s the biggest point here.  What Arista/VMware may do is to create a whole new notion of what a network is, a notion that goes deeper into applications, software, and deployment than all previous notions.  That new notion could change the competitive landscape utterly, because it changes what everyone is competing for.

Thinking of SDN in Connection Network Terms

We are obviously a long way from exploiting the full potential of our revolutionary network technologies.  Years into SDN evolution we still can’t build a global network with it, for example.  Part of the reason is that we’ve not attacked the notion holistically.  I want to continue my exploration of “lessons learned” with respect to developing a model for universal management and orchestration for the cloud, SDN, and NFV.  Again, this isn’t about my ExperiaSphere activity in a direct sense, only about what’s come out of it, and what it means in that critical holistic sense.

At the high level, this is about what we could call connection networks.  A connection network supports the delivery of information between member endpoints based on cooperative functional behavior of the network components and in conformance to rules for handling that user and provider agree upon.  A connection network is essentially an abstraction, a black box that defines its properties by the relationship between its inputs and outputs.

IP and Ethernet create connection networks, based on adaptive cooperative behavior of the devices.  OpenFlow and SDN can obviously replicate the behavior of IP and Ethernet, and some believe they must do that to be useful.  My view is that while supporting current connection-network models is helpful in evolution from today to the future, it’s not essential even for that.  All you have to do is to interoperate with legacy networks in some way.  The really essential thing is that any model of SDN has to do something different, something better, or it will offer little incentive to remake infrastructure.

If you read about SDN, you’d conclude it’s a lot harder to point to things that aren’t SDN than to point to things that are, and not because of any complexity of properties.  Everything claims to be SDN these days.  In part, that’s a fair claim because of my connection-network-black-box analogy.  If we use that as a jumping-off point, what we’re saying when we say “SDN” is that software has the ability to generate a variety of connection models from network resources.  It’s the connection models that software defines; the mechanism for creating the models inside the box is the function of management and orchestration—not in the limited NFV sense but in the broadest sense.  Functionally, this opens three broad ranges of choices for implementing inside the black box.

Option one is to provide hooks and tweaks to existing network protocols that can refine or change forwarding behavior, creating software control over connection network services.  This is the Cisco approach, roughly, and it has the advantage of building the future based on a fairly straightforward evolution of present devices.  The challenge this approach faces is that native behaviors of the underlying network, addressing, etc. are still exposed.

Option two is to build an overlay structure on top of the classic three layers of networking (the other four OSI layers are end-to-end).  This is what Nicira and many of the SDN players do, and the advantage it has is that it uses current, evolving, or future network devices and paths as transport resources and builds connection networks above them.  That not only frees up new paths of device evolution, it preserves current devices.  The disadvantage is that the overlay network can only segment connectivity and manipulate forwarding policies within the range of what the transport resources are providing.  A best-efforts IP or Ethernet path isn’t made better by just adding a layer to it.

Option three is to build a new forwarding model for the network devices themselves.  You would then control (in some way) the per-device forwarding process to secure the connection network behavior you wanted.  The classic approach to this option is OpenFlow, which uses a central control process to build forwarding rules that add up to that cooperative behavior inside the black box.  The advantage of this is that you couple connection network behavior down to where traffic is handled, which means you can control connection behavior as much as would be possible anywhere.  The disadvantage is that centralized control of network behavior can radically increase the control traffic needed to manage the forwarding tables in devices.  And while central control elements can in theory respond to failures by creating new routes, there’s always a risk that a problem would cut a device off from the control interface, which could then mean a long and complicated process of finding a way for our device ET to phone home.
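To make the central-control option concrete, here is a toy sketch (all names invented, not any real controller’s API) of a control process that computes a path and pushes a per-device forwarding entry onto each hop, so that cooperative behavior emerges from individually installed rules:

```python
# Toy illustration of option three: a central control process computes
# forwarding entries and installs them on each device along a path.

import heapq

class CentralController:
    def __init__(self, links):
        # links: {(a, b): cost} describing bidirectional adjacencies
        self.graph = {}
        for (a, b), cost in links.items():
            self.graph.setdefault(a, []).append((b, cost))
            self.graph.setdefault(b, []).append((a, cost))
        self.flow_tables = {node: {} for node in self.graph}

    def install_path(self, src, dst):
        # Dijkstra shortest path; a real controller would also have to
        # handle failures, rerouting, and the control-channel risk
        # described above.
        dist, prev = {src: 0}, {}
        heap = [(0, src)]
        while heap:
            d, node = heapq.heappop(heap)
            if node == dst:
                break
            for nbr, cost in self.graph[node]:
                nd = d + cost
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr], prev[nbr] = nd, node
                    heapq.heappush(heap, (nd, nbr))
        # Walk the path back, installing a "match dst -> next hop"
        # entry on each device along the way.
        node = dst
        while node != src:
            self.flow_tables[prev[node]][dst] = node
            node = prev[node]

ctrl = CentralController({("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5})
ctrl.install_path("A", "C")
print(ctrl.flow_tables["A"])  # {'C': 'B'}
print(ctrl.flow_tables["B"])  # {'C': 'C'}
```

Note that device A never learns the whole path; it only knows its own next hop, which is exactly why losing the control channel is so disruptive in this model.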

My experiences with SDN and NFV suggest that the best strategy overall would be a combination of all three.  I think that overlay networks should be considered the “Level 3a” of the future, and that rules for connectivity should be created at this level.  Application and service awareness should be created and enforced in the overlay.  I also think that the overlay network should, through its own logical/virtual devices, be responsible for converting connection network requirements to transport-layer changes.  These could be applied for each domain of transport network (SDN/OpenFlow in a data center, legacy devices in a branch, optical tunnels between) in a way appropriate to the technology available.
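A minimal sketch of that hybrid, with entirely hypothetical names: a “Level 3a” overlay holds the connectivity rules and delegates transport changes to whatever control style each domain actually uses.

```python
# Hypothetical sketch of the hybrid model: an overlay owns connection
# rules and dispatches per-domain provisioning to domain controllers.

class TransportDomain:
    """One region of the physical network, with its own control style."""
    def __init__(self, name, technology):
        self.name = name
        self.technology = technology  # e.g. "openflow", "legacy", "optical"
        self.paths = []

    def provision(self, src, dst, grade):
        # A real domain would push OpenFlow rules, device CLI config,
        # or an optical cross-connect here; we just record the request.
        self.paths.append((src, dst, grade))
        return f"{self.technology}:{src}->{dst}@{grade}"

class OverlayNetwork:
    """The 'Level 3a' layer: holds connection rules, maps them to domains."""
    def __init__(self):
        self.domains = {}

    def add_domain(self, domain, endpoints):
        self.domains[domain.name] = (domain, set(endpoints))

    def connect(self, src, dst, grade="best-effort"):
        # Provision in every domain that hosts one of the endpoints,
        # in whatever way that domain's technology supports.
        results = []
        for domain, endpoints in self.domains.values():
            if src in endpoints or dst in endpoints:
                results.append(domain.provision(src, dst, grade))
        return results

overlay = OverlayNetwork()
overlay.add_domain(TransportDomain("dc1", "openflow"), {"app-server"})
overlay.add_domain(TransportDomain("branch", "legacy"), {"branch-user"})
print(overlay.connect("app-server", "branch-user", grade="gold"))
```

The point of the structure is that the overlay never needs to know whether a domain is OpenFlow, legacy, or optical; each domain translates the same abstract request into its own technology.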

The connection network notion also offers some benefits in defining a specific relationship between SDN and the cloud or NFV.  A connection network is a black box, remember.  In ExperiaSphere I called that abstraction a “service model”.  Any service model is a set of connection properties that can be created by anything that’s suitable, which means that the abstraction can be implemented using legacy technology, new OpenFlow devices, overlay networks, or anything else.  The consumer application, which might be orchestrating a service or application experience, relies on the properties, which are then assured by the management/orchestration practices of the implementation.

Connection networks also let us think about “network services” unfettered by assumptions about how they work based on projecting today’s service behaviors into future offerings.  You can forward packets based on the phases of the moon or the time of day or the location of a mobile user in a cellular network or the location of that user relative to the best content cache.  You can use grade of service, traffic type, application, or anything else as a metric.  Describe the properties of a useful service, consider it to be a service model representing a connection network, then use MANO principles to deploy what you want.
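One way to picture the service-model idea from the last two paragraphs, using invented names throughout: a model is just a set of properties, and deployment is a MANO-style search for any implementation able to honor them.

```python
# Illustrative sketch: a connection network is an abstraction defined
# by its properties, and anything that can deliver those properties
# may implement it.  All models and implementations here are invented.

SERVICE_MODELS = {
    # A service model is the set of properties the consumer relies on.
    "content-delivery": {"forwarding": "nearest-cache", "grade": "video"},
    "follow-the-user":  {"forwarding": "mobile-location", "grade": "gold"},
}

IMPLEMENTATIONS = [
    # Each candidate declares which forwarding behaviors it can realize.
    {"name": "legacy-ip",       "supports": {"destination"}},
    {"name": "openflow-fabric", "supports": {"destination", "nearest-cache",
                                             "mobile-location"}},
    {"name": "overlay-vswitch", "supports": {"destination", "nearest-cache"}},
]

def deploy(model_name):
    """MANO-style step: map an abstract model onto a suitable implementation."""
    model = SERVICE_MODELS[model_name]
    for impl in IMPLEMENTATIONS:
        if model["forwarding"] in impl["supports"]:
            return impl["name"]  # first implementation able to honor the model
    raise LookupError(f"no implementation satisfies {model_name}")

print(deploy("content-delivery"))  # openflow-fabric
print(deploy("follow-the-user"))   # openflow-fabric
```

Note that the consumer of `deploy` never learns how the model was realized; it relies only on the properties, which is the black-box contract the text describes.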

This approach makes migrating to new hardware a choice.  If highly useful services are highly inefficient when built as overlays onto legacy infrastructure in one or more places, you replace the infrastructure to improve efficiency.  But the service works through the process.  So when we think about how SDN should work, we should be thinking first of how we want connection network services to work.

What I Learned About SDN and NFV (that’s Not Pretty)

The Service Architect Lifecycle tutorial I just completed for ExperiaSphere taught me a lot about management and orchestration for SDN, NFV, and the cloud.  There’s nothing like having to explain how something would be used to focus your attention on what needs to be done!  I don’t want to dive into all the issues, which the tutorial itself will hopefully expose and which aren’t appropriate for my public blog in any case, but I do want to do something that’s outside the scope of the tutorial, which is talk about how my induced insights couple with industry trends.

Don Clarke, perhaps the spiritual father of NFV overall and now at CableLabs, said nearly a year ago that operators were going to need to understand an NFV strategy in the context of a complete service lifecycle in order to validate its benefits.  The first step in that lifecycle process is the Architect phase, the place where a specialist who understands the NFV implementation builds the elements from which services will be created, by harnessing the behaviors of resources and systems of resources.  Every operator knows this is essential, and yet we don’t really hear much about lifecycles and Architects in NFV announcements or see them illustrated in PoCs.  Architects do service and resource modeling up front, creating structures that can then support service automation when the service is ordered and as it’s being used.

We don’t hear about this because it’s complicated and most NFV proponents don’t want to address that complexity.  Building a complete picture of NFV is complicated because network services and infrastructure are both complicated.  But the fact that reality is complicated doesn’t justify oversimplification.  If NFV is going to deploy, if SDN and the cloud are going to succeed, we have to come up with an approach for building applications and services that is as agile as these revolutionary technologies allow.  We also have to support our new agile framework, and our evolution to it, at such a high level of operations automation as to make even the complex easy and cheap to do.  Which is why we can’t start off our processes by defining that complexity as “out of scope” or “provided at a higher level”.

I propose a revolutionary thought.  We are the “higher level” here.  Anyone who wants our industry to get better, to be as vibrant and valuable in the future as it was in its golden age, has to step up and try to solve complicated problems by facing them down.  There are those who will disagree with the way I’ve approached SDN, NFV and the Cloud in my open-source-biased ExperiaSphere architecture, but one thing I’m confident about—they will have enough information about the complex stuff that we must address to know what they’re disagreeing with.  It took me over sixty slides to describe the Architect lifecycle stage.  Most “NFV” product presentations use less than a third that number to describe everything they do.

ExperiaSphere is targeted at universal management and orchestration, which is a superset of NFV, but I don’t think that my scope is too broad—NFV’s scope is too narrow, and so is the scope of SDN and other “revolutionary” activities in our industry.  The original goal of reducing capex by exploiting software and COTS has for most operators given way to a new goal of improving service agility and operations efficiency.  But even if that evolution of goals hadn’t happened, NFV has to be a lot more than we think it is.  I believed that from the first, when the NFV Call for Action was published.  I’m certain of it now.

Service automation can only work if we have an abstract model of a service that is first used to marshal the necessary resources (deployment) and then sustain each resource and each level of cooperation through the life of the service.  If we do anything else, then we can’t interpret an event, a change in conditions, and respond in a way that restores normalcy because we don’t know what normal is and in what direction it lies.  A service is a finite-state machine not only at the high level, but at the level of each functional element.  It’s a clockwork-like interplay of interdependent pieces.  If you focus on a single way that a single function is implemented (firewall, for example) you can’t change the overall agility or economics, any more than you can make a clock by making one wheel in the mechanism.
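The finite-state-machine point can be sketched in a few lines. This is a toy state/event engine for one hypothetical service element (names and events invented), showing why an event is only interpretable against a known state:

```python
# Toy state/event engine for one service element, illustrating the idea
# that a service (and each functional piece of it) is a finite-state
# machine: an event only means something relative to a known state.

STATE_TABLE = {
    # (current_state, event) -> (next_state, action)
    ("ordered",   "deploy"): ("deploying", "allocate-resources"),
    ("deploying", "ready"):  ("active",    "start-billing"),
    ("active",    "fault"):  ("degraded",  "attempt-restore"),
    ("degraded",  "ready"):  ("active",    "clear-alarm"),
    ("active",    "cancel"): ("torn-down", "release-resources"),
}

class ServiceElement:
    def __init__(self, name):
        self.name = name
        self.state = "ordered"
        self.log = []

    def handle(self, event):
        # Without a state model, "ready" or "fault" would be
        # uninterpretable; with one, every event maps to a definite
        # next state and restorative action.
        key = (self.state, event)
        if key not in STATE_TABLE:
            self.log.append(f"ignored {event} in {self.state}")
            return
        self.state, action = STATE_TABLE[key]
        self.log.append(action)

fw = ServiceElement("virtual-firewall")
for ev in ["deploy", "ready", "fault", "ready"]:
    fw.handle(ev)
print(fw.state)  # active
print(fw.log)
```

The same "ready" event triggers two different actions here depending on state, which is exactly the "knowing what normal is and in what direction it lies" that the paragraph argues for.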

The NFV ISG took a critical, seminal, step into the future with the concept of MANO—management and orchestration.  They introduced the idea that you had to build NFV elements the way software architects build applications—by binding components using tools that in the software space would have been called “DevOps”.  The problem is that they didn’t go far enough.  They’ve limited their work to the enclaves of hosted functionality within a service, and agility and efficiency have to be service-wide to be relevant.  The work being done there is good, but it may be only part of a solution—an appeal to that mythical “higher layer”.  The OSS/BSS hooks in the reference architecture look like the same stuff that manages devices today.  How does that improve agility and efficiency?

The same thing can be said for the ONF and for OpenDaylight.  Technically both are doing good work, but we’re still muddling around in the basement of service creation and ceding all of the visible pieces of the service and service management to higher-layer applications, north of the famous “northbound APIs”.  We could, in theory, use apps to build new and unheard-of services based on explicit forwarding rules.  We could make “virtual EPC” a reality, transform our notions of security and access control, and open whole new retail opportunities.  All of this stuff is up north, where everyone fears to tread, and so we’re proposing to transform networking by using totally new forwarding technologies to replicate what we can already do.  And somehow, this new technology is going to be so operationally compelling that efficiency will justify deployment.  How?  Where do we address those efficiencies?  With a hook to OSS/BSS that looks just like what we have today.

Operations is the ultimate stumbling block for everyone.  The TMF had a number of very strong ideas about the evolution of operations as a component of service agility and management costs.  They had an initiative to create open operations processes based on componentized software principles—OSSJ it was called, for “OSS Java”.  They had an initiative to define federation among operators, called “IPsphere”.  They had a vision of steering events to service processes based on the service contract, “NGOSS Contract.”  All of these are still theoretically projects but none have moved the basic structure of the TMF—the SID data model and the eTOM operations map.  ExperiaSphere has shown me that rigid data models are an impediment and that operations processes don’t have a native flow or structure; they’re simply components in a service-wide state/event engine.

I think we have to look at all of our revolutions in a new way—why not, if we believe they really are revolutions?  One thing I learned is that it’s not about interfaces or APIs or data models, it’s about flows and bindings.  Information flows through object-modeled structures to build and sustain services.  A model of a service is a set of objects through which parameters flow downward to drive resource behavior and management information flows upward to automate changes in response to problems and to involve human operators where needed.  So the TMF’s work should be focused on binding objects into services with flows, which I told them.  The NFV ISG’s work should be focused on information flows through logical elements, a virtual management view, which I communicated to the ISG too.
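The flows-and-bindings idea can be sketched as a tree of service objects, with parameters flowing down and status flowing back up. The names, and the worst-of-children aggregation rule, are illustrative assumptions, not any standard’s model:

```python
# Sketch of "flows and bindings": a service is a tree of objects;
# parameters flow downward to drive resources, management status
# flows upward toward the service level.

SEVERITY = {"ok": 0, "degraded": 1, "failed": 2}

class ServiceObject:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.params = {}
        self.local_status = "ok"

    def push_down(self, params):
        # Parameter flow: each object absorbs the parameters and passes
        # the flow along its bindings to subordinate objects.
        self.params.update(params)
        for child in self.children:
            child.push_down(params)

    def status(self):
        # Management flow: status aggregates upward (worst-of rule), so
        # the top object always knows how far from "normal" it is.
        worst = self.local_status
        for child in self.children:
            child_status = child.status()
            if SEVERITY[child_status] > SEVERITY[worst]:
                worst = child_status
        return worst

service = ServiceObject("vpn", [
    ServiceObject("access-leg"),
    ServiceObject("core-transport"),
])
service.push_down({"grade": "gold"})
service.children[1].local_status = "degraded"
print(service.status())  # degraded
```

Nothing here is an interface or a data model in the TMF sense; the structure is entirely defined by the bindings between objects and the two directions of flow through them, which is the shift the paragraph argues for.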

OK, why am I the standard here?  Maybe I’ve got this wrong.  Maybe operators don’t want agile services or efficient operations.  Maybe somehow the high-level stuff everyone seems to be dodging is getting done.  Maybe vendors have a better approach to this than the one I’m advocating.  But I don’t think we’re going to revolutionize networking and transform its benefits by taking all of the limitations of the past and re-implementing them using new technology choices.  Why not take the missions of the future and optimize technology to support both those missions and the evolution to them?  That is what the TMF and the NFV ISG and the ONF and OPN and everyone else should be doing.  Complexity, if faced squarely, can be minimized by making the right choices in architecture.  I’m happy to put my concept out there for everyone to assess.  I’d like others to do the same.  Let’s stop doing SDN or NFV platitudes and start doing architectures.  Show me your sixty slides on Service Architect and we can talk on level ground.

How NFV Might Set the IT/Network Balance

The most fundamental question that we in networking face is not whether we’ll see a transformation of the network, but what form that transformation will take.  As I suggested yesterday in my blog, fundamental changes in what a network delivers to its customers will always require changes in how the network operates.  We’re seeing those fundamental changes in content delivery, mobility, social networking, and more.  We’re seeing direct initiatives to change networking—SDN and NFV.  How these will interact will shape our future as an industry and determine who lives and who doesn’t.

All of our network revolutions are, under the covers, based on substituting separate IT-hosted intelligence for embedded intelligence.  We’re looking at this shift to simplify service creation and operations, lowering costs and also improving services and revenues for operators.  For the enterprise, the changes could open new partnerships between information resources, mobile devices, and workers—partnerships that improve productivity.  There are a lot of benefits driving this change, so it’s important that we look at how that shift impacts networking, IT, and the players.

Cloud computing creates the most basic threat to traditional network devices, even if it’s an accidental impact.  Tenant control and isolation in the cloud has relied on software-switch technology from the first.  This vSwitch use may not displace current switches or routers, but it caps the growth of traditional switching and routing in cloud data centers.  My model suggests that fully a third of all new applications for switching/routing in the next three years will end up being implemented on virtual technologies, and that’s money right out of network hardware vendors’ pockets.

It gets worse, though.  The popular VMware “overlay network” acquired with Nicira creates what’s effectively a set of tenant/application overlay networks that ride on lower-level (so far, largely Ethernet) substrates.  The overlay model focuses visible network features “above” traditional network devices.  That dumbs down the feature requirements at the lower (hardware) level, which makes differentiation and margins harder to achieve.

SDN could take this further.  If you presume that application/tenant connection control is abstracted into virtual functionality, then it’s easy to pull some of the discovery and traffic management features from the devices—where they’re implemented in adaptive behaviors and protocols—and centralize them.  Thus additional feature migration out of the network drives even more commoditization, ending with (so many say) white-box switches that are little more than merchant silicon and a power supply.

It’s not just what’s happening, but where it’s happening, that matters.  All of this feature exodus is happening first in the data center, and that has profound short- and long-term implications.  The short-term issue is that data center network planning and evolution is driving enterprise networking overall, which means that as we dumb down the network we reduce the ability of network vendors to control their buyers’ network planning.  IT vendors are a kind of natural recipient of strategic influence as the tide of simplification of network devices pulls gold flakes of value out of “the network” and carries them into servers.

The long-term issue is that if the data center is the planning focus for buyers, then principles of network virtualization that are applied there would inevitably spread to the rest of the network.  I’ve noted many times that if you presumed application-based virtual networking in the data center, you could establish the same structure in branch offices and then link your virtual networks with simple featureless pipes.  In theory, transit IP networking could be displaced completely—you use tunnels or groomed optical connections depending on bandwidth and costs.  With this change, all the current security concepts other than digital signing of assets are at risk.

The impact of this on vendors is obvious.  Just the introduction of cloud computing is enough to create considerable downward pressure on prices for data center switching (Facebook’s approach is a proof point), which means that vendors must accept lower profit margins.  Every additional step in the migration of valuable features further erodes vendor pricing power, so even if you assume that a vendor (Cisco comes to mind as an example) would attempt to provide SDN-like capabilities without obsoleting current hardware, the current hardware prices would still decline.  There is no stopping this trend, period.

NFV now enters the picture, and it’s a bit more complicated in terms of impact.  At one level, NFV is an explicit step in this feature-migrating trend because it’s an architecture designed to host service components rather than embed them.  At another level, though, NFV could be the way that networking gets back into its own game.

I’m not talking about the kind of NFV you read about here.  I don’t personally think that much of what’s being “demonstrated” today for NFV has much relevance either to NFV conceptually or to the market overall.  The focus of most “NFV” is simply to host stuff, which isn’t much more than cloud computing does naturally.  Making this hosted alternative to appliances agile and operationally efficient is the key—and not just to NFV’s own business case.

The important thing about NFV, IMHO, is that it could provide a blueprint for creating application/service instances that mix network and hosted functionality and manage the whole service lifecycle automatically.  NFV is therefore a kind of mixture of IT principles and network principles, befitting the fact that the cloud is a mixture of the two technologies at the service level.  A vendor who can harness the union of IT and networking is harnessing the very path that value migration is flowing along.  They can expedite that migration (if they’re an IT player) or they can try to direct more of the feature value into traditional network lines.

You can see some of this process today with the whole issue of in-cloud versus in-device feature hosting.  If you’re an IT vendor you want everything to be hosted in servers.  If you’re a network vendor you’d like everything to be hosted in devices.  Neither extreme outcome is likely, and so you’re going to have to address a future where features run where they fit best, which means that network devices will gradually become “feature hosts” as well as bit-pushers.  Those who really control an NFV solution will be able to optimize the platform choices in favor of their own solutions.  Those who don’t will have to fit somehow into another vendor’s approach, which is hardly likely to leave them an optimum spot to grow.

The reason NFV is important is not because it might drive down opportunities for network appliances; competition and commoditization will do all the damage needed in that area, and provide most of the operator benefits.  It’s important because it’s an industry initiative that, as a byproduct of its primary mission, is forcing us to focus on that critical IT/networking union, and on the concept of composing services like we compose applications.  That’s important to NFV, to networking, to the cloud, and to vendors.

So who wins?  Right now, Alcatel-Lucent and HP are both poised to maybe-do-something-useful in the NFV space.  IBM seems well-positioned to act, and so does Oracle.  For the rest of the IT and networking vendors, it’s still not clear that anyone is ready to step up and solve the union-izing problem NFV presents.  Yeah, they’ll sell into an NFV opportunity someone else creates, but they don’t want to do the heavy lifting.  However, the barriers to doing NFV right aren’t all that high.  I’ve worked on two projects to implement an open NFV model, and neither of them required many resources to deliver.  Leadership in NFV, then, is still up for grabs, and whoever ends up grabbing it most effectively may not only become the new leader in networking, they may define how networking and IT coexist for decades to come.

Juniper Proves You Can’t Hedge a Revolution Bet

Sometimes it seems like companies miss the simplest truths.  A market, over time, has to either change or stay the same, right?  Why then do we seem to have a problem accepting that in the networking space?  We presume change and static markets, seemingly, at the same time.  Juniper’s recent quarter, and the steps the company has taken leading up to it, are an example of ignoring the obvious, I think, but Juniper is far from the only culprit.

Since 1996, Juniper has been the technology powerhouse of networking, the company who had unsurpassed engineering and a seemingly foolproof strategy for figuring out what the next big thing would be.  Example: They predicted the cloud years before we even thought about the concept.  This year, their approach has been to drive share price up by cutting costs and buying back stock, assuming that they can sustain market share and lower expenses, so earnings would be higher.  That’s a bet on a static requirement set from buyers, a lack of revolution, from a company who has previously excelled at revolutionary technology.

If we look at the evolution of the Internet into “the cloud”, of network connections into experience deliveries, we see a fundamental transformation in the way that matters most—from the demand side.  To think that these changes wouldn’t impact the way that networks are built and what’s sold from them is simply not sensible.

Juniper would argue they’ve been revolutionary here, through things like Contrail.  The problem is that Juniper and others have failed to face these obvious, inevitable, changes.  SDN and NFV aren’t revolutions themselves, they’re spin-offs of these great demand-side changes, but you have to recognize that truth to deal with them effectively, and that’s something Juniper’s new management doesn’t seem to be willing or able to do.

If the current strategy of boosting profits by sustaining revenue and cutting costs was working, you’d expect to see it in the numbers.  In the current quarter, Juniper was just in line in performance and light on guidance both in revenues and operating margins.  This suggests that Juniper can’t hold its revenue line while cutting costs, which would threaten the whole “new paradigm” of profit growth.  They have to step up sales, and quickly.

To do that, so the story goes, Juniper is going to focus on big enterprises at the expense of the broader commercial customer base and on emerging providers like the cloud providers, what they characterize as high-growth segments.  That sounds an awful lot like jumping from a space where you know you can’t prove market share growth to the Street, into one where your failure to grow market share isn’t yet proven.  If the same products are involved here, which they are, then why will focus changes from large proven markets to unproven ones of unknown size be a step forward?

In any case, the whole idea of being a powerhouse by shrinking, of exploiting market constancy when we’re clearly in a period of great change, is dumb.  A static market is a commoditizing market, and Juniper could never cut costs enough to compete with price leaders like Huawei.  Without dramatic new visions for services and network/cloud integration, even SDN and NFV will simply put more pressure on network equipment prices.  We needed a revolution—Juniper in particular.  They didn’t support one.

Juniper went astray when they brought in Kevin Johnson from Microsoft, IMHO.  First, they never seized the initiative that their cloud insights represented, and now rival Cisco is driving cloud data center technology with storage network additions.  How does this fit with Juniper’s targeting of cloud providers?  They never fully developed Junos Space, their management and service-layer platform.  They were a leader in a space that is now coalescing into NFV and SDN’s northbound applications and service management and operations efficiency.  Those are the key technology drivers for buyers today, whether they’re big enterprise or SMB, carrier or user, cloud or legacy.  We are trying to make networks part of applications and experiences and not just mechanisms that deliver access to these things.  Juniper had that in Space, and a couple of years ago they walked away from it.  Now they struggle to position things that would have been formidably positioned had they just thought their own insights through.

Now, the problem is that you can’t cut back and change customer targets and somehow get your strategic mojo back all at the same time.  Contrail as an SDN strategy needs Space-like air-cover.  Juniper’s cloud strategy needs something other than sales targeting, it needs productizing, and NFV is the current initiative that defines how cloud providers operationalize large-scale complex service.  Yeah, I know it’s supposed to be about service features, but how many Internet revolutions will it take for us to realize that “service”, “application”, and “feature” are all software components being hosted on something and connected through something else?

Alcatel-Lucent also had its problems this quarter, and perhaps for similar reasons.  The company has product families with low margins, contracts for professional services that aren’t profitable, and a fairly high level of operations costs.  What might make them different is that Alcatel-Lucent is really trying hard in the cloud, SDN, and NFV—trying to link their product strategies with market revenue opportunities and buyer needs.  CloudBand is still the most-recognized of the cloud and NFV solutions, and it wouldn’t take a whole lot to make it truly comprehensive.  That may be Alcatel-Lucent’s moment of truth; can they shed some of the parochialism that’s haunted them since the merger and just do something right even if it steps on the toes of other business units?

Cisco has yet to weigh in with numbers, but Street expectations are modest given the results of rivals Juniper and Alcatel-Lucent.  I think Cisco has some of the right answers—you need to be an IT player to win in networking these days, and the data center is the financial center of opportunity for everything.  They just don’t yet recognize that the cloud and the data center are really more about software, because without software differentiation all hardware is a commoditizing mess.

Mobility, the drive to the cloud, and the rise of hosted experiences versus connections as the thing users value about networks, have changed the world.  Vendors have to change their own business models at the high level to accommodate the fact that their customers are changing their basic notion of the value of networking.  That doesn’t mean “get smaller”, it means “get smarter.”