Can We Really “Distribute” NFV to Devices?

Nothing is ever as easy as it seems, particularly in networking.  We had some proof of that this week as RAD announced a bit more about its “Distributed NFV” concept.  On the one hand, “classical NFV” says that you want to remove functions from devices and host them in servers.  On the other, RAD says that’s not necessarily so; you want to put functions where they make the most sense, including in devices.

Other vendors have talked about hosting stuff on device blades.  Juniper took that position quite a while ago with its universal edge concept, but they tied the idea more to SDN than to NFV, so it wasn’t recognized as an argument for a more flexible attitude toward function hosting.  With RAD joining the chorus of “let the function kids play on devices too”, we have to ask whether “distribution” means distribution to any arbitrary hosting point.

RAD and Juniper would likely agree on their justification for edge-hosting in devices; you put features at the customer edge because that’s where the traffic is.  Firewalls, NAT, security, and other virtual functions make sense if they can be kept in the data path, and put close enough to the customer that you don’t have to add complexity to pull out individual customer flows to act on them.  I agree with this; you have to factor in the affinity of a function for the natural data path if you’re going to optimize its location.

What’s not clear, though, is whether you’re extending NFV to host on devices or just hosting on devices and citing NFV function distribution as your excuse.  You’re doing the former if you have a generalized architecture for NFV orchestration and management that can accommodate hosting on devices.  You’re doing the latter if you’re not linking the device hosting very explicitly to such a general NFV model.  Right now, I think most people are doing the latter.

I just spent some time reviewing the data modeling for the CloudNFV initiative, which defines services by assembling what the data model calls “Nodes”, which represent TMF-SID components like “Customer-Facing Service” or “Resource-Facing Service”, or the NFV notion of a “Package”.  Each of these defines one or more service models to collect interfaces, and each service model has a Service Model Handler that can deploy it when called on.  The standard Service Model Handler uses OpenStack and deploys via Nova, but you could develop one that deploys on devices too.  Thus, if you’re satisfied with the overall NFV capabilities of CloudNFV, you could create an NFV-based device-hosting model with it.
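To make the handler idea concrete, here is a minimal sketch of that dispatch pattern in Python.  The class and method names (ServiceModelHandler, NovaHandler, EdgeDeviceHandler) are hypothetical, not CloudNFV’s actual interfaces; the point is simply that a service model can be handed to either a server-oriented or a device-oriented handler without the rest of the orchestration logic caring which.

```python
from abc import ABC, abstractmethod


class ServiceModelHandler(ABC):
    """Hypothetical handler interface: knows how to deploy one kind of service model."""

    @abstractmethod
    def can_deploy(self, service_model: dict) -> bool:
        ...

    @abstractmethod
    def deploy(self, service_model: dict) -> str:
        ...


class NovaHandler(ServiceModelHandler):
    """Stands in for the standard handler that deploys via OpenStack Nova."""

    def can_deploy(self, service_model):
        return service_model.get("host_type") == "server"

    def deploy(self, service_model):
        # A real handler would call the Nova API here; this only records intent.
        return f"nova-boot:{service_model['name']}"


class EdgeDeviceHandler(ServiceModelHandler):
    """Illustrative handler that loads a function onto a customer-edge device."""

    def can_deploy(self, service_model):
        return service_model.get("host_type") == "device"

    def deploy(self, service_model):
        return f"device-load:{service_model['device_id']}:{service_model['name']}"


def deploy_service(models, handlers):
    """Walk the service's models and hand each one to the first willing handler."""
    return [next(h for h in handlers if h.can_deploy(m)).deploy(m) for m in models]


if __name__ == "__main__":
    service = [
        {"name": "vFirewall", "host_type": "server"},
        {"name": "NID-monitor", "host_type": "device", "device_id": "edge-17"},
    ]
    print(deploy_service(service, [NovaHandler(), EdgeDeviceHandler()]))
```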

The trick here is to be able to define services so that for each unit of functionality, you can define optimizing properties that determine the best place to put it.  I think operators would not only accept but applaud a strategy for function hosting that admitted devices into the picture as long as they could define how they believed the capex benefits of server hosting related to the potential complexity and operations-cost risks of that strategy.

Probably the big determinant of the value of distributing functions into network edge devices (or, in theory, any other device) is that question of benefit/risk balance.  It’s likely that capital costs would be lower for handling functions via servers, but it’s not clear what the operations costs would be.  Both SDN and NFV currently suffer from management under-articulation.  There is no clear and elegant model proposed for either to ensure that injecting dynamism into a network by hosting software to control its behavior doesn’t explode complexity and drive up operations costs.

If you under-articulate management, you tend to converge on the legacy model.  SDN and NFV either have their own management architecture or they inherit an existing one, which in practice means inheriting the current one.  That would mean managing “virtual devices” that correspond to the real devices we already manage.  It sounds logical, but let me offer an SDN example to show it’s not like that at all.

Suppose we have a hundred OpenFlow switches and an SDN controller, and above the controller we have our northbound application that imposes a service model we call “IP”.  We want to manage this, and so we have one hundred “router MIBs” to represent the network.  The problem is that the “routers” are really composites of local forwarding and centralized route control, so we can’t go to the routers for full knowledge of conditions or for management.  We have to create proxies that will “look like” routers but will in fact grab state information not only from the local devices but from central control elements.  How?

If we centralize these virtual MIBS they’re not addressed like real router MIBs would be.  If we distribute them to the routers so we can address the “SNMP Port” on the router, we then have to let every router dip into central management state to find out what its full virtual functional state is.  And you can probably see that it’s worse in NFV, with functions scattered everywhere on devices whose real management properties bear no resemblance to the virtual properties of the thing we’re hosting.
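To make the proxy idea concrete, here is a minimal sketch, with entirely invented state and interfaces, of a “virtual router MIB” stitched together from the two places the information actually lives: the local switch and the central controller.  A real proxy would do an SNMP walk on the device and a northbound query to the controller; neither source alone can answer a router-style management question.

```python
# A minimal sketch (invented state, invented interfaces) of the "proxy MIB"
# idea: the virtual router's management view is stitched together from the
# local switch agent and the central SDN controller, because neither holds
# the whole picture on its own.

LOCAL_SWITCH_STATE = {        # what the device itself can report
    "ifOperStatus": {"eth0": "up", "eth1": "up"},
    "flowTableEntries": 1842,
}

CONTROLLER_STATE = {          # what only the central controller knows
    "routesInstalled": 211,
    "lastPathRecompute": "2013-10-03T14:22:00Z",
}


def virtual_router_mib(switch_id):
    """Answer a router-style management query by merging both sources."""
    # In a real proxy these would be an SNMP walk of the device and a
    # northbound query to the controller; here they are canned dictionaries.
    return {
        "routerId": switch_id,
        "interfaces": LOCAL_SWITCH_STATE["ifOperStatus"],
        "forwardingEntries": LOCAL_SWITCH_STATE["flowTableEntries"],
        "routing": CONTROLLER_STATE,   # route state lives centrally, not on the box
    }


if __name__ == "__main__":
    print(virtual_router_mib("vrouter-042"))
```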

So here’s my take on the notion of function-hosting in devices.  You have to look at whether there is a cohesive deployment and management strategy that treats devices and servers as co-hosts.  If there is, you have truly distributed NFV.  If not, then you have a way of hosting functions in network devices, which doesn’t move the ball and could in fact be flying in the face of the operators’ goals with SDN and NFV.

Common Thread in the SDN/NFV Tapestry?

We’re in the throes of one show involving SDN and NFV, and in less than two weeks the SDN World Congress, the event where NFV was launched in the first place, will be underway.  It’s not surprising we’re getting a lot of SDN and NFV action, so let’s take a look at some of the main stories in more detail.

AT&T’s main spokesperson on things SDN and NFV, Margaret Chiosi, is calling for SDN APIs to support the abstractions at the service and network level.  This to me is the best news of the day, even the week, because it is a comment beyond the usual hype and drivel we see on SDN/NFV these days.  What takes it beyond is the “abstractions” reference.

You can’t have service without a service model.  SDN has to be defined at the top by the services it presents, and these services then have to be mooshed against the network topology to create specific forwarding requirements.  Absent any specific mechanism to describe either of these, the layers of SDN that I’ve called the “Cloudifier” (services) and the “Topologizer” (network) have no clear input and thus no clear mechanism for connection to software, networks, or much of anything.
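As a purely illustrative sketch (the structures and names are mine, not anyone’s standard), here is what explicit inputs to those two layers might look like, and what “mooshing” a service abstraction against a topology abstraction could produce: a set of pairwise forwarding requests.

```python
# A minimal sketch of the two inputs the "Cloudifier" and "Topologizer"
# layers would need.  The structures and names are illustrative only; the
# point is that each layer needs an explicit, machine-readable abstraction.

SERVICE_MODEL = {             # what the Cloudifier consumes: intent, not devices
    "name": "branch-vpn",
    "endpoints": ["site-a", "site-b", "dc-east"],
    "connection_model": "any-to-any",
    "sla": {"latency_ms": 40, "availability": 0.999},
}

TOPOLOGY = {                  # what the Topologizer consumes: nodes and links
    "nodes": ["site-a", "site-b", "dc-east", "core-1"],
    "links": [("site-a", "core-1"), ("site-b", "core-1"), ("dc-east", "core-1")],
}


def compile_forwarding_intent(service, topology):
    """'Moosh' the service abstraction against the topology into path requests."""
    missing = [e for e in service["endpoints"] if e not in topology["nodes"]]
    if missing:
        raise ValueError(f"endpoints not in topology: {missing}")
    if service["connection_model"] != "any-to-any":
        raise NotImplementedError("only any-to-any is sketched here")
    eps = service["endpoints"]
    return [{"from": a, "to": b, "sla": service["sla"]}
            for i, a in enumerate(eps) for b in eps[i + 1:]]


if __name__ == "__main__":
    for request in compile_forwarding_intent(SERVICE_MODEL, TOPOLOGY):
        print(request)
```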

We have specific examples of “Cloudifier” abstractions with OpenStack’s Neutron, but these abstractions are too server-specific and the process of enhancing them or adding to them isn’t flexible enough or tied enough to the general needs of telecom.  We have a number of ways of describing network topology, but the only way we have to exchange topology in a reliable way comes from the adaptive behavior of devices and their advertising of routes.  The purist center of SDN thinking is trying to eliminate both these things, so we surely need to know what replaces them.

Perhaps the lack of APIs, which would seem table stakes for software-defining any sort of network, is why Ciena is reported in LightReading as saying that we don’t take SDN seriously enough; “under-hyped” is the term reported.  I’m reluctant to say that anything in this industry is under-hyped or even admit that state is possible to achieve, but the point Ciena is making is that new optical capacity can’t be used to direct bandwidth where it’s needed and thus some of the capacity improvements technology could bring are wasted.

Nay, I say.  The problem isn’t getting bandwidth where it’s needed, it’s getting it to where somebody will pay for it.  The challenge of SDN is that technology that impacts delivery can only impact costs.  Technology that impacts experience can impact revenue, but the great masses of network service consumers won’t pay for “experiences” that are only improvements in capacity or network performance.  In some cases, like here in the US, there are regulatory impediments to doing anything to make the Internet recognize premium handling.

This takes us back to the issue of APIs.  If SDN could be made to offer service models that enhance higher-level experiences, it could actually link increased capacity to increased revenue.  Content delivery is an example; the FCC exempts content delivery networks from net neutrality, so we’d be free to build one that extends close to the access point and offer premium handling on it.  Can a content delivery network be built with current IP or Ethernet service models?  Clearly it can, but if SDN is really valuable then it would have to be true that we could build a better one by building in custom forwarding rules—a new service model.  That means APIs capable of supporting that, right Margaret?

At some levels, you could say that HP’s SDN ecosystem and Alcatel-Lucent’s NFV ecosystem, already announced, could address this, but that may be a stretch.  Developer ecosystems are built around APIs more than they build the API.  It is possible that the experiences gained by the two parties in their respective ecosystems would uncover the need for the kind of abstraction-tuned APIs that are essential to service models, but it’s far from certain.  It’s also possible that competing vendors like Cisco, Ericsson, Huawei, Juniper, and NSN—or even IBM, Microsoft, or Oracle—could come up with the right answer here, but that’s even less certain.  The dynamic of the network industry today is to fight like hell for the near-term dollar and let the future take care of itself.  I don’t see how SDN is going to create enough near-term dollars without the ecosystem in place, which won’t happen till those APIs are popularized.

Another point that’s being raised, this one in an InformationWeek article, is that conflicting views of SDN implementation are at least now being recognized and explored.  The article, about the conflict between physical and virtual visions of SDN, points out that some vendors (not surprisingly, network hardware vendors among them) view SDN control as applying to both physical and virtual devices, while others say that ties one of SDN’s legs to a tree before the race.

Here again we have some good points, but not necessarily good discussions.  I’ve said for months that SDN, to succeed, has to be a two-layer model, where the top layer is totally virtual and agile and is directly controlled by software, and the bottom layer is controlled by policy to provide services to the top.  This is actually the SDN model that gets the most deployment, but because we’re still not thinking in these terms, we’re not answering the critical questions.  One is how the two layers interact—the boundary functions exposed by the bottom as services—and the other is how the policies are enforced onto the real infrastructure.

Which I think gets back to service models.  We’re messing up a whole industry here because we can’t think in modern terms about the product of a network—the service model.  I’m hoping that the SDN World Congress will at least raise awareness of that issue so we can get it behind us.

Who Will Orchestrate the Orchestrators?

Most people would agree that NFV crosses over a lot of subtle technology boundaries.  It’s clearly something that could (and IMHO likely would in most cases) be hosted in the cloud.  It’s something that is certain to consume SDN if there’s any appreciable SDN deployment, and it’s something whose principles of orchestration and management are likely to slop over into the broad (meaning legacy device) network.

The orchestration and management implications of NFV are particularly important, in part because we don’t have a fully articulated strategy for orchestration of services today and we don’t really have a handle on management of virtual, as-a-service, or cloud-based resources.  There have been some announcements of element/network management systems that incorporate orchestration principles; Amartus, Overture Networks, and Transition Networks all have management platforms that are converging functionally on a holistic model for management and orchestration.  What’s not yet clear is how far all these players will go and how long it will take to get there.

NFV presumes virtual network functions (VNFs) deployed on servers (in the cloud, or in some other way) will be linked to create cooperative functional subsystems that can then be composed into services.  This process is a deploy-connect model that has to be carried out in a systematic way so that all the pieces that have to participate are there somewhere and are mutually addressable.  This is very much like cloud application deployment, so much so that a number of the current NFV implementations (CloudBand and CloudNFV) use cloud technology to support the actual deploy-connect process (orchestration decides what to deploy and connect, and where, so it’s still a higher-level function).
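Here is a minimal sketch of that deploy-connect pattern.  The CloudClient class is a hypothetical stand-in for whatever actually talks to Nova and Neutron (or any other virtual infrastructure manager); orchestration has already decided what to deploy and where, and the cloud layer just executes the plan.

```python
# A minimal sketch of the deploy-connect pattern.  CloudClient is a
# hypothetical stand-in for a real cloud API client; the orchestration plan
# has already decided what goes where.

class CloudClient:
    """Hypothetical cloud API wrapper that just records the calls it would make."""

    def __init__(self):
        self.log = []

    def boot_vm(self, image, host):
        vm_id = f"vm-{len(self.log)}"
        self.log.append(("boot", image, host, vm_id))
        return vm_id

    def connect(self, vm_a, vm_b, network):
        self.log.append(("connect", vm_a, vm_b, network))


def deploy_and_connect(plan, cloud):
    """Deploy each VNF per the orchestration plan, then wire up the links."""
    vms = {vnf["name"]: cloud.boot_vm(vnf["image"], vnf["host"])
           for vnf in plan["vnfs"]}
    for a, b, network in plan["links"]:
        cloud.connect(vms[a], vms[b], network)
    return vms


if __name__ == "__main__":
    plan = {
        "vnfs": [{"name": "vFW", "image": "fw-1.2", "host": "dc-east-03"},
                 {"name": "vNAT", "image": "nat-2.0", "host": "dc-east-03"}],
        "links": [("vFW", "vNAT", "svc-chain-net")],
    }
    cloud = CloudClient()
    print(deploy_and_connect(plan, cloud))
    print(cloud.log)
```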

The complication is that a realistic service consists of a bunch of elements that are probably not VNFs.  The access network, metro aggregation, optical transport, and other network elements are either not immediate NFV targets or not suitable for server support at all.  There are also likely to be legacy devices in networks that have to be used during a transition to NFV, even where such a transition will be complete eventually.  That means that the orchestration process has to handle not only virtual functions but legacy devices.

There are two basic models for implementing orchestration.  One is to orchestrate the same way you manage, meaning a series of element/network management systems that organize service resources within their own domains.  The other is a master orchestrator that takes a complete service description and a complete network/resource map, and then places and connects everything.
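A toy illustration of the difference, with invented domain names: in the first model each management system sees only its own slice of the service, while in the second a single orchestrator sees the whole service description and the whole resource map.

```python
# An illustrative contrast of the two orchestration models.  The domain and
# element names are invented; what differs is who sees the whole service.

SERVICE = [
    {"element": "access-EPL", "domain": "metro-ems"},          # legacy, EMS-managed
    {"element": "vFirewall", "domain": "nfv-orchestrator"},    # a hosted VNF
    {"element": "core-transport", "domain": "optical-nms"},    # legacy transport
]


def per_domain_orchestration(service):
    """Model 1: each EMS/NMS organizes only its own slice of the service."""
    by_domain = {}
    for piece in service:
        by_domain.setdefault(piece["domain"], []).append(piece["element"])
    # Each domain gets only its fragment; nobody holds the end-to-end picture.
    return by_domain


def master_orchestration(service, resource_map):
    """Model 2: one orchestrator sees the full description and the full map."""
    return [(piece["element"],
             f"placed using global view of {len(resource_map)} resource pools")
            for piece in service]


if __name__ == "__main__":
    print(per_domain_orchestration(SERVICE))
    print(master_orchestration(SERVICE, resource_map=["dc-east", "metro-1", "core-optical"]))
```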

Vendors have tended to support the first of these models, in no small part because NFV as a standard focuses on orchestration of VNFs, which leaves the orchestration of legacy components to existing management systems.  What’s not clear at this point is what the implications of this situation would be at the service management level.

The TMF and OSS/BSS vendors have generally worked to create service templates that describe how a service would be put together.  These templates model the service, but not necessarily at the level needed to create a complete recipe for optimized deployment.  Certainly it’s likely that these templates would need to be updated to hold the information necessary to deploy VNFs, unless it was possible to define each element of the model of a service as being targeted at a single specific NMS/EMS that could “sub-orchestrate” the deployment of that element.  Even there, the question is whether that sort of decision could be made atomically; would it not be true that the optimum placement of VNFs depends on the way the rest of the service, the legacy part, is supported?

Another issue is that orchestration is where resources are assigned to service mission fulfillment.  You can’t do management in a meaningful way without an understanding of the mission (meaning the service) and the state of the resources.  Since orchestration sets the relationship, it’s the logical place to set the management processes into place.  The optimum management vision for a network that has to be extensively orchestrated for any reason is one that recognizes both the need to create service-relevant management and resource-relevant management at the same time, in a multi-tenant way, and at full network scale.  If everyone is orchestrating their heart out at a low NMS/EMS level, how do you provide cohesive management?  That problem occurs even today in multi-vendor networks.  Imagine it in a network whose “devices” are a mixture of real and virtual, and where the MIBs that describe a real device reflect variables that don’t even exist on servers or in data centers!

My personal view has always been that the TMF models can handle this.  You can define service and resource models using the SID (GB922) specification, and the principles of GB942 contract-arbitrated management (where management flows through contract-defined orchestration commitments to find the real resources) seem to resolve many of the management problems.  But even here, the question is whether there’s a way to aggregate NMS/EMS-level orchestration to create a unified service model, without creating some higher-level orchestration process to do it.

SDN presents similar issues.  It’s likely for a number of reasons that SDN will deploy in a series of “domains”, encompassing a data center here or there and perhaps some metro functionality in between.  Maybe in some cases there will even be SDN in branch locations.  The management of SDN has to change because it’s not possible to look at a device and derive much information about the service model as a whole; that knowledge is centralized.  Yet the central knowledge of what was commanded to happen doesn’t equate to the state of the devices—if it always did, you’d not need to manage anything.

So what we’re seeing here is that the two networking revolutions of the current age—SDN and NFV—both demand a different model of management and orchestration.  What model is that?  We’re going to need to answer that question pretty quickly, and while current orchestration offerings by vendors may aim too much at the NMS/EMS level, they’re a useful start.

SDN: Growth or Just Changes?

The world of SDN continues to evolve, and as is usually the case many of the evolutions have real utility.  The challenge continues to be conceptualizing a flexible new network framework that exploits what SDN can do, and, at an even more basic level, providing the framework by which the different SDN models can be assessed.

One of the most potentially useful announcements came from HP, who say they want to build an SDN ecosystem by providing an SDN app store and a developer environment complete with toolkit and validation simulator.  This is built on top of HP’s SDN controller of course, but it’s arguably the first framework designed to promote a true SDN ecosystem.

I don’t have access to the SDK for this yet; the URL provided for the developer center is broken until November when the tools arrive.  As a result I can’t say what the inherent strengths and limitations of the framework are.  Obviously it’s disappointing that the program doesn’t have the pieces in place, but it’s not unreasonable.  I do think that HP should publish at least the API guide in the open, though.  People need time to assess the tools and their potential before they commit to an ecosystem, particularly one as potentially complex as SDN.

The challenge with any developer ecosystem is the level at which the developers are expected to function.  OpenFlow, as I’ve said before, is simply a protocol to manipulate switch forwarding tables.  To presume that developers would be building services by pushing per-switch commands directly is to presume anarchy, so HP has to be providing higher-level functionality that lets programmers manipulate routes or flows and not tables and switches.  Even there, a basic challenge of SDN is that applications that can manipulate switches, even indirectly, can create serious security and instability flaws.  Logically there have to be two levels of SDN: one that lets applications control basic connectivity on a “domain” basis, and another that lets infrastructure providers manage QoS, availability, and the like.  How that gets done is critical to any ecosystem, IMHO, and I’d sure like to see HP document their model here.
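Here is a rough sketch, with invented names, of what those two levels might look like as APIs: a domain-scoped connectivity call that application developers can safely use, and a separate policy interface reserved for the infrastructure owner.  HP’s actual model may look nothing like this; the point is the separation of privilege.

```python
# A rough sketch of "two levels of SDN" as two different APIs.  Names are
# invented; application code only ever sees a domain-scoped connectivity
# call, while QoS and admission policy sit behind a separate interface
# owned by the infrastructure provider.

class DomainConnectivityAPI:
    """What an application developer sees: connect things inside *my* domain."""

    def __init__(self, domain, allowed_endpoints):
        self.domain = domain
        self.allowed = set(allowed_endpoints)

    def connect(self, a, b):
        if not {a, b} <= self.allowed:
            raise PermissionError(f"{a},{b} not in domain {self.domain}")
        return f"connected {a}<->{b} in {self.domain}"


class InfrastructurePolicyAPI:
    """What the infrastructure owner sees: QoS, availability, admission rules."""

    def set_latency_bound(self, domain, latency_ms):
        return f"domain {domain} bounded to {latency_ms} ms"


if __name__ == "__main__":
    app_api = DomainConnectivityAPI("tenant-42", {"web-1", "db-1"})
    print(app_api.connect("web-1", "db-1"))
    print(InfrastructurePolicyAPI().set_latency_bound("tenant-42", latency_ms=20))
```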

Another SDN development is Version 1.4 of OpenFlow, which enhances the flexibility of OpenFlow considerably but also raises some questions.  The new version has features that are so different from those of the previous version that it will be essential that switches and controllers know whether they’re running the same thing.  That sort of change is always hard to make because “old” software rarely prepares for “new” functionality.  It’s also virtually certain that some of the features of the new OpenFlow will have to be exposed via changes in the controller APIs, which means that applications that run on top of controllers may also have to be changed.  This collides with the notion of building ecosystems, since nothing aggravates a developer like having the platform change underneath.
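The wire protocol itself shows why version awareness is unavoidable.  Every OpenFlow message starts with a one-byte version field (0x04 for 1.3, 0x05 for 1.4), and the initial HELLO exchange is where the two sides settle on something they both speak.  The sketch below simplifies the real negotiation (which since 1.3.1 can also use version bitmaps) down to the basic case.

```python
# A simplified sketch of OpenFlow version negotiation.  The header layout
# (version, type, length, xid) and the wire versions (0x04 = 1.3, 0x05 = 1.4)
# are from the spec; the negotiation itself is reduced to the basic
# lowest-common-version case, without the 1.3.1+ version-bitmap element.

import struct

OFP_HEADER = struct.Struct("!BBHI")   # version, type, length, xid (big-endian)
OFPT_HELLO = 0


def make_hello(version, xid=1):
    """Build a bare HELLO message advertising one wire version."""
    return OFP_HEADER.pack(version, OFPT_HELLO, OFP_HEADER.size, xid)


def negotiate(controller_version, switch_hello):
    """Pick a common version, or fail loudly instead of misbehaving later."""
    switch_version, msg_type, _, _ = OFP_HEADER.unpack(switch_hello)
    if msg_type != OFPT_HELLO:
        raise ValueError("expected HELLO as the first message")
    agreed = min(controller_version, switch_version)
    if agreed < 0x04:   # this sketch refuses anything older than 1.3
        raise RuntimeError("no usable common OpenFlow version")
    return agreed


if __name__ == "__main__":
    hello_from_14_switch = make_hello(0x05)              # a 1.4 switch says hello
    print(hex(negotiate(0x04, hello_from_14_switch)))    # 1.3 controller -> 0x4
```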

Still, it’s pretty obvious that SDN is growing up.  Not surprisingly, players like Cisco and rival Huawei are promoting more SDN-ready technology, perhaps even starting to build things that go beyond exploiting SDN in a limited way toward actually depending on it to fully access features and capabilities.  We’re also hearing about SDN layers a bit, but in what I think is an unfortunate context.  We hear about “data center”, “carrier”, or “transport” SDN, and I think that this division blurs some pretty significant boundaries and issues.

At the top of the network, where applications live, the notion of software-defining networking is fairly logical.  What you want to do is to allow for the creation of new service models (connectivity control based on something other than legacy L2/L3 principles; see my blog yesterday) and at the same time support the notion of multi-tenancy since applications are for users and there’s a load of users to support.  As you get deeper, though, you are now supporting not an application but a community.  It’s always been my view that something like OpenFlow, designed for specific forwarding control, gets more risky as you go down the stack.  Further, at some point, you’re really dealing with routes at the transport level, even TDM or optical paths that don’t expose packet headers and aren’t forwarded as packets but as a unit.  Here we have both a technical and a functional/strategic disconnect with classic OpenFlow.

The OSI model has layers, and I suspect that the SDN model will need them for the same reason, which is that you have to divide the mission of networking up into functional zones to accommodate the difference between network services as applications see them, and network services as seen by the various devices that actually move information.  We’re not there yet on what the layers might be, and arguably there’s a real value in “flattening” the OSI layers down to something more manageable in number and more logical in mission.  We aren’t going to harmonize these goals if we never have real discussions on the topic, though, and we’re not having them now.

We also need to understand how SDN and NFV relate, and how both SDN and NFV relate to the cloud.  If operators are going to host a bunch of centralized SDN functionality or a bunch of virtual functions, it seems to me that they’d elect to use proven cloud technology to do that.  How does proven cloud technology get applied, though?  SDN supports service models that cloud architectures like OpenStack’s Neutron don’t support, because SDN in theory supports any arbitrary connection model.  How do we use the cloud to distribute “centralized” SDN control so it’s reliable and can be exercised across a global network?  How does NFV work in supporting both SDN centralized technology and its own function mission, but in the cloud?  Can it also deploy cloud app components, and build services from both apps and network functions?  There are a lot of questions to consider here, and a lot of opportunity for those who can answer them correctly.

Finding the True Soul of SDN

Cisco’s announcements on Network Convergence System (NCS) follow their normal pattern that’s fairly characterized as “chicken-little” sales stimulation.  The network sky is falling because everyone and everything is going to be connected and demanding their bits be transported.  Suck it up, operators, and start spending!

Underneath this opportunism is a reality, though, which is that there will be changes to traffic and network facilitation of applications.  I think that Cisco, underneath it all, actually has a handle on what those changes might be and is preparing for them.  The problem is that it’s complicated and it’s coming over a longer cycle than they’d like, so they need to toss some blue pieces of china up into the air and generate some near-term sales momentum.

Networking has always been a layered process, a way of creating connectivity by dividing the overall mission among functional elements in an orderly way.  Data links create reliable box-to-box connection, and then a network layer takes over to offer addressing and open connectivity.  The Internet works that way today.  SDN, which is an element of Cisco’s NCS, offers us an opportunity to do things a little differently.

“Routing” in a functional sense (meaning at any level) means that we make a per-device forwarding decision set that adds up to creating a service model, which is really a connection model.  The presumption behind nearly all of our connection models today is that we have a community of users who expect to communicate.  They may not all talk with each other, but they might.  Therefore we build open models to address and connect.

When you move to Cisco’s “Internet-of-Everything” what changes is that you add a bunch of “things” that aren’t designed for open communication.  The traffic camera or process sensor needs to communicate with a control center, and in fact needs to be protected from other access.  If you add a bunch of these new things to the Internet, what you break is less the traffic than the model of open connection.  With SDN you could fix that.

SDN (at least in its purist OpenFlow form) says that connection is explicitly controlled by a “central” (meaning non-device, non-adaptive) process.  Traffic goes from hither to yon because you explicitly made that happen.  There is no presumption of open connectivity, only “Mother-may-I”.  If you have a traffic camera or a process sensor, you can drive a route between them and everything is fine.

The challenge for this is that it’s clearly not going to scale.  We can’t build routes for every flow on the Internet, not even in a small geography.  The challenge of SDN in a technical sense is addressing this natural trade-off between the explicit control of connectivity that would literally change everything online, and the challenge of scaling central control to manage all the connections.  Right now, for all the hype and crap to the contrary, we’re not meeting that challenge.

I’ve postulated that the solution to the problem of SDN is to break “the network” into two layers.  Up top we have an SDN layer that’s highly agile and based largely on software processes that are largely (but not necessarily entirely) concentrated at the network edge.  This layer runs on top of a transport infrastructure that is fully connective and reliable, offering the same relationship to the higher SDN layer as the data-link layer has with the network layer today.  The goal is to strip requirements for resiliency from the top SDN layer to reduce issues of scalability.  You’d then create “user services” by manipulating only higher-layer SDN paths.

In many cases, we could use this structure without being per-flow in our thinking.  Collect all the workers in a facility by job classification, so that all workers in a given class have the same application access needs.  Make an access subnet out of each class.  Now collect all the application components for each application you run in a data center subnet, per application.  To build a WAN, you now drive an SDN connection between each access subnet and the data center subnet for the applications it needs.

For an Internet of things, we can have an access subnet that contains the process sensors or other telemetry sources associated with a given application, and we can have our data center subnet for applications.  The same concept would then let us connect one to the other.  What we’ve done is to create a new service model using the natural agility of SDN.  If we have carefully placed “facility switches” in our SDN layer, there are only minimal places that have to be made flow-aware, and only minimal scaling of central control processes is needed.
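Here is a toy sketch of that pairing model, with invented subnet names: each access community declares what it needs, the SDN layer drives only those connections, and anything not explicitly paired simply has no path.

```python
# A toy sketch of the subnet-pairing service model.  Subnet and application
# names are invented; the output is the explicit connection list the SDN
# layer would drive, and anything not listed simply has no path.

ACCESS_SUBNETS = {
    "engineering": {"needs": ["cad", "email"]},
    "finance": {"needs": ["erp", "email"]},
    "plant-sensors": {"needs": ["process-control"]},   # the Internet-of-things case
}

APP_SUBNETS = {
    "cad": "dc-subnet-1",
    "erp": "dc-subnet-2",
    "email": "dc-subnet-3",
    "process-control": "dc-subnet-4",
}


def build_pairings(access, apps):
    """Emit one explicit access-to-application connection per declared need."""
    return [(name, apps[app])
            for name, profile in access.items()
            for app in profile["needs"]]


if __name__ == "__main__":
    for access_subnet, dc_subnet in build_pairings(ACCESS_SUBNETS, APP_SUBNETS):
        print("connect", access_subnet, "<->", dc_subnet)
```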

This is what we need to be doing with SDN, people.  We need to think of the ways that controlling forwarding explicitly lets us build service models that better suit present and evolving applications than the permissive connection models we have today.  This is what “Northbound APIs” on OpenFlow controllers should be doing, what Cisco should be (and, I think, sort-of-is) doing with NCS.  I also think other SDN players are seeing some of this same stuff.  We’re just not hearing about it, because as always vendors will expose what sells the most the fastest and not what the future really holds.  It’s up to the media, analysts, bloggers, and people who read the industry news to demand that we get more depth on the future so we can better judge the present.

Service models are in a very real sense the only thing that matters in SDN.  If we can’t generate new and useful service models with it, then the only value SDN might offer is slightly lower capital cost on equipment.  “Slightly lower” because I don’t think that we can replicate an IP or Ethernet network at current scale with both lower capex and opex.  We’re building SDN for a limited future if we don’t think beyond our current service limits.  Looking for something to beat your vendor up on?  Think service models.

Selling the Future for a Hot Dog?

In mid-October the SDN World Congress marks the anniversary of the first NFV white paper’s publication.  SDN is far older than that, and the cloud is older still.  I attended a cloud conference last week, and while it was interesting and even insightful in spots, it’s pretty clear that we’re still missing a big part of cloud reality, and even more of SDN and NFV.  Most of what we read on the latter two topics is PR hype.

The problem is one of “mainstreaming”.  If you look at the distribution of humans by height you find most people are clustered around a center value, creating a classic bell curve, but there are very short and very tall people.  In IT, most companies have fairly consistent patterns of deployment of technology and standard uses for it, but some companies have unusual applications and issues.  There’s the norm, and there are deviations from it.

Now, imagine someone selling chairs with seats three feet from the floor, or perhaps 18 inches.  There’s a ready market for this sort of thing among people who are unusually tall or short, and if we looked at the early adoption of our chairs we might be tempted to project hockey-stick growth based on the explosive early sales.  What’s missing is Mister Gauss: the notion of a mainstream market that represents the dominant value proposition and the dominant opportunity.

If all we want to do with the cloud is host applications that are rightfully targets of server consolidation, we’re creating success that doesn’t extend to the center of our distribution of opportunity.  If we add to the “cloud market” by adding in everything that’s remotely hosted, every third-party server relationship, we’ll demonstrate that markets grow if you define them in a bigger way, but we won’t address the way that center-of-the-curve gets empowered.

In the cloud, success comes from a simple truth:  If the cloud is revolutionary for its flexibility and agility, then it will succeed if we have applications that can exploit those properties.  We need a revolution in application design to create mainstream cloud winners, and we’re not talking about what that revolution might look like.

Look now at SDN.  We have, with the OpenFlow model, the ability to create forwarding rules based on centralized policies, so that any arbitrary connection model could be supported.  Suppose we said that all traffic had to transit at least three switches, or move through a state whose name started with the letter “A” or stayed in the same time zone?  Stupid rules to be sure, but illustrative of the fact that with forwarding control you have service model control.  Total agility in building routes.  So what do we do with it?  Nothing that we’re not able to do without it.  Our efforts are almost totally focused on recreating Ethernet and IP, which we already have.
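To make the point concrete, here is a toy sketch of “forwarding control is service-model control”: given candidate paths, even the silly rules above become simple predicates that the path-selection logic can apply.

```python
# A toy illustration of "forwarding control is service-model control": with
# per-path rules you can impose arbitrary (even silly) constraints when
# choosing a path.  Purely illustrative data and rules.

CANDIDATE_PATHS = [
    ["edge-1", "edge-9"],                       # direct, transits two switches
    ["edge-1", "core-2", "core-5", "edge-9"],   # longer, transits four switches
]


def transits_at_least(path, n=3):
    """The whimsical rule from the text: traffic must transit at least n switches."""
    return len(path) >= n


def stays_in_zone(path, zones):
    """Another arbitrary rule: every switch on the path sits in one time zone."""
    return len({zones[sw] for sw in path}) == 1


if __name__ == "__main__":
    zones = {"edge-1": "east", "core-2": "east", "core-5": "east", "edge-9": "east"}
    acceptable = [p for p in CANDIDATE_PATHS
                  if transits_at_least(p) and stays_in_zone(p, zones)]
    print(acceptable)   # only the four-switch path satisfies both rules
```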

Success in SDN rests on another simple truth: SDN will succeed if it can deploy and exploit service models that can’t readily be generated with traditional adaptive-behavior network architectures.  This is why I’m all hung up on those “northbound APIs”, but it’s also indicative of a problem with how we look at SDN today.  We’re not talking about service models, and they’re the only thing that matters.

Now it’s NFV’s turn.  NFV will succeed if it can frame a model of extemporaneous service generation, a model agile enough to give users what they’ll pay for with minimal explicit service creation and minimal human intervention for management.  The value of moving functionality from current appliances to hosting platforms is limited; it’s a good way to pluck some low apples to fund a real future.  It’s not enough to assure it.

That’s the common denominator here.  The cloud, SDN, and NFV are being framed as cost-savings strategies.  If they succeed in that form, they’ll make our industry smaller and duller.  But do we accept the notion that there’s nothing innovative left to do with connectivity, with features, with applications and productivity?  I don’t, and most of you reading this probably don’t either.  But we’re stuck in a model that not only accepts but embraces that view.  That’s why most CIOs now report to CFOs.  Instead of innovating, they’re sharpening financial pencils.

I read a book on the American Revolution recently, and any objective account makes it clear that it was hard.  I don’t think most revolutions are easy, and our cloud, SDN and NFV ones aren’t going to be an exception to that.  Whether we’re trapped in quarterly-profit obsessions by SOX or have simply surrendered our thinking caps, we’re not being innovative enough with our so-called “innovations”.  A great and revolutionary concept becomes an excuse to raise a few banners and maybe sell hot dogs at the mob scenes.

We can double total investment in software and hardware, in networking and IT, over the next five to seven years, based on past industry trends, if we can simply recapture our fire, our ability to think not about costing less but about doing more.  It’s growth that won’t last forever; that same history says that we’ll exhaust the potential of the new paradigm and the rate of growth in the industry will slip again—until we find the next revolution.  We can’t rest on our laurels in technology, because people don’t have USB interfaces in their bellybuttons.  We aren’t natural technology consumers, so technology has to be framed to enrich our lives, and that enrichment only sets the bar higher for the next wave.

I’m going to be at the SDN World Congress in mid-October to talk about the future—not the pedestrian one that’s unfolding today but the exciting one the industry just may still be capable of driving.  See you there.

Can We Realize AT&T’s Domain 2.0 Goals?

AT&T made some news by announcing its latest Supplier Domain Program, based on the goal of developing a transformative SDN/NFV architecture for the network.  The company is pledging to begin to buy under the program late this year and into 2014 but has not issued any capex updates.  That tells me that they expect the program to have only a modest impact on equipment selection in the next year or so, since savings are the goal of SDN/NFV and none are being projected.

I think that Domain 2.0 is more a shot across the bow of vendors than a serious commitment to either SDN or NFV.  The fact is that we don’t have any mature notion of how to build a large-scale SDN/NFV network and in particular how we’d manage one.  The big impediment is that network vendors simply don’t want to cooperate, and that’s what I think AT&T is trying to address.

What is interesting is that AT&T is biasing its justification for the new program toward better service agility and other revenue-side points.  I don’t believe for a minute that this shows a lack of interest in cost savings (despite the fact that capex plans aren’t being changed for 2014),  but I do think that AT&T has like other operators come to realize that the value of SDN and NFV has to come in large part from boosting the “R” in ROI and not just cutting the “I” part.  That also makes the initiative more palatable to vendors, who otherwise might see their own revenues vanish to a point.

And surprise, surprise, just as this is happening, Cisco is introducing its Network Convergence System, a kind of silicon-up approach to SDN and NFV that is aimed at transforming the network in the very way that AT&T says it wants.  NCS is a three-party marriage—custom chip, promised enhancements to control-plane handling and control-to-traffic integration, and improved management coupling down to the chip level.

This sort of vertical integration from the chip to the skies isn’t new; Juniper made the same claim when it announced its new chips a couple years ago.  The problem was that they diluted the benefit claims in their fuzzy and rudderless QFabric launch, the first application of the new chip.  However, I think the lesson—nay, lessons—of both announcements are the same.  First, you have to think about the network of the future very differently, and second that difference is going to be hard to square with our conception of how networks are built.

Let’s start at the top.  Agility in service creation isn’t helpful if the services you’re creating have features invisible to the user.  But service features that are visible have to be either built above the current network connection services, or they have to transform the connection model itself.  NFV is the logical way to do the first, and SDN could be made to do the second.  But NFV so far isn’t claiming to be about creating new features, but rather about hosting old ones that were once part of appliances.  SDN, so far, is either about sticking an OpenFlow pipe up in the air and hoping a service model lands on top of it, or about using software to make an OpenFlow network do Ethernet switching or IP routing.  How fast we stick both feet back into the network quicksand of the past isn’t a good measure of the value of agility.

I’ve tried, in the CloudNFV initiative for which I’m the Chief Architect, to make a clear point that the virtual functions of NFV have to be both network features and cloud application components.  They can’t be things that have to be laboriously written to a new set of standards, nor can they be simply stuff that reluctant vendors have been forced (via things like AT&T’s Domain 2.0) to unbundle into software form.  If we want software to be the king of the network’s future value, then we have to take software architecture for networking seriously.  That means drawing value from current tools, providing a platform for creating new stuff, and supporting the major trends in the market.

One of which is that mobile point-of-activity empowerment I keep talking about.  In agile services, we can build something that the market needs a bit faster than before.  I’m recommending extemporaneous services, services that self-compose around the requests of users.  We are not going to capitalize on the new value of networking by continuing to consider a “service” to be a long-planned evolution of features designed to capitalize on something.  How do we know the “something” is even valuable, and how would a user come to depend on that “something” if nobody offered it?  We have to be able to define software using the same agile principles that we expect to use in designing networks.

That’s the service-model side.  SDN and NFV have to be coupled, in that NFV has to be able to define services it expects SDN to create.  Those services shouldn’t, can’t, be limited to simply IP and Ethernet, because if that’s enough then we don’t have any real new value for SDN to address.  How should connectivity work in an “extemporaneous” service?  How do we secure and manage something that’s here one minute and gone, or different, the next?  That’s the question that AT&T’s Domain 2.0 implies we must answer, and the question that Cisco’s NCS is implied to answer.  But implication isn’t the same as proof, and we have yet to see a single extemporaneous application of the network from Cisco, or an architecture that’s convincingly framed to support such a thing.  Can Cisco do it?  I think so, but I want them to step up and be explicit on the “how” part.  So, I’m sure, does AT&T.

And that’s the hope here.  If AT&T did drive Cisco to claim more agility is at the core of NCS, then can any network vendor ignore the same issues?  I don’t think network vendors will willingly move into the future unless buyers threaten to push them out of the present, which may be what AT&T is doing.  Other operators, I think, will follow.  So will other vendors.

Can Blackberry be the Next “Private Phoenix?”

Well, the story now is that Blackberry is following Dell’s example and going private.  Certainly there are simple financial reasons for that, but I also have to wonder whether some of the same “SOX-think” I talked about with Dell might be operating here too.  If there’s a turnaround in play for Blackberry it’s going to be a long process, one that’s not going to benefit from having executives pilloried on an earnings call every quarter along the way.  Then there’s the issue of corporate raiders who’d try to grab the company (and, of course, still might).

The question is whether there’s a strategy for Blackberry that’s as clean as that for Dell, and I think that the answer is “Possibly”.  You have to make some assumptions, but here’s an angle that might be followed.

Start with the big assumption, which is that their version 10 OS is a good one.  A number of experts tell me that it is in fact elegant and in some ways superior to both iOS and Android, so I think we can at least say that Blackberry has something it can start with.

The next assumption is that Blackberry has to be thinking along the same lines that a Microsoft executive noted recently, which is that the company has to create a kind of cloud symbiosis.  In Microsoft’s case the goal would be to make a set of cloud services an augmenting point for both the Windows desktop/laptop platform of old and the maybe-emerging appliances.  Blackberry has always had a mail/message service that was credibly a cloud service so this isn’t too much of a stretch either.

Given these two assumptions, I’d suggest that what Blackberry has to do is to morph its current OS into an expanded model in two parallel dimensions.  First, it needs to frame up a microkernel of the current OS, a minimal element that can run on anything and call on “BBcloud” services for functionality.  In effect, it’s Firefox OS.  But second and unlike Firefox OS, Blackberry has to make it possible to pull more than microkernel functionality into devices that have the capability, either permanently to create something like a traditional phone or tablet or as needed.  That means that the appliance becomes an extension of the cloud and not just a portal onto it.  Software features migrate between device and cloud.

The third point is that BBcloud services then have to be expanded considerably.  What Blackberry needs is the ability to make BBcloud into an ecosystem, something that lets developers build mobile and cloud applications not based on “compatible” code but on the same code.  Everything, from apps that are currently nailed to the appliance to stuff that’s not yet written, is designed to be run in BBcloud and hosted wherever convenient.  I’d suggest that Blackberry even provide a mechanism (via a “shell”) to run this stuff on a standard PC.

You probably see where I’m heading with this.  Blackberry has a framework that looks like an extensible mobile environment, one that lets applications and resources spread out from the device into either local user compute resources or into the cloud.  It’s bold enough and powerful enough conceptually to give the company a shot.

The risk, you’ll say, is that whatever Blackberry can do here the other players like Microsoft, Apple, and Google can do as well, and likely better.  Yes, but they have two disadvantages versus Blackberry, two related ones in fact.  The first is that this sort of ecosystem would likely be seen by all the competition as enormously risky to their current business model.  The second is that this risk would be smashed in their face like a pie every quarter, because the competitors are still public companies.

Whether Blackberry (or Dell) sees things this way, it seems pretty clear that “going private” might actually be the way that we get innovation back into our industry.  We have VCs who want to fund social burgers that can be put up for next to nothing, run against a small chance of a buyout, and if they win they pay big.  It’s like the lottery.  We don’t have the kind of financing that builds fundamental tech revolutions out there in VC-land because they see the process as being too lengthy and risky.  But if private equity steps in to take a company like Blackberry private, we could have what’s essentially a giant startup with branding, funding, installed base, and all kinds of other neat stuff.

You also have to wonder whether this sort of thing, promising as it may sound, could actually work.  Many revolutionary approaches have proven to be ineffective, after all.  Australia, which took the bold step of creating a kind of national broadband project with its NBN initiative, is now facing major scaling back, cost issues, and a complete rout of the NBN board.  It may be that Telstra (the national carrier that frankly should have been doing this all along) will end up effectively “being” NBN or at least running it.

What the NBN thing proves is that the big disadvantage of revolutionary business mechanisms like the go-private-and-scuttle-SOX approach is that nobody has done it at scale and nobody knows whether it will work financially, much less whether the now-private company can pull something together and make it work.  The private cloak can hide revolutionary development from shareholders wanting instant gratification (like corporate raiders and hedge funds) but it’s not going to make that revolutionary development happen.  Both Dell and Blackberry will have to get their internal R&D processes tuned to a whole new world, where strategy matters a lot, after having lived for decades in a world where it didn’t matter at all.  But isn’t strategy how we got to where we are today?

Can Dell Turn “Private” into “Leader?”

No less investment maven than Carl Icahn believes that Michael Dell’s successful campaign to stave off his takeover and take Dell private was “just one recent example of a ridiculously dysfunctional system.  Lacking strategic foresight, the Dell board for years presided over the loss of tens of billions of dollars in market value at the hands of CEO Michael Dell. Instead of deposing him, the Dell board froze out shareholders and voted to allow the CEO to buy the company at a bargain price using shareholders’ own cash.”

I agree that the stock system is “ridiculously dysfunctional” but I think Icahn may be more of a contributor to dysfunction than an impartial arbiter of functional-ness.  What Icahn didn’t do, arguably the bubble and SOX did, and it’s just possible that Michael Dell is on the track to salvation of the system and not its destruction.

The biggest problem with innovation in tech today, IMHO, is the fact that since the excesses of the NASDAQ bubble in the late ‘90s and the Sarbanes-Oxley Act (SOX) designed to curb them, companies have been shortening their sights to the point where executives can’t see beyond the next quarterly numbers.  That’s equivalent to wanting to trek like Lewis and Clark without being able to see beyond your feet.  Nobody wants to bet on the future because they can’t be rewarded for doing that.  You deliver your numbers, which are at least a small improvement over the old, by cost reduction if you can’t manage revenue gains.  And if you’re into network equipment or server sales you likely can’t manage revenue gains because all your buyers are staring at their feet too.

So do we fix SOX?  Heck, we can’t even fund a government or pass a debt ceiling extension, or approve nominees for key positions, or much of anything else.  Imagine trying to do something substantive in Congress these days.  So what’s the answer?  Well, it just might be to take your marbles and go home.  Leave the public company rat race and go private, where you can build for the future without having your hands slapped for not making the numbers in the next quarter.

I’m not Michael Dell’s confidant, so I can’t say if this is what he has in mind, but the opportunity to leverage the “special benefit” of being “private” rather than “public” will be there whether that was Dell’s intention or not.  Cloaked in the righteous anonymity of private-ness, Dell could quietly foment the revolution of networking and IT, emerge when they’re done like an erupting volcano, and totally transform the industry and confound competitors of all types.  Pull it off and Dell would be in a position to be what Chambers wants Cisco to be, the “number one IT company”.

Of course, it’s the pulling it off part that’s the trick here.  To revolutionize the industry like this, Dell would have to know what revolution they wanted to back and how to back it.  While I’m sure that some in Dell understand the benefit that being a private company could bring them, I’m less sure they know what they want to do next.

I think Dell and I agree on something, which is that “the cloud is the future”, but that’s too vague a notion to build a revolution on (anyway, everyone’s been saying that so the slogan has been used up).  The thing that’s the revolutionary part is the creation of an agile experience framework in the cloud, something that vendors and standards groups have sort-of-begun by looking at how things like IMS/EPC and CDN could be cloud-hosted.  Cloud-hosting isn’t the whole answer, though.  You have to add in a lot of flexibility and agility, the notion of not only highly dynamic resource-as-a-service frameworks but also a more dynamic way of building software to support experiences (something I blogged about just last week).

The most complicated parts of our revolution may not be defining slogans or pieces of doctrine, but simply finding a place where it can start.  A good revolution has to balance the realities of the present with the value of the future it promises; it has to build on the current system so that there aren’t widespread problems associated with adoption.  But it can’t get bogged down by supporting the present.  I contend that our SDN, cloud, and NFV revolutions are too tied to current paradigms.  They’re all cost-based.

You don’t take up the barricades to save 8% or 12% on your phone bill.  The promise of the future that a revolution must convey—that Dell must embrace—has to talk about the wonderful things the future will bring.  Bringing slightly cheaper forms of the present won’t cut it.  All of our technology revolutions are just that, revolutions on the technology side that will impact how you do something and perhaps how much it would cost to do it, but they won’t deliver things different than we already have.  Those who are promising otherwise are just making empty promises, lacking convincing detail because they have no detail to offer.

I don’t have it either.  I believe the future is a fusion of our current revolutions with mobility to create a kind of gratification framework that grows around everyone and links them to the stresses and opportunities of the world they face.  I think a good tech company could build that framework, if they were unfettered by SOX.  You, Dell, are about to be cut free.  Can you take up the challenge?

Why Not “Software-Defined Software?”

We have software-defined networks, software-defined data centers, software-defined servers.  What’s missing?  I contend it’s the most obvious thing of all: software-defined software.  We’ve built a notion of virtualization and agility at the resource level, but we’re forgetting that all of the stuff we’re proposing is being purchased to run software.  If you have the same tired old junk in the software inventory, how are you going to utilize all that agility and flexibility?

I said yesterday, referencing Oracle’s quarter, that the key to the future for both Oracle and the others in tech was supporting point-of-activity empowerment, which simply means using agile technology to give people what they want, when and where they want it.  In a simple infrastructure sense, PofAE is a marriage of cloud and mobility.  In a deeper sense, it’s also a renewal of our agility vows, but starting at the software level where all experiences start.

If you look at software and database technology today, you will find all kinds of claims of agile behavior.  We have “loose coupling”, “dynamic integration”, “activity-directed processes” and so forth.  It might seem like software has been in the lead all along, but in fact software practices still tie developers and users to some rigid frameworks, and this rigidity is a barrier to full realization of the benefits of elastic resources.

One simple example of a barrier is the relational database.  We think of data as being a table, where we have key information like account number and we have supporting data like address or balance.  The problem with this in an agile world is that the structure of the data creates what might be called a “preferred use” framework.  You can efficiently find something by account number if that’s how the table is ordered, but if you want to find everything where balance is below or above or between specified points you have to spin through everything.

There’s been talk for ages about semantic relationships for the web or for data, but despite all this debate we’re not making much progress in moving out of the tabular prison.  We should be thinking of data in terms of virtualization.  If we can have virtual servers and virtual networks, why not virtual RDBMS?  Why can’t an “abstraction” of a relational table be instantiated on what is essentially unstructured information?  Such a thing would allow us to form data around practices and needs, which is after all the goal of all this cloud and software-defined hype.
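As a toy sketch of what a “virtual table” might mean, here is a tabular abstraction instantiated over loosely structured records that can be queried by any attribute, not just the key a physical layout happens to favor.  The names and the trivial implementation are mine; the point is separating the tabular view from the storage.

```python
# A toy sketch of a "virtual relational table": a tabular view instantiated
# over loosely structured records, queryable by any attribute rather than
# only by the key a physical layout favors.  Names and data are invented.

RAW_RECORDS = [
    {"account": "A-100", "balance": 250.0, "city": "Austin"},
    {"account": "A-101", "balance": 40.0},                       # ragged record
    {"account": "A-102", "balance": 9800.0, "city": "Albany"},
]


class VirtualTable:
    """Expose a table-like view without committing to one physical ordering."""

    def __init__(self, records, columns):
        self.columns = columns
        self.rows = [{c: r.get(c) for c in columns} for r in records]

    def select(self, predicate):
        """Query by any condition, not just the 'preferred' key."""
        return [row for row in self.rows if predicate(row)]


if __name__ == "__main__":
    table = VirtualTable(RAW_RECORDS, columns=["account", "balance", "city"])
    # Query by balance range, even though "account" would be the physical key.
    print(table.select(lambda row: (row["balance"] or 0) > 100))
```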

Another example is in the area of processes.  If I develop an application for a single server, I can run it on that single server and lose efficiency because the application uses the server badly.  I can also consolidate it into a virtual data center where it shares server capacity with other apps, and it will be less inefficient.  Perhaps if I run it in the cloud where there’s even more economy of scale, I can be a little more efficient still.  But at the end of the day, I’ll still be running the same software and supporting the same experience that I started with.  In fact, realistically, I’ll be trading off a little performance to get that higher efficiency, so I’m actually doing less for the experience than before.

A popular way of addressing this is to componentize software, to break it up into functional units.  Something that was once a monolithic program can now be broken into a dozen components that can be reassembled to create the thing we started with, but also assembled in different ways with other components to create other things.  The most architected extreme of this is found in the Service Oriented Architecture (SOA), which lets us build components and “orchestrate” them into applications through the use of a business process language (BPEL) and a workflow engine (like the Enterprise Service Bus, or ESB).  There’s a language (WSDL) to define a service and its inputs and outputs, to make browsing through service options easier and to prevent lining up a workflow whose “gozoutas” don’t match the “gozintas” of the next guy in line.  SOA came along almost 20 years ago, and it’s widely used but not universal.  Why?

The reason is the Web.  SOA is a complex architecture filled with complex interfaces that provide a lot of information about dynamic processes but that are difficult to use, particularly where you’re dealing with best-efforts network or IT resources.  The Internet showed us that we need something simpler, and provided it with what we call “REST”, a simple notion of a URL that when sent a message will do something and return a response.  It’s elegant, efficient, simple, and incredibly easy to misuse because it’s hard to find out what a given RESTful service does or needs or returns.
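Here is a minimal REST sketch using only Python’s standard library: a URL that, when sent a request, does something and returns a response.  Note what is missing, because that is the heart of the complaint: nothing in the interface itself tells a developer what the service expects or returns.

```python
# A minimal REST sketch: a URL that does something and returns a response.
# Nothing here advertises what the service expects or returns; that contract
# lives in documentation (or in someone's head), which is the discoverability
# gap described in the text.  The endpoint and payload are invented.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class QuoteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/quote":
            body = json.dumps({"symbol": "XYZ", "price": 42.5}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


if __name__ == "__main__":
    # Run this file, then GET http://localhost:8080/quote from any client.
    HTTPServer(("localhost", 8080), QuoteHandler).serve_forever()
```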

We clearly need to have something in between, some way of creating applications that are not just dynamic or malleable, as REST and SOA are, but are extemporaneous in that they don’t presume any process structure or workflow any more than a truly agile data environment would presume a data structure.  With something like this, we can handle “big data”, “little data”, clouds and NFV and SDN and all of the other stuff we’re talking about at the software level where IT meets the user.  If we don’t do this, we’re shadowboxing with the future.

You don’t hear much about this sort of thing out in the market, probably because despite the fact that we think software and IT is the bastion of innovation, we’ve let ourselves become ossified by presuming “basic principles” that guided us in the past will lead us indefinitely into the future.  If that were true we’d still be punching cards and doing batch processing.  The virtual, software-defined, world needs something more flexible at the software level, or it will fail at the most basic point.  What software cannot, for reasons of poor design, define will never be utilized at the server or network level, even if we build architectures to provide it.