Coming Soon: An Open Architecture for Orchestration and Management

There have been a number of commitments by network operators to new technologies like the cloud, SDN, and NFV.  Last week, Metaswitch earned its carrier stripes with a win in Europe, one of the first (of many, I’m sure) non-traditional IMS deployments.  Their stuff has been used in at least one NFV PoC. Verizon and AT&T are both committed to the cloud and operators are deploying SDN too.  But I’m sure you agree that all these deployments are islands—no operator has committed to a complete infrastructure refresh based on next-gen technology.

The benefits operators hope for largely center on “service agility” and “operations efficiency”, and yet the “island” nature of these early trials makes it impossible to realize those goals because there just hasn’t been enough change in infrastructure to drive agility up or opex down overall.  Truth be told, we didn’t need these revolutions to meet the agility/opex goals; we needed a revolution in management in general, and in particular in that wonderful new thing called “orchestration”.

Many of you have followed my discussions on management and orchestration models, and even engaged in a number of lively dialogs on LinkedIn on one or more of the blogs.  Some have asked whether I’ll be presenting a complete view of my management/orchestration model, something that starts at the top where services start and ends with resources.  People want something that works with the cloud, SDN, NFV and legacy network and IT technology, that does federation among technologies and operators, and that’s compatible with an open-source implementation.

Well, the answer is that I’m going to be publishing a complete orchestration model later this summer.   I’ll be releasing a complete vision based on the two key principles I’ve blogged about—Structured Intelligence and Derived Operations, and it’s based in large part on my ExperiaSphere open-source project, though it expands on the scope considerably.  The presentation will be made available as a YouTube video on my channel and as a PDF on SlideShare.  The material will be free, links can be freely distributed for non-commercial attributed purposes, and all the concepts I’ll illustrate are contributed into the public domain for all to use with no royalties or fees.  I’ll be using the ExperiaSphere website to communicate on this material as it’s released, so check there for news.

I want to stress that I’m not starting a project here; I can’t contribute that kind of time.  What I’m doing is making a complete picture of a suitable orchestration-layer architecture available, in fact making it public-domain.  If standards groups want to use it, great.  If somebody wants to launch an open-source project for it, likewise great.  Vendors can implement it or pieces of it if they like, and if they actually conform to the architecture I’ll give them a logo they can use to brand their implementation with.  None of this will cost anything, other than private webinars or consulting that a company elects to do on the model.

That’s a key point.  Some people already want to do a webinar or get a briefing, and as I said I can’t donate that kind of time any longer.  I will make a complete video and slide tutorial (likely two, an hour each) available when I can get it done.  Meantime I want to get the idea exposed where it counts, with the network operators and some key standards bodies.  Therefore I’m going to start by offering service providers who are members of either the TMF or the NFV ISG the opportunity to attend a single mass webinar at no charge.  This will be scheduled in May 2014 at a specific date and time to be announced.  I’m asking that service providers who are interested in signing on contact me by sending an email to experiasphere@cimicorp.com. Say that you’re interested in the “SI/DO Model” and please note your membership in the TMF or NFV ISG, your name, company, and position.  I promise not to use this for any purpose other than to contact you for scheduling.  Slots are limited, so I can’t accept more than five people per operator for now, and even that may have to be cut back.  You’ll have to promise not to let others outside your company sit in.

At some point in June I’ll be offering for-fee private webinars to network equipment vendors (and to service providers who want a private presentation), billed as a consulting service and prepaid unless you’re already a CIMI client.  These sessions will be exclusive to the company that engages them but you’ll still have to provide email addresses of the attendees, and you may not invite people outside your company.  If you like you can host these private webinars on a company webinar site rather than mine, but if you record the sessions you must not allow access outside your company without my permission and under no circumstances can the material be used in whole or in part as a part of any commercial or public presentation or activity.  If you’re interested in participating in one of these webinars, contact me at the same email address and I’ll work out a time that’s acceptable to both parties.

At this same point, I’m offering the TMF and NFV ISG and also the ONF the opportunity to host a mass webinar for their members, at no cost.  This will be a shorter high-level introduction designed to open discussion on the application of my proposed model to the bodies’ specific problem sets.

I’m expecting the open public video tutorials and slide decks to be available in August, and these will include feedback I’ve gotten from the operators and standards groups.  Anyone who wants to link to the material can do so as long as they 1) don’t imply any endorsement of, or conformance to, the model unless I’ve checked their architecture and agreed to it and 2) don’t use it in any event or venue where a fee is paid for viewing or attendance.  I want this to be an open approach, and so as I’ve said, I’m releasing the architecture into the public domain.  I’m releasing the material with only these simple restrictions.

Contact me at the email above if you’re interested, and be sure to let me know whether you’re an operator, a member of the standards/specification groups I’ve noted, or a vendor or consulting firm.  I reserve the right to not admit somebody to a given phase of the presentations/webinars if I’m not sure of where you fit, and if you’re not committed to an open orchestration model don’t bother contacting me because everything in this architecture is going public!

Finding Money in the Clouds

Well, it’s Friday and a good time to piece together some news items for the week, things that by themselves may not be a revolution but that could be combined to signal something important.  An opening point is Oracle’s Industry Connect event, which proves that Oracle does in fact see the communications industry as a vertical of growing importance.  Why?  We also saw Amazon and Google chest-butting on the price of basic IaaS services, bringing what seems likely to be a cut of a third or more in prices.  Why?  Dell suddenly gets into the data center switch and SDN business on a much more aggressive scale.  Why?

Let’s look at “cloud” as being the core of all of this.  Credit Suisse thinks that what we’re seeing with Amazon and Google price cuts is an indication of improved economy of scale, but I think they’re reading the wrong page of their economics text.  What they’re seeing is a lesson in optimized pricing.

The efficiency of a data center rises with scale because there’s more chance that you can hold utilization levels higher.  Pieces of this and that can be fit into a machine somewhere to make more of it billable.  You reach the point of adequate efficiency levels pretty quickly, so there’s really little change in “economy of scale” being represented here by either Amazon or Google.  However, to fill a slot in a data center you need both the slot and the “filling”.  Both Amazon and Google are going to cut their own revenues by cutting prices so they clearly expect to make up that loss in volume.  It’s not that we are running at better scale, folks, it’s that we can’t fill the slots as much as we want.  Otherwise lowering the price is dumb.  Amazon and Google think that there are cloud-ready applications that could be cost-justified at lower prices but not at the old levels.  They think that they’ll grab a lot of these other apps at the new price points.  They also think that they’ll make up for lower IaaS pricing with increased emphasis on platform services features that will augment basic IaaS.
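
To put some illustrative numbers on that volume argument (these are made-up figures, not anything Amazon or Google has disclosed), here’s a quick back-of-the-envelope sketch:

```python
# Back-of-the-envelope check on the "make it up in volume" argument.
# All numbers are illustrative assumptions, not actual provider figures.

old_price = 0.12   # assumed $/instance-hour before the cut
price_cut = 1 / 3  # the "cut of a third or more" noted above
new_price = old_price * (1 - price_cut)

# To hold revenue flat, billed volume must grow by the inverse of the cut.
required_volume_growth = old_price / new_price - 1
print(f"New price: ${new_price:.3f}/hr")
print(f"Volume must grow {required_volume_growth:.0%} just to keep revenue flat")
# -> 50%: a one-third price cut needs half again as much billed volume,
#    which is why the providers must be betting on newly cost-justified apps.
```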

The cloud is getting real, not by making IaaS the kind of revolution that low-insight articles on the topic suggest it is, but by gradually tuning applications to be more cloud-friendly.  As that happens, the features of the cloud that actually facilitate the shift (the platform services) become the key elements.  IaaS is cheap milk at a convenience store, a loss leader to get you to buy a Twinkie to have with it.

This means that “the cloud” in some form is going to create a new IT reservoir, not one that replaces the current data center.  Not all of the cloud will be incremental of course—SMBs in particular will be likely to shift spending rather than use the cloud to augment productivity.  Still, we are certainly going to see more server and software and data center growth driven by providers of cloud-based services than by traditional IT practices and budgets.  One place where that’s very likely to come true in spades is the communications industry.  Carriers have traditionally low internal rates of return so they can tolerate a price war on IaaS better than current competitors, and carriers also have the possibility of using network functions virtualization to offload functionality from appliances, reducing capital costs.  If they get what they really want from NFV, which is significant improvements in service agility and operations efficiency, they could realize even more gains and justify more deployment.

The data center is where all the real money is in both networking and IT, so it follows that what everyone is chasing is the new data center, which is the easiest to penetrate.  Where is the biggest opportunity for that new data center?  The cloud at a high level, but precisely where in the cloud?  The carrier cloud is the answer, because there aren’t enough Googles and Amazons to drive a huge business by themselves.  Operators will build the majority of incremental cloud data centers, and it could be a lot of data centers being built.

Our model, based on surveys of operators, says that in the US market alone a “basic” level of deployment of carrier cloud to host both services and NFV would require approximately 8,000 incremental data centers, and if the optimum level of NFV effectiveness could be realized the number could climb to over 30,000 data centers.  The midpoint model says that we could add over 100,000 servers from all of this.  Yes, most data centers would be small and have only a few servers—they’d be located at key central offices—but there would still be a lot of new money to be gained.  That’s why the communications vertical is important, why Amazon and Google need to grab more of the TAM now, and why Dell needs to be in the game in earnest.
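
Just as arithmetic on the figures above (this is illustrative division, not additional output from the survey model):

```python
# Rough arithmetic on the modeled US carrier-cloud build-out cited above.
# The data-center counts come from the text; the per-site math is just
# illustrative division, not extra survey output.

basic_dcs = 8_000        # "basic" level of carrier-cloud deployment
optimum_dcs = 30_000     # if optimum NFV effectiveness is realized
midpoint_dcs = (basic_dcs + optimum_dcs) // 2
incremental_servers = 100_000   # midpoint-model server count from the text

print(f"Midpoint data centers: {midpoint_dcs}")
print(f"Average servers per incremental data center: "
      f"{incremental_servers / midpoint_dcs:.1f}")
# -> roughly 5 servers per site, consistent with the point that most of
#    these would be small edge data centers at key central offices.
```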

HP is the poster child for cloud-to-come.  They have servers, software, cloud, network equipment, and all the good stuff.  They have an NFV story that is functionally as good or better than anything anyone else has productized at this point.  Oracle has software that could be combined with COTS to create a similarly good story, but they have to work harder on the software side and on what “platform services” are in order to win.  Even IBM has enormous assets in the software space that could be leveraged into a powerful cloud and NFV position and could address that server/data-center bonanza.  Dell has been a growing influence in servers, but it’s not been sparring one-on-one with HP in software, networking, or the cloud.  I think Dell’s announcement is a step in that direction, an attempt to catch the big boys before it’s too late.

Is Dell doing enough, though?  Are Amazon and Google doing what they need to do?  Is Oracle, even?  The fact is that NFV may be the most important thing in the cloud because it’s a source of demand that a network operator is in a unique position to own.  If operators build optimum NFV they’d have justified a boatload of data centers and an enormous resource pool they could then harness.  They might even have an operations framework uniquely capable of doing efficient provisioning and management of complex services.  Amazon and Google are not (despite Google’s fiber games) going to try to become network operators, so they can’t go to the NFV space.  Oracle, IBM, and even Dell need to have a real NFV strategy and not just a hope of selling infrastructure if they want to counter HP, and even Alcatel-Lucent and Cisco, both of whom have respectable NFV approaches they could leverage.  None of the NFV stories so far are great, and Dell’s announcement didn’t really move the functional ball for Dell in either SDN or NFV—they just announced hosting.  Does a hundred thousand servers, and all the associated platform software and network equipment, sound good to anyone out there?  If so, it’s time to get off your duff.

Service Automation and “Structured Intelligence”

Everyone knows that operators and enterprises want more service automation.  Some classes of business users say that fixing mistakes accounts for half their total network and IT TCO, in fact.  Nobody doubts that you need something to do the automating, meaning that software tools are going to have to take control of lifecycle management.  A more subtle question is how these tools know what to do.

Current automation practice, largely focused on software deployment but also used in network automation, is script-based.  Scripting processes duplicate what a human operator would do, and in fact in some cases are recorded as “macros” when the operator actually does the work.  The problem with this approach is that it isn’t very flexible; it’s hard to go in and adapt scripts to new conditions or even to reflect a lot of variable situations, such as would arise with widespread use of resource pools and virtualization.

In the carrier world, there’s been some recognition that the “right” approach to service automation is to make it model-based, but even here we have a variation in approaches.  Some like the idea of using a modeling language to describe a network topology, for example, and then having software decode that model and deploy it.  While this may appear attractive on the surface, even this approach has led to problems because of difficulties in knowing what to model.  For example, if you want to describe an application deployment based on a set of software components that exchange information, do you model the exchanges or model the network that the components expect to run on?  If the former, you may not have the information you need to deploy; if the latter, you may not recognize dependencies and flows for which SLAs have to be provided.
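
To make the choice concrete, here are two hypothetical sketches of the same two-component application, one modeling the exchanges and one modeling the expected network.  The structures are purely illustrative, not any particular modeling language:

```python
# Two hypothetical ways to model the same two-component application,
# illustrating the "what do you model?" problem described above.

# 1. Model the information exchanges: captures flows and SLAs, but says
#    nothing about the network the components expect to be deployed on.
exchange_model = {
    "components": ["order_frontend", "order_db"],
    "flows": [
        {"from": "order_frontend", "to": "order_db",
         "protocol": "sql", "latency_sla_ms": 20},
    ],
}

# 2. Model the expected network: deployable, but the flow/SLA dependencies
#    above are now implicit and easy to lose.
network_model = {
    "subnets": [{"name": "app_subnet", "cidr": "10.1.0.0/24"}],
    "placements": {"order_frontend": "app_subnet", "order_db": "app_subnet"},
}
```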

Another issue in model-based approaches is the fact that there’s data associated with IT and network elements, parameters and so forth.  For service providers, there are the ever-important operations processes, OSS/BSS.  You need databases, you need processes, and you need automated deployment that works for everything.  How?  I’ve noted in previous blogs that I believe the TM Forum, years ago, hit on the critical insight in this space with what they called the “NGOSS Contract”, which says that the processes associated with service lifecycle management are linked to events through a data model, the contract that created the service.  For those who are TMF members, you can find this in GB942.

The problem is that GB942 hasn’t been implemented much, if at all, and one reason might be that hardly anyone can understand TMF documents.  It’s also not directly applicable to all the issues of service automation, so what I want to do here is to generalize GB942 into a conceptual model that could then be used to visualize automating lifecycle processes.

The essence of GB942 is that a contract defines a service in a commercial sense, so it wouldn’t be an enormous leap of faith to say that it could define the service in the structural sense.  If the resources needed to implement a service were recorded in the contract, along with their relationships, the result would be something that could indeed steer events to the proper processes.  What would have been created by this could be seen as a kind of dumbed-down version of Artificial Intelligence, which I propose to call structured intelligence.  We’re not making something that can learn like a human, but rather something that represents the result of human insight.  In my SI concept, a data model or structure defines the event-to-process correlations explicitly, and it’s this explicitness that links orchestration, management, and modeling.
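
Here’s a minimal sketch of that contract-steers-events idea.  The service, element, and event names are all hypothetical; the point is that the data model, not the code, decides which process handles which event:

```python
# Minimal sketch of GB942-style event steering: the service contract
# records, per service element, which process should handle which event.
# Names and events are hypothetical illustrations of the concept.

contract = {
    "service_id": "vpn-1001",
    "elements": {
        "access_link": {
            "event_map": {
                "LINK_DOWN": "reroute_access",
                "SLA_VIOLATION": "escalate_to_ops",
            }
        },
        "vpn_core": {
            "event_map": {"CONGESTION": "scale_core_capacity"},
        },
    },
}

def steer(contract, element, event, processes):
    """Look up the process bound to this event in the contract and run it."""
    handler_name = contract["elements"][element]["event_map"].get(event)
    if handler_name is None:
        return None  # no binding: event falls through to default handling
    return processes[handler_name](contract, element)

# "processes" maps names to management functions; stubs for illustration.
processes = {
    "reroute_access": lambda c, e: f"rerouting {e} for {c['service_id']}",
    "escalate_to_ops": lambda c, e: f"ticket opened for {e}",
    "scale_core_capacity": lambda c, e: f"adding capacity to {e}",
}
print(steer(contract, "access_link", "LINK_DOWN", processes))
```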

Structured intelligence is based on the domain notion I blogged about earlier; a collection of elements that cooperate to do something creates a domain, something that has established interfaces and properties and can be viewed from the outside in those terms alone.  SI says that you build up services, applications, and experiences by creating hierarchies of these domains, represented as “objects”.  That creates a model, and when you decide to create what you’ve modeled you orchestrate the process by decomposing the hierarchy you’ve created.  When I did my first service-layer open-source project (ExperiaSphere) five or six years ago, I called these objects “Experiams”.

At the bottom of the structure are the objects that represent actual resources, either as atomic elements (switches, routers, whatever) or as control APIs through which you can commit systems of elements, like EMS interfaces (ExperiaSphere called these “ControlTalker” Experiams).  From these building-blocks you can structure larger cooperative collections until you’ve defined a complete service or experience.  ExperiaSphere showed me that it was relatively easy to build SI based on static models created using software; I built Experiams using Java and called the Java application that created a service/experience a “Service Factory”.  If you filled in the template the Factory created when it was instantiated and sent the completed template back to it, the Factory built the service/experience and filled in all the XML parameters needed to manage the lifecycle of the thing it had built.
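
The real Experiams were Java, so treat the following as an illustrative Python analogue of the hierarchy-and-decomposition idea rather than anything from the actual project:

```python
# Illustrative analogue of the hierarchical "object" model described above.
# (The real ExperiaSphere Experiams were Java; this is only a sketch of the
#  decomposition idea, not the project's code.)

class Element:
    """A domain/object in the service hierarchy."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def deploy(self, params):
        """Decompose the hierarchy top-down; leaves commit real resources."""
        if not self.children:                 # a ControlTalker-like leaf
            return [f"commit {self.name} with {params.get(self.name, {})}"]
        actions = []
        for child in self.children:
            actions.extend(child.deploy(params))
        return actions

# A "service factory" stamps out instances of a pre-designed structure.
firewall = Element("hosted_firewall")
access = Element("access_path")
service = Element("business_internet", [access, firewall])
print(service.deploy({"hosted_firewall": {"rules": "default-deny"}}))
```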

Static models like this aren’t all that bad, according to operators.  Most commercially offered services are in fact “designed” and “assembled” to be stamped out to order when customers want them.  However, the ExperiaSphere model of SI is software-driven and less flexible than a data-driven model would be.  In either case there’s a common truth, though, and that is that the data/process relationship is explicitly created by orchestration and that relationship then steers events for lifecycle management.

I think that management, orchestration, DevOps, and even workflow systems are likely to move to the SI model over time, because that model makes it easy to represent process/event/data relationships by defining them explicitly and hierarchically.  Every cooperative system (every branch point in the hierarchy) can define its own interfaces and properties to those above, deriving them from what’s below.  There are a lot of ways of doing this and we don’t have enough experience to judge which are best overall, but I think that some implementation of this approach is where we need to go, and thus likely where a competitive market will take us.

Domain Models, Interworking, and Network Evolution

We’re obviously not going to get to the network of the future by having some pretty genie wiggle her nose to create an instant transformation.  No matter how radical things like the cloud, SDN, and NFV might be in the end-game, they’re going to get to that point one small step at a time.  Most enterprises and network operators believe that legacy technology will still dominate networks in 2018, in fact.  Which means we need to be thinking about how you introduce new technologies in a way that works both financially and technically.

Nearly everyone who looks at this problem, including everyone who has actually faced it, believes that the answer is domains.  We’re going to have little enclaves of new stuff that will appear in the midst of the old-model technology, and these will increase in number and size over time.  It’s likely this model will in fact prevail, but it generates some questions that have to be addressed quickly or our evolution may end up stalled in a drying puddle somewhere.

The primary reason for a domain model of network evolution is the fact that a given network technology is likely to develop its benefits unevenly and present uneven cost gradients.  That means that there will be places where cost is modest, benefits are high, and so ROI is strong.  That’s where something new can be done.  Nearly everyone believes that SDN will start in the data center, for example, because the cost/benefit relationship is better there.

When you have a domain model, you introduce the natural issue of interworking, which is different from “federation” because it is aimed at making “in-network” cooperation work between domains rather than in codifying their separation.  Looking at the interworking angle you quickly see a logical substructure of the domain model emerging, with some interesting alternatives to be addressed.

If you view a network as a connected set of domains you get a picture that’s not unlike the model of IP networks, where we have subnets (within which one set of connection rules applies), AS domains, and networks.  Each of the domain types has a boundary condition and a mechanism for communication within itself.  As a baseline, it would be logical to assume that services based on SDN would be applied in domains that map to IP network domains today, and that is in fact happening with data center SDN and with Google’s core-SDN model.

This kind of domain model is good, evolutionarily speaking, because the domains are selected in part to preserve the interconnection rules of the legacy network you’re evolving away from.  We have things like default gateways, OSPF, and BGP already, so what we end up with is a set of “new” domains that look like old, traditional ones at the edges.  This means SDN (or NFV, or other) domains might be considered abstractions, black boxes.  We virtualize our networks going forward by creating new models for what’s inside existing black boxes.

This approach works pretty well when new domains are sparse, but as the plot thickens, domain-wise, you end up with a potential issue.  Imagine for example that we have two SDN-based domains adjacent to each other.  Traffic that moves between them could still be represented through the old interdomain model, but if these new domains have other capabilities you’d like to be able to consider them to be logically fused in at least one sense.  Yes, you could actually fuse them (presuming they were from the same owner) but for ownership reasons and because some traffic might not be able to take advantage of new-age SDN-driven connection models, you’d probably not do that.  Instead you would create what was effectively a second domain model that extends the old one and runs in parallel.  In protocol or API terms, the boundary conditions and exchanges of this new model are a superset of the older interworking model.  But you might want to do the same thing if you had traffic that was traveling over legacy domains between two new SDN domains.  Why not pass along some SDN-centric stuff?
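
A sketch of that superset idea might look like the following.  The field names are hypothetical and this isn’t a real BGP extension, but it shows how an SDN-aware boundary could carry everything a legacy boundary carries plus optional extras that legacy neighbors simply ignore:

```python
# Sketch of the "superset" interworking idea: new SDN-aware domains exchange
# everything a legacy boundary exchanges, plus optional SDN-centric data
# that legacy neighbors simply ignore.  All field names are hypothetical.

legacy_advert = {
    "prefix": "203.0.113.0/24",
    "next_hop": "198.51.100.1",
    "as_path": [64500, 64510],
}

sdn_extension = {
    "connection_models": ["subnet", "point_to_point"],  # what the domain can offer
    "latency_class": "low",
}

def build_advert(peer_supports_sdn):
    """Send the superset to SDN-aware peers, the legacy subset otherwise."""
    advert = dict(legacy_advert)
    if peer_supports_sdn:
        advert["sdn"] = sdn_extension
    return advert

print(build_advert(peer_supports_sdn=True))
```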

The point here is that while the current SDN vendors haven’t made the case for their decision very well, the fact is that there’s a good reason to extend existing protocols like BGP to carry SDN-specific information as a step toward SDN.  My gripe with these guys (Alcatel-Lucent and Juniper, for example) is that they need to do a better job of explaining what the new interworking model or models might be.  Show me the future before you evolve me into it and take me by surprise, a buyer might think.

If we presume that there’s an SDN controller involved in all of this, we could visualize the new interworking models as being created among controllers, since the controller understands the connectivity rules of its domain(s).  Controller federation is then an evolution of traditional internetworking, one that is promoted more as SDN domains expand (in number, size, or both) or as new service models are created within domains that can’t easily be exploited across domains using legacy interworking rules.  At some point, this likely creates a situation where a supercontroller owns all the controllers in all the domains and acts as a mediator for interworking exchanges.
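
Here’s a rough sketch of that controller-of-controllers notion, with hypothetical interfaces (no vendor’s actual API):

```python
# Sketch of the controller-of-controllers idea: each domain controller knows
# only its own domain; a supercontroller mediates cross-domain requests.
# Classes and interfaces are hypothetical illustrations.

class DomainController:
    def __init__(self, name, endpoints):
        self.name = name
        self.endpoints = set(endpoints)

    def build_path(self, src, dst):
        return f"{self.name}: path {src} -> {dst}"

class SuperController:
    def __init__(self, controllers):
        self.controllers = controllers

    def build_service(self, src, dst):
        """Split an end-to-end request into per-domain segments."""
        src_ctl = next(c for c in self.controllers if src in c.endpoints)
        dst_ctl = next(c for c in self.controllers if dst in c.endpoints)
        if src_ctl is dst_ctl:
            return [src_ctl.build_path(src, dst)]
        gateway = "gw-1"   # assumed interconnect point between the domains
        return [src_ctl.build_path(src, gateway),
                dst_ctl.build_path(gateway, dst)]

sc = SuperController([DomainController("dc-sdn", ["h1", "gw-1"]),
                      DomainController("wan-sdn", ["gw-1", "h9"])])
print(sc.build_service("h1", "h9"))
```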

I’d like to see vendors talk about this structure explicitly and apply it to their own SDN solutions, because I think we need a common model.  Right now that’s not happening, so there are things everyone needs to be demanding relative to SDN implementations.

You might wonder how NFV fits into this.  In my view, you can visualize NFV as being either an extension to the current domain interworking model, a domain within that overall model, or a new layer.  If SDN connectivity is applied end-to-end (overlay SDN) and it’s universally available, then SDN could create a connectivity fabric that has essentially no topology.  Anything can go anywhere in an overlay network (as long as some “underlay” goes there).  With NFV or even with hosted switching/routing, you can host nodes where you like, which means you can build layers of virtual topology on top of the real infrastructure and within that topology apply any set of connection and interworking rules you like.  You can add features by making them virtual elements of all of this.  If the scope of SDN is universal then the range of the virtual NFV model is likewise; otherwise it becomes another domain enclave in a legacy sea.

I think it’s clear at this point why we need to think about all of this stuff at a higher level.  This is a pretty interesting model and one that could let us enjoy more SDN benefits even when “infrastructure” SDN doesn’t have full play yet.  It’s also a model that doesn’t require SDN, or OpenFlow, to work, only overlay networking and virtual network elements.  That gives us another set of options to sort out as we try to visualize what the future of our new technology waves might be.

Cisco’s “Intercloud”: the New “Everything?”

Cisco did their expected cloud announcement yesterday, but it wasn’t exactly a complete picture of Cisco’s cloud intentions.  It was at least a better picture of what Cisco is doing than the media has been able to muster, though.  For example, it’s pretty obvious that their investment of a billion dollars in “the cloud” isn’t (as has been reported) a commitment to Cisco’s entering the cloud business or competing with Amazon.  The announcement is the now-common Cisco mix of insight and unbridled hype, and it’s a job to wade through it all.

The foundation concept is the “Intercloud”, I think, which is an ecosystem of partners that Cisco says is essential in supporting the Internet of Everything.  This is almost enough for me to dismiss the whole thing as being PR nonsense, since I think it’s clear that’s what the IoE is.  However, look a bit deeper and you can see some of the insights peeking through.

In my view, it would make no sense for Cisco to pour a lot of money into becoming a cloud provider.  They’d compete with all the people they hope to sell equipment to.  So the strategy is more about building a global Cisco-based cloud ecosystem, which is what their “intercloud” means.  To Cisco, an intercloud is a network of clouds, so it’s essentially a federation of cloud providers and an ecosystem of partners who will offer everything from hardware to professional services.  Cisco’s press release says the following are the initial players:

…leading Australian service provider Telstra; Canadian business communications provider Allstream; European cloud company Canopy, an Atos company; cloud services aggregator, provider and wholesale technology distributor Ingram Micro Inc.; global IT and managed services provider Logicalis Group; global provider of enterprise software platforms for business intelligence, mobile intelligence, and network applications MicroStrategy, Inc.; enterprise data center IT solutions provider OnX Managed Services; information availability services provider SunGard Availability Services; and leading global IT, consulting and outsourcing company Wipro Ltd.

Cisco does plan to provide data centers to the Intercloud, but they’ve not indicated how many or what role they’ll play.  In particular we don’t know whether Cisco is counting the hosting facilities already committed to WebEx, Meraki and Cisco Cloud Web Security.  To these existing web offerings, Cisco plans to add a bunch of services and capabilities ranging from SAP HANA to Videoscape.  Go through all of this and you see that Cisco Cloud Services (CCS) is really an umbrella concept that unifies all of Cisco’s various software strategies (in nomenclature, at least).  It also promotes Cisco’s networking vision, and in particular Cisco’s Application-Centric Infrastructure, which Cisco intends to build into an SDN solution.

The insightful part of this is twofold.  First, Cisco’s move finally centralizes the company around a vision that has some legs—the future is going to look like a connected cloud, even down to the enterprise level.  Second, IaaS isn’t enough to give anyone a strong cloud future, so you have to build both a way to enhance operations efficiency to make IaaS most profitable, and a way to augment it to build revenue and volume.  Likely the way to do that is “platform services”, which I believe Oracle demonstrated they understood with comments in their last earnings call.

The problem is that every indulgence has its price, and Cisco indulged itself fully in hyperbole in their CCS announcement.  That raises the bar in terms of tangible proof points, and in this case Cisco has omitted some critical things instead of validating them.

The first problem is that if you’re going to do an Intercloud you need to have a federation story so strong it’s like a tech national anthem.  Cisco doesn’t tell one in their announcement, and I don’t see any indication they have one.  I know that Cisco understands the need for federation because I sat in as a silent partner on a Cisco pitch on it to a Tier One.  Cisco was a fly in the ointment of the only carrier federation architecture designed for the service layer (IPsphere) because it was a Juniper initiative.  Now, given that federation is mandatory in NFV, Cisco might have launched itself into federation supremacy by simply doing and announcing what’s clearly going to be a requirement, but they didn’t.

The second problem is that Cisco Cloud Services is too big.  A 900-pound gorilla can sit anywhere it wants, but a 9,000-pound gorilla sits wherever it happens to be forever because it’s not agile enough to move.  There’s just too much here to grasp, and especially to sell.  The only clear theme of Intercloud is that “inter” part that I’ve already noted Cisco neither develops fully nor defends with a strong solution.

So what this may come down to is intent, meaning the goal of the ecosystem itself.  Cisco is a sales behemoth, and so if it can’t sell CCS in a direct sense, can it make it into a concept so vast that every buyer and seller ends up running into it?  That may be true, and it may have been Cisco’s goal.

There’s not a standards activity on this planet—SDN, NFV, cloud, content, whatever—that has a prayer of keeping up with its own market.  We’ve been trying for the last decade to standardize at a glacial pace when the industry measures time in “Internet years”, and if anything it’s getting worse.  Cisco could create a de facto, proprietary, universe that’s a kind of fifth dimension to the reality of standards and openness but has the advantage of being there.  Operators need to have a cloud strategy; Cisco offers one.  VARs and integrators and software companies need a cloud story they can glom onto, and Cisco has it.  Buyers and sellers need a booth where the two groups can meet and do commerce, and Cisco is presenting one.

All along, with SDN and NFV and the Cloud, Cisco has bet on the idea that the market has grand ideas and pathetic execution skills, and many would argue (with some justification) that Cisco is the same way.  How many times did they announce a five-step program in which they were already in Step 2 and never got beyond Step 3?  But “the market” is unable to unify around anything useful.  Standards groups find agreement in superficial cooperation because anything deeper would be opposed by both groups for political reasons.  The IQ of any group of people, so the saying goes, is equal to the IQ of the dumbest divided by the number of people.  Given that standard, Cisco knows it doesn’t need to be all that smart, and they may have a point.

Is Oracle Seeing the Future of IT and Networking?

I promised last week I’d have more to say about Oracle’s quarter, but as you’ll see this blog isn’t as much about Oracle’s quarter as what’s behind it.  Oracle could well be the poster child for the new age of tech, the company that can teach us what the future of both IT and networking will look like.

In the quarterly call, Larry Ellison said “Our Engineered Systems business is growing rapidly for the same fundamental reason that our Cloud Applications business is growing rapidly.  In both cases, customers want us to integrate the hardware and software and make it work together, so they don’t have to. As customers shift to pre-integrated hardware and Cloud computing, in search of lower costs and more rapid implementations, Oracle is presented with new opportunities for leadership in a number of market categories.”  Cost management and agility, right?  We’ve heard that before, from operators explaining their hope for NFV.  But what I want to focus on is that phrase “…integrate the hardware and software and make it work together….”

What is Ellison’s integrated stuff going to look like as it works together?  I assert that the answer is a service, which is also why there’s a linkage between growth in Engineered Systems and Cloud Applications.  The future Ellison sees, and what I think Oracle sees as its “new opportunities for leadership”, is a future where integrated stuff is presented as a set of services.  Services can be combined as needed for agility, hosted where optimal for managed costs.  It’s a cloud future, but a cloud made up of platform services that are more like the components of modularized software than they are like hosted virtualization.  The future, in short, is a future of service orientation on a much larger scale, but also one based on different principles than today’s “service oriented architectures”.  It’s how this future develops that will determine the fate of Oracle, and every other company in tech today.

Service orientation is about the value of the services, not the value of the platform.  That means it tends to devalue hardware and OS products unless they’re packaged into a self-sustaining system along with some useful horizontal software component.  It means that you have to look at how a “service” could benefit from being hosted in an appliance versus on a resource pool—scaling and resiliency come to mind.  Overall, it means you have to take a broader look at the value of what you’re offering—a look at how it fits into that service-based approach.

My fascination with management and orchestration arises from this point.  Oracle’s vision is that “services” created either in the cloud or by appliances will be composed into stuff that provides productivity gains, entertainment, whatever.  That’s orchestration.  This process has to be as efficient as possible because the costs created by the complexity that inevitably accompanies componentization on a large scale will quickly overwhelm benefits otherwise.  That makes the critical point that you have to orchestrate and manage in parallel, because the two steps are related in process and driven by that common initiative to create flexibility by componentizing.

Where this takes IT is clear, I think.  We start to focus on harnessing services that are put wherever it’s cheap and convenient to host them.  The cloud of the future doesn’t need explicit hybridization because it’s boundary-less.  Integration/orchestration binds components into applications and experiences.  Hardware and operating systems and even Oracle’s Engineered Systems are ways to create components—meaning services.  You use appliances where packaging reduces operating cost and complexity and improves agility enough to offset any capital cost increase.  Sometimes things like horizontal scaling will mean that separation of software features of a service from the hardware platform is good.  Sometimes, particularly with DBaaS and other highly contained and focused functionality, an appliance might well be better.

How about networking?  One important starting point is that if you’re going to have a service oriented future you have to think in terms of creating services from and through networks.  Here we immediately come to a seeming contradiction, because Engineered Systems is a move toward custom appliances and some modern concepts like SDN and NFV are moves away from that.  How is that logical?

Probably it isn’t, which is what operators have told me over the last six to ten months.  They’re seeing the value of SDN and NFV being (guess what!) agility and operationalization.  To me that means that it’s less important to create a bunch of virtual functions to get rid of appliances than it is to create services from networks or by melding network and IT functionality.  SDN needs to be thinking not about creating the tools for defining white-box strategies for connectivity but about creating network services that are uniquely different from what we have today, and thus uniquely valuable.  NFV needs to be looking not at how to turn appliances into hosted software, but at how to create network services that can be composed like software APIs.

That this might be a tough transition is illustrated by Cisco.  They’re supposed to be preparing to spend a billion dollars getting into “the cloud business”.  What cloud business is that?  Could they be thinking UCS-centrically?  If I’m right in reading Oracle’s tea leaves, IaaS could be suicidal for Cisco’s cloud aspirations, the reprise of their Flip debacle.  On the other hand, if Cisco thinks WebEx, and if they play their management and orchestration angles right, and if they frame their “SDN strategy” about making network as a service into something that’s really service-oriented they could have a big win here.

Cisco could be reading the same market issues as Oracle, in which case it would make sense for HP to drive an aggressive NFV strategy, and no sense that IBM seems to be waiting in the wings on that whole angle.  The cloud is the senior partner among the tech revolutions in an opportunity sense, but NFV might be the closest thing to a blueprint for the solution to the technical problems.  Yes, cloud orchestration could evolve to a higher level, but if DevOps tools can morph into orchestrating and managing the services of the future, why haven’t they made progress toward that already?  That’s a question we all should ponder, but in particular the big IT and network vendors, because if the cloud does answer its own orchestration and management questions it will mean that everything else, including NFV and SDN, could be nothing but services for a hungry cloud.  That could be Oracle’s truly big opportunity.

The Reality of Neutrality

I guess it wasn’t hard to figure out what my blog today would be about.  Netflix’s CEO has blogged and made some strong comments about the need for a very broad interpretation of “net neutrality”.  Netflix in fact wants something even more “neutral” than the FCC order that was overturned on appeal would have provided.  Their goal, in my view, is to force the FCC to demand Comcast accept such a position as a condition for their TWC merger.  So is this a good idea?

Let’s get something straight from the first.  Netflix wants a free ride on everyone’s infrastructure to boost their profits.  Comcast wants every content provider to turn over their entire gross profits to the retail ISPs.  Everyone in this process is out for themselves, just like always.  I’d love my Internet service to be totally free.  I’d love my vacations to be free too, and my health care and my lawn care…we all like free stuff.  But we all know we can’t go into a BMW dealer and drive away scot-free with whatever car we like and that we’re going to have to pay the bills.  Let’s leave everyone’s latent opportunism out of this.

The problem that’s in focus here is simple.  OTT players connect to the Internet through large wholesale players or directly peer with access providers.  They dump content into the access networks—Comcast, Verizon, and so forth—and they generally do not pay incrementally for the traffic they generate.  Yet the ISPs have to carry it, and as video traffic increases, so does the cost of transporting it.  That forces return on infrastructure investment lower, which discourages that investment.  This old business model, called “bill and keep”, has been around from the first days of the Internet, and it worked at first because we didn’t have fast enough broadband to support the kind of demand that would strain networks.  Later on, when consumer broadband came along, we still didn’t have a traffic source big enough to cause a problem—till video came along.

The FCC, Congress, or any regulator lacks the power to order a business to operate at a loss.  Whatever anyone says, the simple truth is that the current cost/price curves for operator infrastructure can’t be sustained for more than a couple of years.  The current model cannot be extended as is because of that, so what we’re arguing about is whether we can force operators to the breaking point or whether we will force somebody to pay.  Netflix knows that as well as I do.  Even the FCC’s Neutrality Order, now toast, didn’t forbid notions like allowing content providers to pay for QoS (which would inevitably lead to that happening) or settlement for traffic handling among providers.  It just discouraged that by saying that it would need a justification.  So the FCC also knew that eventually something would have to happen.  By default, the only outcomes of the current situation are steady deterioration of the Internet or increased Internet access prices for consumers.

Which is what Netflix and content providers want, but they want the increased prices to be on all consumers.  If an access provider like Comcast can charge Netflix for transport, Netflix will pass the increase along to its customers, some of whom will stop using the service because of the hike.  But if Comcast increases everyone’s Internet charge whether they use Netflix or not, the per-customer increase on Netflix customers is less because the cost is spread over everyone, and there’s no benefit for a customer to drop Netflix because they’ll pay the “content tax” anyway.  That’s what this is really about.  To turn Netflix’s analogy back, we’re asking the FCC and Congress to make all the toll roads free by charging everyone for them, whether they use them or not.  That’s not how transportation works, nor can it be how the Internet works in the long run.
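
To see the mechanism in numbers (all of these figures are invented for illustration, not real carrier or Netflix data):

```python
# Illustrative arithmetic for the "content tax" argument above.
# All figures are made up to show the mechanism, not real market data.

total_subscribers = 10_000_000
netflix_subscribers = 3_000_000
incremental_transport_cost = 6_000_000   # $/month, assumed, video-driven

# If the cost is recovered only from the traffic source's customers:
per_netflix_user = incremental_transport_cost / netflix_subscribers

# If it's spread across every broadband subscriber, user or not:
per_all_users = incremental_transport_cost / total_subscribers

print(f"Charge only video users: ${per_netflix_user:.2f}/month each")
print(f"Spread across everyone:  ${per_all_users:.2f}/month each")
# -> $2.00 vs $0.60: spreading the cost lowers the visible price to the
#    video customer and removes any incentive to drop the service.
```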

You hear that we need neutrality to spur Internet innovation, right?  Well, that’s crap.  What is happening is that we’re discriminating against Internet innovation in the network sense of the Internet.  We’re creating an OTT ecosystem with an inexhaustible and incrementally free food supply and wondering why everything is getting eaten.  We should be innovating in infrastructure, where it’s almost impossible to get startups funded, and one reason we don’t is that it’s easy money to be an Internet eater and not so easy to be an Internet farmer.

How about those who throw out the US ranking in broadband?  That’s crap too.  First off, how does making an access carrier transport video at a loss incent them to invest more?  But second, the most important statistic in broadband speed is demand density, the dollar network spending opportunity per square mile.  If we normalize the US density to a value of 1.00, we find that Europe on the average has four times the density of the US and some high-density Asian markets have as much as 20 times the density.  We’re behind because we’re spread out, so we need policies that encourage infrastructure investment, not deter it.

Why is Comcast out in front here?  Why do they always seem to be the guy poking a stick in someone’s eye, regulatorily speaking?  The answer is simple: cable infrastructure has lower marginal access efficiency than other broadband delivery options.  Yes, multi-customer cable has a lower “pass cost” than FTTH or even deep FTTN, but the problem is that if you fill up a given cable span with streamed video, you force the cable provider to segment the span.  Say a thousand-customer span costs a hundred bucks per pass.  Force that to divide into five spans of 200 customers and you have to run four additional fiber feeds to the span head ends.  Cable is the most worried about video growth, as they should be.
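
Using the figures in that example (and an assumed, purely illustrative cost per new fiber feed):

```python
# Arithmetic on the cable-segmentation example above.  The span size,
# pass cost, and segment count come from the text; the per-feed cost is
# an illustrative assumption.

customers = 1_000
pass_cost = 100                    # $ per customer passed, original span
segments = 5                       # forced split into 200-customer spans
extra_fiber_feeds = segments - 1   # four new feeds to the span head ends
assumed_cost_per_feed = 50_000     # illustrative assumption

added_cost = extra_fiber_feeds * assumed_cost_per_feed
print(f"Original pass cost: ${customers * pass_cost:,}")
print(f"Added fiber cost from segmentation: ${added_cost:,} "
      f"(${added_cost / customers:.0f} more per customer passed)")
```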

So here’s my point.  We have reached the stage where some EU carriers are offered more incentive to invest in infrastructure in developing countries than in their home service area.  We are giving all operators an incentive to invest more in VoD than to invest in the Internet.  The problem of the Internet’s finances can be fixed easily and fairly by letting the market do what it wants.  We should let ISPs charge for premium delivery, and charge either the consumer or the content provider.  We should allow ISPs to require settlement on traffic passed to them.  The only thing we should prevent is discrimination against a specific type of traffic or a particular source based on anything other than volume of bits, the legitimate basis for traffic engineering and charging.  Then the providers who try to gouge will find themselves losing customers or customer QoS and they’ll respond appropriately.  This is likely to increase costs a bit for us, and to discourage some OTT-based startups, so a few VCs and Internet entrepreneurs will make a bit less.  But it will create a healthy industry in the long term, and at a lower cost than we’ll eventually face by being selfish and stupid in the near term.

My Take on the ONF/NFV ISG Strategic Alliance

This week, the ONF and the NFV ISG announced they would be entering a “strategic partnership” to define how NFV could use SDN to address the connectivity requirements of its use cases, and the ONF published a preliminary document to describe an example.  That SDN and NFV have to cohabit has been clear from the first, but the announcement material offers some hints as to how both groups will have to evolve, and what might emerge at the end.  As always, there’s good and bad news.

The ONF’s reference framework for SDN is a three-layer structure with the top layer being the Application Layer where “business applications” live, the middle layer the “Control Layer” where network services are presented, and the bottom layer the “Infrastructure Layer” where real hardware lives.  Applications presumably call on network services that are then created (using OpenFlow) by the control layer.

NFV’s connectivity model is evolving, but what seems to be emerging is the notion of “network-as-a-service”, which would be some set of abstractions representing a connectivity model—VLAN, VPN, subnet…you get the picture.  During orchestration, the MANO processes would call for a model and this model would be instantiated (through an infrastructure manager) on real hardware.  The ISG seems to be evolving toward recognizing that some IMs would be managing “virtual” infrastructure and others might be managing real (e.g., WAN) infrastructure.

It’s pretty obvious that NFV’s notion of NaaS models could map directly to the ONF Control Layer notion of a “network service”.  That would mean that NFV (both the VNFs and MANO) would live in or parallel to the Application Layer.  To be more detailed, MANO would be acting as a kind of network service broker for the VNFs being orchestrated, so that it creates the network services that the VNFs then consume.  That’s not a big deviation from the SDN norm, since few network users would want their applications brokering their own network services without intermediation for security, stability, and cost control.  And, in fact, the diagram provided by the ONF to illustrate a use case for the cooperation shows just that—NFV is a consumer of the Control Layer services.
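
A sketch of that brokering relationship might look like this; the model names and interfaces are hypothetical, not anything from the ISG or ONF documents:

```python
# Sketch of MANO acting as a network-service broker: the orchestration
# process asks for an abstract connection model, and a stand-in for the
# ONF Control Layer instantiates it.  Names and interfaces are hypothetical.

class ControlLayer:
    """Stand-in for an SDN control layer exposing named network services."""
    def instantiate(self, model, endpoints):
        return {"model": model, "endpoints": endpoints, "status": "active"}

class Mano:
    def __init__(self, control_layer):
        self.control = control_layer

    def deploy_service_chain(self, vnfs):
        """Give the VNFs the subnet they expect; they never see OpenFlow."""
        connection = self.control.instantiate("l2_subnet", vnfs)
        return {"vnfs": vnfs, "connection": connection}

mano = Mano(ControlLayer())
print(mano.deploy_service_chain(["vFirewall", "vRouter", "vDPI"]))
```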

Some may see this as deprecating NFV, but it’s a realistic picture of NFV’s relationship to the ONF model.  However, I could draw my own diagram that would make NFV a three-layer structure and stick all of SDN as a plug-in on the bottom.  It’s a “New Yorker’s Map of the US” all over again (if you’ve never seen this little item, you’d see all of Texas as being about the size of Central Park).  If you look past the histrionics, the two views are really the same thing.  SDN is just one possible way of generating a service/connection model fulfillment strategy for NFV.

Why, then, are we making a big thing of this?  There should be no need for a strategic alliance to establish how SDN could fulfill the connection requirements of an NFV use case.  Unless SDN is brain-dead it should be able to fulfill them in many ways out of the box, and in fact it could—at one level.  I can construct nice SDN paths between VNFs for sure.  The question is whether that does any good.  If three VNFs think they’re on the same LAN subnet and reach each other through normal IP/LAN processes, then that’s what you’d better be creating with SDN.  It’s not the use case or the workflow relationship between VNFs that matters, it’s the network model that the VNFs were written to live on.  You can’t tunnel something to an interface that’s not a tunnel interface, software-wise.  But this is something the NFV ISG still hasn’t glommed onto, and maybe the SDN connection will provide the insight.

The real value-add of this alliance (and of the ONF’s diagram) is that notion of a “network service” as the thing that links SDN and MANO.  NFV has to consume connection models created by cooperative device behaviors (real or virtual devices) down below.  Orchestration at the VNF level (or higher) can’t get bogged down in the lower-level details because if that were the case then the “recipe” for a service would have to be revised for every infrastructure detail that changed.  The world of MANO, in NFV, knows the world of NFVI through the intermediary of a series of models (at least two classes, one representing hosting models and the other representing connection models).  Similarly, VNFs live in hosting/connection models that were created for them.  What ties things together isn’t SDN specifically (as we’ll see); it’s the models.

Another interesting thing in the ONF diagram is the representation of the Control Layer as OpenDaylight.  What I like about OpenDaylight is that 1) it makes the notion of “network services” explicit and 2) it has southbound plugins beyond just OpenFlow.  The connection models used to connect VNFs are not all going to be based on SDN because SDN isn’t going to be universal when NFV deploys—if ever.  However, OpenDaylight can control legacy networks and can map its network services to arbitrary infrastructure, which is what you need to be able to do if you ever expect to deploy NFV on a real-world network.

The key to whether this strategic alliance generates anything other than ink (media happiness at another story to write) is whether everybody in the alliance sees those “network services” as the key development.  Every exchange in a world of virtualization and abstraction has to be virtual and abstract.  We need to define connectivity to applications and to NFV not based on how it’s achieved but on what properties it asserts to its users.  How that happens is just magic, the stuff inside that black-box abstraction.  Everything should be about “service models” in both SDN and NFV.  It’s not that way now, and that has to change for the good of all.

My final point on the alliance is its sin of omission.  SDN doesn’t have squat for a management strategy.  IMHO, neither does NFV.  Two wrongs (or nothings) don’t make a right.  Absent an effective way to build both SDN and NFV into networks as components (so that you don’t have to have a mass extinction that sweeps legacy away to replace it with what you want), service agility and operational efficiency gains are doomed.  Yet it is those things that both SDN and NFV want to harness as benefits.  You guys both need to ally with the TMF (if they can get their act together on the high-level orchestration problem), or with some responsible higher-layer management process body.  Otherwise, benefit-wise, you’re holding hands in the sea beside the Titanic watching the lifeboats row away.

Taking a Harder Look at Cloud, SDN, and NFV Federation

In a previous blog, I talked about two models of “SDN federation”, one that was at the service level and one that was at the “SDN” or controller level.  One obvious difference between the two is that the former requires some specific service, whose interworking features then drive the federation process.  If you wanted to federate on a more general level, you’d need a more general approach, and I think there’s value in looking at how that might work.

The thing is, SDN isn’t the only thing that requires federation.  You also need to federate NFV and the cloud, or at least there’s a high value in doing so.  It’s helpful to look at the deeper level of SDN federation, but at a broader scale so that we address NFV and the cloud as well, and that’s what I propose to do here.

First, I’m using the term “federation” to describe a relationship between two or more administrative domains wherein the two domains retain their autonomy with respect to some customers and services, but agree to cooperate within specific limits to serve another group of customers and another service set.  There are a lot of reasons why federation is important, but historically the big driver has been that not all service providers operate globally and some customers need services on a geographic scale wider than a given provider supports.  Hence, partnership—federation.

Modern services can be visualized as being built at three layers.  At the bottom there’s infrastructure, or the pool of resources needed.  Above infrastructure is the structural orchestration of cooperative resource combinations, and above that is the service framework as it appears to a retail (or wholesale) customer.  It follows that we have to accommodate this layering in the federation process, because we could create federation utilizing any of these layers.

The easiest form of federation to imagine is infrastructure-level federation.  With this level of federation, the partners agree on a means of cross-sharing of their pool of resources.  All of the partners can compose services at the next two levels using a combination of their own resources and the federated infrastructure resources of partners (subject to whatever the restrictions might be).  The partner is not aware of the services being created, only that resources have been committed, and in general the ceding of resources to a federation partner would cede the management of those resources (again subject to potential constraints both technical and commercial).

In SDN, NFV, and the cloud, infrastructure federation would mean sharing the underlying resources (OpenFlow switches, NFV infrastructure, servers or containers) so that they are placed under the control of the structural orchestration of the partner who’s creating the retail service.  In SDN, Carrier “A” uses their SDN controller to control switches in Carrier “B”’s domain.  What makes it simple is that it’s almost purely a matter of making the resources accessible to the partner.  What complicates it is that you have to assume the federating partners have protected themselves against collisions of demand that might impact the services of all involved.
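
As a sketch of how such a grant might be expressed and enforced (everything here is a hypothetical illustration of the concept, not any operator’s actual interface):

```python
# Sketch of infrastructure-level federation: Carrier B grants Carrier A
# control of specific resources, bounded by constraints so a partner can't
# create demand collisions.  Names and fields are hypothetical.

grant = {
    "grantor": "carrier_b",
    "grantee": "carrier_a",
    "resources": {"of-switch-17", "of-switch-18"},
    "constraints": {"max_flows_per_switch": 5_000, "mgmt_visibility": "read_only"},
}

def authorize(grant, requester, resource, requested_flows):
    """Gate a partner's control request against the commercial grant."""
    return (requester == grant["grantee"]
            and resource in grant["resources"]
            and requested_flows <= grant["constraints"]["max_flows_per_switch"])

print(authorize(grant, "carrier_a", "of-switch-17", requested_flows=1_200))  # True
print(authorize(grant, "carrier_a", "of-switch-99", requested_flows=1_200))  # False
```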

Next up the ladder is structural federation, which means that the parties involved retain structural orchestration of their own infrastructure but link their orchestration processes so they cooperate in some way.  An SDN controller pair might exchange filtered route information so one controller can “see” paths in another domain, but a request to use such a path would then be actioned by the controller that owns it.  This requires creating some structural interworking interface, but it reduces the risk of a partner going maverick and doing something bad to your resource pool and the services it supports.

Structural federation interworking requires some knowledge of how the structuring happens.  We have examples of it in some of the old routing protocols, where a header contained an abstract route stack that described how a packet got through a partner network in a general sense (ingress to egress), but this entry was popped on entry by the receiving network and replaced with its own specific hop-by-hop instructions.  The same sort of thing could be done in SDN or NFV.  For the cloud, structural federation would mean taking a service model and offering pieces of it to multiple OpenStack instances, for example.
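
Here’s a rough Python sketch of that pop-and-replace idea, purely as an illustration of the mechanism rather than any actual protocol encoding; the packet structure and topology table are assumptions of mine:

    # Sketch of structural federation in the spirit of the old route-stack idea:
    # the receiving domain pops the partner's abstract ingress-to-egress entry
    # and substitutes its own hop-by-hop instructions.  Illustrative only.

    def expand_route(packet, domain_topology):
        """Replace an abstract transit entry with this domain's concrete hops."""
        abstract_hop = packet["route_stack"].pop(0)         # e.g. ("ingress-X", "egress-Y")
        ingress, egress = abstract_hop
        concrete_hops = domain_topology[(ingress, egress)]  # this domain's own path
        packet["route_stack"] = concrete_hops + packet["route_stack"]
        return packet

    # Domain "B" knows how to get from ingress-X to egress-Y internally.
    topology_b = {("ingress-X", "egress-Y"): ["sw-4", "sw-7", "sw-9"]}
    pkt = {"payload": "...", "route_stack": [("ingress-X", "egress-Y"), ("handoff-to-C",)]}
    pkt = expand_route(pkt, topology_b)   # B now forwards hop-by-hop internally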

At the top of the list is service federation.  Here each of the federating providers creates a portion of a service and commands its partners to create symbiotic pieces of that same service, connecting them at some physical gateway point.  The service components are “black boxes” that assert their own service-level properties but keep everything else opaque.  The interconnection point doesn’t have to be service-specific, as NNIs typically are, but it does have to be service-compatible, meaning you have to be able to pass the service across it.
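
A minimal sketch of the black-box idea follows; the component names, fields, and the SLA-combining rule are all my own illustrative assumptions:

    # Sketch of service federation: each partner contributes an opaque component
    # that asserts its service-level properties and the gateway where it connects.
    # Names and the SLA-combining rule are illustrative assumptions.

    class ServiceComponent:
        """A 'black box' piece of a federated service."""
        def __init__(self, provider, gateway, latency_ms, availability):
            self.provider = provider
            self.gateway = gateway          # physical interconnection point
            self.latency_ms = latency_ms    # asserted property, internals stay opaque
            self.availability = availability

    def combine(components):
        """End-to-end view: latencies add, availabilities multiply."""
        total_latency = sum(c.latency_ms for c in components)
        total_avail = 1.0
        for c in components:
            total_avail *= c.availability
        return {"latency_ms": total_latency, "availability": total_avail}

    pieces = [
        ServiceComponent("Carrier-A", "gw-paris", latency_ms=12, availability=0.9995),
        ServiceComponent("Carrier-B", "gw-paris", latency_ms=18, availability=0.9990),
    ]
    print(combine(pieces))   # the retail view sees only the combined properties

The retail provider composes the end-to-end view from asserted properties alone; nothing about how a partner builds its piece crosses the gateway.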

The work of the IPsphere Forum (now a group in the TMF) was focused on service federation, and this is the type of federation network operators are probably most comfortable with overall.  It dodges questions of management visibility, though there may still be a need to coordinate remediation across the boundary, and offering partners some management drill-down may be a way of doing that.

All federation strategies need a combination of a commercial agreement on what’s to be done and a set of technical policy elements to enforce that agreement’s constraints.  The constraints could be load-related, time-related, location-related, security-based, or management-visibility-based.  The policies would be applied during the activation or modification of a federated service, but also during management or control events generally.  That means you’d likely have to apply policies both to the channel that carries federation requests and at the gateway points where federation traffic connects among the partners.
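
As an illustration of how those constraint classes might be enforced in code, here’s a hedged Python sketch; the policy names, values, and checks are all assumptions I’ve made up for the example:

    # Sketch of applying federation policy to activation requests (the same sort
    # of check would guard ongoing management/control events).  The constraint
    # set mirrors the list above; names and values are illustrative.
    from datetime import datetime

    POLICY = {
        "max_federated_load": 0.30,         # load-related: at most 30% of capacity
        "allowed_hours": range(0, 24),      # time-related
        "allowed_regions": {"eu-west"},     # location-related
        "required_auth": "mutual-tls",      # security-based
        "mgmt_visibility": "summary-only",  # management-visibility-based
    }

    def check_request(req, current_load, now=None):
        """Return True if a federation request passes the agreed constraints."""
        now = now or datetime.now()
        if current_load + req["load"] > POLICY["max_federated_load"]:
            return False
        if now.hour not in POLICY["allowed_hours"]:
            return False
        if req["region"] not in POLICY["allowed_regions"]:
            return False
        if req["auth"] != POLICY["required_auth"]:
            return False
        return True

    req = {"load": 0.10, "region": "eu-west", "auth": "mutual-tls"}
    print(check_request(req, current_load=0.15))   # True: within the agreed limits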

There are going to be use cases for all the kinds of federation I’ve listed here; in fact, there already are, and they exist in the cloud, SDN, and NFV alike.  For all the interest, though, little has been done to create an effective template for federation in a complete sense.  In IPsphere, for example, the internetwork gateway point specifications never caught up with the times.  We all know that SDN federation is in its embryonic stages, and cloud federation is handled almost case by case.  I think this is going to change quickly, though, because carrier interconnect is a way of life, and just as we had to accommodate the shift from service-specific networks to converged IP, we’ll have to accommodate the shift to SDN, NFV, and the cloud in our interconnect plans.

How Enterprises Say We Could Turbocharge the Cloud

We didn’t have a major cloud show in the last month, so I’ve not been blogging much about the cloud compared to other technology revolutions like SDN or NFV.  However, I did get some interesting cloud data from the fall survey, and if you combine the data with a bit of modeling and analysis, you can get a clearer picture of what’s happening in the senior partner among our tech revolutions.

First, it’s true that cloud computing is growing, though it’s a bit of a juggle to get exact numbers because cloud revenue isn’t broken out in most providers’ reporting.  Cloud growth, measured in revenue, is almost certainly close to 40% per year, a nice number.  And this is happening despite the fact that we’re not at this point doing much more than scratching the surface of the cloud opportunity.  I’d estimate that we’ve penetrated only a bit more than 1% of what could be achieved.  But don’t take the revenue numbers at face value, because a big slice of cloud spending comes from OTT startups and not businesses.

But the second point is equally interesting.  What’s driving cloud usage in the enterprise is changing.  The first wave of the cloud was driven by two primary factors—server consolidation and development/testing activities.  The second wave was driven by the creation of application front-ends associated with business web presence.  We’re now looking at a more complicated and more interesting “third wave”, and this is where the future of the cloud is likely to be determined.

You hear that the cloud’s big push will come by taking over the hosting of the core business or mission-critical apps.  If that’s true then the push isn’t coming any time soon—if ever.  Enterprises say that these applications would cost them between 15% and 25% more to run in the cloud, considering all cost factors.  There are special situations where a transition from private data centers to the cloud for mission-critical apps makes sense, but they’re not common and won’t make much of a market.  These applications won’t move to the cloud as much as evolve into the cloud.

The same companies who say that their core applications would be more costly to run in the cloud think that components of these applications will shift cloud-ward over the next five years.  The drivers for this migration are a combination of two factors.  First, companies are being forced to pay more attention to mobile empowerment, both because worker mobility is a fact of life for a key segment of their workforce and because point-of-activity empowerment offers potentially huge benefits.  Second, companies are seeing more, and more useful, platform services that offer cloud-based capabilities not available on premises.  Could those capabilities be built on premises?  Sure, but because there are no current practices or technologies to displace, these services give the cloud an edge.

The way enterprises expect this to go depends on how these two drivers impact planning, and also on whether the company is already looking to the cloud for cloudbursting or failover applications.  An application is a bit like an onion; there’s a core element that’s already deployed in the data center and is highly integrated with critical databases, and a series of outer layers that are more directed at presentation of information than at the central storage and processing.  It’s easy to move the outer layers of the application when something fails, or supplement them under load, and so these layers are what typically get cloudsourced when a company starts backing up or supplementing data center resources.  Interestingly, though, these outer application layers are also the ones most likely impacted by things like mobility or platform services.  Thus, where we do have cloudburst/failover migration of components, we ease the way toward justifying keeping these components in the cloud for mobility or platform service reasons.

According to enterprises, cloudbursting and failover are always a part of the business justification for migrating mission-critical applications to the cloud but they rarely are more than an insurance policy.  The persistent commitment to the cloud is created by those other two drivers in some mix, and exploiting the combination of a kick-off based on availability and performance and a drive to the goal based on mobility and platform services would seem to be the optimum strategy for cloud providers.

They don’t follow it, though.  According to enterprises, the biggest tech-side drivers for their mobility projects come from their IT vendors, and those drivers are aimed almost exclusively at mobility-based changes.  We’ve called this “point-of-activity empowerment”, and it’s something that almost three-quarters of large enterprises and over half of all enterprises say they’re exploring.  With, as I’ve said, their IT vendors.

That’s interesting because it means that the things most likely to advance the state of the cloud are not being exploited by the cloud providers.  IBM, HP, and Microsoft may have cloud strategies, but since the cloud is viewed as a means of reducing IT spending, they’re not likely to push their users to the cloud so much as address the cloud when they have to.  That’s why point-of-activity-empowerment changes are seen more often as IT evolution than as cloud evolution.

Platform services are essentially a cloud-specific strategy; Amazon has the most of them and is the strategic leader.  But CIOs are not being pushed into platform services by cloud providers.  Where they hear of them, the information percolates up from cloud specialists on staff, and it’s not necessarily connected to any specific project activity.  That means that when a company launches a point-of-activity project that’s perfect for platform services (a double whammy, value-wise), it will see the connection only if some tech guy happens to recognize it, and that’s not the best way to drive a cloud evolution.  The CIO needs to be educated.

In fact, the CIOs themselves admit they are a major barrier to cloud adoption on a larger scale.  In our survey, 82% of CIOs said they could “be doing more” to realize the full value of the cloud.  The reason they don’t?  Because they don’t have the information they need.  Cloud evolution is a religion to sellers and the media, and to the CIO it’s a technology decision that first must be justified and second poses a significant career risk.  If you’re a cloud seller of any sort, you need to be doing more to help your real buyer.