Why Not Have REAL Virtualization?

Does a network that presents an IP interface to the user constitute an “IP network?”  Is that condition enough to define a network, or are there other properties?  These aren’t questions that we’re used to asking, but in the era of virtualization and intent modeling, there are powerful reasons to ask whether a “network” is defined by its services alone.  One powerful reason is that sometimes we could create user services in a better way than we traditionally do.

SDN is about central software control over forwarding.  In the most-often-cited “stimulus” model of SDN, you send a packet to a switch, and if the switch doesn’t have a route for it, it requests one from the central controller.  There is no discovery, no adaptive routing, and unless some provision is made to handle control packets (ICMP, etc.), there isn’t any response to them at all.  But if you squirt in an IP packet, it gets to the destination as long as the controller is working.  So, this is an IP network from a user perspective.
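
To make that “stimulus” flow concrete, here’s a minimal sketch of the exchange; the class and method names are invented for illustration and don’t correspond to any real controller’s API.

```python
# Minimal sketch of the "stimulus" SDN model: a switch with no matching
# forwarding entry asks a central controller for one.  Class and method
# names are hypothetical, not tied to any real controller API.

class CentralController:
    def __init__(self, topology):
        self.topology = topology  # controller's global view of the network

    def compute_rule(self, switch_id, dest_ip):
        # Pick an output port from the central topology view; no discovery
        # or adaptive routing happens in the switches themselves.
        out_port = self.topology[switch_id][dest_ip]
        return {"match_dest": dest_ip, "action": ("forward", out_port)}

class Switch:
    def __init__(self, switch_id, controller):
        self.switch_id = switch_id
        self.controller = controller
        self.flow_table = {}  # dest_ip -> action

    def handle_packet(self, dest_ip, payload):
        if dest_ip not in self.flow_table:
            # "Stimulus": punt to the controller and cache the answer.
            rule = self.controller.compute_rule(self.switch_id, dest_ip)
            self.flow_table[dest_ip] = rule["action"]
        return self.flow_table[dest_ip]

# Example: a one-switch topology where 10.0.0.2 is reachable via port 3.
ctl = CentralController({"sw1": {"10.0.0.2": 3}})
sw1 = Switch("sw1", ctl)
print(sw1.handle_packet("10.0.0.2", b"hello"))  # ('forward', 3)
```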

If this sounds like a useless exercise, reflect on the fact that a lot of our presumptions about network virtualization and infrastructure modernization rely implicitly on the “black box” principle; a black box is known only by the relationship between its inputs and outputs.  We can diddle around inside any way that we like, and if that relationship is preserved (at least as much of it as the user exercises) we have equivalence.  SDN works, in short.

Over the long haul, this truth means that we could build IP services without IP devices in the network.  Take some optical paths, supplement them with electrical tunnels of some sort, stick some boxes in place that do SDN-like forwarding, and you have a VPN.  It’s this longer-term truth that SD-WANs are probably building toward.  If the interface defines the service, present it on top of the cheapest, simplest, infrastructure possible.

In the near term, of course, we’re not likely to see something this radical.  Anything that touches the consumer has to be diddled with great care, because users don’t like stuff that works differently.  However, there are places where black-box networks don’t touch users in a traditional sense, and here we might well find a solid reason for a black-box implementation.  The best example of one is the evolved packet core (EPC) and its virtual (vEPC) equivalent.

Is a “virtual EPC” an EPC whose functionality has been pulled out of appliances and hosted in servers?  Is that all there is to it?  It’s obvious that the virtual device model of an EPC would fit the black-box property set, because its inputs and outputs would be processed and generated by the same logic.  One must ask, though, whether this is the best use of virtualization.

The function of the EPC in a mobile network is to accommodate a roaming mobile user who has a fixed address with which the user relates to other users and to other network (and Internet) resources.  In a simple description of the EPC, a roaming user has a tunnel that connects to their cell site on one end and to the packet network (Internet) gateway on the other.  As the user moves from cell to cell, the tunnel gets moved to ensure packets are delivered correctly.  You can read the EPC documentation and find citations for the “PGW”, the “SGW”, the “MME” and so forth, and we can certainly turn all or any of these into hosted software instances.

However…if the SDN forwarding process builds what is in effect a tunnel by creating cooperating forwarding table entries under central control, could we not do something that looked like an EPC behavior without all the acronyms and their associated entities?  If the central SDN controller knows from cell site registration that User A has moved from Cell 1 to Cell 5, could the SDN controller not simply redirect the same user address to a different cell?  Remember, the world of SDN doesn’t really know what an IP address is, it only knows what it has forwarding rules for, and what those rules direct the switch to do.
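
Here’s a toy illustration of that idea, with invented names and none of the 3GPP machinery; the point is simply that “mobility” reduces to rewriting the rule that maps a fixed user address to a cell-site port.

```python
# Toy illustration (not 3GPP or any real controller API): an "EPC-like"
# behavior expressed purely as forwarding-rule updates.  The user keeps a
# fixed IP address; mobility is just a rule change at the controller.

class MobilityController:
    def __init__(self):
        # user_ip -> cell site currently serving that user
        self.attachment = {}
        # cell site -> egress port on the aggregation switch (assumed static)
        self.cell_port = {"cell-1": 101, "cell-5": 105}

    def register(self, user_ip, cell_site):
        """Called when a cell site reports that the user has attached."""
        self.attachment[user_ip] = cell_site
        self.push_rule(user_ip, self.cell_port[cell_site])

    def push_rule(self, user_ip, out_port):
        # In a real network this would program the switches; here we just log.
        print(f"forward {user_ip} -> port {out_port}")

ctl = MobilityController()
ctl.register("198.51.100.7", "cell-1")   # forward 198.51.100.7 -> port 101
ctl.register("198.51.100.7", "cell-5")   # user roams; same address, new port
```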

You could also apply this to content delivery.  A given user who wants a given content element could, instead of being directed at the URL level to a cache, simply be connected to the right one based on forwarding rules.  Content in mobile networks could have two degrees of freedom, so to speak, with both ends of a connection linked to the correct cell or cache depending on content, location, and other variables.

I’m not trying to design all the features of the network of the future here, just illustrate the critical point that virtualization shouldn’t impose legacy methodologies on virtual infrastructure.  Let what happens inside the black box be based on optimality of implementation, taking full advantage of all the advanced features that we’re evolving.  Don’t make a 747 look like a horse and buggy.

We’re coming to accept the notion that the management of functionally equivalent elements should be based on intent-model principles, which means that from a management interface perspective both legacy and next-gen virtualization technology should look the same.  Clearly, they have to look the same from the perspective of the user connection to the data plane, or nothing currently out there could connect.  I think this proves that we should be taking the black-box approach seriously.

Part of this is the usual buzz-washing problem.  Vendors can claim that anything is “virtual”; after all, what is less real than hype?  You get a story for saying you have “virtual” EPC and nothing much if you simply have EPC or a hosted version.  Real virtualization has to toss off the limitations of the underlying technology it’s replacing, not reproduce those limits.  vEPC would be a good place to start.

Is There Really a Problem With OpenStack in NFV?

Telefonica has long been a leader in virtualization, and there’s a new Analyst Mason report on their UNICA model.  There’s also been increased notice taken of Telefonica’s issues with OpenStack, and I think it’s worth looking at the report on UNICA and the OpenStack issues to see where the problems might lie.  Is OpenStack a problem, is the application of OpenStack the issue, or perhaps is the ETSI end-to-end model for NFV at fault?  Or all of the above?

In ETSI NFV’s E2E model, the management and orchestration element interfaces to infrastructure via a Virtual Infrastructure Manager.  I have issues with that from the first, because in my view we shouldn’t presume that all infrastructure is virtual, so an “Infrastructure Manager” would be more appropriate.  It also showcases a fundamental issue with the VIM concept, one that UNICA might not fully address.

We have lots of different infrastructure, both in terms of its technology and in terms of geography.  Logically, I think we should presume that there are many different “infrastructure managers”, supplied by vendors, by open-source projects, or even developed by network operators.  Each of these would control a “domain”.  It’s hard to read the story from the report, but I’ve heard that Telefonica has had issues with the performance of OpenStack while deploying multiple VNFs, and in particular performance issues when requirements to deploy or redeploy collide.

The solution to the issue in the near term is what Telefonica calls “Vertical Virtualization”, which really means vertical service-specific silo NFV.  For vEPC, for example, they’d rely on Huawei.  This contrasts with the “horizontal” approach of UNICA, where (to quote the Analyst Mason paper) “Ericsson supplies the NFVI and related support for UNICA Infrastructure, which is the only infrastructure solution globally that will support VNFs.”

So here is where I think the issue may lie.  NFVI, in the ETSI document, is monolithic.  There is therefore a risk that a “domain” under NFVI control might be large enough to create hundreds, thousands, or even more service events per minute.  There is a known issue with the way OpenStack handles requests; they are serialized (queued and processed one at a time), because it’s very difficult to manage multiple requests for the same resource from different sources in any other way.  The use of parallel NFV implementations bypasses this, of course, but there are better ways.

Parallel implementations of “vertical virtualization” create siloed resources, though, so the solution has only limited utility.  What would be better is a “VIM” structure that allows for the separation of domains, so that different vendors and technologies can be kept apart.  Multiple VIMs can resolve this.  But you also need a way of partitioning rather than just separating.  If OpenStack has a limited ability to control domains, you first work to expand that limit, and then you create domains that fit within it.

The biggest problem with OpenStack scaling in NFV is the networking piece, Neutron.  Operators report that Neutron can tap out at fewer than 200 requests.  It’s possible to substitute more effective network plugins, and here my own experiences with operators suggest that Nokia/Nuage has the best answer (not that Ericsson is likely to pick it!).

If you can’t expand the limits, then size within them.  An OpenStack domain doesn’t have to wrap around the globe.  Every data center could have a thousand racks, and you can easily define groups of racks as domains, with the size of each group designed to ensure that you don’t overload OpenStack.  However, each domain is now a resource pool of, say, 200 racks rather than the whole data center.  How do you make that work?

Answer: by a hierarchy.  You have a “virtual pool” VIM, and this VIM does gross-level resource assignment, not to a server but to a domain.  You pick a server farm at a high level, then a bank/domain, and finally a server.  Only the last of these requires OpenStack for hosting.  Networking is a bit more complicated, but only if you don’t structure your switching and connectivity in a hierarchical way.  In short, it’s possible to use decomposition policies to break a generalized resource pool into smaller domains that can be easily handled.
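
A rough sketch of that decomposition, with invented numbers and names, might look like this; only the final hand-off would actually touch OpenStack.

```python
# Sketch of hierarchical resource assignment: a "virtual pool" VIM picks a
# data center, then a domain sized to stay within OpenStack's limits; only
# the last step would call OpenStack to place the workload.  Numbers and
# names are invented for illustration.

# Each data center is pre-partitioned into domains, each with its own
# OpenStack instance; values are free racks.
pool = {
    "dc-east": {"domain-0": 120, "domain-1": 35, "domain-2": 180},
    "dc-west": {"domain-0": 10, "domain-1": 60},
}

def pick_data_center(pool):
    # Gross-level choice: data center with the most total free capacity.
    return max(pool, key=lambda dc: sum(pool[dc].values()))

def pick_domain(dc, pool, racks_needed):
    # Finer-grained choice inside the chosen data center.
    for domain, free in pool[dc].items():
        if free >= racks_needed:
            return domain
    raise RuntimeError("no domain with enough capacity")

dc = pick_data_center(pool)
domain = pick_domain(dc, pool, racks_needed=5)
print(f"hand off to the OpenStack instance for {dc}/{domain}")
```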

It’s also possible, if you use a good modeling strategy, to describe your service decomposition in such a way as to make a different VIM selection depending on the service.  Thus, you can do service modeling that does a higher-level resource selection.  Then you could use the same modeling strategy to decompose the resources.  If you’re interested in this, take a look at the annotated ExperiaSphere Deployment Phase slides HERE.

The point here is that the fault isn’t totally with OpenStack.  You can’t assign resources to a dozen activities in parallel when they draw from the same pool.  Thus, you have to divide the pool or nothing works.  You can make the pool bigger by having more efficient code, but in the end you’re allocating finite resources, and you come to a point where you have to check status and commit a specific resource.  That’s a serial process.

This is a problem that’s been around in NFV for years, and many (including me, multiple times) have called it out.  I don’t think it’s a difficult problem to solve, but every problem you decline to face is insurmountable.  It’s not like the issues haven’t been recognized; the TMF SID (Shared Information and Data Model) has separated service and resource domains for nearly two decades.  I don’t think they envisioned this particular application of their model, and I like other modern model approaches better, but SID would work.

No matter how you hammer vendors (or open-source groups, or both) to “fix” problems, the process will fail if you don’t identify the problem correctly.  Networks built up from virtual functions connected to each other in chains are going to generate a lot more provisioning activity than cloud applications do.  If there were no way to scale OpenStack deployments properly without changing it, then I think Telefonica and others could make a case for demanding faster responses from the OpenStack community.  But there is a way, and a better way.

Telefonica has a lot of very smart people, including some who I really respect.  I think they’re just stuck in the momentum of an NFV vision that didn’t get off to a good start.  The irony to me is that there’s nothing in the E2E model that forecloses the kind of thing I’ve talked about here.  It’s just that a literal interpretation of the model encourages a rigid, limited, structure that pushes too much downward onto open tools (like OpenStack) that were never intended to solve global-scale NFV problems.  I’d encourage the ISG to promote the “loose construction” view of the specs, and operators to push for that.  Otherwise we have a long road ahead.

Where Will the Rush to “Digital Transformation” Lead?

For the current quarter, we’ve seen Ericsson post a profit warning, Nokia warn of a deteriorating outlook, and even Huawei slow its rate of growth (to something other vendors would die for!).  Analysts say that “digital transformation” for operators is a must.  A Wall Street research company says that only one of the US operators (Charter) increased capex versus expectations for 2017; all the rest have fallen below expectations.  This doesn’t sound like business as usual, or maybe it sounds like it shouldn’t be.

I’ve blogged enough about the regulatory issues that have distorted the Internet as an ecosystem.  Operators had for decades followed a settlement model among those who cooperatively delivered traffic and services, and that was lost with the Internet.  Before the current blitz on “neutrality”, even many in the Internet space believed that settlement among connecting partners in the Internet was essential; I worked on an RFP to support it about 20 years ago.  Maybe we’ll see a shift in policy, but whether we do or don’t, we really have to look at what the ideal operator model should be.  No matter what the policy shift, you don’t create a vibrant market from trashy ingredients.

The Internet was a transformation in demand.  Before the Internet, we defined a “communication service” as a “connection service”.  After it, we defined a communication service as a “delivery service”, meaning that there was an experience-creating partner sourcing something to the user, rather than two users connecting with each other.  Even chat and email have an intermediary experience agent.

What made the Internet great and transforming, then, was that it was about experiences.  It follows that if you want to be big in the Internet, you have to be big in experiences and not just in connection.  Network operators, including telcos and cablecos, are still primarily connection people in the Internet part of their business.  Classical wisdom says that they need to move over to the other side.

Yes, but.  The “but” part is that somebody has to do the connecting.  It’s not enough that we let operators settle among themselves or with content providers (experience providers).  They have to be efficient connection providers because if they’re not, the cost of delivering experiences will rise to the point where a lot of stuff that might happen will become financially impossible.

This means that, given the already-visible profit-per-bit pressure on operators, the first critical task is lowering the cost per bit.  I’ve blogged already that my model shows clearly that software automation of the service lifecycle management tasks could provide immediate relief, and do so at a very modest investment (meaning it would generate a very high ROI).  Furthermore, any work to create competitive elements of experience services would also demand a high level of agility and efficiency, so the tools used to reduce cost could then be used to improve revenues.

One trap that has to be avoided at this point is the “new-services-mean-new-connection-services” trap, which operators seem to fall into with all the zeal of a fly into a flytrap.  I’ve done decades of surveys of businesses, and I know that there are no “new connection services” that have any useful property other than overall cost.  Give users elastic bandwidth and they’ll adopt it as long as it means that by supporting faster bursts some of the time, they can lower capacity and thus net cost overall.  We should have seen this by now, but we seem to be unable to grasp the reality.

Presumably “digital transformation” would lead to some more realistic revenue options, but the main recommendation that proponents of the notion make is that operators focus on “edge computing”.  Edge computing is an infrastructure strategy not a service strategy; you have to deploy something that uses it to add to your revenue line.  What that “something” is, remains maddeningly unclear.

We can be fairly sure that it’s not virtual functions.  Despite all the hype, it’s actually very difficult to address edge device opportunity with cloud-hosted virtual alternatives.  In the business market, you can buy a decent edge firewall and security box for under a thousand dollars, and in the consumer space a residential gateway including WiFi is about fifty dollars.  These devices have a five-year useful life, which means that the annualized cost would be two hundred dollars and ten dollars, respectively.  It’s very hard to see how hosted technology, including the licensing charges from VNF vendors, could meet even the business price point.  In any event, there just aren’t enough businesses to drive an enormous edge deployment.
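
The arithmetic behind those price points is simple enough to spell out (using the rough list prices above and a straight five-year life, with no financing, support, or power costs included):

```python
# Rough annualized-cost comparison from the figures above (illustrative
# list prices, five-year straight-line life, no support or power costs).
USEFUL_LIFE_YEARS = 5

business_edge_box = 1000   # decent business firewall/security appliance
consumer_gateway = 50      # residential gateway with WiFi

print(business_edge_box / USEFUL_LIFE_YEARS)  # 200.0 dollars per year
print(consumer_gateway / USEFUL_LIFE_YEARS)   # 10.0 dollars per year
```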

Then we have another point that’s easy to forget.  Most network equipment vendors aren’t providers of computing in any form.  My model says that if we had optimum deployment of carrier cloud, server technology in some form would account for about 23% of total capex.  Total capex would actually increase by about 9%, which means that the rest would be scavenged from network equipment spending.  Edge computing is a transformation of revenue focus to computing, after all.

We are seeing in the numbers that no network vendor is safe from a contracting total addressable market (TAM).  Even Huawei will have to accept lower growth, and eventually lower revenue, if the pie gets smaller.  Or they and others will have to get into the computing business.

The big take-away from “digital transformation” in networking is that networking becomes a commodity market whether you do it or not.  If there’s no transformation, then operators eventually stop spending freely on infrastructure for lack of ROI.  If there is a transformation, they stop spending freely on network equipment to spend more on hosting experiences.  Vendors in the server space will see an enormous incremental opportunity from carrier cloud, much of which will indeed be “edge computing”.

Right now, I don’t think the server vendors themselves see the real shape of the market as it will develop.  They seem to think that operators will suddenly be illuminated by a dazzling edge-computing insight and rush out to dump billions on edge servers, with no specific opportunity in mind.  The old “Build it and they will come” approach.  Hopefully we’ve advanced beyond taking our inspiration for market evolution from a movie about supernatural baseball, but maybe not.

Could Cisco have a handle on this?  Certainly, Cisco has been more articulate about the need to transform itself than most vendors, but it’s never easy to separate what Cisco really sees and thinks from its (frankly manipulative) marketing.  We’ll have to wait and see how the company’s product strategy really changes, if it does, and whether other vendors will follow suit.  Ericsson and Nokia are two that clearly need to be doing something different, and they might respond to a Cisco initiative.  Might we see a server vendor acquired by a network player?  Could be, or we could see a network player adopt a commodity-server (Open Compute Project) architecture.  Something, I think, is going to happen because Wall Street is the ultimate driver of everything, and they’re getting restless.

Getting Control of the SD-WAN Wave

The topic of SD-WAN just keeps getting hotter.  Juniper, one of the network equipment vendors who hasn’t seen quarterly revenue declines, talked about it a bit on their earnings call.  Here are some interesting quotes to explore: “Telcos are going through a significant transformation as trends such as SD-WAN…” and “So as we talk to our customers around SD-WAN, for example…” and “…it’s around business models and it’s around software. It’s around selling the value through Contrail, virtual security, SD-WAN.”  Five citations in one earnings call.  Interesting?  And recall that Juniper isn’t a household name in SD-WAN either.

The first of these points is important because it’s clear that the network operator segment isn’t as much a factor for Juniper as it has been, or as it is for many of Juniper’s competitors.  Juniper sees that sector as in the midst of a transformation and “…it’s all very much around new architectures, new ways of driving service revenues…” and that “…the new build-out that drive these new modes of service delivery are just going to take some time.”  Clearly, Juniper sees SD-WAN as both a transformational element for the network operators, and a driver of change in itself.

SD-WAN is an edge-driven technology that wrestles VPN services out of a variety of infrastructure options, including infrastructure that is itself being transformed.  Today, it’s a means of achieving lower costs through partial or complete elimination of operator-provided (usually MPLS) VPN services.  Tomorrow, Juniper thinks, it will be an item in a telco inventory, perhaps a way of generating a “service” that’s independent of infrastructure and thus able to ride out lower-layer transformations like SDN or NFV.  They mention Contrail, their SDN/virtualization offering, in the same sentence as SD-WAN, after all.

Juniper doesn’t say exactly what “customers” it’s talking with about SD-WAN, but given the company’s success with cloud providers and given its specific mention of the cable companies, I think it’s likely that they are already in dialog with both those who would use SD-WAN as an overlay and those who would use it as a part of an infrastructure service—network consumers and network operators, in short.  Juniper has SD-WAN material on both missions on their website.

One reason for all this “SD-WAN-washing” is the same as the reason behind other “washings”.  You always want to tie yourself to something that’s getting good ink.  However, as I pointed out in my blog yesterday, we seem to be seeing a greater recognition of reality in positioning these days, not only from vendors but from the media that wants their advertising.  I think Juniper believes that SD-WAN is important because it’s a decisive way of separating service technology from network technology.

Today, network services are created by coercing native behaviors from the underlying infrastructure.  An IP or Ethernet service emerges because of service features available in an IP or Ethernet network below.  Things like SDN and the MEF’s “Third Network” concept presuppose that services could be offered that are laid on top of infrastructure, with the service-layer features added by the add-on.  This vision would allow operators to reduce, eliminate, or shift their investment in protocol-specific technology within their infrastructure, which of course SDN and the MEF concepts presume is a goal.

Many of the benefits of SD-WAN, like the ability to extend services to smaller sites at lower costs, are attributes of any tunnel-overlay network approach, and there are of course many different tunneling models out there.  Seven or so years ago, there was a rush to work on “pseudowire” technologies that let users build networks over virtual wires that were laid on various infrastructure options, and some of these supported nearly anything.  So you could have accomplished a lot of what SD-WANs are supposed to do for half-a-decade or more.

What the current SD-WAN model does is productize the solution, which is essential given that it’s become clear that operational complexity is a greater risk to cost-effective networking than service costs or capital equipment cost.  Users don’t want to do all the network planning and operations heavy lifting, and a good SD-WAN product is a kind of plug-and-play approach, encapsulating all the features and steps needed and organizing and managing them as a whole.  Service management, independent of network management.

This is a topic that goes back even further than tunnel networking.  I remember doing a CIMI White Paper on service management about 20 years ago.  I’ve worked on the relationship between services and networks in standards groups for almost that long.  What may have happened with SD-WAN is a “vehicularization” of the service layer, and that would give us a way to manage services that can be reduced to managing the devices of that vehicularized service layer.

We’re not out of the woods with this yet.  I mentioned the MEF’s reliance on SD-WAN for its Third Network glue, but the MEF’s own material on SD-WAN takes a very limited view.  The figure that leads off the MEF Wiki defines an SD-WAN as “SDN implemented on WANs”.  I don’t agree with that, and not just for philosophical reasons.  SDN today implies central OpenFlow control of forwarding, and while of course SD-WAN devices or software instances have to be able to forward traffic based on header addresses, there is neither a consensus today that this is done via SDN nor a need to presume that.

It’s clear from their broader description that the MEF is looking at the “infrastructure” model of SD-WAN, one where SD-WAN services are partly or fully reliant on features of the underlying network. Yes, since that is one of the SD-WAN goals, it’s important we understand how that might work.  But first we need to decide on how SD-WAN services work, how they relate to the users and how they are managed as services.

It would be enormously helpful to have a model/MIB defined for the SD-WAN service interface, and another for the underlying network interface(s).  It would similarly be helpful to have a specification for the way that an SD-WAN VNF looked and was managed (by VNFM).  This would be a good time to get something like that going, before the SD-WAN market explodes and it becomes impractical to refit products and software to an emerging (and perhaps long-in-emerging) specification set.
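
As a thought experiment, a service-interface model might start as simply as the sketch below; the field names are invented for illustration and aren’t drawn from any MEF, IETF, or ETSI specification.

```python
# Hypothetical sketch of what an SD-WAN service-interface model might
# capture; field names are invented, not taken from any standard.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SdWanSite:
    site_id: str
    access_type: str          # e.g. "broadband", "MPLS", "LTE"
    committed_mbps: float

@dataclass
class SdWanService:
    service_id: str
    sites: List[SdWanSite]
    sla: Dict[str, float]     # e.g. {"latency_ms": 80, "availability": 99.9}
    underlay_policy: str      # e.g. "prefer-broadband", "MPLS-failover"

svc = SdWanService(
    service_id="vpn-042",
    sites=[SdWanSite("hq", "MPLS", 100.0), SdWanSite("branch-7", "broadband", 20.0)],
    sla={"latency_ms": 80, "availability": 99.9},
    underlay_policy="prefer-broadband",
)
print(svc.service_id, len(svc.sites), "sites")
```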

If we know how an SD-WAN looks to a service-layer user, it would even be helpful in framing the way that it might call on infrastructure services.  Certainly service-layer management has to be able to obtain from the user any parameters or behaviors that have to be driven downward into the network.  Even the MEF work would benefit from that input, and so would NFV proponents that want SD-WAN VNFs.

We have, in the SD-WAN space, a rare opportunity to ride a credible market wave, but ride it far enough ahead of the crest that we still have options to standardize and structure things for interoperability and suitability to fit within other technology frameworks that are co-evolving.  I’d like to recommend to the MEF leadership, and to vendors and operators, that we take up that opportunity and run with it.

Are We Seeing Signs of “Realthink” in Networking?

Are we seeing signs of sanity in networking?  Just the other day, Light Reading did a piece on service lifecycle automation.  Yesterday they did one on edge computing.  I’ve been arguing (and blogging) on both topics for quite a while, so it’s no surprise that I think the coverage represents a sane shift.  I doubt that I’m the driver, though.  What is?  I think there are some common forces at work, but also that both the pieces reflect some issues specific to the markets they describe.

The two common factors behind the sanity shift I’m describing are reality in general and the value of a business case in particular.  Our industry is made up of technology vendors, whose advertising drives industry news, analyst views, and so forth.  The fact is that the vendors have always recognized the value of a good technology fable in driving engagement.  You don’t sell products through editorial mentions, after all.  You sell website visits, which lead to sales calls, which sell products.  However, engagement only helps drive sales if that last step is taken.  The current set of fables hasn’t done that for the market overall, and that’s what I think is changing the message now.

Most fads in tech are easy to write about and interesting to read about.  Contrast the following.  First, little Dutch boy does well in secondary schools, enters a university to get an engineering degree, learns hydrodynamics and materials, and plans a system of dikes and locks to protect his home.  Second, little Dutch boy sticks his finger in a hole in a dike and saves his home.  Which story do we like, do we hear?  However appealing the second might be, though, it won’t actually accomplish much.  So, reality drives us eventually to the first.

Another attribute of tech fads is that they presume technology is good for technology’s sake.  It’s new, it’s modern, it’s justified.  For a lot of the history of computing and networking, that was true, because we had a period when requirements changed quickly and products had to follow suit.  You could presume that the latest thing did that following-suit better than older ones did.  But we’re not in that period now; most technologies have significant adoption costs and risks, and require significant benefits.  They need a business case, and so inevitably sellers will confront that need.

Let’s look at “automation”, which here means “service lifecycle management automation”, or making the whole process of service creation—from architect’s or marketeer’s pipe dream to sale and sustaining—a machine-driven activity.  Automation is the alternative to “manualization”, or dependence on human action.  You need to automate something if it’s too difficult or too expensive to do manually.

SDN and NFV were, from the instant of their conception, wholly dependent on automation.  SDN says that a central process controls the forwarding of packets now done through adaptive topology discovery.  How does this central process know about conditions?  Analytics, presumably, operating on collected data.  How do changes in forwarding for a thousand switches get converged on a stable set of routes?  Same thing.  How do you then implement the result of all this analyzing?  Not by humans batting on keys, for sure.

NFV has this problem, and more.  We used to have this physical network function (PNF), and it was a box.  All its features and control elements were inside.  You prepped it by plugging it in and parameterizing it.  Now we’re going to take this PNF and turn it into a chain of hosted virtual network functions (VNFs), each requiring a server deployment and configuration individually, each having to be connected in the chain, and each then having to be managed separately but also collectively.  Is this simplification?  Hardly; VNFs are a lot more complicated than PNFs.  So it follows that VNFs needed operational simplification or they could never pay off.  So it follows we should have had service automation in NFV from Day One.
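
A simplistic tally (not an ETSI MANO workflow, just a count of the obvious steps) shows how quickly the operational burden grows:

```python
# Simplistic tally (not an ETSI MANO workflow) of what turning one box
# into a service chain does to the number of lifecycle steps.  The chain
# composition here is just an example.

pnf_steps = ["rack and cable the box", "parameterize it"]

vnf_chain = ["vFirewall", "vNAT", "vRouter"]
vnf_steps = []
for vnf in vnf_chain:
    vnf_steps += [
        f"deploy a host/VM for {vnf}",
        f"load and configure {vnf}",
        f"connect {vnf} into the chain",
        f"wire up per-VNF management for {vnf}",
    ]
vnf_steps.append("manage the chain as a collective service")

print(len(pnf_steps), "steps for the PNF")        # 2
print(len(vnf_steps), "steps for the VNF chain")  # 13
```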

Then there’s edge computing.  I vividly recall a conversation I had with a US telco executive in 2013, before anyone had really even heard of the concept of edge computing and when NFV was just a gleam in a CTO’s eye.  I asked where he expected to have data centers, and the response was “Everywhere we have real estate!”  That, of course, is what AT&T was saying in the LR article, and what they said on their latest earnings call.

You cannot control real-time processes with computing facilities half-a-continent or more away.  In fact, industrial automation has long shown that you have to be able to control machines with a very short control loop, a few milliseconds or even less.  There is simply no way to do that with any modern network technology, using distant resources.

At 60 mph, a self-drive car travels 88 feet in a second, or a bit more than an inch per millisecond.  The speed of a packet in fiber, assuming no handling, is roughly a hundred miles per millisecond, so a round trip covers about 50 miles of distance per millisecond of delay.  A thousand miles one way is ten milliseconds, enough for our car to move roughly a foot.  Add on packet processing and you see the problem, and you’ve not even added the process that’s doing the controlling itself.  So when something like IoT comes up, the question should not be “how do we connect stuff?” but “how do we shorten the control loop?”.
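
Here’s that arithmetic worked through with the round numbers above (60 mph, roughly 100 miles of fiber propagation per millisecond one way, and no allowance for packet handling or processing):

```python
# Working through the round numbers in the paragraph above: 60 mph,
# ~100 miles of fiber propagation per millisecond one way, no allowance
# for packet handling or the controlling process itself.

CAR_MPH = 60
FIBER_MILES_PER_MS = 100          # one-way propagation, rough figure
DISTANCE_MILES = 1000             # distance to the controlling process

car_inches_per_ms = CAR_MPH * 5280 * 12 / (3600 * 1000)    # ~1.06 in/ms

one_way_ms = DISTANCE_MILES / FIBER_MILES_PER_MS            # 10 ms
round_trip_ms = 2 * one_way_ms                              # 20 ms

print(round(car_inches_per_ms, 2), "inches per millisecond")
print(round(one_way_ms * car_inches_per_ms, 1), "inches traveled one way")       # ~10.6
print(round(round_trip_ms * car_inches_per_ms, 1), "inches traveled round trip")  # ~21.1
```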

If that’s not enough proof for you, consider that all the big cloud computing players have strategies that are aimed at harnessing local processing to augment the cloud in order to handle events.  Here’s Amazon, who most agree is a pretty savvy company.  They don’t have any credentials in premises computing whatsoever.  In fact, their competitor Microsoft has far better positioning there.  So, what do they do?  Invent Greengrass to shift event loads to the premises where necessary.  For that to be smart, you have to believe it’s essential.

There is absolutely no point to talking about connecting the IoT or about process control, or even about some personalized communications services, if you can’t host the processes that do the work near the point of activity.  There never has been a point.  Thus, it was dumb to think that edge computing wasn’t the essential ingredient in all of this stuff.  Arguably even SDN and NFV need it.

Revolutionary change is hard, complicated, and expensive.  You can trivialize it in a sales pitch or in a 350-word article, but you can’t implement it that way.  We just spent four years promoting a vision of SDN and NFV that fell far short of both expectations and potential, in large part because we never put the full picture before the marketplace.  Sellers wanted buyers to simply adopt and ask questions later, and that didn’t happen.  We’re facing a similar problem with 5G and IoT.

Media recognition of reality is a good sign, because media views are driven by advertising, which means vendors.  Vendors are at the heart of the issue here.  Everyone wants a quick sale that gives them their numbers in the next quarter.  That’s what they wanted in 2013, the time when the current slow decline in revenues for nearly all vendors started.  They didn’t get it then, nor will they get it now.  A quick sale in today’s market is a sale by Huawei.  Everyone else is going to have to do an inspirational, transformational, sale.  Facing the truth early on might be a good way to start it, and perhaps that’s about to happen.

Taking a Deeper Dive into Intent Modeling…and Beyond

One of the topics the people I speak with (and work with) are most interested in is “intent modeling”.  Cisco made an announcement on it (one I blogged on) and the ONF is turning over its intent-model-based northbound interface (NBI) work to the MEF.  Not surprisingly, perhaps, however popular the notion might be, it’s not clearly understood.  I wasn’t satisfied with the tutorials I’ve seen, so I want to explore the concept a bit here.

Intent modeling is obviously a subset of modeling.  In tech, modeling is a term with many uses, but the relevant one deals with virtualization, where a “model” is an abstract representation of something—a black box.  Black boxes, again in popular tech usage, are things that are defined by their visible properties and not by their contents.  It’s what they can do, and not how they can do it, that matters.

It’s my view that the popular tech notion of a model or black box has really been, or should have been, an “intent model” all along.  The difference that’s emerged in usage is that a model in virtualization normally represents an abstraction of the thing that’s being virtualized—a server, for example.  In intent modeling, the abstraction is at a higher level.  A good way to illustrate the difference is that you might use a model called “router” in virtualization, one that could represent either a physical box or a hosted instance of router software.  In strict intent modeling parlance, you’d probably have a model called “IP-Network” that represented the intent to do connectionless forwarding between points based on the IP header.

This point is important in understanding the notion of intent modeling, I think.  The approach, as the original ONF white paper on the topic shows, is to represent how a user system frames requests to a provider system.  Obviously, a user system knows the service of an IP network but not the elements thereof.  However, in a practical sense and in a virtualized-and-real-network world, a single model at the service level doesn’t move the ball much.  In the ONF work, since the intent model is aimed at the NBI of the SDN controller, there’s only one “box” underneath.  In the virtual world, there could be a global network of devices and hosted elements.

The main property of any virtualization model, especially an intent model, is that all implementations of the model are interchangeable; they support the same “intent” and so forth.  It’s up to the implementers to make sure that’s the case, but it’s the critical property that virtualization depends on.  You can see that this has important implications, because if you have a single model for a vast intent (like “network”) then only that vast intent is interchangeable.  You’d have to get a complete model of it to replace another, which is hardly useful.  You need things to be a bit more granular.

To me, then, the second point that’s important about an intent model is that intent models decompose into useful layers.  A “service” might decompose into “access” and “core”, or into “networks” and “devices”.  In fact, any given level of an intent-modeled service should be able to decompose into an arbitrary set of layers based on convenience.  What’s inside an intent model is opaque to the user system, and as long as it fulfills the user-system intent it’s fine.  It’s up to the modeling/decomposition process to pick the path of implementation.

Where I think intent modeling can go awry is in this layer stuff.  Remember that you can substitute any implementation of an intent model.  You want to decompose any layer of a model far enough to be sure that you’ve covered where you expect alternative implementations.  If you have a “router” model, you might want to have a “device-router” and “hosted-router” decomposition path, for example, and perhaps even an “SDN-router” path.  Good management of the modeling hierarchy is critical for good implementation.
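
A crude way to picture that hierarchy management is shown below; the model names and the selection policy are invented, but the idea is that each layer carries alternative decomposition paths and a policy picks one.

```python
# Crude illustration of decomposition alternatives inside an intent-model
# hierarchy; model names and the selection policy are invented.

catalog = {
    "IP-Network": ["access", "core"],
    "access": ["router"],
    "core": ["router"],
    "router": ["device-router", "hosted-router", "SDN-router"],
    # Leaves would map to actual deployment/configuration recipes.
    "device-router": [],
    "hosted-router": [],
    "SDN-router": [],
}

def decompose(model, prefer_hosted=True, depth=0):
    print("  " * depth + model)
    children = catalog[model]
    if model == "router":
        # Policy decision: pick one interchangeable implementation path.
        children = ["hosted-router" if prefer_hosted else "device-router"]
    for child in children:
        decompose(child, prefer_hosted, depth + 1)

decompose("IP-Network")
```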

It follows that a modeling approach that doesn’t support good management of a hierarchy isn’t going to be optimal.  That means that for those looking for service and network models, even those based on “intent”, it’s vital that you ensure your modeling approach can handle versatile hierarchies of decomposition.  It’s also vital that you remember what’s at the bottom of each of the hierarchical paths—real deployment and lifecycle management.

A virtualization or intent model can be “decomposed” into lower-level models, or implemented.  This has to happen at the point where further abstraction isn’t useful in creating interoperability/interchangeability.  If the implementation of a “router” model is a device, for example, then the inside of that lowest level of model is a set of transformations that bring about the behaviors of the “router” that the user system would want to see.  That would probably happen by creating configuration/management changes to the device.  If the implementation is deployment of a software instance of a router, then the implementation would have to include the deployment, loading, configuration, etc.

This is the point where you have to think about lifecycle management.  Any intent model or virtualization model has to be able to report status, meaning that an implicit parameter of any layer of the model is a kind of SLA representing expectations for the properties of the element being modeled.  Those could be matched to a set of parameters that represent the current delivery, and both decomposition and implementation would be responsible for translating between the higher-level “intent” and whatever is needed for the layer/implementation below.

The challenge with lifecycle management in a model-driven hierarchy is easy to see.  If Element B is a child of Element A, and so is Element C, then the state of A depends on the combined states of B and C.  How does A know those states?  Remember, this is supposed to be a data model.  One option is to have actual dedicated service-specific software assembled based on the model structure, so there’s a program running “A” and it can query “B” and “C”.  The other option is to presume that changes in the state of all the elements in a hierarchical model are communicated by events that are posted to their superior objects when needed.  “B” and “C” can then generate an event to “A”.
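
Here’s a minimal sketch of the second option, with invented element names and states: each child posts a state event to its parent, and the parent derives its own state from what its children have reported.

```python
# Minimal sketch of the event-posting option: children report state changes
# as events to their parent, which derives its own state.  Element names
# and state values are invented for illustration.

class Element:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.child_states = {}
        self.state = "active"

    def set_state(self, new_state):
        """This element's own state changes; notify the parent with an event."""
        self.state = new_state
        if self.parent:
            self.parent.on_child_event(self.name, new_state)

    def on_child_event(self, child_name, child_state):
        self.child_states[child_name] = child_state
        # Derive this element's state from its children's reported states.
        derived = "active" if all(
            s == "active" for s in self.child_states.values()) else "degraded"
        self.set_state(derived)  # may propagate further up the hierarchy

A = Element("A")
B = Element("B", parent=A)
C = Element("C", parent=A)

B.set_state("active")
C.set_state("failed")
print(A.state)  # degraded
```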

Intent modeling will surely help interoperability, because implementations of the same “intent” are more easily interchangeable.  It doesn’t necessarily help service lifecycle automation because intent modeling is a structure of an API.  That means it’s a process, a program component.  The trick in service automation is to create a description of event-to-process linkages.  Thus, data-driven event handling combines with intent-modeled processes to create the right answer.

This is the situation I think the TMF great-thinkers had in mind with their NGOSS Contract data-drives-events-to-processes notion.  It’s what I believe is the end-game for intent modeling.  If you can model not only the structure of the service but the way that processes handle lifecycle events, then you can truly automate the service.  I’ve fiddled with various approaches for a very long time (almost ten years at this point) and I came to that conclusion very quickly.  I’ve not been able to validate other options, but the market has to make its own decision here—hopefully soon.

What is a Smart City and Do They Have a Chance?

We read a lot about smart cities these days, but like many popular topics there’s a surprising lack of consistency in assigning a meaning to the term.  In fact, only about half the attributes of smart cities that governments and network operators name are recognized by at least two-thirds of the people I talk with.  Many of the things I think are fundamental to smart cities have far less recognition than that.

At a very high level, a “smart city” is one that employs information technology to optimize its own services and assets, and help its occupants (people and businesses) do the same.  The problem with this baseline is that many cities would fit the definition even if they did nothing at all.  Where the problem of meaning and consistency comes in is in figuring out how the baseline could be made into something meaningful.  Some of that problem arises from the usual “eye of the beholder” syndrome we see in tech—all vendors and service providers want to see a definition that fits what they do/sell.  More comes from a lack of a top-down view of the problems and opportunities.

Every city likely employs information technology to run itself; even small businesses can rarely avoid computer use.  Major cities are somewhat like large enterprises, though of course their geographic scope is more constrained.  They use a lot of IT, but in my ongoing surveys of both governmental and private IT, I have noticed that cities are more compartmentalized, operating a bit more like industrial conglomerates, with IT silos that represent various departments and a smaller central IT process that unites what has to be united, largely things like employee services, tax collection, and cost management.

I had the opportunity to get some input from city managers (or the equivalent, in effect the COO of city operations) on the topic.  They agree that improving their IT use would improve efficiency and cut costs.  In fact, they cite five specific areas, with surprising consistency, where they think “smartness” should be considered.  They do disagree a bit on the priorities and the chances of a significantly positive outcome.

The first area noted, with a slightly larger number of managers citing it than the others, is improved use of analytics and data mining.  Almost every city manager thought that the commercial sector was far ahead of cities in this area.  Most of them think that data mining has the largest potential to improve operations, even more than IoT, and all of them thought it would likely require the least investment.

Why not do it, then?  The managers cite a number of problems.  First, the databases involved are often archaic and proprietary rather than standard structures.  Data mining software would have to be customized to work with them.  Second, the applications and data are often not centralized, so there’s no single place you could go to get at everything.  Third, there are a bewildering number of different regulations regarding the way that municipal data can be used.  Finally, there’s the question of how data mining and analytics could be budgeted.

About half the major metro areas in the US report that they are in the process of modernizing their own application base, and as a collateral benefit this would likely move them to a point where data mining was much easier.  Most managers think that modernization of their apps would erase the technical barriers, but the budget problem remains.  City budgeting is almost always done on a department basis, with the heads jealously guarding their prerogatives.  Getting consensus on spending on a cross-department tool like analytics/data mining would be challenging.

The second smart opportunity cited was self-service portals for residents and local businesses.  City managers say that they’re well behind businesses in offering their “customers” direct online access to things.  Many of the issues I’ve already noted are cited by managers as inhibitors in this opportunity area, but they expressed the greatest concern over the problem of security.  There are specific laws regarding the confidentiality of personal information, which has led to some concerns over whether a portal would open the city to hacking of personal and business information.

This particular opportunity does seem to be moving forward, though.  All of the city manager types I’ve talked with say that they are expanding what is available to their residents and businesses via online portals.  About a third say they’re exploring the use of cloud services to facilitate this, though there are still questions about maintaining data privacy according to local (and state, and Federal) laws.

Opportunity number three was mobile empowerment.  The percentage of the workforce that’s mobile and empowerable in cities isn’t much different from the percentage across commercial businesses (about 19% for cities versus about 24% in enterprises, considering “mobile” to mean that the worker is away from their regular desk/place of operation at least 15% of the time, and “empowerable” meaning the worker has a job that requires information access or generation).  The extent to which empowerment has even begun falls far short of commercial business standards.

There’s a lot of fuzz in the responses to the “where are you with this?” question.  About a quarter of city managers say they have some mobile empowerment strategies in place, most of whom say that it’s a feature of commercial software they use somewhere.  There doesn’t seem to be a broad mission to create mobile front-ends for all applications, and this is likely because of the separation of departments common in city government.  Who pays?

Opportunity number four was improved collaboration and collective decision-making.  This opportunity seemed to be down the list a bit (but remember that there wasn’t an enormous difference in support among the five listed areas), in part because city managers have noted that their departments tend to operate more autonomously and because the narrow geography of cities favors face-to-face meetings.

What seems to be most interesting here is what you could call “consultation” more than “collaboration”.  The implication is that it’s a two-party process, usually invoked by a worker with either a supervisor or an expert in a related area.  The specifics all seem to tie into mobile workers, however, and so this is seen often as related to that category of opportunity too.  Most city managers have seen this as a unified communications task, which is a departure from the view of commercial businesses who see it as relating to the application that’s spawning the questions and needs.  In any event, progress is slow.

The final opportunity was “Internet of Things” or IoT.  Needless to say, there was a lot of interest in this, but many city managers believe that the specific applications of, benefits from, and cost of implementing IoT are all too vague at this point.  They can see some credible “things” that could be connected to generate a benefit (online meter readings are already at least in testing in some areas, for example), but areas like traffic sensors and computer control of traffic signals, a favorite media topic, seem to pose a lot of cost and risk, and it’s been difficult to quantify the rewards.

Nobody wants to say that they’re not doing IoT, whether they’re in city government or business.  However, if you try to pin down specifics in both areas, what you find is some special projects that in the minds of many might not be “IoT” at all.  For example, is a local RFID or Bluetooth meter reading mechanism “IoT”?  Making an entire city intelligent by adding sensors on everything that can be measured and controllers on everything that can be tweaked, seems to be a long-term interest but not a near-term priority.

The sum of my discussions is clear; there’s not much progress in smartening cities overall, unless we pull back on our notions of what a smart city really has and does.  The biggest problem, which city managers are understandably reluctant to discuss, is the politics of funding.  Capital projects of any magnitude pose a political risk, and the more money that’s involved and the more city departments that are impacted, the more likely it is that the idea will get a few kisses blown at it, and passed on for later discussion.

Vendor initiatives can help accelerate things, according to city managers, but for larger cities there’s doubt that these initiatives could result in even a single element of smart-city modernization, just one of our five opportunities.  Could they address them all?  Not without considerable political and financial support from the cities themselves, meaning the governing, elected, officials.  That, they think, isn’t likely to develop as long as the smart-city concept is hot in the tech space and not elsewhere.  Public support means publicity on a broader scale.

Whether the “smart cities” hype helps develop support is an area where city managers are fairly evenly split.  They say that publicity can help develop public interest and internal support for change, but also that it can raise expectations and set unrealistic targets.  Nearly all of them say that there has been more discussion since the concept of smart cities started getting publicity, but nearly all say that progress is still limited.

The good news is that all five of the opportunity areas get support from all the city managers I’ve talked with.  There is interest here, and perhaps even the beginning of a willingness to proceed.  What’s objectively lacking is the benefit case.  Sound familiar?

More Signs of a Maturing Model of the Cloud

In just the last week, we’ve had cloud-related announcements that seem to suggest a drive toward harmonizing cloud and data center around a single architecture.  Amazon has an alliance with VMware, Microsoft is further improving compatibility and synergy between Azure and its data center elements, Google is expanding its Nutanix relationship for data center harmony, and Oracle is touting its Cloud at Customer offering.  What’s up here?

First and foremost, virtually all cloud providers realize that moving applications to the cloud isn’t going to bring much cloud success.  The future of the cloud is applications that are developed to exploit the cloud, meaning new development.  Those applications, because they do focus on cloud-specific benefits, usually employ cloud services hosted by the provider, beyond simple IaaS and maybe some database stuff.  Thus, the public cloud has been gradually turning more “PaaS-like” in its software model.

The second issue is that exploiting the cloud doesn’t mean moving everything to it.  There are a bunch of good reasons why companies will drag their feet with cloudifying many elements of current and future applications.  The future cloud is a hybrid, in short.  But if that’s true, then how do you deal with the cloud-hosted features you’ve come to rely on, when the piece of application you’re looking at has to run in your own data center?

Microsoft, whose Azure PaaS platform always had a lot of affinity with its data center Windows Server stuff, has been quietly gaining traction as enterprises realize that in the cloud era, it’s really going to be about creating apps that are part-cloud and part-data-center.  With the advent of IoT and increased emphasis on event processing, a data center presence gave Microsoft a way of adding at-the-edge handling of short-control-loop event apps, and Amazon was forced to offer its Greengrass unload-to-the-customer edge strategy as a counterpoint.  All the other stuff I cited above continues this trend.

For all the interest in this kind of hybridization, there’s no real consensus on just what it requires in terms of features, and even on whether you achieve cloud/data-center unity by pushing pieces of data center features into the cloud, pulling cloud features into the data center, or both.  All of the current fusions of cloud and data center seem to be doing a little of both, preparing perhaps for the market to make its own requirements clear.

That may take a while.  The enterprises I’ve talked with believe that applications for the future hybrid cloud are emerging, but there’s a bit of chicken-and-egg tension happening.  It’s difficult for enterprises to commit to a strategy for which there’s no clear implementation consensus.  It’s difficult for that consensus to arise without somebody committing to something, and in decent numbers.  The vendors will probably have to take some initiative to drive things forward.

The Amazon/VMware deal is probably the one with the greatest potential to drive market change, given Amazon’s dominance in the public cloud.  Unfortunately, we don’t have anything more than rumor on what the deal includes at this point.  The story I’ve heard is that Amazon would provide a VMware-based hosting capability for many or all of the AWS web services it offers in the cloud.  This would roughly mirror the Azure Stack notion of Microsoft.

Next on my list of influence drivers is the Google deal with Nutanix, largely because it embodies a function transfer from data center to cloud and not the other way around.  Nutanix is best known as a VMware competitor in on-prem virtualization, the subject of a few spats with VMware over time.  If Google wants to create a functional hybrid with feature migration, they need to have a partner who is interested.  Amazon’s dealings with VMware have already created a bridge into AWS from VMware, so it makes sense for Google to start with that as well.

At the very least, all of this demonstrates that you can’t have “public cloud” as a polar opposite of the data center.  At the most, it suggests that the cloud and the data center have to be in a tight enough partnership to require feature-shifting among the two.  If that’s the case, then it impacts how we design applications and also how clouds and data centers interconnect at the network level.  Either of these impacts would probably delay widespread adoption of a highly symbiotic cloud/data center application model.

That seems to be what Google, at least, expects.  The first phase of their Nutanix deal, which lets apps migrate from the data center into Google’s cloud, isn’t supposed to be ready till next year.  However, remember that Google has a lot more edge-like resources in their public cloud than anyone else, and they also have lower latency among the various hosting points in the Google cloud.  Thus, they could satisfy edge-event-processing requirements more easily in their own cloud than most other cloud providers.

What about those changes to the application development process and the network connectivity between cloud and data center?  Let’s summarize those two issues in order.

The goal of “new” application designs should be to separate the flow of transactions so that critical data processing and storage steps fall toward the end of each flow, where they can be hosted in the data center.  The front-end processes that either don’t need to access repository data at all, or can access read-only versions, could then be cloud-hosted.  It’s also possible that front-end processes could use summary databases, or even forego database access entirely.  For example, it might be possible to “signal” that a given commodity is in inventory in sufficient quantity to presume that transactions to purchase it can go through.  Should levels fall too low, the front-end could be “signaled” that it must now do a repository dip to determine whether there’s stock, which might move that application component back along the workflow into the data center.
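
To make that workflow concrete, here’s a minimal Python sketch of the inventory example.  It assumes a hypothetical summary “signal” pushed to the cloud front-end and a hypothetical fallback call into the data center repository; none of the names refer to any real product or API.

```python
# A minimal sketch of the front-end/back-end split described above.
# All names (stock_signal, datacenter_stock_check, etc.) are hypothetical.

# Summary "signal" the data center pushes to the cloud front-end periodically.
stock_signal = {"sku-123": "plenty", "sku-456": "low"}


def datacenter_stock_check(sku: str) -> bool:
    """Stand-in for a synchronous 'repository dip' back in the data center."""
    # In a real deployment this would be a call across the cloud/DC boundary.
    return True


def accept_order(sku: str) -> bool:
    """Cloud-hosted front-end logic: use the summary signal when possible,
    and only fall back to the authoritative repository when stock runs low."""
    signal = stock_signal.get(sku, "unknown")
    if signal == "plenty":
        # Optimistic path: presume the order can go through; the back-end
        # transaction in the data center does the authoritative update.
        return True
    # Low or unknown stock: this step effectively moves back into the data center.
    return datacenter_stock_check(sku)


if __name__ == "__main__":
    print(accept_order("sku-123"))  # served from the summary signal, in the cloud
    print(accept_order("sku-456"))  # forces a repository dip
```

The design point is that the optimistic path never crosses the cloud/data-center boundary; only the low-stock case pays the price of a synchronous repository dip.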

On the network side, cloud computing today is most often connected as a remote application via the Internet.  This isn’t going to cut it for highly interactive cloud components that sometimes live in the data center too.  The obvious requirement is to shift the cloud’s relationship with the VPN to one of full and efficient membership.  In effect, a cloud would be treated as another data center, connected with “cloud DCI” facilities.  Components of applications in the cloud would be added to the VPN ad hoc, or would be hosted on a private IP address space that’s then NATed to the VPN space.
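
As a rough illustration of that addressing model (purely a sketch with assumed address ranges, not any vendor’s gateway or API), cloud-hosted components could keep private addresses while a simple NAT-style mapping presents them in the VPN’s address space:

```python
# Illustrative only: a NAT-style table mapping cloud-private addresses into
# an enterprise VPN pool.  The address ranges are assumptions, not standards.

from ipaddress import ip_address, ip_network

CLOUD_PRIVATE = ip_network("10.200.0.0/16")   # assumed cloud-side space
VPN_POOL = ip_network("172.16.50.0/24")       # assumed VPN-side pool

nat_table: dict = {}   # cloud private address -> VPN-visible address
_vpn_hosts = VPN_POOL.hosts()


def attach_component(private_addr: str) -> str:
    """Add a cloud component to the VPN by assigning it a VPN-visible address."""
    addr = ip_address(private_addr)
    if addr not in CLOUD_PRIVATE:
        raise ValueError(f"{private_addr} is not in the cloud private range")
    if addr not in nat_table:
        nat_table[addr] = next(_vpn_hosts)   # ad hoc membership, as described
    return str(nat_table[addr])


if __name__ == "__main__":
    print(attach_component("10.200.1.15"))   # e.g. 172.16.50.1
    print(attach_component("10.200.1.16"))   # e.g. 172.16.50.2
```

In practice this mapping would live in the operator’s or cloud provider’s gateway, but the principle is the same: cloud components join the VPN ad hoc, as if the cloud were just another data center.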

Google has the smartest approach to next-gen cloud platforms of anyone out there, in my view.  They have the smartest view of what a next-gen network looks like too.  Are they now, by taking their time in creating a strong data center hybrid strategy, risking the loss of enterprises because the next-gen applications and network models for a hybrid could be developed before Google is an effective player?  That could be an interesting question.

Also interesting is the question of whether there’s a connection between all of this and Juniper’s decision to pick up Bikash Koley, a well-known Google networking expert who played a strong role in the development of Google’s network/SDN approach.  Might Juniper want to productize the Google model (which, by the way, is largely open)?  We’ll see.

One thing is for sure: the conception of the cloud is changing.  The new one, which is what the conception should have been all along, is realistic and could drive a trillion-dollar cloud market.  For the first time, we might see an actual shift in computing, away from the traditional model.  For vendors who thought that their problems with revenue growth were due to the cloud, this isn’t going to be good news.  The cloud is just getting started, and it’s going to bring about a lot of changes in computing, software, and networking.

What Ericsson is Signaling about the Networking Industry

According to Light Reading, a senior Ericsson exec doesn’t think that 5G will kickstart telecom spending.  Ericsson also issued a profit warning, causing its stock to take a big hit.  Frankly, it’s hard for me to understand why this is even a surprise.  Telcos have been telling me for years that they can’t continue to invest in infrastructure given their declining profit per bit.  That means they’ll spend less with vendors.  Even the notion that 5G would save things is baseless; technology advances don’t improve profits just because they’re “advances”, and linking 5G to a business case has been challenging.

Don’t count out 5G, though.  The value of 5G is less in its ability to drive change than in its potential to focus it.  Most operators have planned for 5G evolution, to the point where advance funds are being set aside for spending as early as 2018 and growing through 2022.  One of the challenges in transformation is finding a vehicle to showcase early activity, because technology shifts rarely create benefits as fast as they drive costs.  So Ericsson is right in one sense, but perhaps missing an opportunity in another.

There are really only two product areas that are assured budgeting in the next five years—wireless and fiber (particularly metro).  We are going to see a lot of incremental spending in these areas even if we don’t see a specific technology transformation like 5G.  Vendors who have strong assets in either space have an inside track in presenting a broader strategy, one that could address the problem of declining profit per bit and the growing interest in a new model for networking.

In the 5G space, the near-term opportunity is metro fiber deployment aimed at enhancing FTTN deployments with RF tail circuits to replace copper.  The one application for 5G that operators really like, and that they’re prepared to invest in quickly, is that FTTN-tail mission.  Everyone in the telco space is concerned about the race for residential bandwidth, a race that cable companies with their CATV infrastructure are better prepared to run, at least in the downstream capacity sense.  FTTH, like Verizon’s FiOS, isn’t practical for more than about a third of households, even presuming better passive optical technology down the road.  5G RF tails on FTTN would be a good solution.

5G-FTTN would obviously drive up metro bandwidth needs, but it could also pull through at least some additional 5G features, like network slicing to separate wireline broadband from mobile use of the same remotes.  Slicing might also be useful for operators who want to offer IPTV separate from the Internet.  SDN could well be accelerated by 5G-FTTN too, to provide efficient connection between metro content cache points and customers.  Even NFV might benefit, particularly if the 5G-FTTN remotes were applied to business sites.

Fiber players have an even better shot, though.  At the simplest level, lower-cost fiber capacity with greater resiliency and agility (via agile optics) could reduce operations costs directly by reducing the number of “actionable events” that the higher layers see.  The big and still-unanswered question of fiber deployment is the extent to which fiber could change the way services relate to infrastructure.  Could you combine fiber and SDN to create electro-optical virtual wires that would separate services and even customers?  Could that reduce reliance on traditional L2/L3, and the need for security devices?

A combination of fiber, SDN/electrical virtual wires, and hosted switch/router instances could build virtually all business services and also frame a different model of broad public services like the Internet.  The result could be a significant reduction in L2/L3 capex and operations cost and complexity.  My model says that you could largely resolve the profit-per-bit problem for 20 years or more, simply by combining service lifecycle automation and virtual-wire-and-hosted-instance service-layer infrastructure.
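
To show what that service model might look like structurally, here is a hypothetical sketch (my own illustration, not any vendor’s product) of a business service built entirely from virtual wires terminated by hosted router instances rather than physical L2/L3 devices:

```python
# Hypothetical sketch of the "virtual wire plus hosted instance" service model.
# Class and field names are illustrative assumptions, not a real data model.

from dataclasses import dataclass, field


@dataclass
class HostedInstance:
    site: str                 # customer site or metro hosting point
    image: str = "vrouter"    # assumed software router/switch image name


@dataclass
class VirtualWire:
    a_end: str
    z_end: str
    path: str = "auto"        # e.g. an agile-optics lambda or SDN virtual path


@dataclass
class BusinessService:
    customer: str
    instances: list = field(default_factory=list)
    wires: list = field(default_factory=list)

    def add_site(self, site: str) -> None:
        """Each new site gets a hosted instance plus virtual wires to existing sites."""
        for existing in self.instances:
            self.wires.append(VirtualWire(existing.site, site))
        self.instances.append(HostedInstance(site))


if __name__ == "__main__":
    svc = BusinessService("ExampleCo")
    for s in ["NYC", "CHI", "DAL"]:
        svc.add_site(s)
    print(len(svc.wires), "virtual wires meshing", len(svc.instances), "hosted instances")
```

The point of the sketch is that the service definition contains no traditional routers or switches at all; everything below the hosted instances is fiber and SDN-created virtual wires.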

All this frames what may be the real problem for Ericsson.  We have fiber players—Ciena, Infinera, ADVA.  We have mobile players, like Nokia.  Just what kind of player is Ericsson?  They don’t have a strong device-and-technology focus, which means that they don’t have a natural way of engaging with the buyer, a foothold technology that could be leveraged to bigger and better things.

Professional services are a great way to turn limited product offerings into broader engagements, but you have to be able to present a minimum product offering to take advantage of that.  If Ericsson stands for anything at the product level, it would probably have to be software, and yet they’re not known for that either.  Either they have to make themselves the first real “network software” company, or they have to spend a lot of marketing capital making a service-and-integration-based model into the centerpiece for the network of the future.

The same problem exists at various levels for the other vendors, of course.  You can think of optical networking as selling more fiber, without facing the overall shifts that would drive the buyer to consume it.  You can think of 5G as a dry set of standards whose adoption (presumably simply because they’re “newer” than 4G) will be automatic, and never see the business cases that you’ll somehow have to support.  In those cases, you’re stuck with a limited model of your business that can succeed only if none of your competitors do things better.

The biggest problem network vendors face is in the L2/L3 area, where people like Cisco and Juniper now live.  There is nothing ahead for L2/L3 technology except commoditization or replacement by a virtual-wire-and-hosting model.  Cisco has hosting capability, and I think they understand that they have to change their business model.  Juniper still rides the limited data center networking trend, because they’re small enough to live there.  Neither has really faced the longer-term reality yet, which is that you can’t support the end game of network infrastructure evolution if you don’t play in the deals that drive it.

We are, in networking, facing the most significant set of changes that have ever been presented, far more significant than the transformation from TDM to IP and the Internet.  We are rebuilding society, civilization, around network technology.  That this would create enormous opportunity is a given; that the network vendors will fail to recognize it isn’t yet a given, but we’re running out of time.  That’s what Ericsson proves, to themselves and to the rest of the industry.

The Tangled Web of OSS/BSS Modernization

I had an opportunity to chat with some insightful network operator CIO staff types, the topic being the near-and-dear one of “What should the future of OSS/BSS be?”  I’ve noted in some past blogs that there’s a surprising diversity of viewpoints here, ranging from the “throw the bums out!” model to one of gradual evolution.  There may also be an emerging consensus, at least on some key points.

OSS/BSS systems are the network operator equivalent of “core applications”, similar to demand deposit accounting (DDA, meaning retail banking) for banks or inventory management for retailers.  Like the other named applications, OSS/BSS emerged as a traditional transactional model, largely aimed at order management, resource management, and billing.

Three forces have been responsible for the changing view of OSS/BSS.  One is the desire of network operators to avoid being locked in to products from a single vendor.  Most early OSS/BSS systems were monolithic; you bought one and used all of it.  That was no more popular in the networking industry than lock-in has been for any other vertical.  The second is the increased desire for customer self-care and the need to support online portals to provide for it.  The final one is the combination of increased complexity in resource control and decreased complexity in billing.  We used to have call journaling and now we have one-price-unlimited calling.  We used to have fixed bandwidth allocation and now we have packet networks with almost nothing fixed in the whole infrastructure.

The reason these forces are important is that they’ve operated on the same OSS/BSS market but taken it in different directions.  The lock-in problem has led to a componentized model of operations, with at least some open interfaces and vendor substitution.  That doesn’t necessarily alter the relationship between OSS/BSS and the business of the operators.  The self-care issue has led to the building of front-end technology to generate what used to be customer-service transactions as direct-from-user ones.  This has likewise not altered fundamentals much.

It’s the third force that’s been responsible for most of the talk about changes to OSS/BSS.  As networks moved from simple TDM to complicated, multi-layered packet infrastructure, the process of “provisioning”, the meaning of a “service level agreement”, and even what customers are billed for have all changed.  The new OSS/BSS vision is the result of these individual drivers, and more.  But what is that vision?

If you listen to conferences and read the media sites, the answer is probably “event-driven”.  I think there’s merit to the approach, which says in effect that a modern operations process has to be able to respond to a lot of really complex stuff, ranging from changes in the condition of services based on shared resources (packet networks, server farms, etc.) to changes in the market environment and competition.  Each change, if considered an “event”, could drive an operations component to do something.

Event-driven OSS/BSS could also take componentization and elimination of lock-in to a new level.  Imagine a future where the OSS/BSS structure is fixed, meaning that the processes that align with each service state and event are defined.  You could then buy each of those processes individually, best-of-breed all the way down.  Imagine!
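
A small sketch may help show why that matters.  If the state-and-event structure is fixed, each entry in the table could in principle come from a different vendor; the state, event, and process names below are purely hypothetical:

```python
# Minimal sketch of a fixed (state, event) -> process structure for an
# event-driven OSS/BSS.  All names are hypothetical illustrations.

def activate_service(order):    return "Active"
def bill_usage(order):          return "Active"
def open_trouble_ticket(order): return "Degraded"
def cease_service(order):       return "Terminated"

# The fixed structure: (current service state, event) -> operations process.
# Each process could, in principle, be sourced from a different vendor.
PROCESS_MAP = {
    ("Ordered",  "ActivateRequest"): activate_service,
    ("Active",   "UsageRecord"):     bill_usage,
    ("Active",   "FaultReport"):     open_trouble_ticket,
    ("Degraded", "CeaseRequest"):    cease_service,
}


def handle_event(state: str, event: str, order: dict) -> str:
    """Dispatch an event against the current service state; return the new state."""
    process = PROCESS_MAP.get((state, event))
    if process is None:
        return state          # unrecognized event: no state change
    return process(order)


if __name__ == "__main__":
    state = "Ordered"
    for ev in ["ActivateRequest", "UsageRecord", "FaultReport"]:
        state = handle_event(state, ev, order={"id": "svc-1"})
        print(ev, "->", state)
```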

This is a big change, though.  The question my OSS/BSS pundits were struggling with is whether you really need an event-driven OSS/BSS at all, or whether you need to somehow shortstop events so they never impact operations.  Can the networks themselves manage their own events?  Can service composition and lifecycle management be separated from “events” and kept largely transactional?  Could we avoid lock-in by simply separating the OSS/BSS into a bunch of integrated applications?  It might all be possible.

The primary near-term issue, according to experts, is insulating the structure of OSS/BSS from the new complexities of virtualization.  Doing that is fairly straightforward architecturally; you define the network as a small number (perhaps only one) of virtual devices that provide a traditional, MIB-like linkage between the network infrastructure and the OSS/BSS.  Then you deal with the complexities of virtualization inside the virtual device itself.  This is applying the intent-model principle to OSS/BSS modernization.
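
Here’s a rough sketch of that intent-model boundary, assuming a hypothetical “virtual device” object: the OSS/BSS sees only a small, MIB-like status view, while the messy virtualization detail (and its remediation) stays inside the black box.  The field names are illustrative, not real MIB variables.

```python
# Rough sketch of the intent-model / virtual-device boundary described above.
# Field and element names are illustrative assumptions only.

class VirtualDevice:
    """Presents a whole virtualized network domain as a single managed element."""

    def __init__(self, name: str):
        self.name = name
        # Internal detail the OSS/BSS never sees: VNFs, SDN paths, hosts, etc.
        self._elements = {"vnf-fw-1": "up", "sdn-path-7": "up", "host-3": "degraded"}

    def get_mib_view(self) -> dict:
        """MIB-like linkage: a handful of aggregate variables, nothing more."""
        states = self._elements.values()
        oper_status = "up" if all(s == "up" for s in states) else "degraded"
        return {
            "sysName": self.name,
            "operStatus": oper_status,
            "elementCount": len(self._elements),
        }

    def remediate(self) -> None:
        """Virtualization complexity (redeploying, rerouting) is handled here,
        inside the black box, not by the OSS/BSS."""
        self._elements = {k: "up" for k in self._elements}


if __name__ == "__main__":
    dev = VirtualDevice("metro-core-1")
    print(dev.get_mib_view())   # the OSS/BSS sees one degraded "device"
    dev.remediate()             # internal lifecycle management fixes it
    print(dev.get_mib_view())
```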

My OSS/BSS contacts say that this approach is actually the default path that we’re on, at least in one sense.  The way that SDN and NFV are depicted as working with OSS/BSS presumes a traditional interface, they say.  The problem is that the rest of the requirement, namely that there be some network-management process that carries the load of virtualization, hasn’t been addressed effectively yet.

The second issue, so the OSS/BSS experts say, is the problem of silos at the operations application level.  Everyone wants to sell their own suite.  In theory, that could be addressed by having everyone comply with TMF specifications and interfaces, but the problem is more complicated than that.  In order for there to be interoperability among separately sourced components, you have to obey common functional standards for the components (they have to do the same thing), a common data model, and common interface specifications.  You also have to sell the stuff on a component basis.  Operators say that none of these are fully supported today.

The logical way to deal with things is probably to define a repository model and presume that all the applications work with that repository in some way.  However, operators who want some specialized tuning of data structures to accommodate the way they offer services, bill for them, and so forth might have issues with so simple an approach.
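
As a simple illustration of the repository approach (my own assumption about how it could be structured, not a TMF or vendor schema), a common service record with an extensions area would let the core model stay shared while operator-specific tuning lives off to the side:

```python
# Illustrative sketch of a shared operations repository: a common service
# record plus an extensions area for operator-specific fields.

from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    service_id: str
    customer_id: str
    state: str = "Ordered"
    billing_plan: str = "flat-rate"
    # Operator-specific fields live here so the core schema stays common.
    extensions: dict = field(default_factory=dict)


class Repository:
    """Shared store that ordering, assurance, and billing applications all use."""

    def __init__(self):
        self._records = {}

    def put(self, record: ServiceRecord) -> None:
        self._records[record.service_id] = record

    def get(self, service_id: str) -> ServiceRecord:
        return self._records[service_id]


if __name__ == "__main__":
    repo = Repository()
    repo.put(ServiceRecord("svc-42", "cust-9", extensions={"regional_tax_code": "EU-19"}))
    print(repo.get("svc-42"))
```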

It’s fair to ask whether the TMF could do what’s needed here, and the answer you get from operators is mixed.  There is a general view that the TMF perhaps does too much, meaning that its models and structures go further than needed in standardizing operations software and databases, and by doing so limits utility and agility.  All of the experts I chatted with believed that the TMF specifications were too complicated, too.  Almost all of them said that OSS/BSS specifications needed to be open to all, and the majority said that there should be open-source implementations.

Which, most say, we might end up with, and fairly soon.  The challenges of virtualization have led to a displacement of formal standardization by open-source projects.  That same thing could happen for OSS/BSS, and the experts said that they believed the move to open-source in operations would naturally follow the success of an open model for virtualization and service lifecycle management inside that virtual-device intent model.  They point to ONAP as a logical place for this.

I’ve been involved in telecom operations for decades, and I’ve learned that there is nothing in networking as inertial as OSS/BSS.  A large minority of my experts (but still a minority) think that we should scrap the whole OSS/BSS model and simply integrate operations tasks with the service models of SDN and NFV orchestration.  That’s possible too, and we’ll have to wait to see if there’s a sign that this more radical approach—which would really be event-driven—will end up the default path.