Taking a Deeper Dive into Intent Modeling…and Beyond

One of the topics the people I speak with (and work with) are most interested in is “intent modeling”.  Cisco made an announcement on it (one I blogged on) and the ONF is turning over its intent-model-based northbound interface (NBI) work to the MEF.  Not surprisingly, perhaps, however popular the notion might be, it’s not clearly understood.  I wasn’t satisfied with the tutorials I’ve seen, so I want to explore the concept a bit here.

Intent modeling is obviously a subset of modeling.  In tech, modeling is a term with many uses, but the relevant one deals with virtualization, where a “model” is an abstract representation of something—a black box.  Black boxes, again in popular tech usage, are things that are defined by their visible properties and not by their contents.  It’s what they can do, and not how they can do it, that matters.

It’s my view that the popular tech notion of a model or black box has really been, or should have been, an “intent model” all along.  The difference that’s emerged in usage is that a model in virtualization normally represents an abstraction of the thing that’s being virtualized—a server, for example.  In intent modeling, the abstraction is at a higher level.  A good way to illustrate the difference is that you might use a model called “router” in virtualization, one that could represent either a physical box or a hosted instance of router software.  In strict intent modeling parlance, you’d probably have a model called “IP-Network” that represented the intent to do connectionless forwarding between points based on the IP header.
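
To make the distinction a bit more concrete, here’s a minimal Python sketch, with every name invented for illustration rather than taken from any ONF or MEF document: the “IP-Network” intent is defined entirely by externally visible properties, and either a device-based or a hosted implementation can satisfy it.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IPNetworkIntent:
    """Black-box 'intent': what the user wants, not how it's built."""
    endpoints: List[str]                    # service access points to connect
    forwarding: str = "connectionless-ip"   # behavior, per the IP header
    max_latency_ms: float = 50.0            # expectation, not an implementation detail

class DeviceRouterNetwork:
    """One possible implementation: a mesh of physical routers."""
    def realize(self, intent: IPNetworkIntent):
        print(f"Configuring physical routers for {intent.endpoints}")

class HostedRouterNetwork:
    """Another implementation: software router instances in the cloud."""
    def realize(self, intent: IPNetworkIntent):
        print(f"Deploying router software instances for {intent.endpoints}")

# The user system only ever sees the intent; the two implementations are
# interchangeable because they honor the same visible properties.
intent = IPNetworkIntent(endpoints=["siteA", "siteB", "siteC"])
for impl in (DeviceRouterNetwork(), HostedRouterNetwork()):
    impl.realize(intent)
```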

This point is important in understanding the notion of intent modeling, I think.  The approach, as the original ONF white paper on the topic shows, is to represent how a user system frames requests to a provider system.  Obviously, a user system knows the service of an IP network but not the elements thereof.  However, in a practical sense and in a virtualized-and-real-network world, a single model at the service level doesn’t move the ball much.  In the ONF work, since the intent model is aimed at the NBI of the SDN controller, there’s only one “box” underneath.  In the virtual world, there could be a global network of devices and hosted elements.

The main property of any virtualization model, especially an intent model, is that all implementations of the model are interchangeable; they support the same “intent” and so forth.  It’s up to the implementers to make sure that’s the case, but it’s the critical property that virtualization depends on.  You can see that this has important implications, because if you have a single model for a vast intent (like “network”) then only that vast intent is interchangeable.  You’d have to get a complete model of it to replace another, which is hardly useful.  You need things to be a bit more granular.

To me, then, the second point that’s important about an intent model is that intent models decompose into useful layers.  A “service” might decompose into “access” and “core”, or into “networks” and “devices”.  In fact, any given level of an intent-modeled service should be able to decompose into an arbitrary set of layers based on convenience.  What’s inside an intent model is opaque to the user system, and as long as it fulfills the user-system intent it’s fine.  It’s up to the modeling/decomposition process to pick the path of implementation.

Where I think intent modeling can go awry is in this layer stuff.  Remember that you can substitute any implementation of an intent model.  You want to decompose any layer of a model far enough to be sure that you’ve covered where you expect alternative implementations.  If you have a “router” model, you might want to have a “device-router” and “hosted-router” decomposition path, for example, and perhaps even an “SDN-router” path.  Good management of the modeling hierarchy is critical for good implementation.
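
Here’s a small sketch of what that decomposition bookkeeping might look like, assuming a purely hypothetical decomposition table and policy function; the point is simply that the choice of path is made below the model, invisibly to the user system above.

```python
# Hypothetical decomposition table: each intent model lists the
# alternative implementation paths the modeler has prepared for it.
DECOMPOSITIONS = {
    "service": [["access", "core"]],                               # layers
    "router":  [["device-router"], ["hosted-router"], ["SDN-router"]],  # alternatives
}

def decompose(model: str, policy) -> list:
    """Pick one decomposition path for a model, or return it as a leaf."""
    options = DECOMPOSITIONS.get(model)
    if not options:
        return [model]                  # leaf: implement directly
    chosen = policy(model, options)     # convenience/cost/capacity decision
    # Recurse so each child can in turn decompose as deeply as needed.
    return [leaf for child in chosen for leaf in decompose(child, policy)]

# Example policy: prefer a hosted implementation wherever one exists.
prefer_hosted = lambda m, opts: next(
    (o for o in opts if any("hosted" in c for c in o)), opts[0])

print(decompose("service", prefer_hosted))   # ['access', 'core']
print(decompose("router", prefer_hosted))    # ['hosted-router']
```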

It follows that a modeling approach that doesn’t support good management of a hierarchy isn’t going to be optimal.  That means that for those looking for service and network models, even those based on “intent”, it’s vital that you ensure your modeling approach can handle versatile hierarchies of decomposition.  It’s also vital that you remember what’s at the bottom of each of the hierarchical paths—real deployment and lifecycle management.

A virtualization or intent model can be “decomposed” into lower-level models, or implemented.  This has to happen at the point where further abstraction isn’t useful in creating interoperability/interchangeability.  If the implementation of a “router” model is a device, for example, then the inside of that lowest level of model is a set of transformations that bring about the behaviors of the “router” that the user system would want to see.  That would probably happen by creating configuration/management changes to the device.  If the implementation is deployment of a software instance of a router, then the implementation would have to include the deployment, loading, configuration, etc.

This is the point where you have to think about lifecycle management.  Any intent model or virtualization model has to be able to report status, meaning that an implicit parameter of any layer of the model is a kind of SLA representing expectations for the properties of the element being modeled.  Those could be matched to a set of parameters that represent the current delivery, and both decomposition and implementation would be responsible for translating between the higher-level “intent” and whatever is needed for the layer/implementation below.
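
A minimal sketch of that implicit SLA idea, with parameter names of my own invention, might look like this: each modeled element carries its expected SLA, reports what it’s actually delivering, and the layer above compares the two.

```python
from dataclasses import dataclass

@dataclass
class SLA:
    max_latency_ms: float
    min_availability: float    # e.g. 0.9999

@dataclass
class DeliveredState:
    latency_ms: float
    availability: float

class ModeledElement:
    """Any layer of an intent model: holds its 'intent' SLA and can be
    asked what it is currently delivering."""
    def __init__(self, name: str, sla: SLA):
        self.name, self.sla = name, sla

    def delivered(self) -> DeliveredState:
        # In a real system this would be derived from whatever is below:
        # device counters, VNF telemetry, or the states of child elements.
        return DeliveredState(latency_ms=42.0, availability=0.99995)

    def within_intent(self) -> bool:
        d = self.delivered()
        return (d.latency_ms <= self.sla.max_latency_ms
                and d.availability >= self.sla.min_availability)

core = ModeledElement("core", SLA(max_latency_ms=50.0, min_availability=0.999))
print(core.name, "meeting intent:", core.within_intent())
```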

The challenge with lifecycle management in a model-driven hierarchy is easy to see.  If Element B is a child of Element A, and so is Element C, then the state of A depends on the combined states of B and C.  How does A know those states?  Remember, this is supposed to be a data model.  One option is to have actual dedicated service-specific software assembled based on the model structure, so there’s a program running “A” and it can query “B” and “C”.  The other option is to presume that changes in the state of all the elements in a hierarchical model are communicated by events that are posted to their superior objects when needed.  “B” and “C” can then generate an event to “A”.
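
The event-driven option might look something like this sketch (class and state names are mine): children post state changes to their parent, and the parent derives its own state from the combination rather than polling.

```python
from enum import Enum

class State(Enum):
    OK = "ok"
    DEGRADED = "degraded"
    FAILED = "failed"

class Element:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.state = State.OK
        self.children = []
        if parent:
            parent.children.append(self)

    def set_state(self, new_state: State):
        """Called by the implementation below; notifies the parent."""
        self.state = new_state
        if self.parent:
            self.parent.on_child_event(self)

    def on_child_event(self, child):
        """Parent state is a function of the combined child states."""
        states = {c.state for c in self.children}
        if State.FAILED in states:
            derived = State.FAILED
        elif State.DEGRADED in states:
            derived = State.DEGRADED
        else:
            derived = State.OK
        if derived != self.state:
            self.set_state(derived)    # propagate further up if needed

a = Element("A")
b, c = Element("B", parent=a), Element("C", parent=a)
b.set_state(State.DEGRADED)
print(a.name, a.state)    # A inherits DEGRADED from B, without polling
```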

Intent modeling will surely help interoperability, because implementations of the same “intent” are more easily interchangeable.  It doesn’t necessarily help service lifecycle automation because intent modeling is a structure of an API.  That means it’s a process, a program component.  The trick in service automation is to create a description of event-to-process linkages.  Thus, data-driven event handling combines with intent-modeled processes to create the right answer.
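
A hedged sketch of that event-to-process linkage, with every table entry invented for illustration, shows why the “program” really becomes data: the service model records, for each state and event, which lifecycle process to run.

```python
# Hypothetical state/event-to-process table carried with the service model.
LIFECYCLE_MAP = {
    ("ordered",   "activate"):     "deploy_service",
    ("deploying", "deploy_done"):  "start_billing",
    ("active",    "fault"):        "remediate",
    ("active",    "change_order"): "redeploy_delta",
}

def handle_event(service: dict, event: str, processes: dict):
    """Steer an event to a process using the model, then record the result."""
    process_name = LIFECYCLE_MAP.get((service["state"], event))
    if process_name is None:
        return    # event not meaningful in this state; log and ignore
    service["state"] = processes[process_name](service)

processes = {
    "deploy_service": lambda svc: "deploying",
    "start_billing":  lambda svc: "active",
    "remediate":      lambda svc: "active",
    "redeploy_delta": lambda svc: "active",
}

svc = {"id": "svc-001", "state": "ordered"}
handle_event(svc, "activate", processes)
print(svc)    # {'id': 'svc-001', 'state': 'deploying'}
```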

This is the situation I think the TMF great-thinkers had in mind with their NGOSS Contract data-drives-events-to-processes notion.  It’s what I believe is the end-game for intent modeling.  If you can model not only the structure of the service but the way that processes handle lifecycle events, then you can truly automate the service.  I’ve fiddled with various approaches for a very long time (almost ten years at this point) and I came to that conclusion very quickly.  I’ve not been able to validate other options, but the market has to make its own decision here—hopefully soon.

What is a Smart City and Do They Have a Chance?

We read a lot about smart cities these days, but like many popular topics there’s a surprising lack of consistency in assigning a meaning to the term.  In fact, only about half the attributes of smart cities that governments and network operators name are recognized by at least two-thirds of the people I talk with.  Many of the things I think are fundamental to smart cities have far less recognition than that.

At a very high level, a “smart city” is one that employs information technology to optimize its own services and assets, and help its occupants (people and businesses) do the same.  The problem with this baseline is that many cities would fit the definition even if they did nothing at all.  Where the problem of meaning and consistency comes in is in figuring out how the baseline could be made into something meaningful.  Some of that problem arises from the usual “eye of the beholder” syndrome we see in tech—all vendors and service providers want to see a definition that fits what they do/sell.  More comes from a lack of a top-down view of the problems and opportunities.

Every city likely employs information technology to run itself; even small businesses can rarely avoid computer use.  Major cities are somewhat like large enterprises, though their geographic scope is more constrained.  They use a lot of IT, but in my ongoing surveys of both governmental and private IT, I’ve noticed that cities are more compartmentalized.  They operate a bit more like industrial conglomerates, with IT silos representing the various departments and a smaller central IT process that unites what has to be united, largely things like employee services, tax collection, and cost management.

I had the opportunity to get some input from city managers (or their equivalent, in effect the COO of city operations) on the topic.  They agree that improving their IT use would improve efficiency and cut costs.  In fact, they cite five specific areas where they think “smartness” should be considered, with surprising consistency.  They do disagree a bit on the priority and the chances of a significantly positive outcome.

The first area noted, with a slightly larger number of managers citing it than the others, is improved use of analytics and data mining.  Almost every city manager thought that the commercial sector was far ahead of cities in this area.  Most of them think that data mining has the largest potential to improve operations, even more than IoT, and all of them thought it would likely require the least investment.

Why not do it, then?  The managers cite a number of problems.  First, the databases involved are often archaic and proprietary rather than standard structures.  Data mining software would have to be customized to work with them.  Second, the applications and data are often not centralized, so there’s no single place you could go to get at everything.  Third, there are a bewildering number of different regulations regarding the way that municipal data can be used.  Finally, there’s the question of how data mining and analytics could be budgeted.

About half the major metro areas in the US report that they are in the process of modernizing their own application base, and as a collateral benefit this would likely move them to a point where data mining was much easier.  Most managers think that modernization of their apps would erase the technical barriers, but the budget problem remains.  City budgeting is almost always done on a department basis, with the heads jealously guarding their prerogatives.  Getting consensus on spending on a cross-department tool like analytics/data mining would be challenging.

The second smart opportunity cited was self-service portals for residents and local businesses.  City managers say that they’re well behind businesses in offering their “customers” direct online access to things.  Many of the issues I’ve already noted are cited by managers as inhibitors in this opportunity area, but they expressed the greatest concern over the problem of security.  There are specific laws regarding the confidentiality of personal information, which has led to some concerns over whether a portal would open the city to hacking of personal and business information.

This particular opportunity does seem to be moving forward, though.  All of the city manager types I’ve talked with say that they are expanding what is available to their residents and businesses via online portals.  About a third say they’re exploring the use of cloud services to facilitate this, though there are still questions about maintaining data privacy according to local (and state, and Federal) laws.

Opportunity number three was mobile empowerment.  The percentage of the workforce that’s mobile and empowerable in cities isn’t much different from the percentage across commercial businesses (about 19% for cities versus about 24% in enterprises, considering “mobile” to mean that the worker is away from their regular desk/place of operation at least 15% of the time, and “empowerable” meaning the worker has a job that requires information access or generation).  The extent to which empowerment has even begun falls far short of commercial business standards.

There’s a lot of fuzz in the responses to the “where are you with this?” question.  About a quarter of city managers say they have some mobile empowerment strategies in place, most of whom say that it’s a feature of commercial software they use somewhere.  There doesn’t seem to be a broad mission to create mobile front-ends for all applications, and this is likely because of the separation of departments common in city government.  Who pays?

Opportunity number four was improved collaboration and collective decision-making.  This opportunity seemed to be down the list a bit (but remember that there wasn’t an enormous difference in support among the five listed areas), in part because city managers have noted that their departments tend to operate more autonomously and because the narrow geography of cities favors face-to-face meetings.

What seems to be most interesting here is what you could call “consultation” more than “collaboration”.  The implication is that it’s a two-party process, usually invoked by a worker with either a supervisor or an expert in a related area.  The specifics all seem to tie into mobile workers, however, and so this is seen often as related to that category of opportunity too.  Most city managers have seen this as a unified communications task, which is a departure from the view of commercial businesses who see it as relating to the application that’s spawning the questions and needs.  In any event, progress is slow.

The final opportunity was “Internet of Things” or IoT.  Needless to say, there was a lot of interest in this, but many city managers believe that the specific applications of, benefits from, and cost for implementation of IoT are all too vague at this point.  They can see some credible “things” that could be connected to generate a benefit (online meter reading is already at least in testing in some areas, for example), but areas like traffic sensors and computer control of traffic signals, a favorite media topic, seem to pose a lot of cost and risk, and it’s been difficult to quantify the rewards.

Nobody wants to say that they’re not doing IoT, whether they’re in city government or business.  However, if you try to pin down specifics in both areas, what you find is some special projects that in the minds of many might not be “IoT” at all.  For example, is a local RFID or Bluetooth meter reading mechanism “IoT”?  Making an entire city intelligent by adding sensors on everything that can be measured and controllers on everything that can be tweaked, seems to be a long-term interest but not a near-term priority.

The sum of my discussions is clear; there’s not much progress in smartening cities overall, unless we pull back on our notions of what a smart city really has and does.  The biggest problem, which city managers are understandably reluctant to discuss, is the politics of funding.  Capital projects of any magnitude pose a political risk, and the more money that’s involved and the more city departments that are impacted, the more likely it is that the idea will get a few kisses blown at it, and passed on for later discussion.

Vendor initiatives can help accelerate things, according to city managers, but for larger cities there’s doubt that these initiatives could deliver more than a single element of smart-city modernization, just one of our five opportunities.  Could they address them all?  Not without considerable political and financial support from the cities themselves, meaning the governing, elected officials.  That, they think, isn’t likely to develop as long as the smart-city concept is hot in the tech space and not elsewhere.  Public support means publicity on a broader scale.

Whether the “smart cities” hype helps develop support is an area where city managers are fairly evenly split.  They say that publicity can help develop public interest and internal support for change, but also that it can raise expectations and set unrealistic targets.  Nearly all of them say that there has been more discussion since the concept of smart cities started getting publicity, but nearly all say that progress is still limited.

The good news is that all five of the opportunity areas get support from all the city managers I’ve talked with.  There is interest here, and perhaps even the beginning of a willingness to proceed.  What’s objectively lacking is the benefit case.  Sound familiar?

More Signs of a Maturing Model of the Cloud

In just the last week, we’ve had cloud-related announcements that seem to suggest a drive toward harmonizing cloud and data center around a single architecture.  Amazon has an alliance with VMware, Microsoft is further improving compatibility and synergy between Azure and its data center elements, Google is expanding its Nutanix relationship for data center harmony, and Oracle is touting its Cloud at Customer offering.  What’s up here?

First and foremost, virtually all cloud providers realize that moving applications to the cloud isn’t going to bring much cloud success.  The future of the cloud is applications that are developed to exploit the cloud, meaning new development.  Those applications, because they do focus on cloud-specific benefits, usually employ cloud services hosted by the provider, beyond simple IaaS and maybe some database stuff.  Thus, the public cloud has been gradually turning more “PaaS-like” in its software model.

The second issue is that exploiting the cloud doesn’t mean moving everything to it.  There are a bunch of good reasons why companies will drag their feet with cloudifying many elements of current and future applications.  The future cloud is a hybrid, in short.  But if that’s true, then how do you deal with the cloud-hosted features you’ve come to rely on, when the piece of application you’re looking at has to run in your own data center?

Microsoft, whose Azure PaaS platform always had a lot of affinity with its data center Windows Server stuff, has been quietly gaining traction as enterprises realize that in the cloud era, it’s really going to be about creating apps that are part-cloud and part-data-center.  With the advent of IoT and increased emphasis on event processing, a data center presence gave Microsoft a way of adding at-the-edge handling of short-control-loop event apps, and Amazon was forced to offer its Greengrass offload-to-the-customer edge strategy as a counterpoint.  All the other stuff I cited above continues this trend.

For all the interest in this kind of hybridization, there’s no real consensus on just what it requires in terms of features, and even on whether you achieve cloud/data-center unity by pushing pieces of data center features into the cloud, pulling cloud features into the data center, or both.  All of the current fusions of cloud and data center seem to be doing a little of both, preparing perhaps for the market to make its own requirements clear.

That may take a while.  The enterprises I’ve talked with believe that applications for the future hybrid cloud are emerging, but there’s a bit of chicken-and-egg tension happening.  It’s difficult for enterprises to commit to a strategy for which there’s no clear implementation consensus.  It’s difficult for that consensus to arise without somebody committing to something, and in decent numbers.  The vendors will probably have to take some initiative to drive things forward.

The Amazon/VMware deal is probably the one with the greatest potential to drive market change, given Amazon’s dominance in the public cloud.  Unfortunately, we don’t have anything more than rumor on what the deal includes at this point.  The story I’ve heard is that Amazon would provide a VMware-based hosting capability for many or all of the AWS web services it offers in the cloud.  This would roughly mirror the Azure Stack notion of Microsoft.

Next on my list of influence drivers is the Google deal with Nutanix, largely because it embodies a function transfer from data center to cloud and not the other way around.  Nutanix is best known as a VMware competitor in on-prem virtualization, the subject of a few spats with VMware over time.  If Google wants to create a functional hybrid with feature migration, they need to have a partner who is interested.  Amazon’s dealings with VMware have already created a bridge into AWS from the VMware side, so it makes sense for Google to build its own bridge starting with Nutanix.

At the very least, all of this demonstrates that you can’t have “public cloud” as a polar opposite of the data center.  At the most, it suggests that the cloud and the data center have to be in a tight enough partnership to require feature-shifting between the two.  If that’s the case, then it impacts how we design applications and also how clouds and data centers interconnect at the network level.  Either of these impacts would probably delay widespread adoption of a highly symbiotic cloud/data center application model.

That seems to be what Google, at least, expects.  The first phase of their Nutanix deal, which lets apps migrate from the data center into Google’s cloud, isn’t supposed to be ready till next year.  However, remember that Google has a lot more edge-like resources in their public cloud than anyone else, and they also have lower latency among the various hosting points in the Google cloud.  Thus, they could satisfy edge-event-processing requirements more easily in their own cloud than most other cloud providers.

What about those changes to the application development process and the network connectivity between cloud and data center?  Let’s summarize those two issues in order.

The goal of “new” application designs should be to separate the flow of transactions so that critical data processing and storage steps will be toward the end of each flow, which can then be hosted in the data center.  The front-end processes that either don’t need to access repository data at all, or can access read-only versions, could then be cloud-hosted.  It’s also possible that front-end processes could use summary databases, or even forego database access.  For example, it might be possible to “signal” that a given commodity is in inventory in sufficient quantity to presume that transactions to purchase it can go through.  Should levels fall too low, the front-end could be “signaled” that it must now do a repository dip to determine whether there’s stock, which might move that application component back along the workflow into the data center.
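
Here’s a rough sketch of that inventory example, with the threshold and function names invented for illustration: the cloud-hosted front end trusts a summary “in stock” flag and only dips into the data-center repository once it has been signaled that stock has fallen below a low-water mark.

```python
# Summary state pushed to the cloud front end; refreshed by the back end.
summary = {"sku-123": {"in_stock": True}}

LOW_WATER_MARK = 10

def backend_update_inventory(sku: str, quantity: int):
    """Runs in the data center against the repository of record.  When stock
    falls too low, it 'signals' the front end to stop trusting the summary
    and check the repository on every order."""
    summary[sku]["in_stock"] = quantity > LOW_WATER_MARK

def repository_check(sku: str) -> bool:
    # Placeholder for a real repository query back in the data center.
    return True

def accept_order(sku: str) -> bool:
    """Cloud-hosted front end: the fast path uses the summary only."""
    if summary.get(sku, {}).get("in_stock"):
        return True                   # no repository access needed
    return repository_check(sku)      # slow path: back to the data center

backend_update_inventory("sku-123", quantity=500)
print(accept_order("sku-123"))   # served entirely from the cloud edge
backend_update_inventory("sku-123", quantity=3)
print(accept_order("sku-123"))   # falls back to the repository check
```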

On the network side, cloud computing today is most often connected as a remote application via the Internet.  This isn’t going to cut it for highly interactive cloud components that live in the data center sometimes too.  The obvious requirement is to shift the cloud relationship with the VPN to one of full, efficient membership.  In effect, a cloud would be treated as another data center, connected with “cloud DCI” facilities.  Components of applications in the cloud would be added to the VPN ad hoc, or would be hosted on a private IP address space that’s then NATed to the VPN space.

Google has the smartest approach to next-gen cloud platforms of anyone out there, in my view.  They have the smartest view of what a next-gen network looks like too.  Are they now, by taking their time in creating a strong data center hybrid strategy, risking the loss of enterprises because the next-gen applications and network models for a hybrid could be developed before Google is an effective player?  That could be an interesting question.

Also interesting is the question of whether there’s a connection between all of this and Juniper’s decision to pick up Bikash Koley, a well-known Google-networking expert who played a strong role in the development of Google’s network/SDN approach.  Might Juniper want to productize the Google model (which, by the way, is largely open)?  We’ll see.

One thing is for sure; the conception of the cloud is changing.  The new one, which is what the conception should have been all along, is realistic and could drive a trillion-dollar cloud market.  For the first time, we might see an actual shift in computing, away from the traditional model.  For vendors who thought that their problems with revenue growth were due to the cloud, this isn’t going to be good news.  The cloud is just getting started, and it’s going to bring about a lot of changes in computing, software, and networking.

What Ericsson is Signaling about the Networking Industry

According to Light Reading, a senior Ericsson exec doesn’t think that 5G will kickstart telecom spending.  Ericsson also issued a profit warning, causing its stock to take a big hit.  That this is even a surprise is hard for me to understand, frankly.  Telcos have been telling me for years that they couldn’t continue to invest in infrastructure with their profit per bit declines.  That means they’ll spend less on vendors.  Even the notion that 5G would save things is baseless; technology advances don’t improve profits just because they’re “advances”, and linking 5G to a business case has been challenging.

Don’t count out 5G, though.  The value of 5G is less in its ability to drive change than in its potential to focus it.  Most operators have planned for 5G evolution, to the point where advance funds are being set aside for spending as early as 2018, and growing through 2022.  One of the challenges in transformation is finding a vehicle to showcase early activity, because rarely do technology shifts create benefits as fast as they drive costs.  So, Ericsson is right in a sense, but perhaps missing an opportunity in another.

There are really only two product areas that are assured budgeting in the next five years—wireless and fiber (particularly metro).  We are going to see a lot of incremental spending in these areas even if we don’t see a specific technology transformation like 5G.  Vendors who have strong assets in either space have an inside track in presenting a broader strategy, one that could address the problem of declining profit per bit and the growing interest in a new model for networking.

In the 5G space, the near-term opportunity is metro fiber deployment aimed at enhancing FTTN deployments with RF tail circuits to replace copper.  The one application for 5G that operators really like, and that they’re prepared to invest in quickly, is that FTTN-tail mission.  Everyone in the telco space is concerned about the race for residential bandwidth, a race that cable companies with their CATV infrastructure are better prepared to run, at least in the downstream capacity sense.  FTTH, like Verizon’s FiOS, isn’t practical for more than about a third of households, even presuming better passive optical technology down the road.  5G RF tails on FTTN would be a good solution.

5G-FTTN would obviously drive up metro bandwidth needs, but it could also pull through at least some additional 5G features, like network slicing to separate wireline broadband from mobile use of the same remotes.  Slicing might also be useful for operators who want to offer IPTV separate from the Internet.  SDN could well be accelerated by 5G-FTTN too, to provide efficient connection between metro content cache points and customers.  Even NFV might benefit, particularly if the 5G-FTTN remotes were applied to business sites.

Fiber players have an even better shot, though.  At the easy level, lower-cost fiber capacity with greater resiliency and agility (via agile optics) could reduce operations costs directly by reducing the number of “actionable events” that the higher layers see.  The big and still unanswered question of fiber deployment is the extent to which fiber could change the way services relate to infrastructure.  Could you combine fiber and SDN to create electro-optical virtual wires that would separate services and even customers?  Could that reduce reliance on traditional L2/L3, and the need for security devices?

A combination of fiber, SDN/electrical virtual wires, and hosted switch/router instances could build virtually all business services and also frame a different model of broad public services like the Internet.  The result could be a significant reduction in L2/L3 capex and operations cost and complexity.  My model says that you could largely resolve the profit-per-bit problem for 20 years or more, simply by combining service lifecycle automation and virtual-wire-and-hosted-instance service-layer infrastructure.

All this frames what may be the real problem for Ericsson.  We have fiber players—Ciena, Infinera, ADVA.  We have mobile players, like Nokia.  Just what kind of player is Ericsson?  They don’t have a strong device-and-technology focus, which means that they don’t have a natural way of engaging with the buyer, a foothold technology that could be leveraged to bigger and better things.

Professional services are a great way to turn limited product offerings into broader engagements, but you have to be able to present a minimum product offering to take advantage of that.  If Ericsson stands for anything at the product level, it would probably have to be software, and yet they’re not known for that either.  Either they have to make themselves the first real “network software” company, or they have to spend a lot of marketing capital making a service-and-integration-based model into the centerpiece for the network of the future.

The same problem exists at various levels for the other vendors, of course.  You can think of optical networking as selling more fiber, without facing the overall shifts that would drive the buyer to consume it.  You can think of 5G as a dry set of standards whose adoption (presumably simply because they’re “newer” than 4G) will be automatic, and never see the business cases that you’ll somehow have to support.  In those cases, you’re stuck with a limited model of your business that can succeed only if none of your competitors do things better.

The biggest problem network vendors face is in the L2/L3 area, where people like Cisco and Juniper now live.  There is nothing ahead for L2/L3 technology except commoditization or replacement by a virtual-wire-and-hosting model.  Cisco has hosting capability, and I think they understand that they have to change their business model.  Juniper still rides the limited data center networking trend, because they’re small enough to live there.  Neither has really faced the longer-term reality yet, which is that you can’t support the end game of network infrastructure evolution if you don’t play in the deals that drive it.

We are, in networking, facing the most significant set of changes that have ever been presented, far more significant than the transformation from TDM to IP and the Internet.  We are rebuilding society, civilization, around network technology.  That this would create enormous opportunity is a given; that the network vendors will fail to recognize it isn’t yet a given, but we’re running out of time.  That’s what Ericsson proves, to themselves and to the rest of the industry.

The Tangled Web of OSS/BSS Modernization

I had an opportunity to chat with some insightful network operator CIO staff types, the topic being the near-and-dear one of “What should the future of OSS/BSS be?”  I’ve noted in some past blogs that there’s a surprising diversity of viewpoints here, ranging from the “throw the bums out!” model to one of gradual evolution.  There may also be an emerging consensus, at least on some key points.

OSS/BSS systems are the network operator equivalent of “core applications”, similar to demand deposit accounting (DDA, meaning retail banking) for banks or inventory management for retailers.  Like the other named applications, OSS/BSS emerged as a traditional transactional model, largely aimed at order management, resource management, and billing.

Three forces have been responsible for the changing view of OSS/BSS.  One is the desire of network operators to avoid being locked in to products from a single vendor.  Most early OSS/BSS systems were monolithic; you bought one and used all of it.  That was no more popular in the networking industry than lock-in has been for any other vertical.  The second is the increased desire for customer self-care and the need to support online portals to provide for it.  The final one is the combination of increased complexity in resource control and decreased complexity in billing.  We used to have call journaling and now we have one-price-unlimited calling.  We used to have fixed bandwidth allocation and now we have packet networks with almost nothing fixed in the whole infrastructure.

The reason these forces are important is that they’ve operated on the same OSS/BSS market but taken it in different directions.  The lock-in problem has led to a componentized model of operations, with at least some open interfaces and vendor substitution.  That doesn’t necessarily alter the relationship between OSS/BSS and the business of the operators.  The self-care issue has led to the building of front-end technology to generate what used to be customer-service transactions as direct-from-user ones.  This has likewise not altered fundamentals much.

It’s the third force that’s been responsible for most of the talk about changes to OSS/BSS.  As networks moved from simple TDM to complicated, multi-layered, packet, the process of “provisioning”, the meaning of a “service level agreement” and even what customers are billed for have all changed.  The new OSS/BSS vision is the result of these individual drives, and more.  But what is that vision?

If you listen to conferences and read the media sites, the answer is probably “event-driven”.  I think there’s merit to the approach, which says in effect that a modern operations process has to be able to respond to a lot of really complex stuff, ranging from changes in the condition of services based on shared resources (packet networks, server farms, etc.) to changes in the market environment and competition.  Each change, if considered an “event”, could drive an operations component to do something.

Event-driven OSS/BSS could also take componentization and elimination of lock-in to a new level.  Imagine a future where every OSS/BSS structure is fixed, meaning that the processes that align with each service state and event are defined.  You could buy a single process for best-of-breed ultimacy.  Imagine!

This is a big change, though.  The question my OSS/BSS pundits were struggling with is whether you really need an event-driven OSS/BSS at all, or whether you need to somehow shortstop events so they never impact operations.  Can the networks themselves manage their own events?  Can service composition and lifecycle management be separated from “events” and kept largely transactional?  Could we avoid lock-in by simply separating the OSS/BSS into a bunch of integrated applications?  It might all be possible.

The primary near-term issue, according to experts, is insulating the structure of OSS/BSS from the new complexities of virtualization.  Doing that is fairly straightforward architecturally; you define the network as a small number (perhaps only one) of virtual devices that provide a traditional MIB-like linkage between the network infrastructure and the OSS/BSS.  Then you deal with the complexities of virtualization inside the virtual device itself.  This is applying the intent-model principle to OSS/BSS modernization.
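
As a sketch of that insulation layer (the class and the MIB-like variables below are illustrative, not drawn from any standard), the OSS/BSS would see one virtual device with a flat status view, while the object internally aggregates whatever mix of real and virtual elements sits underneath.

```python
class VirtualDevice:
    """Presents the whole (possibly virtualized) network to the OSS/BSS as
    a single device with a flat, MIB-like set of variables."""
    def __init__(self, elements):
        self.elements = elements    # real devices, VNFs, SDN domains, ...

    def get_mib(self) -> dict:
        up = [e for e in self.elements if e["oper_status"] == "up"]
        return {
            "sysName": "metro-network",
            "ifNumber": len(self.elements),
            "ifOperStatus": "up" if len(up) == len(self.elements) else "degraded",
            "inOctets": sum(e["in_octets"] for e in self.elements),
            "outOctets": sum(e["out_octets"] for e in self.elements),
        }

underlay = [
    {"name": "edge-router-1", "oper_status": "up",   "in_octets": 10_000, "out_octets": 9_000},
    {"name": "vFirewall-7",   "oper_status": "up",   "in_octets": 2_000,  "out_octets": 1_800},
    {"name": "sdn-core",      "oper_status": "down", "in_octets": 0,      "out_octets": 0},
]
print(VirtualDevice(underlay).get_mib())
```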

My OSS/BSS contacts say that this approach is actually the default path that we’re on, at least in one sense.  The way that SDN and NFV are depicted as working with OSS/BSS presumes a traditional interface, they say.  The problem is that the rest of the requirement, namely that there be some network-management process that carries the load of virtualization, hasn’t been addressed effectively yet.

The second issue, so the OSS/BSS experts say, is the problem of silos at the operations application level.  Everyone wants to sell their own suite.  In theory, that could be addressed by having everyone comply with TMF specifications and interfaces, but the problem is more complicated than that.  In order for there to be interoperability among disjointed components, you have to obey common functional standards for the components (they have to do the same thing), a common data model, and common interface specifications.  You also have to sell the stuff on a component basis.  Operators say that none of these are fully supported today.

The logical way to deal with things is probably to define a repository model and presume that the applications all work with that repository in some way.  However, operators who want some specialized tuning of data structures to accommodate the way they offer services, bill for them, etc. might have some issues with a simple approach.

It’s fair to ask whether the TMF could do what’s needed here, and the answer you get from operators is mixed.  There is a general view that the TMF perhaps does too much, meaning that its models and structures go further than needed in standardizing operations software and databases, and in doing so limit utility and agility.  All of the experts I chatted with believed that the TMF specifications were too complicated, too.  Almost all of them said that OSS/BSS specifications needed to be open to all, and the majority said that there should be open-source implementations.

Which, most say, we might end up with, and fairly soon.  The challenges of virtualization have led to a displacement of formal standardization by open-source projects.  That same thing could happen for OSS/BSS, and the experts said that they believed the move to open-source in operations would naturally follow the success of an open model for virtualization and service lifecycle management inside that virtual-device intent model.  They point to ONAP as a logical place for this.

I’ve been involved in telecom operations for decades, and I’ve learned that there is nothing in networking as inertial as OSS/BSS.  A large minority of my experts (but still a minority) think that we should scrap the whole OSS/BSS model and simply integrate operations tasks with the service models of SDN and NFV orchestration.  That’s possible too, and we’ll have to wait to see if there’s a sign that this more radical approach—which would really be event-driven—will end up the default path.

Are The Multiple NFV MANO Candidates Helpful, or All Incomplete?

With the introduction of SK Telecom’s T-MANO into the mix, we have yet another promised improvement in the basic management and orchestration model for NFV.  In the past, I’ve tended to defend all these initiatives on the theory that somebody might end up getting it right.  Other than the merging of AT&T’s ECOMP and Open-O into ONAP, I’ve not seen much progress in that direction, though.  In fact, as I’ve noted, it’s not clear whether ONAP is the “right” answer, meaning the answer that unlocks enough NFV benefits to actually drive significant deployment.

In my view as a programmer, software architect, and former director of software development commercially, there are three basic problems that implementations of NFV MANO have experienced.  All arguably stem from limitations in the functional end-to-end model, though if you interpret the model as “functional” (as I’ve been assured it was intended to be), you could likely correct them.  A good implementation of NFV MANO would have to deal with these issues.

The first issue is taking an implied “serial” or sequential-step approach to the problem.  Functional models display the relationship between logical processes.  We have the NFV Orchestrator linked to Virtual Infrastructure Manager and to VNF Manager, for example.  It’s easy to interpret this as thinking the Orchestrator “calls” the other two, or is called by them, or both.  That, in fact, is generally how the model has been interpreted.

The problem with that is that deploying anything is either event-driven or serial/sequential in nature.  If the latter approach is chosen, because you’re following the ETSI E2E model literally, then you have to visualize these interactions (across the reference APIs) as being a “call” and a “response”.  The Orchestrator calls the VIM to deploy something, and the VIM responds with either a “you-got-it” or “it-failed”.  In the real world, this forces the calling process to wait for the result, or to keep checking back to see what the result was.  You can see that either of these options tends to tie up the calling process, which limits how much the calling process can do at once.
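
A stripped-down illustration of the serial interpretation, which is a sketch of the pattern and not anything from the ETSI material, shows the problem: the orchestrator issues the VIM request and can do nothing else until the answer comes back.

```python
import time

def vim_deploy(request: dict) -> str:
    """Stand-in for a VIM call that takes a while to complete."""
    time.sleep(2)               # image boot, network plumbing, ...
    return "you-got-it"         # or "it-failed"

def orchestrator_serial(requests):
    for req in requests:
        # The orchestrator is blocked here; any new OSS/BSS request or
        # infrastructure condition simply queues up behind this call.
        result = vim_deploy(req)
        print(req["service"], "->", result)

orchestrator_serial([{"service": "svc-1"}, {"service": "svc-2"}])
# Total time grows linearly with the number of requests, and an event
# arriving during the wait has no way to change the outcome in flight.
```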

A solution to this, for at least some development, is “multi-threading”, which means that there can be multiple copies of a given process.  You can have several “Orchestrators” and “VIMs” in parallel.  Great in terms of increasing the number of things you can do at once, but this creates a problem when you’re assigning finite resources.  How do you ensure that two threads don’t grab the same thing?
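
And here’s a sketch of the finite-resource problem multi-threading introduces (again, invented names): two orchestrator threads deploying in parallel have to serialize their grab of the same host capacity, or both will think they got it.

```python
import threading

class HostPool:
    """Finite resource shared by all orchestrator/VIM threads."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._lock = threading.Lock()

    def allocate(self, units: int) -> bool:
        with self._lock:                # without this, two threads can both
            if self.capacity >= units:  # pass the check and over-commit
                self.capacity -= units
                return True
            return False

pool = HostPool(capacity=4)
results = []

def deploy(units):
    results.append(pool.allocate(units))

threads = [threading.Thread(target=deploy, args=(3,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(results, "remaining:", pool.capacity)   # exactly one allocation wins
```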

A second problem arises when, while you’re waiting for the hypothetical VIM to respond, you get another request from somewhere else, like the OSS/BSS, or a condition arises in a piece of infrastructure you’ve already deployed.  You’re sitting waiting for your VIM, so how does this new condition get reported?  Most implementations simply hold the request (queue it, in programming terms) till the Orchestrator is free.  But what if the new condition means that you no longer can build the service as you envisioned?  The thing you’re waiting on the VIM for is now obsolete.

I wrestled with this almost a decade ago in the original ExperiaSphere project (done at the behest of some network operators who were involved with me in the TMF Service Delivery Framework work).  What I found was that controlling a service lifecycle even with multiple threads was very complicated.  We still see serialized single-thread pieces in OpenStack for the same reason.  You have to think in state/event terms when you have software that is intrinsically dependent on a bunch of “events” generated from asynchronous sources.  Every network fits that model, and every NFV implementation.

Event-driven software says that every multi-step process has a specific number of “states” it can be in, like waiting for a response from a VIM.  When the Orchestrator in an event-driven implementation asks the VIM for something, it enters a “waiting-for-VIM” state, and there awaits an event or a signal from the VIM.  Protocol handlers and protocol stacks, including that for IP, are traditionally implemented this way because you can’t predict what’s going to happen elsewhere.  There are at least two parties involved in all communications.
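
A hedged sketch of that state/event alternative, with state names of my own choosing, looks like this: the orchestrator records “waiting for the VIM” in the service’s data, returns immediately, and a later VIM answer, OSS/BSS change, or fault is dispatched against that recorded state.

```python
# State/event table for one service element; every cell names the handler
# to run, so nothing ever blocks waiting for an answer.
TRANSITIONS = {
    ("idle",            "deploy_request"): "send_vim_request",
    ("waiting_for_vim", "vim_success"):    "mark_active",
    ("waiting_for_vim", "vim_failure"):    "start_remediation",
    ("waiting_for_vim", "change_order"):   "cancel_and_replan",
}

def send_vim_request(svc):   svc["state"] = "waiting_for_vim"
def mark_active(svc):        svc["state"] = "active"
def start_remediation(svc):  svc["state"] = "remediating"
def cancel_and_replan(svc):  svc["state"] = "replanning"

HANDLERS = {f.__name__: f for f in
            (send_vim_request, mark_active, start_remediation, cancel_and_replan)}

def dispatch(svc, event):
    handler = TRANSITIONS.get((svc["state"], event))
    if handler:
        HANDLERS[handler](svc)

svc = {"id": "svc-42", "state": "idle"}
for ev in ("deploy_request", "change_order"):   # the change arrives before the VIM answers
    dispatch(svc, ev)
print(svc["state"])   # 'replanning' -- the now-obsolete VIM work can be abandoned
```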

The big issue that serialized-versus-event-driven raises is scalability, particularly if a flood of issues arises out of some major thing like a trunk connection or data center failure.  If every request has to be handled in a single-threaded way, then there’s a high probability that remediation of any large-scale problem will involve a lot of time, far more than users would likely tolerate.

Issue number two is lack of an effective relationship between the virtual part of NFV and the real resources.  We still have people who read the ETSI End-to-End model and think there is a single virtual infrastructure manager, or that vendors would generate a VIM for their own equipment.  We still have people who think that you manage a virtual firewall by managing the software instance, which means that somehow the software instance has to be able to manage the real resources.  None of that is workable.

A single VIM can’t work because every server platform, every hypervisor, and every cloud stack has its own APIs and requirements.  Do we expect to see all of these implemented on one enormous VIM?  Then add in the now-recognized truth that we’ll also have to be able to manage some real network equipment, and you have a formula for a million silos.  We need multiple VIMs, and we need a way of linking them to a request for deployment.  See below for that point.

As far as managing software instances goes, meaning the VNF Manager, we have two problems.  One is that just as firewall devices all have their own management and configuration processes, so would the software versions of those devices.  That means every different firewall (or other VNF) would have to be paired with its own management tool.  Hardly a recipe for integration, onboarding, and interoperability.  What we need (and what I think SK Telecom is looking to provide), is a standard model for a management and configuration interface for any given device or virtual device type.  You then map a proprietary/specific interface to that standard.
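
Here’s a sketch of that mapping idea; the “standard” firewall interface below is hypothetical, not an SK Telecom or ETSI definition.  Each vendor VNF gets a thin adapter that translates its proprietary management calls into one per-type model the VNF Manager codes to.

```python
from abc import ABC, abstractmethod

class FirewallManager(ABC):
    """Hypothetical standard management model for the 'firewall' type."""
    @abstractmethod
    def set_rules(self, rules: list): ...
    @abstractmethod
    def get_status(self) -> str: ...

class VendorAFirewall:
    """Proprietary interface, as shipped by the vendor."""
    def push_policy_blob(self, blob: str): print("vendor-A policy:", blob)
    def health(self) -> int: return 0       # 0 == healthy

class VendorAAdapter(FirewallManager):
    """Maps the proprietary calls onto the standard per-type model."""
    def __init__(self, device: VendorAFirewall):
        self.device = device
    def set_rules(self, rules):
        self.device.push_policy_blob(";".join(rules))
    def get_status(self):
        return "up" if self.device.health() == 0 else "down"

# The VNF Manager only ever codes to FirewallManager; onboarding a new
# vendor means writing one adapter, not changing the manager.
fw: FirewallManager = VendorAAdapter(VendorAFirewall())
fw.set_rules(["allow tcp/443", "deny all"])
print(fw.get_status())
```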

The issue of how the real resources get managed is much harder.  The problem is that resource pools are inherently multi-tenant.  You can’t have every VNF running on pooled resources diddling with the behavior of those common resources, even indirectly.  That means that you have to be able to use the VIM as an intermediary, which is possible in the ETSI E2E model.  However, this complicates the issue of serial/sequential processing versus state/event, because the Orchestrator is also using the VIM and might actually be in the process of working with the same service!

The third issue is a lack of “models”, particularly intent models, to represent key elements.  If you’ve waded through the two prior issues, you realize that if service lifecycle management is a fully asynchronous system (which it is) and has to be state-event processed, then there has to be some way of recording the per-service data and state information.  In fact, if a service has multiple parts (access and WAN, east and west, US and EU, etc.) then each part has to have its own repository of data and state.  The TMF recognized this in its NGOSS Contract work, which proposed to use the service data model to steer events to processes.

The model-driven or “data model” approach has a lot of historicity, and the most recent innovation has been the introduction of a specific “intent model” which focuses on modeling elements of a service as functions described by what they are expected to do, not how they’re expected to do it.  There is nothing in the ETSI material on service modeling, and all of the implementations so far are also light at best in that area.  Even ECOMP lacks a comprehensive model picture.  That’s unfortunate, because the right approach to service modeling, coupled with the notion of the model as an event-steering or state/event hub, is critical to the success of any NFV implementation, and also in my view critical to operational transformation and cost reduction for networking overall.

One of the beauties of a service-model-driven, event-handling, process approach is that the processes themselves don’t save information within them, which means that any copy of the software can operate on the service data model and obtain the same result.  In short, the software would be fully scalable (with the exception of pieces designed to serialize resource requests, such as some of the OpenStack stuff).  This points out that resource-related components, which in an ETSI NFV context means the VIM, have to be well-designed to allow reliable resource control without sacrificing the ability to support multiple requests.
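
A small sketch of why that statelessness buys scalability (function and field names invented): because a lifecycle process reads and writes only the service data model, any copy of the process, running anywhere, produces the same result for the same model and event.

```python
import copy

def remediate(service_model: dict, event: dict) -> dict:
    """A stateless lifecycle process: everything it needs is in the model
    and the event, and it keeps nothing between invocations."""
    updated = copy.deepcopy(service_model)
    updated["elements"][event["element"]]["state"] = "redeploying"
    updated["last_event"] = event["type"]
    return updated

model = {"id": "svc-7",
         "elements": {"access": {"state": "active"},
                      "core":   {"state": "active"}},
         "last_event": None}
event = {"type": "fault", "element": "core"}

# Two independent copies of the process (think: two scaled-out instances)
# reach exactly the same new model state, so either may handle the event.
print(remediate(model, event) == remediate(model, event))   # True
```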

The challenge of NFV tests is that a limited service scope or deployment scope sets up a scenario where functional validation is possible but operational validation is not.  The three points I raise here are critical to the latter, and until we have implementations that address them, we don’t have a practical approach to NFV, to SDN, or to transformation overall.

How Far Might Streaming IP Video Go, and How Would it Get There?

There is little question that change is roiling through the video market.  The challenge is figuring out what might be changing, and what it’s changing to.  The choices seem fairly clear—we have an OTT-driven video option or a more traditional channelized cable-and-telco-TV or satellite option.  We’ve seen growing interest in streaming OTT video services as a “cut the cord” strategy, and Verizon has been (and is reportedly continuing to be) dabbling in the notion of OTT video.  Is OTT going to win?

Channelized video services have been a mainstay of profits for wireline providers.  The cablecos, of course, all offer that kind of service, and many telcos (Verizon and AT&T in the US) launched some form of telco TV.  The fact that AT&T has pulled back from wireline telco TV to satellite shows that there are challenges to the wireline service—not all customers can be supported because of the limits of the loop plant and the cost of running new fiber.  Even Verizon FiOS doesn’t cover everyone in Verizon’s territory.

Apart from the technical challenges of traditional video, we have the problem of increased OTT competition, first generated by the multi-screen user.  As mobile devices become the mainstay of at least many younger users, the expectation of viewing TV on these devices has increased.  This launched both a set of initiatives by traditional TV service providers to offer mobile/broadband add-on services and a crop of companies catering to the untethered masses.  Add to that the Netflix and Amazon services that offer large libraries of both TV shows and movies, and you see the reason for OTT growth.

Even the OTT momentum so far raises the question of whether channelized TV delivery is in for trouble, and we’ve not even gotten to the questions about the networks going it alone with OTT, rising franchise fees, and so forth.  The possibility that regulators might mandate unbundled channels seems to be falling, at least in the US, but that would add to the mix of issues.  So where are we, and where are we headed?

If mobile devices increase their dominance of consumer entertainment, there’s little doubt that streaming OTT video will grow further.  That in turn will accelerate the desire of the networks to deliver their own video, and that will move the whole industry toward an unbundled channel model.  The question is how fast mobile dominance of entertainment could increase.  Does it happen at the pace of generational aging, or would it advance faster for some reason?

One possible reason would be disenchantment with traditional TV, which is obviously growing.  The popularity of Amazon Prime Video and Netflix demonstrates that people don’t ask “What’s on next?” as often, and though there is market data that shows only a minimal decline in the number of hours of traditional TV viewing, there is still a decline.  There are more commercials on traditional TV than ever, and more incentive to record shows and then skip them.  All of this weans users away from the rigid channelized model, making them more amenable to an OTT option.

The biggest variable in a faster advance is the question of at what point the TV providers decide that it would be cheaper to simply move to an IP streaming model.  When that happens, “channelized TV” becomes only a kind of private version of OTT streaming video, which means that it would probably make sense for operators to offer it outside their own infrastructure.  That’s the biggest advantage of a streaming IP model for video, for those operators who already do channelized TV.  You can poach on a competitor’s turf, even ride on their infrastructure and disintermediate them.

OK, how about retarding forces?  We have had streaming video for a long time, and even though it’s hardly likely it would totally displace traditional TV without further stimulus, surely these forces would have pushed users to switch from channelized TV to OTT streaming in greater numbers than the data shows.  Sure, the data could be wrong/biased, but still….

One obvious factor is inertia, legal and otherwise.  People are locked into multi-year contracts for cable/telco/satellite TV in many cases.  Where they are not, they’re still fighting a trend of depending on traditional TV that goes back to their childhood.  They would have to figure out what channels they actually watch (everyone in the household), and then figure out who offers those channels as OTT streaming.  Probably nobody does, at least at this point.

Which is the second factor.  We have some OTT TV services that are actually real-time streaming of multi-channel video, but not nearly as many as we have “library” services that let you pick from archival episodes.  You have a risk of missing live broadcasts, and in the case of either news or sporting events, that’s a big deal.

I think this leaves us with a notion of the balance of forces in favor of OTT streaming, and I think that lets us assess how things might change over time.  My view is that the demand-and-behavior issues are unlikely to sweep traditional TV from the markets.  At the very least, satellites offer a very low per-user delivery cost potential for multi-channel viewing.  Market data from Wall Street suggests that when new offerings come along, they sway people on the fence, but the fact remains that hours of traditional viewing have not declined significantly up to now.  Something has to change that, beyond changes in consumer behavior.

We probably won’t see a regulatory mandate to offer single-channel a la carte viewing.  We probably won’t see cable companies abandoning their naturally multi-channel CATV benefits.  We certainly won’t see AT&T or Verizon or any telco that has a copper plant jumping onto a streaming offering, because the increased demand for available broadband capacity makes it difficult to achieve both multichannel IPTV viewing and broadband on a copper loop.  Technically it can be done, but is it worth the investment given that the broadband part can be exploited by OTT video players?

I think we are heading toward a decline in the importance of channelized TV.  I think that households with children will likely remain loyal to the model because it’s easier to support simultaneous multi-channel viewing with it.  I think that adults will increasingly move to on-demand viewing based on IP, and I think that networks will increasingly try to offer their video more broadly to OTT streaming players, or deliver it online themselves.

The wild card in all of this is the settlement payments that could come along if regulatory relief were offered.  In the US, the FCC is reversing its prior decision to classify broadband Internet as a telecommunications service and apply limits to paid prioritization and inter-provider settlement.  If the classification is reversed, it wouldn’t be necessary for the FCC to bar these practices; there’d be no legal basis for the FCC to regulate them.  Thus, OTTs might either have to pay for video delivery on a larger scale than today, pay for priority handling, or both.  That increase in revenue to the telcos might induce them to spend more on broadband and less on OTT video themselves, and that might shift things more in the OTT direction.

Watch what happens in this critical area.  If there’s going to be a radical change in the TV viewing market, this is the way it will likely develop.

The Relationship Between 5G and Edge Fiber

According to a recent report from Deloitte, “deep fiber” is critical to support the evolution to 5G.  There’s truth in this view—I think it’s clear that fiber is critical to 5G success.  The questions are whether “deep fiber” is the kind of fiber that’s critical, whether fiber is a sufficient guarantee of 5G, and what might get both fiber and 5G to where they need to be.

The first of the questions is two-part—is the term correct and is fiber critical.  The first is easy to answer based on the report itself.  Deloitte departs from normal practice by saying that “deep” means “toward the edge” when normally it’s taken to mean “toward the core”.  If we substitute some terms here, we could paraphrase the report by saying that “the United States is not as well prepared to take full advantage of the potential [of 5G], lacking needed fiber infrastructure close to the end customers, at the network’s edge.”  The second, I think, is also easy.

Anything that proposes to increase access bandwidth significantly is going to have to rely on fiber closer to the network edge.  One of the elements of 5G that operators tell me is most interesting to them is the notion that 5G could be used in wireline deployments as a kind of tail connection to fiber-to-the-node systems.  In that role, it would replace copper/DSL.  Whatever happens to 5G overall, I am hearing that this element is going to move forward.

With a credible high-performance tail connection, FTTN deployment becomes a lot more sensible, and that would of course drive additional fiber deployment.  However, fiber to the prem (FTTP, usually in passive-optical network or PON form) is arguably the logical strategy for deployment of consumer and business broadband to any area where CATV cable is not already in place.  Even in some CATV-equipped areas, FTTH/PON might be required for competitive reasons (or not, as we’ll see).  Thus, edge fiber doesn’t depend on 5G as a driver, though unquestionably it would benefit.

However, fiber at the edge is a necessary condition for 5G.  Is it a sufficient condition, meaning that nothing else matters?  Probably yes in the limited sense of 5G as a tail circuit for FTTN, but not for all the rest.  In fact, it’s not clear whether 5G is really the driver here or just radio tails to FTTN.  It’s the fact that operators associate that mission with 5G that makes it a 5G mission, not the technical requirements.  That’s why the rest of 5G isn’t pulled through by the FTTN tail mission, and why we still need broader 5G drivers for the rest.

If all this is true (which I think it is), then it’s really the need to deploy more edge bandwidth—mobile and wireline—that’s the driver for more fiber.  Is that the only driver?  I don’t think so.  At the same time as we see an increased need for edge bandwidth, we also see a growing need for the deployers of that bandwidth to monetize what they’re deploying.  That’s where carrier cloud, edge computing, and process interconnection come along—all topics of recent blogs.

Access deployment is dominated by consumer broadband.  Consumer broadband is dominated by asymmetrical bandwidth needs—more capacity downstream toward the user than upstream in the other direction.  Process interconnection tends to be symmetrical in its requirements, and because latency in a process connection broadly impacts QoE, it’s more important to avoid it.  I think process interconnection will be a more significant force in fiber edge deployment than consumer broadband, and certainly both will be more than enough to drive a lot of new fiber at the network’s edge or very near to it—in the process edge.

The main point of the Deloitte report is one of the main report headings: “The current wireline industry construct does not incent enough fiber deployment.”  There, they have a point.  I’ve blogged a lot about the declining profit-per-bit problem, which means there’s a problem with return on infrastructure investment.  Even if you don’t believe my arguments, it’s hard to argue with the fact that network equipment vendors (except price-leader Huawei) have been facing difficult quarters revenue-wise.

Could opex savings from infrastructure modernization help?  The report notes that operations expenses are typically five to six times capex; this aligns fairly well with my surveys, which show that on average operators spend about 18 cents of every revenue dollar on capex, return another 18 cents as profits, and spend the remainder on operations and administration.  They suggest that modernization of legacy TDM networks, which are expensive to operate, has a lot to do with that.  My surveys don’t bear this out; only about 30 cents of every revenue dollar are associated with “process opex”, meaning the cost of network operations and network-related administrative costs, and only about four and a half cents are pure network operations.  A TDM-to-packet transformation would therefore not impact much of the total OAM&P costs at all.

I’m a believer in reducing opex, and if you looked at the total process opex pie (30 cents per revenue dollar) and could reduce it by about half (which operators say is credible) you’d almost equal the total elimination of capex in terms of bottom-line impact.  The problem is that most of the savings come from service-level automation, not from improving network technology.  As a fiber driver, I don’t think modernizing out of TDM cuts the mustard.
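To make that comparison concrete, here is a quick back-of-the-envelope calculation using the survey figures cited above.  The 50-percent cuts are assumptions for illustration: operators call roughly halving process opex credible, and the same halving applied to pure network operations is simply a generous placeholder.

```python
# Back-of-the-envelope check of the opex argument, using the survey
# averages cited above (all figures are cents per revenue dollar).
capex = 0.18              # roughly 18 cents of every revenue dollar
process_opex = 0.30       # network operations plus network-related admin
pure_network_ops = 0.045  # the slice a TDM-to-packet swap would touch

# Operators say cutting process opex roughly in half is credible.
automation_savings = process_opex * 0.5
print(f"Service automation savings:  {automation_savings:.3f} per revenue dollar")
print(f"Total capex elimination:     {capex:.3f} per revenue dollar")

# Even a generous 50% improvement in pure network operations is small.
tdm_best_case = pure_network_ops * 0.5
print(f"Best-case TDM modernization: {tdm_best_case:.3f} per revenue dollar")
```

The arithmetic restates the point: service automation attacks the 30-cent pool, while TDM modernization can only nibble at the four-and-a-half-cent slice.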

Regulatory policy may hold the answer, according to the report, and I agree at the high level but disagree on what policies might help.  The report talks about fairly esoteric measures like encouraging cross-service deployments.  In a related section, it proposes improving monetization by encouraging OTT partnerships or even joint ownership of OTTs.  I think the answer is a combination of those points.  If you want operators to deploy more fiber, you make it profitable to do so.  If you want to make it profitable, you wring out operations costs through service-layer lifecycle automation, and you eliminate barriers to Internet settlement for QoS and traffic handling.

I think there’s a lot of good stuff in the report, but I also think it misses a major truth.  Any large-scale change in network infrastructure is going to require large-scale benefits to justify.  That’s true of edge fiber and it’s just as true of 5G.  We are forever past the phase of networking where a technology change can be seen as self-justifying, meaning it’s the “next generation”.

Fiber at/near the edge is, I think, a given because there are plenty of things that are driving it, and those things in turn have clear benefits.  5G is still proving itself on a broad scale, and it’s likely that its fusion with FTTN is going to be essential in making it a success.

How ONAP Could Transform Networking–or Not

A story that starts with the statement that a technology is “entering a new phase” is always interesting.  It’s not always compelling, or even true, but it’s at least interesting.  In the case of Network Functions Virtualization (NFV) the story is interesting, and it’s even true.  It remains to be seen whether it’s compelling.

NFV was launched as a specification group, a form of an international standards process.  Within ETSI the NFV ISG has done some good stuff, but it started its life as a many-faceted balancing act and that’s limited its utility.  Think operators versus vendors.  Think “standards” versus “software”.  Think “schedule” versus “scope”.  I’ve blogged on these points in the past, so there’s no need to repeat the points now.  What matters is the “new phase”.

Which is open-source software.  NFV had launched a number of open-source initiatives from the ISG work, but what has generated the new phase is the merger of one of these (the Open-O initiative) with AT&T’s ECOMP.  ECOMP mingles the AT&T initiatives toward an open and vendor-independent infrastructure with some insights into SDN, NFV, and even OSS/BSS.  The result is a software platform that is designed to do most of the service lifecycle management automation that we have needed from the first and were not getting through the “normal” market processes.

ECOMP, it’s clear now, is intended to be not only what the acronym suggests (“Enhanced Control, Orchestration, Management & Policy”) but more what the title of the new merged (under the Linux Foundation) initiative suggests, an Open Network Automation Platform.  I like this name because it seizes on the real goals of openness and automation.

I also like AT&T’s focusing of its venture arm on building new stuff on top of ONAP, and AT&T’s confidence in the resulting ecosystem.  In an article by Carol Wilson in Light Reading, Igal Elbaz, VP of Ecosystem and Innovation for AT&T Services, says, “We believe [ONAP] is going to be the network operating system for the majority of the network operators out there.  If you build anything on top of our network from a services perspective, obviously you want to build on top of ONAP. But many operators are adopting all of a sudden this solution so you can create a ubiquitous solution that can touch a large number of customers and end users around the world.”

It’s that last notion that catapults NFV into its new age.  Some network operators, through support for open-source initiatives, have defined the glue that holds future network infrastructure and future services together.  Some of this involves virtual functions; more probably will involve some form of software-defined networking.  All of it could totally change the dynamic of both SDN and NFV, by creating an open model for the future network.  If ONAP can create it, of course.

The comment by AT&T’s Elbaz raises the most obvious question, which is that of general adoption of ONAP by network operators.  There is certainly widespread interest in ONAP; of the 54 operators I know to have active transformation projects underway, ONAP is considered a “candidate” for use by 25 of them.  That’s not Elbaz’s majority of operators, but it’s a darn good start.  I think we can assume that ONAP can reach the more-than-half goal, and likely surpass it.  It might well garner two-thirds to three-quarters of operators, in my view.

A related question is vendor support.  Obviously if a majority of network operators adopted ONAP, vendors would fall into line even if they were in tears as they stood there, which many might well be.  However, the only alternative to supporting ONAP would be rolling your own total service automation solution, which vendors have obviously not been lining up to do since NFV came along.  Would they change their tune now, with a competing open-source solution from and accepted by operators?  I don’t think so, and so I think that once ONAP really gets where it needs to be, it kills off not only other vendor options but any competing open strategies as well.

Where does ONAP need to get to, though?  I think the best answer to that is “where Google already is with their Google Cloud Platform”.  The good news for the ONAP folks is that Google has been totally open about GCP details, and has open-sourced much or even most of the key pieces.  The bad news is that GCP is a very different take on the network of the future, a take that first and foremost is not what launched ECOMP and ONAP, or even what launched NFV.  It may be very challenging to fit ONAP into a GCP model now.  Particularly given that GCP’s founding principle is that networks are just the plumbing that can mess up the good stuff.

Google’s network, as I’ve noted before, was designed to connect processes that are in turn composed to create experiences/applications.  Operators today are struggling to make business sense of pushing bits between endpoints in an age of declining revenue per bit.  Google never saw that as a business.  In fact, Google’s approach to “virtual private clouds” is to pull more and more cloud traffic onto Google’s network, to take even the traffic that’s today associated with inter-cloud connectivity off the Internet or an external VPN.  You could make a strong case for the notion that Google views public networking as nothing more than the access on-ramp that gets you to the closest Google peering point.

Google’s relationship with the Internet is something like this: everything Google does rides on Google’s network and is delivered to a peering point near the user.  GCP carries this model to cloud computing services.  Google also takes a lot of care in managing the address spaces of cloud services and its own features.  User data planes are independent SDN networks, each having its own private (RFC 1918) address space.  Processes can also be associated with public IP addresses if they have to be exposed to interconnection.
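As a concrete footnote on what a private (RFC 1918) address space means in this context, the sketch below simply encodes the three RFC 1918 blocks and tests membership.  It illustrates the addressing concept, not Google’s implementation, and the sample addresses are hypothetical.

```python
import ipaddress

# The three private address blocks defined by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(addr: str) -> bool:
    """Return True if the address falls inside RFC 1918 private space."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)

# A tenant data plane can reuse these ranges freely because they never
# appear on the public Internet; only explicitly exposed processes need
# publicly routable addresses.
print(is_rfc1918("10.128.0.5"))   # True: private, per-tenant space
print(is_rfc1918("8.8.8.8"))      # False: publicly routable
```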

Nothing of this sort is visible in the ECOMP/ONAP material, but it’s also true that nothing in the material would preclude following the model.  The big question is whether the bias of the ECOMP/ONAP model or architecture has framed the software in an inefficient way.  Google has planned everything around process hosting.  If process hosting is the way of the future, then NFV has to be done that way too.

The SDN and NFV initiatives started off as traditional standards-like processes.  It’s understandable that these kinds of activities would then not reflect the insight that software architects would bring—and did bring to Google, IMHO.  Now, with ONAP, we have another pathway to SDN and NFV, a path that takes us through software rather than through standards.  That doesn’t guarantee that we’ll see a successful implementation, but it does raise the prospects considerably.

We also have to look ahead to 5G, which as a traditional standards process has made the same sort of bottom-up mistakes that were made by those processes in the past.  We have a lot of statements about the symbiosis between 5G and SDN and NFV.  I’ve read through the work on 5G so far and while I can see how SDN or NFV might be used, I don’t see clear service opportunities or infrastructure efficiency benefits that are linked to any of those applications.  The big question might turn out to be whether AT&T or the others involved in ONAP can create a credible link between their work and credible 5G drivers.  Same with IoT.

A software revolution in networking is clearly indicated.  Nothing we know about future network services or infrastructure evolution harkens back to device connections and bit-pushing for differentiation.  We may be on the path for that software revolution—finally.  That would be good news if true, and it would be a disaster for the industry if it’s not.

The Economics Shaping Edge Computing

If event-handling and process hosting are the way of the near future, then (as I suggested last week) we would likely shift a lot of hosting and software development off traditional server platforms.  There are technical considerations here, of course (and I noted the key ones last week), but the primary issue is financial.

Event processing depends on two dimensions of event characteristics.  One is how often the event happens, and the second is how “valuable” in a commercial sense the event is.  Obviously both these characteristics are graded and not polar, but it’s helpful to think of the extremes, which in this case would mean event frequencies ranging from infrequent to very often, and event values ranging from negligible to significant.  The four combinations present great opportunities on one extreme, and nothing but cost and risk on the other.

Let’s start by looking at something we can dismiss quickly.  Low-value events that rarely happen probably don’t justify any event processing at all.  “An airplane flew across the face of the moon, relative to an observer in Iceland” might be one.  If you want to get IoT funding for that one, good luck!

Now let’s search for some relevant insight, starting with the most extreme case on the negative side: very frequent events of little value.  A good example would be “cars passing an intersection”.  OK, you probably can think of things that could be done with this information, but if you think about a place like New York City and the number of cars that pass each of its intersections in the course of a single busy hour, you have a formula for minimal ROI.

Where we have this situation, the goal is to prevent the “event” from ever getting out of local processing in discrete event form.  This means two things: you want to use a cheap local technology to collect the events, and you want to “summarize” the events in some way to reduce their frequency.  Cars per minute?  In a busy intersection, that would probably reduce the event volume by a couple of orders of magnitude.

Logically, the way to do this would be to have “agents” associated with groups of event sources.  The agents would use some cheap technology to collect events, and then a second cheap technology to summarize them in some way.  The agents would generate summary events (“Car Count for Intersection 5th and 42nd, 10:40 AM”).  If we needed only summaries based on time, you could do something like this with a custom chip, at a very low cost.
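Here’s a minimal sketch of such a summarizing agent, assuming a simple count-per-interval scheme.  The intersection name, the one-minute interval, and the summary format are hypothetical, chosen only to illustrate the pattern.

```python
import time

class SummarizingAgent:
    """Collects raw events locally and emits one summary event per interval."""

    def __init__(self, source_id: str, interval_seconds: int = 60):
        self.source_id = source_id        # e.g. a hypothetical intersection name
        self.interval = interval_seconds
        self.window_start = time.time()
        self.count = 0

    def on_raw_event(self):
        """Called once per raw event (each passing car).  Returns a summary
        when the interval rolls over, and None otherwise."""
        self.count += 1
        if time.time() - self.window_start < self.interval:
            return None                   # raw event is absorbed locally
        summary = {
            "source": self.source_id,
            "window_start": self.window_start,
            "car_count": self.count,
        }
        self.count = 0
        self.window_start = time.time()
        return summary                    # one event per interval leaves the edge

# Hypothetical usage: thousands of raw events in, one summary out per minute.
agent = SummarizingAgent("Intersection 5th and 42nd", interval_seconds=60)
```

The same shell could emit other summaries (averages, peaks) without changing how little traffic actually leaves the edge.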

Something else would consume these summary events, of course, and since there are fewer such events and they’d not likely require very short response times, you could place these other processes deeper in the network.  In addition, it’s likely that the summary processes would be doing some kind of analytics, changing the nature of the process from strictly event-handling to something more “transactional”.  Keep that in mind!

At the other extreme?  That’s easy too—think of system failure or critical condition alerts.  There’s water in a tunnel, there’s smoke in a warehouse.  These happen rarely (hopefully) but when they do they could represent human lives and millions of dollars.  Not only that, each of these events (largely because of their value) could logically be seen as having a wide scope of interest and the potential to trigger a lot of other processes.  Fire in a warehouse?  You might have to dispatch engines, but also activate an emergency plan for traffic lights, extending far outside the area of the fire or the path of the engines, to ensure emergency vehicles associated with other incidents could still move.

This event type might well create what could be called a “tree” or “cascade”, the opposite of the aggregation that happened in our car-and-intersection example.  We’d want to provide a multicast mechanism, or publish-and-subscribe, to distribute this kind of event.  Each of the secondary recipients (after our primary processing) would then handle the event in a way appropriate to their interests.
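A publish-and-subscribe dispatcher is the natural way to express that cascade.  The sketch below is a generic in-process version; the topic name and the two subscribers are hypothetical stand-ins for the dispatch and traffic-light processes just described.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class EventBus:
    """Minimal publish/subscribe fan-out for high-value, low-frequency events."""

    def __init__(self):
        self.subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Every subscriber gets the event and handles it in a way
        # appropriate to its own interests.
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
bus.subscribe("warehouse.fire", lambda e: print("Dispatch engines to", e["location"]))
bus.subscribe("warehouse.fire", lambda e: print("Activate traffic-light plan around", e["location"]))
bus.publish("warehouse.fire", {"location": "Warehouse 7", "severity": "critical"})
```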

The examples of dispatching engines or implementing an emergency light plan show that these events might well need fast response times, but there are other cascaded processes that could be more tolerant of delay.  I think it’s likely that many processes will require edge hosting, while others could tolerate deeper hosting.  All, however, are likely to be local in scope, since an emergency condition can’t be handled easily at a distance.  This high-value, limited-frequency stuff is thus a primary edge driver.

Now the in-between.  High-value events that happen more frequently would likely be traditional commercial transactions.  Think ATMs, mobile purchases, bank tellers, store register systems, automatic toll collection, and so forth.  Another class of high-value events would be contextual events associated with a social or location service, including and perhaps especially advertising.
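Pulling the examples together, the frequency-and-value combinations map onto handling strategies roughly as shown below.  This is just a compact restatement of the argument so far, not a formal taxonomy.

```python
# Rough mapping of the four frequency/value combinations discussed above
# to their likely handling strategy.
HANDLING = {
    ("rare",     "low"):  "ignore: not worth processing at all",
    ("frequent", "low"):  "summarize locally; send aggregates deeper in",
    ("rare",     "high"): "cascade via publish/subscribe; host at the edge",
    ("frequent", "high"): "treat as transactions: front end plus back end",
}

def strategy(frequency: str, value: str) -> str:
    return HANDLING[(frequency, value)]

print(strategy("frequent", "low"))   # cars passing an intersection
print(strategy("rare", "high"))      # smoke in a warehouse
```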

Commercial transactions, huh?  That’s a theme common to the high-value-low-frequency stuff we covered earlier.  I think we can safely say that event-handling in general will have two distinct pieces—a front-end part where we’re either summarizing or cascading events and applying limited response processing to highly time-sensitive conditions, and a longer-cycle transactional back-end.

The cloud computing players like Amazon, Google, and Microsoft all see this combination structure for event software, and all in fact show it clearly in some applications.  The front-end parts of the processes are “serverless”, meaning that they’re designed to be spawned at need, where needed, rather than assigned to a specific place.  That requirement, and the requirement that they be spawned quickly to be responsive, means that you have to avoid complicated new connection structures, access to distant resources, and the like.
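In rough outline, that front-end/back-end split looks like the sketch below.  The handler shape and the back-end hand-off are generic placeholders assumed for illustration, not any particular cloud provider’s event API.

```python
import json

def post_to_backend(record: dict) -> None:
    """Placeholder for the longer-cycle transactional back end
    (a queue, a database write, an analytics pipeline, and so on)."""
    print("queued for back end:", json.dumps(record))

def front_end_handler(event: dict) -> None:
    """Stateless, spawn-on-demand front-end logic: do only the
    time-sensitive work here, then hand off a compact record."""
    if event.get("severity") == "critical":
        # Immediate, latency-sensitive response path.
        print("alerting local responders at", event.get("location"))
    # Everything else becomes a summarized, transactional record.
    post_to_backend({
        "source": event.get("source"),
        "type": event.get("type"),
        "severity": event.get("severity"),
    })

# Hypothetical invocation by whatever serverless platform hosts the function.
front_end_handler({"source": "warehouse-7", "type": "smoke",
                   "severity": "critical", "location": "Warehouse 7"})
```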

All of this shows that the heavy lifting on edge computing would have to be driven by the relatively valuable but infrequent events, and secondarily by perhaps expanding the traditional notion of “commercial” transactions to include social/location/advertising services.  It’s much harder to deploy and connect edge resources based on aggregations of low-value high-frequency stuff, because you’d probably have to find a lot of applications and users to justify the deployment, and it’s hard to see how that happens without anything deployed.

It also shows that the cloud computing players like the three I already named have an advantage over the telcos and cablecos, even though the latter two have more edge-located real estate to leverage.  The software structure of event-handling is well-known to those cloud guys.  Google, at least, has explicitly evolved its own network to be efficient at process linking.  Amazon, of course, is addressing its own limited edge asset position through Greengrass, which lets users host event processes outside of (forward of) the cloud entirely.  Knowing what to do, not just where to do it, could be the deciding factor in the cloud.

Operators, for their part, have a tendency to see software the way they see devices.  If you look at SDN or (in particular) NFV design, you see the kind of monolithic boxes lined with APIs that are the exact opposite of the design for optimum event-handling systems.  Arguably a big reason for this is that operators tend to think of software elements as replacing boxes—virtual functions replace physical functions.  But if the future is things like IoT, then it’s about events, not data flows, and that’s where the operators need to do some thinking outside the box.