What the Court Ruling on the Net Neutrality Order Really Means

The DC Court of Appeals has upheld the FCC’s latest neutrality order, and as is nearly always the case with regulatory moves, the decision has created a combination of misinformation and apocalyptic warnings.  However, it’s dangerous to dismiss these moves as just the usual political jostling.  The telecom industry has been driven more by regulation than by opportunity, and that may still be the case.

In the Court of Appeals decision, the court cited the troubled and tangled history of regulating consumer broadband and the Internet.  The FCC’s position evolved as the mechanism for connection evolved.  In the beginning, we accessed the Internet over telephone connections.  Today we access telephone services over the Internet, and this is the shift that more than anything justifies the FCC’s position that regulations should be changed to accommodate market reality.  The order in question here was the third attempt to do that—both previous attempts failed because the courts said the FCC didn’t have the statutory authority to regulate as it wanted since it had previously classified the Internet and broadband as information services.  That led to the latest order from the FCC, which classified them as common carrier utility services, and the court has now upheld that decision.  There are some important points behind this, though.

First, in telecom regulation the FCC is not a legislative body but what’s called a quasi-judicial agency, meaning that in these matters the FCC is equivalent to the court of fact.  An appeal from a court of fact cannot be made on the facts, but must be made on the grounds that the law was incorrectly applied.  Thus, the court did not endorse the positions the FCC took in the order, only the FCC’s right to take them.

Second, the fact that the FCC has declared broadband Internet to be a common carrier service doesn’t mean that all the burdens of common carrier regulation will be applied to it.  The Communications Act, as amended by the Telecommunications Act of 1996 (which forms the legal foundation for the FCC’s regulations), provides in Section 10 that the FCC can “forbear” from applying regulations it finds unnecessary, in order to promote Internet availability.  The FCC has already said it’s doing that in the order in question.

Third, the issue could in theory be appealed to the Supreme Court and that could change the result.  There have been cases where the Supreme Court disagreed with Court of Appeals findings on telecom, and I’m not qualified to judge the fine details of these issues so I won’t comment on the chances of appeal or of success.  We’ll have to wait.  I do want to say that I’ve read every FCC order on the Internet and the text of every court ruling, and I think I have some understanding of the tone.  In this order, the DC Court of Appeals was pretty firm.

The end result is that an order that disappoints pretty much every group in some way has been upheld, and the court’s ruling is likely to remain the law for years, whether there’s an appeal or not.  They say that the sign of a good ruling is that everyone dislikes something about it, in which case this is a good one.  The question is where it leaves us.

First, the fact that mobile and fixed broadband get equal treatment in virtually all neutrality matters means there’s no value for operators in shifting users to a mobile access technology just to get more favorable regulatory treatment.  I think there will continue to be great operator interest in moving those wireline customers who can’t justify FTTH to fixed wireless simply for infrastructure economy, though.  The deciding issue is content delivery, because fixed wireless isn’t likely a good way to deliver streaming multi-channel video to an ever-more-demanding HDTV market, particularly when competing with cable, and that isn’t likely to change.

Second, the FCC is not going to force wireline or wireless ISPs to unbundle their assets, so despite dire comments to the contrary, the fear of unbundling isn’t likely to deter broadband investment.  It’s also not going to spawn another stupid CLEC-type notion that reselling somebody else’s infrastructure is “competition”.  The big problem with broadband investment is ROI, and that’s the next point.

Third, the big problem with the order was, and still is, the whole notion of paid prioritization and interconnect policy.  I understand as much as any Internet user the personal value of having streaming video available to me at the lowest possible cost—free if possible.  I also understand, as a long-standing industry/regulatory analyst, that something that drives up traffic without driving up revenue is going to put pressure on return on infrastructure investment.  Allowing settlement for content peering and allowing paid prioritization could make broadband Internet access more responsive to traffic cost trends.  The order forecloses both possibilities.

So what does this all add up to?  There were really two parts to the Neutrality Order and we have to look at them independently.

The first part is largely what the Court ruled on—the authority part.  The FCC has finally done what it had to do (and should have done from the first) by asserting that broadband Internet is a telecommunications service, which gives the Commission the authority to regulate it fully.  The FCC seems to have navigated the “forbearance” point well, so the risk it has created for the market by the way it established its authority is minimal.

The second part is the policy part, which is what the FCC intends to do with its authority.  Here’s where I personally parted with the Commission on the Order and where I remain at odds.  The Neutrality Order that was promulgated by the Genachowski FCC (the current one is the Wheeler FCC) was in my view more sensible in that it seemed to explicitly allow the FCC to address abuses of either settlement-based interconnection or paid prioritization, but didn’t foreclose either option.

The important point here is that there was a policy difference between the orders, and that nobody really said that the FCC couldn’t change its approach even though the reason for reversing prior orders was in the “authority” area.  The FCC is not completely bound by its own precedent; it can adapt the rulings to the state of the market and it’s done that many times in the past.  Thus, the current Neutrality Order could still be altered down the line.  A new administration, no matter which party comes to power, usually ends up with a new FCC Chairman, and that new Chairman might well decide to return to Genachowski’s views, or formulate something totally different from the views of either of the two prior Chairmen.

And, of course, there’s Congress.  The parties, not surprisingly, are split on the best policies, with Democrats favoring a consumeristic-and-OTT-driven vision and Republicans favoring network operators.  However, since 1996 there have been many Congressional hearings on Internet policy and precious little in the way of legislation (most would say “none”).  We may well see the usual political posturing on the fact that the Court didn’t reverse the FCC, but we’ll probably not see Congress really act.

There are new revenue opportunities besides things like settlement or paid prioritization, but they are either targeted only at business services (which isn’t where profit-per-bit is plummeting) or they sit above the connection layer of the network, where OTT players compete.  The FCC order largely forecloses a revenue-based path out of the profit dilemma, which leaves us only with cost management.  Going that way is going to keep the pressure on network vendors and ultimately risks curtailing network expansion.

What the Network Operator Literati Think Should Be Done to Accelerate NFV

I am always trying to explore issues that could impact network transformation, especially relating to adopting NFV.  NFV offers a potentially radical shift in capex and architecture, after all.  A couple of emails in response to some of my prior blogs have stimulated me to think about the problem from a different angle.  What’s the biggest issue for operators?  According to them, it’s “openness”.  What are the barriers to achieving that?  That’s a hard topic to survey because not everyone has a useful response, so I’ve gathered some insight from what I call the “literati”, people who are unusually insightful about the technical issues of transformation.

The literati aren’t C-level executives or even VPs and so they don’t directly set policy, but they are a group who have convinced me that they’ve looked long and hard at the technical issues and business challenges of NFV.  Their view may not be decisive, but it’s certainly informed.

According to the literati, the issues of orchestration and management are important but also have demonstrated solutions.  The number of fully operational, productized solutions ranges from five to eight depending on whom you talk with, but the point is that these people believe we have already solved the problems there; we just need to apply what we have effectively.  That’s not true in other areas, though.  The literati think we’ve focused so much on “orchestration” that we’ve forgotten to make things orchestrable.

NFV is an interplay of three as-a-service capabilities, according to the literati.  One is hosting as a service, to deploy virtual functions; one is connection as a service, to build the inter-function connectivity and then tie everything to network endpoints for delivery; and one is function as a service, which covers the implementation of network features as virtual network functions (VNFs).  The common problem with all three is that we don’t base them on a master functional model for each service/function, so let’s take the elements in the order I introduced them to see how that could be done.

All hosting solutions, no matter what the hardware platform or hypervisor is, or whether we’re using VMs or containers, should be represented as a single abstract HaaS model.  The goal of this model is to provide a point of convergence between diverse implementations of hosting from below and the composition of hosting into orchestrable service models above.  That creates an open point where different technologies and implementations can be combined—a kind of buffer zone.  According to the literati, we should be able to define a service in terms of virtual functions and then, in essence, say “DEPLOY!” and have the orchestration of the deployment and lifecycle management harmonize to a single model no matter what actual infrastructure gets selected.

Connection-as-a-service, or NaaS if you prefer, is similar.  The goal has to be to present a single NaaS abstraction that gets instantiated on whatever happens to be there.  This is particularly important for network connectivity services because we’re going to be dealing with infrastructure that evolves at some uncontrollable and probably highly variable pace, and we don’t want service definitions to have to reflect the continuous state of transition.  One abstraction fits all.
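
To make the literati’s point concrete, here’s a minimal sketch, in Python, of what single HaaS and NaaS abstractions might look like.  This is my own illustration, not anything drawn from the ETSI specs or a particular vendor; the class and method names are hypothetical.  The point is simply that service-level orchestration talks only to the abstraction, and each infrastructure choice plugs in underneath it.

    # Hypothetical sketch: one abstract model per as-a-service element.
    from abc import ABC, abstractmethod

    class HostingAsAService(ABC):
        """Single HaaS abstraction covering VMs, containers, any hypervisor."""
        @abstractmethod
        def deploy(self, image, resources): ...
        @abstractmethod
        def undeploy(self, instance_id): ...

    class ConnectionAsAService(ABC):
        """Single NaaS abstraction, instantiated on whatever network is there."""
        @abstractmethod
        def connect(self, endpoints, sla): ...

    class OpenStackHosting(HostingAsAService):
        # Infrastructure-specific adapter (a "VIM-like" role in ETSI terms).
        def deploy(self, image, resources):
            return f"vm-{image}"            # placeholder for a real Nova/Heat call
        def undeploy(self, instance_id):
            print(f"released {instance_id}")

    def deploy_service(functions, haas: HostingAsAService):
        # Service orchestration never references the infrastructure type.
        return [haas.deploy(f, resources={"vcpus": 2}) for f in functions]

    print(deploy_service(["firewall", "nat"], OpenStackHosting()))

Swapping in a container-based implementation of HostingAsAService would leave deploy_service, and every service model above it, untouched.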

The common issue that these two requirements address is “brittleness”.  Service definitions, however you actually model them in structure/language terms, have to describe the transition from an order to a deployment to the operational state, plus the other lifecycle phases involved in maintaining that state.  If the service-level definitions had to reference specific deployment and connection technology, they would have to be changed whenever that technology changed.  And if new technologies like SDN and NFV were deployed randomly across infrastructure as they matured, every service definition might have to be multifaceted to reflect how the deployment/management rules would change depending on where the service was offered.

The as-a-service goal says that if you have an abstraction to represent hosting or connection, you can require that equipment vendors supply the necessary software (a “Virtual Infrastructure Manager,” for example, in ETSI ISG terms) to map their products to the abstractions of the as-a-service elements their gear is intended to support.  Now services are insulated from changes in resources.

The literati say that this approach could be inferred from the ETSI material but it’s not clearly mandated, nor are the necessary abstractions defined.  That means that any higher-level orchestration process and model would have to be customized to resources, which is not a very “open” situation.

On the VNF side we have a similar problem with a different manifestation.  Everyone hears, or reads, constantly about the problem of VNF onboarding, meaning the process of taking software and making it into a virtual function that NFV orchestration and management can deploy and sustain.  The difficulty, say the literati, is that the goal is undefined in a technical sense.  If we have two implementations of a “firewall” function, we can be almost sure that each will have a different onboarding requirement.  Thus, even if we have multiple products, we don’t have an open market opportunity to use them.

What my contacts say should have been done, and still could be done, is to divide virtual functions into function classes, like “Firewall”, and give each class an associated abstraction—a model.  The onboarding process would then begin by having the associated software vendor (or somebody) harmonize the software with the relevant function class model.  Once that is done, any service model that references that function class would decompose into the set of deployment instructions/steps defined by the model—no matter what software was actually used.
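
Here’s a minimal sketch of that function-class idea, again my own Python illustration rather than an ETSI or vendor interface, with “Firewall” as the class and two invented vendor packages.  Onboarding, in this view, means mapping each product onto the same class model so that service models reference the class and never the product.

    # Hypothetical "function class" abstraction for VNF onboarding.
    from abc import ABC, abstractmethod

    class FirewallClass(ABC):
        """Every firewall VNF must be harmonized to this model to be onboarded."""
        @abstractmethod
        def apply_rules(self, rules): ...
        @abstractmethod
        def get_status(self): ...

    class VendorAFirewall(FirewallClass):
        def apply_rules(self, rules):
            print("vendor A adapter: pushing", len(rules), "rules via CLI")
        def get_status(self):
            return "up"

    class VendorBFirewall(FirewallClass):
        def apply_rules(self, rules):
            print("vendor B adapter: POST /fw/rules", rules)
        def get_status(self):
            return "up"

    def deploy_firewall(vnf: FirewallClass, rules):
        # The service model references "Firewall", never a specific product.
        vnf.apply_rules(rules)
        return vnf.get_status()

    print(deploy_firewall(VendorAFirewall(), ["deny tcp any any eq 23"]))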

The problem here is that while we have a software element in NFV that is at least loosely associated with this abstract-to-real translation on the resource side (though it lacks the rigorous model definitions needed and a full lifecycle management feature set built into the abstraction), we have nothing like that on the VNF side.  The closest thing we have is the notion of the specialized VNF Manager (VNFM), but while this component could in theory be tasked with making all VNFs of a function class look the same to the rest of the NFV software, it isn’t tasked that way now.

There are similarities between the view of the literati and my own, but they’re not completely congruent.  I favor a single modeling language and orchestration approach from top to bottom, while the literati believe there’s nothing whatsoever wrong with having one model and orchestration approach at the service layer and another to decompose the abstractions I’ve been talking about.  They also tend to put service modeling up in the OSS/BSS and the model decomposition outside/below it, where I tend to integrate both OSS/BSS and network operations into a common model set.  But even in these areas, I think I’ve indicated that I can see the case for the other approach.

One point the literati and I agree on is that orchestration and software automation of service processes is the fundamental goal here, not infrastructure change.  Most of them don’t believe that servers and hosting will account for more than about 25% of infrastructure spending even in the long run.  They believe that if opex could be reduced through automation, some of the spending pressure (the pressure that, for example, resulted in a Cisco downgrade on Wall Street) would be relieved.  SDN and NFV, they say, aren’t responsible for less spending—the profit compression of converging cost-per-bit and price-per-bit curves is doing that.  The literati think that an as-a-service abstraction of connection resources would let operators gain cost management benefits without massive changes in infrastructure, and would then lead them into those changes where they make sense.

It seems to me that no matter who I talk with in network operator organizations, they end up at the same place but by taking different routes.  I guess I think that I’ve done that too.  The point, though, is that there is a profit issue to be addressed that is suppressing network spending and shifting power to price leaders like Huawei.  Everyone seems to think that service automation is the solution, but they’re all seeing the specific path to achieving it in a different light.  Perhaps it’s the responsibility of vendors here to create some momentum.

What Microsoft’s LinkedIn Deal Could Mean

Microsoft announced it was acquiring business social network giant LinkedIn, and the Street took that as positive for LinkedIn and negative for Microsoft.  There are a lot of ways of looking at the deal, including that Microsoft, like Verizon, wants a piece of the OTT and advertising-sponsored service market.  It seems more likely that there’s a more direct symbiosis, particularly if you read Microsoft’s own release on the deal.

LinkedIn, which I post on pretty much every day, is a good site for business prospecting, collaboration, and communication.  It’s not perfect, as many who, like me, have tried to run groups on it can attest, but it’s certainly the winner in terms of engagement opportunity.  There are a lot of useful exchanges on LinkedIn, and it borders on being a B2B collaboration site without too much of the personal-social dimension that makes sites like Facebook frustrating to many who have purely business interests.

Microsoft has been trying to get into social networking for a long time, and rival Google has as well, with the latter launching its own Google+ platform to compete with Facebook.  There have been recent rumors that Google will scrap the whole thing as a disappointment, or perhaps reframe the service more along LinkedIn lines, and that might be a starting point in understanding Microsoft’s motives.

Google’s Docs offerings have created cloud-hosted competition for Microsoft, competition that could intensify if Google were to build in strong collaborative tools.  Google also has cloud computing in something more like PaaS than IaaS form, and that competes with Microsoft’s Azure cloud.  It’s illuminating, then, that Microsoft’s release on the deal says “Together we can accelerate the growth of LinkedIn, as well as Microsoft Office 365 and Dynamics as we seek to empower every person and organization on the planet.”

Microsoft’s Office franchise is critical to the company, perhaps as much as Windows is.  Over time, like other software companies, Microsoft has been working to evolve Office to a subscription model and to integrate it more with cloud computing.  The business version of Office 365 can be used with hosted Exchange and SharePoint services.  Many people, me included, believe that Microsoft would like to tie Office not only to its cloud storage service (OneDrive) but also to its Azure cloud computing platform.

Microsoft Dynamics is a somewhat venerable CRM/ERP business suite that’s been sold through resellers, and over the years Microsoft has been slow to upgrade the software and quick to let its resellers and developers customize and expand it, to the point where direct customers for Dynamics are fairly rare.  There have also been rumors that Microsoft would like to marry Dynamics to Azure and create a SaaS version of the applications.  These would still be sold through and enhanced by resellers and developers, targeting primarily the SMB space but also in some cases competing with Salesforce.

Seen in this light, a LinkedIn deal could be two things at once.  One is a way of making sure Google doesn’t buy the property, creating a major headache for Microsoft’s cloud-and-collaboration plans, and the other is a way to cement all these somewhat diverse trends into a nice attractive unified package.  LinkedIn could be driven “in-company” as a tool for business collaboration, and Microsoft’s products could then tie to it.  It could also be expanded with Microsoft products to be a B2B platform, rivaling Salesforce in scope and integrating and enhancing Microsoft’s Azure.

Achieving all this wondrous stuff would have been easier a couple of years ago, frankly.  The LinkedIn community is going to be very sensitive to crude attempts to shill Microsoft products by linking them with LinkedIn features.  Such a move could reinvigorate Google+ and give it a specific mission in the business space, or justify Google’s simply branding a similar platform for business.  However, there is no question that there is value in adding real-time collaboration, Skype calling, and other Microsoft capabilities.

The thing that I think will be the most interesting and perhaps decisive element of the deal is how Microsoft plays Dynamics.  We have never had a software application set that was designed for developers and resellers to enhance and was then migrated to be essentially hybrid-cloud hosted.  Remember that Azure mirrors Microsoft’s Windows Server platform tools, so what integrates with it could easily integrate with both sides of a hybrid cloud and migrate seamlessly between the two.  Microsoft could make Dynamics into a poster child for why Azure is a good cloud platform, in fact.

Office in general, and Office 365 in particular, also offer some interesting opportunities.  Obviously Outlook and Skype have been increasingly cloud-integrated, and you can see how those capabilities could be exploited in LinkedIn to enhance group postings and extend groups to represent private collaborative enclaves.  Already new versions of Office will let you send a link to a OneDrive file instead of actually attaching it, and convey edit rights as needed to the recipient.

So why doesn’t the Street like this for Microsoft, to the point where the company’s bond rating is now subject to review?  It’s a heck of a lot of cash to put out, but more than that is the fact that Microsoft doesn’t exactly have an impressive record with acquisitions.  This kind of deal is delicate not only for what it could do to hurt LinkedIn, but for what it could do to hurt Microsoft.  Do this wrong and you tarnish Office, Azure, and Dynamics, and that would be a total disaster.

The smart move for Microsoft would be to add in-company extensions to LinkedIn and then carefully extend them to B2B.  That way, the details of the integration would be worked out before any visible changes to LinkedIn, and it’s reasonable to assume that B2B collaboration is going to evolve from in-company collaboration because it could first extend to close business partners and then move on to clients, and so forth.

From a technology perspective this could be interesting too.  Integrating a bunch of tools into a collaborative platform seems almost tailor-made for microservices.  Microsoft has been a supporter of that approach for some time, and its documentation on microservices in both Azure and its developer program is very strong.  However, collaboration is an example of a place where just saying “microservices” isn’t enough.  Some microservices are going to be highly integrated with a given task, and thus things you’d probably want to run almost locally to the user, while others are more rarely accessed and could be centralized.  The distribution could change from user to user, which seems to demand an architecture that can instantiate a service depending on usage without requiring that the developer worry about that as an issue.  That could favor a PaaS-hybrid cloud like that of Microsoft.
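
As a purely illustrative sketch (nothing here reflects Microsoft’s actual architecture), the placement decision could be as simple as routing each microservice to an edge or central host based on observed call frequency and latency sensitivity; the thresholds and service names are invented.

    # Illustrative placement logic: frequently used, latency-sensitive
    # microservices run near the user; rarely used ones are centralized.
    def place_microservice(calls_per_minute, latency_sensitive, edge_threshold=30):
        if latency_sensitive and calls_per_minute >= edge_threshold:
            return "edge"       # instantiate close to the user
        return "central"        # pooled, shared instance

    usage_profile = {"presence": (120, True), "archive-search": (2, False)}
    placement = {svc: place_microservice(rate, sensitive)
                 for svc, (rate, sensitive) in usage_profile.items()}
    print(placement)   # {'presence': 'edge', 'archive-search': 'central'}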

This is also a pretty darn good model of what a “new service” might look like, what NFV and SDN should be aiming to support.  Network operators who are looking at platforms for new revenue have to frame their search around some feasible services that can drive user spending before they worry too much about platforms.  This deal might help do that.

Perhaps the most significant theme here is productivity enhancement, though.  We have always depended as an industry on developments that allow tech to drive a leap forward in productivity.  That’s what has created the IT spending waves of the past, and what has been lacking in the market since 2001.  Could this be a way of getting it all back?  Darn straight, if it works, and we’ll just have to wait to see what Microsoft does next.

Server Architectures for the Cloud and NFV Aren’t as “Commercial” as We Think

Complexity is often the enemy of revolution, because things that are simple enough to grasp quickly get better coverage and wider appreciation.  A good example is the way we talk about hosting virtual service elements on “COTS”, meaning “commercial off-the-shelf servers”.  From the term and its usage, you’d think there was a single model of server, a single set of capabilities.  That’s not likely to be true at all, and the truth could have some interesting consequences.

To understand hosting requirements for virtualized features or network elements, you have to start by separating them into data-plane services or signaling-plane services.  Data-plane services are directly in the data path, and they include not only switches/routers but also things like firewalls or encryption services that have to operate on every packet.  Signaling-plane services operate on control packets or higher-layer packets that represent exchanges of network information.  There are obviously a lot fewer of these than the data-plane packets that carry information.

In the data plane, the paramount hosting requirements include high enough throughput to ensure that you can handle the load of all the connections at once, low processing latency to ensure you don’t introduce a lot of network delay, and high intrinsic reliability because you can’t fail over without creating a protracted service impact.

If you looked at a box ideal for the data plane mission, you’d see a high-throughput backplane to transfer packets between network adapters, high memory bandwidth, CPU requirements set entirely by the load that the switching of network packets would impose, and relatively modest disk I/O requirements.  Given that “COTS” is typically optimized for disk I/O and heavy computational load, this is actually quite a different box.  You’d want all of the data-plane acceleration capabilities out there, in both hardware and software.

Network adapter and data-plane throughput efficiency might not be enough.  Most network appliances (switches and routers) will use special hardware features like content-addressable memory to quickly process packet headers and determine the next hop to take (meaning which trunk to exit on).  Conventional CPU and memory technology could take a lot longer, and if the size of the forwarding table is large enough then you might want to have a CAM board or some special processor to assist in the lookup.  Otherwise network latency could be increased enough to impact some applications.
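
To see why lookup assistance matters, here’s a deliberately naive longest-prefix match done purely in software (a sketch of the problem, not of how production routers or forwarding libraries actually work): every packet pays a per-prefix scan cost that CAM/TCAM hardware or a specialized lookup engine is there to avoid.

    import ipaddress

    # Naive software longest-prefix match: per-packet cost grows with table size.
    FORWARDING_TABLE = {
        ipaddress.ip_network("10.0.0.0/8"): "trunk-1",
        ipaddress.ip_network("10.1.0.0/16"): "trunk-2",
        ipaddress.ip_network("0.0.0.0/0"): "trunk-0",    # default route
    }

    def next_hop(dst_ip):
        dst = ipaddress.ip_address(dst_ip)
        best = None
        for prefix, trunk in FORWARDING_TABLE.items():   # linear scan per packet
            if dst in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
                best = (prefix, trunk)
        return best[1]

    print(next_hop("10.1.2.3"))   # trunk-2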

The reliability issue is probably the one that gets misunderstood most.  We think in terms of having failover as the alternative to reliable hardware in the cloud, and that might be true for conventional transactional applications.  For data switching, the obvious problem is that the time required to spin up an alternative image and make the necessary network connections to put it into the data path to replace a failure would certainly be noticed.  Because the fault would probably be detected by a higher level, it’s possible that adaptive recovery at that level might be initiated, which could then collide with efforts to replace the failed image.  The longer the failure the bigger the risk of cross-purpose recovery.  Thus, these boxes probably do have to be five-nines, and you could argue for even higher availability too.

Horizontal scaling is less likely to be useful for data-plane applications, for three reasons.  First, it’s difficult to introduce a parallel path in the data plane because you have to introduce path separation and recombination features that could cause disruption simply because you temporarily break the connection.  Second, you’ll end up with out-of-order delivery in almost every case, and not all packet processing will reorder packets.  Third, your performance limitations are more likely to be on the access or connection side, and unless you parallelize the whole data path you’ve not accomplished much.

The final point in server design for data-plane service applications is the need to deliver uniform performance under load.  I’ve seen demos of some COTS servers in multi-trunk data-plane applications, and the problem you run into is that performance differs sharply between low and high load levels.  That means a server that’s assigned to run more VMs is going to degrade everything, so you can’t run multiple VMs and still adhere to stringent SLAs.

The signaling-plane stuff is very different.  Control packets and management packets are relatively rare in a flow, and unlike data packets that essentially demand a uniform process—“Forward me!”—the signaling packets may spawn a fairly intensive process.  In many cases there will even be a requirement to access a database, as you’d see in mobile/IMS and EPC control-plane processing.  These processes are much more like classic COTS applications.

You don’t need hardware reliability as high in the signaling-plane services because you can spawn a new copy more easily, and you can also load-balance these services without interruption.  You don’t need as much data-plane acceleration, because the signaling packet load is smaller, unless you plan on running a lot of different signaling applications on a single server.

Signaling-plane services are also good candidates for containers versus virtual machines.  It’s easier to see data-plane services being VM-hosted because of their greater performance needs and their relatively static resource commitments.  Signaling-plane stuff needs less and runs less, and in some cases the requirements of the signaling plane are even web-like or transactional.

This combination of data-plane and signaling-plane requirements makes resource deployment more complicated.  A single resource pool designed for data-plane services could impose higher costs on signaling-plane applications because they need fewer resources, and obviously a signaling-plane resource is sub-optimal in the data plane.  If the resource pool is divided up by service type, then it’s not uniform and thus not as efficient as it could be.
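
A hedged sketch of how a scheduler might split the difference: route each workload to one of two pools based on a declared profile.  The profile fields, pool names, and thresholds here are invented for illustration.

    # Illustrative: steer VNFs to a data-plane-optimized pool or a general
    # signaling/compute pool based on the profile they declare.
    def choose_pool(vnf_profile):
        if vnf_profile.get("plane") == "data":
            # needs NIC acceleration, uniform latency, five-nines hosts
            return "dataplane-pool"
        # signaling/control work is bursty and tolerant of failover
        return "signaling-pool"

    print(choose_pool({"name": "vRouter", "plane": "data"}))        # dataplane-pool
    print(choose_pool({"name": "vIMS-CSCF", "plane": "signaling"})) # signaling-pool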

You also create more complexity in deployment because every application or virtual function has to be aligned with the right hosting paradigm, and the latency and cost of connection has to be managed in parallel with the hosting needs.  This doesn’t mean that the task is impossible; the truth is that the ETSI ISG is already considering more factors in hosting VNFs than would likely pay back in performance or reliability.

It seems to me that the most likely impact of these data-plane versus signaling-plane issues would be the creation of two distinct resource pools and deployment environments, one designed to be high-performance and support static commitments, and one to be highly dynamic and scalable—more like what we tend to think of when we think of cloud or NFV.

The notion of COTS hosting everything isn’t reasonable unless we define “COTS” very loosely.  The mission for servers in both cloud computing and NFV varies widely, and optimizing both opex and capex demands we don’t try to make one size fit all.  Thus, simple web-server technology, even the stuff that’s considered in the Open Compute Project, isn’t going to be the right answer for all applications, and we need to accept that up front and plan accordingly.

“The Machine” and the Impact of a New Compute Model on Networking

The dominant compute model of today is based on the IBM PC, a system whose base configuration when announced didn’t even include floppy disk drives.  It would seem that all the changes in computing and networking would drive a different approach, right?  Well, about eight years ago, HPE (then HP Labs) proposed what it called “The Machine”, which is a new computer architecture based on a development that makes non-volatile memory (NVM) both fast and inexpensive.  Combine this with multi-core CPUs and optical coupling of elements and you have a kind of “computer for all time”.

Today, while we have solid-state disks, NVM is far slower than traditional memory, which means you still have to consider a two-tier storage model (memory and disk).  With the new paradigm, NVM would be fast enough to support traditional memory missions and of course be a lot faster than flash or rotating media in their missions.  It’s fair to ask what the implications could be for networking, but getting the answer will require an exploration of the scope of changes The Machine might generate for IT.

One point that should be raised is that there aren’t necessarily any profound changes at all.  Right now we have three memory/storage options out there—rotating media, flash, and standard DRAM-style volatile memory.  If we assumed that the new memory technology was as fast as traditional volatile memory (which HPE’s material suggests is the case) then the way it would likely be applied would be cost-driven, meaning it would depend on its price relative to DRAM, flash, and rotating media.

Let’s take a best-case scenario—the new technology is as cheap as rotating media on a per-terabit basis.  If that were the case, then the likely result would be that rotating media, flash, and DRAM would all be displaced.  That would surely be a compute-architecture revolution.  As the price rises relative to the rotating/flash/DRAM trio, we’d see a price/speed-driven transition of some of the three media types to the new model.  At the other extreme, if the new model were really expensive (significantly more than DRAM), it would likely be used only where the benefits of NVM that works at DRAM speed are quite significant.  Right now we don’t know what the price of the stuff will be, so to assess its impact I’ll take the best-case assumption.

If memory and storage are one, then it makes sense to assume that operating systems, middleware, and software development would all change with respect to how they use both memory and storage.  Instead of the explicit separation we see today (which is often extended with flash NVM into storage tiers) we’d be able to look at memory/storage as being seamless, perhaps addressed by a single petabyte address space.  File systems and “records” are now like templates and variables.  Or vice versa, or they’re both supported by a new enveloping model.
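
Today’s closest everyday analogy is memory-mapping a file.  The sketch below is ordinary Python and has nothing to do with The Machine itself, but it shows the programming experience a seamless memory/storage space would make universal: “storage” addressed as if it were a variable in memory.

    import mmap

    # Rough analogy: address "storage" directly, as if it were memory.
    with open("persistent.bin", "wb") as f:
        f.write(b"\x00" * 4096)              # reserve a 4 KB persistent region

    with open("persistent.bin", "r+b") as f:
        with mmap.mmap(f.fileno(), 4096) as region:
            region[0:5] = b"hello"           # a "variable" that survives the process
            print(region[0:5])               # b'hello'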

One obvious benefit of this in cloud computing and NFV is that the time it takes to load a function/component would be shorter.  That means you could spin up a VNF or component faster and be more responsive to dynamic changes.  Of course, responding to “dynamic changes” often means spinning up a new instance of a component, and that would be faster too.

The new-instance point has interesting consequences in software development and cloud/NFV deployments.  What happens today when you want to instantiate some component or VNF?  You read a new copy from disk into memory.  If memory and disk are the same thing, in effect, you could still do that and it would be faster than rotating media or flash, but wouldn’t it make sense just to use the same copy?

Not possible, you think?  Well, back in the ’60s and ’70s, when IBM introduced the first true mainframe (the System/360) and programming tools for it, it recognized that a software element could have three modes—refreshable, serially reusable, and reentrant.  Software that is refreshable needs a new copy to create a new instance.  If a component is serially reusable it can be restarted with new data without being refreshed, provided it’s done executing the first request.  If it’s reentrant, then it can be running several requests at the same time.  If we had memory/storage equivalence, it could push the industry to focus increasingly on developing reentrant components.  That concept still exists in modern programming languages, by the way.

There are always definitional disputes in technology, but let me risk one by saying that in general a reentrant component is a stateless component and statelessness is a requirement for RESTful interfaces in good programming practice.  That means that nothing used as data by the component is contained in the component itself; the variable or data space is passed to the component.  Good software practices in creating microservices, a hot trend in the industry, would tend to generate RESTful interfaces and thus require reentrant code.  Thus, we could say that The Machine, with a seamless storage/memory equivalence, could promote microservice-based componentization of applications and VNFs.
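
A small sketch of the difference, in generic Python and not tied to any NFV spec: the stateless version can serve any number of requests from a single resident copy, while the stateful one needs a fresh copy (or careful serialization) per instance.

    # Non-reentrant: state lives inside the component, so each service instance
    # needs its own copy, or requests must be strictly serialized.
    class StatefulCounter:
        def __init__(self):
            self.count = 0
        def handle(self, request):
            self.count += 1
            return f"{request}: call #{self.count}"

    # Reentrant/stateless: everything a call needs is passed in, so one resident
    # copy can serve many requests concurrently (the RESTful pattern).
    def handle_stateless(request, session_state):
        new_state = dict(session_state, calls=session_state.get("calls", 0) + 1)
        return f"{request}: call #{new_state['calls']}", new_state

    reply, state = handle_stateless("login", {})
    reply, state = handle_stateless("query", state)
    print(reply)   # query: call #2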

Another interesting impact is in the distribution of “storage” when memory and storage are seamless.  We have distributed databases now, clusters of stuff, DBaaS, and cloud storage and database technologies.  Obviously all of that could be made to work as it does today with a seamless memory/storage architecture, but the extension/distribution process would break the seamlessness property.  Memory has low access latency, so if you network-distribute some of the “new memory” technology you’d have to know it was distributed and not use it where “real” memory performance was expected.

One way to mitigate this problem is to couple the distributed elements better.  HPE says The Machine will include new optical component coupling.  Could that new coupling be extended via data center interconnect (DCI)?  Yes, the speed of light would introduce latency issues that can’t be addressed unless you don’t believe Einstein, but you could surely make things better with fast DCI, and widespread adoption of the seamless memory/storage architecture would thus promote fast DCI.

The DCI implications of this could be profound for networking, of course, and in particular for cloud computing and NFV.  Also potentially profound is the need to support a different programming paradigm to facilitate either reentrancy/statelessness or REST/microservice development.  Most programming languages will support this, but many current applications/components aren’t reentrant/RESTful, and for virtual network functions it’s difficult to know whether software translated from a physical device could easily be adapted to this.  And if management of distributed seamless memory/storage is added as a requirement, virtually all software would have to be altered.

On the plus side, an architecture like this could be a windfall for many distributed applications and for something like IoT.  Properly framed, The Machine could be so powerful an IoT platform that deals like the one HPE made with GE Digital (Predix, see my blog yesterday) might be very smart for HPE, smart enough that the company might not want to step on partnership deals by fielding its own offering.

The cloud providers could also benefit mightily from this model.  Platforms with seamless memory/storage would be, at least at first, highly differentiating.  Cloud services to facilitate the use of distributed seamless memory/storage arrays would also be highly differentiating (and profitable).  Finally, what I’ve been calling “platform services” that extend a basic IaaS model into PaaS or expand PaaS platform capabilities could use this model to improve performance.  These services are then a new revenue source for cloud providers.

If we presumed software would be designed for this distributed memory/storage unity, then we’d need to completely rethink issues like placement of components and even workflow and load balancing.  If the model makes microservices practical, it might even create a new programming model that’s based on function assembly rather than writing code.  It would certainly pressure developers to think more in functional terms, which could accelerate a shift in programming practices we already see called “functional programming”.  An attribute of functional programming is the elimination of “side effects” that could limit RESTful/reentrant development, by the way.

Some of the stuff needed for the new architecture is being made available by HPE as development tools, but the company seems to want to make as much of the process open-source-driven as it can.  That’s logical, providing that HPE ensures the open-source community focuses on the key set of issues and does so in a logical way.  Otherwise it will be difficult to develop early utility for The Machine.  There will also be a sensitivity to price trends over time: if pricing factors can be expected to change the way the new memory model is used, those changes could then impact programming practices.

A final interesting possibility raised by the new technology is taking a leaf from the Cisco/IBM IoT deal.  Suppose that you were to load up routers and switches with this kind of memory/storage and build a vast distributed, coupled, framework?  Add in some multi-core processors and you have a completely different model of a cloud or vCPE, a fully distributed storage/compute web.  Might something like that be possible?  Like other advances I’ve noted here, it’s going to depend on price/performance for the new technology, and we’ll just have to see how that evolves.

IoT is Creeping Toward a Logical Value Proposition!

All too often in technology, we see concepts with real possibilities go stupid on us.  Lately many of the key concepts have doubled down on stupidity, departing so far from relevant market benefits that there’s little hope of success.  IoT, probably the most hyped of all technology concepts in recent times, has surely had its own excursions into the realm of the stupid, but unlike many tech notions it seems to have a chance of escaping back to reality.  You can see useful points being recognized, and it’s not too late for IoT to realize its potential.

The notion of a literal “Internet of Things” where all manner of sensors and controllers are put online to be accessed, exploited, and probably hacked by all, is one of the dumber excursions from a value perspective.  The notion that an IoT strategy is a strategy to manage the devices themselves isn’t any better.  From the first, it should have been clear that IoT is a big-data, analytics, and cloud application, and that it has to first exploit those sensors already deployed on largely private networks, often using local non-IP protocols.  Now we’re seeing signs of a gradual realization of what the real IoT needs to be.

Startups in the IoT space have provided some support for data storage and analytics for well over a year.  A Forbes article summarizes some of the key players in the space, but if IoT is a potential market revolution then startups are really selling themselves and not their technology.  IoT adopters will generally want to bet on somebody with a big name, and that’s particularly true of network operators looking for a realistic IoT service strategy.

In November of last year, I blogged about the GE Digital Predix platform, which as I said was the first credible IoT story from a major provider.  With strong analytics and a good strategy for capturing current sensor data, Predix has all the pieces to be a universal IoT framework, but the company has stressed “industrial IoT” rather than the universality inherent in its platform.  One thing the breadth of Predix may have done is to encourage other IoT vendors to focus their efforts on specific applications in either a horizontal or vertical sense.

One example of focus is addressing what many IoT users would see as the high first-cost barrier to IoT applications.  The cloud is a natural haven for a logical ease-into-IoT model, and so it’s not surprising that cloud providers have IoT service offerings:

  • Amazon, the cloud leader, has an IoT offering that focuses on a unified model for device and cloud applications and facilitates the integrated use of a variety of “foundation services” hosted by AWS. Their approach is more development-centric than productized.
  • Google has a streaming and publish/subscribe distribution model that adds predictive analytics and event processing to IoT, all based on Google’s ubiquity as a cloud provider. Their Cloud Dataflow programming model may be a seminal reference for both batch and streaming IoT development.
  • Microsoft offers both premises tools for developing IoT applications and Azure cloud tools. They also integrate Cortana capability with inquiries and analytics, and they’ve won some very public deals recently.
  • Oracle offers its IoT Cloud Service, which focuses explicitly on the two key truths about IoT—you have to exploit current sensors connected through legacy private networks and you have to focus on data storage and analytics.
  • Salesforce’s IoT Cloud extends sensor and analytics concepts to websites and other customer information, and offers event triggering of cloud processes. The focus, not surprisingly, is on CRM but it appears that broader in-company use would also be possible.
  • Operators like AT&T and Verizon have IoT services that focus on connectivity, but AT&T also provides industry-specific integrated solutions.

Then, back in May, HPE talked about its IoT model in “platform” terms, which is how the media and the market are now distinguishing between the sensor-driven IoT nonsense and the more logical application-and-repository concept.  The HPE story had an unfortunate slant in its title, in my view: “Hewlett Packard Enterprise Simplifies Connectivity Across the IoT Ecosystem”.  The announcement does contain device and connectivity management elements, and the title tended to focus everyone on that aspect.  But HPE also provided a repository, data conversion, and analytics platform vision that should have been the lead item.  HPE is also partnering with GE Digital to power the Predix platform, which may suggest the company wants to be an IoT host for multiple software frameworks.

The most recent announcement is from IBM and Cisco.  The companies have agreed to provide Cisco hosting of Watson analytics so that event processing can be managed locally, making control loops shorter.  The move is not only potentially critical for both Cisco’s and IBM’s IoT differentiation, it’s an illustration that one of the key values of IoT in process management and event handling would best be supported by functionality hosted close to the sensors.  This explains why so many cloud IoT stories are gravitating toward complex event processing, and it also illustrates why IoT could be a very powerful driver for NFV.  Data centers close to the edge could host IoT processes with a shorter control loop, and that could help justify the data center positioning.  Edge data centers could then host service features.
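
A hedged illustration of why local hosting shortens the control loop (this is not IBM’s or Cisco’s actual software, just a toy): raw events are evaluated at the edge, local actions fire immediately, and only summaries travel to the central analytics platform.

    # Toy complex-event-processing pass at the network edge.
    def edge_cep(readings, limit=80.0):
        alarms, total = [], 0.0
        for sensor, value in readings:
            total += value
            if value > limit:
                alarms.append((sensor, value))     # act locally, short control loop
        summary = {"count": len(readings),
                   "avg": total / len(readings),
                   "alarms": alarms}
        return alarms, summary                     # only the summary goes upstream

    local_actions, to_cloud = edge_cep([("temp-1", 72.5), ("temp-2", 91.0)])
    print(local_actions)   # [('temp-2', 91.0)]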

Cisco’s ultimate IoT position could be critical.  The company has, in the past, been dazzlingly simplistic in its view of the future—everything has to come down to more bits for routers to push.  You could view complex-event-processing (CEP) IoT that way, of course.  On the other hand, you could view it as an example/application of “fog computing,” the distribution of intelligence across the network.  The latter view would be helpful to Cisco, IBM, and IoT overall, and given that Cisco has recently had some management changes that suggest it’s moving in a different direction than Chambers had taken it, perhaps there’s a chance we’ll see some real IoT insight and not just another report on traffic growth.

Insight is what IoT is really about.  We could expect to capture more contextual data from wide use of IoT if we could dodge the first-cost problems, privacy issues, and security challenges of the inherently destructive model of “everything on the Internet”.  This is a big data and analytics application in one sense, complex event processing in another sense, and that’s how IoT has to develop if it’s going to develop in any real sense.  At some point, some platform vendor is going to step up and frame the story completely, and that could put IoT on the fast track.  Wouldn’t it be nice for some “revolutionary” technology to actually revolutionize?

How Will Virtual-Network Services Impact Transport Configuration?

One of the important issues of multi-layer networking, and in fact multi-layer infrastructure, is how things at the top percolate down to the bottom.  Any kind of higher-layer service depends on lower-layer resources, and how these resources are committed or released is an important factor under any circumstances.  If the agility of lower layers increases, then so does the importance of coordination.  Just saying that you can do pseudowires or agile optics, or control optical paths with SDN, doesn’t address the whole problem, which is why I propose to talk about some of the key issues.

One interesting point that operators have raised with me is that the popular notion that service activities would directly drive transport changes—“multi-layer provisioning”—is simply not useful.  The risks associated with this sort of thing are excessive because it introduces the chance that a provisioning error at the service level would disturb transport networks to the point where other services and customers were affected.  What operators want is not integrated multi-layer provisioning, but rather a way to coordinate transport configuration.

Following this theme, there are two basic models of retail service, coercive/explicit and permissive/implicit.  Coercive services commit resources on request—you set them up.  Permissive services preposition resources and you simply connect to them.  VPNs and VLANs are coercive and the Internet is permissive.  There are also two ways that lower-layer services can be committed.  One is to link the commitment below to a commitment above, which might be called a stimulus model, and the other is to commit based on aggregate conditions, which we might call the analytic model.  This has all been true for a long time, but virtualization and software-defined networking are changing the game, at least potentially.

Today, it’s rare for lower network layers, often called “transport,” to respond to service-level changes directly.  What happens instead is the analytic model, where capacity planning and traffic analysis combine to drive changes in transport configuration.  Those changes are often quite long-cycle because they often depend on making physical changes to trunks and nodes.  Even when there’s some transport-level agility, it’s still the rule to reconfigure transport from an operations center rather than with automatic tools.

There are top-down and bottom-up factors that offer incentive or opportunity to change this practice, providing it can stay aligned with operator stability and security goals.  At the bottom, increased optical agility and the use of protocol tunnels based on anything from MPLS to SDN allow for much more dynamic reconfiguration, to the point where it’s possible that network problems that would ordinarily have resulted in a higher-layer protocol reaction (like a loss of a trunk in an IP network) can instead be remediated at the trunk level.  The value of lower-layer agility is clearly limited if you try to drive the changes manually.

From the top, the big change is the various highly agile virtual-network technologies.  Virtual networks, including those created with SDN or SD-WAN, are coercive in their service model, because they are set up explicitly.  When you set up a network service you have the opportunity to “stimulate” the entire stack of service layers, not to do coupled or integrated multi-layer provisioning but to reconsider resource commitments.  This is what I mean by a stimulus model, of course.  It’s therefore fair to say that virtual networking in any form has the potential to change the paradigm below.

There are two possible responses, then, in the way lower-layer paths and capacity are managed.  One is to adopt a model where service stimulus from above drives an analytic process that rethinks the configuration of what’s essentially virtual transport.  An order with an SLA would then launch an analytics process that would review transport behavior based on the introduction of the new service and, if necessary, re-frame transport based on how meeting that SLA would alter capacity plans and potentially impact target resource utilization and the SLAs of other services/customers.  The other is to shorten the cycle of the analytic model, depending on a combination of your ability to quickly recognize changes in traffic created by new services and your ability to harness service automation to quickly alter transport characteristics to address the changes.  Which way is best?  It depends on a number of factors.

One factor is the scale of service traffic relative to the total traffic of the transport network.  If a “service” is an individual’s or SMB’s personal connectivity commitment, then it’s very likely that the SLA for the service would have no significant impact on network traffic overall, and it would not only be useless to stimulate transport changes based on it, it would be dangerous because of the risk of overloading control resources with a task that had no helpful likely outcome.  On the other hand, a new global enterprise VPN might have a very significant impact on transport traffic, and you might indeed want to account for the commitment that service’s SLA represents even before the traffic shows up.  That could prevent congestion and problems, not only for the new service but for others already in place.

Another factor is the total volatility at the service layer.  A lot of new services and service changes in a short period of time, reflecting a variety of demand sources that might or might not be stimulated by common drivers, could generate a collision of requests that might have the same effect as a single large service user.  For example, an online concert might have a significant impact on transport traffic because a lot of users would view it in a lot of places.  It’s also true that if services are ordered directly through an online portal rather than through a human intermediary there’s likely to be more and faster changes.  The classic example (net neutrality aside for the moment) is the “turbo button” for enhanced Internet speed.

The final factor is SLA risk.  Even a fast-cycle, automated, analytic model of transport capacity and configuration management relies on traffic changes.  If those changes ramp rapidly, then it’s likely that remediation will lag congestion, which means you’re going to start violating SLAs.  There’s a risk that your remedy will create changes that will then require remediation, creating the classic fault avalanche that’s the bane of operations.
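
Pulling the three factors together, a coordination layer might apply a screen like the one below before kicking off a transport capacity analysis.  The thresholds and parameters are invented purely for illustration; real operators would derive them from their own capacity plans.

    # Illustrative screen: does a service-layer change warrant stimulating a
    # transport-layer capacity/configuration analysis?
    def should_stimulate(order_gbps, transport_gbps, orders_last_hour, sla_headroom_pct):
        significant_scale = order_gbps / transport_gbps > 0.01   # >1% of capacity
        high_volatility   = orders_last_hour > 50                # many changes at once
        tight_sla         = sla_headroom_pct < 10                # little room for lag
        return significant_scale or high_volatility or tight_sla

    # A large enterprise VPN order on a 10 Tbps core is worth a look.
    print(should_stimulate(order_gbps=200, transport_gbps=10000,
                           orders_last_hour=5, sla_headroom_pct=25))   # True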

I think where this ends up is that virtual networking at multiple layers will need to have layer or layer-group control, with behavior at the higher layer coupled by analytics and events to behavior at the lower layer.  You don’t provision transport with services, but you do stimulate the analysis or capacity planning of lower layers when a service-layer change is announced.  That lets you get out in front of traffic changes and prevent new services from impacting existing ones.  Since virtual networks are explicit rather than permissive, they present a unique opportunity to do this, and it might be that the ability to stimulate transport-layer analytic processes will be a critical byproduct of virtual network services.

Events–The Missing Link in Service Automation

In my blog yesterday I talked about service modeling, and it should be clear from the details I covered that lifecycle management, service automation, and event handling are critical pieces of NFV.  The service model ties these elements together, but the elements themselves are also important.  I want to talk a bit more about them today.

Almost a decade ago, the TMF had an activity called “NGOSS Contract” that later morphed into the GB942 specification.  The centerpiece of this was the notion that a service contract (a data model) would define how service events related to service processes.  To me, this was the single most insightful thing that’s come out of service automation.  The TMF has, IMHO, sadly under-realized its own insight here, and perhaps because of that the notion hasn’t been widely adopted.  The TMF also has a modeling specification (“the SID”, or Shared Information and Data model) that has the essential features of a model hierarchy for services and even a separation of the service (“Customer-Facing”) and resource (“Resource-Facing”) domains.

Service automation is simply the ability to respond to events by invoking automated processes rather than manual ones.  In yesterday’s blog I noted that the rightful place to do the steering of events to processes is in the model (where the TMF’s seminal effort put it, and incidentally where Ciena’s DevOps toolkit makes it clear that TOSCA, and thus Ciena, can also put it).  What we’re left with is the question of the events themselves.  Absent service events there’s no steering of events to processes, and no service automation.

The event side can’t be ignored, and it’s more complicated than it looks.  For starters, there’s more than one kind of service event.  We have resource events that report the state of resources that host or connect service features.  We have operations events that originate with the OSS/BSS, customer service rep, network operations center, or even directly with the customer.  We also have model events that originate within a service model and signal significant conditions from one model element to another, for example from a lower-level dependent element to the higher-level element above it.  Finally, with NFV, we have virtual network function (VNF) events.  Or we should.
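
Here's a minimal sketch, with names of my own invention rather than anything from a spec, of those four event families carried in a single common envelope so that one steering mechanism can handle them all.

    # Sketch of an event taxonomy; EventKind and ServiceEvent are invented.
    from dataclasses import dataclass, field
    from enum import Enum, auto
    import time

    class EventKind(Enum):
        RESOURCE = auto()    # from the resources that host or connect features
        OPERATIONS = auto()  # from the OSS/BSS, CSR, NOC, or the customer
        MODEL = auto()       # from one model element to another
        VNF = auto()         # from a virtual network function

    @dataclass
    class ServiceEvent:
        kind: EventKind
        source: str            # who raised the event
        target_element: str    # which model element should receive it
        name: str              # e.g. "Activate", "Operating", "Fault"
        timestamp: float = field(default_factory=time.time)

    print(ServiceEvent(EventKind.VNF, "firewall-vnf-3",
                       "AccessWithFirewall-1", "Fault"))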

One of the glaring gaps in NFV work so far is the relationship between virtual functions as elements of a service and both the resources below and the service structures above.  The current NFV work postulates the existence of an interface between a virtual function (which can be made up of multiple elements, including some management elements) and the rest of the NFV logic, meaning the orchestration and management components.  That’s at least an incomplete approach if not the wrong one; the connection should be based on events.

The first reason for this is consistency.  If service automation is based on steering events to appropriate processes you obviously need events to be steered, and it makes little sense to have virtual functions interact with service processes in a different way.  Further, if a virtual function is simply a hosted equivalent of a physical device (which the NFV work says it is), and if physical devices, through management systems, are expected to generate resource events, then VNFs should generate them too.

The second reason for this is serialization and context.  Events are inherently time-lined.  You can push events into a first-in-first-out (FIFO) queue and retain that context while processing them.  If you don’t have events to communicate service conditions at all levels, you can’t establish the order in which things are happening, and without that ordering service automation is impossible.

Reason number three is concurrency and synchronization, and it’s related to the prior one.  Software that handles events can be made multi-threaded because events can be queued for each process and for multiple threads (even instances of a single process).  That means you can load-balance your stuff.  If load balancing is an important feature in a service chain, doesn’t it make sense that it’s an important feature in the NFV software itself?  And still, with all of this concurrency, you can always synchronize your work through events themselves.
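
A small sketch of those two points together, using only Python's standard queue and threading modules: the queue preserves arrival order, and multiple workers can pull from it, so handling can be load-balanced while synchronization still flows through the events themselves.

    # FIFO queue plus worker threads; event names are illustrative only.
    import queue
    import threading

    events = queue.Queue()

    def worker(worker_id: int) -> None:
        while True:
            event = events.get()
            if event is None:          # shutdown marker
                break
            print(f"worker {worker_id} handles {event}")

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for w in workers:
        w.start()
    for name in ["Activate", "Operating", "Fault"]:
        events.put(name)
    for _ in workers:                  # one shutdown marker per worker
        events.put(None)
    for w in workers:
        w.join()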

Generating events is a simple matter; software that’s designed to be event-driven would normally dispatch an event to a queue, and there the event could be popped off and directed to the proper process or instance or thread.  Dispatching an event is just sending a message, and you can structure the software processes as microservices, which is again a feature that Ciena and others have adopted in their design for NFV software.  When you pop an event, you check the state/event table for the appropriate service element and you then activate the microservice that represents the correct process.
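
Here's a minimal sketch of that pop-and-dispatch step, with an invented state/event table and plain functions standing in for microservices.

    # The (state, event) pair selects a named process; missing entries are
    # procedure errors.  Table contents and process names are illustrative.
    STATE_EVENT_TABLE = {
        ("ORDERED", "Activate"): "deploy_feature",
        ("ACTIVATING", "Operating"): "mark_operational",
        ("OPERATING", "Fault"): "remediate",
    }

    PROCESSES = {  # each entry stands in for a microservice endpoint
        "deploy_feature": lambda element: print(f"deploying {element}"),
        "mark_operational": lambda element: print(f"{element} now OPERATING"),
        "remediate": lambda element: print(f"remediating {element}"),
    }

    def dispatch(element: str, state: str, event: str) -> None:
        process = STATE_EVENT_TABLE.get((state, event))
        if process is None:
            print(f"{event} in state {state} is a procedure error for {element}")
        else:
            PROCESSES[process](element)

    dispatch("Access-1", "ORDERED", "Activate")
    dispatch("Access-1", "ORDERED", "Operating")   # no entry for this pair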

State/event processes themselves generate events as one of their options.  In software, the typical behavior of a state/event process is to accept the input, generate some output (a protocol message, an action, or an event directed to another process) and then set your “next-state” variable.  Activating an ordered service works this way—you get an Activate event, you dispatch that event to your subordinate model elements so they activate, and you set your next-state to ACTIVATING.  In this state, by the way, that same Activate event is a procedure error because you’re already doing the activating.
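
The handler side can be sketched as a pure step: a (state, event) pair goes in, output events and a next-state come out.  The behavior below is illustrative, not from any spec.

    # Activate in ORDERED pushes Activate to subordinates and moves to
    # ACTIVATING; a second Activate while ACTIVATING is a procedure error.
    def handle(state: str, event: str, subordinates: list):
        if state == "ORDERED" and event == "Activate":
            outputs = [(sub, "Activate") for sub in subordinates]
            return outputs, "ACTIVATING"
        if state == "ACTIVATING" and event == "Activate":
            raise RuntimeError("procedure error: already activating")
        return [], state               # ignore everything else for brevity

    outputs, next_state = handle("ORDERED", "Activate", ["Access-1", "VPN-core"])
    print(outputs, next_state)
    # [('Access-1', 'Activate'), ('VPN-core', 'Activate')] ACTIVATING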

Can we make a VNF generate an event?  Absolutely we can, just as we can make a hardware management system generate one.  Who do we dispatch a VNF event to?  To the service model element that deployed the VNF.  That element must then take whatever local action is appropriate to the event, and then dispatch events to higher- or lower-level elements as needed.

Phrased this way, the NFV notion of having “local” or “proprietary” VNF Manager (VNFM) elements as well as “central” elements actually can be made to work.  Local elements are part of the service object set that deploys the VNF, a resource-domain action in my terms.  Central elements are part of the service object set that defines functional assembly and collection, the service-domain behaviors.  In TMF terms these are Resource-Facing and Customer-Facing Services (RFS and CFS, respectively).

If everything that has to be done in NFV—all the work associated with handling conditions—is triggered by an event that’s steered through the service data model, then we have full service automation.  We also have the ingredients needed to integrate VNFs (they have to generate an event that’s handled by their own controlling object) and the tools needed to support a complete service lifecycle.

You also have complete control over the processes you’re invoking.  Any event for any service element/object can trigger a process of your choice.  There’s no need for monolithic management or operations systems (though you can still have them, as collections of microservices) because you can pick the process flavor you need, buy it, or build it.  This, by the way, is how you’d fulfill the goal of an “event-driven OSS/BSS”.

This approach can work, and I think any software architect who looks at the problem of service automation would agree.  Most would probably come up with it themselves, in fact.  It’s not the only way to do this, but it’s a complete solution.  Thus, if you want to evaluate implementations of NFV, this is what you need to start with.  Anything that has a complete hierarchical service model, can steer events at any level in the model based on a state/event relationship set, and can support event generation for all the event types (including VNF events and including model events between model elements) can support service automation.  Anything that cannot do that will have limitations relative to what I’ve outlined, and as an NFV buyer you need to know about them.

A Deep Dive into Service Modeling

The question of how services are modeled is fundamental to how services can be orchestrated and how service-lifecycle processes can be automated.  Most people probably agree with these points, but not everyone has thought through the fact that if modeling is at the top of the heap, logically, then getting it right is critical.  A bit ago I did a blog on Ciena’s DevOps toolkit and made some comments on their modeling, and that provoked an interesting discussion on LinkedIn.  I wanted to follow up with some of the general modeling points that came out of that discussion.

Services are first and foremost collections of features.  The best example, and one I’ll use through all of this blog, is that of a VPN.  You have a “VPN” feature that forms the interior of the service, and it’s ringed by a series of “Access” features that get users connected.  The Access elements might either be simple on-ramps or they might include “vCPE” elements.  When a customer buys a VPN they get what looks almost like a simple molecule: the central VPN ringed with Access elements.

Customers, customer service reps, and service architects responsible for building services would want to see a service model based on this structure.  They’d want Access and VPN features available for composition into VPN services, but they would also want to define a “Cloud” service as being a VPN to which a Cloud hosting element or two is added.  The point is that the same set of functional elements could be connected in different ways to create different services.

Logically, for this to work, we’d want all these feature elements to be self-contained, meaning that when created they could be composed into any logical, credible, service and when they were ordered they could be instantiated on whatever real infrastructure happened to be there.  If a given customer order involved five access domains, and if each was based on a different technology or vendor, you’d not want the service architect to have to worry about that.  If the VPN service is orderable in all these access domains, then the model should decompose properly for the domains involved, right?

This to me is a profound, basic, and often-overlooked truth.  Orchestration, to be optimally useful, has to be considered at both the service level and the resource level.  Service orchestration combines features, and resource orchestration deploys those features on infrastructure.  Just as we have a “Service Architect” who does the feature-jigsaws that add up to a service, we have “Resource Architects” who build the deployment rules.  I’d argue further that Service Architects are always top-down because they define the retail service framework that is the logical top.  Resource Architects could be considered “bottom-up” in a sense, because their role is to expose the capabilities of infrastructure in such a way that those capabilities can couple to features and be composed into services.

To understand the resource side, let’s go back to the Access feature, and the specific notion of vCPE.  An Access feature might consist of simple Access or include Access plus Firewall.  Firewall might invoke cloud hosting of a firewall virtual network function (VNF), deployment of a firewall VNF in a piece of CPE, or even deployment of an ordinary firewall appliance.  We have three possible deployment models, then, in addition to the simple Access pipeline.  You could see a Resource Architect building up deployment scripts or Resource Orchestrations to handle all the choices.

Again applying logic, AccessWithFirewall as a feature should then link up with AccessWithFirewall as what I’ll call a behavior, meaning a collection of resource cooperations that add up to the feature goal.  I used the same name for both, but it’s not critical that this be done.  As long as the Service Architect knew that the AccessWithFirewall service decomposed into a given Resource Behavior, we’d be fine.

So, what this relationship would lead us to is that a Resource Architect would define a single Behavior representing AccessWithFirewall and enveloping every form of deployment that was needed to fulfill it.  When the service was ordered, the Service Architect’s request for the feature would activate the Resource model, and that model would then be selectively decomposed to a form of deployment needed for each of the customer access points.

If you think about this approach, you see that it defines a lot of important things about modeling and orchestration.  First, you have to assume that the goal of orchestration in general is to decompose the model into something else, which in many cases will be another model structure.  Second, you have to assume that the decomposition is selective, meaning that a given model element could decompose into several alternative structures based on a set of conditions.
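
A small sketch of selective decomposition, with invented site attributes standing in for whatever conditions a Resource Architect would actually test: one Firewall behavior, three possible deployment structures.

    # One behavior decomposes into one of three structures based on conditions.
    def decompose_firewall(site: dict) -> str:
        if site.get("has_agile_cpe"):
            return "FirewallVNF-in-premises-CPE"
        if site.get("cloud_pool_reachable"):
            return "FirewallVNF-in-cloud-pool"
        return "Firewall-appliance-order"

    for site in ({"name": "A", "has_agile_cpe": True},
                 {"name": "B", "cloud_pool_reachable": True},
                 {"name": "C"}):
        print(site["name"], "->", decompose_firewall(site))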

So a higher-level model element can “contain” alternatives below, and can represent either decomposition into lower-level elements or deployment of resources.  Are there other general properties?  Yes, and they fit into the aggregation category.

If two lower-level things (Access and VPN, for example) make up a service, then the status of the service depends on the status of these things, and the deployment of the service is complete only when the deployment of each of the subordinate elements is complete.  Similarly, a fault below implies a fault above.  To me, that means that every object has an independent lifecycle process set, and within it responds to events depending on its own self-defined state.

Events are things that happen, obviously, and in the best of all possible worlds they’d be generated by resource management systems, customer order systems, other model elements, etc.  When you get an event directed at a model element, that element would use the event and its own internal state to reference a set of handler processes, invoking the designated process for the combination.  These processes could be “internal” to the NFV implementation (part of the NFV software) or they could be operations or management processes.

If we go back to our example of a VPN and two AccessWithFirewall elements, you see that the service itself is a model element, and it might have four states: ORDERED, ACTIVATING, OPERATING, and FAULT.  The AccessWithFirewall elements include two sub-elements, the Firewall and the AccessPipe.  The Firewall element could have three alternative decompositions: FirewallInCloud, FirewallvCPE, and FirewallAppliance.  The first two would decompose to the deployment of the Firewall VNF, either on a pooled resource or in an agile premises box, and the last would decompose to an external-order object that waited for an event saying the box had been received and installed.

If we assume all these elements share the same state/event structure, then we could presume that the entire structure is instantiated in the ORDERED state, and that at some point the Service model element at the top receives an Activate event.  It sets its state to ACTIVATING and then decomposes its substructure, sending the AccessWithFirewall and VPN model elements an Activate.  Each of these then decomposes in turn and also enters the ACTIVATING state, waiting for the lowest-level deployment to report an Operating event.  When a given model element has received that event from all its subordinates, it enters the OPERATING state and reports the event to its own superior object.  Eventually all of these roll up to make the service OPERATING.
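
Here's a compact sketch of that cascade and roll-up, with invented class and event names, and with deployment treated as instantaneous for brevity.  The Fault branch shows the upward path described in the next paragraph.

    # Activate cascades down, Operating rolls up, Fault propagates upward.
    class Element:
        def __init__(self, name, children=None):
            self.name, self.state = name, "ORDERED"
            self.children = children or []
            self.parent = None
            for child in self.children:
                child.parent = self

        def handle(self, event):
            if event == "Activate" and self.state == "ORDERED":
                self.state = "ACTIVATING"
                if not self.children:      # leaf: deployment assumed instant
                    self._go_operating()
                for child in self.children:
                    child.handle("Activate")
            elif event == "Operating" and self.state == "ACTIVATING":
                if all(c.state == "OPERATING" for c in self.children):
                    self._go_operating()
            elif event == "Fault":         # local remediation omitted here
                self.state = "FAULT"
                if self.parent:
                    self.parent.handle("Fault")

        def _go_operating(self):
            self.state = "OPERATING"
            if self.parent:
                self.parent.handle("Operating")

    service = Element("VPNService",
                      [Element("VPN"),
                       Element("AccessWithFirewall-1"),
                       Element("AccessWithFirewall-2")])
    service.handle("Activate")
    print(service.state)                   # OPERATING
    service.children[1].handle("Fault")
    print(service.state)                   # FAULT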

If a lower-level element faults and can’t remediate according to its own SLA, then it reports a FAULT event to its superior.  The superior element could then either try remediation or simply continue to report FAULT up the line to eventually reach the service level.  When a fault is cleared, the model element that had failed now reports Operating and enters that state, and the clearing of the fault then moves upward.  At any point, a model element can define remedies, invoke OSS processes like charge-backs, invoke escalation notifications to an operations center, etc.

Another aggregation point is that a given model element might define the creation of something that lower-level elements then depend on.  The best example here is the creation of an IP subnetwork that will then be used to host service features or cloud application components.  A higher-level model defines the subnet, and it also decomposes into lower-level deployment elements.

I would presume that both operators and vendors could define model hierarchies that represent services or functional components of services, and also represent resource collections and their associated “behaviors”.  The behaviors form the bottom-level deployment process sets, so even if two different vendors deployed a feature in slightly different ways, the results could be used interchangeably as long as both rationalized to the same behavior model, which could then be referenced in a service.
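
A tiny sketch of that interchangeability point: two vendor deployment routines register under the same behavior name, and the service model only ever references the behavior.  The registry and the selection policy here are invented for illustration.

    # The service references "AccessWithFirewall"; which routine runs is a
    # resource-side choice hidden behind the behavior name.
    BEHAVIORS = {}

    def register(behavior_name, deploy_fn):
        BEHAVIORS.setdefault(behavior_name, []).append(deploy_fn)

    register("AccessWithFirewall", lambda site: f"vendor-A deployment at {site}")
    register("AccessWithFirewall", lambda site: f"vendor-B deployment at {site}")

    def deploy(behavior_name, site, pick=0):
        # "pick" stands in for whatever policy chooses among equivalent
        # implementations (cost, location, available capacity, ...).
        return BEHAVIORS[behavior_name][pick](site)

    print(deploy("AccessWithFirewall", "metro-east"))
    print(deploy("AccessWithFirewall", "metro-west", pick=1))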

This is a lightweight description of how a service model could work, and how it could then define the entire process of service creation and lifecycle management.  All the software features of SDN, NFV, and OSS/BSS/NMS are simply referenced as arguments in a state/event table for various model elements.  The model totally composes the service process.

The final logical step would be to develop a tool that would let operator architects drag and drop elements to create higher-level service models all the way up to the service level, and to define resources and deployment rules below.  These tools could work from the service model catalog, a storage point for all the model elements, and they could be based on something open-source, like the popular Eclipse interactive development environment used for software-building.

You might wonder where the NFV components like MANO or VNFM or even VNFs are in this, and the answer is that they are either referenced as something to deploy in a behavior, or they’re referenced as a state/event-driven process.  You could build very generic elements, ones that could be referenced in many model element types, or you could if needed supply something that’s specialized.  But there is no single software component called “MANO” here; it’s a function and not a software element, and that’s how a software architect would have seen this from the first.

A data-model-driven approach to NFV is self-integrating, easily augmented to add new functions and new resources, and always updated through model and data changes rather than software development.  A service model could be taken anywhere and decomposed there with the same kind of model software, and any part of a model hierarchy could be assigned to a partner provider, a different administrative zone, or a different class of resources.

This is how NFV should be done, in my view, and my view on this has never changed from my first interactions with the ETSI process and the original CloudNFV project.  It’s what my ExperiaSphere tutorials define, for those interested in looking up other references.  This is the functional model I compare vendor implementations against.  They don’t have to do everything the way I’d have done it, but they do have to provide an approach that’s as workable and that covers at least the same ground.  If you’re looking for an NFV implementation, this is the reference you should apply to any implementation out there, open source or otherwise.

There’s obviously a connection between this approach and the management of VNFs themselves.  Since this blog is already long, I’ll leave that issue for the next one later this week.