Is Verizon’s Data Center Sale Bad News for Carrier Cloud?

The announcement that Verizon really is selling its cloud data centers, we now know to Equinix, has to put fear in the hearts of carrier cloud proponents.  It should, and it should even call into question the future of NFV.  But all is not lost here; the fact is that Verizon got into the cloud business prematurely, and in the wrong part of it.  This is the jumping-off point for lessons to be learned from the deal, and they’re not all negative.

For the telcos, getting into the cloud business was always a kind of love/hate thing.  On one hand, providing cloud data centers seems a lot like providing connectivity.  It’s a high-capex business, favoring those who can tolerate a fairly low ROI.  Perfect, in short, for an industry made up largely of former public utilities.  On the other hand, it seems to double down on a business model that’s based on someone else exploiting you.  Not to mention that you don’t exactly have herds of experienced salespeople to sell it.

Nearly all operators had cloud plans five years ago, and most of them haven’t really followed through on them.  Verizon, representing the highest-demand-density area of the US, was a bit more bullish on the concept.  They believed that corporate America (twice as likely to have its HQ in Verizon’s territory as elsewhere in the US) would flock to them as a credible cloud source, favoring them over upstarts like Amazon.

What put all of this off track was simple: the cloud hasn’t really been driven by corporate America but by Silicon Valley.  OTT businesses represent, I’m told, more than two-thirds of Amazon’s cloud revenues.  Not only that, enterprises were probably the hardest of cloud sells.  No new business strategy can survive a protracted early failure, and Verizon’s cloud is no exception.

The first point to ponder is that cloud computing services were never a major credible driver of carrier cloud.  Cloud computing services could account for only 11% of carrier cloud data centers in 2017, and even at their best (in 2030, the limit of my modeling) they reach only 27%.  Thus, in the near term, operators don’t deploy a lot of cloud resources driven by cloud services.  Given the small number of data centers, no significant edge hosting gets promoted, which means that nothing happens that could accelerate NFV anyway.

The second point is that out in 2030, cloud services is the largest driver of potential carrier data centers aimed at the business customer, and it’s third among all the drivers.  Cloud services could drive four times the number of business data centers that vCPE/NFV could, for example.  Thus, in the long term, cloud computing services is a very important driver of carrier cloud.  What got operators so excited about cloud services was just this long-term value, plus the fact that the service framework was well understood.  Contextual services, my own favorite, have more potential than cloud services but require a lot more work on building a functional framework that could actually support the opportunity.

This is how the “Verizon was too early” view can be justified.  The true trajectory of enterprise cloud services is really set by the pace at which cloud-value-added platform features, like the rich set offered through web-service APIs by Amazon and Microsoft, can influence developers.  The future of enterprise cloud spending is determined not by what you move to the cloud but by what you never had elsewhere and can now do using special cloud-distributed features.  That could never have taken off fast enough; enterprise commitment would have had to explode to make Verizon’s early bet a good one.  Nevertheless, cloud services could be responsible for a quarter of some telcos’ revenues by 2030.

Verizon can still get their share, but the problem is the real driver of enterprise cloud adoption is those previously-mentioned API-based services that will facilitate the creation of cloud-specific or cloud-exploiting applications.  IaaS isn’t enough to get a seat at that table, and the proof is Amazon’s adoption of the API strategy itself.  So Verizon would need to change its approach to the cloud, to focus either on those APIs or on creating an IaaS framework that API-and-middleware vendors would then augment to go after the enterprise.  IBM has sort-of-committed to that model, promoting middleware that runs either in the cloud or in the enterprise and rebuilds applications to exploit the cloud better.  It’s a viable model.

The biggest problem that Verizon’s turnaround could create is in the NFV space, even though the 2017 cloud service opportunity was too small to matter on its own.  There are three reasons for this.  First, cloud computing services will always be a better opportunity to deploy carrier cloud data centers than vCPE.  Since vCPE is supposed to be the prime driver for NFV if you read the rags, giving up cloud computing services could truly starve NFV for early hosting points.  Second, cloud computing services are logically related to IoT and contextual services for the enterprise.  If telcos want to succeed with IoT and contextual services, they’ll create cloud computing opportunities that they can’t afford to cede to OTTs.  Worse, a package of enterprise cloud computing and contextual and IoT services from OTTs could make telco success in any of the three spaces problematic.

This doesn’t mean that in the end Verizon should have kept the data centers.  The key for Verizon and for any telco is getting carrier cloud deployed in central offices and mobile infrastructure hub points where IMS/EPC elements are currently deployed.  In 2019, when cloud services takes its first big jump in number of data center opportunities, it’s still only 12% of the total opportunities.  What Verizon (and everyone else) needs to do is look at the other 88%.

This is where the third reason (bet you thought I’d forgotten it, or counted wrong!) comes in.  It’s very likely that Verizon expected cloud service success to position a lot of data center assets for other applications to exploit.  The marginal cost of hosting those applications would be reduced.  Absent those assets, the other applications will have to be developed more efficiently to make their business case stronger and bear more of the first costs.

It may be harder for Verizon to come up with these other NFV drivers now.  Not only has consideration of the other drivers likely lain fallow while people salivated over cloud computing services, but Verizon’s SDN/NFV model also bets on vendor support rather than on a roll-your-own open implementation of the kind rival AT&T has elected to pursue.  Vendors have been next to useless in NFV in general and at making a broad business case in particular, even those who actually have all the elements of a complete solution.

The net of this is that selling off Verizon’s cloud computing assets doesn’t really hurt its cloud computing strategy—that would have needed reworking in any event (which surely is why Verizon is selling it).  The bigger issue is NFV, because NFV needs a jump-start in terms of hosting points to open a variety of VNF options in a variety of service areas.  Otherwise the first cost and initial risk of any one of those service areas would likely outstrip credible early benefits.

Making Analytics Work as the Basis for Management of Virtual Services

Anyone used to the SLAs of TDM services knows that SLAs have changed.  We used to talk about “error-free seconds”, and now we’re usually talking about monthly uptime.  Mobile devices have changed our view on things like call quality—“Can you hear me now?” is almost a classic example of a trend to assign “good” to something without fatal impairment.  People accept that, and I’m not complaining, but this trend might explain why Juniper bought AppFormix, and it might also help us understand how analytics and SLAs combine in SDN and NFV services.

Any “virtual” service, meaning any service that partitions resources or hosts components, has a potential issue with SLAs.  First, sharing any resource depends on statistical multiplexing, meaning matching expected overall utilization with a level of real resources.  There are no fixed assignments, and thus it’s not possible to write an exact SLA of the type we had with TDM, where resources were fixed.  Second, it’s difficult to reflect the state of a virtual service in terms of the behavior of the real resources.  In many cases, the real resource isn’t even something the service user expects to have—servers in a firewall service are a good example.

Sharing resources also creates issues in management, no matter what your specific SLA goals might be.  If a thousand virtual services utilize a single physical element, there’s a risk that the management systems of the virtual services would all try to read status or even write control changes to the shared element.  At best, this could create what’s almost a management-level DDoS attack where the “attack” is really a flood of legitimate (but ill-advised) status inquiries.  At worst, one user might optimize their service at the expense of others that shared the resource.

Early on in the evolution of virtual services, I suggested that the solution to both problems lies in what I called “derived operations”.  The real status of real functional elements—shared or not—would all be stored in a common repository by a status-gathering process independent of users.  This repository would then be queried to get status and status trends, meaning that analytic processes would run against this repository.  A management query then becomes a database query, and the relationship between each real-resource status and the overall status of a dependent service would be reflected in the formula whereby repository data was analyzed.  VPN status equals sum of status of VPN elements, so to speak.
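To make “derived operations” concrete, here’s a minimal sketch (all names, fields, and the repository structure are hypothetical, not drawn from any particular product): element status sits in a shared repository written by an independent collector, and a service’s status is just a formula evaluated against it, so a management query becomes a database query.

```python
# Hypothetical sketch of "derived operations": real resource status lives in a
# shared repository; virtual-service status is computed from it by a formula.
from statistics import mean

# Repository: latest status per real element, written by an independent collector.
repository = {
    "trunk-101": {"oper_state": "up", "utilization": 0.42},
    "trunk-102": {"oper_state": "up", "utilization": 0.78},
    "vrouter-7": {"oper_state": "up", "utilization": 0.55},
}

def vpn_status(element_ids):
    """'VPN status equals the aggregate status of the VPN elements', expressed
    as a formula run against the repository, never against the devices."""
    records = [repository[e] for e in element_ids]
    state = "up" if all(r["oper_state"] == "up" for r in records) else "degraded"
    return {"state": state,
            "avg_utilization": mean(r["utilization"] for r in records)}

# A management query is now a database query plus a formula:
print(vpn_status(["trunk-101", "trunk-102", "vrouter-7"]))
```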

A lot of possible SLA and management views can derive from this model.  One is the classic analytics-centric management vision.  Many NFV implementations propose analytics as the source of events that drive service lifecycle management.  Obviously, you have to be able to do database dips and perform over-time correlative analysis to derive actionable events from mass status information.  If the trend in link utilization is a determinant of whether to declare the link “congested”, we need to see the utilization data sequence over time, not just at this instant.  Many operators and vendors want to manage all virtual services this way, even those based totally on legacy network elements.
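As a hedged illustration of why the time series matters, here’s a hypothetical rule that declares a link “congested” only when utilization is both high on average and rising across recent samples; the threshold and window are invented for the example.

```python
# Hypothetical trend-based event rule: a link is declared "congested" only if
# recent utilization is high AND rising, which requires the stored time series.

def is_congested(samples, threshold=0.8, window=5):
    """samples: chronologically ordered utilization readings from the repository."""
    recent = samples[-window:]
    if len(recent) < window:
        return False                          # not enough history to judge a trend
    high_on_average = sum(recent) / len(recent) > threshold
    rising = all(later >= earlier for earlier, later in zip(recent, recent[1:]))
    return high_on_average and rising

print(is_congested([0.70, 0.78, 0.84, 0.90, 0.95]))   # True: high and still rising
print(is_congested([0.95, 0.90, 0.84, 0.78, 0.70]))   # False: high but falling
```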

The issue with the pure analytics vision is that a “fault” or “condition” is in the eye of the beholder.  There is no single standard for event generation.  Even the hard failure of a given trunk might not create a service event if the service’s traffic doesn’t transit the failed resource.  Thus, not all faults correlate with a given SLA, and that means you have to be able to understand which faults really have to be handled.

Fault correlation for effective event generation is what I meant by the notion that repository data was “formulized” to synthesize status for virtual elements.  However, the relationship between virtual services and fixed resources is variable, reflecting the current state of deployment.  That means that the status formulas have to be adjusted with changes in service resource assignment.  When you provision a service (or redeploy it) you essentially build the status formula for it.  Actually, you build an entire set of them, one for each event you want to generate.
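Here’s a rough sketch of that idea, with invented event names and resource identifiers: provisioning builds a set of formulas, one per event, bound to the resources currently assigned, and a redeployment simply rebuilds them.

```python
# Hypothetical illustration: provisioning (or redeployment) binds each service
# event to a formula over whatever resources the service is using right now.

repository = {   # latest status per real element, maintained by the collector
    "trunk-101": {"oper_state": "up", "utilization": 0.42},
    "trunk-102": {"oper_state": "up", "utilization": 0.86},
    "vrouter-7": {"oper_state": "up", "utilization": 0.55},
}

def build_event_formulas(assigned_resources):
    """Return one formula per event type, closed over the current assignments."""
    return {
        "service_down": lambda repo: any(
            repo[r]["oper_state"] != "up" for r in assigned_resources),
        "service_congested": lambda repo: any(
            repo[r]["utilization"] > 0.8 for r in assigned_resources),
    }

# Initial deployment binds the formulas to one set of resources...
formulas = build_event_formulas(["trunk-101", "vrouter-7"])
print(formulas["service_congested"](repository))   # False on these resources

# ...and a redeployment simply rebuilds the formula set against the new set.
formulas = build_event_formulas(["trunk-102", "vrouter-7"])
print(formulas["service_congested"](repository))   # True: trunk-102 is running hot
```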

This model of derived operations is, IMHO, the most important element in any analytics/repository model of virtual service management.  Yes, you need a repository, and yes, you need a collector to populate it.  But with these and only fixed analytics models to generate events, you still have nothing.  A fixed model can’t really differentiate between a condition and an event, the former being something that happens and the latter being something that impacts some service’s (or services’) lifecycle.

A formula-linked approach to derived operations is a critical step, but not the only one.  You still have the problem of distributing the information, and here one of the management issues of the NFV model emerges.  VNFs, which represent the “service” element, have an autonomous component of the VNF manager co-resident, and that component would be a “logical” place to dip into a repository for the status of related resources.  The problem is that it’s not clear how the VNF (which plays no role in the hosting or connection decisions associated with deployment/redeployment) would know what resources it had.  Even if it did, you can’t have every VNF polling for status when it feels threatened; you have the same pseudo-DDoS problem that arises if you poll resources directly.

Anyone wanting to get cloud or virtual network service analytics right has to get events to service lifecycle management processes.  That means you can’t use general tools that aren’t linked to building a service, which means that any vendor who buys an analytics player will have to augment the basic tools they’d offer with some specific mechanism to author and modify those formulas.  You also can’t let the lifecycle processes of parts of the service operate in an autonomous way; they are not the entire service, and so their actions have to be coordinated.

The process of formula-building would be facilitated if we presumed that there was, for each class of resource and each class of virtual function, a baseline “MIB” that all member elements were expected to populate.  Now we could envision the “repository” as holding a time-series of element MIBs, and it’s then the responsibility of the collector functions to fit the real variables for the devices they collect from to the class-MIB that’s appropriate for storage in the repository.
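Here’s a minimal, purely illustrative sketch of that collector-side normalization, assuming an invented “trunk” class MIB and two invented vendor variable sets; the real class definitions would obviously have to be agreed much more broadly.

```python
# Hypothetical sketch of a collector normalizing vendor-specific variables into
# a per-class "MIB" before appending a time-series row to the repository.
import time

# Assumed class MIB for anything in the "trunk" class (names are illustrative).
TRUNK_CLASS_MIB = ("oper_state", "utilization", "error_rate")

# Per-vendor mapping from the variables actually collected to the class MIB.
VENDOR_MAPS = {
    "vendorA": {"ifOperStatus": "oper_state", "ifLoad": "utilization",
                "ifErrRate": "error_rate"},
    "vendorB": {"linkState": "oper_state", "occupancy": "utilization",
                "crcErrors": "error_rate"},
}

def normalize(vendor, raw):
    """Fit the device's real variables into the class-MIB shape."""
    mapping = VENDOR_MAPS[vendor]
    mapped = {mapping[k]: v for k, v in raw.items() if k in mapping}
    return {field: mapped.get(field) for field in TRUNK_CLASS_MIB}

time_series = []   # the repository holds a time series of class-MIB records

def collect(element_id, vendor, raw):
    time_series.append({"element": element_id, "ts": time.time(),
                        **normalize(vendor, raw)})

collect("trunk-101", "vendorA", {"ifOperStatus": "up", "ifLoad": 0.42, "ifErrRate": 0.0})
collect("trunk-202", "vendorB", {"linkState": "up", "occupancy": 0.66, "crcErrors": 0.01})
print(time_series)
```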

If we can get resource status standardized by class and collected in a repository (the never-approved IETF work on “Infrastructure to Application Exposure” or i2aex could have done this) then we can use any number of publish-and-subscribe models to disseminate the relevant conditions as events.  Then all we need is state/event-driven lifecycle management to organize everything.
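A small sketch of how those two pieces might fit together, with invented topics, events, and states: conditions are published to subscribers, and a state/event table decides what each service’s lifecycle process does with them.

```python
# Hypothetical sketch: conditions derived from the repository are published as
# events, and a state/event table drives each service's lifecycle process.
from collections import defaultdict

subscribers = defaultdict(list)          # topic -> list of handler callables

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    for handler in subscribers[topic]:
        handler(event)

# A minimal state/event table for one service's lifecycle (all names invented).
TRANSITIONS = {
    ("active", "service_congested"): ("scaling", "add_capacity"),
    ("active", "service_down"):      ("repairing", "redeploy"),
    ("scaling", "scale_complete"):   ("active", None),
}

class ServiceLifecycle:
    def __init__(self, name):
        self.name, self.state = name, "active"

    def on_event(self, event):
        new_state, action = TRANSITIONS.get((self.state, event), (self.state, None))
        self.state = new_state
        if action:
            print(f"{self.name}: {event} -> run {action}; state is now {new_state}")

svc = ServiceLifecycle("vpn-42")
subscribe("vpn-42/events", svc.on_event)
publish("vpn-42/events", "service_congested")     # drives the lifecycle process
```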

Right now, analytics as a solution to the challenges of virtualization-driven services is a cop-out because it’s not being taken far enough.  We’ve had too many of these cop-outs in virtualization (SDN, NFV, whatever) so far, and we eventually have to deal with the critical issues of software automation of the service lifecycle.  If we don’t then we’ll never have broad virtualization-based transformation.

Taking the Carrier Cloud Beyond CORD and the Central Office

CORD, the new darling of telco transformation using open source, is a great concept and one I’ve supported for ages.  I think it’s a necessary condition for effective transformation, but it’s not a sufficient condition.  There are two other things we need to look at.  The first is what makes up the other carrier-cloud data centers, and the second is what is driving data-center-like central offices in the first place.

If we re-architected all the world’s central offices into a CORD model, we’d end up with about 50,000 new carrier-cloud hosting points.  If we added all the edge offices associated with mobile transformation, we’d get up to about 65,000.  My latest model says we could get to about 102,000 carrier cloud data centers, so it’s clear that fully a third of carrier-cloud data centers aren’t described by central office evolution.  We need to describe them somehow or we’re leaving a big hole in the story.

An even bigger hole results if we make the classic mistake technology proponents have made for at least twenty years: focusing on what changes and not why.  The reason COs would transform to a CORD model is that services come to focus on hosting things rather than connecting things.  The idea that this hosting results because we’ve transformed connection services from appliance-based to software-based is specious.  We’ve made no progress in creating a business justification for that kind of total-infrastructure evolution, nor will we.  The question, then, is what does create the hosting.

Let’s start (as I like to do) at the top.  I think most thinkers in the network operator space agree that the future of services is created by the thing that made the past model obsolete—the OTT services.  Connection services have been commoditized by a combination of the Internet pricing model (all you can eat, bill and keep) and the consumerization of data services.  Mobile services are accelerating the trends those initial factors created.

A mobile consumer is someone who integrates network-delivered experiences into their everyday life, and in fact increasingly drives their everyday life from mobile-delivered experiences.  All you have to do is walk down a city street or visit any public place, and you see people glued to their phones.  We can already see how mobile video is changing how video is consumed, devaluing the scheduled broadcast channelized TV model of half-hour shows.  You can’t fit that sort of thing into a mobile-driven lifestyle.

One thing this has already done is undermine the sacred triple-play model.  Video delivery has fallen to the point where operators like AT&T and Verizon are seeing major issues.  AT&T has moved from an early, ambitious, and unrealistic notion of universal IPTV to a current view that they’ll probably deliver only via mobile and satellite in the long term.  Verizon is seeing its FiOS TV customers rushing to adopt the package plan that has the lowest possible cost, eroding their revenues with each contract renewal.

Mobile users demand contextual services, because they’ve elected to make their device a partner in their lives.  Contextual services are services that recognize where you are and what you’re doing, and by exploiting that knowledge make themselves relevant.  Relevancy is what differentiates mobile services, what drives ARPU, and what reduces churn.  It’s not “agility” that builds revenue, it’s having something you can approach in an agile way.  Contextual services are that thing.

There are two primary aspects of “context”: one is geographic and the other is social.  We have some notion of both of these contextual aspects today, with users’ geographic location communicated from GPS and social context derived from the applications and relationships we’re using at any given moment.  We also have applications that exploit the context we have, but mining social context from social networks, searches, and so forth, and expanding geographic context by inserting a notion of mission and integrating location with social relationships, will add the essential dimension.  IoT and the next generation of social network features will come out of this.

And it’s these things that operators have to support, so the question is “How?”  We have to envision an architecture, and what I propose we look at is the notion of process caching.  We already know that content is cached, and it seems to follow that applications that have to “know” about a user’s social and location context would be staged far enough forward, toward the (CORD-enabled) CO, that the control loop is reasonable.  Things like self-driving cars require short control loops, so you stage them close.  Things moving at walking speed can deal with longer delays, and so forth.
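As a purely illustrative sketch of the process-caching idea (tier names and latency numbers are invented, not modeled values), placement can be reduced to picking the deepest tier whose round-trip delay still fits the application’s control-loop budget.

```python
# Hypothetical "process caching" placement rule: stage a process as deep in the
# network as its control-loop budget allows, and no closer to the edge than needed.

TIERS = [                          # one-way latency budget to the user, in ms
    ("edge-CO", 5),
    ("second-tier", 20),
    ("metro-repository", 50),
    ("specialized-core", 100),
]

def place(process_name, control_loop_ms):
    """Pick the deepest (cheapest) tier that still meets the control loop."""
    for tier, latency in reversed(TIERS):
        if latency * 2 <= control_loop_ms:     # round trip must fit the budget
            return tier, process_name
    return TIERS[0][0], process_name           # fall back to the edge CO

print(place("self-drive-assist", 20))      # needs the edge CO
print(place("walking-navigation", 250))    # can live much deeper in the network
```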

Beyond the roughly two-thirds of carrier-cloud data centers that are essentially edge-located, there would be second-tier process cache points (about 25,000) and metro-area repositories (about 4,000 globally).  From there we have roughly 7,000 deeper specialized information cache points and places where analytics are run, which gets us up to the 102,000 cloud data centers in the model.

All of the edge office points would have to be homed to all of the second-tier repositories in their metro area, and my model says you home directly to three and you have transit connectivity to them all.  The metro points would connect to a global network designed for low latency, and these would also connect the specialized data centers.  This is basically how Google’s cloud is structured.

In terms of software structure, it’s my view that you start with the notion of an agent process that could live inside a device or be hosted in an edge cloud.  This process draws on the information/contextual resources and then frames things both for queries into the resource pool (“How do I get to…”) and for responses to the device users.  These agent processes could be multi-threaded with user-specific context, or they could be dedicated to users; it depends on the nature of the user and thus the demands placed on the agent.
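Here’s a minimal sketch of such an agent, with an invented resource-pool interface, just to show the shape: the agent holds the user’s context and frames both the query into the pool and the response back to the user.

```python
# Hypothetical sketch of the "agent process": it sits in the device or at the
# edge cloud, holds the user's context, and frames both resource queries and
# user-facing responses. Names and the resource API are illustrative only.

class ContextAgent:
    def __init__(self, user_id, resource_pool):
        self.user_id = user_id
        self.resource_pool = resource_pool          # contextual/info services
        self.context = {"location": None, "activity": None}

    def update_context(self, location=None, activity=None):
        if location:
            self.context["location"] = location
        if activity:
            self.context["activity"] = activity

    def ask(self, question):
        """Frame a query against the pool using current context, then frame
        the answer for the device user."""
        answer = self.resource_pool(question, self.context)
        return f"[{self.user_id}] {answer}"

# A stand-in resource pool; in practice this would be the contextual services.
def fake_pool(question, context):
    return f"answer to '{question}' near {context['location']}"

agent = ContextAgent("user-17", fake_pool)
agent.update_context(location="5th-and-Main", activity="walking")
print(agent.ask("How do I get to the nearest coffee shop?"))
```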

This same thing is true for deeper processes.  You would probably handle lightweight stuff in a web-like way—multiple users accessing a RESTful resource.  These could be located fairly centrally.  When you start to see more demand, you push processes forward, which means first that there are more of them, and second that they are closer to users.

The big question to be addressed here isn’t the server architecture, but how the software framework works to cache processes.  Normal caching of content is handled through the DNS, and at least some of that mechanism could work here, but one interesting truth is that DNS processing takes a fairly long time if you have multiple hierarchical layers as you do in content delivery.  That’s out of place in applications where you’re moving the process to reduce delay.  It may be that we can still use DNS mechanisms, but we have to handle cache poisoning and pushing updates differently.
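One hedged way to think about the alternative (this is not a real DNS extension, just an invented registry) is to push location changes to resolvers the moment a process is re-cached, rather than waiting for a TTL to expire.

```python
# Hypothetical alternative to TTL-based caching for process location: a registry
# that pushes location changes to watchers immediately, so a process that has
# been pushed forward doesn't wait out a TTL. Purely illustrative.

class ProcessRegistry:
    def __init__(self):
        self.locations = {}            # process name -> current hosting point
        self.watchers = {}             # process name -> resolver callbacks

    def register(self, name, hosting_point):
        self.locations[name] = hosting_point
        for callback in self.watchers.get(name, []):
            callback(name, hosting_point)          # push, don't wait for expiry

    def watch(self, name, callback):
        self.watchers.setdefault(name, []).append(callback)
        return self.locations.get(name)            # current answer, if any

registry = ProcessRegistry()
registry.register("nav-assist", "metro-DC-3")

local_cache = {}
registry.watch("nav-assist", lambda n, loc: local_cache.__setitem__(n, loc))
registry.register("nav-assist", "edge-CO-17")      # process pushed forward
print(local_cache)                                 # {'nav-assist': 'edge-CO-17'}
```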

There is a rather obvious question that comes out of the distribution of carrier cloud data centers.  Can we start with a few regional centers and then gradually push applications outward toward the edge as economics dictates?  That’s a tough one.  Some applications could in fact be centrally hosted and then distributed as they catch on and earn revenue, but without edge hosting I think the carrier cloud is going to be impossible to differentiate versus the cloud providers like Google and Amazon.  Operators are already behind in experience-based applications, and they can’t afford to adopt an approach that widens the gap.

A less obvious problem is how revenue is earned.  Everyone expects Internet experiences to be free, which really means they’d be ad-sponsored.  The first issue there is that ads are inappropriate for many contextual applications; self-driving cars come to mind, but any corporate point-of-activity empowerment application would also not lend itself to ad sponsorship.  The second issue is that the global advertising budget is well under a fifth of total operator revenues.  We have to pick stuff people are willing to pay for directly to make all this work, and that may be the knottiest problem of all.

Comcast Joins ONOS/CORD: Why We Should Care a Lot

Comcast just joined the ONOS project, and I think that raises an important question about SDN, NFV, and the whole top-down or bottom-up model of transformation.  Couple that with the obvious fact that you read less about SDN and NFV these days, and you get a clear signal that something important might be happening.  Several things are, in fact.

For those who haven’t followed the ONOS and CORD projects (I blogged on the concept here), they’re a software-centric model for service provider evolution that presumes the future will be created by software hosted on generic virtualized servers.  CORD stands for “Central Office Re-architected as a Datacenter”, and it’s a conceptual architecture that has been realized through the Open Network Operating System (ONOS).  What I liked about this approach is that it’s a kind of top-down, vision-driven way of approaching transformation.  Your goal is to make your CO more data-center-centric, right?  Then CORD is clearly applicable.  Apparently to Comcast too, which raises a broad point and a narrow one.

The broad point is the “why” of CORD overall.  Why is the CO being architected as a data center interesting?  Clearly, the superficial reason is that’s what people think they’re going to do, and the deeper question is why they think that.

There isn’t a single network market segment today that’s not already seeing “bit commoditization”.  Bandwidth isn’t intrinsically valuable to consumers or businesses—it’s a resource they can harness to do something that is valuable.  I’ve talked for years about the problem of the convergence of the price-per-bit and cost-per-bit curves.  The key now is to forget causes for the moment and focus on the facts:  This is happening everywhere and this is never going to reverse itself.  Transport, meaning connecting bandwidth between points, is not a growth market.  We all sort of know this because we all use the Internet for what’s there, not how we get it.

Which means, of course, that the profit of the future lies in providing stuff that people want to get to, not the means of getting to it.  OTT stuff, by definition, is the way of the future as much as transport is not.  The “top” that it’s “over” is transport networking.  So, what is an OTT’s “central office?”  Answer: A data center.  Google has a bunch of SDN layers, but they’re not to provide SDN services, they’re to connect Google data centers, and link the result onward to users, advertisers, and so forth.  In this light, CORD is just a response to a very clear market trend.

It’s a response in a terminological way, for sure, but how about realism?  Realistically, CORD is about virtualization at a level above what SDN and NFV describe.  You virtualize access, for example, in CORD.  You don’t get specific about how that’s done yet.  CORD also has an orchestration concept, and that concept is aimed at the same higher level, meaning that it’s above things like SDN and NFV.  But even higher than that is the simple truth that there are real network devices in data centers.  CORD isn’t trying to get rid of them, it’s trying to harness them in a way that submits to software automation of the lifecycle processes of the resources and the services built on them.

If I take a CORD-like approach to transformation, I might say to an operator CFO “Focus your transformation investment on software automation of service lifecycles and in the first year you can expect to obtain opex savings of about 2 cents per revenue dollar.  Focus the same on SDN/NFV transformation of infrastructure and in Year One your savings will be one tenth of that.  To achieve even that, you’ll spend 30 times as much.”  Even in 2020, the CORD-like approach would save more opex than SDN/NFV transformation would, and with the same efficiency of investment.

Which brings us to Comcast.  Do they advertise the beauty of DOCSIS and the elegance of a CATV cable, or do they push Xfinity, which is a platform, meaning it’s hosted software?  Even if you go to Comcast for Internet, you’re not going there for SDN or NFV.  You’d not see that level of transformation, but you already see (if you’re a Comcast customer) the transformation to a data-center-centric service vision.

Which raises the most interesting point of all.  If the transformation future of network operators is to look more like OTTs in their service formulation, and if re-architecting their COs to look like data centers is the high-level goal, then what role do SDN and NFV play?  Answer: supporting roles.  SDN’s success so far has been almost entirely in the data center.  NFV is, in part, an operator-centric feature-cloud strategy.  If my CO is a data center, I can for sure connect things with SDN and host virtual functions.

Given this, is Comcast buying into a specific data-center-centric approach to future services by joining CORD/ONOS?  Or is it simply acknowledging that’s where they’re already committed to going, and looking for standards/community help along the way?  I think it’s the latter, and I think that’s a profound shift for network operators, equipment vendors, and those promoting infrastructure modernization as a step toward transformation.

Future service revenues will not come from tweaking the behaviors of connection services, but from the creation of an agile OTT platform.  That platform may then utilize connection services differently, but the platform transformation has to come first.  We have connectivity today, using legacy technologies.  We have successful OTTs today, depending for their business on that connectivity.  Operators who want the future to be bright have to shine an OTT light on it, not try to avoid OTT commitments by burying their heads in the sands of tweaking the present services.

And by “operators” here, I mean all operators.  If you run a network and provide connection services using fiber or copper, mobile or satellite, IP or Ethernet or maybe even TDM, then you have the same basic challenge of bandwidth commoditization.  The Financial Times ran a piece on this in the satellite space, for example, saying that if capacity was what the industry sold, then capacity demand was already outstripped by capacity supply even before a new generation of higher-capacity birds started to fly.

How do you meet that challenge?  You reduce current service cost and you chase new service revenues.  How do you do that?  You evolve from a business model of connecting stuff (which provably means you connect your OTT competitors to customers and disintermediate yourself) to being the stuff that users want to connect with.  Which is why CORD is important, and why Comcast’s support for it is also important.

Verizon’s SDN/NFV Architecture in Depth

I noted in my introductory blog on AT&T’s and Verizon’s SDN/NFV approaches that Verizon has taken a totally different tack with its architecture.  Where AT&T is building open-source glue to bind its vendor-controlling D2 architecture, Verizon is defining an open architectural framework for vendor integration.  Standards from the ONF, the TMF, and the NFV ISG fit deep in the AT&T ECOMP model, but the Verizon model is built around them.  That’s a critical difference to keep in mind as you read this.

Just as there’s a critical difference between the Verizon and AT&T models, there’s a critical commonality.  Both operators are saying that the current standards work isn’t going far enough fast enough.  In particular, both admit to the fact that the scope of the current work is too focused, so assuring broad-based benefits is nearly impossible.  So, in either architecture, standards are somehow supplemented to create something with enough functional breadth to be truly useful.

One obvious extension in the Verizon model is its functional heart, which is End-to-End (E2E) Orchestration.  Verizon builds E2E around the NFV Orchestration (usually called “MANO”) element, and from E2E it extends control downward in two forks—the SDN and NFV forks.  Network connectivity of all types is managed along the SDN path, with real devices (Physical Network Functions or PNFs in the Verizon diagram) and SDN-OpenFlow elements both under a set of SDN controllers.  On the NFV side, the structure is fairly standard, except for the critical point of SDN/NFV separation that we’ll get to.  There are also two “flanks”, one for FCAPS NMS and the other for OSS/BSS.

The way that SDN is handled is the most different thing about the Verizon approach.  Rather than proposing to have a single “Network-as-a-Service” model that then decomposes into vendor- or technology- or geographic-specific domains, Verizon has created three firm subdivisions—Access, WAN, and Data Center (DC) (along with an undefined “other”).  They appear to link the DC with the NFV elements only.

The classic interpretation of the NFV ISG model is that connection services are a subset of infrastructure services, meaning that they’d be expected to be supported by a (Virtual) Infrastructure Manager or VIM.  Verizon’s splitting of the data center connections off into the SDN control space firmly divides the SDN from the NFV, with cooperative behavior created above in the E2E function.  This somewhat mirrors the separation that AT&T proposes in its ECOMP model, where “Controllers” handle the cloud, the network, and applications.

The management flank is “Service Assurance”, and it consists of the traditional NMS FCAPS applications plus log management and correlation tools.  There are NMS links into both the SDN and NFV forks we’ve already described, and the links are both to E2E and to the lower forks, which implies a complex management relationship.  The OSS/BSS flank comprises connections to the OSS/BSS system from E2E and also from PNFs.  The “management” functions in the Verizon model are designed around the notion that function management is the same for either PNFs or VNFs.  Thus, you deploy a VNF using NFV tools, but you manage the functional aspects using a management toolset evolving from today’s EMSs to something like SDN control.

Verizon’s document starts its detailed description with the NFV Infrastructure (NFVI) element.  Verizon goes into great detail explaining the relationship between hardware elements (physical infrastructure) and software elements (virtual).  They also explain things like how a VIM “discovers” what its infrastructure is capable of, which is a nice advance in thinking from the ETSI starting point.  They do the same on the SDN side, including framing the future direction around intent-based interfaces.  All of this facilitates the interworking of components of the architecture with each other, critical if your intent is (as I think Verizon’s is) defining a framework in which vendor elements can be interworked confidently.

This is one area where Verizon’s document shines.  They’ve gone a long way toward defining the hardware for NFV, right down to CPU features, and they’ve also done well in defining how the physical infrastructure would have to be managed for consistency and reliability.  Every operator interested in carrier cloud should look at the document for this reason alone.

Another area where Verizon has the right approach is service modeling.  Verizon’s architecture shows a kind of succession of layers, from service to functional to structural.  Each layer is governed by a model, and that allows vendors to incorporate model-driven deployment they may already have offered.  You can also model different layers in different ways, or even use two different models in the same layer.  YANG, for example, is a good way to model real network configurations, but I firmly believe that TOSCA is better for cloud deployments and functional/service-layer work.
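To illustrate the layering only (this is not Verizon’s model; the names are invented, and a real implementation would use TOSCA and/or YANG as noted above), a service model might decompose something like this.

```python
# Illustrative-only sketch of layered service modeling: a service model
# decomposes into functional elements, which decompose into structural
# (deployable) elements handled by specific controllers.

service_model = {
    "name": "business-vpn-with-firewall",
    "functional": ["vpn-connectivity", "firewall"],
}

functional_models = {
    "vpn-connectivity": {"structural": ["wan-sdn-domain", "access-domain"]},
    "firewall":         {"structural": ["vFW-deployment"]},
}

structural_models = {
    "wan-sdn-domain": {"controller": "sdn-wan",    "recipe": "provision_path"},
    "access-domain":  {"controller": "sdn-access", "recipe": "bind_access"},
    "vFW-deployment": {"controller": "nfv-vim",    "recipe": "deploy_vnf"},
}

def decompose(service):
    """Walk service -> functional -> structural and list what gets deployed."""
    for fn in service["functional"]:
        for st in functional_models[fn]["structural"]:
            target = structural_models[st]
            print(f"{service['name']}/{fn}/{st}: "
                  f"{target['controller']} runs {target['recipe']}")

decompose(service_model)
```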

As always, there are issues, and the big one here starts with the goal (can you hope to define an open architecture for vendors, and if so, can you move the ball relative to the standards groups?) and moves on to some details.  I think two issues are paramount.

One area I think may pose a problem is the lack of specific support for multiple Infrastructure Managers, virtual or otherwise.  The biggest risk of lock-in in NFV comes because a vendor provides a VIM/IM for its own gear to the exclusion of all other gear.  If multiple VIM/IMs are allowed that’s not a major problem, but clearly it’s a killer if you can have only one VIM/IM in the architecture and several (incompatible) vendors want to be it!

In both my CloudNFV architecture and my ExperiaSphere architecture, I proposed that the equivalent of the VIM/IM be explicitly referenced in the model element that connects to the infrastructure.  That would allow any suitable VIM/IM implementation to be used, no matter how many, but it does require that the E2E model have the ability to include the specific reference, which Verizon says it doesn’t do.  I think they’ll need to fix this.
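A minimal sketch of that explicit-reference idea, with invented manager and element names: each structural model element names the VIM/IM that deploys it, so several vendor-specific managers can coexist without colliding.

```python
# Hypothetical sketch of explicit VIM/IM references in the model: each element
# names its Infrastructure Manager, so multiple (incompatible) VIMs/IMs coexist.

class OpenStackVIM:
    def deploy(self, element):
        return f"OpenStack VIM deployed {element['name']}"

class VendorXIM:
    def deploy(self, element):
        return f"VendorX IM configured {element['name']}"

INFRASTRUCTURE_MANAGERS = {
    "vim.openstack": OpenStackVIM(),
    "im.vendorX":    VendorXIM(),
}

model_elements = [
    {"name": "vFirewall-host",  "manager": "vim.openstack"},  # reference lives in the model
    {"name": "edge-router-123", "manager": "im.vendorX"},
]

for element in model_elements:
    manager = INFRASTRUCTURE_MANAGERS[element["manager"]]
    print(manager.deploy(element))
```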

My other area of concern is the VNF Manager.  Verizon has retained the ETSI approach, which defines both “generic” VNFMs that can support multiple VNFs, and Specific VNFMs (S-VNFMs) that are specific to a given VNF.  I’ve cataloged all my concerns about this approach in previous blogs, and those interested can use the Search function on my blog page to find them.  For now, let me just say that if you don’t have a standardized way of managing all VNFs, you’ll end up with a significant onboarding issue, which is where we are now with the ISG model.

Part of the VNFM issue, I think, arises from a bit of vCPE myopia on Verizon’s part.  Yes, Verizon has the geography where vCPE is most likely to deploy (it has over three times the opportunity that AT&T has in its own home territory, for example).  However, Verizon’s customers are also long-standing users of business WAN services, so it’s less likely that they need a managed service approach at all, and if they do, they probably already have a solution for it.  The focus on NFV in the model, and then on vCPE as the NFV application of choice, could fall short of justifying a major NFV commitment, which would make the architecture moot.

I think it’s clear from the Verizon material that the goal is to guide vendor implementations of what’s supposed to be an open architecture.  Candidly, I think this is going to be a hard road to travel for Verizon.  First, it’s far from clear that vendors are interested in an open approach.  Second, once you get outside the very limited boundaries of SDN and NFV standards, there’s nothing to guide an open model or even to pick a specific approach.  Verizon’s architecture identifies a lot of things that are critical, essential, to a business case.  The problem is that they don’t define them in detail, and so implementations in these extended areas have no reference to guide converging approaches.

Whether this will work depends on the vendors.  Those same vendors, it must be said, who have not stepped up to NFV, in no small part because most of them either see a small reward at the end of a long road, or no reward at all.  Creating a framework for vendor participation does little good if they don’t want to participate, and whether they do is still an open question.  That question, if answered in the affirmative, will only expose another question—whether the Verizon framework delivers on its technical mission, and there I have questions of my own.  The dependence on the formal standards that Verizon has created is risky when those standards don’t cover enough to make the business case.  Will Verizon fix that?  I don’t know.