Networking the Fog

I blogged yesterday about the economics-driven transformation of networking from the data center out.  My point was that as operators drive toward greater profits, they first concentrate on higher-layer content and cloud-hosted services, then gather those service assets in the metro areas, near the edge.  This creates a kind of metro-virtual-data-center structure; users connect to it via access lines, and the metro centers connect to each other over metro links.  Traditional hub-and-spoke networking is phased out.

What, in detail, is phased in?  In particular, how are services and service management practices impacted by this model?  If network transformation is going to be driven by economics, remember that saving on opex is the lowest-hanging apple available to operators.  How would the metro model impact service automation targeting opex?

Going back to the primary principle for fog-distributed data centers and data center interconnect (DCI), you'd want to oversupply your links to eliminate the risk that your on-call resource pool shrinks because some resources are accessible only through congested links.  This concept would have a pretty significant impact on resource management and service management practices.
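
To make that concrete, here's a minimal sketch in Python (the link names, utilization numbers, and threshold are all hypothetical) of why overprovisioning matters: any resource that sits behind a congested DCI link is effectively unavailable for allocation, so the on-call pool shrinks unless the links are kept lightly loaded.

```python
# Hypothetical sketch: the effective on-call pool is only what's reachable
# over links that aren't congested.

CONGESTION_THRESHOLD = 0.8  # assumed utilization level above which a link counts as congested

# Each entry: (resource_id, utilization of the DCI link that reaches it)
pool = [
    ("edge-office-1/server-a", 0.35),
    ("edge-office-2/server-b", 0.92),  # behind a congested link
    ("edge-office-3/server-c", 0.40),
]

def effective_pool(resources, threshold=CONGESTION_THRESHOLD):
    """Return only the resources whose connecting link is below the congestion threshold."""
    return [rid for rid, link_util in resources if link_util < threshold]

print(effective_pool(pool))
# With the sample numbers above, one of three resources drops out of the pool;
# oversupplying the links keeps utilization low so that never happens.
```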

One of the high-level impacts is on equalizing the resource pool.  Both SDN and NFV have to allocate bandwidth, and NFV also has to allocate hosting resources.  Allocation is easy when resources are readily available; you can grab almost anything that's preferred for some reason other than residual capacity and know that capacity won't interfere with your choice.  Want to avoid a power-substation dependency?  No problem.  Want data center redundancy in hosting?  No problem.  NFV benefits from having everything in the fog look equivalent in terms of delay and packet loss, which makes allocating fog resources as easy as allocating those in a single central data center.
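
Here's a minimal sketch, with hypothetical host and substation names, of how allocation simplifies when every fog resource looks equivalent in delay and packet loss: the allocator can select purely on policy preferences like substation diversity and data center redundancy, without weighing residual capacity or path quality at all.

```python
# Hypothetical sketch: pick a redundant pair of hosts on policy alone, because
# all fog hosts are assumed equivalent in delay and packet loss.

hosts = [
    {"id": "metro-dc-1/host-1", "substation": "sub-A", "dc": "metro-dc-1"},
    {"id": "metro-dc-1/host-2", "substation": "sub-A", "dc": "metro-dc-1"},
    {"id": "metro-dc-2/host-1", "substation": "sub-B", "dc": "metro-dc-2"},
]

def pick_redundant_pair(candidates):
    """Pick two hosts in different data centers and on different power substations."""
    for i, first in enumerate(candidates):
        for second in candidates[i + 1:]:
            if first["dc"] != second["dc"] and first["substation"] != second["substation"]:
                return first["id"], second["id"]
    return None  # would only happen if the pool were badly depleted

print(pick_redundant_pair(hosts))   # ('metro-dc-1/host-1', 'metro-dc-2/host-1')
```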

The next layer down, impact-wise, is in broad management practices.  There are two general models of management: one says that you commit resources to a service and manage services and resources in association; the other says that you commit a pool to a service, then presume the pool will satisfy service needs as long as the resources in it are functional.  With virtualization, it's far easier to do service management using the latter approach.  Not only is it unnecessary to build management bridges between resource status and service status, but you can also use resources that aren't expected to be part of the service at all without confusing the user.  Servers, for example, are not normal parts of firewalls, but they would be with NFV.
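
Here's a minimal sketch of the pool-based model, using hypothetical classes and numbers: service status is inferred from the health of the pool the service draws on, with no per-service bridge to the status of the individual servers the user never sees.

```python
# Hypothetical sketch of pool-based service management: a service is "healthy"
# as long as its pool has enough functional capacity left.

from dataclasses import dataclass

@dataclass
class ResourcePool:
    capacity: int       # total functional units in the pool
    failed: int = 0     # units currently out of service

    def usable(self) -> int:
        return self.capacity - self.failed

@dataclass
class Service:
    name: str
    required_units: int
    pool: ResourcePool

    def healthy(self) -> bool:
        # Status comes from the pool, not from the specific servers that
        # happen to host this service's VNFs at the moment.
        return self.pool.usable() >= self.required_units

pool = ResourcePool(capacity=100, failed=3)
firewall_service = Service("vCPE-firewall", required_units=2, pool=pool)
print(firewall_service.healthy())   # True: the pool still satisfies the service
```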

With fog-distributed resource pools, you'd want to do capacity planning to size your pools and network connections.  You'd then manage the resources in their own terms, with technology-specific element management.  You'd use analytics to build a picture of the overall traffic state and compare it with your "capacity-planned" state.  If the two were in correspondence, you'd assume services were fine.  If not, you'd have some orchestration-controlled response to move from the "real" state to one that's at least acceptable.  Think "failure modes".
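
A minimal sketch of that comparison loop, with hypothetical link names, utilization figures, and failure-mode names: analytics supply an observed traffic state, it's checked against the capacity-planned state, and anything out of plan maps to a pre-planned orchestration response.

```python
# Hypothetical sketch: compare observed traffic state to the capacity plan and
# pick a pre-planned "failure mode" response for anything out of plan.

planned = {"office-1<->office-2": 0.60, "office-1<->office-3": 0.60}   # planned peak utilization
observed = {"office-1<->office-2": 0.55, "office-1<->office-3": 0.78}  # from analytics

FAILURE_MODES = {
    "office-1<->office-3": "reroute-via-office-2",   # pre-planned response for this link
}

def reconcile(planned_state, observed_state):
    """Return the orchestration actions needed to bring reality back toward plan."""
    actions = []
    for link, planned_util in planned_state.items():
        if observed_state.get(link, 0.0) > planned_util:
            actions.append(FAILURE_MODES.get(link, "alert-capacity-planning"))
    return actions

print(reconcile(planned, observed))   # ['reroute-via-office-2']
```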

If we expand from “management” to “management and orchestration” or “lifecycle management”, the impact could be even more profound.  For SDN, for example, you could use the analytics-derived picture of network traffic state to select optimum routes.  This is important because you want to have a holistic view of resources to pick a route, not find out about a problem with one when you’ve committed to it.  The SDN Controller, equipped with the right analytics model, can make good choices based on a database dip, not by running out and getting real-time status.
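
As a minimal sketch of that database-dip approach (the topology and link costs are hypothetical), route selection here runs entirely against an analytics-maintained snapshot of link state; no device is polled while the route is being computed.

```python
# Hypothetical sketch: shortest-path selection over an analytics snapshot of
# link costs, rather than over real-time device polling.

import heapq

# Analytics-maintained snapshot: link -> cost reflecting current traffic state
link_costs = {
    ("A", "B"): 1.0, ("B", "A"): 1.0,
    ("B", "C"): 1.0, ("C", "B"): 1.0,
    ("A", "C"): 3.5, ("C", "A"): 3.5,   # direct path currently more loaded
}

def best_route(src, dst, costs):
    """Plain Dijkstra over the snapshot; nothing is queried in real time."""
    graph = {}
    for (u, v), w in costs.items():
        graph.setdefault(u, []).append((v, w))
    queue, seen = [(0.0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None

print(best_route("A", "C", link_costs))   # (2.0, ['A', 'B', 'C'])
```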

For NFV this is even more critical.  First, the "best" or "lowest-cost" place to site a VNF will depend not only on the hosting conditions but on the network status of the paths available to connect that VNF to the rest of the service.  The NFV ISG has wrestled with the notion of having a request for a resource made "conditionally", meaning that MANO could ask for the status of something and deploy conditionally based on it.  That's a significant issue if you're talking about real-time, resource-specific status, because of course there's a risk that a race condition would develop among competing resource requests.  If we assume overprovisioning, the condition is so unlikely to arise that an exception could be treated as an error.
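
A minimal sketch of that placement logic, with hypothetical sites and scores: each candidate is scored on hosting cost plus the cost of the paths that would tie it into the service, and because the fog is assumed to be overprovisioned, a failed capacity check is surfaced as an error rather than arbitrated as a real-time race.

```python
# Hypothetical sketch: score VNF placement on hosting cost plus path cost, and
# treat a capacity miss as an error because the fog is assumed overprovisioned.

candidate_sites = [
    {"site": "edge-office-1", "hosting_cost": 2.0, "path_cost": 1.0, "has_capacity": True},
    {"site": "edge-office-2", "hosting_cost": 1.0, "path_cost": 3.0, "has_capacity": True},
]

def place_vnf(candidates):
    """Pick the lowest combined-cost site; raise if the chosen site lacks capacity."""
    best = min(candidates, key=lambda c: c["hosting_cost"] + c["path_cost"])
    if not best["has_capacity"]:
        # In an overprovisioned fog this should be vanishingly rare, so it is
        # surfaced as an error rather than handled as normal contention.
        raise RuntimeError(f"capacity exception at {best['site']}")
    return best["site"]

print(place_vnf(candidate_sites))   # edge-office-1
```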

The evolutionary issues associated with SDN and NFV deployment are also mitigated.  Migrating a VNF from virtual CPE on the customer prem to a fog-edge cloud point is going to have a lot less impact on the service than migrating to a central cloud that might take a half-dozen hops to reach.  For SDN, having edge hosts means you have a close place to put controllers, which reduces the risk of network-state issues arising from telemetry delays.

Even regulatory problems could be eased.  The general rule that regulators have followed is that paths inside higher-layer services are exempt from most regulations.  That includes CDN and cloud facilities, so our fog-net connectivity would be exempt from neutrality and sharing requirements.  Once you hop off the access network and enter the fog, the services you obtain from it could be connected to your logical access edge point with minimal regulatory interference.

The primary risk to this happy story is the difficulty in getting to the fog-distributed state, for which there's both an economic dimension and a technical one.  On the economic side, the challenge is to manage the distributed deployment without breaking the bank on what operators call "first cost", the investment needed just to get to a break-even run rate.  On the technical side, it's how to transition to the new architecture without breaking the old one.  Let's start with the technical side, because it impacts the costs.

The logical way to evolve to a fog model is to deploy a small number of servers in every edge office (to reprise the operator remark, "Everywhere we have real estate.") and connect these with metro fiber.  Many paths are already available, but you'd still have to run enough fiber to ensure that no office was more than a single hop from any other.  Here's where the real mission of hyperconverged data centers comes in; you need to be able to tuck away these early deployments because they won't displace anything in the way of existing equipment.
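
Here's a minimal sketch of that single-hop test, on a hypothetical set of central offices: given the fiber paths already in place, it lists the office pairs that would still need a direct link so that no office is more than one hop from any other.

```python
# Hypothetical sketch: find the office pairs still missing a direct link.

from itertools import combinations

offices = ["CO-1", "CO-2", "CO-3", "CO-4"]
existing_links = {("CO-1", "CO-2"), ("CO-2", "CO-3"), ("CO-1", "CO-3")}

def missing_links(nodes, links):
    """Return the pairs not yet directly connected (treating links as undirected)."""
    have = {frozenset(link) for link in links}
    return [pair for pair in combinations(nodes, 2) if frozenset(pair) not in have]

print(missing_links(offices, existing_links))
# [('CO-1', 'CO-4'), ('CO-2', 'CO-4'), ('CO-3', 'CO-4')]
```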

The CORD model, standing for Central Office Re-architected as a Data Center, does a fairly nice job of presenting what this would look like, but the weakness is that it doesn’t focus on “new” revenue or benefits enough.  I think it’s most likely that the evolution to a fog-distributed metro model would come about through 5G deployment, because that will require a lot of fiber and it’s already largely budgeted by operators.  But even if we figure out a way to get fog deployed, we still have to make money on it, which means finding service and cloud-computing missions for the new fog.

All we need to do here is get started, I think.  The operational and agility benefits of the fog model could be realized pretty easily if we had the model in place.  But the hardest thing to prove is the benefit of a fully distributed infrastructure, because getting to that full distribution represents a large cost.  The rewards would be considerable, though.  I expect that this might happen most easily in Europe, where demand density is high, or in Verizon's territory here in the US, where it's high by US standards.  I also think we'll probably see some progress on this next year.  If so, it will be the first real indicator that there will be a truly different network in our futures.