A Carrier’s Practical View of SDN

Yesterday I talked about the views of a particular operator on NFV trials and evolution, based on a conversation with a very knowledgeable tech guru there.  That same guru is heavily involved in SDN evolution, so it’s worthwhile to explore the operator’s SDN progress and directions.

A good place to start is the focus of SDN interest, meaning where the operator thinks SDN trials and testing have to be concentrated.  According to this operator, metro, mobile, and content delivery are the sweet spots in the near term.  It’s not that they don’t believe in SDN in the data center, in the cloud, or in NFV, but that these applications are less immediately critical and offer less potential benefit.  In the case of data center SDN, obviously, the business case would depend on a data center build-out large enough to justify it, so it’s contingent on cloud and NFV deployment.

The issue the operator wants to address in the metro is that metro networks are in general aggregation networks and not connection networks, but we build them with connection network architectures.  Metro users are not connected directly to each other but to points of service where user experiences (including messaging or calling) are provided.  One logical question, asked by my contact here, is “What is the optimum connection architecture for aggregation in the metro?”  Obviously the answer will be different for residential wireline, wireless backhaul, and CDNs, and with SDN the operator should be able to create whatever architecture each one needs.

For residential wireline networks, for example, the operator is very interested in using SDN as a means of managing low-layer virtual pipes that groom agile optics bandwidth.  One obvious question is whether emerging SDN-optical standards have any utility, and the operator thinks that will depend on the nature of top-layer management.  “Logically we’d probably control each layer separately, with the needs of the higher layer driving the commitments of the layer below.  But what if there is no top-layer management?”  The operator sees having an SDN controller do everything as a fall-back position should there be no manager-of-managers or policy feedback to link optical and electrical provisioning.
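The layered arrangement the operator describes, where the needs of the higher layer drive the commitments of the layer below, can be sketched as a simple aggregation step. This is a minimal illustration, not the operator’s method: the node names, the function name `optical_commitments`, and the 10 Gbps per-wavelength capacity are all hypothetical assumptions.

```python
import math
from collections import defaultdict

# Hypothetical capacity of one optical "pipe" (one wavelength), in Gbps.
WAVELENGTH_GBPS = 10.0


def optical_commitments(electrical_demands):
    """Turn upper-layer (electrical) demands into lower-layer (optical) commitments.

    electrical_demands: iterable of (node_a, node_b, gbps) tuples.
    Returns {(node_a, node_b): wavelengths_needed}, treating pipes as
    bidirectional so A->B and B->A demands share one pipe group.
    """
    totals = defaultdict(float)
    for a, b, gbps in electrical_demands:
        totals[tuple(sorted((a, b)))] += gbps
    # The optical layer commits whole wavelengths, so round each total up.
    return {pair: math.ceil(total / WAVELENGTH_GBPS) for pair, total in totals.items()}


# Hypothetical residential-wireline demands toward a metro hub.
demands = [("olt1", "hub", 4), ("hub", "olt1", 3), ("olt2", "hub", 12)]
print(optical_commitments(demands))  # {('hub', 'olt1'): 1, ('hub', 'olt2'): 2}
```

In the fall-back case the operator mentions, with no manager-of-managers to carry this feedback between layers, a single SDN controller would have to perform this translation itself as well as program the packet layer.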

Even on this point, the operator’s view is changing.  At one time they believed it was essential for optical equipment to understand the ONF OpenFlow-for-optics spec, but now they’re increasingly of the view that a more logical approach would be to have OpenDaylight speak a more convenient optical-control language through one plugin and OpenFlow through another.
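The plugin approach amounts to a controller that keeps one abstract request and hands it to whichever southbound plugin matches the device class. The sketch below is a hypothetical illustration in Python, not OpenDaylight’s actual (Java-based) plugin API. The class names, device classes, and output strings are invented placeholders; they stand in for real OpenFlow messages and for whatever optical-control protocol a given vendor’s gear actually speaks.

```python
from typing import Protocol


class SouthboundPlugin(Protocol):
    """Hypothetical interface every southbound plugin implements."""

    def program(self, device: str, intent: dict) -> str: ...


class OpenFlowPlugin:
    def program(self, device: str, intent: dict) -> str:
        # Placeholder text standing in for an OpenFlow flow-mod message.
        return f"openflow:{device}:match={intent['match']},out={intent['out']}"


class OpticalPlugin:
    def program(self, device: str, intent: dict) -> str:
        # Placeholder text standing in for a vendor optical-control command.
        return f"optical:{device}:lambda={intent['wavelength']},{intent['a']}->{intent['z']}"


class Controller:
    def __init__(self) -> None:
        # One plugin per device class; the optical gear never has to speak OpenFlow.
        self.plugins: dict = {"packet": OpenFlowPlugin(), "optical": OpticalPlugin()}
        self.inventory: dict = {}  # device name -> device class

    def add_device(self, device: str, device_class: str) -> None:
        self.inventory[device] = device_class

    def push(self, device: str, intent: dict) -> str:
        # Dispatch to the plugin that matches this device's class.
        return self.plugins[self.inventory[device]].program(device, intent)


c = Controller()
c.add_device("sw1", "packet")
c.add_device("roadm1", "optical")
print(c.push("sw1", {"match": "vlan=100", "out": "p2"}))
print(c.push("roadm1", {"wavelength": "1550.12nm", "a": "hub", "z": "olt1"}))
```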

Mobile SDN, as I’ve said in other blogs, seems to cry out for a new SDN-based service model that would use forwarding control to create an agile path from the PGW to whatever cell the user is currently in.  But the operator would also like to see some thinking around whether mobile Internet and content in particular don’t suggest a completely different model for forwarding everything to mobile users.  “Why couldn’t I make every mobile user a kind of personal-area network and direct traffic into that network from cache points, gateways, whatever?  We need some outside the box thinking here.”
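The first model, a controller-maintained path from the PGW to the user’s current cell, can be sketched as a per-user set of hop-by-hop forwarding entries that is torn down and rebuilt when the user moves. This is a simplified, hypothetical illustration: the topology, node names, and `MobilePathController` class are invented, and it uses plain breadth-first search for routing rather than anything a real mobile core would use.

```python
from collections import deque


def shortest_path(topology, src, dst):
    """Breadth-first search over an adjacency dict; returns a node list or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in topology.get(node, []):
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


class MobilePathController:
    """Hypothetical controller keeping one path per user: PGW -> current cell."""

    def __init__(self, topology, pgw):
        self.topology = topology
        self.pgw = pgw
        # node -> {user: next_hop}; assumes every node appears as a topology key.
        self.flow_tables = {n: {} for n in topology}

    def attach(self, user, cell):
        # Remove the user's old entries, then install entries along the new path.
        for table in self.flow_tables.values():
            table.pop(user, None)
        path = shortest_path(self.topology, self.pgw, cell)
        if path is None:
            raise ValueError(f"no path from {self.pgw} to {cell}")
        for here, nxt in zip(path, path[1:]):
            self.flow_tables[here][user] = nxt
        return path


topo = {
    "pgw": ["agg1", "agg2"],
    "agg1": ["pgw", "cellA"],
    "agg2": ["pgw", "cellB"],
    "cellA": ["agg1"],
    "cellB": ["agg2"],
}
ctl = MobilePathController(topo, "pgw")
print(ctl.attach("ue1", "cellA"))  # ['pgw', 'agg1', 'cellA']
print(ctl.attach("ue1", "cellB"))  # ['pgw', 'agg2', 'cellB'] after a handoff
```

The operator’s alternative, treating each user as a personal-area network fed from several sources, would replace the single PGW root here with multiple sources, which is exactly the kind of model the sketch does not cover.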

This point raises what this operator considers the most interesting question in SDN management.  If a collection of devices is designed to provide a well-known service like Ethernet, we have established FCAPS practices that we can draw on, based on well-understood presumptions of correct behavior and on established standards.  How do you represent something that isn’t a well-known service?  What would the “management formula” for it be?  According to my contact here, the utility of SDN may depend on how management interacts with controller behavior when you create something new and different.

Management in SDN is an issue in any event, and at many levels.  First, while it is true that central control of forwarding can create a “service”, can that central point provide a management view?  Obviously, the fact that the controller believes the nodes and paths in an OpenFlow network are in a given state doesn’t mean the real world conforms to that view.  In fact, if we could assume that sort of thing we’d have declared “management by endorsement” the right answer to all our problems ages ago.  But what is the state of a node?  Absent adaptive behavior at the nodal level, what happens when a node fails?  If the adjacent nodes “see” their trunk to the failed node go down, they could poison all the forwarding entries that point to that trunk, in which case the controller would presumably get route requests for the packets those entries had been forwarding.  But will it?  Is there still a path to the controller?  And what about the state of the hardware itself?  Don’t we need to read device MIBs?  If we do, how is the state of a node correlated with the state of a service?
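The failure sequence described above can be modeled in a few lines to show where the controller’s view and the real network diverge. This is a toy sketch under stated assumptions. The `Switch` class and `view_mismatch` helper are hypothetical. The sketch also assumes a table miss is sent to the controller whenever a control path exists, loosely mirroring OpenFlow’s packet-in; in real OpenFlow deployments, what happens on a miss depends on how the table-miss entry is configured.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    name: str
    flows: dict = field(default_factory=dict)  # destination -> output trunk (neighbor name)
    controller_reachable: bool = True

    def poison_trunk(self, neighbor):
        """Drop every forwarding entry that points out the trunk to `neighbor`."""
        dead = [dst for dst, trunk in self.flows.items() if trunk == neighbor]
        for dst in dead:
            del self.flows[dst]
        return dead

    def forward(self, dst):
        """Return a next hop, a route request to the controller, or a silent drop."""
        if dst in self.flows:
            return ("forward", self.flows[dst])
        if self.controller_reachable:
            return ("packet_in", dst)  # ask the controller for a new route
        return ("drop", dst)  # table miss and no path to the controller


def view_mismatch(controller_view, observed):
    """Nodes whose state as believed by the controller differs from what was observed."""
    return sorted(n for n in controller_view if controller_view[n] != observed.get(n))


s1 = Switch("s1", {"h3": "s2", "h4": "s3"})
print(s1.poison_trunk("s2"))  # ['h3']: s1 sees its trunk to the failed s2 drop
print(s1.forward("h3"))       # ('packet_in', 'h3')
s1.controller_reachable = False
print(s1.forward("h3"))       # ('drop', 'h3'): the "is there still a path?" case
print(view_mismatch({"s1": "up", "s2": "up"}, {"s1": "up", "s2": "down"}))  # ['s2']
```

Note that nothing in the switch model says *why* `s2` is down. That information would have to come from something like a device MIB, which is the correlation problem the text raises.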

The second level is representing service-independent devices in a service-driven management model, where we expect Ethernet services to be built using gadgets that have Ethernet MIBs.  Here’s a specific question from the operator:  Assume that you have a set of white boxes providing Ethernet and IP forwarding at the same time, for a number of VPN and VLAN services.  These boxes have to look like something to a service management system, so what do they look like?  Is every box both a router and a switch, depending on who’s looking?  Is there a big virtual router and switch instance created to manage?  If so, who creates it, and who parses the commands sent to it and distributes them to the real boxes?
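The two answers the operator floats, a per-box view that depends on who’s looking and a per-service virtual device spanning many boxes, can both be expressed as projections over the same mixed rule table. This is a hypothetical sketch only. The `Rule` fields, the view names `ethernet-switch` and `ip-router`, and the sample MAC, prefix, and port values are invented for illustration and do not correspond to any real MIB or management schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    service: str   # VLAN or VPN service identifier
    layer: str     # "L2" (Ethernet) or "L3" (IP)
    match: str     # MAC address or IP prefix
    out_port: str


def management_view(box_name, rules, layer):
    """Per-box projection: the same white box looks like a switch or a router."""
    kind = {"L2": "ethernet-switch", "L3": "ip-router"}[layer]
    entries = sorted((r.service, r.match, r.out_port) for r in rules if r.layer == layer)
    return {"device": box_name, "appears_as": kind, "entries": entries}


def service_view(boxes, service):
    """Per-service projection: one 'big virtual device' spanning every box involved."""
    return {
        name: sorted((r.layer, r.match, r.out_port) for r in rules if r.service == service)
        for name, rules in boxes.items()
        if any(r.service == service for r in rules)
    }


boxes = {
    "wb1": [Rule("vlan100", "L2", "00:11:22:33:44:55", "p1"),
            Rule("vpnA", "L3", "10.1.0.0/16", "p2")],
    "wb2": [Rule("vpnA", "L3", "10.2.0.0/16", "p3")],
}
print(management_view("wb1", boxes["wb1"], "L2"))
print(service_view(boxes, "vpnA"))
```

The sketch only covers the read side. Whoever owns `service_view` would also have to split commands aimed at the virtual device back into per-box rule changes, which is the open question in the text.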

This particular operator ran into these questions when considering how NFV would see, use, or create SDN services.  Look at a service chain as an example.  In “virtual terms” it’s a string of pearls, a linear threading of processes by connections.  But what connections in particular?  How does NFV “know” what connectivity the process elements in the service chain expect to see?  The software has to be written to some communications API, which presumes some communication service below.  What is it?  A “logical string of pearls” might be three processes in an IP subnet, three processes linked with GRE tunnels, or something else entirely.  How do we describe to NFV what the processes need so we can set them up, and how do we combine the needs of the processes with the actual infrastructure available for connecting them to come up with specific provisioning commands?  And remember, if we say that a given MANO “script” has all the necessary details in it, then how do we make that script portable across different parts of the network and different vendors?
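One way to frame the portability question is to keep the abstract chain description separate from a per-domain binding that knows how to build each kind of link. The sketch below shows that separation. The `ChainSpec` and `provision` names, the domain dictionaries, and the command templates are all hypothetical placeholders rather than MANO constructs or any vendor’s real syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainSpec:
    """Abstract 'string of pearls': ordered processes plus the link type they expect."""
    processes: tuple
    link_type: str  # e.g. "ip-subnet" or "gre-tunnel"


def provision(spec, domain):
    """Bind an abstract chain to one infrastructure domain's connection templates.

    `domain` maps link types to command templates; the templates are
    hypothetical placeholders, not any vendor's actual command syntax.
    """
    templates = domain["templates"]
    if spec.link_type not in templates:
        raise ValueError(f"{domain['name']} cannot provide {spec.link_type}")
    template = templates[spec.link_type]
    # One link between each adjacent pair of processes in the chain.
    return [template.format(a=a, b=b) for a, b in zip(spec.processes, spec.processes[1:])]


chain = ChainSpec(("firewall", "nat", "dpi"), "gre-tunnel")
east = {"name": "metro-east", "templates": {"gre-tunnel": "create-gre {a} -> {b}"}}
west = {"name": "metro-west", "templates": {"gre-tunnel": "tunnel add type=gre from={a} to={b}"}}
print(provision(chain, east))
print(provision(chain, west))
```

Under these assumptions, the same `ChainSpec` yields different provisioning commands in each domain, and a domain that cannot supply the link type fails explicitly instead of silently producing a broken chain.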

Metro missions seem to dodge many of these issues because the metro network is already largely invisible in residential broadband, mobile, and CDN applications.  Progress there, this operator hopes, might answer some of the questions that could delay other SDN missions, and the operator’s management hopes that progress will come not only from its own efforts but also from the trials and deployments of other operators.  I hope so too.