The hierarchy/intent modeling approach I’ve blogged about, similar to the TMF SID, seems to serve the mission of service management automation well. It also seems possible to use a similar modeling technique to represent real-world “digital twin” relationships, and (finally) to use a digital-twinning approach that represents features rather than real-world elements to implement service management. That sums up where we’ve gone with modeling so far. Now, I want to conclude the series by looking at the application of digital twin modeling to network management.
Classical network management, meaning the OSI model of network management, presumes we have three layers. At the top is “service management”, which we’ve aligned with the hierarchy/intent modeling approach. Next we have “network management”, which is aimed at managing systems of devices. I’d contend that this is something that could be served either by the hierarchy/intent approach or by digital twin modeling of the administrative interfaces that represent those systems of devices. It’s the bottom layer, “element management”, that we have to look at more closely.
A network, or a data center, is really two interdependent things, as the lower two layers of the OSI management model suggest. One is a system of real devices (routers in the case of networks, servers and switches in the case of data centers, for example) and the other is a cooperative collection of resources. If we’re representing the real world, which is the case for the lower element management layer, then we’re implying a digital twin model. If we’re representing the network management layer, we have a choice between a digital twin and a hierarchy/intent approach. Given that, I’m going to break from my top-down tradition and start with the bottom, or element management, layer.
If you can model a metaverse or IoT with a digital-twin model, you could surely model a network. Each device would be represented by an object that had interfaces, properties, parameters, and so forth, and the relationship between objects would be represented by interface bindings that would map to the trunk connections. There’s no question that you could use the model to query device status, change parameters, and so forth. The question is whether that would be valuable, neutral, or risky, and the answer to that depends on just what the “devices” are and whether they are members of a higher-level cooperative group like “a network”.
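To make that concrete, here’s a minimal sketch of what such a twin object might look like. The class and attribute names are my own invention, not drawn from any standard model:

```python
from dataclasses import dataclass, field

@dataclass
class Interface:
    name: str
    status: str = "up"                    # operational state reported by the device
    peer: "Interface | None" = None       # binding representing the trunk connection

@dataclass
class DeviceTwin:
    device_id: str
    properties: dict = field(default_factory=dict)   # model, software load, etc.
    parameters: dict = field(default_factory=dict)   # settable configuration
    interfaces: dict = field(default_factory=dict)   # name -> Interface

    def add_interface(self, name: str) -> Interface:
        self.interfaces[name] = Interface(name)
        return self.interfaces[name]

def bind_trunk(a: Interface, b: Interface) -> None:
    """Represent a trunk as a binding between two interface objects."""
    a.peer, b.peer = b, a

# Two routers joined by a trunk; querying status is just reading the twin.
r1, r2 = DeviceTwin("router-1"), DeviceTwin("router-2")
bind_trunk(r1.add_interface("ge-0/0/0"), r2.add_interface("ge-0/0/1"))
print(r1.interfaces["ge-0/0/0"].peer.name)   # -> ge-0/0/1
```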
Routers exhibit “adaptive” behavior, as do most switches. They interact with each other to define the way that “the network” behaves, and while they assert management APIs and report status, their normal operation is largely autonomous. In a sense, “element management” for these devices is a management process that parallels or repairs normal operations. However, MPLS supports explicit routing, where a digital-twin model of the router network could be helpful, and nearly all network operators use MPLS.
If we were to shift focus from “router” to “SDN switch”, it’s a completely different story. An SDN switch has no adaptive behavior; it depends on a central management process to provide it with operating data, including the routing tables used in “normal” operation. There’s a presumption of a central controller, and since that controller manages the routes, it would be logical to assume that knowing the state of the devices and trunks would be helpful.
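As an illustration (the switch names, link-state dictionary, and “push” step here are purely my own; a real controller would use a southbound protocol such as OpenFlow rather than this toy code), a controller-style process could compute explicit routes from the trunk states the twins report and then program per-switch forwarding tables:

```python
from collections import deque

# Link state as reported by the switch twins (names are illustrative only).
links = {
    "sw1": {"sw2": "up", "sw3": "up"},
    "sw2": {"sw1": "up", "sw4": "up"},
    "sw3": {"sw1": "up", "sw4": "down"},   # a failed trunk the twin knows about
    "sw4": {"sw2": "up", "sw3": "down"},
}

def shortest_path(src, dst):
    """BFS over trunks the twins report as 'up'; the controller never guesses."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt, state in links[path[-1]].items():
            if state == "up" and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# "Push" the computed next hops into per-switch forwarding tables.
tables = {sw: {} for sw in links}
path = shortest_path("sw1", "sw4")
for hop, nxt in zip(path, path[1:]):
    tables[hop]["sw4"] = nxt
print(path, tables["sw1"])   # ['sw1', 'sw2', 'sw4'] {'sw4': 'sw2'}
```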
So far, IMHO, we can say that for SDN devices, and for router networks that support explicit routing, digital-twin modeling could be valuable. For (largely enterprise) router networks without MPLS explicit routing, the question is whether it would be useful or risky.
The risk in using a digital-twin model to control a router network is, ironically, that it would encourage something outside the network to exercise control over behavior that’s supposed to be adaptive. There is an easy solution to this, though: don’t provide a mechanism to alter things that are supposed to be adaptive. Yes, it would be possible to indirectly alter adaptive behavior by, for example, disabling an interface or the device, but that risk exists with any management interface that exposes those capabilities, directly or indirectly. We can discount any incremental risk, then.
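As a sketch of that principle, assuming a hypothetical twin class of my own devising, the twin could expose adaptive state read-only and accept writes only to administrative settings:

```python
class RouterTwin:
    # Adaptive state (learned routes, adjacencies) is exposed read-only;
    # only administrative settings are writable through the twin.
    WRITABLE = {"description", "admin_status"}

    def __init__(self):
        self._state = {"routing_table": {}, "adjacencies": [],
                       "description": "", "admin_status": "up"}

    def get(self, key):
        return self._state[key]

    def set(self, key, value):
        if key not in self.WRITABLE:
            raise PermissionError(f"{key} is adaptive; the twin will not alter it")
        self._state[key] = value

twin = RouterTwin()
twin.set("admin_status", "down")   # allowed, but this is the same risk any
                                   # management interface already carries
# twin.set("routing_table", {})    # would raise PermissionError
```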
But lack of risk doesn’t constitute a benefit. We can assume that the network, at least, would have a management system that supported device alarms, so having the ability to generate those alarms off a digital-twin model doesn’t add much, if anything. Can we identify anything interesting we could do with the digital-twin model? Yes, but not much.
The most obvious possible benefit of a digital-twin model is to provide an abstraction layer that could support a mixture of SDN and adaptive routing within a single network administration. Digital twinning, as I’ve said, would be a natural partner for an SDN controller, since that controller is responsible for route management in the network. SDN is likely to have to phase in or be a part of a network, rather than to be everything, as Google’s Andromeda illustrates. We could see “virtual” elements of the digital-twin model representing router enclaves, and “real” elements representing the SDN switches.
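A minimal sketch of that mixed model might look like the following; the element names and the “real”/“virtual” tagging are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class Element:
    name: str
    kind: str          # "real" = an SDN switch the controller programs directly;
                       # "virtual" = an abstraction of an adaptive router enclave
    neighbors: list

topology = [
    Element("enclave-east", "virtual", ["sdn-core-1"]),
    Element("sdn-core-1", "real", ["enclave-east", "sdn-core-2"]),
    Element("sdn-core-2", "real", ["sdn-core-1", "enclave-west"]),
    Element("enclave-west", "virtual", ["sdn-core-2"]),
]

# Route management sees one graph, but only "real" elements get explicit routes;
# "virtual" enclaves are trusted to carry traffic adaptively across themselves.
programmable = [e.name for e in topology if e.kind == "real"]
print(programmable)   # ['sdn-core-1', 'sdn-core-2']
```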
A related benefit would be the creation of an abstraction layer for EMS/NMS processes to work through. If we think of a router network as being a resource enclave, similar to a cloud resource pool, we know from experience with the latter that “virtualization” would normally involve creating an abstraction (a “virtual machine” is one in the cloud/server space) that would then be mapped to the underlying resources. Could we view this as the bottom layer of service lifecycle management? In any event, a standard abstraction layer could allow a single NMS toolkit to work with every vendor, every device.
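A rough sketch of how such a layer might be structured, assuming hypothetical per-vendor adapters (the vendor and method names are invented for illustration) behind a single twin-style surface the NMS toolkit talks to:

```python
class VendorAdapter:
    """Maps the abstract twin operations onto a specific vendor's own API/CLI."""
    def read_status(self, device_id): ...
    def set_parameter(self, device_id, name, value): ...

class AcmeAdapter(VendorAdapter):
    def read_status(self, device_id):
        return {"device": device_id, "vendor": "acme", "oper": "up"}

class NetcoAdapter(VendorAdapter):
    def read_status(self, device_id):
        return {"device": device_id, "vendor": "netco", "oper": "up"}

class AbstractionLayer:
    """One NMS-facing surface; vendor differences live behind the adapters."""
    def __init__(self):
        self.adapters = {}
    def register(self, vendor, adapter):
        self.adapters[vendor] = adapter
    def status(self, vendor, device_id):
        return self.adapters[vendor].read_status(device_id)

layer = AbstractionLayer()
layer.register("acme", AcmeAdapter())
layer.register("netco", NetcoAdapter())
print(layer.status("acme", "r1"), layer.status("netco", "r2"))
```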
The final benefit, I think, could depend on the extent to which the first two benefits are considered significant. If we did deploy a digital-twin model as an abstraction layer, might we then unify the “network” management of mixed router/virtual-function networks? I think that would be an almost-inevitable outcome if we actually thought about network-resource abstraction through digital twinning, but I’m not sure whether the notion would arise based on this point alone. NFV has failed to gain broad traction inside a network, and only limited traction (via uCPE) even at the edge. 5G function hosting may well be too localized to present much of a challenge in mixed-network operations.
I think that the role of a digital-twin model in NMS could be justified in theory, but it may be difficult to develop in practice. Multi-vendor abstraction missions are never popular with vendors who want to be the only player in a given network. SDN, despite its proven applications in the core network (by Google), hasn’t advanced to broadly support a core mission yet, and it may do so only if we end up with metro-mesh metaverse networking down the line. Networks today are massive sunk costs, and buyers are as reluctant to threaten them with new ideas as the vendors are to open them up. We’ll have to wait to see whether other forces move the needle on this topic!