Making Virtual Function Code Portable

How do we create portable code in hosted-function services for the network of the future? There are a lot of things involved in making virtual-function code portable, and I’m going to use the term “code” in this blog to indicate any software that’s designed to create network features and that isn’t a part of a fixed appliance with dedicated hardware/software. Most operators are looking to ensure that their hosted code can take full advantage of resource economy of scale (meaning a “cloud” or “pool”) and most also are hoping to standardize operations. It’s those two goals we’ll look at here.

Nobody should ever plan for shared-resource code without first looking at what’s going on in the cloud. Right now, there are four basic “cloud architectures” that define hosting at the highest level. They are bare metal, virtual machines, containers, and functions/lambdas. We’ll start by looking at each, and how they relate to the issue of resource efficiency.

Obviously, “bare metal” means that code is hosted directly on either a server or a white-box appliance. The benefit of this option is that it gives your code exclusive use of the resource, so sharing isn’t an objective of this model. Instead, what you’re looking for is the standardization of the hardware/software relationship, meaning that your software is portable across as many of the bare-metal options you plan to use as possible.

Because bare metal means nothing is there, all the “platform software”, meaning the operating system, systems software, and middleware, has to be loaded and configured, and so does each application.
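
To make that burden concrete, here’s a minimal sketch of what “loading the platform” can look like, assuming a Debian-style host reachable over SSH; the package list and hostname are hypothetical stand-ins for whatever stack an operator standardizes on:

```python
"""Minimal sketch of bare-metal platform bootstrap, assuming a
Debian/Ubuntu host reachable over SSH. Package names and the
hostname are hypothetical."""

import subprocess

# Hypothetical platform stack an operator might standardize on.
PLATFORM_PACKAGES = ["linux-image-generic", "openssh-server", "chrony"]

def bootstrap(host: str) -> None:
    """Install the standard platform software on a freshly imaged host."""
    for pkg in PLATFORM_PACKAGES:
        # apt-get is idempotent here; already-installed packages are skipped.
        subprocess.run(
            ["ssh", host, "sudo", "apt-get", "install", "-y", pkg],
            check=True,
        )

if __name__ == "__main__":
    bootstrap("edge-node-01.example.net")  # hypothetical edge host
```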

Virtual machines are shared-resource strategies, and they form the basis for public cloud IaaS services. VMs are created by hypervisors that partition the hardware into what looks, even to platform software, like a set of independent computers/devices. You have to maintain the hardware and hypervisor as a collective resource, but each VM requires the same platform software as a bare-metal element does. Unlike bare metal, VMs share the actual resources, so there may be crosstalk or other impacts between VMs that could affect operations.
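
For illustration, here’s a sketch of carving out a VM with the libvirt Python bindings on a KVM host; the domain XML is pared down to a minimum and omits the disks and networks a real VM would need, and the VM name is hypothetical:

```python
"""Sketch: creating a VM partition with the libvirt Python bindings.
Assumes a local KVM/QEMU host; the domain XML is stripped down for
illustration and omits storage and networking a real VM would need."""

import libvirt  # pip install libvirt-python

DOMAIN_XML = """
<domain type='kvm'>
  <name>vnf-vm-01</name>  <!-- hypothetical VM name -->
  <memory unit='MiB'>2048</memory>
  <vcpu>2</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
</domain>
"""

conn = libvirt.open("qemu:///system")  # connect to the local hypervisor
dom = conn.defineXML(DOMAIN_XML)       # register the VM definition
dom.create()                           # boot it; platform software still
                                       # has to be installed inside, just
                                       # as on bare metal
conn.close()
```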

Containers are the current rage for the cloud. Unlike VMs, which are created by a hypervisor, containers are a feature of the operating system, so they’re not quite as fully isolated from each other as VMs would be. With containers, you deploy an application (in this case, a service) by deploying all of its components in pre-packaged form. The packages have everything needed to load and run, and so deployment is fairly simple.
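
As a sketch of how simple that deployment step can be, here’s the pull-and-run flow using the Docker SDK for Python; the image name and registry are hypothetical:

```python
"""Sketch: deploying a pre-packaged function as a container with the
Docker SDK for Python. The registry and image name are hypothetical;
any image with the function baked in would behave the same way."""

import docker  # pip install docker

client = docker.from_env()

# The package carries everything needed to load and run, so deployment
# is a single pull-and-run step.
container = client.containers.run(
    "registry.example.net/vnf/firewall:1.2",  # hypothetical image
    detach=True,
    name="vnf-firewall",
    restart_policy={"Name": "on-failure"},
)
print(container.status)
```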

Functions (also called “lambdas”; AWS’s function-hosting is called “Lambda”) are snippets of code designed so that the outputs are a function only of the inputs. Since nothing is stored, a function is portable as long as its code can be run on the platform options. However, lambdas operate inside a special environment that links events to the deployment and execution of lambda code, and this environment is less likely to be portable.
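
Here’s a minimal sketch of what such a function looks like, using AWS Lambda’s Python handler convention; the event fields are invented for illustration:

```python
"""Sketch of a stateless function, following AWS Lambda's Python
handler convention. Nothing is read or written outside the event, so
the output depends only on the input; the event fields are invented."""

def handler(event, context):
    # A pure transformation: no files, no globals, no stored state.
    packets = event.get("packets", 0)
    return {"over_threshold": packets > 10_000}
```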

OK, those are the architecture options. There’s also a question of public versus private hosting for each of them, which we’ll now summarize.

Bare-metal servers can be obtained as an option from some cloud providers and also from interconnect companies, and white-box appliances are broadly available. This architecture is therefore the most generalized, but because there is no “platform software” included, everything has to be installed before the function code can work. There’s also no resource sharing of a server/device unless it’s provided by the platform software an operator would install. This is therefore a small step away from simply using your own data center, and it’s going to be appropriate mostly at the very edge of the network, or where operators do intend to deploy their own carrier cloud.

Virtual-machine architectures can be hosted on bare metal, including in your data center, in appliances, and on the public cloud. Because the user supplies the platform software that gives the deployment its personality, and because the VM hypervisor provides the resource sharing, this is the second-most-generalized hosting model. However, since public cloud providers and appliance vendors may offer proprietary extension tools of their own, it’s important to avoid those tools if you really want code portability.

Containers can be hosted on bare metal (including appliances), on VMs, on IaaS public clouds, or on managed container public cloud services. The first three options will require the operator to provide the platform software that does the container hosting, and if the same software is used across any combination of those options, the same container hosting and management features will prevail. I’m telling operators to avoid managed container services where possible, in favor of deploying their own container software even in the cloud, unless they don’t intend to use containers anywhere except the cloud.
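
One way to see why the same-software rule matters: if every hosting platform runs the same container stack, a single manifest can be applied everywhere. A sketch, assuming Kubernetes on each platform, with hypothetical kubectl context names for each environment:

```python
"""Sketch: applying one deployment manifest to several clusters, so the
same container software runs identically on-premises and in the cloud.
Context names and the manifest file are hypothetical."""

import subprocess

CONTEXTS = ["edge-metal", "datacenter-vms", "public-iaas"]  # hypothetical

for ctx in CONTEXTS:
    # The identical manifest deploys to every hosting platform, which is
    # what keeps container hosting and management features consistent.
    subprocess.run(
        ["kubectl", "--context", ctx, "apply", "-f", "vnf-deployment.yaml"],
        check=True,
    )
```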

Functions can be thought of as containers with an additional layer that deploys and runs a function when an event triggers it. There is often considerable latency associated with function loading and execution, however, and I’m of the view that operators should avoid functions unless they have significant in-house function development skills.
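
That load-time latency is easy to observe for yourself. A sketch using boto3 against a hypothetical Lambda function; the first invocation usually includes a cold start, while the later ones run warm:

```python
"""Sketch: measuring cold-start latency of a hosted function, assuming
an AWS Lambda function named 'vnf-handler' (hypothetical)."""

import json
import time
import boto3  # pip install boto3

client = boto3.client("lambda")

for attempt in range(3):
    start = time.perf_counter()
    client.invoke(
        FunctionName="vnf-handler",           # hypothetical function
        Payload=json.dumps({"packets": 12_000}),
    )
    elapsed = time.perf_counter() - start
    # The first attempt typically includes function load time.
    print(f"invocation {attempt}: {elapsed * 1000:.1f} ms")
```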

Now for some best practices. First, unless it’s totally impossible to do so, any hosting environment that an operator creates, for any of the architectures, should be based on the same platform software. The same operating system, system software, and middleware should be used for everything, everywhere. If you do this, then the “platform maintenance” tasks will be the same, you can use the same tools, and consistency is the mother of efficiency.

Second, there are three possible layers of software in a virtual-function infrastructure: the hardware layer, the platform layer, and the functional layer. Your management practices have to deal with all three. Hardware management applies to all bare metal, VM, and IaaS cloud implementations, and this is where configuration management for running the (hopefully) common platform software comes in. Platform management is the loading and maintenance of the platform software itself, and functional-layer management is the management of the functions themselves. If you follow the one-platform rule, most of the first two layers of management are straightforward. If not, try to find a management tool kit that works for all your platform options. Look at DevOps tools and “infrastructure-as-code” tools for these lower layers.
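
A sketch of what “one entry point per layer” might look like, in the spirit of the infrastructure-as-code tools just mentioned; the playbook names and host are hypothetical stand-ins for whatever tool kit an operator actually picks:

```python
"""Sketch: one management entry point per layer, driving a generic
infrastructure-as-code tool. Playbook names and host are hypothetical."""

import subprocess

def manage_hardware(host: str) -> None:
    # Hardware layer: configure the box to run the common platform.
    subprocess.run(["ansible-playbook", "-l", host, "hardware.yml"], check=True)

def manage_platform(host: str) -> None:
    # Platform layer: load and maintain OS, system software, middleware.
    subprocess.run(["ansible-playbook", "-l", host, "platform.yml"], check=True)

def manage_functions(host: str) -> None:
    # Functional layer: deploy and manage the functions themselves.
    subprocess.run(["ansible-playbook", "-l", host, "functions.yml"], check=True)

# With one common platform, the same three steps apply everywhere.
for step in (manage_hardware, manage_platform, manage_functions):
    step("edge-node-01")  # hypothetical host
```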

Third, your biggest management challenge will come in the functional layer, because here you’re almost surely going to have to deal not only with multiple sources of functions, but also with multiple mechanisms for stitching functions together to create services and for managing the outcome.
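
To make “stitching” concrete, here’s a toy sketch in which functions from different sources share one call signature and are chained into a service; every name here is invented for illustration:

```python
"""Toy sketch: stitching independently sourced functions into a service
chain. The shared call signature stands in for the standard interfaces
discussed below; all names are hypothetical."""

from typing import Callable

Packet = dict
VNF = Callable[[Packet], Packet]

def firewall(p: Packet) -> Packet:
    p["filtered"] = True
    return p

def nat(p: Packet) -> Packet:
    p["src"] = "203.0.113.1"
    return p

def service_chain(functions: list[VNF], packet: Packet) -> Packet:
    # Stitching: the output of one function feeds the next.
    for fn in functions:
        packet = fn(packet)
    return packet

print(service_chain([firewall, nat], {"src": "10.0.0.5"}))
```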

Let’s close by looking harder at this last point. Virtual-function software (not necessarily just the special case of NFV) needs to obey some standard set of interfaces (APIs) to connect it into a cooperative system that we’d call a “virtualized service”, and the software may also have to interwork with legacy devices. There has to be both a source of specifications for those interfaces and a mechanism to do the stitching. Finally, you have to manage the results.

There are currently three possible approaches to this. First, there are network-specific sources. NFV defines both a specification set and a management strategy, and ONAP has a broader model. Second, cloud practices create their own approach, which is more generalized and has more possible sources of specifications (including write-your-own). Finally, you have international standards like O-RAN or OASIS TOSCA that can be applied.

The best approach to navigate this combination, in my view, is to forget both NFV and ONAP and try to use a combination of cloud practices and international standards. For example, what I’d personally love to see is an OASIS model for O-RAN services, and I think operators who want to deploy open-model 5G should assume they’ll use TOSCA and cloud tools to deploy against O-RAN and the eventual open-model 5G Core specification.
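
As a taste of what that could look like, here’s a toy TOSCA-style template parsed with PyYAML; the node names are illustrative, not a real O-RAN profile, but the point is that one declarative model can drive standard cloud deployment tools:

```python
"""Toy sketch: a TOSCA-style service template parsed with PyYAML.
Node names are illustrative, not a real O-RAN profile."""

import yaml  # pip install pyyaml

TEMPLATE = """
tosca_definitions_version: tosca_simple_yaml_1_3
topology_template:
  node_templates:
    ran_du:                 # hypothetical O-RAN distributed unit
      type: tosca.nodes.Container.Application
    ran_cu:                 # hypothetical O-RAN centralized unit
      type: tosca.nodes.Container.Application
      requirements:
        - dependency: ran_du
"""

model = yaml.safe_load(TEMPLATE)
for name, node in model["topology_template"]["node_templates"].items():
    print(name, "->", node["type"])
```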

It seems to me that operators should play a greater role in this process. Cloud-native implementations of O-RAN and 5G Core are in their interests, and so is a standard way of defining the service lifecycle, which TOSCA could provide. This would be a very good way, and a very good time, to merge the cloud and the network more systematically.