Are We Thinking Too Small on Network Services and Models?

For decades, it’s been clear to many that the way we use addresses on the Internet and in other IP networks may be less than ideal. An address represents a network service access point (NSAP), but we tend to use it (or want to) as a reference to an application, user, or database. All of these things may be only loosely bound to an NSAP, and factors like mobility make that binding unreliable. Mobile networks in particular have had to deal with this, and there have been attempts to decouple the what from the where in addressing terms, so we could reference what we want rather than figuring out where it is. Events may be exposing some options not regularly considered.

In my last blog, I noted that SDN and mobile network evolution were pushing an evolved sense of how networks are built. A couple of Light Reading pieces (THIS and THIS) made me think about whether there could or maybe even should be a link between SDN and mobile networks, and whether developing this link is a step in network and cloud evolution that we’re missing. I also wonder whether the linkage, and the means whereby we could create and support it, is an argument for greater reliance on virtual networks.

The thing that makes mobile networks different is (no surprise) mobility. A mobile network is a network that supports devices that move around, and that movement means that the route to reach a given device is likely to change over time because the device’s location has changed. Our convention is that the device’s address has not changed, which means that traditional IP routing is likely to eventually send packets to the wrong place as the device moves about.

The traditional approach to this, codified in the IP Multimedia Subsystem (IMS) and the mobile packet core behind it, is to put packets destined for a mobile device into a tunnel at the point of entry into the mobile network. That tunnel’s other end is then shifted about by mobility management technology to stay linked to the path associated with the cell in which the mobile device is currently served. This keeps any sessions (voice or data) intact, but it’s obviously a complicated strategy.

Another complication is the fact that mobile devices today are almost all broadband-capable and could support voice calls and texting over broadband. VoLTE is an example of this, a way of using data connectivity to create traditional telephone connectivity. However, once you start using data to carry calls and texts, you raise the question of why those calls and texts couldn’t be initiated and received via WiFi, and why we couldn’t perhaps see calls transferred from a mobile device to something else in the home or workplace, if a smartphone user enters a space where other options are presented. This idea was explored decades ago by Project Icarus in Europe.

All of this raises two questions. First, should all “network services” be viewed as overlays on the things we call “networks” today? Should they be deliverable over any transport facility that can offer the SLA, be able to cross between compatible facilities, and even be independent of endpoint specificity? In other words, should services create their own virtual networks? Second, should we be rethinking how we build transport networks based on the assumption that we answer the first question in the affirmative?

I’ve long proposed that network services should connect virtual agents that reside in the network, to which we then connect using whatever devices are most convenient. Tom, as a real person, contacts Virtual-Tom and asks for a connection with John. Virtual-Tom is connected with Virtual-John, which then signals the real John via the device(s) defined in John’s policies. This would allow services to be decoupled from device specificity, allow for seamless switching between devices, and so forth. Project Icarus reborn, perhaps. We can already see some of this emerging with features that let smartphone calls ring on computer systems, and let people make calls over their cellular service from their computer.

Could something similar be used to support mobility? Let’s suppose that Real-Tom is driving along with his smartphone, in a coverage area we’ll call Cell A. Virtual-Tom is living somewhere in the network, always accessible via some stable address or URL. If Real-Tom initiates a call, the network simply connects him to Virtual-Tom as Real-Tom-Cell-A, and Virtual-Tom places the call, so Real-Tom is connected. Now, still driving, he moves into Cell B. Real-Tom connects to Virtual-Tom as Real-Tom-Cell-B, and Virtual-Tom simply connects that new “device” into the call.
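
To make the idea a bit more concrete, here’s a minimal sketch of how a virtual agent might track its real counterpart’s changing attachments. All of the names (VirtualAgent, Attachment, and so on) are hypothetical illustrations of the concept, not an actual protocol or product.

```python
# A minimal sketch of the virtual-agent idea described above. All names are
# hypothetical illustrations, not a real protocol or product.

from dataclasses import dataclass, field

@dataclass
class Attachment:
    """A real device reaching its virtual agent from some access point."""
    device_id: str        # e.g. "Real-Tom-phone"
    access_point: str     # e.g. "Cell-A" or "Home-WiFi"

@dataclass
class VirtualAgent:
    """Virtual-Tom: the stable, network-resident representative of a user."""
    owner: str
    attachments: dict = field(default_factory=dict)   # device_id -> Attachment
    active_calls: list = field(default_factory=list)  # peers this agent is connected to

    def attach(self, device_id: str, access_point: str):
        """Called whenever the real user shows up on a new access point."""
        self.attachments[device_id] = Attachment(device_id, access_point)
        # In-progress sessions keep terminating on the agent; only the
        # agent-to-device leg is re-established.
        print(f"{self.owner}: {device_id} now reached via {access_point}")

    def call(self, peer_agent: "VirtualAgent"):
        """Sessions are agent-to-agent; devices are just the last hop."""
        self.active_calls.append(peer_agent.owner)
        peer_agent.active_calls.append(self.owner)

# Real-Tom drives from Cell A to Cell B mid-call; the call itself never moves.
virtual_tom, virtual_john = VirtualAgent("Tom"), VirtualAgent("John")
virtual_tom.attach("Real-Tom-phone", "Cell-A")
virtual_tom.call(virtual_john)
virtual_tom.attach("Real-Tom-phone", "Cell-B")   # handoff: only the last hop changes
```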

One issue with this approach relates to situations where Real-Tom is traveling a long distance. At some point, the connection between Real-Tom and Virtual-Tom may become unwieldy. But let’s suppose that Virtual-Tom itself can be moved around, using the same approach. Virtual-Tom decides the connections to Real-Tom are getting problematic, so it asks the Virtual-Manager to rehost it in a better location. The Virtual-Manager could then, for example, update the DNS record that resolves the Virtual-Tom URL so it points to that new location.

Or, perhaps, all the Virtual-People are given addresses in a special address space, one controlled by SDN. If Virtual-Tom moves, the SDN controller gets an indication from the Virtual-Manager and rejiggers the route to Virtual-Tom. No matter how far Real-Tom travels, we can place Virtual-Tom optimally and maintain an optimal route between the two Toms.
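
Here’s a sketch of that rehosting decision, under stated assumptions: the latency estimate, the threshold, and the way the DNS or SDN controller would be notified are all invented for illustration.

```python
# A sketch of the rehosting decision described above. Everything here is
# illustrative: the threshold, the "Virtual-Manager", and the notification of
# DNS or an SDN controller are assumptions, not a real API.

HOSTING_SITES = {"metro-east": (0, 0), "metro-central": (5, 0), "metro-west": (10, 0)}

def estimated_latency(site_location, user_location):
    """Stand-in for a real measurement: just geometric distance."""
    (sx, sy), (ux, uy) = site_location, user_location
    return ((sx - ux) ** 2 + (sy - uy) ** 2) ** 0.5

def maybe_rehost(agent_name, current_site, user_location, threshold=4.0):
    """If the agent-to-user path gets unwieldy, pick a better hosting site and
    point the agent's stable name at it (DNS update), or have an SDN
    controller reroute the agent's address to the new site."""
    if estimated_latency(HOSTING_SITES[current_site], user_location) <= threshold:
        return current_site
    best = min(HOSTING_SITES, key=lambda s: estimated_latency(HOSTING_SITES[s], user_location))
    print(f"Rehosting {agent_name}: {current_site} -> {best}")
    print(f"  (update DNS for {agent_name}.example.net, or push a new SDN route)")
    return best

# Real-Tom has driven a long way west; Virtual-Tom follows.
site = "metro-east"
site = maybe_rehost("virtual-tom", site, user_location=(9, 1))
```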

You can see that if services are created as an overlay, a lot of connection responsibility and power is ceded to the overlay technology. Our Virtual-agent could be hosted, for example, in a metro center and moved to another only if we actually moved a significant distance. The Real-person would connect to the Virtual-agent on demand from any set of suitable devices, but services would see only the Virtual-agent, which means service connectivity could stop at the metro area. Would this enable us to rethink the underlying network model? That underlay only has to see metro areas, because Virtual-agents can only be hosted there. If metro connectivity is all that’s needed, then we could create MPLS tunnels or SDN routes to metro areas and connect to our agents there.

Before we toss our current models, though, I want to reiterate that this is a presumed approach for service networks. We’d have to adapt it to the Internet and other services, but I want to make a point I’ve made earlier, which is that service networks can be virtual networks. In fact, I think that’s the direction we’re ultimately going to have to go. The same model could work for the Internet on the user side, but the relationship between Virtual-agent elements and the current Internet would be the same as it is now. The Virtual-agents would be an intermediary function, perhaps no more than a routing point.

Not all network services have the same rules. Up to now, we’ve been trying to bend all services to one rule-set, that of IP and the Internet, because we wanted a common network technology that could support everything. We should still want that, but we don’t have to surrender service-specific network features to get it. That’s one of the benefits of virtual networks, and one we should be working to explore and exploit.

Virtual Networking: The Missing Piece of Network Evolution

There is probably no issue more important to the future of networking than the union of networks and hosting. It may seem a simple one; we have the Internet, and both the stuff that’s on it (including the cloud) and the stuff that’s inside it work globally at significant scale. In fact, though, that’s one of the things that makes the problem far from simple.

Networks, including the Internet, have an “outside” and an “inside”. In the old days of TCP/IP, when the Internet was young and people could assume everyone was “of good will”, these two dimensions were really one. You could address routers as easily as addressing users or servers. Over time, it became clear that everyone wasn’t of good will, and we started to see protective barriers erected between components of Internet infrastructure and the “service layer” of the Internet, the public or outside space. Companies who built their own IP networks with trunks and private routers shifted to virtual private network services, and that created another layer.

The virtual private network or VPN is a significant step, for reasons beyond the business level, because it formally introduces the notion of virtual networks. A “real” network is made up of routers and trunks, but a virtual network is some sort of overlay on that real network, an overlay that creates a semi-autonomous community of connectivity. Virtual networks proved critical in cloud computing because they allow a cloud provider to create tenant networks that are independent of each other, something that’s critical for security.

Security is a key point here in our fusion of networking and hosting. If we have a “virtual network function” or VNF that’s part of Service A, we not only have to ensure that Service B can’t access it, but also that Service B has only minimal ability to impact it. If our VNFs are hosted on a pool of resources, we also have to ask whether the deployment of VNFs can be allowed to “see” those resources, because if they can, they can impact all services.

What we end up with, addressing these constraints, is a model of networking that has three logical layers. The top layer is the service layer, which is what users of the network see. The bottom layer is the network that controls the pool of resources on which functions are hosted, and the middle layer is a tenant-service-specific layer that contains pool resources that are assigned to a specific service mission. All these network layers are virtual, all are independent. When you decide that you’re going to build a service from virtual functions, you need to have those functions hosted at the bottom and organized in the middle.

Where virtual functions represent what are essentially on-network features, such as the layer that mobile networks call the “control plane” (in contrast to the “user plane” which is the entire IP stack), the network itself treats them as users rather than service components, and that makes things pretty easy. The problem is more complex when the VNFs are actually service elements, meaning that they are virtual forms of real, traditional, network devices.

A VNF in that mission lives in three worlds. First, it lives (at least potentially) in the service network, because it often presents an interface that’s addressable by service users. Second, it lives at the server pool level, because it runs on a server, VM, or container host and has to be deployed and managed there. Third, it lives in that elusive middle virtual layer, because its relationship to other elements has to be separated, in network access terms, from what’s above and what’s below. How this happens could vary, but for simplicity let’s abstract the approaches into a single model.

You start with a server resource pool, and that pool has a set of network addresses that form our bottom virtual network. That virtual network is accessible only to the people running the server farm. Its management properties are invisible to the layers above. This pool might be contained in a single data center, or multiple data centers, but its structure is visible only to that lowest virtual network.

When you want to create services, you deploy, on that lowest virtual network, a series of software components (“VN adapters”) that have the ability to support multi-tenant middle-layer virtual networks. Elements of each tenant virtual network are deployed on the bottom layer, and their connectivity is confined within the tenant network. That network cannot “see” the bottom layer, nor can one tenant see another. Furthermore, each tenant network is totally invisible at the service network level. Connections can be created on the tenant network to link the VNFs deployed within it.

The final step here is to expose some interfaces of the tenant network to the service network. This is essentially another of those VN adapters, one that provides an address proxy and whatever other features are needed to present the interface to the service network. The service network can then see these exposed interfaces, but no others.
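
A toy model of the three layers might look like the sketch below. The class names (ResourcePoolNetwork, TenantNetwork, and so on) are invented for illustration; a real implementation would lean on the virtual networking of whatever hosting platform is used.

```python
# A toy model of the three virtual layers and the "VN adapter" exposure step.
# All names and addresses are invented for illustration.

class ResourcePoolNetwork:
    """Bottom layer: addresses of the servers themselves, operator-only."""
    def __init__(self):
        self.hosts = {}                      # host address -> deployed component

    def deploy(self, host_addr, component):
        self.hosts[host_addr] = component

class TenantNetwork:
    """Middle layer: one per service, confined to its own address space."""
    def __init__(self, tenant):
        self.tenant = tenant
        self.vnfs = {}                       # tenant-private address -> VNF name
        self.exposed = {}                    # service-visible proxy addr -> tenant addr

    def add_vnf(self, private_addr, vnf_name, pool: ResourcePoolNetwork, host_addr):
        pool.deploy(host_addr, vnf_name)     # hosted on the bottom layer...
        self.vnfs[private_addr] = vnf_name   # ...but addressed only in this tenant

    def expose(self, proxy_addr, private_addr):
        """The 'VN adapter' role: present one interface to the service layer."""
        self.exposed[proxy_addr] = private_addr

class ServiceNetwork:
    """Top layer: sees only the proxy addresses tenants chose to expose."""
    def reachable(self, tenant_net: TenantNetwork):
        return list(tenant_net.exposed.keys())

pool = ResourcePoolNetwork()
service_a = TenantNetwork("Service-A")
service_a.add_vnf("10.1.0.5", "vFirewall", pool, host_addr="192.168.100.7")
service_a.expose("203.0.113.10", "10.1.0.5")
print(ServiceNetwork().reachable(service_a))   # only ['203.0.113.10'] is visible
```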

Each of the virtual networks could have its own protocols and rules, but in nearly all cases these networks will be IP-based. They will have an address space and, other than the service network, which will almost surely use public, assigned IP addresses as the Internet does, they could use either a public or a private IP address space. If dynamic address assignment is needed, they would have to contain a DHCP server, and if they wanted to resolve URLs they’d need a DNS server as well. In addition, the VN adapters might have to spoof some IP control packets; you don’t want a ping or traceroute to find real components, and in any event those components would likely not be addressable.

OK, this is an abstract model, but what might a real implementation look like? That’s where things get potentially more complicated, but I think we could state a principle here that could lead to simplification. The architecture for deploying virtual functions should be the same as that used to deploy application components in the cloud. The NFV ISG process of 2013 made a fundamental mistake by not linking NFV to the main thrust of cloud computing. Because some virtual functions (mobile network control plane, as previously noted) are “on the network” like any cloud application, we could expect them to deploy that way. Because edge computing is likely to generate new services and features that will include what are really application components, it should use the cloud model too.

The prevailing cloud technology for deployment and redeployment is containers and Kubernetes. Kubernetes has the virtual networking capabilities (via its network add-on support) that I cited in my abstract model. Logically, it should be the choice for the framework we use to bind networks and hosting for the creation of VNF-based service features. This, of course, is the goal of the Nephio project that Google launched, and one I hope will define the space in the future.
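
As one hedged illustration (not Nephio’s actual mechanism), tenant confinement along the lines of my abstract model can be approximated with standard Kubernetes objects: a namespace per tenant plus a NetworkPolicy that allows only intra-tenant traffic. The sketch below uses the official Kubernetes Python client; enforcement of NetworkPolicies depends on the cluster’s network add-on.

```python
# A sketch of approximating the "tenant network" with a namespace per tenant
# and a default-deny-style NetworkPolicy. Illustrative only; names are invented,
# and NetworkPolicy enforcement requires a network add-on that supports it.

from kubernetes import client, config

def create_tenant(tenant: str):
    config.load_kube_config()                       # or load_incluster_config()
    core, net = client.CoreV1Api(), client.NetworkingV1Api()

    # One namespace per tenant/service, labeled so policies can select it.
    core.create_namespace(client.V1Namespace(
        metadata=client.V1ObjectMeta(name=tenant, labels={"tenant": tenant})))

    # Allow only intra-tenant traffic: the middle-layer "virtual network"
    # expressed in Kubernetes terms.
    policy = client.V1NetworkPolicy(
        metadata=client.V1ObjectMeta(name="intra-tenant-only", namespace=tenant),
        spec=client.V1NetworkPolicySpec(
            pod_selector=client.V1LabelSelector(),          # all pods in namespace
            policy_types=["Ingress", "Egress"],
            ingress=[client.V1NetworkPolicyIngressRule(
                _from=[client.V1NetworkPolicyPeer(
                    namespace_selector=client.V1LabelSelector(
                        match_labels={"tenant": tenant}))])],
            egress=[client.V1NetworkPolicyEgressRule(
                to=[client.V1NetworkPolicyPeer(
                    namespace_selector=client.V1LabelSelector(
                        match_labels={"tenant": tenant}))])]))
    net.create_namespaced_network_policy(namespace=tenant, body=policy)

# create_tenant("service-a")   # exposure to the service layer would then be a
                               # separate, explicitly published Service/ingress
```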

We can’t just wave our arms and declare the problem solved by Kubernetes or even by Nephio. To address the whole picture, we have to step beyond, or around, the cloud. There are other issues to be considered, and I’ll look at them in other blogs down the line.

How Hybrid Cloud Thinking Can Lead Toward (or Away From) Edge Computing

We live in a polarized world, as a half-hour spent watching any news channel will show. It goes beyond politics, though. Even in technology, we tend to see things in an either/or way. Take the cloud and the data center. Many believe that the data center is the past and the cloud the future. Even “moderates” see the two as very different computing frameworks. Are they, though? Not so much, and where this polarizing tendency hurts us the most is in the way it frames our vision of data center evolution, and how that evolution shapes the future of edge computing.

The data center, historically, is both a place where corporate data is stored and the place where “core business” applications are run. These applications are primarily transactional in nature, meaning that they update and access that corporate data to reflect the operation of the business. In most cases, data centers are located proximate to centers of operation, places where many employees are located. Think “corporate headquarters” or “regional center” and you get the idea.

The computer technologies associated with the data center have been evolving since the 1960s when “mainframe” computing systems (like the IBM 360) came along. Early data centers were based on a small number of giant systems that cost millions of dollars. These used “multiprogramming” to share compute power across a range of concurrently running applications. Over time, these were supplemented with or replaced by “minicomputers” that were often more application-specific, and later by racks of servers that formed a pool of resources. Generally, each generation of computer technology was adopted for applications that were either new or had undergone modernization or a shift in application software vendor.

The cloud is the latest step in data center evolution, a step that offloads “applications” to a public resource pool that’s distributed over a wide geographic area. I put “application” in quotes because what’s usually done today is to create a more agile front-end in the cloud to augment traditional transaction processing and database access still done primarily in the data center.

The cloud is an elastic tail connected to a high-inertia dog, at least in those cases where it front-ends traditional transactional applications. However, as I’ve noted, some applications have shifted to a more modern server-based hosting model, and many of these have evolved to use virtual machines, containers, etc. They still tend to be tied to the data center because, well, that’s where the data is. Even analytics and AI, which are often run on servers using modern virtualization technology, tend to run in the data center because of the data, and also because “headquarters” people usually run these applications and they’re co-located with the data center.

Some of the modern planning types have started to look at the virtualization-based elements of the data center as a kind of third layer, something between data center and cloud. It’s this new layer and its position that are creating the potential for another shift in data center thinking, toward something designed to be closer to, but not necessarily part of, the cloud.

Enterprises have noted three factors that influence the design of this new boundary layer. The first is scalability of the data center components that interface to the scalable cloud. The second is classic cloudbursting from data center to cloud to scale or replace failed elements. The third is the selective migration of some data elements toward, and perhaps eventually into, the cloud. Network operators have added a fourth of their own, though they’d be happy to share it with enterprises: the evolution to edge computing and the growth of distributed real-time applications. It’s nice that they’re willing to share, because this is the most important evolution of them all.

If you think about it, both data center and cloud computing stem from a common compute model: a logically centralized application supporting a distributed user population. This application was originally purely central and transactional, has evolved to include a cloud front-end, and is now developing that boundary layer. In parallel, though, we can now conceptualize applications that are real-time and naturally distributed. A current example is the shipping/warehousing application. Another is the metaverse.

The shipping/warehousing application is an example of how the evolution toward empowering employees closer to their point of activity impacts computing policy. You can visualize an operation like this as a set of distribution facilities linked by transportation resources. For decades, this sort of operation was run from a data center, but as time passed and we started putting IoT elements out in the trucks and the hands of delivery people, it became clear that a lot of what was going on was really local in nature. This is likely what led FedEx, for example, to say they were going to scrap mainframes and maybe even data centers.

There’s an evolving, if still somewhat imprecise, computing model associated with this application, one that’s not exactly cloud nor exactly data center. The model is characterized by hierarchy, by migration and caching, and, first and foremost, by binding to life.

Efficient operation of a shipping/warehousing company depends on creating and retaining a link between what’s effectively an abstract model of the operation and the real world. The company looks like a network, with the nodes being facilities and the trunks being transportation. The binding between reality and the model has to reflect the extent to which the model has to intervene in live situations, so a requirement for edge computing evolves out of supporting the situations that are more real-time in nature. However, the need to consider the system as a whole and reflect global judgments at any given local level is constant, and it’s reflected in our next characteristic.
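
A minimal sketch of that “abstract model bound to the real world” idea: the operation as a graph of facilities and transport links, updated by events from IoT elements in the field. All of the names and events are illustrative.

```python
# A toy "digital twin" of the operation: facilities as nodes, transport as
# links, and real-world events keeping the model in sync. Names are invented.

from collections import defaultdict

class OperationTwin:
    def __init__(self):
        self.inventory = defaultdict(dict)     # facility -> {item: count}
        self.links = set()                     # (from_facility, to_facility)

    def add_link(self, a, b):
        self.links.add((a, b))

    def on_event(self, event):
        """Bind the model to life: every real-world scan updates the twin."""
        kind, facility, item, qty = event
        current = self.inventory[facility].get(item, 0)
        if kind == "arrival":
            self.inventory[facility][item] = current + qty
        elif kind == "departure":
            self.inventory[facility][item] = current - qty

twin = OperationTwin()
twin.add_link("regional-hub", "local-depot")
twin.on_event(("arrival", "regional-hub", "package-42", 1))
twin.on_event(("departure", "regional-hub", "package-42", 1))
twin.on_event(("arrival", "local-depot", "package-42", 1))
print(twin.inventory["local-depot"])   # the model tracks where things really are
```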

The value of hierarchy derives from both the life relationship and the way that insight is propagated. We organize workforces into teams, departments, etc. because that’s the most efficient way to get the organization working toward a common goal without requiring that each person involved understand the goal overall. Divide and conquer is a good adage. We could expect our boundary-layer features to be a critical step in the hierarchy, a step between highly agile and highly tactical cloud elements and more inertial transactional elements in the data center. By feeding status between the extremes, it lets them couple more efficiently.

Migration and caching is something we’ve learned from content delivery. The optimum location for a processing element depends on where the work it’s processing comes from and where its output is delivered. If either of those is variable, then we could expect the processing element to migrate in response. One way to make that possible is to assume that we would first host a process fairly far back, and then push it out closer to the work until we find that the next push is a step too far, creating, for example, too much total latency elsewhere. Warehousing often works this way in the modern age; you stock something centrally at first, and then as its need proves greater in one place or another, you start caching it closer to those points, distributing the stock. The same applies to processes, of course.
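
Here’s a sketch of that “push it closer until the next push is a step too far” rule. The tiers, latencies, and weights are invented; a real placement engine would use measured values.

```python
# A sketch of latency-driven placement: start far back, push the process
# toward the edge while the move still pays off. All numbers are made up.

# Candidate hosting tiers, from far back to closest to the work.
TIERS = ["data-center", "metro-boundary", "edge-site"]

# Hypothetical round-trip latencies (ms) from each tier to the two things the
# process talks to: the local work (devices) and the central data it needs.
LATENCY = {
    "data-center":    {"to_work": 60, "to_data": 2},
    "metro-boundary": {"to_work": 20, "to_data": 15},
    "edge-site":      {"to_work": 5,  "to_data": 45},
}

def place(work_weight=0.7, data_weight=0.3):
    """Push outward while total weighted latency keeps improving."""
    def cost(tier):
        return work_weight * LATENCY[tier]["to_work"] + data_weight * LATENCY[tier]["to_data"]
    current = TIERS[0]
    for candidate in TIERS[1:]:
        if cost(candidate) < cost(current):
            current = candidate        # the push helped; keep going
        else:
            break                      # the next push is "a step too far"
    return current

print(place())                                   # mostly-local work: lands at the edge
print(place(work_weight=0.2, data_weight=0.8))   # data-heavy work: stays back
```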

The point here is that all of this is fundamental to edge computing but also exploitable in more traditional models of the cloud and data center. The current model evolves with the generation of a boundary layer, and that layer takes responsibility for making the decisions associated with what runs where. We are inevitably going to face boundary issues between cloud and data center, and if we address them as we should, we will generate a software platform that can more easily adapt to the introduction of edge computing and the precise modeling, digital twinning, of real-world systems. It’s worth a shot, don’t you think?

Ethereum After the Merge: The Good and the Risky

There may be no more difficult space for people to understand than the “crypto” space. Blockchain technology is well beyond most people; even technology types I deal with are uncomfortable with the details. Cryptocurrency has been hailed as the savior of modern finance and at the same time the biggest bubble of all time. Now, with one of the kingpins of crypto undergoing a major technical/business transformation, it’s no wonder that people are worried, excited, or both.

Ethereum’s “Merge” is perhaps the most significant shift in all of crypto, but it’s surely not the easiest thing to understand. Forget the murky term “merge” and dig down, and you find another layer of murk: we’re talking about a shift from “proof of work” to “proof of stake”. Most of the stories are long on promises and short on details, so let’s look at things and see what we might expect from “the Merge”.

A blockchain is a series of contract/transactional steps recorded in a chain of “blocks” and cryptographically linked. Since these chains are electronic records, they have to be able to address the challenge of authenticity. A chain has to be validated, but anyone with the means to construct a valid-looking chain could, in theory, falsify a copy, claim it’s real, and steal a boatload of money (cryptocurrency) or falsify a contract. The traditional approach to assuring authenticity is “proof of work”.
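
For readers who want to see the structure, here’s a minimal sketch of a chained-block record and why tampering with one block breaks every later link. Real blockchains add signatures, consensus, and much more; this is only the skeleton.

```python
# A minimal sketch of the chained-block structure. Real blockchains add
# signatures, consensus, and far more; this only shows why tampering with one
# block breaks every later link.

import hashlib, json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(transactions, prev_hash):
    return {"transactions": transactions, "prev_hash": prev_hash}

def chain_is_valid(chain) -> bool:
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

genesis = make_block(["Alice pays Bob 1"], prev_hash="0" * 64)
second = make_block(["Bob pays Carol 1"], prev_hash=block_hash(genesis))
chain = [genesis, second]
print(chain_is_valid(chain))                         # True

genesis["transactions"] = ["Alice pays Mallory 1"]   # forge an early record...
print(chain_is_valid(chain))                         # ...and every later link fails
```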

Proof of work enlists a host of good guys, “nodes” that all work to authenticate blockchains. More than half of the network’s validating power has to agree a chain is valid, which means that for a bad guy to falsify a chain they’d have to control more than half of that power. The problem is that every new block essentially triggers a contest over validity, and since this all involves massive cryptographic computation, it’s very time-consuming, resource-intensive, costly, and hard on the environment. It’s called “proof of work” because it takes a lot of work to get to an authenticity proof.

The alternative that Ethereum is going to adopt, proof of stake, is different. Instead of a vote backed by massive computation, every “validator” puts up a stake of 32 ethers (ETH), the cryptocurrency used by Ethereum. The current value of that stake is roughly fifty thousand dollars US. If you forge a block or validate a forged block, your stake is reduced (“slashed”) and eventually eliminated.
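
A toy illustration of the incentive, with made-up numbers (Ethereum’s actual reward and penalty rules are far more detailed): honest validation earns small rewards, while forging or attesting to a forgery gets the stake slashed.

```python
# A toy illustration of proof-of-stake incentives. The reward and penalty
# amounts are arbitrary; Ethereum's real rules are far more detailed.

class Validator:
    def __init__(self, name, stake_eth=32.0):
        self.name, self.stake = name, stake_eth

    def validate(self, block_is_honest: bool, attests_valid: bool):
        """Reward correct attestations; slash provably wrong ones."""
        if attests_valid and not block_is_honest:
            self.stake *= 0.5                 # slashed for backing a forgery
        elif not attests_valid and block_is_honest:
            self.stake -= 0.1                 # small penalty for a wrong "no"
        else:
            self.stake += 0.01                # small reward for honest work
        if self.stake <= 16:
            print(f"{self.name}: stake exhausted, ejected from the validator set")

honest, careless = Validator("honest-v"), Validator("careless-v")
honest.validate(block_is_honest=True, attests_valid=True)
careless.validate(block_is_honest=False, attests_valid=True)   # backs a forged block
print(round(honest.stake, 2), round(careless.stake, 2))        # 32.01 vs 16.0
```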

The practical benefit of this approach is that it scales. With an estimated reduction of 99% in power needs, the proof-of-stake approach could be used on the scale needed to support initiatives like Web3, which IMHO could simply not be supported using more compute-intensive proof-of-work technology. Proof of stake is also faster, which is important if some retail online applications of Web3 are to offer responses fast enough to satisfy users that their transactions are in fact working.

All this good stuff is important, but it’s not necessarily going to drive Web3 forward, or even help it shake off the increasingly skeptical publicity it’s been getting. There are even some valid concerns about the very technology advances that are cited as proof-of-stake benefits.

To be a node in a proof-of-work blockchain system, you need considerable compute power. The requirements for proof-of-stake blockchains are much less dramatic, and Ethereum even allows “pools” of users to combine to raise the needed stake and to provide the processing needed, which is much less than that required for proof of work. All this has encouraged some to say that proof of stake is the vehicle for creating the distributed, decentralized future that Web3 is usually associated with.

One potential issue here is the sheer populism that proof of stake could create. Imagine some number of pools of purported stake-sharers. How many such pools might be required to foist a false document? What if the total value of a proposed forgery is far greater than the stake? We regulate banks and companies; who regulates stakeholders? They are the validators, but who validates them? Is there a process of strong identity validation, of technical validation, associated with this?

The governance of the pooling process is IMHO a weak point in the new Ethereum model. It’s easy to see how it could be used by cybercriminals to defraud people, simply by claiming to be setting up such a pool and taking “investments”. It’s also easy to see how criminals might invest in the pools, or even become validators, to launder money, if steps aren’t taken to ensure that these sorts of things can be caught.

It’s also important to note that the stake is defined in ethers, not in dollars, which means that the buy-in price and the amount at risk will depend on the current value of an ether. Suppose ethers devalue? Could the required stake fall to the equivalent of a hundred bucks or so? Or suppose that ethers appreciate wildly, and the stake is now worth a million dollars? Do we see people cashing out of their validator role to take advantage of the appreciation?
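
A quick bit of arithmetic makes the point, using the 32-ETH stake and a few made-up ETH prices:

```python
# The 32-ETH validator stake is fixed in ethers, so its dollar value swings
# with the ETH price. These prices are examples only, not predictions.

STAKE_ETH = 32

for eth_price_usd in (3, 1_600, 30_000):
    print(f"ETH at ${eth_price_usd:>6,}: stake worth ${STAKE_ETH * eth_price_usd:>8,}")

# ETH at $     3: stake worth $      96   (the "hundred bucks or so" scenario)
# ETH at $ 1,600: stake worth $  51,200   (roughly the ~$50K cited above)
# ETH at $30,000: stake worth $ 960,000   (approaching the million-dollar scenario)
```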

None of these fears of mine need derail the progress Ethereum is making, and I’d not want it to. I’ve said in the past that I believe that Ethereum is the strongest of the crypto concepts, and this Merge will only make it stronger. It’s not Ethereum that worries me as much as what could be done with and to it. We’re not going to find that out by digging into the differences between proof of work and proof of stake. We’ll have to experience the outcome by experiencing the applications.

The metaverse really isn’t a blockchain application, and the challenges associated with making it real at the scale that Meta (at the least) hopes for have nothing to do with crypto, currency or otherwise. The real applications will be some flavor of Web3. Even the smart-contract feature of Ethereum is really a Web3 concept. Validation by consensus, in the end.

Can we trust consensus, however we arrive at it or prove it, as the ultimate test of what’s real? I mentioned the ‘60s cultural icon Carlos Castaneda in connection with all this before. Did he prove that reality could as easily be the product of shared delusion as the product of truth? I’m not trying to get philosophical here, just to get to the key point. That point is best related as a question: “Who are the masses?”

Any distributed system of validation is based on the premise that, on average, what’s true for most is what’s real. We have an alternative approach today, the “centralized” framework where a few giants we all know are the guarantors of identity and validity. It sounds inherently autocratic, but we know who these people/companies are, and we can sue them, regulate them, police them. Will we know all that, and be able to do all that, in the decentralized Web3 world? I wonder. I worry.