Are SASE and SSE Making “Ss” of Us?

Does SASE equal SSE plus SD-WAN? This question, raised in a recent email I got, shows me how far we’ve drifted from reality here. Only one of these acronyms actually has a fairly firm definition, and even it has significant variations in implementation. The other two are media/analyst inventions, so what do we gain by trying to relate them to each other? Nothing much, but we might see a justification for looking into the whole question of the network edge.

You might see this as “S-confusion” because all the acronyms start with “S”. SD-WAN stands for “software-defined wide-area network”, SASE stands for “Secure Access Service Edge” and SSE for “Security Service Edge”. It’s SD-WAN that’s a clear product or technology name; SASE and SSE are more models than they are products, except maybe the product of analyst-think. The relationship between SASE and SSE is sort-of-explicit; the latter is presented as a subset of the former.

The original concept, SASE, was all about integrating network and security services in such a way as to be cloud-deliverable or at least cloud-friendly. However, that seems to have gone by the wayside in the specification/definition creep we see these days. In its place is a vision of the integration of security and network features in a single element, which might be an appliance or a software instance, on premises or in the cloud or network.

SSE is generally the security piece of SASE. If you subscribe to the original analyst definition, that means that it’s security-features as a cloud service set. If you subscribe to the evolving market de facto definition, it means a single place/device/software-instance where all security features are presented.

The justification for either of the two is related to, and as slippery as, the cloud connection. Analyst-speak says that the majority of new workloads will be in the cloud this year and forever more, so it makes sense to stick everything there. The truth is a lot more complicated, but to cut to the chase, the important point is that the cloud has become the de facto front-end for most applications. The rest of the application is still likely running in the data center, and will do so for quite some time, maybe forever. Remember IBM’s surprise success and the “hybrid cloud” connection? But the front-end mission of the cloud is enough for our discussion.

If applications connect to the user via the cloud, then it may make sense to have most of the application security features be part of the cloud. The same could be said for network features. However, the cloud association is IMHO not the justification for either concept; that honor goes to the nature of the users of the applications.

Imagine Case One, which is a bunch of employees in some office location. Eventually, I agree that these employees will end up using a cloud front-end component to access their applications, even though they likely went directly to the applications in the data center via their VPN before. Most tweaks in the user interface seem to be shifting to the cloud as part of a general tendency to build a flexible gateway in the cloud for every type of user. These workers, behaving like traditional branch employees, could likely be better supported in terms of either security or network connectivity using a piece of CPE connected to whatever their wide-area service was—a VPN in the traditional case.

Now imagine Case Three, which is a variable set of customers, partners, suppliers, and so forth. To presume that this group would be able/willing to acquire security/networking CPE is outlandish. They will surely be connected to a cloud front-end, and for this group a cloud instance of security/network services would be eminently sensible.

Missed Case Two, you think? No, I saved it because it falls between the other two. For Case Two, imagine a mobile worker, a WFH worker, a “hybrid work” worker, or whatever you think represents a group of people who access sometimes from a fixed location and sometimes from the road or from various other locations. COVID validated this group, and the timing of the emergence of SASE and SSE coincides roughly with COVID (SASE was introduced by an analyst firm in 2019 and SSE came along after COVID). If Case Two has the effect of getting a lot of those Case One people out of the office, then CPE-based strategies become the exception and a cloud-centric approach makes sense overall. Since that hasn’t yet happened, and since there are ways of connecting WFH and even mobile workers back to their home branch locations, the more general market definitions of SASE and SSE seem fine.

What about SD-WAN as an element of SASE and companion to SSE? SD-WAN can add significantly to security, depending on how it’s implemented. Any virtual network technology is likely at least somewhat more secure than an open network, and if the SD-WAN is session-aware and can control connectivity (128 Technology, bought by Juniper, offers this) then arguably SD-WAN is a major security improvement. Is it enough in itself?

Probably not. More and more threats these days come from within a company, meaning that malware has compromised a computer and made it into a bad actor. Presumably it’s got some local connection permissions, and so there’s a risk it might use them to break into stuff or simply to break it. However, unless what that bad-actor system tries to do is totally within the normal session connectivity policies of the SD-WAN, the attempts at evil will leave a footprint and perhaps generate a signal.
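To make that concrete, here’s a minimal Python sketch of the idea: a session-aware edge that admits only sessions matching its connectivity policy and records a footprint for everything else. It’s purely illustrative; the class and field names are my own inventions, not any vendor’s implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class SessionRequest:
    src_host: str      # identity of the originating system
    dst_service: str   # named service/application being requested
    dst_port: int

class SessionAwareEdge:
    """Hypothetical session-aware SD-WAN edge: connectivity is default-deny,
    and every out-of-policy attempt leaves an auditable footprint."""

    def __init__(self, policy):
        # policy: {src_host: set of (dst_service, dst_port) tuples allowed}
        self.policy = policy
        self.audit_log = []

    def admit(self, req: SessionRequest) -> bool:
        allowed = (req.dst_service, req.dst_port) in self.policy.get(req.src_host, set())
        if not allowed:
            # The "footprint": a compromised internal host probing beyond its
            # normal session rights is recorded and can raise a signal.
            self.audit_log.append((datetime.now(timezone.utc).isoformat(), req))
        return allowed

edge = SessionAwareEdge({"hr-workstation-12": {("payroll-app", 443)}})
print(edge.admit(SessionRequest("hr-workstation-12", "payroll-app", 443)))   # True
print(edge.admit(SessionRequest("hr-workstation-12", "finance-db", 1433)))   # False, and logged
```

The point of the sketch is simply that the footprint exists; what you do with it is the security question.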

Does “SSE” add what’s needed to get to SASE utopia? Obviously that depends. The presumption that SSE will address our cases depends not only on its implementation, but on what can be done from the connection edge under any circumstances. As I noted in my blog yesterday, security is a potential application for AI, because AI could look at network traffic holistically and detect patterns that might indicate a breach. That can’t be done by looking at one user interface from the outside.

SD-WAN implementations that offer connection control are a major positive step in network and application security. AI augmentation of that is almost certainly going to do more for security than gluing on some “SSE” element. In the end, all that may do is make “Ss” of us all.

Is Security a Good Application for AI?

What, exactly, is the relationship between AI and security? We hear a lot about AI these days, and though my own data doesn’t come close to the level of AI adoption some surveys claim to find, it does show that the number of enterprises with actual AI applications has tripled since 2021, which represents pretty decent growth. Security, of course, has been a priority for much longer, but has also been a recent target for AI enhancement. That seems to be a work in progress, but the potential there is real, and so is enterprise interest.

Enterprises tell me that there are two ways in which AI could enhance security. One is the obvious application, where AI is used to detect intrusions directly. The other is less obvious but, at least for the present, a bit more credible to enterprises: the use of AI to reduce configuration/parameterization errors that open security holes. The interest in the latter comes in part because that error-management mission has other benefits.

When Cloudflare had its human-error BGP problem that took down part of the Internet, and Rogers in Canada admitted to a “maintenance” error that took most customers/services offline, enterprises told me that they had their own network configuration problems that impacted their own applications. They also told me that deployment/redeployment automation, through DevOps tools (Ansible, Chef, Puppet) or container orchestration (Kubernetes), also created problems. A surprising number commented that while these errors sometimes created failure modes that impacted availability, their real concern was that they might well also impact security, and that those impacts might not be easily recognized.

One CIO told me that after Cloudflare, the CEO became concerned that the same kinds of errors might have created accidental holes in security, and asked for a review. To the surprise of IT, they found that there were a half-dozen examples of what they called “loopholes” in their security plan, created by misconfiguration of one or more applications, virtualization platforms, or networks. There were indications that one or perhaps two had been exploited already, but none had been caught.

This enterprise has undertaken what some jokingly called the “Big Brother” project, because we all know the phrase “Big Brother is watching you!” They want AI to play that role, looking at the hosting and network platform setup for their applications and verifying continually that changes and tweaks and recoveries don’t create those pesky loopholes.

While this enterprise hasn’t yet made the leap from a security-focused AI plan to a governance/compliance AI plan, it seems pretty likely that they will. Most organizations tell me they don’t believe their compliance plans cover all of their application and network changes, or ensure that those changes can’t break compliance without leaving a clear fingerprint to trigger management attention.

Enterprises have at least a vague view of how they’d like this to work, as the “Big Brother” term suggests. They believe that AI could play a role in continuously matching what we might call “platform and application configuration” policies for security (and eventually, I think, compliance) against actual conditions. They acknowledge that this will almost surely mean formally authoring those policies so that automated analysis can be conducted, but they believe that the information on current conditions is available already. Most think that machine learning could ease the process of establishing what conditions actually break policies, but some believe that at least a bit of expert input would speed things up.
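As a thought experiment on what “formally authoring those policies” might mean, here’s a minimal Python sketch that checks declared security policies against observed configuration state and reports the “loopholes”. The policy names, configuration fields, and checks are all hypothetical; in practice the observed state would come from inventory and monitoring feeds.

```python
# Hypothetical "Big Brother" check: declared policy vs. observed configuration.
policies = {
    "db-subnet-no-internet": lambda cfg: not cfg["db_subnet"]["internet_exposed"],
    "tls-required":          lambda cfg: cfg["ingress"]["tls"] is True,
    "admin-mfa":             lambda cfg: cfg["admin_access"]["mfa"] is True,
}

def audit(observed_config: dict) -> list[str]:
    """Return the policies the current configuration violates (the 'loopholes')."""
    return [name for name, check in policies.items() if not check(observed_config)]

# Observed state is hard-coded here purely for illustration.
observed = {
    "db_subnet": {"internet_exposed": True},   # a misconfiguration slipped in
    "ingress": {"tls": True},
    "admin_access": {"mfa": False},
}
print(audit(observed))   # ['db-subnet-no-internet', 'admin-mfa']
```

The hard part, as the enterprises say, isn’t this loop; it’s writing the policies and collecting trustworthy observed state.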

Would a single architecture help this along? The limited input I’ve gotten on that question suggests that it would, but nobody believes that such an architecture exists or says that their vendors are currently pitching one as a future capability. Very few believe such an architecture is on the radar overall, in fact.

How about the “detect the breach” mission for AI in security enhancement? This seems like a no-brainer to enterprises; they believe that AI could be used to detect “normal” application and network behavior and then identify “abnormal” states. Juniper is apparently seen as a vendor making progress in this area, but enterprises think those initiatives are aimed more at fault management than at security management.

Enterprises who have had security breaches tell me that in almost every case, postmortem analysis shows that there was a detectable change in traffic patterns, application behavior, or other visible metrics, but that operations staff either didn’t notice the difference or didn’t interpret it correctly. Most said that while they could identify the link between a breach and a change in behavior after the fact, they were unsure that the correlation was strong enough to trigger human intervention, which is where they hope AI could play a role.

Most enterprises seem to think that this is a task AI could undertake, and successfully, but they’re more divided on just how to go about this and what to do with any AI-generated alerts. Among my contacts, the presumption that a vendor would offer a canned product that was somehow pre-loaded with rules to identify breach-related abnormalities has declined somewhat over time, in favor of a machine-learning approach. However, there is concern about both of these models of AI application.

The problem enterprises have with pre-loaded analysis frameworks, of course, is that they may not reflect the actual conditions created by a company’s own configuration and security problems. Enterprises have their own network traffic patterns, driven by their own shifts in application usage. Would it be possible to establish rules that were helpful in identifying abnormal shifts when “normal” patterns are so diverse?

Machine learning is a solution most enterprises say could be made to work, but it’s the “made” part that concerns them. For ML to identify abnormal patterns, you first need traffic monitoring detailed enough to identify application/user traffic. Most enterprises don’t have that, and many are concerned that this sort of monitoring might actually create security/compliance risks in itself. Second, you need someone to review a new pattern to determine whether it’s “abnormal”, and enterprises fear that this could demand so much time from network/application operations teams that it could disrupt operations.
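To show how modest the core of the machine-learning idea really is, here’s a toy Python sketch that learns a per-application traffic baseline and flags samples that fall far outside it. A real system would need far richer features and, as noted, the monitoring data most enterprises don’t yet collect; all numbers and thresholds here are illustrative.

```python
import statistics

class TrafficBaseline:
    """Toy per-application baseline: learn the mean/stdev of a traffic metric,
    then flag samples that sit far outside the learned 'normal' range."""

    def __init__(self, history: list[float], threshold_sigmas: float = 3.0):
        self.mean = statistics.fmean(history)
        self.stdev = statistics.pstdev(history) or 1.0   # guard against zero variance
        self.threshold = threshold_sigmas

    def is_abnormal(self, sample: float) -> bool:
        return abs(sample - self.mean) / self.stdev > self.threshold

# Hourly outbound MB for one application over a "normal" period (made-up numbers).
history = [120, 135, 110, 128, 140, 122, 131, 118, 125, 129]
baseline = TrafficBaseline(history)

print(baseline.is_abnormal(133))    # False: ordinary variation
print(baseline.is_abnormal(900))    # True: a pattern shift worth an alert
```

Even this toy illustrates the two complaints: the baseline is only as good as the monitoring behind it, and somebody still has to decide what a “True” means.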

The final issue, where both objections seem to converge, is what to do with the alerts. Do you activate some remedial process (one enterprise called this a “lockdown”) in response to an alert, or notify somebody? Enterprises think that “eventually” they would be confident enough in an AI big-brother oversight system to let it take action, but they admit it would take six months or a year for that to happen, and that in the meantime they’d have to rely on human reaction to an alert. That could increase the operations burden again, and also create a “risk window” that could reduce the overall value of the AI solution.
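That decision can be expressed as a simple gate: below some confidence level, or before the trust-building period is over, notify a human; above it, trigger the remedial “lockdown”. A minimal sketch, with thresholds and actions that are placeholders rather than recommendations:

```python
def handle_alert(confidence: float, automation_trusted: bool) -> str:
    """Decide between human notification and automated remediation.
    Thresholds and actions are hypothetical placeholders."""
    if automation_trusted and confidence >= 0.95:
        return "lockdown: isolate affected segment, revoke suspect sessions"
    if confidence >= 0.60:
        return "notify: page on-call operations with the evidence attached"
    return "log: record for trend analysis, no immediate action"

print(handle_alert(0.97, automation_trusted=False))  # still 'notify' during trust-building
print(handle_alert(0.97, automation_trusted=True))   # 'lockdown' once the AI has earned trust
```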

I think that security could emerge as a driver for AI oversight of network operations practices and of network connection and traffic behaviors. I think the barriers to that emergence aren’t very different from the barriers to AI adoption in netops overall, so security may be adding to the benefit case for AI adoption without really adding to operations burdens or risks, compared with AI driven by operations oversight alone. If that’s true, then a discussion of AI’s value in security could benefit vendors and enterprises alike.

Juniper Adds Cloud Metro Detail, but Still Needs a Bit More

Everyone who’s read my blogs knows that I believe that “metro” is the value bastion of the future for network operators. Juniper did a “Cloud Metro” announcement over a year ago, the first and most significant such announcement by a network vendor, and I’ve been waiting to see how they developed the theme. That question may have been answered with new Cloud Metro material and a blog to flesh out the story further, but more refinement seems essential, down the line.

To understand my view on Juniper’s Cloud Metro, you need to understand both the announcement and why I think metro is important. Networks are, for a variety of business and technology reasons, not homogeneous structures but rather layered structures. There’s an access network and a core network in classic thinking, and the place where the two come together is usually called “metro”. In population demographics, a metro area is a population center that almost always consists of multiple communities that operate as a somewhat cohesive commercial unit. A lot of modern thinking would argue that the boundary of a metro is the distance across which commuters and shoppers are willing to travel. A metro is thus an economic unit.

Technically, meaning in networking terms, metro locations were first the transit switch locations in telephony, then the boundary of the central office locations connected to a common fiber ring. The technical boundary grew out of the economic boundary; people tended to talk/interact with others they could also reach physically, and even companies tended to define their organizational hierarchy based on “regions” that were often metro areas.

As networking has evolved under the pressure of broadband Internet and mobile networks, the metro area has taken on added significance. Interests of all sorts center there, and those interests create service opportunities. Realizing service opportunities these days means The Three “C’s”, combining computing, content, and connectivity, and metro areas offer an ideal place to do the integration because there’s enough traffic/value density there to justify the resource investment. This has been true for at least a decade, but for most of that period both operators and vendors have been slow to recognize the need to evolve metro technology to reflect the evolution in metro missions.

The biggest change in metro architecture and technology is a shift in technology driven by a clear shift in the creation of service value, those Three “Cs”. Telephone switches created service value for voice, and early data networks (T/E-carrier leased line) created value through connectivity. Today it’s features and content that create value, which means that value hosting in metro networking evolves the metro toward data-center-centricity. You can expect the largest concentration of servers hosting features in the metro area because it’s deep enough to create economy of scale and shallow enough to support user awareness and personalization.

Let’s now move to Juniper’s Cloud Metro. In a presentation in the UK last week to launch the vision, CEO Rami Rahim said that it was clear dumb pipes weren’t going to cut it any more; “we need something better”. That’s the fundamental truth of metro in a nutshell. However, the Juniper slant in Rami’s pitch is still aimed less at going beyond dumb pipes than at making dumb pipes more efficient. There’s nothing wrong with that at one level; telecom services are connectivity first and smarts second, and you can’t have connectivity services turn into a loss leader. At another level, you can’t get beyond dumb pipes with connection efficiency alone. The question for Cloud Metro is what adds the essential smarts, meaning where the “cloud” part comes in.

In some data center, obviously. The centerpoint for data-center-centric metro is a highly service-capable fabric, which Juniper provides with its ACX7000 family. An agile fabric is essential in supporting the multi-tenant, multi-service metro value model, and you need IP switching rather than pure Ethernet to keep the pieces and service types separated as needed. The Juniper switches have packet-forwarding engines that can turn off unused capabilities to reduce power consumption, and they’re supported by 400G coherent optics that integrate the optical layer with IP. These combine to build the connectivity needed for value hosting at the metro edge.

To me, the key differentiator for the new metro is tight integration with a local data center, meaning “edge computing”. That requires connectivity, but also a vision of what I called “value hosting” above. The presentation introduces the old model of metro as “retro metro”, and in it we still see the Cloud Metro vision as an evolution of connectivity. Traffic will increase, capacity will explode, and the mission of connectivity will become more difficult to operationalize, secure, and make sustainable. Again, all of this is true as far as it goes.

The “new metro” model is certain to be significantly more operationally intensive than the old fiber-ring model, not only because of the increased technology breadth in metro but also because of the wide variety of service missions, each of which is likely to involve specialized management. Juniper Paragon, its wide-area AI-driven automation tool, provides the baseline FCAPS support, and because metro centers are expected to grow as new services are added, it’s augmented by AI-Enabled Device Onboarding-as-a-Service and Paragon Active Assurance, which uses embedded test agents to make the metro network into what’s essentially a big distributed sensor that can also simulate traffic for active testing.

The multi-technology, multi-service, evolving-infrastructure nature of metro poses special security challenges, and Juniper adds Zero Trust Security to Cloud Metro, including threat detection, analysis, and protection, and strong digital identity proofs via the Trusted Platform Module. Critical platform files are encrypted for transport, and security measures and state are validated by Paragon.

What this adds up to is a Cloud Metro framework that defines how metro networks will be supported as they evolve from simple fiber interconnect to feature hosting points and edge computing centers. Juniper was the first of the network vendors to recognize that the metro of the future was very different, and they’re now productizing the tools to accommodate metro evolution. Every element of their Cloud Metro story is relevant, but one element seems missing…the cloud piece.

The latest release says Juniper “announced the innovation that will power its vision and strategy for Cloud Metro – a new category of solutions for service providers, optimized for metro transformation and sustainable business growth.” The “new category” comment is carried forward in the webinar and the blog entry I link to above. New product categories are a sort-of holy grail for vendors. If you can establish one you are for a moment able to define how the category will be judged and set the bar for how vendor comparisons within it will be aligned by features/capabilities. The first vendor gets to define the columns in the comparison chart, the quadrants in those pervasive market position charts. Does Juniper really want to do that, and if they do, have they done enough?

Let me start by saying that metro should be a new product category, because as I’ve noted, metro is the feature repository for the future of network services. That’s the real value proposition here, and it has the advantage of being something new, and credible…and exciting. News, whether it’s tech news or any other kind of news, means “novelty”. Juniper achieved that in its announcement last year, but can further novelty be achieved without really smartening those dumb pipes? Can it be achieved with technology that’s not targeting specific new service/experience missions? If not, then those missions have to be defined and the fulfillment has to include pipe-smartening, and there was very little about that in the latest announcement.

Media coverage of Juniper’s announcement has focused on 5G, apparently because that was the hot concept the media could link coverage to. 5G is clearly not going to revolutionize metro by itself; at most it’s a way of getting operators to think about what adding feature hosting to connectivity could mean. It’s a potential way of getting started with edge computing, with hosted features, with smartening dumb pipes. Those things should be the story, and ironically I think they are the story that’s inherent in the technology.

This raises the question of what would revolutionize metro and launch edge computing. Servers and platform software? Cisco has more credentials in that space than Juniper, which might not be a good thing. Operators are unlikely to embrace a single-vendor model, so Juniper could be smart in not pushing a specific server/software mix for Cloud Metro. Still, they will be required. Could Juniper launch Cloud Metro without the cloud part, or with no cloud connection other than 5G function hosting? That’s not a revolution, not exciting, and probably not a new product category. So what must Juniper do? Juniper could embrace a model, an architecture, rather than risking someone else suggesting one. That would lay out a vision of how the cloud and metro networking fuse.

“Cloud Metro” has to start with cloud in more ways than sentence structure. Yes, feature-hosting centricity in metro would require changes to the connectivity elements there, but it’s the feature-hosting piece, the stuff that integrates with edge computing, IoT, the metaverse, and so forth, that’s driving all of this. Simply preparing for the network changes puts you at the mercy of those who can drive it.

Juniper actually has the best, most complete, network product solution to support the metro of the future. However, their own stuff can’t bring that future metro about; they need servers and software and architecture to make that happen. Rather than doing the first two things on their own, a move that would help position Cisco (who has more of that stuff already), they need to create something that others can buy into…and make that exciting and revolutionary.

Can a new product category be justified without this sort of excitement-building talk? No. Even if the concept were a real technical framework, analyst firms won’t sell reports and make money off that sort of thing. A new category has to be a revolution, one that as many players as possible will understand and want to be a part of, and buyers will accept and want to evaluate. Making that happen has to be Juniper’s next Cloud Metro step.

The Challenge of the Service Data Plane

Given that there’s an essential relationship between features and functions hosted…well…wherever and the connection network, it’s important to talk about the connection relationships involved, and in particular the service composition and federation relationships to the data plane. This is an issue that most cloud and network discussions have dodged, perhaps conveniently, and that has to come to an end.

When we talk about network services, the Internet dominates thinking if not always conversation. A global network with universal addressing and connectivity sure makes it easy to deal with connection features. Even when we’re not depending on universal connectivity, to the point of actively preventing it, we still tend to think of subsets of connectivity rather than truly different connection networks. IP VPNs create what’s effectively a closed subnetwork, but it’s still an IP network and still likely connected to the Internet, so its privacy has to be explicitly protected.

When I was involved in some of the early operator-supported initiatives on federated services, services made up of feature components contributed by multiple providers, the issues with this simplistic model became clear. If you wanted to connect a VPN or a VLAN between operators, in order to create a pan-provider service, you needed to understand just how the interconnection could be done while supporting the SLA promised at the service level, the privacy of the operators involved, and the need to get traffic from one virtual network of any sort to another.

Going back five decades or so, we find that early “packet network” standards raised and addressed this issue. The X.25 packet switching standard, for example, defined a user/network interface (UNI) and referenced the X.75 network-to-network interface (NNI). IP networking uses BGP to allow a network to advertise gateway points between networks to allow for overall route optimization. The emerging IETF work on IETF network slices presumes that there would be a similar gateway point where one implementation of the specification met and connected with another. The recent IETF work is interesting in that it addresses not only “connection” between networks, but also the exchange of information relating to how to recognize “slice traffic” and what SLA to apply.

Contrasting the early and current work on federation is useful because there are a couple of threads that run through it all. One is that federation is based on a contract, an SLA, between the administrations involved. Another is that federation/interconnection takes place at specific points, points that are aware of the contract and can interconnect the traffic as needed, applying the SLA as they do.
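Those two threads could be captured in a very simple data model: a federation contract that carries the SLA between administrations, plus the explicit gateway points that are aware of it. Here’s an illustrative Python sketch; the field names and values are mine, not drawn from any standard.

```python
from dataclasses import dataclass, field

@dataclass
class SLA:
    max_latency_ms: float
    min_bandwidth_mbps: float
    availability: float          # e.g. 0.9999

@dataclass
class GatewayPoint:
    operator: str                # administration owning this NNI endpoint
    node_id: str                 # interconnect node where traffic is handed off

@dataclass
class FederationContract:
    service_id: str
    parties: tuple[str, str]     # the two administrations bound by the contract
    sla: SLA
    gateways: list[GatewayPoint] = field(default_factory=list)

contract = FederationContract(
    service_id="vpn-eu-us-042",
    parties=("OperatorA", "OperatorB"),
    sla=SLA(max_latency_ms=40, min_bandwidth_mbps=500, availability=0.9999),
    gateways=[GatewayPoint("OperatorA", "nni-lon-1"), GatewayPoint("OperatorB", "nni-nyc-2")],
)
print(contract.sla.max_latency_ms)
```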

The obvious central pillar in all of this, behind both these threads, is the identification of the traffic that a given service interconnect represents. In connection-oriented services like X.25, the identification is specific because it’s made at the time the connection is established. In connectionless services like IP, you have to provide header information that allows the service traffic to be recognized. However, there’s some current and potential future overlap to be considered.

You could signal a service interconnect explicitly, even over an IP network. MPLS LSPs can be set up in a connection-oriented way, and that means that even within IP you could employ connection behavior to signal service interconnections. That’s helpful because explicit service connection setup facilitates the integration of network connections with deployment of service elements. In addition, recognizing an explicit link between a service set and an interconnection means that if adaptive changes to the network occur, you can reestablish the relationship between service and connection even if different NNI points are used. This is a “signaling” model.
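In code terms, the signaling model amounts to keeping an explicit binding between a service and the interconnect carrying it, so the binding can be re-established at a different NNI point if the network adapts. A hypothetical sketch, with LSP-like path identifiers used purely for illustration:

```python
class ServiceInterconnectBinding:
    """Hypothetical signaling-style binding: the service knows which NNI point
    and which (LSP-like) path currently carry it, and can be re-signaled."""

    def __init__(self, service_id: str, nni_point: str, path_id: str):
        self.service_id = service_id
        self.nni_point = nni_point
        self.path_id = path_id

    def resignal(self, new_nni_point: str, new_path_id: str) -> None:
        # On an adaptive network change, re-establish the explicit
        # service-to-interconnect relationship at the new gateway.
        print(f"{self.service_id}: moving from {self.nni_point}/{self.path_id} "
              f"to {new_nni_point}/{new_path_id}")
        self.nni_point, self.path_id = new_nni_point, new_path_id

binding = ServiceInterconnectBinding("vpn-eu-us-042", "nni-lon-1", "lsp-1001")
binding.resignal("nni-lon-2", "lsp-2044")   # e.g. after a failure reroute
```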

You can also do that through “provisioning” versus “signaling”. BGP policies control routes between AS “administrations”, and you can control internal routing via one of the other protocols (IS-IS, OSPF). If you combined route control with the provisioning of a packet filter at the essential gateway points, you could apply service SLAs there and do the proper interconnecting. With SDN, of course, you could simply explicitly route, and the central SDN controller would have to take responsibility for meeting the SLA in the routes it enforces.
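The provisioning alternative pre-installs a classifier at the gateway: traffic matching a service’s filter gets that service’s SLA treatment on the interconnect. Another illustrative sketch, with deliberately toy matching (real filters would do longest-prefix matching on addresses and much more):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FilterRule:
    src_prefix: str      # e.g. "10.1.0.0/16"
    dst_prefix: str
    dscp: int            # traffic marking used to recognize the service

# Provisioned at the gateway: filter -> SLA class to apply on the interconnect.
gateway_filters = {
    FilterRule("10.1.0.0/16", "10.9.0.0/16", dscp=46): "low-latency-class",
    FilterRule("10.2.0.0/16", "10.9.0.0/16", dscp=0):  "best-effort-class",
}

def classify(src_prefix: str, dst_prefix: str, dscp: int) -> str:
    """Look up the SLA class for traffic matching a provisioned filter."""
    return gateway_filters.get(FilterRule(src_prefix, dst_prefix, dscp),
                               "default-police-or-drop")

print(classify("10.1.0.0/16", "10.9.0.0/16", 46))   # low-latency-class
```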

The obvious challenge here is that if there are multiple ways of doing something, you have to address the situation where two different mechanisms are in play and have to be united in a single service. We now have an elevated set of NNI functions, which means that gateway nodes would have to be characterized by the details of what they could interconnect. The more different models we accept, the harder it is to manage all the possible interconnections, and the less likely it is that all of them would be supported everywhere.

This leads us to the issue of how service interconnects are specified in the first place. Today, most cloud and container deployments depend on a virtual network of some sort; in Kubernetes you can specify the virtual network to be used. However, Kubernetes really isn’t designed to “set up” connections and routes; in the great majority of cases the assignment of a user or pod to a virtual network creates the connectivity at the low level, and from there you could in theory expose selected low-level elements to a higher-level VPN or two.
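For a sense of what “assignment of a pod to a virtual network” typically looks like today, here’s a Python sketch that builds, but doesn’t apply, a pod manifest requesting attachment to a named secondary network in the style of the Multus CNI annotation convention. The network name and image are placeholders, and a given cluster’s networking plugin may use a different mechanism entirely.

```python
import json

def pod_with_network(name: str, image: str, network_attachment: str) -> dict:
    """Build a pod manifest that asks for attachment to a named secondary
    network, using the Multus-style annotation. Illustrative only; the cluster
    must already define the referenced network attachment."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {
                # Convention used by Multus to select additional networks.
                "k8s.v1.cni.cncf.io/networks": network_attachment,
            },
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }

manifest = pod_with_network("vpn-gateway", "example/gateway:latest", "metro-slice-net")
print(json.dumps(manifest, indent=2))
```

Note that this only attaches the pod to a network that already exists; it doesn’t set up routes, apply an SLA, or express anything about interconnection, which is exactly the gap discussed here.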

It seems to me that something like the IETF network slice concept will eventually demand that we be able to “orchestrate” network connections in parallel with the orchestration of the lifecycle of hosted features, functions, and applications. That would mean either providing the mechanism to control connectivity within orchestration tools like Kubernetes, or providing a higher-layer tool that would take responsibility for orchestrating the service or application, and would then call upon lower-level tools to do deployment and redeployment of hosted elements, and connection of those elements via the network.
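The higher-layer option might look something like the sketch below: one orchestrator owns the service lifecycle and delegates hosting to a deployment back-end and connectivity to a network back-end. Everything here is a hypothetical outline, not an existing tool’s API.

```python
class HostingBackend:
    """Stand-in for Kubernetes/DevOps-driven deployment of hosted elements."""
    def deploy(self, element: str) -> str:
        print(f"deploying {element}")
        return f"endpoint-for-{element}"

class NetworkBackend:
    """Stand-in for whatever controls connectivity (BGP policy, SDN, slices)."""
    def connect(self, endpoints: list[str], sla: dict) -> None:
        print(f"connecting {endpoints} with SLA {sla}")

class ServiceOrchestrator:
    """Higher-layer lifecycle owner: orchestrates hosting and connection together."""
    def __init__(self, hosting: HostingBackend, network: NetworkBackend):
        self.hosting, self.network = hosting, network

    def instantiate(self, elements: list[str], sla: dict) -> None:
        endpoints = [self.hosting.deploy(e) for e in elements]
        self.network.connect(endpoints, sla)

ServiceOrchestrator(HostingBackend(), NetworkBackend()).instantiate(
    ["slice-firewall", "slice-upf"], {"max_latency_ms": 20}
)
```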

Control of connectivity within Kubernetes seems on the surface likely to require a major extension; it might prove a bit easier with DevOps tools like Ansible but even there I think there’d be a lot of work to do. The question is whether there would be any interest in doing it, given that traditional application networking probably doesn’t require much more than that which is already supported. Absent a cloud-centric (or at least container-centric) approach, it seems we’d likely have to build a higher-layer model to unify the awareness of server resources and awareness of network flows.

This may be the biggest challenge in accommodating telecom needs in cloud or virtualization software. Telecom services are connection services at the core, so it’s difficult to leave connections out of lifecycle automation and still fulfill the telecom mission. That may prove to be one of the most interesting and challenging issues that new initiatives like Nephio will have to address.