Why do we seem to have so many problems with IT and network security? We hear about a new attack almost every day, a new risk, a new set of cautions, and (of course) new products. You’d think that given the long history of bad actors in the space, something effective would have been done by now. It hasn’t, clearly, and we can’t hope to do better if we don’t understand why we’ve done badly so far.
It’s fair to say that almost all the security problems we have stem from or are facilitated by the Internet. The Internet has become a major fixture in our lives, even the center of many lives, and providing Internet access both at home and at work is broadly seen as mandatory. Unfortunately, there are major problems with Internet security, and that opens us up to a lot of risks.
The first, and biggest, problem is that the implementation of the Internet is inherently insecure. We presume open connectivity unless somebody closes it, and of course it’s hard to close off a connection to someone you didn’t even know about. We exchange links that can represent anything, and with one click we can be compromised. We don’t authenticate sender addresses, or much of anything else, and there’s broad resistance to making things on the Internet more traceable. I understand people’s reluctance to be tracked, but ironically many who feel that way nevertheless accept all cookies and end up being tracked in detail.
The second problem, nearly as big, is that we’ve tried to improve Internet security by band-aiding the symptoms of problems rather than addressing the fundamental issues. We add firewalls to protect against unauthorized access, companies examine emails and attachments for malware before they’re delivered, and we scan our computers regularly with anti-virus software to catch whatever was missed at the delivery level. A total security model would have been more helpful, but most people and companies tend to think that extensive security improvements are simply too inconvenient or too much work.
The third problem is that we’ve turned the Internet’s protocols into the universal network protocol suite for businesses, without addressing the limitations I’ve already noted. In fact, many businesses apply no real control over information exchanges within their companies, making a corporate VPN less secure than the Internet itself if we’re considering an enemy within.
The fourth problem is that personal interest in content and information has encouraged us to be relatively indiscriminate consumers of online information, often on personal devices that do double duty as business devices. Something can be planted on our devices while we’re using them for our own purposes, even at home, and that something then becomes the enemy within that I just noted. They don’t call this sort of thing a “Trojan Horse” for nothing.
The final problem complicates everything else. Vendors make money selling security, in many cases more than they make selling network equipment. Users have gotten project approvals for early solutions, and it’s difficult for them to tell the CxOs that those early purchases are now obsolete. I see a lot of this sort of inertia limiting sellers’ willingness to promote what we know is a better approach, and a lot of buyer reluctance to admit that earlier strategies weren’t optimal.
We could now, if we were optimists, talk about the steps that would address each of these issues, and claim with considerable justification that those steps would fix the security problems. The problem is that there is virtually no chance that the “right” solution to each of the problems I’ve noted here would be adopted. The impact on users and network operators would be enormous, and there’s no authority on the planet that could compel everyone to play nice with all the solution strategies. We need a more contained approach, and I think we can define five things that could be done to improve security significantly.
The first thing is to implement zero-trust connection security within a business. What that means is that all connections between network-addressed elements would be considered barred unless explicitly authorized. It means that there’d have to be a fair amount of up-front work done to get the trusted relationships defined, but role-oriented hierarchies could simplify this considerably. Once we have a trust list, any attempt by any network element to connect off-list would be journaled, and repeated attempts could then take the element out of the network until its behavior was explained and remediated as needed.
This process should be extended with respect to “critical” resources. Any attempt by a non-trusted element to access a critical resource should result in that element being taken off-network immediately for examination, since it could be infected, or could be actively attacking.
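To make this concrete, here’s a minimal sketch in Python of what the enforcement logic might look like. Everything in it is hypothetical, the roles, element names, and quarantine threshold included, and a real implementation would live in the network layer (a firewall, SDN controller, or service mesh) rather than in application code:

```python
from collections import defaultdict

# Hypothetical role-oriented trust list: which role pairs may connect.
TRUST_LIST = {
    ("hr-client", "hr-app"),
    ("pos-terminal", "payment-gateway"),
}

CRITICAL_ROLES = {"payment-gateway"}   # "critical" resources get stricter handling
QUARANTINE_THRESHOLD = 3               # off-list attempts tolerated before removal

journal = []                           # in practice, a tamper-resistant log
violations = defaultdict(int)
quarantined = set()

def on_connection_attempt(src_id, src_role, dst_id, dst_role):
    """Allow only explicitly trusted role pairs; journal and quarantine the rest."""
    if src_id in quarantined:
        return "blocked"
    if (src_role, dst_role) in TRUST_LIST:
        return "allowed"

    journal.append((src_id, src_role, dst_id, dst_role))
    violations[src_id] += 1

    # Any untrusted attempt on a critical resource pulls the element off-network
    # immediately; otherwise, quarantine only after repeated off-list attempts.
    if dst_role in CRITICAL_ROLES or violations[src_id] >= QUARANTINE_THRESHOLD:
        quarantined.add(src_id)        # take the element out for examination
    return "denied"

print(on_connection_attempt("pc-114", "hr-client", "hr01", "hr-app"))            # allowed
print(on_connection_attempt("pc-114", "hr-client", "pay01", "payment-gateway"))  # denied
```

The point of the sketch is the shape of the policy: a default-deny trust list, a journal, and an automatic path from repeated violations to quarantine.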
Thing two is to apply central virus scanning to all internal emails. Many companies don’t scan internal emails for malware, which means that an infection in one system, even one with no rights to access a critical system, can spread to another system that has full access rights. Secondary attack vectors like this must be eliminated if zero-trust access security is to be meaningful.
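Here’s a sketch of what the central scanning step might look like, assuming ClamAV’s clamscan CLI is available; the surrounding mail-pipeline integration (a milter, or the mail platform’s own hooks) is omitted, so this shows only the scan itself:

```python
import subprocess
import tempfile

def attachment_is_clean(data: bytes) -> bool:
    """Scan a payload with ClamAV; exit code 0 means clean, 1 means infected."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(data)
        f.flush()
        result = subprocess.run(
            ["clamscan", "--no-summary", f.name],
            capture_output=True,
        )
    return result.returncode == 0

# A mail pipeline would call this for every attachment on *internal* mail too,
# quarantining any message that fails, exactly as it does for external mail.
print(attachment_is_clean(b"hello, world"))
```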
The third thing to do is to create private subnets for critical applications. All too many enterprises have a single address range for all their users and applications. That makes it hard to control access and to contain hacking risk. If critical applications are in their own private subnets, the components of these applications can access each other, but the private address space means they can’t be addressed from the outside. The APIs that represent actual points of access can then be exposed explicitly (via NAT), with stringent access controls based on zero-trust principles.
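Here’s a simple Python sketch of the addressing discipline involved; the subnet and the approved NAT exposure list are hypothetical, and a real audit would pull its data from inventory and router configurations rather than literals:

```python
import ipaddress

# Hypothetical plan: the critical app lives in its own RFC 1918 private subnet.
CRITICAL_SUBNET = ipaddress.ip_network("10.50.0.0/24")

# Only these (address, port) pairs are deliberately exposed via NAT.
APPROVED_NAT_EXPOSURES = {
    (ipaddress.ip_address("10.50.0.10"), 443),   # the app's real API entry point
}

def audit(components, nat_rules):
    """components: (name, ip) pairs; nat_rules: (ip, port) pairs actually exposed."""
    for name, ip_str in components:
        if ipaddress.ip_address(ip_str) not in CRITICAL_SUBNET:
            print(f"VIOLATION: {name} ({ip_str}) is outside the private subnet")
    for ip_str, port in nat_rules:
        if (ipaddress.ip_address(ip_str), port) not in APPROVED_NAT_EXPOSURES:
            print(f"VIOLATION: unapproved NAT exposure {ip_str}:{port}")

audit(
    components=[("api-front-end", "10.50.0.10"), ("db-server", "10.50.0.20")],
    nat_rules=[("10.50.0.10", 443)],
)
```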
Thing four is to create virtual air gaps. Most people are familiar with the concept of an “air gap”, meaning a break in network connectivity that isolates a system or systems from the outside world. In the real world, it’s often difficult or impossible to completely isolate critical systems without making it impossible to use them. However, what could be done is to eliminate network connectivity except through what I’ll call a “service bridge”. This is a gateway between a critical subnetwork and the main company network, created by a pair of proxies, one in each network, that pass only service messages and not general network traffic.
Linking a critical system’s subnetwork to the company VPN, even using the private-address technique above, is a risk if a system that’s permitted access can itself be infected. But if all interactions across the boundary to the subnetwork are “transactional”, then nothing can be done that hasn’t been predefined as a transaction.
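To picture the bridge’s gatekeeping, here’s a rough Python sketch of the filter the proxy pair might apply; the transaction vocabulary and the JSON message format are purely hypothetical:

```python
import json

# Hypothetical transaction vocabulary: the only things the bridge will carry.
ALLOWED_TRANSACTIONS = {
    "get_order_status": {"order_id"},
    "submit_order": {"order_id", "items"},
}

def bridge_filter(raw_message: bytes):
    """Admit only well-formed, predefined transactions; drop everything else.

    General network traffic never crosses: if a message isn't a known
    transaction with exactly the expected fields, it doesn't get through.
    """
    try:
        msg = json.loads(raw_message)
        required_fields = ALLOWED_TRANSACTIONS[msg["type"]]
    except (ValueError, KeyError):
        return None                     # not a recognized transaction: dropped
    if set(msg.get("payload", {})) != required_fields:
        return None                     # wrong shape: dropped
    return msg                          # safe to forward to the critical side

# A valid transaction passes; arbitrary traffic does not.
print(bridge_filter(b'{"type": "get_order_status", "payload": {"order_id": "A1"}}'))
print(bridge_filter(b"GET /admin HTTP/1.1"))
```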
Thing five is to use separate management, monitoring, and platform software tools inside the critical-system subnets, rather than a single strategy across all network elements and IT components. There should be no sharing of tools, resources, etc. between critical systems and the broader company network and IT environments. Systems tools represent an insidious back-channel path to the heart of almost every application in a company, as the SolarWinds hack proved. Not only that, operator errors and configuration problems often open security holes, and if critical systems are managed by the same tools that manage general connectivity and IT deployments, the sheer number of moves, adds, and changes made overall creates a greater risk that one of them will accidentally create an issue with a critical system.
Container use would simplify all of this. Think of critical systems as being in one or more separate Kubernetes clusters, with all associated tools confined within the cluster boundaries. Containers would also facilitate standardizing application packaging, which would reduce operations tasks and thus reduce errors. However, as already noted, critical systems should be isolated at the platform/tool level, so container systems should partition them into different clusters, each hosting an independent Kubernetes instance, all federated via a tool like Anthos. For the federation process, it’s important to protect the network pathway from Anthos to the clusters so it doesn’t become a risk source itself.
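As a small illustration of that partitioning, here’s a sketch using the official Python kubernetes client; the context names and the tooling namespace are hypothetical, and all it checks is that each cluster is reached through its own kubeconfig context and hosts its own local copy of the tools:

```python
from kubernetes import client, config

# Hypothetical contexts: one per independent cluster, federated at a higher level.
CRITICAL_CONTEXTS = ["critical-payments", "critical-hr"]
TOOLING_NAMESPACE = "cluster-tools"    # each cluster hosts its own copy

for ctx in CRITICAL_CONTEXTS:
    # Each cluster is addressed through its own context: no shared control plane.
    config.load_kube_config(context=ctx)
    namespaces = {ns.metadata.name
                  for ns in client.CoreV1Api().list_namespace().items}
    if TOOLING_NAMESPACE in namespaces:
        print(f"{ctx}: local tooling namespace present (tools confined)")
    else:
        print(f"{ctx}: WARNING, no local tooling namespace found")
```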
One more point: address any issues of physical security that are a concern. If active physical security threats are possible, rather than just remote hacking, then the network equipment needs to be protected in a secure room and cage, with access control. In addition, all systems inside the security perimeter need to have their USB ports and unused network ports disabled, to prevent someone from introducing a new device inside the subnet. Additional protection can be added by eliminating DHCP address assignment in favor of assigning IP addresses to devices explicitly, to prevent someone from adding a computer to the secure subnetwork and having it obtain a trusted address.
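Explicit addressing also makes the secure subnet auditable. Here’s a rough Python sketch that compares what’s actually on the wire against the static inventory; the inventory values are hypothetical, and it assumes a Linux host where the ip command is available:

```python
import subprocess

# Hypothetical explicit inventory: every device on the secure subnet, by design.
STATIC_INVENTORY = {
    "10.60.0.5": "aa:bb:cc:dd:ee:01",
    "10.60.0.6": "aa:bb:cc:dd:ee:02",
}

def audit_neighbors():
    """Compare the kernel's neighbor (ARP) table against the static inventory."""
    output = subprocess.run(
        ["ip", "neigh", "show"], capture_output=True, text=True
    ).stdout
    for line in output.splitlines():
        fields = line.split()
        if "lladdr" not in fields:
            continue
        ip_addr, mac = fields[0], fields[fields.index("lladdr") + 1]
        if not ip_addr.startswith("10.60.0."):
            continue                    # not the secure subnet
        if STATIC_INVENTORY.get(ip_addr) != mac:
            print(f"ALERT: unexpected device {ip_addr} ({mac}) on secure subnet")

audit_neighbors()
```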
We have to close with tools and practices. While there are publicized vulnerabilities that malware exploits regularly, many of these rely on improper security practices to spread and do significant harm. The great majority of problems with networks and IT infrastructure are self-inflicted wounds: poor password practices, improper network setup, and failure to isolate critical systems and protect “internal” application APIs all create an environment that generates its own risks and increases your vulnerability to outside forces. Moral: there’s no substitute for planning your network and IT infrastructure and systematizing your operations.