Science DMZ Security - Firewalls vs. Router ACLs
We suggest that firewalls not be used to protect Science DMZs because of the negative impact they have on performance. Instead, router ACLs and other security best practices should be used. This may seem a controversial statement, so we explain our stance in the remainder of this page.
The defense of information systems is an essential function of a modern enterprise. This is true whether the information systems are used for human resources and other business applications, scientific discovery, or any other function. One of the great workhorses of network security is the stateful firewall appliance, and firewalls work well for standard business applications - this is the primary purpose for which they are designed. However, many scientific applications require very high network performance - not just in link speed, but in throughput delivered to the application.
Firewalls are typically designed and built in ways that make them ill-suited to high-performance science environments. As described below, firewalls have no special analysis features for scientific applications - all they can do is filter network traffic for science applications by IP address and port number. This is very similar to the ways in which a router Access Control List (ACL) is used to provide security protections - namely, the filtering of traffic by IP address and port number.
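To make the comparison concrete, the following Python sketch models a stateless, first-match ACL of the kind a router applies to each packet. The addresses (taken from the documentation prefixes 192.0.2.0/24 and 198.51.100.0/24), the GridFTP-style control port 2811, and the data port range 50000-51000 are hypothetical examples. Real ACLs also match protocol, source port, and other fields that this sketch omits. The important property is that each packet is evaluated on its own, so there is no per-connection state to create, track, or expire.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class AclEntry:
    """One hypothetical ACL line: action, source/destination prefixes, destination port range."""
    action: str                                   # "permit" or "deny"
    src_prefix: str
    dst_prefix: str
    dst_ports: Optional[Tuple[int, int]] = None   # inclusive range; None matches any port

    def matches(self, src_ip: str, dst_ip: str, dst_port: int) -> bool:
        if ipaddress.ip_address(src_ip) not in ipaddress.ip_network(self.src_prefix):
            return False
        if ipaddress.ip_address(dst_ip) not in ipaddress.ip_network(self.dst_prefix):
            return False
        if self.dst_ports is None:
            return True
        low, high = self.dst_ports
        return low <= dst_port <= high

def evaluate_acl(acl: List[AclEntry], src_ip: str, dst_ip: str, dst_port: int) -> str:
    """Stateless first-match evaluation with an implicit deny at the end."""
    for entry in acl:
        if entry.matches(src_ip, dst_ip, dst_port):
            return entry.action
    return "deny"

# Hypothetical policy for a data transfer node at 192.0.2.10.
DTN_ACL = [
    AclEntry("permit", "198.51.100.0/24", "192.0.2.10/32", (2811, 2811)),    # control channel
    AclEntry("permit", "198.51.100.0/24", "192.0.2.10/32", (50000, 51000)),  # data channels
    AclEntry("deny", "0.0.0.0/0", "192.0.2.10/32"),                          # everything else
]

print(evaluate_acl(DTN_ACL, "198.51.100.7", "192.0.2.10", 2811))  # permit
print(evaluate_acl(DTN_ACL, "203.0.113.5", "192.0.2.10", 22))     # deny
```

Routers typically perform this kind of match in forwarding hardware, which is why it need not slow down high-performance flows.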
One great advantage of the Science DMZ model is that it allows network and security architects to optimize the tools and technologies employed in the defense of science-critical systems. In the Science DMZ model, ACLs are used to defend high-performance scientific applications, and institutional or departmental firewalls are used to defend business and end-user systems - just as they are today. Since ACLs are usually implemented in the router's forwarding hardware, they typically do not compromise the performance of high-performance applications.
As data-intensive science becomes the norm in many fields of science, high-performance data mobility is rapidly becoming a core scientific infrastructure requirement. By deploying a Science DMZ, a research institution can both achieve high performance and defend its systems without having to make the choice between network security and the science mission of the institution.
Security for a data-intensive science environment located on the Science DMZ can be tailored for the data transfer systems on the Science DMZ. These hosts typically run a well-defined and limited set of special-purpose applications rather than the usual array of user applications. Since the Science DMZ resources are assumed to interact with external systems and are isolated from, or have carefully managed access to, internal systems, the security policy for the Science DMZ is tailored for these functions rather than for protecting the interior of the general site LAN.
The primary function of a firewall ruleset is to permit or deny network traffic based on packet header information, typically by matching each packet against the ruleset. The primary criteria used to decide whether a packet conforms to security policy are source IP address, source port (for TCP and UDP packets), destination IP address, and destination port. This section describes, in general terms, the high-level operations performed by most firewalls.
The firewall maintains a lookup table that tracks the protocol state of the individual permitted connections (identified by the 4-tuple of source/destination address and port) traversing the firewall in real time. When a packet arrives at the firewall, the firewall looks up the address/port 4-tuple of the incoming packet in its connection state table. If the packet matches a state table entry, the state table entry is updated and the packet is permitted. If there is no state table entry, the packet is matched against the firewall ruleset. If the packet is permitted by the firewall ruleset, a new state table entry is created and the packet is permitted. If the packet is not permitted by the ruleset, the packet is dropped. The state table is central to the operation of the firewall - if the state table fills, new entries cannot be created (and therefore no new connections can be established across the firewall). Also, the state table allows for significant performance leverage, because an address/port 4-tuple lookup is a fast operation. However, the resources of the state table are finite, and so the state table must be managed.
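The per-packet logic described above can be sketched in Python as follows. The `StatefulFirewall` class, the callable ruleset, and the tiny table size are hypothetical and chosen only for illustration. The sketch also normalizes the 4-tuple so that reply packets, which have source and destination swapped, find the same entry. That is one common way to handle return traffic, but it is an assumption here.

```python
from typing import Callable, Dict, Tuple

FourTuple = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)

class StatefulFirewall:
    """Hypothetical model of state-table lookup with ruleset fallback."""

    def __init__(self, ruleset: Callable[[FourTuple], bool], max_entries: int):
        self.ruleset = ruleset            # True if policy permits a new connection
        self.max_entries = max_entries    # state tables are finite
        self.state: Dict[FourTuple, int] = {}  # normalized 4-tuple -> packets seen

    @staticmethod
    def _key(t: FourTuple) -> FourTuple:
        # Order the two endpoints so both directions map to one entry.
        src_ip, src_port, dst_ip, dst_port = t
        a, b = (src_ip, src_port), (dst_ip, dst_port)
        return (a[0], a[1], b[0], b[1]) if a <= b else (b[0], b[1], a[0], a[1])

    def process(self, t: FourTuple) -> bool:
        key = self._key(t)
        if key in self.state:             # fast path: existing connection
            self.state[key] += 1
            return True
        if not self.ruleset(t):           # slow path: consult the ruleset
            return False
        if len(self.state) >= self.max_entries:
            return False                  # table full: no new connections
        self.state[key] = 1               # permitted: create a state entry
        return True

# Hypothetical ruleset: allow inbound connections to port 2811 on 192.0.2.10.
fw = StatefulFirewall(lambda t: t[2] == "192.0.2.10" and t[3] == 2811, max_entries=2)
print(fw.process(("198.51.100.7", 40000, "192.0.2.10", 2811)))  # True: new entry
print(fw.process(("192.0.2.10", 2811, "198.51.100.7", 40000)))  # True: reply hits state
print(fw.process(("203.0.113.5", 5555, "192.0.2.10", 22)))      # False: ruleset denies
```

With `max_entries=2`, a third permitted connection is refused even though policy allows it. This mirrors the state-table exhaustion failure mode described above.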
When a connection terminates normally (e.g. if the connection is a TCP connection and the firewall sees a FIN/ACK sequence in both directions, or if the firewall sees a reset for a TCP connection), the firewall removes the associated connection state from the state table. Some protocols such as UDP do not have explicit connection state, and so it is harder to tell when to clean up the connection state. Also, TCP connections often do not terminate normally (e.g. if I'm late for a meeting and just close my laptop, open connections just stop sending traffic without cleaning up). In order to prevent the state table from filling up, state table entries are managed by a timer. If a state table entry has not been updated after a timeout interval (typically 5 to 15 minutes, and potentially different for different protocols), the firewall assumes that the connection is dead and removes the connection from the state table. However, once the state table entry is gone, packets from that connection will be denied by the firewall - the firewall will not re-establish a state table entry for packets from the middle of an established connection.
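The timer-based cleanup, and its side effect on idle connections, can be sketched as follows. The `TimedStateTable` class is hypothetical. It assumes the ruleset has already permitted the connection, and `opens_connection` stands in for a TCP SYN. Timestamps are plain seconds so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

FourTuple = Tuple[str, int, str, int]

@dataclass
class TimedStateTable:
    """Hypothetical state table whose entries expire after an idle timeout."""
    idle_timeout_s: float
    last_seen: Dict[FourTuple, float] = field(default_factory=dict)

    def expire(self, now: float) -> None:
        """Remove entries that have been idle longer than the timeout."""
        dead = [k for k, t in self.last_seen.items() if now - t > self.idle_timeout_s]
        for k in dead:
            del self.last_seen[k]

    def packet(self, key: FourTuple, now: float, opens_connection: bool) -> bool:
        self.expire(now)
        if key in self.last_seen:         # known connection: refresh its timer
            self.last_seen[key] = now
            return True
        if opens_connection:              # e.g. a permitted TCP SYN creates state
            self.last_seen[key] = now
            return True
        return False                      # mid-connection packet with no state: dropped

    def close(self, key: FourTuple) -> None:
        """Normal teardown (FIN/ACK in both directions, or a reset)."""
        self.last_seen.pop(key, None)

table = TimedStateTable(idle_timeout_s=600)       # hypothetical 10-minute timeout
ctrl = ("198.51.100.7", 40000, "192.0.2.10", 2811)
print(table.packet(ctrl, now=0, opens_connection=True))            # True: entry created
print(table.packet(ctrl, now=300, opens_connection=False))         # True: entry refreshed
print(table.packet(ctrl, now=300 + 3600, opens_connection=False))  # False: aged out
```

The last packet belongs to a perfectly healthy connection that simply went quiet for an hour. Because the entry was aged out and the packet is not a connection opener, the firewall drops it.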
Modern firewalls can manage very large state tables - millions of connections are typically supported. This traffic processing model is well-matched to the traffic profile of the modern business enterprise, which typically consists of a large number of short duration connections of relatively low data volume. The firewall can be built from many parallel packet processing engines that can be combined to create a firewall capable of processing millions of connections at 10 gigabits per second. This design pattern - the parallel processing of a large number of simultaneous connections by using a set of processing engines - is common to many firewalls.
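The following Python sketch illustrates why this design suits many small connections but not a few large ones. As a hypothetical simplification, it assumes the firewall pins each connection to one engine by hashing its 4-tuple (using `zlib.crc32` here). It also assumes that an engine's capacity is shared equally among the connections pinned to it. The figure of 8 engines at 1.2Gbps each matches the example values given later on this page.

```python
import zlib
from collections import Counter
from typing import Dict, List, Tuple

FourTuple = Tuple[str, int, str, int]

def engine_for(flow: FourTuple, n_engines: int) -> int:
    """Pin a connection to one engine so its state lives on that engine."""
    return zlib.crc32(repr(flow).encode()) % n_engines

def per_flow_ceiling_gbps(flows: List[FourTuple], n_engines: int,
                          engine_gbps: float) -> Dict[FourTuple, float]:
    """Upper bound on each connection's rate: its engine's capacity split among its flows."""
    load = Counter(engine_for(f, n_engines) for f in flows)
    return {f: engine_gbps / load[engine_for(f, n_engines)] for f in flows}

# One large science transfer: capped at a single engine's 1.2Gbps.
science = [("198.51.100.7", 50001, "192.0.2.10", 50001)]
print(per_flow_ceiling_gbps(science, n_engines=8, engine_gbps=1.2))

# Many small business connections: together they can use most of the 9.6Gbps aggregate.
business = [("10.0.0.1", 1024 + i, "203.0.113.80", 443) for i in range(1000)]
print(sum(per_flow_ceiling_gbps(business, n_engines=8, engine_gbps=1.2).values()))
```

The device's aggregate capacity is only reachable when traffic spreads across many connections. A single connection cannot exceed what one engine provides.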
In addition to address/port matching and connection state management, many more advanced firewalls are able to use deep packet inspection to track application-layer behavior. They can detect email traffic and scan messages for viruses on the fly, analyze web traffic for hostile behavior, and so on. These application-layer functions exist because a large number of enterprise customers run these applications: there is a broad market for this functionality, so it is worth the R&D investment to build more advanced analysis for common protocols into a firewall. In contrast, firewall vendors typically do not include application-layer analysis for scientific applications, because the market is too small to justify building those analysis tools into the firewall appliance.
Interaction of firewalls with data-intensive science
A common task in data-intensive science is the movement of a large amount of data (several terabytes or more) from one location to another. The reason is typically to get the data to a storage or analysis resource of some kind. The transfer of large data sets typically involves a small number of TCP connections that use a significant fraction of the available path bandwidth. Also, such transfers can take a long time (sometimes hours), and the data transfer applications need to communicate at the beginning and end of the transfer, but not always in the middle while the bulk data movement is occurring. This traffic profile is a poor match for common firewall designs in several different respects.
- Firewalls are designed to manage a large number of connections - data transfer applications typically use only a few.
- Firewalls are often composed of many processing engines, each with a peak performance significantly less than the overall device bandwidth (e.g. on a 10Gbps firewall, a set of 8 packet processors capable of 1.2Gbps each is typical).
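A back-of-the-envelope calculation shows what these numbers mean for a multi-terabyte transfer. The sketch below assumes decimal terabytes, a single connection running at a steady rate, and no protocol overhead. The 10TB data set is a hypothetical example.

```python
def transfer_hours(terabytes: float, rate_gbps: float) -> float:
    """Idealized duration of a transfer at a constant rate (decimal units, no overhead)."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (rate_gbps * 1e9)
    return seconds / 3600

print(f"{transfer_hours(10, 10):.1f} h at the full 10Gbps")              # 2.2 h
print(f"{transfer_hours(10, 1.2):.1f} h if held to one 1.2Gbps engine")  # 18.5 h
```

Even in this idealized case, a transfer held to a single processing engine takes more than eight times longer than the interface speed would allow.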
When the internal data path of a network device is slower than the device's interface speed (as is the case for the 10Gbps firewall described above), high-performance applications can induce packet loss at data rates significantly below the nominal bandwidth of the network. Because of the bursty nature of TCP, it is often easy to cause loss inside a firewall built in this way. Consider a data transfer host with 10GE interfaces: the host sends packet bursts at 10Gbps, but if its connection is handled by one of the firewall's 1.2Gbps processing engines, the firewall can only process those packets at 1.2Gbps. The firewall must buffer the burst while the packets are processed at the lower rate, and packets will be dropped unless the buffer can hold the burst until the firewall catches up. Unless the firewall has reliable loss counters (and unless the security group that owns the firewall can be persuaded to publish those counters), all the scientists can see is that "the network" performs poorly because of packet loss caused by the firewall.
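A simple fluid approximation shows how much buffering such a burst needs. A burst of B bytes arrives at rate R_in and is drained at a slower rate R_out. For the burst's duration of 8B/R_in seconds, the backlog grows at R_in - R_out. The firewall must therefore hold B(1 - R_out/R_in) bytes or drop the excess. The sketch ignores packet granularity and any traffic already queued. The 4MB burst and 1MB buffer are hypothetical values.

```python
def burst_drop_bytes(burst_bytes: float, in_gbps: float, out_gbps: float,
                     buffer_bytes: float) -> float:
    """Bytes dropped when a line-rate burst outpaces a slower internal processing path."""
    if out_gbps >= in_gbps:
        return 0.0                                     # path keeps up; no backlog builds
    backlog = burst_bytes * (1 - out_gbps / in_gbps)  # peak queue at the end of the burst
    return max(0.0, backlog - buffer_bytes)

# Hypothetical 4MB burst from a 10GE host into a 1.2Gbps engine with 1MB of buffer.
print(burst_drop_bytes(4e6, 10, 1.2, 1e6))  # ~2.52e6 bytes dropped
```

In this model, 88% of every 10Gbps burst must be queued. Any burst larger than the buffer divided by 0.88 therefore causes loss, no matter how lightly loaded the rest of the network is.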
The management of the firewall's state table, particularly the removal of idle connections after a short interval of inactivity, is a critical point of tension between long-running scientific applications and stateful firewalls. If the firewall uses short idle timeouts to prevent exhaustion of its state table, it breaks long-running applications whose network connections sit idle while the application does other work, often on terabyte-scale data sets. If the firewall instead uses a connection state timeout that matches the standard keepalive timer in host TCP/IP stacks (2 hours), it becomes vulnerable to failure due to state table exhaustion. This is not a theoretical concern: we have seen data corruption and transfer failures because of this problem (in one case, the control connections of a large data transfer were aged out of the firewall state table before the data connections completed, causing the transfer to fail).
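The tension can be quantified with Little's law (average occupancy = arrival rate x average residency time). In the hypothetical sketch below, cleanly closed connections hold a state entry only for their lifetime. Connections that vanish without a FIN or reset linger for the full idle timeout. All rates and fractions are invented for illustration and are not measurements. The transfer check simply asks whether an idle control connection outlives the timeout.

```python
def state_table_entries(new_conns_per_s: float, mean_lifetime_s: float,
                        frac_unclean: float, idle_timeout_s: float) -> float:
    """Little's law: occupancy = arrival rate x mean time each entry stays in the table."""
    mean_residency_s = mean_lifetime_s + frac_unclean * idle_timeout_s
    return new_conns_per_s * mean_residency_s

def control_channel_survives(transfer_s: float, idle_timeout_s: float) -> bool:
    """An idle control connection survives only if the bulk transfer ends before the timeout."""
    return transfer_s <= idle_timeout_s

# Hypothetical site: 2,000 new connections/s, 30 s typical lifetime, 10% never closed cleanly.
for timeout_s in (15 * 60, 2 * 3600):
    entries = state_table_entries(2000, 30, 0.10, timeout_s)
    ok = control_channel_survives(transfer_s=90 * 60, idle_timeout_s=timeout_s)
    print(f"timeout {timeout_s:>5} s: ~{entries:,.0f} entries, 90 min transfer survives: {ok}")
```

With these made-up inputs, the 15-minute timeout keeps roughly 240,000 entries but drops the idle control channel. The 2-hour timeout keeps the transfer alive at the cost of roughly 1.5 million entries.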
Note that hosts behind the site firewall that access their own local Science DMZ can often achieve reasonable performance. Because the latency between local users and the local Science DMZ is very low, some of the issues caused by the site perimeter firewall are much less of a problem in practice. TCP recovers from loss quickly at low latencies, and short-distance TCP dynamics differ enough from long-distance dynamics that loss which would occur if wide area transfers traversed the firewall may not occur at all when local users access Science DMZ resources. The key is to provide long-distance TCP connections with uncongested, loss-free service.
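The latency effect can be illustrated with the Mathis et al. approximation for steady-state TCP Reno-style throughput: rate is roughly (MSS/RTT) x C/sqrt(p), with C = sqrt(3/2). This standard model is swapped in here for illustration and is not a measurement from this page. It ignores link capacity and modern congestion control algorithms. The 1460-byte MSS and the loss rate of one packet in 100,000 are hypothetical.

```python
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on steady-state TCP throughput (Mathis et al. model)."""
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)

loss = 1e-5  # hypothetical: one packet lost per 100,000
for label, rtt in (("local, 1 ms", 0.001), ("wide area, 50 ms", 0.050)):
    print(f"{label}: {mathis_throughput_bps(1460, rtt, loss) / 1e9:.2f} Gbps")
```

Because the bound scales as 1/RTT, the same loss rate gives roughly 4.5Gbps at 1 ms but only about 90Mbps at 50 ms. This is why firewall-induced loss that local users barely notice can cripple wide area transfers.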
More information is available in several presentations:
- Some sample results showing performance issues caused by a firewall are in these slides from Internet2.
- Eli Dart's Science DMZ Security talk at TIP2013