Motivation

January 15, 2025

Why Science DMZ?

A laboratory or university campus network typically supports multiple organizational missions. First, it must provide infrastructure for network traffic associated with the organization’s normal business operations, including email, procurement systems, and web browsing. The network must also be built with security features that protect the organization’s financial and personnel data. At the same time, these networks serve as the foundation for the scientific research process, because scientists depend on this infrastructure to share, store, and analyze research data from many different external sources.

In most cases, however, networks optimized for business operations are neither designed for nor capable of supporting the data movement requirements of data-intensive science. When scientists attempt to run data-intensive applications over these so-called “general-purpose” networks, the result is often poor performance, in many cases poor enough that the science mission is significantly impacted.

Since many aspects of general-purpose networks are difficult or impossible to change in the ways necessary to improve their performance for science applications, the network must be adapted to allow it to support science applications without affecting the operation of the general-purpose network.

The Science DMZ Model accomplishes this by explicitly creating a portion of the network that is specifically engineered for science applications and does not include support for general-purpose use. By separating the high-performance science network (the Science DMZ) from the general-purpose network, each can be optimized without interfering with the other.

While the core mission of a Science DMZ is the support of high-performance science applications, this cannot occur in isolation. Scientific collaboration, like any other network-enabled endeavor, is inherently end-to-end. The Science DMZ can easily incorporate wide area science support services, including virtual circuits and 100G+ Ethernet. While a general-purpose network might struggle to make effective use of such technologies and services, a Science DMZ allows local science resources to be connected to the network services that the science requires, without interfering with the general-purpose networking infrastructure.

Development History of Science DMZ

The Science DMZ architecture has its roots in several aspects of networking: design, operations, and security. The term “Science DMZ” comes from the “DMZ networks” that are a common element in network security architectures. The traditional DMZ is a special-purpose part of the network, at or near the network perimeter, designed to host the site services facing the outside world (e.g., external web, incoming email, and authoritative DNS servers). The security policies, network device configurations, and so forth are tailored for the DMZ and are kept separate from the security policies and configurations of the internal local area network (LAN) infrastructure.

The Science DMZ adapts this notion to the task of supporting high-performance science applications, including bulk data movement and data-intensive experimental paradigms. The Science DMZ is a dedicated portion of a site or campus network, located as close to the network perimeter as possible, which is designed and configured to provide optimal support for high-performance science applications. The Science DMZ also includes the capabilities to characterize and troubleshoot the network so that performance problems can be resolved quickly. This is typically achieved by deploying perfSONAR hosts in the Science DMZ that can run tests against perfSONAR hosts in the wide area and in other Science DMZs at collaborating laboratories and universities.

Science DMZ Helps to Solve TCP Performance Issues

TCP has been characterized as the "fragile workhorse" of the network protocol world. While most science applications that need reliable data delivery use TCP-based tools for data movement, TCP’s interpretation of packet loss can cause performance issues. TCP interprets packet loss as network congestion, so when loss is encountered TCP dramatically reduces its sending rate. The rate slowly ramps up again, but if further loss is encountered the rate is reduced again. The effect becomes more dramatic as the distance, and therefore the round-trip time, between the communicating hosts increases, because each recovery takes longer on a long path. In practice even a tiny amount of loss (much less than 1%) is enough to reduce TCP performance by over a factor of 50. One example of this is shown here.
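The sensitivity to loss and distance described above can be made concrete with a rough calculation. The sketch below uses the widely cited Mathis et al. approximation for steady-state TCP throughput, rate ≤ (MSS / RTT) × (C / √p), where p is the packet loss rate. This model is not part of the original text and applies to classic loss-based (Reno-style) congestion control. The function name, the 1460-byte MSS, and the sample round-trip times and loss rates are illustrative assumptions, so the numbers should be read as orders of magnitude rather than predictions.

```python
import math

# Constant from the Mathis et al. model for periodic loss (assumption:
# Reno-style congestion control, no delayed ACKs).
MATHIS_C = math.sqrt(3.0 / 2.0)


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=MATHIS_C):
    """Approximate upper bound on steady-state TCP throughput, in bits/s.

    Hypothetical helper for illustration only:
        rate <= (MSS / RTT) * (C / sqrt(p))
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("mss_bytes and rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in the interval (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    mss = 1460  # bytes; assumed typical MSS on a 1500-byte MTU path
    # Illustrative paths: campus (1 ms), regional (10 ms), continental (80 ms).
    for rtt_ms in (1, 10, 80):
        for loss in (1e-6, 1e-4):
            rate = mathis_throughput_bps(mss, rtt_ms / 1000.0, loss)
            print(f"RTT {rtt_ms:>3} ms, loss {loss:.0e}: "
                  f"about {rate / 1e6:,.0f} Mbit/s")
```

Under this model the bound falls in proportion to the round-trip time and to the square root of the loss rate, so a loss rate that goes unnoticed on a short campus path can severely limit a long-distance transfer.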

It is typically much easier to architect the network to accommodate TCP than to fix TCP to be more loss-tolerant. This means that the network infrastructure that supports high-performance TCP-based science applications must provide loss-free IP service to TCP in the general case. The Science DMZ model allows a laboratory, campus, or scientific facility to build a special-purpose infrastructure that provides the services high-performance applications need to succeed.