Managing External Networking

A common question is how to manage routing in a Science DMZ environment.  For instance, should the network be treated as:

  • One flat network, where all traffic (commodity/enterprise and research) is handled on the same connection?
  • A segmented network, where separate VLANs (using the same substrate) carry commodity/enterprise and research traffic?
  • Completely different networks that require different physical connections to handle commodity/enterprise and research traffic?

The answer often comes down to doing things in a manner that the implementor can easily support.  Adopting an exotic networking approach adds complexity with regard to how traffic is managed (on hosts and network devices).  It may also force users to learn new ways to approach problems.  For instance, a scientist who is not able to access a commodity service (e.g. Google Cloud Storage) via a purely R&E network may not find the service very useful.

The following are some considerations for the design and implementation process.

Flat Network

A flat network (all traffic from commodity and R&E resources mixed) is one of the easier options to install initially.  All network devices (routers, switches, end hosts) need only a single connection that carries all traffic.  From a security perspective, this means that protections (e.g. ACLs) must address both forms of ingress traffic.

ACL management can get arbitrarily complex in this environment, depending on whether you are trying to block specific address ranges or are just limiting traffic based on ports.  We recommend defining a standard set of ports for each group of DTNs with a similar security posture, so that the same policy can be applied to all of them.  If you have per-DTN policies, then it will be necessary to define an ACL (or at least one chunk of a larger ACL) per DTN.  Note that this is no different from managing per-DTN firewall policies: you still have a number of specific rules to keep track of.  It is also helpful to keep master copies of the ACLs on a server and to keep them under version control, so that if you have to revert, you can do so more easily.
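The idea of one shared port policy per DTN group can be sketched as a small generator whose output is a text file suitable for version control.  This is a minimal sketch under stated assumptions: the group name, host addresses, port list, and the vendor-neutral rule syntax are all hypothetical, and real devices will need their own ACL syntax.

```python
from ipaddress import ip_address

# Hypothetical policy: one shared set of allowed TCP ports per DTN group.
# Addresses are documentation addresses (RFC 5737); ports are illustrative.
DTN_GROUPS = {
    "standard-dtn": {
        "hosts": ["192.0.2.10", "192.0.2.11"],
        "tcp_ports": [22, 443, 2811],
    },
}

def render_acl(groups):
    """Return generic (not vendor-specific) ACL lines for every host in every group."""
    lines = []
    for name in sorted(groups):
        group = groups[name]
        lines.append(f"# group: {name}")
        # Sort hosts and ports so the output is stable and diffs stay small.
        for host in sorted(group["hosts"], key=ip_address):
            for port in sorted(group["tcp_ports"]):
                lines.append(f"permit tcp any host {host} eq {port}")
    # Anything not explicitly permitted above is dropped.
    lines.append("deny ip any any")
    return lines

if __name__ == "__main__":
    print("\n".join(render_acl(DTN_GROUPS)))
```

Because the output is deterministic, committing it to a repository after each policy change gives a readable diff and a straightforward way to revert.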

Dual Homed Network

Dual homing is an attractive solution.  It allows more complete access to a specific network (e.g. the research DMZ, which has a limited number of hosts, sources, and destinations) and tighter restrictions on untrusted networks (commodity).

There are two typical ways to do this:

  • Layer 2 Separation: different VLANs are created to carry specific forms of traffic
  • Physical Separation: physical network differences that require two copper/fiber/wireless connections

In the latter case, if your host has a network card with dual ports, ensure that both ports can function simultaneously; certain varieties of Myricom and Mellanox cards allow this.  If you are using two distinct cards, it is also necessary to ensure that there are no internal bottlenecks on the motherboard (e.g. enough active PCI Express lanes, and no contention for system bus bandwidth).
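As a rough way to reason about the lane question, the raw bandwidth of a PCI Express slot can be compared with the combined line rate of the ports it must feed.  The sketch below uses the published per-lane transfer rates and line encodings for PCIe generations 1 through 4; the function names are hypothetical, and the result is an upper bound because packet and protocol overhead are ignored.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency by PCIe generation.
PCIE_GEN = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}

def pcie_gbps(gen, lanes):
    """Raw one-direction slot bandwidth in Gb/s, ignoring protocol overhead."""
    rate, efficiency = PCIE_GEN[gen]
    return rate * efficiency * lanes

def slot_is_sufficient(gen, lanes, nic_ports, port_gbps):
    """True if the slot's raw bandwidth covers all ports running at line rate."""
    return pcie_gbps(gen, lanes) >= nic_ports * port_gbps

if __name__ == "__main__":
    # Example: a dual-port 40G card in a Gen3 x8 slot versus a Gen3 x16 slot.
    for lanes in (8, 16):
        ok = slot_is_sufficient(3, lanes, nic_ports=2, port_gbps=40)
        print(f"Gen3 x{lanes}: {pcie_gbps(3, lanes):.1f} Gb/s, sufficient={ok}")
```

A slot that passes this check can still underperform in practice, so it is best treated as a quick way to rule out configurations rather than to certify them.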

Complete segmentation does come with some risks for essential services, though: many things that are typically 'commodity' will now need to be passed through.  Unless care is taken to allow it, hosts may be unable to access DNS, perform yum updates, synchronize with NTP, etc.  It will be necessary either to allow this traffic in, or to make sure that these services are available on other networks.

In general, we do not recommend completely walling off the hosts from essential services like DNS, NTP, and update servers if you can avoid it.  In many respects this can make the operational posture less secure, because the hosts then have to be treated as a delicate special case.  Machines need to be updated, even if they are walled off from the world.  Similarly, the measurement tools need access to NTP, and everything needs access to DNS (measurement, data transfer, etc.).  Allowing in essential services makes long-term configuration and operational strategy much easier to handle.
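One way to catch an accidentally walled-off host is to audit a proposed allow-list against the essential services before deploying it.  The following is a minimal sketch: the port numbers for DNS and NTP are the well-known ones, while treating "updates" as HTTP plus HTTPS is an assumption about how the yum mirrors are reached, and the function name is hypothetical.

```python
# Essential services named in the text, with the ports they normally use.
ESSENTIAL = {
    "dns": {("udp", 53), ("tcp", 53)},
    "ntp": {("udp", 123)},
    "updates": {("tcp", 80), ("tcp", 443)},  # assumption: mirrors over HTTP/HTTPS
}

def missing_services(allowed):
    """Map each essential service to the (protocol, port) pairs the policy lacks."""
    allowed = set(allowed)
    return {
        name: sorted(needed - allowed)
        for name, needed in ESSENTIAL.items()
        if needed - allowed
    }

if __name__ == "__main__":
    # Hypothetical allow-list that forgot NTP and plain HTTP.
    policy = [("udp", 53), ("tcp", 53), ("tcp", 443)]
    for service, gaps in missing_services(policy).items():
        print(f"{service}: not permitted for {gaps}")
```

An empty result means every listed service is covered by the policy; it says nothing about whether the servers themselves are reachable.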

This does not need to be done via dual homing if there are concerns about breaching the campus network: it can perhaps be done by still forcing DMZ-to-campus connections to traverse the institutional firewall.  Since the RTT is small, traversing the data plane of the security infrastructure will not significantly impact low-latency data transfers (e.g. when a user wants to pull data from the DTN to a local machine) or small mouse-flow activities like NTP, DNS, mail, and yum.
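One way to see why a small RTT softens the effect of a firewall is the well-known Mathis et al. model, which bounds TCP throughput by (MSS / RTT) * (C / sqrt(p)) for loss rate p.  The text above does not name this model, so applying it here is an assumption, as is the premise that packet loss is the main penalty the firewall introduces; the MSS, RTT, and loss values are purely illustrative.

```python
from math import sqrt

def mathis_mbps(mss_bytes, rtt_ms, loss_rate):
    """Upper bound on TCP throughput in Mb/s under the Mathis et al. model."""
    c = sqrt(3 / 2)              # constant from the model, roughly 1.22
    rtt_s = rtt_ms / 1000        # convert milliseconds to seconds
    bits_per_second = (mss_bytes * 8 / rtt_s) * (c / sqrt(loss_rate))
    return bits_per_second / 1e6

if __name__ == "__main__":
    # Same illustrative loss rate, campus-scale RTT versus wide-area RTT.
    for rtt_ms in (1, 50):
        bound = mathis_mbps(mss_bytes=1460, rtt_ms=rtt_ms, loss_rate=1e-4)
        print(f"RTT {rtt_ms:>2} ms: at most {bound:,.0f} Mb/s")
```

For a fixed loss rate the bound scales inversely with RTT, so the same amount of loss costs a local DMZ-to-campus flow far less than it would cost a long-distance transfer.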