Virtual Circuit Strategies
Virtual circuits (VCs), provided through systems such as OSCARS or Internet2 Advanced Layer 2 Services (AL2S), are becoming a common part of the research and education (R&E) networking experience. This functionality can be integrated into the Science DMZ architecture and Data Transfer Nodes (DTNs) in a number of ways, depending on the needs of the environment and the expertise of the operational team running the DMZ.
A virtual circuit is often most useful when linking campuses or research facilities together, which means the configuration must span several pieces of infrastructure that the end users may not directly control. The following documentation explores some of the options for achieving direct connectivity with Science DMZ technologies, including ways that Layer 2 switching and Layer 3 routing can work together to meet science goals.
Connecting hosts using layer 2 interdomain virtual circuits
Two hosts (or two sets of hosts) at distant sites can be connected to each other using layer 2 virtual circuits. In this case the interdomain virtual circuit typically provides a point-to-point VLAN between the two site edge networks. If the Science DMZ switch or router at each site can be connected to that virtual circuit, then one or more hosts in the Science DMZ at each end can be directly connected (in the IP sense) over that VLAN. Typically OSCARS or AL2S is used to set up the virtual circuit.
The DTNs in the VLAN will typically not have any connectivity outside the VLAN (unless they are dual-homed). This works well for specialized workflows and for network research, but is less applicable to workflows that require connectivity to the outside world (again, unless the DTNs are dual-homed).
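Being "directly connected in the IP sense" means the hosts at each end hold addresses in the same subnet on the shared VLAN, so they can reach each other without a routed hop. The following is a minimal sketch of that check using Python's standard `ipaddress` module. The function name `on_same_subnet` and the example addresses are assumptions made for illustration and do not come from any particular deployment.

```python
import ipaddress


def on_same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both addresses fall in the same IP subnet.

    Hypothetical helper: two DTNs on a shared point-to-point VLAN
    should be numbered out of one subnet so no router is needed
    between them.
    """
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


# Example addresses are illustrative only.
same_vlan = on_same_subnet("10.10.0.1", "10.10.0.2", 24)
different = on_same_subnet("10.10.0.1", "10.10.1.2", 24)
```

Here `same_vlan` is `True` and `different` is `False`. In the second case the hosts would need a router between them even if they shared the VLAN.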
P2P VLANs on Routed Devices with BGP
If the virtual circuit (created statically or through systems such as OSCARS or AL2S) is set up between routed devices, such as Science DMZ routers, then any attached DTNs can behave normally, and the routers can forward the traffic between the DTNs over the virtual circuit rather than over the normal Layer 3 path. This requires that the routers have addresses on the ends of the virtual circuit. This can be a more persistent capability: it is set up once, and if the circuit is up the traffic traverses the circuit; otherwise the traffic traverses the best-effort path. This is often accomplished by running BGP between the two Science DMZ routers over the virtual circuit, and exchanging routes for the DTNs over the BGP session.
The LHC experiments have used this model for several years, where a point-to-point virtual circuit is created between devices, a /30 subnet is configured on the circuit, and BGP is used for signaling.
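A /30 IPv4 subnet contains four addresses, two of which are usable host addresses, which is why it suits a point-to-point circuit with one router at each end. The sketch below derives the two endpoint addresses with Python's standard `ipaddress` module. The helper name `p2p_endpoints` is an assumption for illustration, and the prefix comes from the 192.0.2.0/24 documentation range, not from a real circuit.

```python
import ipaddress


def p2p_endpoints(cidr: str):
    """Return the two usable host addresses of an IPv4 /30 subnet.

    Hypothetical helper: one address would be assigned to the router
    interface at each end of the virtual circuit.
    """
    net = ipaddress.ip_network(cidr, strict=True)
    if net.version != 4 or net.prefixlen != 30:
        raise ValueError("expected an IPv4 /30 subnet")
    # hosts() skips the network and broadcast addresses.
    end_a, end_b = net.hosts()
    return end_a, end_b


# 192.0.2.0/30 is a documentation prefix used purely as an example.
site_a, site_b = p2p_endpoints("192.0.2.0/30")
print(site_a, site_b)  # 192.0.2.1 192.0.2.2
```

Each router would then use the other end's address as its BGP neighbor address over the circuit.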
Static Policy Routing
If setting up BGP sessions across virtual circuits is not desirable, it is possible to set up a long-lived virtual circuit and configure static policy routing. This avoids the need for BGP, but it means that if the virtual circuit goes down then the DTNs that are connected on the ends can't communicate in the general case. The policy can be configured with the ability to fail over, but this often implies that the policy routing needs to be aware of the state of the virtual circuit (something that BGP does automatically).
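The trade-off described above can be sketched as a small decision function. This is an illustrative model only, not router configuration: the `PolicyRoute` class, the `select_next_hop` function, and the addresses are all hypothetical, and how a real device learns the circuit state (the `circuit_up` flag here) is left as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyRoute:
    """Hypothetical static policy for traffic between two DTNs."""
    circuit_next_hop: str                    # far end of the virtual circuit
    fallback_next_hop: Optional[str] = None  # best-effort path, if configured


def select_next_hop(policy: PolicyRoute, circuit_up: bool) -> Optional[str]:
    """Pick the next hop for DTN traffic, or None if it cannot be forwarded.

    Without a fallback, a circuit outage leaves the DTNs unable to
    communicate. With a fallback, the policy must be told the circuit
    state, which is what BGP would otherwise signal automatically.
    """
    if circuit_up:
        return policy.circuit_next_hop
    return policy.fallback_next_hop


# Illustrative addresses from documentation ranges.
strict = PolicyRoute(circuit_next_hop="192.0.2.2")
with_failover = PolicyRoute(circuit_next_hop="192.0.2.2",
                            fallback_next_hop="198.51.100.1")

print(select_next_hop(strict, circuit_up=False))         # None
print(select_next_hop(with_failover, circuit_up=False))  # 198.51.100.1
```

The first policy blackholes traffic when the circuit is down, while the second falls back to the best-effort path at the cost of tracking circuit state.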
RDMA over Converged Ethernet (RoCE)
RDMA protocols (such as InfiniBand) have played a significant role in enabling low-latency and high-throughput communications over switched fabric interconnects, traditionally within data center environments. RDMA operates on the principle of transferring data directly from the user-defined memory of one system to another. These transfer operations can occur across a network and can bypass the operating system (OS), eliminating the need to copy data between user and kernel memory space. Direct memory operations are supported by allowing network adapters to register buffers allocated by the application. The emerging RDMA over Converged Ethernet (RoCE) standard lets users take advantage of these efficient communication patterns over widely deployed Ethernet networks.
Certain path characteristics are necessary to effectively use the RoCE protocol over wide-area networks. The path should be loss-free and should have deterministic and enforced bandwidth guarantees (such as those provided by OSCARS circuits). Even small amounts of loss or reordering can have a detrimental impact on RoCE performance. Note that the ability to do RoCE also requires RoCE-capable network interface cards (NICs), such as those sold by Mellanox.
If the DTNs on your network will use RoCE, Layer 2 connectivity between them is required. It is suggested that an OSCARS circuit be created that directly connects the DTN adapters, and that the switching infrastructure in the middle respects the QoS parameters requested by the circuit. It is possible to use statically configured VLANs for this purpose, but the performance expectations of the protocol will make operation challenging.
Additional information can be found at the following links on experimentation in this space: