Virtual Circuit Strategies
Virtual circuits (VC), provided through systems such as OSCARS or Internet2 Advanced Layer 2 Services (AL2S), are becoming a common part of the R&E networking experience. Integrating this functionality into the Science DMZ architecture and Data Transfer Nodes (DTNs) can be done in a number of different ways, depending on the needs of the environment and the sophistication of the maintainer.
A VC is often most useful for linking campuses or research facilities, which implies that the configuration must span several infrastructure components that the end users may not directly control. The following documentation explores some of the options for achieving direct connectivity with Science DMZ technologies, including ways that Layer 2 switching and Layer 3 routing can work together to meet science goals.
Interfaces using Tagged VLANs
A VLAN is a group of ports that form a logical segment on a switch. The ports of a VLAN form an independent traffic domain in which the traffic generated by the nodes remains within the VLAN. This allows you to segment your network through the switch's management infrastructure, so that nodes can be grouped into logical segments on a LAN, or even over a WAN when software such as OSCARS is used.
If the DTNs on either side can install and use a tagged VLAN corresponding to the connection between them (e.g. using RFC 1918 address space), and you have negotiated with the transit networks between the machines to carry the same VLAN tags, it is possible to set up Layer 2 connectivity. Applications can still use the local Layer 3 addresses to reach other members of the VLAN, but note that access to other global resources can only be accomplished using dual-homing techniques.
This setup is specialized for workflows that involve a small subset of computers that can be isolated from the general routed Internet infrastructure. The DYNES collaboration uses this setup, in which dynamic circuits can be set up on a point-to-point basis between participating machines.
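As a minimal sketch of the host-side configuration, assuming a Linux DTN managed with iproute2, the helper below builds the commands that create a tagged VLAN sub-interface and assign it a private address. The function name, the interface name eth0, VLAN ID 3001, and the address 10.10.10.1/30 are hypothetical placeholders; the real values have to be agreed with the remote site and the transit networks.

```python
import ipaddress


def vlan_commands(parent: str, vlan_id: int, address: str) -> list[str]:
    """Build iproute2 commands for a tagged VLAN sub-interface (not executed here)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("802.1Q VLAN IDs must be in the range 1-4094")
    # Validates the address/prefix; raises ValueError on malformed input.
    iface = ipaddress.ip_interface(address)
    name = f"{parent}.{vlan_id}"
    return [
        f"ip link add link {parent} name {name} type vlan id {vlan_id}",
        f"ip addr add {iface.with_prefixlen} dev {name}",
        f"ip link set dev {name} up",
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for cmd in vlan_commands("eth0", 3001, "10.10.10.1/30"):
        print(cmd)
```

The commands are only printed, so they can be reviewed before being run with root privileges on the DTN.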
P2P VLANs on Routed Devices with BGP
If the virtual circuit (created statically or through systems such as OSCARS or AL2S) is set up between routed devices, such as Science DMZ routers, then any attached DTNs can behave normally, and the routers can forward the traffic between the DTNs over the virtual circuit rather than over the normal Layer 3 path. This requires that the routers have addresses on the ends of the virtual circuit. This can be a more persistent capability: it is set up once, and if the circuit is up the traffic traverses the circuit; otherwise the traffic traverses the best-effort path. This is often accomplished by running BGP between the two Science DMZ routers over the virtual circuit, and exchanging routes for the DTNs over the BGP session.
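The failover behavior can be illustrated with a small model of a routing table. This sketch assumes that the routes learned over the circuit's BGP session are more specific than the route covering the best-effort path, so that longest-prefix match prefers the circuit while the session is up; that is one possible arrangement, not the only one. All prefixes and next-hop labels are hypothetical.

```python
import ipaddress


def best_route(table: dict[str, str], destination: str) -> str:
    """Return the next hop of the longest matching prefix for destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Best-effort default route, always present.
routes = {"0.0.0.0/0": "best-effort"}

# While the BGP session over the circuit is up, the remote DTN prefix is learned.
routes["198.51.100.0/28"] = "virtual-circuit"
via_circuit = best_route(routes, "198.51.100.5")

# If the circuit (and with it the session) goes down, the route is withdrawn.
del routes["198.51.100.0/28"]
via_fallback = best_route(routes, "198.51.100.5")

print(via_circuit, via_fallback)
```

No per-flow configuration changes when the circuit fails; the withdrawal of the learned route is what moves traffic back to the best-effort path.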
The LHC experiments have used this model for several years: a point-to-point virtual circuit is created between devices, a /30 address space is configured on the circuit, and BGP is used for signaling.
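A /30 suits a point-to-point circuit because it contains exactly two usable host addresses, one for each router. The sketch below uses Python's standard ipaddress module to show this; the prefix 192.0.2.0/30 is a documentation range chosen for illustration, not one used by any experiment.

```python
import ipaddress

link = ipaddress.ip_network("192.0.2.0/30")
local_end, remote_end = link.hosts()  # the two usable addresses

print(link.network_address)    # 192.0.2.0
print(local_end, remote_end)   # 192.0.2.1 192.0.2.2
print(link.broadcast_address)  # 192.0.2.3
```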
Static Policy Routing
If setting up BGP sessions across virtual circuits is not desirable, it is possible to set up a long-lived virtual circuit and configure static policy routing. This avoids the need for BGP, but it means that if the virtual circuit goes down, the DTNs connected at the ends cannot communicate in the default case. The policy can be configured to fail over, but this usually requires the policy routing to be aware of the state of the virtual circuit (something that BGP handles automatically).
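The trade-off can be sketched as a small decision function. This is a conceptual model only, not router configuration: the PolicyRoute class, its field names, the next-hop labels, and the circuit_up flag (which in practice would have to come from some external monitoring of the circuit) are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyRoute:
    """A static rule: traffic from `source` to `destination` uses the circuit."""
    source: str
    destination: str
    circuit_next_hop: str
    fallback_next_hop: Optional[str] = None  # None means no failover configured


def select_next_hop(rule: PolicyRoute, src: str, dst: str,
                    circuit_up: bool, default_next_hop: str) -> Optional[str]:
    """Pick the next hop for a packet, or None if it cannot be forwarded."""
    matches = (
        ipaddress.ip_address(src) in ipaddress.ip_network(rule.source)
        and ipaddress.ip_address(dst) in ipaddress.ip_network(rule.destination)
    )
    if not matches:
        return default_next_hop       # ordinary routing applies
    if circuit_up:
        return rule.circuit_next_hop  # policy sends traffic over the circuit
    # Circuit is down: only a state-aware policy can fall back.
    return rule.fallback_next_hop


static_only = PolicyRoute("10.1.0.0/24", "10.2.0.0/24", "circuit-gw")
state_aware = PolicyRoute("10.1.0.0/24", "10.2.0.0/24", "circuit-gw", "border-gw")

with_circuit = select_next_hop(static_only, "10.1.0.5", "10.2.0.9", True, "border-gw")
without_failover = select_next_hop(static_only, "10.1.0.5", "10.2.0.9", False, "border-gw")
with_failover = select_next_hop(state_aware, "10.1.0.5", "10.2.0.9", False, "border-gw")
```

With the circuit down, the static-only rule yields no next hop (the DTNs cannot communicate), while the state-aware rule falls back to the best-effort path.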
RDMA over Converged Ethernet (RoCE)
RDMA protocols (such as the InfiniBand Architecture) have played a significant role in enabling low-latency and high-throughput communications over switched fabric interconnects, traditionally within data center environments. RDMA operates on the principle of transferring data directly from the user-defined memory of one system to another. These transfer operations can occur across a network and can bypass the operating system (OS), eliminating the need to copy data between user and kernel memory space. Direct memory operations are supported by allowing network adapters to register buffers allocated by the application. The emerging RDMA over Converged Ethernet (RoCE) standard lets users take advantage of these efficient communication patterns over widely deployed Ethernet networks.
Certain path characteristics are necessary to use the RoCE protocol effectively over wide-area networks. The path should be virtually loss-free and should have deterministic, enforced bandwidth guarantees (such as those provided by OSCARS circuits). Even small amounts of loss or reordering can have a detrimental impact on RoCE performance. Note that RoCE also requires RoCE-capable network interface cards (NICs), such as those sold by Mellanox.
If the DTNs on your network are intended to use RoCE, Layer 2 connectivity is required. It is suggested that an OSCARS circuit be created that directly connects the DTN adapters, and that the switching infrastructure in the middle respect the QoS parameters requested by the circuit. It is possible to use statically configured VLANs for this purpose, but the performance expectations of the protocol will make operation challenging.
Additional information can be found at the following links on experimentation in this space: