FQ is not available on older RHEL-based Linux releases (before 7.2), but you can still pace packets using tc with a Hierarchical Token Bucket (HTB) queue. We recommend updating to a version of Linux that supports FQ.
Testing on Linux (pre RHEL 7.2) has shown that a simple Hierarchical Token Bucket (HTB) queue (chosen to closely model an interface controlled by OSCARS with multiple flow requirements), combined with traffic rates slightly below the bottleneck capacity, greatly improves TCP performance. For a 10Gbps host sending to a 1Gbps circuit on a 36ms RTT path, throughput rose from 25Mbps to 825Mbps, a dramatic improvement.
Below is a simple Linux tc example that sets up a 900Mbps shaper for traffic to a particular subnet.
#clear out any existing tc rules
/sbin/tc qdisc del dev eth0 root
#create a Hierarchical Token Bucket
/sbin/tc qdisc add dev eth0 handle 1: root htb
#add a 'class' to our root queue with a rate of 900Mbps
/sbin/tc class add dev eth0 parent 1: classid 1:1 htb rate 900mbit
#create a filter that restricts our tc queue and class to a specific destination subnet
/sbin/tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dst X.Y.Z.0/24 flowid 1:1
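After installing the rules above, it can help to confirm that the queue, class, and filter are in place and that the class counters increase while traffic flows. The following is a minimal sketch, not part of the original example: the `run` and `show_shaper` helpers and the `DRY_RUN` variable are hypothetical names introduced here, and the `parent 1:` handle assumes the HTB queue was created with `handle 1:` as shown above.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the qdisc, class, and filter state for one interface.
# The -s flag asks tc to include statistics (bytes and packets sent, drops).
show_shaper() {
    dev="$1"
    run /sbin/tc -s qdisc show dev "$dev"
    run /sbin/tc -s class show dev "$dev"
    run /sbin/tc filter show dev "$dev" parent 1:
}
```

Calling `show_shaper eth0` as root prints the current state; setting `DRY_RUN=1` first only prints the commands so they can be reviewed.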
For more information on tc, see this howto guide.
Choosing a queue discipline is an important aspect of this exercise. In the general case, a Token Bucket Filter (TBF) works for single flows, or when the destination will always be the same. HTB is a better option when multiple flows to multiple destinations originate from a single source.
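For the single-destination case, a TBF shaper can be attached directly as the root queue, with no class or filter. The sketch below only prints the command so it can be reviewed before use. The `tbf_cmd` helper is a hypothetical name, and the `burst` and `latency` defaults are illustrative assumptions rather than tuned values; TBF requires a burst size plus either a latency or a limit.

```shell
# Hypothetical helper: print a tc command that installs TBF as the root qdisc.
# Arguments: interface, rate (tc units, e.g. 900mbit), burst, latency.
# The burst and latency defaults are assumptions for illustration only.
tbf_cmd() {
    dev="$1"
    rate="$2"
    burst="${3:-1mb}"
    latency="${4:-50ms}"
    echo "/sbin/tc qdisc add dev $dev root tbf rate $rate burst $burst latency $latency"
}

# Print the command for a 900Mbps shaper on eth0.
tbf_cmd eth0 900mbit
```

Once the printed command looks right, it can be applied with `tbf_cmd eth0 900mbit | sudo sh`. Any existing root queue has to be deleted first, as in the HTB example.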
For a detailed example and results using tc with an OSCARS virtual circuit, see this document from Jason Zurawski.
Note that tc shaping does not work reliably above 9Gbps on 40G NICs (on a host with a 2.9GHz CPU), and may not work well above around 2Gbps on older hardware.
Starting with Linux 3.11 (available in Fedora 20, Debian 8, and Ubuntu 13.10, and backported to CentOS 7.2), there is a new, simpler method called "Fair Queuing (FQ)" that can help in many situations. To enable it:
tc qdisc add dev $ETH root fq
or to both pace and shape the bandwidth:
tc qdisc add dev $ETH root fq maxrate Ngbit
This has been shown to lead to much better throughput on some network paths.
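The two FQ variants can be wrapped in one small helper. This is a sketch under stated assumptions: `fq_cmd` is a hypothetical name, and it uses `tc qdisc replace` instead of `add` so that the same command works when a root queue (for example the HTB shaper) is already installed. Like the commands above, it needs a kernel with FQ support.

```shell
# Hypothetical helper: print a tc command that makes fq the root qdisc.
# Arguments: interface, optional maxrate in tc units (e.g. 2gbit).
# Uses "replace" (an assumption, not the original "add") so an existing
# root qdisc does not cause an error.
fq_cmd() {
    dev="$1"
    maxrate="${2:-}"
    if [ -n "$maxrate" ]; then
        echo "tc qdisc replace dev $dev root fq maxrate $maxrate"
    else
        echo "tc qdisc replace dev $dev root fq"
    fi
}
```

For example, `fq_cmd eth0 2gbit | sudo sh` applies pacing with a 2Gbps cap, and `tc -s qdisc show dev eth0` then shows whether fq is the active root queue.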