DTN Tuning

Tuning your DTN host is extremely important. We have seen overall IO throughput of a DTN more than double with proper tuning.

Tuning can be as much an art as a science. Due to differences in hardware, it's hard to give concrete tuning advice. In general you should tune one thing at a time, and run some benchmarks after each change to see if it made a difference. Some sample benchmark commands are shown here.

Here are some tuning settings that we have found make a difference. This page assumes you are running a Red Hat-based Linux system, but other types of Unix should have similar tuning knobs. Note that you should always use the most recent version of the OS, as performance optimizations for new hardware are added with every release.

Additional information on tuning for 40/100G hosts can be found here.

Network

Network tuning is the most important thing to pay attention to. Be sure to follow the advice in our Linux Tuning Guide, and pay attention to our NIC tuning advice as well. If you are trying to get as much bandwidth as possible out of your DTN, you'll also want to do Interrupt Binding.

BIOS

For PCIe Gen3-based hosts, you should enable “turbo boost”, and disable hyperthreading and node interleaving. More information on BIOS tuning is available in this document.

I/O Scheduler

The default scheduler on some versions of Linux is the "fair" scheduler (CFQ, Completely Fair Queuing). For a DTN node, we recommend using the "deadline" scheduler instead. To enable deadline scheduling, add "elevator=deadline" to the end of the "kernel" line in your /boot/grub/grub.conf file, similar to this:

kernel /vmlinuz-2.6.35.7 ro root=/dev/VolGroup00/LogVol00 rhgb quiet elevator=deadline
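To confirm which scheduler a disk is actually using, you can read its queue/scheduler file in sysfs, where the kernel marks the active scheduler with square brackets. The following is a minimal sketch; the device name sdb and the helper name active_scheduler are assumptions for illustration, and on newer multi-queue kernels the equivalent scheduler is named mq-deadline.

```shell
#!/bin/sh
# Print the active I/O scheduler from a line such as "noop deadline [cfq]".
# The kernel encloses the scheduler currently in use in square brackets.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

dev=sdb   # assumption: replace with your data device
sched_file="/sys/block/$dev/queue/scheduler"

if [ -r "$sched_file" ]; then
    echo "$dev: $(active_scheduler "$(cat "$sched_file")")"
    # To switch at runtime (as root; not persistent across reboots):
    #   echo deadline > "$sched_file"
fi
```

Changing the scheduler through sysfs is a convenient way to benchmark before making the change permanent on the kernel command line.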

File System

We recommend using the ext4 file system in Linux for DTN nodes.

Increasing the amount of "readahead" usually helps on DTN nodes where the workflow is mostly sequential reads. However, you should definitely test this, as some RAID controllers already do their own readahead, and changing this setting may have adverse effects. Readahead should be set at system boot time. For example, add something like this to /etc/rc.local:

 /sbin/blockdev --setra 262144 /dev/sdb
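The value passed to blockdev --setra is expressed in 512-byte sectors, so it helps to convert it to a more familiar unit when comparing settings. A small sketch of that conversion follows; the helper name ra_sectors_to_mib is hypothetical.

```shell
#!/bin/sh
# Convert a "blockdev --setra" value (counted in 512-byte sectors) to MiB.
ra_sectors_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

# The value used in the rc.local example corresponds to 128 MiB of readahead.
ra_sectors_to_mib 262144

# To inspect the current setting on a device:
#   /sbin/blockdev --getra /dev/sdb
```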

More information on readahead is in this paper on Linux 2.6 performance improvement through readahead optimization.

EXT4 Tuning

To perform well on RAID systems, the file system should be tuned to the physical layout of the drives. Stride and stripe-width are used to align the volume to the stripe size of the RAID.

  • stride is calculated as Stripe Size / Block Size.
  • stripe-width is calculated as stride * Number of Disks Providing Capacity.
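As a worked example of the two formulas above, the sketch below assumes a 256 KiB stripe size, a 4 KiB file system block size, and 12 disks providing capacity. These array parameters are assumptions chosen to match the values in this section's sample mkfs command; the function names are hypothetical.

```shell
#!/bin/sh
# stride = RAID stripe size / file system block size (both in KiB here)
ext4_stride() {
    echo $(( $1 / $2 ))
}

# stripe-width = stride * number of disks providing capacity
ext4_stripe_width() {
    echo $(( $1 * $2 ))
}

stripe_kib=256   # assumed RAID stripe size
block_kib=4      # 4096-byte file system blocks
data_disks=12    # assumed number of data-bearing disks (parity excluded)

stride=$(ext4_stride "$stripe_kib" "$block_kib")
width=$(ext4_stripe_width "$stride" "$data_disks")
printf 'stride=%s,stripe-width=%s\n' "$stride" "$width"
```

Note that only the disks that hold data count toward stripe-width, so parity disks in a RAID 5 or RAID 6 set are excluded.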

Disabling journaling will also improve performance, but reduces reliability. More information on tuning ext4 for RAID can be found here.

Sample mkfs command:

/sbin/mkfs.ext4 -b 4096 -E stride=64,stripe-width=768 -O ^has_journal /dev/sdb1

There are also tuning settings that are done at mount time. Here are the ones that we have found improve DTN performance:

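  • data=writeback – with this option ext4 journals only metadata and does not force file data to disk before the related metadata is committed. This gives a large improvement in write performance, at the cost of possibly exposing stale data after a crash.
  • inode_readahead_blks=64 – this specifies the number of inode blocks to be read ahead by ext4’s readahead algorithm. Default is 32.
  • barrier=0 – this disables write barriers. This improves write performance, but risks file system corruption on power loss unless the write cache is battery-backed.
  • commit=300 – this parameter tells ext4 to sync its metadata and data every 300 s. This reduces the reliability of data writes, but increases performance.
  • noatime,nodiratime – these parameters tell ext4 not to write the file and directory access timestamps.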

Sample fstab entry:

/dev/sdb1 /storage/data1 ext4 inode_readahead_blks=64,data=writeback,barrier=0,commit=300,noatime,nodiratime

More information on ext4 options can be found here.

RAID Controller

Different RAID controllers provide different tuning controls. Check the documentation for your controller and use the settings recommended for reading large files. You will usually want to disable any “smart” built-in controller options, as they are typically designed for different workflows.

Here are some settings that we found increase performance on a 3ware RAID controller. These settings are in the BIOS, and can be entered by pressing Alt+3 when the system boots up.

  • Write cache – Enabled
  • Read cache – Enabled
  • Continue on Error – Disabled
  • Drive Queuing – Enabled
  • StorSave Profile – Performance
  • Auto-verify – Disabled
  • Rapid RAID recovery – Disabled

Virtual memory Subsystem

Setting dirty_background_bytes and dirty_bytes improves write performance. For our system, the settings that gave the best performance were:

echo 1000000000 > /proc/sys/vm/dirty_bytes
echo 1000000000 > /proc/sys/vm/dirty_background_bytes

 

On heavily used DTNs we've seen cases where the host will run out of memory and give an error such as:

SLUB: Unable to allocate memory on node

Reserving about 5% of the RAM for the VM subsystem using vm.min_free_kbytes seems to fix the problem.

For example, for a host with 96 GB of RAM, add the following to /etc/sysctl.conf to set vm.min_free_kbytes to roughly 4 GB:

  vm.min_free_kbytes = 4096000 
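To derive a value for a host with a different amount of RAM, the 5% rule of thumb can be computed from the MemTotal line in /proc/meminfo, which is reported in kB. This is a rough sketch rather than a definitive sizing method; the function name min_free_kbytes_for and its default of 5 percent are assumptions taken from the guideline above.

```shell
#!/bin/sh
# Print a vm.min_free_kbytes value equal to a percentage of total RAM.
# Arguments: total memory in kB, percentage to reserve (defaults to 5).
min_free_kbytes_for() {
    echo $(( $1 * ${2:-5} / 100 ))
}

# MemTotal in /proc/meminfo is given in kB, the same unit the sysctl expects.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if [ -n "$total_kb" ]; then
        echo "vm.min_free_kbytes = $(min_free_kbytes_for "$total_kb")"
    fi
fi
```

The printed line can be pasted into /etc/sysctl.conf after you have checked that the value is sensible for your host.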

More information on the Linux VM subsystem is here.

 


SSD Tuning

Tuning your SSD is more about reliability and longevity than performance, as each flash memory cell has a finite lifespan that is determined by the number of "program and erase (P/E)" cycles. Without proper tuning, an SSD can perform worse than a traditional hard disk, and can die within months.

And remember, never run "write" benchmarks on an SSD: the heavy write load consumes P/E cycles and will wear out your SSD quickly.

Modern SSDs and modern OSes should all include TRIM support, which is important for prolonging the life of your SSD. As of late 2012, only the newest RAID controllers include TRIM support. This article explains why TRIM is important and how it works.

Swap


To prolong SSD lifespan, do not swap on an SSD. In Linux you can control how readily the kernel swaps using the sysctl variable vm.swappiness. For example, add this to /etc/sysctl.conf:

 vm.swappiness=1 

This tells the kernel to avoid unmapping mapped pages whenever possible.

RAMDISK


To avoid frequent rewriting of files (for example, when compiling code from source), use a ramdisk file system (tmpfs) for /tmp, /usr/tmp, etc.

ext4 file system tuning for SSD


These mount flags should be used for SSD partitions.

  • noatime: Reading accesses to the file system will no longer result in an update to the atime information associated with the file. This eliminates the need for the system to make writes to the file system for files which are simply being read.
  • discard: This enables the benefits of the TRIM command, as long as the kernel version is >=2.6.33.

Sample /etc/fstab entry:

/dev/sda1 /home/data ext4 defaults,noatime,discard 0 1
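Before adding the discard flag, it can be useful to check that the device actually advertises discard (TRIM) support. On Linux, a non-zero value in the device's queue/discard_max_bytes file in sysfs indicates support. The sketch below assumes the device is sda; the helper name supports_discard is hypothetical.

```shell
#!/bin/sh
# Succeed if the given discard_max_bytes file holds a non-zero value,
# which means the block device advertises discard (TRIM) support.
supports_discard() {
    [ -r "$1" ] && [ "$(cat "$1")" -gt 0 ] 2>/dev/null
}

dev=sda   # assumption: replace with your SSD device
if supports_discard "/sys/block/$dev/queue/discard_max_bytes"; then
    echo "$dev supports discard"
else
    echo "$dev does not report discard support"
fi
```

A device behind a RAID controller may report no discard support even if the underlying SSDs have it, which is consistent with the note above about controller TRIM support.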

For more information see performance tuning for SSD in Linux.