Increasing and measuring the performance of your ScaleIO environment

This post is a follow-up to the ScaleIO how-to posts published here and here.

Now that you have your ScaleIO environment up and running after following the posts above, you will of course want to see what kind of performance you can get out of it. There are many ways to go about this; I’ll show you one method I’ve been using, along with some of the results.

ScaleIO is handled a bit differently in a VMware environment, where it uses an iSCSI connection instead of the native ScaleIO protocols. Because of the extra layers involved (underlying storage (HDD/SSD/PCIe) -> VMFS -> VMDK -> ScaleIO -> iSCSI -> VMFS -> VMDK -> guest FS), you will probably see different performance numbers in a VMware environment than in a physical one.

However, the method I’ve been using can be used for both physical and virtual ScaleIO environments, so read on.

To get the best performance out of your ScaleIO environment, a few settings on the ScaleIO VMs need to be changed first.

First of all, jumbo frames. Enable them using the following command:

ifconfig eth0 mtu 9000
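Jumbo frames only help if every hop between the ScaleIO VMs (vSwitches, port groups, physical switches) also carries 9000-byte frames; otherwise traffic may be fragmented or dropped. A minimal sketch for checking this from one node, assuming Linux iputils `ping`; the function names and the peer address in the usage line are hypothetical:

```shell
# Sketch: check that 9000-byte frames really pass between two nodes.
# An ICMP echo payload must leave room for the 20-byte IPv4 header and
# the 8-byte ICMP header, so the largest unfragmented payload is MTU - 28.
jumbo_payload() {
    echo $(( $1 - 20 - 8 ))
}

# check_jumbo PEER [MTU]
# -M do sets "don't fragment", so ping fails instead of silently
# splitting the packet when some hop is still at a smaller MTU.
check_jumbo() {
    peer=$1
    mtu=${2:-9000}
    ping -c 3 -M do -s "$(jumbo_payload "$mtu")" "$peer"
}
```

For example, `check_jumbo 192.168.1.12` sends 8972-byte payloads; if it reports "message too long" or gets no replies, some hop is still at a smaller MTU.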

It’s also recommended to increase the txqueuelen value from 1000 to 10000, like this:

ifconfig eth0 txqueuelen 10000

These settings are reset on reboot, so add them to your network configuration files if you want them to persist.
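One way to make the MTU persistent is to write it into the interface's configuration file. A minimal sketch, assuming an ifcfg-style file (the layout used by SLES and RHEL; check what your ScaleIO VM actually uses). The function name is hypothetical:

```shell
# Sketch: persist a KEY='VALUE' setting in an ifcfg-style network file.
# ASSUMPTION: the ScaleIO VM uses ifcfg files, e.g.
#   /etc/sysconfig/network/ifcfg-eth0          (SLES-style)
#   /etc/sysconfig/network-scripts/ifcfg-eth0  (RHEL-style)
# Values must not contain '/' or '&', since they are passed to sed.
set_ifcfg_key() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        # Key already present: replace its value in place
        sed -i "s/^${key}=.*/${key}='${value}'/" "$file"
    else
        # Key missing: append it
        printf "%s='%s'\n" "$key" "$value" >> "$file"
    fi
}
```

Usage: `set_ifcfg_key /etc/sysconfig/network/ifcfg-eth0 MTU 9000`. If your distribution's ifcfg format has no key for txqueuelen, that value can be reapplied at boot from a startup script with `ip link set dev eth0 txqueuelen 10000`.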

Here's a protip! If you want to run the same command on all your ScaleIO nodes, there's a tool called admincli.py on the MDM nodes that you can use, like this:

/opt/scaleio/ecs/mdm/diag/admincli.py --command "ifconfig eth0 mtu 9000"
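If admincli.py isn't available on the machine you're working from, a plain ssh loop gives a similar result. A sketch, assuming root ssh access to every node; the addresses in NODES and the function name are hypothetical, and DRY_RUN=1 only prints what would be run:

```shell
# Sketch: run one command on several nodes over ssh.
# ASSUMPTIONS: root ssh access to every node; NODES holds hypothetical
# addresses, replace them with your own ScaleIO VM IPs.
NODES="192.168.1.11 192.168.1.12 192.168.1.13 192.168.1.14"

# run_on_nodes CMD
# With DRY_RUN=1 the ssh commands are only printed, not executed.
run_on_nodes() {
    for node in $NODES; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh root@$node $1"
        else
            ssh "root@$node" "$1" || echo "failed on $node" >&2
        fi
    done
}
```

Usage: `run_on_nodes "ifconfig eth0 txqueuelen 10000"`.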

When I created my ScaleIO VMs, I created VMDKs on a VMFS volume on top of the underlying storage. These VMDKs should always be created as Eager Zeroed Thick. I also used a paravirtual SCSI adapter; it isn’t included in the official user guide, but it seems to have increased performance a bit more. You can also experiment with increasing the vCPU count from 2 to 4 for a bit more performance, but of course that uses more CPU. Don’t touch the RAM, though; you probably won’t ever need more than the 1GB that’s allocated.
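If you prefer the ESXi shell to the vSphere Client, `vmkfstools` can create the eager-zeroed thick disks. A sketch, assuming ESXi Shell or SSH access on the host; the datastore path and size in the usage line are hypothetical placeholders, and the new disk still has to be attached to the ScaleIO VM afterwards:

```shell
# Sketch: create an eager-zeroed thick VMDK from the ESXi shell.
# ASSUMPTION: run on an ESXi host; paths and sizes are placeholders.
# make_ezt_vmdk SIZE PATH   (DRY_RUN=1 prints the command instead)
make_ezt_vmdk() {
    size=$1
    path=$2
    # -c creates a virtual disk, -d selects the disk format.
    # eagerzeroedthick zeroes every block at creation time, so the first
    # write to each block does not pay a zeroing penalty later.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "vmkfstools -c $size -d eagerzeroedthick $path"
    else
        vmkfstools -c "$size" -d eagerzeroedthick "$path"
    fi
}
```

Usage: `make_ezt_vmdk 500g /vmfs/volumes/datastore1/svm1/svm1_data.vmdk`. Zeroing a large disk up front can take a while on spinning disks.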

If you are using SSDs or PCIe flash as underlying storage and installing the scripted way, the recommendation is to use the SSD profile. To do that, run the following command during installation:

./install.py --all --vm --license=YOURLICENSEHERE --profile ssd

However, if you've already installed ScaleIO on top of your SSDs and would like to apply the SSD configuration to your existing environment, add the following to the /opt/scaleio/ecs/sds/cfg/conf.txt file on each SDS node:

tgt_net__recv_buffer=4096
tgt_net__send_buffer=4096
tgt_cache__size_mult=3
tgt_thread__ini_io=500
tgt_thread__tgt_io_main=500
tgt_umt_num=1200
tgt_umt_os_thrd=6
tgt_net__worker_thread=6
tgt_asyncio_max_req_per_file=400
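Editing conf.txt by hand on every SDS is error-prone, especially if some of these keys already exist. A minimal sketch that updates existing keys and appends missing ones, assuming conf.txt is a plain key=value file; the function name is hypothetical, and a one-time conf.txt.bak backup is kept:

```shell
# Sketch: idempotently apply the SSD settings to an SDS conf.txt.
# Existing keys are updated, missing keys are appended, so running it
# twice does not create duplicate lines.
# ASSUMPTION: conf.txt uses plain key=value lines, as in the list above.
SSD_SETTINGS="tgt_net__recv_buffer=4096
tgt_net__send_buffer=4096
tgt_cache__size_mult=3
tgt_thread__ini_io=500
tgt_thread__tgt_io_main=500
tgt_umt_num=1200
tgt_umt_os_thrd=6
tgt_net__worker_thread=6
tgt_asyncio_max_req_per_file=400"

apply_ssd_settings() {
    conf=${1:-/opt/scaleio/ecs/sds/cfg/conf.txt}
    # keep a backup of the original file, only on the first run
    [ -e "$conf.bak" ] || cp "$conf" "$conf.bak"
    printf '%s\n' "$SSD_SETTINGS" | while IFS= read -r line; do
        key=${line%%=*}
        if grep -q "^${key}=" "$conf"; then
            sed -i "s/^${key}=.*/${line}/" "$conf"
        else
            echo "$line" >> "$conf"
        fi
    done
}
```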

Then restart your SDS by issuing the following command on each node:

pkill sds
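Before starting a benchmark, it's worth confirming that the SDS actually came back. A sketch, assuming (as the post implies) that the SDS is respawned automatically after pkill and that `pgrep` is available; the function name and timeout are arbitrary choices:

```shell
# Sketch: wait until a process whose name matches NAME is running.
# ASSUMPTION: the SDS is respawned automatically after "pkill sds".
# wait_for_process NAME TIMEOUT_SECONDS -> status 0 if seen, 1 if not
wait_for_process() {
    name=$1
    timeout=$2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if pgrep "$name" >/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

Usage: `pkill sds; sleep 2; wait_for_process sds 30 && echo "sds is running again"`. The short sleep gives the old process time to exit, so the check doesn't just see it on its way out.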

The test bed I have consists of 4 ScaleIO VMs, each using an XtremSF PCIe flash card as backend storage. I’ve created one 2TB volume, given my 4 ESXi hosts access to it, formatted it with VMFS5, and created 4 Ubuntu VMs, each with one drive located on that ScaleIO volume. That VMDK (the second VMDK layer in the stack described above) is also created as Eager Zeroed Thick. It looks something like this:

In each Ubuntu VM, I’ve installed the load-generating tool “fio”, which makes it easy to set things like block size, read/write percentage, whether I/O should be random or sequential, etc. Here is an example fio configuration file:

[4k_random_read_90] 
# overwrite: if true, fio creates the file if it doesn't exist;
# if the file exists and is large enough, nothing happens.
# It is left at its default (false) here because the file should exist.

#rw=
#   read        Sequential reads
#   write       Sequential writes
#   randwrite   Random writes
#   randread    Random reads
#   rw          Sequential mixed reads and writes
#   randrw      Random mixed reads and writes
rw=randrw

# ioengine=
#    sync       Basic read(2) or write(2) io. lseek(2) is
#               used to position the io location.
#    psync      Basic pread(2) or pwrite(2) io.
#    vsync      Basic readv(2) or writev(2) IO.
#    libaio     Linux native asynchronous io.
#    posixaio   glibc posix asynchronous io.
#    solarisaio Solaris native asynchronous io.
#    windowsaio Windows native asynchronous io.
ioengine=libaio

# direct If value is true, use non-buffered io. This is usually
#        O_DIRECT. Note that ZFS on Solaris doesn't support direct
#        io.
direct=1

# bs The block size used for the io units. Defaults to 4k.
bs=4k

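# nrfiles= Number of files to use for this job. Defaults to 1.
# filename= The file or device special file to test.
# WARNING: rw=randrw writes to this device (10% of I/Os here),
# which destroys any data or filesystem on it.
filename=/dev/sdb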
size=200g
iodepth=64
numjobs=4
rwmixread=90
# group_reporting: report the 4 jobs as one combined result
group_reporting

Paste the content above into a file and name it "fio_4k_random_read_90", so you'll know it's for 4KB blocks, random read/write with a read/write ratio of 90/10. Then run it like this:

fio fio_4k_random_read_90
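To compare block sizes or read/write mixes, it's easier to generate job files than to copy and edit them by hand. A sketch that keeps the naming scheme above; the function name is hypothetical, and every generated job writes to /dev/sdb just like the original:

```shell
# Sketch: generate fio job files for other block sizes and read mixes,
# using the same naming scheme as fio_4k_random_read_90.
# WARNING: every generated job writes to /dev/sdb, destroying its data.
# make_fio_job BS READ_PCT [DIR] -> prints the path of the new job file
make_fio_job() {
    bs=$1        # block size, e.g. 4k, 8k, 64k
    readpct=$2   # percentage of reads in the random mix
    dir=${3:-.}
    name="${bs}_random_read_${readpct}"
    cat > "$dir/fio_$name" <<EOF
[$name]
rw=randrw
ioengine=libaio
direct=1
bs=$bs
filename=/dev/sdb
size=200g
iodepth=64
numjobs=4
rwmixread=$readpct
group_reporting
EOF
    echo "$dir/fio_$name"
}
```

Usage: `for bs in 4k 8k 64k; do fio "$(make_fio_job "$bs" 70)"; done` runs a 70/30 mix at three block sizes, one after another.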

When running the fio workload from one Ubuntu VM, you will see some performance numbers immediately, and probably really good ones at that. What’s really cool, though, is that when you run more than one fio workload at a time, you’ll most likely see even more performance coming out of your HDDs/SSDs/PCIe cards. So start your engines!
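To load the cluster from all four Ubuntu VMs at once, you can start fio over ssh in the background and wait for every run to finish. A sketch, assuming passwordless ssh to each VM and the job file already present in each VM's home directory; the VM names are hypothetical:

```shell
# Sketch: start the same fio job on several load-generator VMs at once,
# then wait for all of them to finish.
# ASSUMPTIONS: passwordless ssh to each VM; job file already copied
# to each VM's home directory; VM names are placeholders.
LOADGEN_VMS="ubuntu1 ubuntu2 ubuntu3 ubuntu4"
JOB=fio_4k_random_read_90

start_all() {
    for vm in $LOADGEN_VMS; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh $vm fio $JOB > $vm.log"
        else
            # each run goes to the background; output is saved per VM
            ssh "$vm" fio "$JOB" > "$vm.log" 2>&1 &
        fi
    done
    wait   # block until every background ssh/fio has exited
}
```

fio's own client/server mode (`fio --server` on each VM, `fio --client=<host>` from a controller) is another way to do the same thing.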

When measuring performance, it’s easy to get lost in all the numbers fio prints, so I suggest using the included ScaleIO dashboard. You can find it on the ScaleIO VM itself at /opt/scaleio/ecs/mdm/bin/dashboard.jar. Just copy it to your own workstation and run it from there. When it starts, point it to your ScaleIO cluster IP; no password is needed.
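Copying and starting the dashboard takes two commands. A sketch, assuming root scp access to a ScaleIO VM, a Java runtime on your workstation, and that dashboard.jar is an executable jar; the IP address is hypothetical:

```shell
# Sketch: copy the dashboard from a ScaleIO VM and start it locally.
# ASSUMPTIONS: root scp access, Java installed, executable jar;
# the SVM address is a placeholder.
SVM=192.168.1.11
JAR_PATH=/opt/scaleio/ecs/mdm/bin/dashboard.jar

fetch_and_run_dashboard() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "scp root@$SVM:$JAR_PATH ."
        echo "java -jar dashboard.jar"
        return 0
    fi
    scp "root@$SVM:$JAR_PATH" . && java -jar dashboard.jar
}
```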

When connected, you’ll see something similar to this:

Yup, that’s one hundred thousand IOPS being handled by 4 ScaleIO VMs! Pretty crazy, considering many other storage solutions would love to have numbers like this, and here we are with just 4 virtual machines and a few flash cards. I can definitely see a future for this product. What do you think?
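Two quick calculations help put a result like this in context: the bandwidth it implies at 4KB blocks, and, via Little's law (average latency = outstanding I/Os / IOPS), the average latency it implies. A sketch with hypothetical function names; the integer arithmetic rounds down:

```shell
# Sketch: back-of-the-envelope numbers for an IOPS result.
# throughput_mib IOPS BLOCK_KIB -> MiB/s (integer, rounded down)
throughput_mib() {
    echo $(( $1 * $2 / 1024 ))
}

# Little's law: average latency = outstanding I/Os / IOPS.
# avg_latency_us OUTSTANDING IOPS -> microseconds (integer)
avg_latency_us() {
    echo $(( $1 * 1000000 / $2 ))
}
```

`throughput_mib 100000 4` prints 390, so 100,000 IOPS at 4KB is roughly 390 MiB/s. If all four Ubuntu VMs ran the job concurrently with full queues (an assumption: iodepth=64 × numjobs=4 = 256 outstanding I/Os per VM, 1024 in total), `avg_latency_us 1024 100000` prints 10240, i.e. about 10 ms average latency per I/O. Shallower queues generally trade peak IOPS for lower latency.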

Please comment below with your setup and your results!


About Jonas Rosland

Solutions architect at Office of the CTO at EMC
This entry was posted in Converged Infrastructure, How to, Installation, ScaleIO, VMware.

9 Responses to Increasing and measuring the performance of your ScaleIO environment

  1. virtualprouk says:

    Hey Jonas

    What was the latency like on this test rig? I’m interested in the latency impact of pooling XtremSF cards across multiple servers, especially when mirrored protection is in place.

    Cheers

    Craig

    • Jonas Rosland says:

      Hi Craig,

      The latency is of course higher when we need to copy the blocks over the network and wait for an ACK back from the secondary SDS than when running directly against the XtremSF cards. The added layers of dual VMFS and VMDKs increase latency as well. From the ScaleIO VMs we still see sub-ms latency, and latency above 1 ms from the test VMs. It all depends on your workload, so YMMV.

      /Jonas

  2. Pingback: Microsoft SQL server testing with ScaleIO on VMware | pureVirtual

  3. Pingback: Cody Hosterman | Provisioning a new ScaleIO volume in a VMware environment

  4. Pingback: Provisioning a New Scala Volume in a VMWare Environment | ytd2525

  5. Pingback: ScaleIO – Snapshots och VMware | wimpyfudge

  6. Vladimir says:

    Hi Jonas,

    Impressive numbers really, I’m looking forward to testing ScaleIO soon.

    Is it possible for you to re-run the test above with iodepth=1 and numjobs=1? Yes, there are still single-threaded applications out there in the real world ; )

    Cheers,
    Vladimir

    • Jonas Rosland says:

      Hi Vladimir,
      Single-threaded applications aren’t really a good use case for ScaleIO unless you have many single-threaded apps in the environment. This would give you a random workload, kinda like the one I’m showing here. So I would recommend you to try out those numbers in your own environment, and scale up the number of servers running those workloads and you’ll see an increase in overall performance.

      • Vladimir says:

        Single threaded small blocks 100% random workload kills any fancy SAN =)

        Increasing the number of servers connected to the SAN storage should help reveal the SAN storage’s actual overall performance, i.e., it’s about your storage usage efficiency. However, this approach doesn’t improve the IO performance of each individual server connected to the SAN storage.

        Even now I have SAN storage environment that can deliver way more IOPS and throughput (with decent latency) than any server / application in my environment can claim.
