Add a local vSphere ESXi user without using the vSphere GUI

So while hacking away at a demo I wanted to create a local user on my vSphere ESXi box, and got frustrated that, according to all the documentation out there, I had to use the vSphere GUI (or vicfg) to create one. Logged on locally to the ESXi box I wondered why the normal “adduser” command wasn’t available, but it turns out it actually is, just not where you’d expect it.

vSphere ESXi utilizes a binary called “busybox” for a lot of standard Linux commands, as you can see if you look at the linked binaries in /bin:

~ # ls -l /bin/
<snip>
lrwxrwxrwx 1 root root 35 Apr 9 21:41 stty -> /usr/lib/vmware/busybox/bin/busybox
lrwxrwxrwx 1 root root 35 Apr 9 21:41 sum -> /usr/lib/vmware/busybox/bin/busybox
-r-xr-xr-x 1 root root 3432 Feb 22 01:27 summarize-dvfilter
lrwxrwxrwx 1 root root 35 Apr 9 21:41 sync -> /usr/lib/vmware/busybox/bin/busybox
lrwxrwxrwx 1 root root 35 Apr 9 21:41 tail -> /usr/lib/vmware/busybox/bin/busybox
lrwxrwxrwx 1 root root 35 Apr 9 21:41 tar -> /usr/lib/vmware/busybox/bin/busybox
<snip>

However, “adduser” isn’t linked there, but the applet is included in the busybox binary itself. To use it, simply call “busybox” with “adduser” as the first parameter, followed by the normal “adduser” parameters, like this:

~ # /usr/lib/vmware/busybox/bin/busybox adduser
BusyBox v1.20.2 (2012-12-11 11:54:28 PST) multi-call binary.

Usage: adduser [OPTIONS] USER

Add a user

 -h DIR Home directory
 -g GECOS GECOS field
 -s SHELL Login shell
 -G GRP Add user to existing group
 -S Create a system user
 -D Don't assign a password
 -H Don't create home directory
 -u UID User id

So, if you wanted to create the user “jonas” as a member of the root group with “/” as home directory and with “/bin/sh” as shell you could simply do this:

~ # /usr/lib/vmware/busybox/bin/busybox adduser -s /bin/sh -G root -h / jonas
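
The freshly created account has no password yet. On my host the regular passwd command was there for setting one; if your build differs, running the busybox binary without arguments lists the applets it includes.

~ # passwd jonas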

Optimized versus flexible – You’re asking the wrong question

I’ve heard from many customers and partners that they want an architecture that is both optimized and flexible for their new or future infrastructure, and I would like to voice my concerns about that point of view when looking at new datacenter architectures.

Let’s start by going back a bit to the 2nd platform that has served us so well for the last 15 years. The start of that platform consisted of somewhat standardized servers running an application or two, and over the last few years we’ve been loading those servers with several virtualized apps per server. With the adoption of virtualization, we also saw an explosion in the number of apps actually running inside a datacenter. This led us to an infrastructure that needed to be very flexible, as it would have to handle a large variety of applications on the same hardware, be it server, network or storage.

The continued centralization of applications has brought us a lot of ease of use and management, and many times when looking at new infrastructure for new apps we’re counting on the old ways of centralizing and virtualizing applications to be the answer to everything. However, using centralized virtualization clusters, centralized storage and core networking might not work for many of the new types of applications you want to deploy.


Now, as we’re moving into the 3rd platform, many of the new apps are optimized for scale-out architectures and are therefore also demanding an optimized infrastructure. Being scale-out means that apps can use resources wherever they are available, not necessarily just the centralized resources. Your application might span from the DMZ, to internal systems, to a cloud provider, with improved performance and availability. Here’s where the problem comes in. If you’re used to having a very efficient and flexible (but not optimized) infrastructure for a large variety of applications, it might be hard to see how you would go about getting optimal performance for your new apps. Hint: it’s probably not by using the old ways.

Let’s start with centralization. It’s been great. Awesome even. Having a hundred VMs running on one centralized server, that is a great feat of technology. I’m sure we’ll continue enjoying the benefits of centralization for many of our internal applications for years to come, but many of the new app technologies or parts thereof will not, cannot or should not be centralized. Some examples I see include content distribution, both internal and external, data analysis, and our old friend cloud computing. Most of these include some way of scaling out.

“But we can just put it in here with all the other stuff!”
No. Stop. Don’t.

Some parts of it might be a great fit. All of it? Probably not. Running data analysis on the exact same hardware where you’re running your Exchange mailboxes and CRM systems might not be such a good idea (unless your data set is very small). I’m not saying you’ll abandon your centralized systems, absolutely not, I’m just saying you’ll use them in a different way. And I’m not saying that you wouldn’t benefit from building your own cloud environment, absolutely not, just that not everything in a private cloud will run in a centralized environment.

One thing that’s not mentioned enough IMHO when looking at new applications and the 3rd platform is the Hot Edge and Cold Core, and it’s really important. Your centralized, flexible systems will essentially become the Cold Core, where we can and will store data for a longer period of time at a lower cost. The new Hot Edge will be where you put your customer-facing (internal and external) applications on an optimized platform. Everyone is connecting to it through different networks, from different devices and places. You want to have your latest code running there, you want the latest features and the fastest, most optimized hardware. This is where you will see the most benefit from not having a centralized, flexible solution. With an optimized Hot Edge, you can more easily control and predict application performance, instead of trying to shuffle around other virtualized applications that have nothing to do with the delivery of data from that app.

Picture a gymnast and imagine your infrastructure being as flexible as that, which would be great, right? It could do pretty much anything you needed it to do, and stretch and bend in many ways. But it might not be as efficient as others when trying to run fast or do really heavy lifting. That’s where the optimized infrastructure comes in. So when you talk about wanting an optimized, flexible infrastructure, you’re most likely talking about an infrastructure for 2nd platform applications. Some of them will probably be replaced by a 3rd platform variant during the next 3-5 years, so I would suggest you look for a flexible infrastructure for those (which you can then also use as a Cold Core later on), and focus your vision on infrastructure that will be optimized for your new 3rd platform applications.


Free ScaleIO licenses for EMC Elect!


Yes, it’s true. The EMC Elect, the community-driven recognition for individuals who have provided help and support and been generally awesome around EMC and its products, will now be getting free ScaleIO licenses. You might have read about ScaleIO on my blog before, and for those of you chosen for the EMC Elect program I would like to share how to best make use of your new licenses and software!

First off, if you missed the online session where Erez Webman and Boaz Palgi went through the ins-and-outs of ScaleIO, I highly recommend you watch it. You should have access to it via the EMC Elect portal.

Secondly, I would like you to make sure you have the proper equipment to run ScaleIO. Most homelab setups will most likely work fine, but ScaleIO scales pretty much linearly as you add nodes and disks, so the more hardware you have, the better the experience and the more value you will get out of it. Don’t expect to get blazingly fast flash performance out of a couple of tired SATA drives connected to servers with early Intel Celeron CPUs. Ok? Ok.

Ok, let’s get to it. For a VMware installation, continue reading. For local laptop installations, read here and then continue below the VMware installation for cool use cases.

If you’re going to run ScaleIO in a VMware environment, please follow the steps outlined here and here. When you do this for the first time it will probably take some time, but fear not: if you don’t want to go through the whole GUI setup you can actually make it very easy by editing the following file, uploading it to your ScaleIO environment and running the automated installation. Yes, just configure a text file, upload it to a server, run a script and voila, you have a fully functional scale-out storage solution! Pretty cool IMHO.

Please make sure you read through the two installation posts above even if you’re going to use the edited configuration file, as they contain a lot of good information. When you’re done with that, you can edit the following for your environment and store it as site.cfg under /opt/scaleio/siinstall/ECS on one of the ScaleIO nodes (it doesn’t matter which one). The template below configures 3 virtual SDS nodes that are also SDCs, and gives three ESX hosts access to a 2048GB volume over iSCSI (you of course need the corresponding storage underneath, otherwise change the size). Yes, you need to configure iSCSI in your VMware environment for this to work. Also, if you want to try it out with more nodes (remember, a better experience with scale-out?), please do!

[global]
virtual_ip = 192.168.0.10
sds = SDS-A;SDS-B;SDS-C
sdc = SDC-A;SDC-B;SDC-C
volumes = volume1
domains =

[domains]
pdomain = pool1

[initiators]
esx01 = iqn.1998-01.com.vmware:esx01
esx02 = iqn.1998-01.com.vmware:esx02
esx03 = iqn.1998-01.com.vmware:esx03

[email_alert]
email_to = admin@your.company.name
severity = error

[smtp]
host_ip = localhost
port = 25
tls = no
username = scaleio
password = admin
email_from = scaleio@your.company.name

[miscellaneous]
encryption = no
password = 0000000000

[mdm_primary]
ip = 192.168.0.11
password = 0000000000
os = linux
virtual_nic = eth0

[mdm_secondary]
ip = 192.168.0.12
password = 0000000000
os = linux
virtual_nic = eth0

[tb]
ip = 192.168.0.13
password = 0000000000
os = linux

[SDS:SDS-A]
ip = 192.168.0.11
password = 0000000000
domain = pdomain
devices = /dev/sdb
pools = pool1
os = linux

[SDS:SDS-B]
ip = 192.168.0.12
password = 0000000000
domain = pdomain
devices = /dev/sdb
pools = pool1
os = linux

[SDS:SDS-C]
ip = 192.168.0.13
password = 0000000000
domain = pdomain
devices = /dev/sdb
pools = pool1
os = linux

[SDC-A]
ip = 192.168.0.11
password = 0000000000
os = linux
iscsi = yes

[SDC-B]
ip = 192.168.0.12
password = 0000000000
os = linux
iscsi = yes

[SDC-C]
ip = 192.168.0.13
password = 0000000000
os = linux
iscsi = yes

[volume1]
size = 2048
hosts = SDC-A;SDC-B;SDC-C
domain = pdomain
initiators = esx01;esx02;esx03
pool = pool1

When you have the configuration file set up, just run the following in the /opt/scaleio/siinstall/ECS folder and you should see your ScaleIO environment start taking form:

./install.py --all -c site.cfg --license YOURLICENSEHERE

That’s about it for the VMware part. As I stated before, if you want to run it on your local laptop for demos and lighter functionality testing, please follow the blog post here to use Vagrant and VirtualBox to set up a fully functional ScaleIO environment. Then come back and continue reading.

Ok, so now what? You have a ScaleIO environment, what can you do with it? Well, if you’re running it in a VMware environment, the answer is simple. Connect your vSphere-hosts to the iSCSI IP, and create a VMFS volume of course! Then start storing your virtual machines on it. Easy peasy 🙂
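
If you prefer doing that part from the ESXi command line rather than the vSphere client, something like this should work from the ESXi shell. Treat it as a sketch: vmhba33 is just an example adapter name (check “esxcli iscsi adapter list” on your host), and the address is the virtual IP from the [global] section of the configuration above.

esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=192.168.0.10
esxcli storage core adapter rescan --adapter=vmhba33

After the rescan, the ScaleIO volume should show up as a new device that you can format with VMFS.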

If you’re running it locally on your laptop, you can start up another VM, install only the SDC on it, and map the ScaleIO volumes you created to it. How about storing your MySQL database there? Maybe run your postfix mail server there? Or maybe set up a cool load balanced cluster of nginx web servers with one volume each, spreading the load across not only CPU and RAM but also disk? It’s entirely up to your imagination what type of applications you want to put on it – what interesting use cases can you come up with?
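
As a small sketch of that last part: once a volume is mapped to your new SDC it shows up as a local block device inside that VM. On my nodes the ScaleIO devices get scini names, but treat that and the MySQL paths below as assumptions and check ls /dev/ or dmesg on your own setup after mapping.

# find the new ScaleIO block device (scini* on my nodes - verify on yours)
ls /dev/scini*
# put a filesystem on it and mount it
mkfs.ext4 /dev/scinia
mkdir -p /mnt/scaleio-vol1
mount /dev/scinia /mnt/scaleio-vol1
# example use case: move the MySQL data directory onto the ScaleIO volume
service mysql stop
rsync -a /var/lib/mysql/ /mnt/scaleio-vol1/mysql/
# then point datadir in my.cnf at /mnt/scaleio-vol1/mysql and start MySQL again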


My IT predictions for 2014

Disclaimer: These are my predictions, not those of my employer. And yeah, I might be a bit late to the 2014 predictions, still wanted to share though.

During 2013, a lot changed on the playing field of IT. Software-Defined Everything started popping up everywhere, and Software-Defined Networking was all of a sudden split into two camps, with VMware on one side and Cisco on the other. Seeing as they are great partners to each other, I believe they will support each other’s strategy in the long run and continue to develop their own infrastructure (software/hardware) and partner ecosystems. Choice and open competition usually mean a better product for the end user, and I think this is the case here as well.

So in 2014 we’ll see these strategies take more shape than before, but I don’t think it’ll stop there. I also saw 2013 as the year-before-the-boom of automation. Everyone talks about it, many have products for it, and they’re now ready for the mass market. No longer needing to rely on old handwritten scripts for new application deployments and for maintaining the ones you already have, we’ll see a workforce in IT delivering an even better service to their organisation. With automation, we can lower the human error percentage (there will still be a percentage) and make things faster and more consistent. And go home at 5PM, spend time with our families and friends, maybe have a glass of our preferred beverage at the local pub. All in all a win-win for everyone, IMHO.

While this is happening, and IT departments are changing in the direction of Software-Defined infrastructure and automation, I see this as one of their first steps towards a long-term partnership with service providers. As IT departments realise that their own infrastructure isn’t that far from an SP’s, and technologies for connectivity have advanced beyond just a regular VPN, there can be massive savings to be had. Instead of maintaining their own infrastructure, cooling systems, reserve power, diesel generators, datacenter facilities etc., I think many IT departments will move portions of their current portfolio of applications out to an SP. And then spend the savings on educating their personnel to be even more efficient and responsive to the real business needs. Once again, a win-win.

And while they’re at it, I hope they look into other alternatives for their applications as well, making sure that they don’t maintain a legacy app just because no one knows of any other options. There are many great alternatives to a lot of legacy software that can easily be used, both on-prem and online at Software-as-a-Service vendors. And those vendors are experts at that specific software, so you don’t need to be an expert at setting it up, configuring it, securing it and so on, meaning parts of the internal IT department can become something other than SharePoint Workflow experts (for example), which will benefit both the IT organisation and the business. Win-win.

So yes, I am predicting general broad strokes in where the industry is headed. Technology-wise we all know there will be marvellous inventions and solutions, and I can’t wait to get my hands on them and try them out, but I am not the right person to predict those. Perhaps you are? Do you agree or disagree with my predictions? Please leave a comment.


Splunk on VMware and EMC ScaleIO – A quick index performance test

For the last few weeks I’ve been getting acquainted with Splunk, a powerful tool for searching, analysing and visualising logs and events that happen in your infrastructure, live application performance and any type of machine-generated data. I read the performance blog post that Splunk had previously done on physical bare-metal hardware and Amazon EC2 instances, and wanted to see what I could get in a virtual environment on top of EMC’s scale-out block storage ScaleIO (which I’ve written several posts on here).

Generally speaking, virtualising Splunk has been frowned upon as Splunk consumes a lot of resources, more and more as you add more data ingestion and more searches. Physical bare-metal servers have been the de facto standard for Splunk servers for years, but I still wanted to see what we could do with virtual instances of it. Here’s the setup:

4 Splunk 6.0 servers, configured in a VMware environment with 12 vCPUs and 12 GB RAM as is recommended in the Splunk Enterprise installation guide. 
Each Splunk server has a ScaleIO volume attached to it for the entire /opt/splunk directory, containing the Splunk installation and all log and index files.
These ScaleIO volumes are running on top of EMC’s XtremSF PCIe Flash cards.

For the tests I used a standard tool for performance testing of Splunk, namely Splunkit. This tool can be used to generate a large log file, which Splunk then indexes as the test.

To configure Splunkit like I did, edit the file called “pyro.properties” like this:

### SPLUNKIT PROPERTIES ###
# SPLUNK_HOME, the absolute path to the Splunk installation on this machine,
# e.g: on Linux: /home/user/splunk, usually ending in "/splunk"
# e.g. on Windows: C:\Program Files\Splunk

SPLUNK_HOME = /opt/splunk

# Host or IP of the server machine (this machine), as seen by the SplunkIt search user
# Server test process will bind to this address
# User's server_host (defined in splunkit-user/pyro.properties) must match this for proper test operation
# If left blank, will default to this machine's hostname

server_host = 127.0.0.1

# Admin-level login credentials of the Splunk instance
username = admin
password = yourpasswordhere

static_filesize_gb = 150

Then, create the log file by running the following command in the splunkit-server directory:

python bin/gendata.py

When the data has been generated, start the index test by running this command in the same directory:

python bin/indextest.py

Now log in to your Splunk instance, go to the Splunk-on-Splunk tab, and you should see something like this:

[Screenshot: estimated indexing rate graph from the Splunk-on-Splunk view]

That graph shows the current estimated indexing rate, which is always interesting (this one shows close to 30,000 KB/sec). But if you want to compare your indexing performance to other benchmarks, you can click the “View results” link to get to another search, and enter the following search term:

index=_internal host="localhost.localdomain" source="*metrics.log" eps="*" group=per_index_thruput series=splunkit_idxtest

This will give you a view of your current “eps”, events per second, which you can then compare to other benchmarks like the ones I mentioned at the beginning of this post.
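
If you’d rather get a single average figure than eyeball the individual metrics events, one option is to tack a stats command onto the same search (plain Splunk search language, nothing specific to this setup):

index=_internal host="localhost.localdomain" source="*metrics.log" eps="*" group=per_index_thruput series=splunkit_idxtest | stats avg(eps) AS average_eps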

So what eps values did I get out of my virtualised Splunk Enterprise environment? Pretty good ones I must say. And note that this is on ScaleIO shared scale-out block storage, not individual independent local drives in each server. Also, it’s one volume per server, not a striped volume across multiple virtual drives. So no LVM or anything like that, and regular ext4 filesystems without any tuning. Your basic server setup, so to speak 🙂

System          Splunk Version   Virtual Hardware       Average EPS
Splunk-Index1   6.0              12 vCPUs, 12 GB RAM    86931 eps
Splunk-Index2   6.0              12 vCPUs, 12 GB RAM    90242 eps
Splunk-Index3   6.0              12 vCPUs, 12 GB RAM    87199 eps
Splunk-Index4   6.0              12 vCPUs, 12 GB RAM    92792 eps

So as you can see, we’re surpassing the performance numbers of the tests mentioned before, which is great! However, it will be even more interesting when we continue with massive log input and then add searches on top, to see if we can maintain performance or not. And according to the performance numbers we get from the ScaleIO environment (see below), we’re nowhere near saturated on disk right now, which hopefully means that we can squeeze in the searches without a heavy impact on the indexing performance.

[Screenshot: ScaleIO dashboard during the indexing test]

Stay tuned for the next installment of these posts 🙂


Automate a ScaleIO lab setup with Vagrant and VirtualBox

If you’ve read the other blog posts on ScaleIO you might be interested in running it yourself. However, you might not have your own hardware lab to run it on, but you do have a laptop or desktop, right? Awesome! That’s all you need, and we’ll go through how to get it up and running by using some really smart tools.

If you just want to see how it runs without installing anything, here’s the entire automated setup captured in asciinema, one of my new favourite tools:

http://asciinema.org/a/6543

The first tool we’ll use is VirtualBox, a freely available and open source virtualization solution (yes, no money needed to get it, but please contribute to the development!) for Windows, OS X, Linux and Solaris. Download it, install it, and that’s it. No configuration is needed unless you want to change any of the defaults we’ll be working with. It is a really good virtualization solution and I’ve been using it for years next to my VMware Workstation and VMware Fusion installations.

Next up is Vagrant, an awesome tool for automating the creation and configuration of VMs running in VirtualBox, VMware Fusion, AWS and others. It runs on Windows, OS X and Linux as well, so no matter how you spell your favourite OS you’ll be able to use it. Download it, install it and you’re ready to go. No configuration needed there either, as all the settings we’ll use with Vagrant will be in a so-called Vagrantfile.

If you want to try Vagrant and VirtualBox before we get to the ScaleIO deployment, you can create a folder called “vagrant”, open your terminal/command window into that folder, and run the following commands to install and start a recent Ubuntu distribution automatically:

vagrant box add saucy http://cloud-images.ubuntu.com/vagrant/saucy/current/saucy-server-cloudimg-amd64-vagrant-disk1.box
vagrant init saucy
vagrant up

Vagrant will now download a “Cloud Image” of Ubuntu 13.10, initialize the Vagrant directory with a Vagrantfile and start the VM. After it’s booted, you can ssh to it with the following command:

vagrant ssh

That’s it! Now you have a fully installed Ubuntu 13.10 VM where you can do whatever you want, and all you’ve done is issue a few commands 🙂

Ok, now that you’ve become somewhat comfortable with VirtualBox and Vagrant, let’s move on to the ScaleIO lab setup. All you need for this is the ScaleIO installation package that you can find on support.emc.com. Unpack it and you’ll find a folder called CentOS_6.X containing a file called “ECS-1.20-0.357.el6-install”, which is the most recent version at the time of this writing.

Create a new directory called “scaleio” somewhere on your computer and copy the installation file there. As you saw in the example above, you will also need a Vagrantfile to actually get your VMs up and running, and instead of letting you figure that out by yourself I am providing such a Vagrantfile for your use here. It comes with no warranty, and I’m not responsible if your computer breaks in any way 🙂

When you have all that, your “scaleio” folder should look like this:

$ ls
ECS-1.20-0.357.el6-install Vagrantfile

That’s all you need! Crazy, I know. But if you look at the Vagrantfile, you’ll see that we are in fact doing a lot of things in there. First, we’re defining three VMs (3 nodes is the minimum for a ScaleIO environment), setting static IPs on them, and running a really long shell command on each node that automatically installs and configures ScaleIO, using a truncated 100GB file as the SDS device on each node, and creates an 8GB volume on it. There are no clients defined outside the ScaleIO environment; I’m leaving that as an exercise for you, dear reader.
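
To give you an idea of the structure before you open it, here’s a stripped-down sketch of what such a Vagrantfile looks like. This is not the actual file linked above: the box name, the third node’s name, the .11/.12 addresses and the inline command are placeholders.

# -*- mode: ruby -*-
nodes = {
  "mdm1" => "192.168.50.10",
  "mdm2" => "192.168.50.11",
  "tb"   => "192.168.50.12",
}

Vagrant.configure("2") do |config|
  config.vm.box = "centos-6-x86_64"   # any CentOS 6.x base box
  nodes.each do |name, ip|
    config.vm.define name do |node|
      node.vm.hostname = name
      node.vm.network "private_network", ip: ip
      # the real file runs a long inline shell command here that installs the
      # ScaleIO packages, creates the truncated 100GB file and configures the node
      node.vm.provision "shell", inline: "echo ScaleIO setup goes here"
    end
  end
end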

One thing that needs to be changed in the Vagrantfile is the string YOURLICENSEHERE in the long string of commands at the bottom of the file. Add your own ScaleIO license there and you’re done; now run the following command to bring up the entire ScaleIO environment:

vagrant up

This will take a while, so go grab a coffee and relax. I highly recommend using an SSD drive for this; if you don’t have one already, isn’t it time you got one? Anyway, after the environment has been set up and is running, you can do the following to connect to the first MDM:

vagrant ssh mdm1

Then issue this command to verify that the install was completed correctly:

sudo scli --query_all --mdm_ip=192.168.50.10

You should see output similar to this:

[vagrant@mdm1 ~]$ sudo scli --query_all --mdm_ip=192.168.50.10
ScaleIO ECS Version: R1_20.0.357
Customer ID: XXXXXX
Installation ID: XXXXXXXXXXXXX
The system was activated 0 days ago
Rebuild network data copy is unlimited
Rebalance network data copy is unlimited
Query all returned 1 protection domains

Protection domain pdomain has 1 storage pool, 3 SDS nodes, 1 volumes and 112 GB (114688 MB) available for volume allocation
Rebuild/Rebalance parallelism is set to 3
Storage pool Default has 1 volumes and 112 GB (114688 MB) available for volume allocation

SDS Summary:
3 SDS nodes have Cluster-state UP
3 SDS nodes have Connection-state CONNECTED
3 SDS nodes have Remove-state NONE
3 SDS nodes have Device-state NORMAL
276.3 GB (283026 MB) total capacity
229.7 GB (235268 MB) unused capacity
0 Bytes snapshots capacity
16 GB (16384 MB) in-use capacity
16 GB (16384 MB) protected capacity
0 Bytes failed capacity
0 Bytes degraded-failed capacity
0 Bytes degraded-healthy capacity
0 Bytes active-source-back-rebuild capacity
0 Bytes pending-source-back-rebuild capacity
0 Bytes active-destination-back-rebuild capacity
0 Bytes pending-destination-back-rebuild capacity
0 Bytes pending-rebalance-moving-in capacity
0 Bytes pending-fwd-rebuild-moving-in capacity
0 Bytes pending-moving-in capacity
0 Bytes active-rebalance-moving-in capacity
0 Bytes active-fwd-rebuild-moving-in capacity
0 Bytes active-moving-in capacity
0 Bytes rebalance-moving-in capacity
0 Bytes fwd-rebuild-moving-in capacity
0 Bytes moving-in capacity
0 Bytes pending-rebalance-moving-out capacity
0 Bytes pending-fwd-rebuild-moving-out capacity
0 Bytes pending-moving-out capacity
0 Bytes active-rebalance-moving-out capacity
0 Bytes active-fwd-rebuild-moving-out capacity
0 Bytes active-moving-out capacity
0 Bytes rebalance-moving-out capacity
0 Bytes fwd-rebuild-moving-out capacity
0 Bytes moving-out capacity
16 GB (16384 MB) at-rest capacity
8 GB (8192 MB) primary capacity
8 GB (8192 MB) secondary capacity
Primary-reads:                        0 IOPS 0 Bytes per-second
Primary-writes:                       0 IOPS 0 Bytes per-second
Secondary-reads:                      0 IOPS 0 Bytes per-second
Secondary-writes:                     0 IOPS 0 Bytes per-second
Backward-rebuild-reads:               0 IOPS 0 Bytes per-second
Backward-rebuild-writes:              0 IOPS 0 Bytes per-second
Forward-rebuild-reads:                0 IOPS 0 Bytes per-second
Forward-rebuild-writes:               0 IOPS 0 Bytes per-second
Rebalance-reads:                      0 IOPS 0 Bytes per-second
Rebalance-writes:                     0 IOPS 0 Bytes per-second

Volume Summary:
1 volume. Total size: 8 GB (8192 MB)

I would also recommend pointing the ScaleIO dashboard, found on mdm1 and mdm2 at /opt/scaleio/ecs/mdm/bin/dashboard.jar, at your new cluster. Just copy the dashboard.jar file to your desktop, and if you haven’t changed the IP addresses set in the Vagrantfile you can point it at 192.168.50.10 and get the following dashboard image:

[Screenshot: ScaleIO dashboard connected to the three-node Vagrant cluster]
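
Since Vagrant shares the project folder into each VM as /vagrant by default, one easy way to get the dashboard.jar out of the VM and run it on your workstation is the following (assuming you have Java installed locally; the jar ends up in the “scaleio” folder you created earlier):

vagrant ssh mdm1 -c "sudo cp /opt/scaleio/ecs/mdm/bin/dashboard.jar /vagrant/"
java -jar dashboard.jar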

And there you go, you now have a complete three-node ScaleIO cluster up and running on your own computer, where you can write data, read data, fail nodes and so on. Play around with it, please comment on improvements you would like to see, and if you’re editing or adding functionality, please let me know. Enjoy!


Microsoft SQL server testing with ScaleIO on VMware

What ScaleIO is and how it works has been covered in earlier blog posts here, here, here and here, and this post is a follow-up with some of the performance testing we did with Microsoft SQL server on top of ScaleIO in a VMware environment.

We used the ScaleIO setup described earlier in this post, and we wanted to see how well the ScaleIO storage solution worked with a non-scale-out application workload such as Microsoft SQL server. The setup of the MSSQL servers looked like this:

[Diagram: the MSSQL server setup on top of ScaleIO]

To run the tests we used TPC to generate a workload on the four MSSQL servers we had set up. We started with these SQL servers on a VNX, in a mixed storage pool with some other VMs running as well. A normal situation in many respects, and we wanted to see what would happen if we moved those SQL workloads to a separate environment. We expected increased performance of course, and that’s exactly what we got!

The picture above shows the Disk reads/sec (thick black line) that we had during the TPC tests. The left part is from when we were running on the VNX with _one_ MSSQL workload; the right shows it running on ScaleIO with _four_ simultaneous workloads! So yes, we’re not comparing apples to apples, but rather showing what you can expect when moving to a ScaleIO environment.

This second picture shows the Disk latency, once again with one workload on the VNX to the left and four workloads on ScaleIO to the right. Even though we added more workloads the latency numbers were still really good. Pretty cool, right?!

Here’s the ScaleIO dashboard during the test of four SQL servers running the TPC workload, showing a consistent 45000 IOPS being handled by the underlying storage infrastructure:

[Screenshot: ScaleIO dashboard showing a consistent 45000 IOPS during the TPC test]

So not only did we drastically increase the amount of data that could be handled by being able to add more MSSQL servers, we also lowered the latency so much that we could actually run even more work on these nodes. The test was based on 40 users per SQL instance, and we can see here that we could easily increase that when running on top of ScaleIO. This also means you can easily migrate from an existing environment to ScaleIO using VMware Storage vMotion and do some testing on your own 🙂

In conclusion, a properly architected ScaleIO platform will provide similar, and likely better, throughput and latency compared to traditional storage. The ScaleIO architecture keeps your data closer to your processors while leveraging high-performance SSDs and, in our test case, PCIe flash cards for optimal performance.

I want to thank Ed Walsh, Jase McCarty, Cody Hosterman and Txomin Barturen for all their help with setting up the environment and running the tests.

So, what workloads would you like to run on ScaleIO? Add your wish list below in the comment field!


Increasing and measuring the performance of your ScaleIO environment

This post is a follow-up to the ScaleIO how-to posts that have been published here and here.

Now that you have your ScaleIO environment up and running after following the posts above, of course you want to see what kind of performance you will get out of it. There are many ways to go about doing this, and I’ll show you one method I’ve been using and also some of the results of said tests.

ScaleIO is handled a bit differently in a VMware environment, where it’s using an iSCSI connection instead of the native ScaleIO protocols. Because of the extra layers that are used (underlying storage (HDD/SSD/PCIe)->VMFS->VMDK->ScaleIO->iSCSI->VMFS->VMDK->Guest FS) you will probably see different performance numbers in a VMware environment compared to a physical one.

However, the method I’ve been using can be used for both physical and virtual ScaleIO environments, so read on.

To get the best performance out of your ScaleIO environment, a few settings need to be changed on the ScaleIO VMs first.

First of all, jumbo frames. Enable them using the following command:

ifconfig eth0 mtu 9000

It’s also recommended to increase the txqueuelen value from 1000 to 10000, like this:

ifconfig eth0 txqueuelen 10000

These settings are reset on reboot though, so please add them to your network configuration files if you want to keep using them after a reboot.
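
The ScaleIO VMs are built on an el6-based Linux (judging by the installer package), so one way to persist the settings, assuming CentOS/RHEL 6 style configuration files, is:

# persist the MTU in the interface configuration
echo 'MTU=9000' >> /etc/sysconfig/network-scripts/ifcfg-eth0
# txqueuelen has no ifcfg keyword, so set it from rc.local at boot
echo 'ifconfig eth0 txqueuelen 10000' >> /etc/rc.local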

Here’s a protip! If you want to run the same command on all your ScaleIO nodes, there’s a tool called admincli.py on the MDM nodes that you can use, like this:

/opt/scaleio/ecs/mdm/diag/admincli.py --command "ifconfig eth0 mtu 9000"

When I created my ScaleIO VMs I created VMDKs on top of a VMFS volume on top of the underlying storage. These VMDKs should always be created using Eager Zeroed Thick. I also used a paravirtualized SCSI adapter here; it’s not included in the official user guide but it seems to have increased performance a bit more. You can also play with increasing the vCPU count from 2 to 4 for a bit more performance, but of course that eats up more CPU. Don’t touch the RAM though, you probably won’t ever need more than the 1GB that’s allocated.

If you are using SSDs or PCIe Flash as underlying storage, the recommendation when installing using the scripted way is to use the profile for SSDs. To do that, you run the following command during installation:

./install.py --all --vm --license=YOURLICENSEHERE --profile ssd

However, if you’ve already installed ScaleIO on top of your SSDs and would like to add the correct SSD configuration to your already existing environment, add the following into your /opt/scaleio/ecs/sds/cfg/conf.txt file on each SDS node:

tgt_net__recv_buffer=4096
tgt_net__send_buffer=4096
tgt_cache__size_mult=3
tgt_thread__ini_io=500
tgt_thread__tgt_io_main=500
tgt_umt_num=1200
tgt_umt_os_thrd=6
tgt_net__worker_thread=6
tgt_asyncio_max_req_per_file=400

Then restart your SDS by issuing the following command on each node:

pkill sds

The test bed I have consists of 4 ScaleIO VMs, each using an XtremSF Flash PCIe card as backend storage. I’ve created one volume of 2TB, given my 4 ESXi-hosts access to it, formatted it with VMFS5 and created 4 Ubuntu VMs with one drive each located on top of that ScaleIO volume. That second VMDK is also created using Eager Zeroed Thick.

In each Ubuntu VM, I’ve installed the load-generating tool “fio”, which makes it easy to set things like block size, read/write percentage, whether the I/O should be random or not, etc. Here’s an example fio configuration file:

[4k_random_read_90] 
# overwrite if true will create file if it doesn't exist
# if file exists and is large enough nothing happens
# here it is set to false because file should exist 

#rw=
#   read        Sequential reads
#   write       Sequential writes
#   randwrite   Random writes
#   randread    Random reads
#   rw          Sequential mixed reads and writes
#   randrw      Random mixed reads and writes
rw=randrw

# ioengine=
#    sync       Basic read(2) or write(2) io. lseek(2) is
#               used to position the io location.
#    psync      Basic pread(2) or pwrite(2) io.
#    vsync      Basic readv(2) or writev(2) IO.
#    libaio     Linux native asynchronous io.
#    posixaio   glibc posix asynchronous io.
#    solarisaio Solaris native asynchronous io.
#    windowsaio Windows native asynchronous io.
ioengine=libaio

# direct If value is true, use non-buffered io. This is usually
#        O_DIRECT. Note that ZFS on Solaris doesn't support direct io.
direct=1

# bs The block size used for the io units. Defaults to 4k.
bs=4k

# nrfiles= Number of files to use for this job. Defaults to 1.
#filename - Set the device special file you need
filename=/dev/sdb
size=200g
iodepth=64
numjobs=4
rwmixread=90

Paste the content above into a file and name it “fio_4k_random_read_90” so you’ll know it’s for 4KB blocks, random read/write with an R/W ratio of 90/10. Then run it like this:

fio fio_4k_random_read_90

When running the fio workload from one Ubuntu VM, you will see some performance numbers immediately, and probably really good ones at that. What’s really cool though is that when you run more than one fio workload, you’ll most probably see even more performance coming out of those HDDs/SSDs/PCIe cards that you have. So start up your engines!

When measuring performance, it’s easy to get lost in all the numbers flying by when using fio, so I suggest using the included ScaleIO dashboard. You can find the dashboard on the ScaleIO VM itself, located under /opt/scaleio/ecs/mdm/bin/dashboard.jar. Just copy that to your own workstation and run it from there. When it has started, point it at your ScaleIO cluster IP; no password is needed.

When connected, you’ll see something similar to this:

Yup, that’s one hundred thousand IOPS being handled by 4 ScaleIO VMs! Pretty crazy considering many other storage solutions would love to have numbers like this, and here we are with just 4 virtual machines and a few flash cards. I can definitely see a future for this product, what do you think?

Please comment below with your setup and your results!


How to install ScaleIO in a VMware environment – Part 2

To continue our setup of ScaleIO, we’ll need to go through the full installation using the files and scripts you’ve already copied over to the ScaleIO VMs in the previous part of this how-to. There are two common ways of installing ScaleIO, either manually or using the install script, and we’ll use the latter for this how-to.

First off, create a new VMDK on each ScaleIO VM that will be used as your ECS storage. The VMDK needs to be bigger than 30GB, and for a production workload it needs to be on local storage. Also, make sure the VMDK is created using Eager Zeroed Thick, not Lazy Zeroed or Thin Provisioned. Then reboot/start up the ScaleIO VM.
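
If you’d rather create that VMDK from the ESXi command line than through the vSphere client, vmkfstools can do it. The size and datastore path below are just examples, and you still need to attach the new disk to the ScaleIO VM afterwards:

vmkfstools -c 100G -d eagerzeroedthick /vmfs/volumes/local-datastore1/ScaleIOVM1/scaleio-data.vmdk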

Start the install script by running the following command in the /opt/scaleio/siinstall/ECS folder:

./install.py --make_config --vm

After you’ve given a password and acknowledged that you want to create a site.cfg configuration file, you’ll be presented with the following picture.

As you can see, there will be several pieces that need to be configured and you need to go through each piece. It won’t take that long 🙂
We’ll start with the MDM cluster, where you will need a minimum of 4 IP addresses (two for the MDMs, one for the Tie-Breaker and one for our virtual IP).
The Meta-Data Manager (MDM) is responsible for knowing where each chunk of data is located, and the Tie-Breaker (TB) makes sure that we don’t run into a split-brain scenario. So these are essential functions and parts of the secret sauce that makes up ScaleIO 🙂

As you remember from the earlier post, you will need to create two Meta-Data Managers, one Tie-Breaker and one Virtual IP. So let’s get to it!

Go through the configuration of Primary and Secondary MDM like this:

Then continue with the Tie-Breaker:

And finally create a virtual IP, which will be used for things like management functionality and dashboards later. Yes, dashboards. Awesome, beautiful dashboards, which will be covered in the next part of this how-to.

Ok, so now the MDMs, TB and Virtual IP are all configured, let’s move on to the Protection domains.

A Protection domain is a logical construct that lets you divide your environment up into different failure domains. Let’s say you have 20 servers where you’ll store a certain type of data and another 10 servers where you’ll store another type of data; you could then create one Protection domain for each type of data, like Exchange and VMware for instance. That way you separate the data and create smaller failure domains that are easier to manage. Here’s an example of what it might look like in a larger environment:

So, let’s create a Protection domain for our VMware environment:

Continue with adding a Protection domain, edit it (it might say Edit Initiator, but it’s still editing the Protection domain) and add a Storage Pool to that domain. I’m creating a pool called pool1:

And done! Yup, it’s that easy. We’ll use this Protection domain later on in our configuration, so remember the name you’ve given it and the pool you defined.

Now let’s create our Storage-Device Servers (SDS). For a VMware environment, it’s recommended that there’s one ScaleIO VM on each host with local data, installed as both an SDS and a Storage-Device Client (SDC). The SDS will manage the underlying storage, while we will connect our VMware environment through an iSCSI target to the SDC. Essentially, the data will flow like this:

VMware iSCSI initiator <-> ScaleIO iSCSI target <-> SDC <-> SDS

Start by defining an SDS for each VMware host (essentially it’ll be every ScaleIO VM you’ve deployed):

Once you’ve defined all the SDSs, continue to the configuration of each (we’re using /dev/sdb as our storage device here, as that’s the new VMDK you created at the top of this post; change it if you’re seeing something else). The password that’s given here is the root password, so we can actually log in to the SDS and install the packages.

Next up, define your iSCSI initiators. You can find the iSCSI initiator information here in your VMware environment:

Now go ahead and add all the iSCSI initiators that will access this ScaleIO environment.

Ok, so far so good. Let’s continue with the SDCs, which will be the bridge between our iSCSI environment and the SDSs. Do the same as for the SDSs: define one SDC for each ScaleIO VM and then configure it:

Now it’s time to put it all together, and create a volume that we’ll use as our storage. This is where all your hard work pays off 🙂

Oh yeah! Now we’re on a roll! It’s time to finish up with some monitoring as well before we run the install:

And DONE! If you have any RED or YELLOW parts left in the configuration GUI, please get those fixed before proceeding, otherwise it should look like this. Exit and SAVE your configuration 🙂

Ok, let’s go ahead and run the install now. Run the following command to start the installation:

./install.py --vm --all --license=YOURLICENSEHERE

The installation will now start and go through the deployment and configuration of packages on each ScaleIO VM, and hopefully it’ll look something like this in the end:

Voila! All done! Now all that’s left is to rescan your iSCSI Software Adapter in your VMware environment, and you should see a new ScaleIO Device pop up. Go to Storage->Add Storage and add your new ScaleIO device just like any other LUN 🙂

Next post will cover the dashboard functionality and some basic performance testing.

Have fun!


How to install ScaleIO in a VMware environment – Part 1

About a week ago, ScaleIO 1.2 was released and it’s been very well received by existing customers and curious newcomers. If you are one of the latter, I think you’ll like this post, as we’ll go through how to install ScaleIO 1.2 in a VMware environment and use it as a scalable storage solution for virtual machines stored on VMFS.

ScaleIO 1.2 comes in two parts, an OVA and an installation script. This might seem strange at first glance, but it actually makes it easier to upgrade to a new version by just using a new install script instead of having to provision new OVAs everywhere. Both parts can be downloaded here.

Once you’ve downloaded both parts, deploy at least three ScaleIO VMs. These three VMs will create the base of your ScaleIO environment, containing two Meta-Data Managers (MDM) and one Tie-Breaker (TB). The MDMs will handle the information about where all blocks are stored, and the Tie-Breaker will be used in case of a split-brain scenario. You can have more MDMs, but a minimum of two is required.

These ScaleIO VMs can also be used as storage device servers (SDS), which is what we’ll do in this scenario. ScaleIO is flexible enough to let you have dedicated physical or virtual hardware for each service in a ScaleIO environment, or to run several services on each machine. For this how-to we’ll be running several services on each server.

So, our minimal setup will look like this:

  1. ScaleIOVM1 – MDM, SDS
  2. ScaleIOVM2 – MDM, SDS
  3. ScaleIOVM3 – TB, SDS

Once you’ve deployed the ScaleIO VMs, you’ll notice you’re presented with just a login prompt. Log in as root with the password admin. What happens next is a basic setup screen, where you need to enter information as per the picture below.

Next, you’ll add NTP information; choose your NTP server and your timezone, it’ll look like this when done.

When that’s done, you’re dropped back at the root prompt. Remember how the ScaleIO installation is divided into two files, an OVA and an installation script? Well, you’re now done with the first one. Easy, right?

So, on to the installation script. Copy it using scp/pscp or a similar tool to one of the ScaleIO VMs. That’s right, just putting it on one of them is enough, as the installation will talk to all the other nodes during deployment. It doesn’t matter which one you put it on, as long as you know where you’ve put the files. Copy it to the existing /opt/scaleio/siinstall/ folder.

So, you’ve copied the file over, now it’s time to run it to extract the installation files.
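
As a concrete sketch of those two steps (the VM address is a placeholder, and I’m assuming you may need to mark the package as executable before running it):

scp ECS-1.20-0.357.el6-install root@<scaleio-vm-ip>:/opt/scaleio/siinstall/
ssh root@<scaleio-vm-ip>
cd /opt/scaleio/siinstall/
chmod +x ECS-1.20-0.357.el6-install
./ECS-1.20-0.357.el6-install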

There’s now a new folder called ECS that contains all the necessary ScaleIO files for a new deployment. Enter that folder and run the help command, and you’ll see there are several options for the install script.

We want to create a configuration file for our deployment (and it’s going to be deployed in a VMware environment, but that’s not important now), so run the following command:

./install.py --make_config --vm

When you run this command, you’ll be presented with a basic GUI, letting you go back and forth between menus and create your configuration. The first thing you’ll do is set a password (I use admin everywhere to make it easy for me to remember) and tell it to create a configuration file. If you already have a config file (like the one I will provide in the last part of this how-to), it’ll automatically load that up and you can reconfigure it through the menus.

Once inside the menu, you’ll find several choices. My suggestion is to go through them one by one, as I’ll explain in the next segments of this how-to.

Before going to the next section of this how-to, make sure you’ve enabled the iSCSI initiator on your ESXi servers. You already know how to do that, right?
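
If you’d rather do it from the command line than the vSphere client, this is one way from the ESXi shell (no rescan is needed yet; we’ll do that once the ScaleIO iSCSI targets exist):

esxcli iscsi software get
esxcli iscsi software set --enabled=true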

Stay tuned for the next post!
