The hidden pitfalls of server virtualization

Where there’s buzz, there’s bullshit. Slashing hardware costs and the time I spend herding servers makes virtualization sound like a silver bullet, but I have no doubt that pulling it off is not as easy as the salespeople tell me. I’ve spent some time researching and thinking it through, and have come up with a few things that I need to keep in mind when planning to get into virtualization properly.

Will I really spend less on hardware?

So maybe I can replace 4 or more servers with 1, but none of the boxes I currently have can handle that, so I either need to beef some up or buy newer, bigger boxes. Either way there is probably some cash outlay. Since I’ve already paid for the boxes I’m cutting out, I may not actually save money even if I end up using fewer boxes.

Do I know how much the hardware can handle?

There’s no guarantee I really can fit 4 server images onto a single box and get the same performance I had on 4 different boxes.

Let’s look at my Java application servers, which are the hungriest animals in my farm. They absolutely must have 2 gigs of RAM each, or they’re not gonna work. They actually can’t make use of more than this, due to the nature of the 32-bit JVM, so in theory they are an excellent candidate for virtualization.

So I could pimp a few of my current single-CPU, 2 gig servers with 8 gigs of RAM and 2 dual-core CPUs each. Or I could satisfy my techno-lust and get servers with 2 quad-core processors each, and 16 gigs of RAM.

In theory my pimped server will have the same power as 4 of my old servers, and the new quad-cored beast could run 8 servers. Now that would be consolidation!
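The sums behind that theory are easy to sketch. This is only the naive version: it assumes each virtual server needs 2 gigs of RAM and one core (my guess at what a single-CPU box translates to), and it ignores hypervisor overhead entirely. The names `Box` and `max_vms` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    ram_gb: int
    cores: int


def max_vms(box, vm_ram_gb=2, vm_cores=1):
    """Naive VM count: limited by whichever of RAM or cores runs out first.

    Assumes 2 GB and 1 core per VM; ignores hypervisor overhead and I/O.
    """
    return min(box.ram_gb // vm_ram_gb, box.cores // vm_cores)


# The two candidate specs from the text above.
pimped = Box("pimped old server", ram_gb=8, cores=4)    # 2 dual-core CPUs
beast = Box("new quad-core beast", ram_gb=16, cores=8)  # 2 quad-core CPUs

for box in (pimped, beast):
    print(f"{box.name}: {max_vms(box)} VMs")
```

This gives 4 for the pimped server and 8 for the beast, which is exactly the on-paper figure and nothing more.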

But there’s more to hardware than CPU and RAM, so when I load up one of these beasts I could find that the network or disk I/O are a bottleneck, keeping all of my virtual images crawling.

The point is, I don’t know how many of my VMs a server can handle until I try it. It’s even more difficult in the wonderful world of virtualization, where a box may be running a mixture of web, app, and database servers along with infrastructure services like DNS and email. I might actually get better performance with a mix of servers than a monoculture where all of the VMs are trying to do the same type of thing and competing for the same hardware resources.

So rule number 1, before I spec out and price up my new virtualized server farm and get it signed off by the bosses, is to do plenty of testing to get an idea of what hardware I’ll really need. I ought to test the pimped version of my existing servers, as well as higher specced boxes, maybe with different characteristics. What happens if I use hopped up caching RAID controllers, or high end network cards, or fibre channel rather than iSCSI for my shared storage?

Hardware restrictions

One interesting tidbit I ran across when googling for virtualization limitations is that some of the groovier features can be picky about the hardware. In particular, the capability of shifting virtual machines between physical servers on the fly may require that the physical hardware be very similar, down to the CPU family and chipset. I’ve read this about VMware’s VMotion technology (although I’ve lost the original reference, sorry).

This means I can’t mix and match hardware, and could get trapped by legacy hardware that goes out of production. I can easily see ending up with several pools of hardware, and having to resort to old-fashioned manual methods to migrate servers between them. This would mean having to choose between giving up the productivity gains of easy automatic migration or chucking out slightly older, but still perfectly good hardware to buy a raft of newer kit.

What’s interesting about virtualization is that it is actually the opposite of the concept - epitomized by Google - of building a utility-style computing infrastructure on lots of cheap, commodity boxes. Instead, I will end up buying a few highly and carefully specified servers.

Don’t forget spare capacity for failover!

So let’s say I work out that I can trim my current farm to a quarter of its former size, 4 servers per box. My old farm had failover servers, or load balanced servers scoped to be able to cope with one of them going down. I may have virtual servers in my new pool which do the same thing, so if one virtual image crashes others will pick up the slack.

But what happens if one of my physical boxes goes down?

I now have to find homes for all of the virtual servers on that box, but if I’ve been over-efficient in sizing my hardware capacity I won’t have any place to put them. In practice, I will probably have virtual machines I can take offline in a crisis, like my staging servers, and of course the failover images.

But the point is, I need to think about this up front. I’ll work out what my tolerance for failure should be - is it enough to be able to cope with 1 server croaking, or should I have 2, or some percentage of my total?
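A rough sketch of that sizing question, under some loud assumptions: every box holds the same number of VMs, any VM can run on any box, and the farm of 16 virtual servers is a made-up example. `hosts_needed` and its parameters are hypothetical names, not part of any tool.

```python
import math


def hosts_needed(total_vms, vms_per_host, failures_tolerated, sheddable_vms=0):
    """Boxes required to run everything normally AND survive host failures.

    sheddable_vms: VMs (staging, failover images) I am willing to switch
    off in a crisis, so they need no home when a box dies.
    """
    # Normal operation: every VM needs a home.
    normal = math.ceil(total_vms / vms_per_host)
    # Degraded operation: only essential VMs, running on the surviving boxes.
    essential = max(total_vms - sheddable_vms, 0)
    degraded = math.ceil(essential / vms_per_host) + failures_tolerated
    return max(normal, degraded)


# Hypothetical farm: 16 virtual servers, 4 per box.
print(hosts_needed(16, 4, failures_tolerated=0))                   # no spare capacity
print(hosts_needed(16, 4, failures_tolerated=1))                   # survive one dead box
print(hosts_needed(16, 4, failures_tolerated=1, sheddable_vms=4))  # shed staging in a crisis
```

With these numbers, tolerating one dead box costs a fifth box, unless I can live without 4 of the VMs during the outage, in which case the original 4 boxes will do.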

The d’oh! of licensing costs

So if I work out that I will run 20 virtual servers on 4 boxes in a new server farm, then I only need to spend 20% of what I would spend on 20 separate boxes, right? Oops, no, if I’m running a commercial OS like Windows, Red Hat, or SUSE, I’ve still got to buy 20 licenses. Common sense, but easy to overlook when costing initially, and it would be embarrassing to have to go back and ask for the extra budget.
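To see how far off the 20% guess can be, here is a quick costing sketch. All the prices are invented placeholders, and it assumes one OS license per server image and that a new box costs the same as an old one (it probably costs more); plug in real quotes before showing anything to the bosses.

```python
def separate_cost(servers, box_price, license_price):
    """One physical box and one OS license per server."""
    return servers * (box_price + license_price)


def virtualized_cost(vms, hosts, host_price, license_price):
    """Fewer boxes, but still one OS license per virtual server."""
    return hosts * host_price + vms * license_price


# Placeholder prices, not real quotes.
BOX_PRICE = 2000
LICENSE_PRICE = 500

old = separate_cost(20, BOX_PRICE, LICENSE_PRICE)
new = virtualized_cost(20, hosts=4, host_price=BOX_PRICE, license_price=LICENSE_PRICE)
print(f"separate: {old}, virtualized: {new}, ratio: {new / old:.0%}")
```

With these made-up figures the virtualized farm comes to 36% of the original cost rather than 20%, and the gap grows as licenses get pricier relative to hardware.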

The real hidden cost …

OK, maybe I still have to pay for all those licenses (or I can just use Debian), but at least I only have 4 boxes to manage going forward, rather than 20. Phew!

Nope. That’s still 20 servers that need to be monitored, backed up, and patched, with disk space to manage, user accounts to keep up to date, configurations to change, etc.

Even worse, once I go to virtualization, I expect to expand my usage of virtual images to where I’ll have some that are kept offline until needed. So I can do certain staging, testing, and other exercises by bringing up images I need for a short while, then putting them back into cold storage. So even with a finely honed, well-automated infrastructure management system, I need to work out how these get updated. Do I cycle them into memory periodically to run updates on them, or have the update process (potentially a long one) run when they are brought online as needed?
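For the first option, the bookkeeping might look something like the sketch below: keep a record of when each offline image was last patched and periodically list the ones that have gone stale. The image names, dates, and 30-day threshold are all hypothetical, and actually booting and patching the images is left out.

```python
from datetime import date, timedelta


def stale_images(last_patched, today, max_age_days=30):
    """Return names of offline images not patched within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, patched in last_patched.items() if patched < cutoff)


# Hypothetical record of when each cold-storage image was last updated.
images = {
    "staging-web": date(2024, 1, 10),
    "test-db": date(2024, 3, 1),
    "load-test-app": date(2023, 12, 5),
}

# Images to cycle into memory for updates on this (made-up) date.
print(stale_images(images, today=date(2024, 3, 15)))
```

The second option needs no such list, but I pay for it with a slow start every time I pull an image out of cold storage.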

Conclusion

There is obviously plenty I need to think about when planning to go to virtualization. I’m sure there’s more I’m missing, and will learn the hard way. I do still think the payoff can outweigh the difficulties, but we’ll see as I go along!

kief.com
