Oh lordy what did I get myself into.
So I take a legacy virtual server. And by legacy I do mean legacy. We have to keep this one x86 (not x86-64, mind you) box around to support a VM called 'Xaero' (zay-ro). It's a Gentoo Linux guest that doesn't play nice on x86-64 chipsets. The guest clock runs at about 1.6 times normal speed - where on an x86 host 10 tick-seconds go by in 10 actual seconds, on an x86-64 host 16 tick-seconds go by in the same 10 actual seconds. This. Causes. Problems.
Not a major deal. I've worked with Virtual Machines before. Ahaha. AHahhahahaha. Hahahahaha. Oh no.
So the objective is this: take VS2 (2x 1.0GHz CPUs, 2.0GB RAM, 3x 72GB RAID5 disk array, 3x NICs), and upgrade her from the ancient rev of the OS (SuSE 10.0, GSX Server 2.2 - old) to the new and shiny, with a RAM and disk upgrade to boot. Since I'm modifying the array, I have to destroy the existing RAID container, which means we make an image. We always make an image. But before I do that, I have to export the VMs.
Beginning at roughly eight yesterday morning, I started rsyncing 120GB of Virtual Machine data off of VS2 and onto MDS-BVS10. Not a major deal, right? A dedicated gigabit backup LAN, solely for the purpose of moving large amounts of data back and forth. Should be simple, right?
Yeah, it finished at freaking NOON. Four times longer than I'd hoped. That's when I went into the office to begin the dirty work - making the backup image of the machine. Again, dedicated backup network? Five hours. I got stuff done in the in-between, mind you. Got a few other machines imaged, got a salesman's laptop squared away, even found time to grind my fishing up to a respectable point. Woo.
The time comes to begin the nuts-and-bolts. The VMs have been exported (and verified), and a backup image has been made (and verified). It's time to take her apart. In go three more 72GB disks and two additional gigabytes of RAM. I configure a new RAID container: 5x 72GB disks in RAID5, with one hot standby spare. No sweat. I have a lovely new 287GB logical drive. Perfect. Slicey, dicey. Installing Linux! An hour later, we have a box that I can administer remotely.
Now, why is this important? Well, you see, most computers have these fancy windowing systems. With Linux - in this case, SuSE 10.2 - you'd get KDE around here. And KDE is great. If you actually use it. But see, this is a virtual server. It has no real use for a window manager, so we opted to make it 'headless'. Don't get me wrong - it's a great idea when you want to squeeze every last drop of CPU horsepower out of a host. And the VMware Server Console makes remote administration a snap. It's really cool.
It's the conversion from yester-yester-years virtualization that's going to be the death of me. That, and a blade server with a faulty NIC.
My gripe in all of this really comes down to: data processing takes WAY TOO LONG. And processing large amounts of data allows for the introduction of minute replication errors that build upon each other.
Xaero, for example, has two virtual disks. Both SCSI; one is 20GB, the other is 80GB. That's all fine and dandy. But we've upgraded, you see. We upgraded from VMware GSX 2.2, which is ancient, to VMware Server 1.0.4. This requires retooling a few components - in Xaero's case, his disks have to be converted to the new VMware disk format.
Yeah.
The 20GB disk took almost two and a half hours. Mathematically speaking, the 80GB disk - four times the size - should take TEN.
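The back-of-the-napkin math, for anyone following along. The conversion itself is typically done with vmware-vdiskmanager, which ships with VMware Server 1.x - the file names below are made up, and the actual command is shown as a comment rather than run:

```shell
# The conversion step itself (NOT run here) looks something like:
#   vmware-vdiskmanager -r Xaero-80GB-old.vmdk -t 1 Xaero-80GB.vmdk
# where -r names the source disk and -t picks the target disk type.
# (Hypothetical file names; check the tool's own help for the exact types.)

# The scaling estimate, from the observed run: 20GB in ~150 minutes.
minutes_for() { echo $(( $1 * 150 / 20 )); }
echo "80GB estimate: $(minutes_for 80) minutes"   # 600 minutes = ten hours
```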
I DON'T HAVE TEN HOURS! I KICKED YOU OFF AT FOUR FREAKING AM, AND YOU'RE ONLY AT TWELVE PERCENT!? IT'S FIVE THIRTY! CHUG, MOTHERFUCKER. I NEED YOU UP AND RUNNING! YESTERDAY!
Ahem. Yes, don't get me started on moving a VM that does this:
Source checksum of disk-file-1:
c3371600dcd0e7ad9ad3841bf905ec35 MDS-PAG-CVS1-IDE-0-s001.vmdk
Target checksum of disk-file-1:
ce3cc9db76f4b164d38632f7a99cc78a MDS-PAG-CVS1-IDE-0-s001.vmdk
This is the part where you notice that the checksum for the same file on two different machines is different. The byte count is the same, but the signature is wrong. Meaning the file is damaged. And given the amount of time it took to copy, it's going to be a long, long, long ni--... morning. Frak.
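The verify step itself is nothing fancy: md5sum on both ends, compare. A local sketch - on the real boxes the second sum comes from the far machine (e.g. over ssh) rather than from a local copy, and only the file name below is from the post:

```shell
# Local stand-in for the post-copy verification step.
f=MDS-PAG-CVS1-IDE-0-s001.vmdk
printf 'virtual disk payload' > "/tmp/$f"

# Checksum the "source", copy it, checksum the "target".
src_sum=$(md5sum "/tmp/$f" | awk '{print $1}')
cp "/tmp/$f" "/tmp/$f.copied"
dst_sum=$(md5sum "/tmp/$f.copied" | awk '{print $1}')

# Matching byte counts mean nothing if the signatures differ.
if [ "$src_sum" = "$dst_sum" ]; then
    echo "checksums match"
else
    echo "file damaged in transit"
fi
```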
I just don't want to have to explain to people that Xaero or EPICenter are down because the disk is still converting. I'll get the standard issue 'Well, didn't you plan for time, yadda yadda.' I did. I'd argue that the amount of time required to convert this data was completely misrepresented to me. I'd also like to point out that I can't exactly _cancel_ the operation and let you have the disk back. It's halfway converted from one generation to another. Stopping now means the data is lost. Lost as in gone, bye-bye, where's-your-backup-tape gone.
Hey hey! She's at fourteen percent. Ugh.