
Intel Patsburg and Software RAID

I just got done reading the “Intel eats crow on software RAID” writeup from The UK Register. On one side I’m really happy to see server-based software RAID (or Virtual RAID Adapters, VRAs, as we called them at Ciprico and DotHill) coming into the spotlight again. Performance, especially now with SSD usage on the rise, is definitely one of the strengths of a software RAID solution, which can scale to much higher raw IOPs or MB/s than a hardware RAID adapter. After all, it’s using the power of a 2-3GHz multi-core Intel or AMD CPU coupled to a very fast memory and I/O bus, versus some fixed-function, 800MHz to 1.2GHz embedded RAID CPU hanging off the PCIe bus.
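
To put a rough number on that claim, here is a quick back-of-envelope sketch (my own illustration, not from the article; numpy, the block size and the loop count are arbitrary choices) of how fast a general-purpose host CPU can grind through the XOR parity math that sits at the heart of RAID 5/6:

```python
# Rough microbenchmark sketch: XOR parity throughput on the host CPU.
# Illustrative only; results depend on CPU, memory bandwidth and numpy build.
import time
import numpy as np

CHUNK_MB = 64
size = CHUNK_MB * 1024 * 1024
data_a = np.random.randint(0, 256, size, dtype=np.uint8)
data_b = np.random.randint(0, 256, size, dtype=np.uint8)

iterations = 20
start = time.perf_counter()
for _ in range(iterations):
    parity = np.bitwise_xor(data_a, data_b)   # the core RAID parity operation
elapsed = time.perf_counter() - start

throughput_gb_s = (CHUNK_MB / 1024.0) * iterations / elapsed
print(f"Host-CPU XOR parity throughput: ~{throughput_gb_s:.1f} GB/s")
```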

On the other hand, asking if software RAID is faster than or can replace hardware RAID is not really the right question to be asking here. Sure, software RAID with persistent storage like SSDs is changing the landscape as far as making a pure host-based software RAID viable, but for traditional hard disk drives not much has changed. There are a lot of volatile (i.e. gone if the lights go out) storage stages along the way, from the application that wrote the data, through the storage I/O device (be it a hardware-accelerated RAID adapter or a simple I/O device), through the 32+MBytes of cache on the drive if you left it enabled, until the data actually arrives at the persistent media on the storage platter. Oh, and then there is VMware ESX, which can’t support a conventional software RAID stack yet.
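
To make that “volatile until it hits the platter” point concrete, here is a minimal sketch (my own illustration, with a hypothetical file path) of what an application has to do just to push a write past the OS page cache; whether the drive’s own volatile write cache also gets flushed depends on the filesystem’s barrier settings and whether you left that cache enabled:

```python
# Minimal persistence sketch (illustrative; the path is hypothetical).
# os.fsync() asks the kernel to flush its buffers for this file to the device;
# the drive's 32+ MB volatile cache is a separate stage that may or may not
# be flushed, depending on filesystem barriers and drive configuration.
import os

def durable_write(path: str, payload: bytes) -> None:
    with open(path, "wb") as f:
        f.write(payload)        # data may still sit in the OS page cache here
        f.flush()               # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())    # ask the kernel to push it down to the device

durable_write("/tmp/important.dat", b"data you cannot afford to lose")
```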

So let’s get some perspective here.

First, as any good RAID vendor will tell you, it’s not so much about software vs hardware RAID; it’s about who is providing your RAID stack, how many “RAID bytes served so far”, how good the service and support are, and essentially how much you trust the vendor offering the software RAID stack. This is where a RAID stack’s “age” and pedigree are important regardless of its implementation. Being a good software RAID provider goes well beyond making it fast. It’s how robust your solution is and also how great your support is when things don’t work right and you need help fixing it. Hard disks (and SSDs are no exception) throw all sorts of curve balls at you, and only the robustness of your RAID vendor’s test and compatibility labs can really filter a lot of this out. It often takes a knowledgeable RAID systems engineer to figure out whether it was, or was not, the fault of the RAID stack in the first place. My deepest respect goes to those folks who have to spend their Sundays well into the wee hours of the morning figuring these sorts of things out when the fault defies conventional logic.

Second, on the technology side, RAID is always implemented in software in any IT application, regardless of whether it is host or hardware based. It either runs on the host CPU (software, chipset or host RAID) or on a dedicated CPU on the RAID adapter (hardware RAID), sometimes in host software with some assistance from the hardware (e.g. XOR calculations). Granted, one runs in an unpredictable OS environment and the other in a more closed and predictable embedded one, but they end up doing the same thing in software on different CPUs. While there are cases where software RAID may be sufficient and more affordable as it eliminates much of the hardware cost, there are probably just as many cases where it just doesn’t work well at all. Case in point being VMware ESX (see earlier post on this topic here), where there are no commercially available, bootable software RAID solutions, plus there are fewer general CPU cycles to spare anyhow. So hardware RAID tends to win out here. Also, software RAID doesn’t protect your data fully from a system power loss unless you are protecting the whole server with a dedicated UPS which can do an orderly shutdown of the system in the event of a plant power loss. Then there is the video editing crowd, who may use their host CPUs for video compression, another case where software RAID often fails due to a lack of available CPU cycles.
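
As a concrete illustration of the “RAID is always software somewhere” point, here is a minimal sketch (my own simplified code, not any vendor’s stack) of the XOR parity math that either the host CPU or the adapter’s embedded CPU ends up doing, including rebuilding a lost block:

```python
# Simplified RAID-5-style parity sketch (illustrative, single stripe).
# Parity is the XOR of all data blocks; any single lost block can be
# rebuilt by XOR-ing the parity with the surviving blocks.
from functools import reduce

def xor_blocks(blocks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe across three data drives
parity = xor_blocks(data_blocks)            # what the RAID engine computes per stripe

# Simulate losing drive 1 and rebuilding its block from parity + survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_blocks[1]
```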

So, the key questions to be asking about software RAID in my mind are not how fast it can go, but:

  • How robust is the RAID stack in question, i.e. how many “bytes were served” before you got to it, and who else is using it in a mission-critical environment?
  • How would my business be impacted by a server power loss running software RAID? Can I live with a UPS to protect the whole server, as long as I have a fast means of getting back to a fully operational level?
  • Who’s going to support it when it goes wrong and how good is this support when it comes to knowing both the RAID stack strengths and limitations?
  • Are you comfortable buying a RAID solution from a chip vendor or from a storage vendor, the latter of which makes its livelihood from creating highly robust disk array systems? You may be perfectly ok with the former.

All of these will depend on just how important your data is and more importantly, how quickly you can restore the system to full operation in the event of a hardware failure.

Hardware RAID Adapters Making A Comeback?

We are constantly reading about how the number of cores being offered by Intel and AMD will eventually make hardware acceleration devices in a server mostly obsolete. But hang on, I recall hearing something similar back in the 90s when MIPS – meaningless indication of performance provided by sales people – was the in metric. Having just been involved in a key OEM software RAID project and watched server-based RAID adapter unit shipments steadily shift downward as more CPU cycles come online, I recently saw some market numbers from Gartner that showed the reverse trend. Hardware RAID adapter cards, which were supposed to be dying slowly, are starting to see a resurgence in servers.

So why is this?

Not surprisingly, one of the key contributing suspects looks like increased virtual server adoption, in particular VMware ESX. Unlike a conventional operating system environment, a dedicated hypervisor environment like ESX doesn’t have the same “luxurious” methods for developing a broad range of device drivers for starters, let alone the RAM space to load up complex device drivers. The whole point of virtual servers is to drive the hardware to a minimal number of absolutely necessary interface types, so no surprise that there are rather slim pickings of storage adapters in the standard distribution of ESX. This is especially so with the new breed of skinny hypervisors capable of operating on minimal ROM and RAM footprints (e.g. inside the system BIOS itself) that require skinny device drivers, versus something like a fat, fully featured software RAID driver that could require up to 1GByte of RAM minimum in a single-OS setup. Then there is the user aspect of loading custom drivers via the vSphere command line if VMware doesn’t bundle the one you need with their standard distribution. I’m still coming up my learning curve on ESX, but there is a definite and significant learning curve as I attempt to explore the capabilities of my newfound experimental VMware ESXi system to make sure I get it right (i.e. don’t kill it), versus the easy-to-install Microsoft approach for standard apps and drivers we’ve all become used to seeing.

In the case of hardware RAID versus software RAID, the initial problem is that conventional software RAID just doesn’t fit well into a VMware ESX hypervisor environment because most of today’s solutions were built with a single OS in mind. Apart from the fact that you can’t get hold of a software RAID stack for ESX, even if you could, it is likely to take up significant – possibly excessive – system RAM resources as a percentage of the overall hypervisor functions. Re-enter hardware RAID, as it doesn’t really care what OS you are running above it. It can use a simpler, skinny host driver without impacting the system resources, and it ports over more easily to the ESX environment since the RAID work runs down on a dedicated hardware-accelerated engine or storage processor.

One of the other contributing factors to hardware RAID’s resurgence could also be the increased usage of and focus on external SAS disk arrays. DAS is certainly making a comeback, especially given that the latest generation SAS disk arrays can operate at up to 24Gbps rates over a single cable to the host (four SAS channels running at 6Gbps). Having a hardware RAID adapter as the primary connection to a SAS JBOD ensures that there is always a consistent virtual interface to the hypervisor layers and that the CPU and RAM resources are not being overtaxed. Sure, a software RAID stack can do the same given enough system resources and CPU cycles, but concerns about scalability as you add drives over time make software RAID more difficult to manage in an ESX environment, as it impacts RAM usage when adding more drives for starters. Again, not a problem with hardware RAID: the same resources are consistently presented to the hypervisor layers without significant CPU-RAM impact as drive changes and capacity increases are made.
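
A quick back-of-envelope on that 24Gbps figure (my own arithmetic, not from the article): 6Gb/s SAS uses 8b/10b encoding, so the usable payload per lane is roughly 600 MB/s before protocol overhead, or around 2.4 GB/s across a four-lane wide port.

```python
# Back-of-envelope SAS wide-port bandwidth (illustrative arithmetic only).
lanes = 4                     # typical 4-lane SAS wide port on a single cable
line_rate_gbps = 6.0          # SAS-2 line rate per lane
encoding_efficiency = 8 / 10  # 8b/10b encoding overhead on 6Gb/s SAS

raw_gbps = lanes * line_rate_gbps                                 # 24 Gb/s on the wire
usable_mb_s = lanes * line_rate_gbps * encoding_efficiency * 1000 / 8
print(f"Raw: {raw_gbps:.0f} Gb/s, usable payload: ~{usable_mb_s:.0f} MB/s")
```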

So for hardware RAID, and many other traditional functions that were on the track to oblivion if you read the multi-core CPU tea leaves of late, maybe things aren’t so bad after all, for storage in particular. VMware gives a new lease of life to performance or I/O functions that really need to operate at maximum performance and not steal CPU cycles or RAM resources from the hypervisor and/or applications.

If hardware RAID vendors can continue to add enhanced functions such as the SSD caching algorithms and other storage virtualization functions behind an external SAS switched storage setup, then there is definitely some life left in ye old RAID engines, at least in the opinion of this blogger.

While software RAID has a solid place in the future, dare I say “long live hardware RAID” (again)?