
SSD Tiering versus Caching: Part 2

A while back I wrote about some of the differences between caching and tiering when using solid state disk (SSD) drives in a PC or server.

Having just returned from the 2011 Flash Memory Summit in Santa Clara, I feel compelled to add some additional color around the topic, given the level of confusion clearly evident at the show. I'd also like to blatantly plug an upcoming evolution in tiering called MicroTiering, from our own company, Enmotus, which emerged from stealth at the show.

The simplest high-level clarification that emerged from the show, I'm glad to say, matched what we described in our earlier blog post (SSD Caching versus Tiering): caching makes a copy of frequently accessed data from a hard drive and places it in the SSD for future reads, whereas tiering moves the data permanently to the SSD so it is no longer stored on the hard drive. Caching speeds up reads only at this point, using a caching algorithm modified to account for SSD behavior versus RAM based schemes, whereas tiering simply maps host reads and writes to the appropriate storage tier with no additional processing overhead. So with tiering, you get the write advantage and, of lesser benefit, the incremental capacity of the SSD, which becomes available to the host as usable storage (minus some minor overhead to keep track of the mapping tables).
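To make the distinction concrete, here is a minimal, purely illustrative Python sketch (hypothetical classes and device objects, not any vendor's implementation) contrasting a read cache, which copies hot blocks into the SSD, with a tier map, which permanently relocates blocks and simply redirects I/O:

```python
# Purely illustrative sketch: hypothetical device objects with read()/write()/trim();
# not any vendor's implementation.

class ReadCache:
    """SSD read cache: hot blocks are copied; the hard drive copy stays authoritative."""
    def __init__(self):
        self.ssd_copy = {}                    # block number -> cached data

    def read(self, hdd, block):
        if block in self.ssd_copy:            # cache hit: serve from the SSD copy
            return self.ssd_copy[block]
        data = hdd.read(block)                # cache miss: read the HDD, then keep a copy
        self.ssd_copy[block] = data
        return data

    def write(self, hdd, block, data):
        hdd.write(block, data)                # writes still land on the hard drive
        self.ssd_copy.pop(block, None)        # invalidate any stale cached copy


class TierMap:
    """Tiering: each block lives on exactly one device; I/O is simply remapped."""
    def __init__(self, ssd, hdd):
        self.devices = {"ssd": ssd, "hdd": hdd}
        self.location = {}                    # block number -> "ssd" or "hdd" (mapping table)

    def read(self, block):
        return self.devices[self.location.get(block, "hdd")].read(block)

    def write(self, block, data):             # writes to promoted blocks go straight to the SSD
        self.devices[self.location.get(block, "hdd")].write(block, data)

    def promote(self, block):
        data = self.devices["hdd"].read(block)
        self.devices["ssd"].write(block, data)   # move, not copy...
        self.devices["hdd"].trim(block)          # ...the hard drive copy is released
        self.location[block] = "ssd"
```

Note how TierMap.write lands directly on whichever device currently owns the block, which is where tiering's write advantage and extra usable SSD capacity come from, whereas ReadCache only ever accelerates reads of data the hard drive still owns.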

Why the confusion? One RAID vendor in particular, along with several caching companies, is calling its direct attached storage (DAS) caching solution “tiering”, even though it is only caching the data to speed up reads and the data isn't moved. Sure, write-based caching is coming, but it is still fundamentally a copy of the data that lives on the hard drive, not a move, and SSD caching algorithms still apply.

Where Caching is Deployed

SSD caching has a strong and viable place in the world of storage and computing at many levels, so it's not a case of tiering versus caching, but more a question of when to use either or both. Also, caching is relatively inexpensive and will most likely end up bundled for free with the SSD you are purchasing for PC desktop and Windows applications, for example, simply because this is how all caching ends up, i.e. “free” with some piece of hardware, an SSD in this case. A case in point is Intel's Matrix RAID, which has now been enhanced with its own caching scheme called Smart Response Technology (SRT), currently available for Z68 flavor motherboards and systems.

In the broader sense, we are now seeing SSD caching deployed in a number of environments:

  • Desktops (eventually notebooks with both SSD and hard drives) bundled with SSDs or as standalone software e.g. Intel SRT and Nvelo (typically Windows only)
  • Server host software based caching e.g. FusionIO, IOturbine, Velobit (Windows and VMware)
  • Hardware PCIe adapter based server RAID SSD caching e.g. LSI’s CacheCade (most operating systems)
  • SAN based SSD caching software, appliances or modules within disk arrays e.g. Oracle’s ZFS caching schemes (disk arrays) or specialist appliances that transparently cache data into SSDs in the SAN network.

Where Data Tiering is Deployed

Tiering is still fundamentally a shared SAN based storage technology used with large data sets. In its current form, it is really an automated way to move data between slow, inexpensive bulk storage (e.g. SATA drives, possibly even tape) and fast, expensive storage based on its frequency of access or “demand”. Why? So data managers can keep expensive storage costs to a minimum by taking advantage of the fact that typically less than 20% of data is being accessed over any specific period of time. YouTube is a perfect example. You don't want to keep a newly uploaded video stored on a large SSD disk array just in case it becomes highly popular versus the numerous other uploads. Tiering automatically identifies that the file (or more correctly, the file's associated low level storage “blocks”) is starting to increase in popularity, and moves it up to the fast storage for you automatically. Once on the higher performance storage, it can handle a significantly higher level of hits without causing excessive end user delays and the infamous video box “spinning wheel”. Once demand dies down, the data is moved back, making way for other content that may be on the rise in popularity.
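To make that behavior concrete, here is a rough, hypothetical sketch of a frequency-based promotion and demotion loop (the thresholds and the move_block callback are made up for illustration; this is not EMC's or anyone else's actual algorithm):

```python
import collections

# Hypothetical illustration of frequency-based promotion and demotion.
PROMOTE_THRESHOLD = 100   # accesses per interval before a block earns fast-tier residency
DEMOTE_THRESHOLD = 5      # at or below this, the block drops back to bulk storage

access_count = collections.Counter()                   # block -> hits in the current interval
block_tier = collections.defaultdict(lambda: "bulk")   # block -> "fast" or "bulk"

def record_io(block):
    access_count[block] += 1

def rebalance(move_block):
    """Run periodically by the tiering engine; move_block(block, tier) performs the data move."""
    for block, hits in access_count.items():
        if hits >= PROMOTE_THRESHOLD and block_tier[block] == "bulk":
            move_block(block, "fast")          # rising popularity: move up
            block_tier[block] = "fast"
        elif hits <= DEMOTE_THRESHOLD and block_tier[block] == "fast":
            move_block(block, "bulk")          # demand died down: move back
            block_tier[block] = "bulk"
    access_count.clear()                       # start a fresh observation window
```

The key point is that the engine reacts to observed demand over a window of time rather than to individual I/Os, which is exactly the YouTube popularity scenario described above.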

Tiering Operates Like A Human Brain

The thing I like about tiering is that it is more like how we think as humans, i.e. pattern recognition over a large data set, with an almost automated and instant response to a trend, rather than looking at independent and much smaller slices of data as with caching. A tiering algorithm observes data access patterns on the fly, determines how often and, more importantly, what type of access is going on, and adapts accordingly. For example, it can determine whether an access pattern is random or sequential and allocate data to the right type of storage media based on its characteristics. A great “big iron” example is EMC's FAST, or the now defunct Atrato.
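As a simple illustration of that kind of pattern detection (a naive heuristic of my own, not any vendor's algorithm), a tiering engine might classify an I/O stream as sequential when most requests start where the previous one left off:

```python
def classify_access_pattern(lba_history, tolerance=0.8):
    """Naive heuristic: a stream is 'sequential' if most requests start
    where the previous one left off, otherwise 'random'.
    lba_history is a list of (start_lba, length_in_blocks) tuples."""
    if len(lba_history) < 2:
        return "unknown"
    sequential_hits = 0
    for (prev_lba, prev_len), (lba, _) in zip(lba_history, lba_history[1:]):
        if lba == prev_lba + prev_len:
            sequential_hits += 1
    ratio = sequential_hits / (len(lba_history) - 1)
    return "sequential" if ratio >= tolerance else "random"

# A largely sequential stream can stay on the hard drive tier, which streams well,
# while a hot random workload is a strong candidate for the SSD tier.
print(classify_access_pattern([(0, 8), (8, 8), (16, 8), (24, 8)]))      # -> sequential
print(classify_access_pattern([(0, 8), (512, 8), (97, 8), (4000, 8)]))  # -> random
```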

Tiering can also scale better to multiple levels of storage types. Whereas caching is limited to RAM, single SSDs or being tied to a RAID adapter, tiering can operate across multiple tiers drawn from a much broader set of storage, up to and including cloud storage (i.e. a very slow tier), for example.
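A minimal sketch of what such a multi-level hierarchy might look like as configuration data (the tier names and latency figures below are invented purely for illustration):

```python
# Purely illustrative tier table: the names and latency figures are invented.
TIERS = [
    {"name": "fast",  "media": "SLC/MLC SSD",      "typical_latency_us": 100},
    {"name": "bulk",  "media": "SATA hard drives", "typical_latency_us": 10000},
    {"name": "cloud", "media": "cloud storage",    "typical_latency_us": 100000000},
]

def next_tier_down(current):
    """Demotion target: the next slower tier in the hierarchy, if any."""
    names = [t["name"] for t in TIERS]
    i = names.index(current)
    return names[i + 1] if i + 1 < len(names) else None

print(next_tier_down("fast"))   # -> "bulk"
print(next_tier_down("cloud"))  # -> None (already the slowest tier)
```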

MicroTiering

At the show, I introduced the term MicroTiering, one of the solutions our company Enmotus will be providing in the near future. MicroTiering is essentially a direct attached storage version of its SAN cousin, applied to the much smaller subset of storage that sits inside the server itself. It is a hardware accelerated approach to tiering at the DAS level that doesn't tax the host CPU and enables a much broader set of operating system and hypervisor support, versus the narrow host SSD caching only offerings we see today, which are confined to just a few environments.

Tiering and Caching Together

The two technologies are not mutually exclusive. In fact, it is more than likely that tiering and caching involving SSDs will be deployed together, as they provide different benefits. For example, caching tends to favor the less expensive MLC SSDs, as the data is only copied and the cache handles the highly read-oriented, transient or non-critical data, so loss of the SSD cache itself is non-critical. It's also the easiest way to add a very fast, direct attached SSD cache to your server, provided your operating system or VM environment can handle it.

On the other hand, as tiering relocates the data to the SSD, SLC is preferable for its higher performance on reads and writes, higher resilience and better data retention characteristics. In the case of DAS based tiering solutions like MicroTiering, it is expected that tiering may also be better suited to virtual machine environments and databases, due to its inherent and simpler write advantage, its low to zero host software layers, and VMware's tendency to shift the read-write balance more toward 50/50.

What's for sure is that there is still a lot of innovation and excitement going on in this space, with lots more to come.

Hardware RAID Adapters Making A Comeback?

We are constantly reading about how the number of cores being offered by Intel and AMD will eventually make hardware acceleration devices in a server mostly obsolete. But hang on, I recall hearing something similar back in the 90s when MIPS (meaningless indication of performance provided by sales people) was the in metric. Having just been involved in a key OEM software RAID project, and having watched server based RAID adapter unit shipments start to shift downward as more CPU cycles come on line, I recently saw some market numbers from Gartner that showed the reverse trend. Hardware RAID adapter cards, which were supposed to be dying slowly, are starting to see a resurgence in servers.

So why is this?

Not surprisingly, one of the key contributing suspects looks to be increased virtual server adoption, in particular VMware ESX. Unlike a conventional operating system environment, a dedicated hypervisor environment like ESX doesn't have the same “luxurious” methods for developing a broad range of device drivers for starters, let alone the RAM space to load up complex device drivers. The whole point of virtual servers is to drive the hardware down to a minimal number of absolutely necessary interface types, so it is no surprise that there are rather slim pickings of storage adapters in the standard ESX distribution. This is especially so with the new breed of skinny hypervisors capable of operating in minimal ROM and RAM footprints (e.g. inside the system BIOS itself), which require skinny device drivers versus something like a fat, fully featured software RAID driver that could require as much as 1 GB of RAM in a single-OS setup. Then there is the user aspect of loading custom drivers via the vSphere command line if VMware doesn't bundle the one you need with the standard distribution. I'm still coming up my learning curve on ESX, but there is a definite and significant learning curve as I explore the capabilities of my newfound experimental VMware ESXi system and try to get it right (i.e. not kill it), versus the easy-to-install Microsoft approach for standard apps and drivers we've all become used to seeing.

In the case of hardware RAID versus software RAID, the initial problem is that conventional software RAID just doesn't fit well into a VMware ESX hypervisor environment, because most of today's solutions were built with a single OS in mind. Apart from the fact that you can't get hold of a software RAID stack for ESX, even if you could, it would likely take up significant, possibly excessive, system RAM resources as a percentage of the overall hypervisor footprint. Re-enter hardware RAID, which doesn't really care what OS you are running above it. It can use a simpler, skinny host driver without impacting system resources, and it ports over more easily to the ESX environment because the RAID work runs on a dedicated hardware accelerated engine or storage processor.

One of the other contributing factors to hardware RAID's resurgence could also be the increased usage of and focus on external SAS disk arrays. DAS is certainly making a comeback, especially given that the latest generation SAS disk arrays can operate at rates of up to 24Gbps over a single cable to the host (four SAS channels running at 6Gbps each). Having a hardware RAID adapter as the primary connection to a SAS JBOD ensures that there is always a consistent virtual interface to the hypervisor layers and that CPU and RAM resources are not being overtaxed. Sure, a software RAID stack can do the same given enough system resources and CPU cycles, but concerns about scalability as you add drives over time make software RAID more difficult to manage in an ESX environment, since its RAM usage grows as more drives are added, for starters. Again, not a problem with hardware RAID: the same resources are consistently presented to the hypervisor layers without significant CPU or RAM impact as drive changes and capacity increases are made.
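As a quick check on the bandwidth figure above (raw signalling rate, ignoring protocol overhead), the arithmetic is simply the per-lane SAS-2 rate multiplied by the number of lanes in the wide port:

```python
# Aggregate raw bandwidth of a 4-lane (x4) SAS wide port, ignoring protocol overhead.
lanes = 4
gbps_per_lane = 6                      # SAS-2 signalling rate per lane
print(lanes * gbps_per_lane, "Gbps")   # -> 24 Gbps over a single x4 cable
```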

So for hardware RAID, and many other traditional functions that were on the track to oblivion if you read the multi-core CPU tea leaves of late, maybe things aren't so bad after all, for storage in particular. VMware creates a new lease of life for performance or IO functions that really need to operate at maximum speed without stealing CPU cycles or RAM resources from the hypervisor and/or applications.

If hardware RAID vendors can continue to add enhanced functions such as the SSD caching algorithms and other storage virtualization functions behind an external SAS switched storage setup, then there is definitely some life left in ye old RAID engines, at least in the opinion of this blogger.

While software RAID has a solid place in the future, dare I say “long live hardware RAID” (again)?