All posts by tekinerd

Andy, chief tekinerd, is a senior technology executive and entrepreneur who has worked for more than 24 years in the high tech local area networking, storage area networking, enterprise computing systems and semiconductor industries. Specializing in whole product cycle strategy formulation, new product development and marketing launches into both OEM and system builder channels, he has held progressively senior positions spanning product development, field and product marketing, general management and CEO roles in both the US and Europe.

Cloudy Server Growth

Infoworld recently posted an interesting article (The numbers don’t lie — cloud computing boosts server sales) about how server market revenues increased by around 11 percent in the second quarter of 2010, coming in at around $10.9 billion for the quarter. In fact, if you look at IDC’s 2Q10 press release, unit shipments were up an impressive 23.8% year over year, reportedly the fastest growth in five years. Much of this can certainly be attributed to the recovery of the server market segment, but as David Linthicum discusses, shouldn’t overall server growth start declining in the face of server virtualization and cloud computing? Hmm… good point.

In many ways, growth in the face of a technology shift toward more efficient cloud and utility-based computing models should not be that surprising, given we’ve seen a similar phenomenon before. We lived through it in the early broadband communications days, where the concept of an overlay network was used to introduce new services non-disruptively: rather than replace the old network, a new one was built alongside it, hence the term “overlay”. So it is not an unlikely scenario in the cloud age that both corporate and public compute-storage networks are effectively overlaying new systems on top of their existing ones to minimize disruption and, in many cases, testing this stuff out incrementally before they turn it all over to cloud and virtualization, be it internally or externally hosted.

This certainly seemed to be the case when I recently called my hosting provider to ask why FrontPage extensions could no longer be enabled for a website I was putting together as a temporary placeholder for another project. It turned out that all new customers were being steered to the new “grid computer system”, which doesn’t support FrontPage extensions anymore, so I had to be moved back to the “legacy” systems. Bingo… two networks, one for the old and one for the new. It reminded me of a tour we were given of a central office in Austin, Texas, back when we were developing the early broadband network technologies. We were there to review how they planned to roll out DSL when it eventually arrived. I was surprised to see a lone Siemens digital switch in the corner providing ISDN services purely for data as an overlay service, rather than installing ISDN line cards in the already huge installed base of AT&T switches they’d been using for mainline customers. The technology and the switches could have handled it, but for service and disruption reasons it was not the preferred way to roll out an infant service.

Bottom line: traditional models always stick around much longer than we technologists think, usually for non-technical reasons, even when there are many advantages to ditching the old stuff for the new. Another reminder of the hype curve age we live in.

Windows Home Server to the Rescue

I have been running WHS since December 2007 on a small self-built VIA-based ITX system, chosen because it ran on much lower power and generated less heat and noise than a conventional PC or low cost server, allowing me to leave it on all the time. It has worked like a charm since the day it was turned on and hasn’t crashed once! Since then I’ve added a second WHS using the VIA Artigo shoebox PC, which I use for projects like the earthquake monitoring project referred to on this site a few months back.

We have a pretty active PC home. I have gamer kids (I’m one of them) in the house, with 3 dedicated gaming PCs mixed in with 3 laptops (2 college, 1 work), a digital audio workstation for audio and MIDI recording, plus a home-built media center PC in the family room. Operating systems are a mix of Windows XP, Vista 32-bit, Vista 64-bit and Windows 7. All of the PCs have the WHS Connector software, though only 2 of them (my home desktop and home laptop) wake up automatically to do backups all the time, as I found that the work laptop would wake up in my hotel room looking for my WHS when travelling! For the rest, I tend to run manual backups, or turn on automated backup only when I know the work laptop is going to be stationary for a while or I have a lot of new content I’m creating.

Having lost data from my pre-WHS writing days, I’d already developed a healthy habit of keeping multiple copies of important data (e.g. photos, audio/MIDI projects) in multiple locations, so I’ve managed to avoid catastrophic loss of personal data with a little careful management so far. WHS really helped automate and simplify this previously manual process. On occasion, however, there are still situations where recovery of a complete PC or an individual file is necessary, beyond the normal “copy it off your backup USB drive” scenario.

(Read the full article on MSWHS.COM here.)

Intel Patsburg and Software RAID

I just got done reading the “Intel eats crow on software RAID” write-up from the UK’s The Register. On one side, I’m really happy to see server-based software RAID (or Virtual RAID Adapters, VRAs, as we called them at Ciprico and DotHill) coming into the spotlight again. Performance, especially now with SSD usage on the rise, is definitely one of the strengths of a software RAID solution, which has the ability to scale to much faster rates than a hardware RAID adapter in terms of raw IOPS or MB/s. After all, it’s using the power of a 2-3GHz multi-core Intel or AMD CPU coupled to a very fast memory and I/O bus, versus some fixed-function, 800MHz-1.2GHz embedded RAID CPU hanging off the PCIe bus.

On the other hand, asking whether software RAID is faster than or can replace hardware RAID is not really the right question. Sure, software RAID with persistent storage like SSDs is changing the landscape as far as making a pure host-based software RAID viable, but for traditional hard disk drives not much has changed. There are a lot of volatile storage stages (i.e. gone if the lights go out) along the data path, from the application that wrote the data, through the storage I/O device (be it a hardware-accelerated RAID adapter or a simple I/O device), through the 32+ MB of cache on the drive if you left it enabled, until the data actually arrives on the persistent media platter. Oh, and then there is VMware ESX, which can’t support a conventional software RAID stack yet.

So let’s get some perspective here.

First, as any good RAID vendor will tell you, it’s not so much about software vs hardware RAID; it’s about who is providing your RAID stack, how many “RAID bytes have been served so far”, how good the service and support are, and essentially how much you trust the vendor offering the software RAID stack. This is where a RAID stack’s “age” and pedigree are important, regardless of its implementation. Being a good software RAID provider goes well beyond making it fast. It’s about how robust your solution is and how good your support is when things don’t work right and you need help fixing it. Hard disks (and SSDs are no exception) throw all sorts of curve balls at you, and only the robustness of your RAID vendor’s test and compatibility labs can really filter a lot of this out. It often takes a knowledgeable RAID systems engineer to figure out whether it was, or was not, the fault of the RAID stack in the first place. My deepest respect goes to those folks who have to spend their Sundays way into the wee hours of the morning figuring these sorts of things out when the fault defies conventional logic.

Second, on the technology side, RAID is always implemented in software, regardless of whether it is host or hardware based. It either runs on the host CPU (software, chipset or host RAID) or on a dedicated CPU on the RAID adapter (hardware RAID), sometimes in host software with some assistance from the hardware (e.g. XOR calculations). Granted, one runs in an unpredictable OS environment and the other in a more closed and predictable embedded one, but they end up doing the same thing in software on different CPUs. While there are cases where software RAID may be sufficient and more affordable, as it eliminates much of the hardware cost, there are probably just as many cases where it just doesn’t work well at all. A case in point is VMware ESX (see earlier post on this topic here), where there are no commercially available, bootable software RAID solutions, plus there are fewer general CPU cycles to spare anyhow, so hardware RAID tends to win out. Also, software RAID doesn’t fully protect your data from a system power loss unless you are protecting the whole server with a dedicated UPS that can do an orderly shutdown in the event of a plant power loss. Then there is the video editing crowd, who may be using their host CPUs for video compression, another case where software RAID often fails due to a lack of available CPU cycles.
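To make the XOR point above concrete, here is a minimal Python sketch of the parity arithmetic a RAID 5 stack performs, whether it runs on the host CPU or on an embedded one. The 4-drive stripe, tiny 8-byte blocks and data values are made-up examples for illustration, not any vendor’s actual implementation.

    from functools import reduce

    def xor_blocks(blocks):
        """XOR equal-sized data blocks byte by byte to produce a parity block."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    # Three data blocks of a hypothetical 4-drive RAID 5 stripe (8-byte blocks for illustration).
    d0 = bytes([1, 2, 3, 4, 5, 6, 7, 8])
    d1 = bytes([8, 7, 6, 5, 4, 3, 2, 1])
    d2 = bytes([0, 0, 255, 255, 0, 0, 255, 255])

    parity = xor_blocks([d0, d1, d2])

    # If one data block is lost, XORing the survivors with the parity rebuilds it.
    assert xor_blocks([d0, d2, parity]) == d1

The arithmetic is identical for “software” and “hardware” RAID; only the CPU executing it differs.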

So, the key questions to be asking about software RAID in my mind are not how fast it can go, but:

  • How robust is the RAID stack in question, i.e. how many “bytes were served” before you got to it, and who else is using it in a mission-critical environment?
  • How would my business be impacted by a server power loss while running software RAID? Can I live with a UPS protecting the whole server, as long as I have a fast means of getting back to a fully operational level?
  • Who’s going to support it when it goes wrong, and how good is that support when it comes to knowing both the RAID stack’s strengths and its limitations?
  • Are you comfortable buying a RAID solution from a chip vendor or from a storage vendor, the latter of which makes its livelihood from creating highly robust disk array systems? You may be perfectly OK with the former.

All of these will depend on just how important your data is and, more importantly, how quickly you can restore the system to full operation in the event of a hardware failure.

VMware ESXi at Home

What started out as a simple experiment to help me learn more about VMware ESX has now turned into a full-blown setup running my current Windows Home Server alongside two Linux servers, one hosting an online FEAR Combat gaming server (for the kids, of course) and the other a private WordPress website development environment, all on a single Dell T110 server.

I am now able to create and tear down “server sandboxes” right next to my “leave alone” server setups (i.e. Windows Home Server), given the relative ease with which I can now create new virtual servers. More experimentation is needed to get high-definition video streaming working, which still seems to struggle, but standard resolution and audio work fine so far. It certainly seems that virtual servers are now well within the reach of tekinerds and small home office setups.

In addition to the Dell server, I also created a low cost iSCSI box using the free Openfiler software and the VIA Artigo A2000 shoebox-sized computer to expand the storage available to ESX, primarily for the virtual Windows Home Server, which needed additional space to handle my total home PC backup requirements. Details of the final setup are included below, with a more detailed write-up on the Tekinerd Server Pages at http://tekinerd.com/server-pages/at-home-with-vmware-esxi/.

Hardware:

  • Dell T110 server (2x 160G drives in my particular setup), 2G DRAM (~$399 special at Dell)
  • VIA Artigo A2000 for the iSCSI storage box with 2x WD 500G drives (~$350 all in)

Software:

  • Dell: VMware ESXi v4 (free download from VMware)
  • Dell: Client OS#1: Microsoft Windows Home Server ($99)
  • Dell: Client OS#2: OpenSUSE 11.2 ($0) set up as a FEAR Combat server
  • Dell: Client OS#3: OpenSUSE 11.2 ($0) set up as a WordPress development server
  • VIA Artigo A2000: Openfiler v2.3 ($0) configured with 2 iSCSI targets (317G + 465G available)
  • Laptop: VMware vSphere Client software ($0)
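One quick sanity check before adding the Openfiler targets as ESX datastores is to confirm that the Artigo’s iSCSI portal is actually reachable over the network. Below is a minimal Python sketch; the IP address is a hypothetical example and 3260 is simply the standard iSCSI target port.

    import socket

    OPENFILER_IP = "192.168.1.50"   # hypothetical address of the Artigo/Openfiler box
    ISCSI_PORT = 3260               # default iSCSI target port

    def portal_reachable(host, port, timeout=3.0):
        """Return True if a TCP connection to the iSCSI portal succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(portal_reachable(OPENFILER_IP, ISCSI_PORT))

If this prints False, it is worth checking the Openfiler network ACLs and target configuration before spending time in the vSphere client.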

PCIe Flash versus SATA or SAS Based SSD

The impressive results being presented by the new PCIe-based server and workstation add-in card flash memory products hitting the market from the likes of FusionIO and others are certainly pushing up the performance envelope of many applications, especially transactional database applications where the number of user requests served is directly proportional to the storage IOPS or data throughput available.

In just about all cases, general purpose, off-the-shelf PCIe SSD devices present themselves to the server as a regular storage device; in Windows, for example, they appear as a SCSI-like device that can be configured in the disk manager as a regular disk volume (e.g. E: or F:). The biggest advantage PCIe SSDs have over standalone SATA or SAS SSD drives is that they can handle greater data throughput and more I/Os, as they use the much faster PCIe bus to connect directly to multiple channels of flash memory, often using a built-in RAID capability to stripe data across the multiple channels of flash mounted directly on the add-in card.
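To illustrate the striping idea, here is a minimal Python sketch that deals fixed-size chunks of a write out to several flash channels round-robin. The channel count and chunk size are made-up numbers for illustration, not any particular card’s controller design.

    NUM_CHANNELS = 4        # hypothetical number of on-card flash channels
    CHUNK_SIZE = 4096       # hypothetical stripe chunk size in bytes

    def stripe(data):
        """Split data into chunks and distribute them across channels round-robin."""
        channels = [[] for _ in range(NUM_CHANNELS)]
        for i in range(0, len(data), CHUNK_SIZE):
            channels[(i // CHUNK_SIZE) % NUM_CHANNELS].append(data[i:i + CHUNK_SIZE])
        return channels

    # A 64KB write ends up spread evenly across the four channels, which is why
    # aggregate throughput can scale with the number of channels.
    layout = stripe(bytes(64 * 1024))
    print([sum(len(chunk) for chunk in ch) for ch in layout])   # [16384, 16384, 16384, 16384]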

To help clear up confusion for some of the readers, the primary differences between PCIe Flash memory and conventional SSDs can be summarized as follows:

Where PCIe Flash Works Well

The current generation of PCIe flash SSDs is best suited to applications that require the absolute highest performance with less emphasis on long-term serviceability, as you have to take the computer offline to replace defective or worn-out SSDs. They also tend to work best when the total storage requirement of the application can live on the flash drive. Today’s capacities of up to 320G (SLC) or 640G (MLC) are more than ample for many database applications, so placing the entire SQL database on the drive is not uncommon. Host software RAID 1 is typically used to make the setup more robust, but this starts to get expensive as high-capacity PCIe SSD cards run well north of $10,000 retail, the high price typically a result of the extensive reliability and redundancy capabilities of the card’s on-board flash controller. As the number of PCIe flash adapter offerings grows and the market segments into the more traditional low-mid-high product categories and feature sets, expect the average price of these products to come down relatively fast.

Where SSDs Work Well

SATA or SAS based SSDs, by design, work pretty much anywhere a conventional hard drive does. For that reason we see laptops, desktops, servers and external disk arrays adopting them relatively quickly. Depending on the PCIe flash card being compared against, it can take anywhere from 5 to 8 SSDs behind a hardware RAID adapter to match the performance of the PCIe version, which tends to push the overall price higher when using the more expensive SLC-based SSDs. So SATA or SAS SSDs tend to be best suited to applications that can use them as a form of cache in combination with a traditional SATA or SAS disk array setup. For instance, it is possible to achieve similar performance at significantly lower system and running costs using 1-4 enterprise-class SSDs alongside SATA drives in a SAN disk array, versus a 15K Fibre Channel or SAS SAN disk array setup. Most disk array vendors now offer SSD versions of their Fibre Channel, iSCSI or SAS based RAID offerings.
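As a rough back-of-the-envelope check on the 5-8 SSD figure above, the short Python sketch below simply divides an assumed PCIe card throughput by an assumed per-SSD throughput with a scaling-loss factor. All three numbers are illustrative assumptions, not measured results.

    import math

    pcie_card_mb_s = 1400     # assumed sustained throughput of a high-end PCIe flash card
    sata_ssd_mb_s = 220       # assumed sustained throughput of one SATA II SSD
    raid_efficiency = 0.85    # assumed scaling loss through the RAID adapter

    ssds_needed = math.ceil(pcie_card_mb_s / (sata_ssd_mb_s * raid_efficiency))
    print(f"Roughly {ssds_needed} SATA SSDs to match the PCIe card")   # ~8 with these assumptions

Swap in the data sheet numbers for the devices you are actually comparing to get your own estimate.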

Enterprise Flash Memory Industry Direction

At the Flash Summit we learned that a new class of storage will appear between SSDs and DRAM, referred to as SCM, or storage class memory. Defined as something broader than just ultra-fast flash-based storage, it does require that the storage be persistent and appear to the host more like conventional DRAM does, i.e. linear memory rather than a storage I/O controller with mass storage behind a SCSI host driver. SCM is expected to enter mainstream servers by 2013.

Kingston SSDnow VSeries and RAID

I’ve been watching the price of SSDs fall for a little while now and was intrigued by the sub-$100 pricing being offered by Kingston for their MLC-based 30G SSDnow V Series, a 3Gbps SATA II SSD. Once they reached $80 at NewEgg.com, I dove straight in. Actually, I dove right in at the deep end and bought 6 of them, as I wanted to see just how well they performed with an old 3-port NetCell SATA I (1.5Gbps) RAID card I had (some of you may remember them), along with a more recent 3ware (pre-LSI) 8-port SATA II (3Gbps) hardware RAID card, to see just how fast they were capable of going.

I definitely saw some interesting results. I should say upfront that this was a quick and dirty test to see whether you can use these low cost devices in any RAID configuration, which they were not really designed for, so please take that into consideration as you read on; this was more an academic exercise to prove a point about mixing old with new.

Inside the Kingston 30G SSDs

For fun, I disassembled one of the SSDs at the end of my tests (don’t worry Kingston, I’m not going to try and return it) and was surprised to see lots of fresh air inside. These things are tiny (duh! I hear a few folks saying). They kept the cost low by removing much of the length of the electronic circuit card, with the Toshiba T6UG1XBG controller alongside a Micron 9TB17-D9JVD DRAM on one side, and four 8GB Toshiba flash devices on the underside. According to the published specs, this combination should deliver streaming speeds of up to 180MB/s read and 50MB/s write, so definitely more on the entry-level side versus what you’d see in some of the more expensive versions.

Testing with Legacy RAID Cards

I picked two old legacy cards to work with: a PNY NetCell 3-port SATA RAID 3 card and a 3ware 9650SE 8-port SATA II RAID card. I was really pleased that the SSD worked just fine with the old PNY NetCell card in one of my older PCI test rigs. The Kingston looked just like a regular SATA drive, as advertised, and the NetCell card auto-configured it nicely into a single drive before booting into Windows. No surprises there.

However, given that my primary performance test system was a SuperMicro X8DT6 PCIe setup with no PCI slots, I quickly moved on to testing with the PCIe x4 Gen1 3ware 9650SE SATA II RAID adapter, as I wanted to see just how well traditional RAID adapters work with SSDs. Of course, having just attended the Flash Summit, where I was told a hundred times over that most of today’s controllers and applications don’t take advantage of SSDs the way they should, I really had to see it for myself. I was also curious why most benchmarks used host-based software RAID, and after this exercise I am beginning to see why.

Quick Test RAID Performance

Low cost SSDs don’t really work that well in a traditional hardware RAID setup. On the positive side, using the 3ware RAID adapter, a single drive beat its published spec of 180MB/s, coming in at 200MB/s on streaming reads. The performance test results were based on IOmeter 2006 running under Windows XP and were all performed using raw volumes, a queue depth of 1 and only two block sizes: 1MB for streaming reads and writes, and 512 bytes for random I/Os. The drives were set up in simple RAID 0 configurations with the write caches left on.

While the Kingston SSDs I used were not really designed for RAID applications, I saw no reason why I shouldn’t at least see linear performance increases as I added drives to the RAID controller. The quick results published here in the two charts did convince me, as a novice SSD user, that throwing these into a system without consideration of all the various moving parts doesn’t automatically make for a high performance system. In fact, it could even slow it down. While 1-2 drives in a RAID 0 configuration did reasonably well on streaming reads, coming in significantly higher than ye olde hard drives, this quickly hit a ceiling at around 550MB/s, with not much of a performance benefit past 4 SSDs. Not bad for a 3-4 year old RAID card that wasn’t really optimized for low cost flash memory. The writes, however, were a different story, barely making it to 50MB/s (with some very high maximum latencies, >4 seconds in some cases). Random read IOPS were a real mixed bag, with decent results at around 4,500 IOPS but never really increasing as more SSDs were added. Random write IOPS were again a mixed bag and not really related to the number of drives at all, with performance actually falling off as more drives were added beyond 3 SSDs. Nothing conclusive here of course, but it definitely illustrates that the caching algorithms of the older RAID cards were not optimized for a 4,000+ IOPS device hanging off each port!
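To put those numbers in perspective, the small Python sketch below does the two bits of arithmetic behind the charts: ideal linear RAID 0 read scaling against the roughly 550MB/s ceiling observed above, and the conversion from small-block IOPS to MB/s. The single-drive figure comes from the test results above; the per-drive scaling model is just the idealized assumption being tested, not a claim about the card.

    single_read_mb_s = 200          # single-drive streaming read from the test above
    controller_ceiling_mb_s = 550   # approximate ceiling observed on the old 3ware card

    for n in range(1, 7):
        ideal = n * single_read_mb_s                       # perfect linear RAID 0 scaling
        expected = min(ideal, controller_ceiling_mb_s)     # what the ceiling allows
        print(f"{n} SSDs: ideal {ideal} MB/s, ceiling-limited ~{expected} MB/s")

    # 4,500 IOPS at 512-byte blocks is only about 2.3 MB/s of actual data movement,
    # which is why the random I/O picture looks so different from the streaming one.
    print(4500 * 512 / 1e6, "MB/s at 512-byte random I/O")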

Conclusion

For me, this exercise with legacy RAID components confirms what we saw at the Flash Summit: even with all the excellent work going on in the industry to make SSDs perform to their true potential, it will be a while before the whole PC ecosystem is able to take full advantage of them. While these devices work well as a low cost SSD for entry-level, single-drive systems, it’s unlikely they can offer much when used with legacy RAID adapter cards. A lot of effort has gone into making a flash drive look just like a regular hard disk drive, which has fooled a few of us end users into thinking it’s just a faster hard drive, whereas it’s a very different beast. The lesson learned here is that applications, software drivers and of course RAID firmware all need to be rewritten to ensure they are SSD literate. Bottom line: you need to use a modern RAID card that is SSD literate. LSI and Adaptec are moving in this direction with their built-in SSD caching technology, but the general consensus is that hardware RAID engines in all types of equipment are going to need an overhaul given the quantum leap in performance of SSDs over traditional hard drives.

Flash Memory Summit 2010

The Flash Memory Summit was well attended this year, with attendance up over 50%: around 2,000 people showed up versus the planned 1,200, according to the show organizers.

It was my first time at the show, but I definitely thought it was well worth attending (note: I’m an independent and not associated with the show in any way). Not only did it bring folks like me up to speed quickly on what is happening, but it also provided a forum for discussing where this industry is heading and, more importantly, what the various vendors and researchers are up to in this space. We even had Steve Wozniak (now at FusionIO) as a keynote speaker, entertaining us with his various brushes with memory, from the first DRAM chips of his computer club days to the hilarious pranks he used to play on folks, including Steve Jobs, in the early years.

In summary… things are definitely looking up for the flash industry.

Technology tidbits of interest:

  • PCM (Phase Change Memory) as a future technology looked very interesting, especially given its significantly higher write endurance (1M+ cycles), though it is not meeting density expectations yet
  • MLC flash with 3 bits per cell on a 25nm semiconductor process was announced by Intel and Micron, opening the way to higher densities (albeit at worse write-cycle levels)
  • Most SSD manufacturers appear close to having a PCIe Flash add-in card solution
  • Automatic tiered storage and auto load balancing are on the rise and badly needed wherever flash shows up in general computing
  • The term “short-stroking SSDs” came up, which was really another way of saying use only 75% of the capacity of an MLC SSD in order to get optimum performance
  • The concept of SCM (storage class memory), due from 2013 onwards, also came up on a number of occasions as a new class of memory-mapped flash versus the storage I/O model commonly used today (i.e. flash becomes an extension of DRAM space in this case)
  • FusionIO were giving away nice looking capes, monkeys and T-shirts. They also had a very cool looking display with a huge number of video streams.

One point of interest on the SSD adoption side: Intel claims almost all datacenters want MLC, as the TCO of SLC solutions doesn’t compute for most IT managers (there was a lot of discussion around that one). It was pretty clear throughout the show that an enterprise-class MLC solution is badly needed, tied of course to MLC’s basic cost advantage over SLC.

Other new products of interest introduced at the show included FusionIO’s hybrid flash card with external InfiniBand (IB) I/O (40Gbps) connectivity, the ioSAN, which is essentially an InfiniBand PCIe controller integrated with a Fusion Duo card via a PCIe switch. While there is no onboard bridging offered, the system software driver does allow the card to transfer data to and from the flash memory as a target IB device, offering a very fast method of moving data onto or off the card, and potentially offering support for virtual machine migration down the road. Future versions will also offer 10G iSCSI support.

Another cool little device was the SanDisk® 64GB iSSD, a SATA flash module in a silicon BGA package for embedded applications, no bigger than a US postage stamp. Given the increasing number of SATA ports appearing in embedded chipsets, this makes for a very nice component for many applications beyond just the mobile space.

The only concern that continues to come up consistently is the write endurance problem, along with how end users and applications still need to understand the differences between SSDs and traditional hard drives to get the most out of them. As noted in one of the keynotes, it has taken almost 30 years to go from a few tens of IOPS on an HDD to approximately 280+ IOPS. Applications and operating systems have conditioned themselves around this level of performance, so it’s not surprising that a sudden jump of 10-20x is not being seen in anything other than raw benchmarks. RethinkDB™ was one company that had gone to the trouble of rewriting a SQL database application to take advantage of SSDs, and managed to push operations per second up from 200 to over 1,200 on the same hardware.
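For readers wondering where that few-hundred-IOPS ceiling for hard drives comes from, the quick Python sketch below estimates random IOPS from a drive’s mechanical latencies. The seek time and spindle speed are typical assumed figures for a 15K RPM enterprise drive, not numbers taken from the keynote.

    # Rough model: one random I/O costs an average seek plus half a platter rotation.
    avg_seek_ms = 3.5                         # assumed average seek time for a 15K RPM drive
    rpm = 15000                               # spindle speed

    avg_rotational_ms = 0.5 * (60_000 / rpm)  # half a revolution, in milliseconds
    service_time_ms = avg_seek_ms + avg_rotational_ms
    iops = 1000 / service_time_ms

    print(f"~{iops:.0f} random IOPS per drive")   # about 180 with these assumptions

Shorter seeks (e.g. short-stroking) push this estimate toward the 280+ IOPS figure quoted, but the mechanics impose a hard limit, which is exactly the gap SSDs remove.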

There is still the serviceability issue, however, for enterprise-class and data center deployments, whose IT managers have been well trained by the hard drive industry in ways not always well suited to SSDs. While there are tremendous gains to be had from technologies such as PCIe flash cards from the likes of FusionIO and OCZ, there is also the question of how to replace them when they go bad without, for example, turning off the server and migrating data. This will continue to drive new ideas and solutions in the enterprise space for a while.

All in all, a great show and well worth attending.