A little over six months ago I started researching the quickly emerging world of “Hyper-converged” infrastructure as a new IT ethos to take my company’s IT operations into the next decade of application and data hosting. I was first introduced to the idea when I attended a webinar from Simplivity, one of the leading companies in this new market. I was immediately intrigued… the underlying question that Hyper-convergence answers is “What happens if you radically simplify the data center by collapsing storage and compute into one all-encompassing platform?”
The broader idea of convergence has been around for a while. I started seeing it with the advent of wider 10 GbE adoption: the idea of taking a single high-speed LAN connection (or a second one for redundancy) and splitting it up into multiple logically separate connections. I.e., you could “converge” management, application, and storage LAN connections down into a single wire. The wider over-arching concept predates even this.
Expand that thought into the virtualization space. Virtualization has been around for a very long time, but traditionally, if you wanted some kind of fault-tolerant setup, it required a complex stack of centralized network-attached storage, management software, and a clustered hypervisor. Not to mention (often) separate networking equipment for both the storage and hypervisor nodes.
The promise of hyper-convergence is that many of those disparate parts can go away; instead, you can host your workloads on a unified, easily scaled, inherently redundant platform that encompasses all of your storage and compute needs while simplifying a majority of your networking. Wikipedia sums it up nicely. Rather than reinventing the wheel I will just refer you there if this is a new concept: https://en.wikipedia.org/wiki/Hyper-converged_infrastructure
Hyper-convergence is a rather elegant answer, especially if the product is designed from the outset to BE a hyper-converged platform. My premise in this article is that Scale Computing is one of the few (perhaps the only?) “proven” vendors that have developed a product from the ground up as a hyper-converged system. Based on a lot of the FUD I came across while researching, I got the distinct impression that a large number of people don’t understand this fundamental difference between Scale and the majority of other HCI products currently populating the market. This is a long post, get coffee now…
Disclaimers
- This is an independent assessment/opinion piece. Scale Computing isn’t compensating me in any way.
- This is based on several months of research into Hyper-convergence generally plus 3 months of real-world experience with Scale’s HC3 platform. I have not had any hands-on experience with Nutanix, EMC, or Simplivity. I have, however, had multiple technical deep-dive conversations with Nutanix and Nutanix affiliates in addition to conversations with EMC (shortly before Dell snapped them up).
- I have several years of professional experience administering traditional VMware and Hyper-V environments in addition to a significant amount of personal experience with smaller virtualization platforms like VirtualBox and Proxmox. I wouldn’t call myself an expert, but I do have a broad range of real-world experience across a broad range of products.
- This article isn’t written with the intention of denigrating competing HCI products. Based on my research, I do however believe that Scale is actually doing some rather unique things both technically and aesthetically which have made their product the best fit for my use-case. I am certainly not attacking anyone else’s choice of platform. From where I am sitting we are all in this together… this whole bloody blog is in large part written because of that belief.
- Based on all of the above, I am not objective (I have a Scale cluster, I have skin in the game), this post isn’t meant to be scientific, and I don’t have a PhD in computer science. I have neither the time nor the inclination to do/be any of those things. I really like our new HC3 system and am probably on the verge of becoming a raving fan, hence I want to write about it because I think there are many other people in the same boat who could benefit.
End Disclaimers
What makes one HCI product different from another?
I am probably not truly qualified to give an in-depth answer to this question as there are, I am sure, a plethora of technical differences that exist when you get into the “guts” of each HCI offering. However, I will attempt to summarize my bird’s-eye-view findings. The primary items I have found that really differentiate one HCI product from another are (in order of importance) the approach taken to storage, the approach taken to management, and finally the choice of underlying hypervisor. I am going to start with the last of the three.
Quick Aside
One could also argue that price should make the list, but this is more of a technical review. Suffice to say, some systems’ pricing is what I would term “stratospheric” (especially from the perspective of the SMB market) when it comes to up-front investment, while others are a bit more approachable. Scale Computing is one of the few (only?) providers targeting this segment. Simplivity and EMC (Dell) VxRail both came to mind when I used the term “stratospheric”. Nutanix is also very expensive; however, they are trying to move into the SMB space. That said, I found that due to the “synthetic” limitations they enforce on their “entry level” products, their offering was honestly a bit laughable while still being rather expensive. That probably sounds harsh, but the value proposition (the features you get for the cost you pay) offered by Scale’s HCI platform makes the pricing on other systems look downright shameful in comparison. I will grant though that Simplivity is doing some very interesting things with their hardware, which makes at least somewhat of a case for their higher pricing.
End Aside
Hypervisors: VMware, KVM, and Hyper-V, Oh my…
Virtualization already allowed us to collapse compute for a bunch of independent systems down into a single system/platform. The idea of hosting more than one logical server on a single physical server is well over a decade old. In one sense, none of the various vendors (Scale Computing included) are really bringing anything new to the table here. The only thing that really differentiates them in this arena is the choice of hypervisor (the big three being VMware, Hyper-V, and KVM). ALL THREE are stable, high-performance, bare-metal hypervisors. I emphasize this because in some quarters you will run into some particular “fanboy/girl” loyalty and/or a lot of FUD, particularly surrounding KVM. Do the research though and you will discover that all three, KVM included, have significant enterprise install bases and are rock-solid for day-to-day virtual workloads. I would even posit that KVM is perhaps the most efficient of the three when it comes to resource usage, but that is more “gut feeling” than anything else.
I will offer yet another brief aside and mention that Nutanix’s primary product is built around VMware; however, it also supports Hyper-V, and they recently launched a separate product line (Acropolis) built around KVM. I think more and more vendors will start offering some flavor of KVM in their products because it’s largely open-source, which means low or no licensing overhead. I have a soft spot for Hyper-V because that is where I have spent a lot of my time professionally and, perhaps ironically, I am also a fan of Linux. So I dig open-source and I like KVM.
In conclusion, all three hypervisors are “good”; however, I noticed (particularly with Nutanix) that the choice of hypervisor summarily dictates how you end up working with the system on a day-to-day basis. This leads us right into talking about how Scale’s approach to systems management radically differs from other vendors.
Systems Management
I am going to provide a couple of intentionally leading questions…
- Would you like to manage your infrastructure from one pane of glass?
- Do you want to manage multiple subsystems or a single unified system?
- Do you want a single packaged product or an amalgamation of different products packaged together?
- Do you want to have the ability to “tweak” all kinds of settings, or do you want a system with minimal configuration options that always just “works” and works well?
- Do you want to have to worry about managing your hypervisor, your storage, and your networking or do you want all of this handled for you?
- Do you want a system that you work on and in often -or- one that you can essentially just forget about once all the setup is done?
I am going to wax enthusiastic about Scale’s platform here. Some might accuse me of just marketing their product, but I would refer those folks back to my list of disclaimers. I am a real-world admin who is in no way being compensated by Scale or any of their affiliates for this article. I simply was (and am) blown away by Scale’s offering on this particular front. Also, I will highlight the handful of issues I have had with their platform towards the end of this article.
Simplicity vs. Complexity
If you want a simple (whilst still being very powerful) system that you can manage via a single pane of glass (more to the point, through an HTML5 interface in a browser window… including your console sessions with your guest machines), that’s precisely what Scale provides. If you want an HCI product that you can quite literally “set and forget” with bare-minimum up-front configuration, Scale satisfies. I would go so far as to argue that they are the only provider on the market that strives for and summarily hits both of these goals. Yes, you can “set” and then truly “forget” about HyperCore, because if there is an issue that you need to pay attention to, the system will email you about it. vSphere offers an HTML5 interface now, but I still find myself going back to using the client because I am more familiar with it and, frankly, their interface isn’t intuitive, especially when put side-by-side with HyperCore.
Let me devolve into a discussion about complexity for a moment… We have a SAN; I dislike our SAN. Storage Area Network management is truly a discipline unto itself, and most SAN products require constant babysitting, tweaking, and fiddling. If you like knobs, input fields, sliders, and buttons… get a SAN, they have all of those things in virtual spades. When I first came across the HCI ethos, I think I quite literally let out a sigh of relief, because one of the overarching ideals of convergence is simplifying the operational stack. The reality, however, when I started researching actual products was that very few actually take the idea of “simplicity” to heart.
This desire of mine for hands-off simplicity may not apply to everyone. Some companies with large IT departments may want (may even require) a system that is infinitely tweakable, providing a graph and a button for everything. Those companies probably have dedicated storage engineers (or a sub-department full of them) or they have very niche workloads. If your company falls into that category then you might want to look elsewhere. I would suggest that one should really, really ponder on this, because it is my supposition that very few companies actually do fall into this category. In my opinion, getting a complex system for the sake of complexity alone is a bad practice. I get it, admins (including myself) love the concept of control. However, one should think long and hard about the difference between their preferences and their requirements. It has been a gradual shift for me over the course of several years. I still had that odd sense of stupor when I finished getting our HC3 cluster set up… the whole “is this really it?” experience and then the wonderful sense of relief as I realized that this was indeed really truly it and it was doing everything I needed it to do.
Single Pane of Glass?
When I first started looking at Nutanix I was interested… After I saw a demo of their product I was much less interested. Like my current SAN, Nutanix features a plethora of menus, sub-menus, graphs, buttons, input fields, and charts. However, as soon as I asked the question, “Can I spin up a VM from your interface?” the immediate answer was, “No, you need to go access the underlying hypervisor for that, vCenter or System Center.” My response… seriously?!… all of that and no integration with the underlying hypervisor controls? Ugh… That means you would also have the pleasure of separately managing the virtual networking, spin-up, spin-down, drive and memory allocations, snapshots, etc., etc. I decided against the product after that demo. In addition to not wanting another “SAN” product that would require constant monitoring and have 100 different items to tweak, I wanted a system that could actually monitor and control my entire stack. I thought such management constructs were just a given, but that was a bad assumption and would have been an expensive letdown.
To be clear, I am not trying to pick on Nutanix; they just happen to be the other HCI product with which I have the most familiarity. This fact actually speaks well for them, because we started with a list of three: Nutanix, Simplivity, and EMC VxRail. Of those three, only Nutanix made it past the initial flirtation phase. Scale actually came to the party much later and won me over after one interface demo. The personalized pre-sales attention from their sales team and, much to my surprise, engineering team is what won our business. Thus far my after-sales support experience has been a continuation of the same… as much expert attention as we could ask for.
A Unified Product vs. an Amalgamation of Functional Parts
All of the above goes back to that third question I asked at the start of this section… “Do you want a single packaged product or an amalgamation of different products packaged together?” The difference between HC3 and most of the other players on the field is that Scale has built something essentially from the ground up to be a Hyper-converged product… Most other vendors have seemingly built a VSA (virtual storage appliance), an interface for their VSA, and then paired it to an existing Hypervisor. That is no small feat, but it is fundamentally different from what Scale Computing has done. By comparison I believe the folks at Scale went back to the drawing board and came at this by asking that base question “What is the purpose or value of HCI?”
It should be noted, some companies do some genuinely novel things with their platforms. Simplivity, for example, has included a proprietary add-in card purpose-built for hardware-assisted in-line data deduplication. That’s actually very cool from where I am standing. At the end of the day though, it’s still just a VSA product tacked onto a hypervisor. If Simplivity and others had a unified interface that tied it all together, perhaps it wouldn’t matter as much… To be fair, I didn’t take an extremely close look at Nutanix Acropolis, which is KVM-based. Acropolis may be more streamlined as a result, since graphical management stacks for KVM aren’t as common and this may have forced Nutanix to build one. I honestly don’t know.
Simply Hyper-converged
Scale’s modus operandi is SIMPLICITY. Nothing should be in the interface that isn’t absolutely needed to host high-performance, highly available virtual machines. Nothing that CAN BE silently automated SHOULD BE left to the user to manage and configure. Period. Case in point, hardware initialization of a five-node HC3 cluster took about 20 minutes. That was going from the state of “never having been powered on” to “hosting the first VM” in 20 minutes. Here are some other HC3 interaction examples to drive this point home:
- Spinning up a new VM can be done within two clicks + set your drive sizes, memory, core count… done.
- Live migration, literally two clicks. Click the VM, click the node you want it to move to… done.
- Want to increase performance on a particular virtual drive by increasing its SSD priority? Three clicks + a slider.
- Want to console into a VM? One click and the console opens up in a new browser window.
- Clone a VM? Two clicks. Try to do the same thing in Hyper-V or VMware… it’s a five-minute process instead of a 20-second process. When you have 5–10 machines to provision, that adds up.
- Uploading ISO files to your media library is quite literally a drag-n-drop operation. You can even drag a bunch of them at once and the system just queues the uploads.
Suffice to say, it is ludicrously easy and nearly thoughtless to perform the majority of regular VM administration tasks.
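For contrast, here is a rough sketch of what “spin up a VM” involves if you drive plain KVM yourself through the python-libvirt bindings. To be clear, this is not what HyperCore does under the hood (HC3 hides every bit of this), it glosses over disk image creation and networking, and all of the names and paths are made up; it is only meant to illustrate how many steps that two-click workflow absorbs.

```python
# Rough illustration only: defining and starting a VM on plain KVM via
# python-libvirt. HC3 reduces all of this (plus disk creation, networking,
# and clustering) to a couple of clicks in the web interface.
import libvirt

DOMAIN_XML = """
<domain type='kvm'>
  <name>demo-vm</name>
  <memory unit='MiB'>4096</memory>
  <vcpu>2</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <!-- Assumes the qcow2 image was already created with qemu-img;
         on HC3 the virtual disk is created for you from the same dialog. -->
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/demo-vm.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

conn = libvirt.open("qemu:///system")  # connect to the local hypervisor
dom = conn.defineXML(DOMAIN_XML)       # register the VM definition
dom.create()                           # power it on
conn.close()
```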
Let’s hit on network administration… add a NIC to your VM, type in the VLAN ID, done. No virtual network configuration… the underlying hardware/software is pretty much invisible, and that’s the point.
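Under the same caveats, here is roughly what that single “type in the VLAN ID” field stands in for if you wire it up against raw KVM/libvirt yourself. This sketch assumes an Open vSwitch bridge named br0 (one common way to get per-VM VLAN tagging with libvirt); I am not claiming this is how HyperCore handles it internally.

```python
# Rough illustration only: hot-adding a VLAN-tagged virtio NIC to a running
# KVM guest. Assumes an Open vSwitch bridge named 'br0'; HC3 boils this
# down to "add NIC, type VLAN ID".
import libvirt

NIC_XML = """
<interface type='bridge'>
  <source bridge='br0'/>
  <virtualport type='openvswitch'/>
  <vlan><tag id='215'/></vlan>
  <model type='virtio'/>
</interface>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("demo-vm")
# Apply to the running guest and persist it in the stored configuration.
dom.attachDeviceFlags(
    NIC_XML,
    libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG,
)
conn.close()
```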
Let me go on though and discuss how this “culture of simplicity” extends into their support model.
Scale has some of the BEST support of any tech company I have ever worked with. You have an issue? Call in, press 2, and you are on the phone with a technician. Identify yourself, the tech will then give you a PIN code, you punch it into your console and click a button, done. They are in your system and working. The tunnels are always outbound-established and customer-initiated and controlled. No “GoToAssist” or any other additional remote access/control technologies required. You get white-glove care 24/7. Drive failure? They send you a new one next day; pop out the old, pop in the new, go on with life (and all of the other million and one things you have to get done). Total node failure? Same story.
Another brief aside, I think their entire support team is based out of their office in Indiana and every technician I have dealt with knows their product well. They are actively improving the product based on customer feedback. For real. I don’t think they get very many calls after people have gotten through the initial month or two with their new system. Funnily enough, I heard or read that one of the more common calls they get is for password resets… Established customers seldom have to log in to their systems because A.) there just isn’t that much to do and B.) everything just works all the time…
In this case it is my not-so-humble opinion that this is exactly how it should be. This is how the IT world should be ordered in 2017. Now it is time to talk about the primary technical differentiation between Scale and everyone else.
The HCI Linchpin: Storage
I would posit that the primary technical differentiator between hyperconverged offerings is how each vendor handles storage. Virtualization, clustering, and distributed network storage are the three technological pillars of HCI. Storage in particular, however, was the last foundational piece to fall into place, and it is what has led to the rise of this thing we have dubbed hyperconvergence. Look at all of the major players: in almost every case their HCI offering is an evolutionary development of some kind of storage product.
SANs, vSANs, and then there is SCRIBE, which is raw storage sans anything else…
Traditional clustered virtualization requires a SAN (or some kind of networked storage)… so the prevailing logic, I believe, has gone: what if we virtualized the SAN and ran it within the cluster? And thus we have witnessed the birth of the vSAN, or VSA (virtual storage appliance). In many HCI products the idea is that you run a VM, or more specifically a VSA, on each node which performs the duties of a specialized SAN array. These all communicate with each other in a pool to provide data redundancy across nodes and poof… no more need for dedicated SAN hardware. In the end though you still end up with all of this “stuff” between your virtual workloads and the underlying storage… iSCSI protocols, intervening file systems and/or object stores, virtual storage containers (i.e. VHDs, VMDKs, and the like), etc.
However, on other fronts, particularly in the academic/scientific/supercomputing/Unix/Linux world, another approach to networked storage had arisen: network-distributed storage built on some form of logical, low-level “wide-striping RAID” of local drives over a high-speed network backbone. The Hadoop Distributed File System, GlusterFS, and Ceph, to name a few, are excellent examples of this concept in the open-source world. On the Windows front, Windows Server 2016 has introduced Storage Spaces Direct, which is Redmond’s take on the same concept.
Scale has taken the latter concepts of “wide-striping RAID” and network-distributed, software-defined shared storage and created the “Scale Computing Reliable Independent Block Engine” -or- S.C.R.I.B.E. I believe SCRIBE grew out of previous experience with (and perhaps frustration with) IBM GPFS. I think they also pondered the rather obvious question of “Why use a remote storage protocol at all when a lot of my storage is local?”
SCRIBE is integrated directly into the kernel of their HyperCore OS. If you are coming at this with previous virtualization experience, throw out most of what you know about virtual system storage… SCRIBE allows the KVM hypervisor direct block-level access to the underlying storage infrastructure and works with VirtIO drivers to provide rather incredible in-guest drive performance. All of that without cycle-sucking intervening protocols and file systems. This isn’t a repackaging of “legacy” ideas, which I would kindly suggest is exactly what most competing VSAs are.
SCRIBE works with everything: SAS/SATA, HDD/SSD, it doesn’t matter. It discovers all block-level devices (and is aware of each device’s capabilities) and aggregates them intelligently into one giant storage pool for consumption by the cluster. Because SCRIBE is aware of each storage type in the pool, it is able to intelligently “tier” data based on frequency of access. This is NOT simply a layer of SSD “cache”; the SSD drives are consumed in the same manner as the spinning disk, which once again is an example of simplifying things.
Speaking of SSD storage, I would be remiss if I didn’t discuss another feature called “HEAT” (HyperCore Enhanced Automated Tiering). With HEAT the administrator can assign higher data-speed priority to individual virtual drives on each VM. In line with Scale Computing’s overall “ethos”, the entire setup is what I would call “stupid simple” to administer. Assigning higher SSD priority, or even “pinning” a specific virtual disk to SSD, can be done with as little as four clicks of the mouse.
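Scale doesn’t publish HEAT’s internals and I haven’t seen them, so treat the following as nothing more than a toy illustration of the general idea behind priority-weighted flash tiering: observed access frequency (“heat”) combined with the admin’s priority setting decides which blocks land on SSD. The field names, numbers, and scoring formula below are all invented for the example and are not Scale’s algorithm.

```python
# Toy illustration of priority-weighted flash tiering -- NOT Scale's HEAT
# implementation. Hotter blocks and higher-priority virtual disks win the
# limited SSD space; everything else stays on spinning disk.

def pick_ssd_blocks(blocks, ssd_capacity_blocks):
    """blocks: dicts with 'id', 'access_count', and an admin 'priority'
    (hypothetical 0-10 slider). Returns the ids that would sit on SSD."""
    ranked = sorted(
        blocks,
        key=lambda b: b["access_count"] * (1 + b["priority"]),
        reverse=True,
    )
    return {b["id"] for b in ranked[:ssd_capacity_blocks]}

# With room for only one block on SSD, the priority-boosted database disk
# wins placement even though the file-server disk sees more raw traffic.
blocks = [
    {"id": "db-disk-blk7",   "access_count": 900,  "priority": 8},
    {"id": "file-disk-blk3", "access_count": 1200, "priority": 1},
    {"id": "archive-blk9",   "access_count": 50,   "priority": 0},
]
print(pick_ssd_blocks(blocks, ssd_capacity_blocks=1))  # {'db-disk-blk7'}
```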
Why it matters: Efficiency
I can’t impress upon my readership enough what an absolutely crucial and fundamentally different approach this is to hyperconverged storage. All of the other major players rely on some kind of frankly “bloated” VSA to provide similar features. A Nutanix vSAN deployment can take as much as 8 virtual cores and 24 GB of memory per node. Extrapolate that to a three-node cluster and you could be looking at 72 GB of memory just to run your storage layer. VMware VSAN (as used by VxRail) can require even more memory than this. Conversely, Scale’s HyperCore (so everything, including the management interface for the cluster) plus SCRIBE consumes about 4 GB of memory per node…
This excerpt, taken directly from Scale’s technical documentation, sums it up nicely:
“Unlike other seemingly “converged” architectures in the market, the SCRIBE storage layer is not running inside a virtual machine as a VSA or controller VM but instead runs parallel to the hypervisor, allowing direct data flows to benefit from zero-copy shared memory performance—also unlike other architectures.

SCRIBE is not a re-purposed file system with the overhead introduced by local file / file system abstractions such as virtual hard disk files that attempt to act like a block storage device. This means that performance killing issues such as disk partition alignment with external RAID arrays become obsolete. Arcane concepts like storing VM snapshots as delta files that later have to be merged through I/O killing brute-force reads and re-writes are also a thing of the past with the SCRIBE design.”
SCRIBE is fully automated and integrated into HyperCore, which means that you, the admin, don’t have to administer it. Aside from the marketing, articles like this one, and deeper technical documentation, you would be hard-pressed to even know it is there. Compare this with all of the other HCI offerings.
SCRIBE is the result you get when you have a hyper-converged product designed from the ground up. It’s elegant and it’s pretty much 100% automated.
Some of what SCRIBE can do
You may say, “That level of simplicity and efficiency sounds great… but what about all of the functionality I require?” Let me tick off for you what we have been using in our stack:
- Near instant “thin” VM-level snapshots which induce almost zero performance hit when being taken. This allows you to take snapshots frequently and keep a lot of them on hand for each VM. Furthermore, ANY point-in-time snapshot can be recovered by simply spinning up the snapshot as a new VM. This allows for rather painless file-level recovery.
- Near instant “thin” clones of VMs… I can clone a 4 TB VM and boot the thing in under two minutes.
- Excellent in-guest storage performance
- Data redundancy. Lose a drive, heck, lose an entire node, and lose NO data.
- Totally independent VM-level snapshots and clones – this is such an administrative blessing it’s absurd. Delete ANY VM, ANY SNAPSHOT, ANY TIME, with no worries, because none of the systems that were cloned off of that VM or its snapshots will be affected. So you have a hundred VMs cloned off of your “Gold Master” image and you want to delete that “Gold Master”? Go right ahead. (A toy sketch of the general mechanism that makes this possible follows this list.)
- Storage is “one giant pool” – this will drastically simplify your life… No more worrying about resource libraries, vaults, multiple data pools, partitions, and arrays. It’s one less thing (and a rather large thing indeed) that needs to occupy space in your brain. You have X amount of total storage of which you have used Y. That’s all you need to know.
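I don’t know SCRIBE’s actual on-disk design, but the general mechanism that makes “delete the gold master, keep the clones” safe is reference-counted, copy-on-write block sharing. Here is a deliberately simplified toy model of that idea (not SCRIBE’s implementation) just to show why removing one image never pulls blocks out from under another:

```python
# Toy model of reference-counted block sharing -- not SCRIBE's design, just
# the general concept: clones re-reference existing blocks, and a block is
# only reclaimed once nothing points at it anymore.

class BlockStore:
    def __init__(self):
        self.refcount = {}  # block id -> number of images referencing it

    def add_image(self, block_ids):
        """Register an image (VM, clone, or snapshot) over a set of blocks."""
        for blk in block_ids:
            self.refcount[blk] = self.refcount.get(blk, 0) + 1
        return list(block_ids)

    def delete_image(self, block_ids):
        """Deleting an image frees only blocks no other image still uses."""
        for blk in block_ids:
            self.refcount[blk] -= 1
            if self.refcount[blk] == 0:
                del self.refcount[blk]  # space actually reclaimed here

store = BlockStore()
gold_master = store.add_image(["blk1", "blk2", "blk3"])  # original VM
clone_a = store.add_image(gold_master)                   # "thin" clone: no copy
store.delete_image(gold_master)                          # clone_a is unaffected
print(store.refcount)  # {'blk1': 1, 'blk2': 1, 'blk3': 1}
```

Incidentally, this kind of sharing is also why (as I note in my complaints below) it can be hard to predict how much space deleting any one snapshot will actually free up.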
What’s wrong with Scale?
Not that I want to conclude on a low note here but I do want to be as honest as possible. While it is in fact mostly ice cream and candy canes on HC3, I have run into a few frustrations.
I also did some due diligence before purchasing our final solution and contacted several current Scale customers. This thankfully helped me avoid a few pitfalls. It should be noted, I asked the Scale sales team for references to contact and they were more than happy to provide me with customers to talk with… They even tried to match me up with folks that were running similar workloads to the ones I intended to run. These weren’t chaperoned calls; I got names and telephone numbers of admins working in other businesses. Everyone consistently loved what they had while also giving me some good pre-purchase advice. This jibed with what I had read from the majority of current customers across the web in forums like Spiceworks.
Anyhow, the issues:
- Sometimes it’s too simple… moments where I have felt this way are few and far between, but occasionally I have to pick up the phone and have a support technician do something for me in the back end of the system. I will take this problem any day, though, over the opposite issue of a system that is overly complex.
- You can’t resize virtual drives. That is what one of my support calls was for. Scale is supposed to be releasing a feature allowing drive resizing any day now, so this complaint will go away. Currently, if you need to increase the size of a drive on a VM, you have to call support.
- Lack of metadata in the GUI. We have database servers with lots of separate drives. There is no way in the GUI to indicate which drive is which. This is annoying and the only workaround currently is to simply size the drives differently so you can compare what is in the GUI with what is in the guest OS.
- No online drive resizing. Similar to point 2, we have been working with a SAN. Sometimes our drive space requirements on a system change… in the world of SAN this is easily addressed because you can just do an online resize without having to reboot the server. That’s not an option on Scale, although I have found a very easy work-around… size the virtual drive on the VM really big, then only format a small portion of it. You won’t take up any more “real” space on the cluster than you need/use, and if you ever need more you just increase the volume size in the guest OS. This still isn’t as great as being able to do an online resize, but it is enough for now.
- Thus far all of my complaints have been about drives… let’s complain about something else. In line with my first complaint, some things are perhaps oversimplified. Case in point, there is only one account for managing the cluster. I was really surprised by this. So if you have a team of admins that will be administering this setup, they will all have to share one user account. I would really like to see the ability to have multiple accounts with different permission levels. Scale’s intended market is SMB (many of those shops only have one guy), so I kind of get it, but at the same time it seemed like a bit of a glaring omission. That said, I much prefer they err on the side of simplicity vs. throwing everything plus the kitchen sink at customers.
- Back to storage… Because SCRIBE is really smart when it comes to snapshots and clones… and really smart in protecting you from breaking stuff by deleting snapshots and clones… it can be hard to figure out how much space your snapshots are taking up and furthermore, deleting a snapshot may or may not free up all that much storage. Once you get the full explanation on this it all kind of makes sense (and honestly I have no qualms with how they are doing cloning/snapshotting, it is honestly quite brilliant), but I still wish storage utilization by snapshots was a little less opaque.
- They charge an arm and a leg for hardware. Now don’t get me wrong, Scale is leaps and bounds cheaper than the next comparable “complete/unified” HyperConverged solution. Furthermore, their pricing and support model is dead simple. All features, for everyone, all the time. Flat percentage cost for annual support/updates/part and system replacement. That’s all fine… but God forbid you want to upgrade something… Charging 2x – 4x the price of the part is not cool. Even though these servers use “off the shelf” components (which I love), you are still locked into getting parts from Scale. Replacement parts and whole nodes are free if you are under support. But if you want to increase memory for example be prepared to shell out. Case-in-point, even though Scale markets “buy only what you need and add bigger nodes later” – the truth of the matter is that you should probably buy more than you need up front, if possible. BTW, the story is bleaker than this with all of the other HCI products I looked at… in some ways it’s just the state of the market. If you want a simple unified architecture with full warranty and support, you are going to have to endure “vendor lock-in.” Thankfully Scale is a pretty good vendor to be joined at the hip with but it can be frustrating at times all the same.
- Lower-cost nodes are all single-socket CPU boxes, and the options consist of various mid-to-upper-level Xeons. An 8-core chip with 16 threads seems really “light” from where I am sitting for virtual workloads. That said, before we purchased, we looked at our environment and noticed that most of our server CPU resources were hardly touched except in a few specific cases. For the average SMB in particular, I think available processing power has far exceeded actual needs in 2017. I never talked to a single customer (some having installations that were a couple of years old and were fully loaded with virtual machines) who complained of not having enough compute performance, and as far as our setup goes, I have yet to see the processor meter even emit a hiccup. We are pretty much idling along. I think part of the reason for this is KVM… it’s a beautiful hypervisor when it comes to compute efficiency.
- However, if at some time in the future I want to “add” more compute to my cluster by getting some beefier nodes, I am still limited to 16 vCPUs on my individual VMs. That said, my newer nodes could have chips with much higher IPC (faster cores), in which case I am still getting improved compute performance. Do I ever think this will be anything more than a theoretical complaint? Nope.
- I found something complicated… it took me a while, but I did. Importing VMs from other platforms is a bit of a messy affair. Direct/traditional migration of Hyper-V Generation 2 virtual machines is “out” because Scale doesn’t use or support UEFI for its virtual machines yet; they are all legacy boot. I spent a few hours trying to use a recovery product to push a system into a “shell” VM in Scale only to finally figure out that the reason it couldn’t boot was that the source system in question was UEFI. Scale partners with Double-Take, however, which is also expensive, but it is stupid easy to use and works really well, so I don’t mind the premium price so much. Being able to do P2V conversion of large systems with almost zero complications and only 15 minutes of downtime… well, that’s almost priceless. Double-Take will pretty much move anything to anything (well… my systems are pretty much all Microsoft and all Server 2008 R2 and newer…). So that was a very simple way to get around the UEFI to legacy boot mode issue. Double-Take, by the way, works with all kinds of stuff, so even if you aren’t going with Scale I would urge you to take a good look at DT if you have a lot of migrations/conversions to do. If you have any money left you can buy me a beer… you will at least have your life and your job and most of your hair, so you won’t care so much about the fact that you are broke :).
- Limited KVM ecosystem – not everything supports KVM. There are more tools available for VMware-based solutions. If you stay within Scale Computing’s walled garden I don’t see much need for any additional tools, but then again I work in a “mid-size” IT shop. That said, we use a particular software product for backup whose licensing is much easier on the wallet when backing up VMs. The problem is that it doesn’t recognize our KVM systems on HC3 properly and thinks they are all physical systems. I went around and around with their support and finally got them to work out a deal where they would just extend my physical licensing based on “scout’s honor”, due to the fact that we already had several virtual host server licenses purchased to use with HC3. I thought it particularly gracious of them. So that worked out fine for us. If you are in an IT department that has a lot of sunk cost in one product or another that is incompatible with KVM, though, it might be an uphill battle. Nutanix Acropolis customers have to fight the same battles. Oftentimes so do users of Hyper-V, although third-party support for Redmond’s virtualization platform has really surged in the last several years.
Conclusion
Should you give Scale Computing a shot? Obviously I think so. It might be my ignorance of competing HCI products, but I honestly can’t think of a good technical reason why someone would go with Nutanix, traditional VMware, EMC, etc. If I have mischaracterized one of those competing products in this post, please point it out and I will correct it.
I didn’t do a deep dive into testing performance, and perhaps my HyperCore cluster is lacking something? If so, I have yet to figure out what. I am coming off of an older SAN environment and all I can say is that storage performance in particular has been pretty phenomenal. I am continuing to migrate production workloads over and thus far my cluster CPU usage has been steady without breaking a sweat. We already replaced one drive and it was painless. We did component failure testing and our whole system appears to be rock-solid. I am sleeping better…
Personally, I want to see our entire network running on HyperCore over the long term. I might just get my sanity back (my wife hints that this might not be possible), drink less coffee, have time to focus on more important things in my role, and add new value to my company with all of my “free” time.
Hypercore offers ridiculously good benefits that are immediately apparent when compared side-by-side with competing products. To summarize:
- SCRIBE is an absolutely genius, high performance storage solution designed specifically to meet the needs of Hyperconverged infrastructure. It is tightly integrated into the heart of Hypercore. As a result, SCRIBE offers an incredible feature-set while being completely transparent to the end-user.
- HC3 has the best “daily-driver” management interface in the industry. Hands down.
- Hypercore/KVM is an excellent, rock-solid, high-performance Hypervisor.
- My own experience, and pretty much that of every other customer I spoke with, testifies to the fact that we all love the product. I searched long and hard trying to find unhappy owners of HC3 equipment… to this day I still don’t know if any exist.
- Furthermore, Scale’s technical support is absolutely phenomenal. I would challenge you to find a single customer of theirs who says otherwise.
- Finally, I will add that while it is not “cheap”, when put side-by-side with comparable products HC3 is honestly an absolute steal. When compared to trying to get the same results on traditional infrastructure… even more of a steal. The value proposition here is huge.
In conclusion, I think the folks at Scale Computing have fully understood and embraced the spirit of Hyperconvergence. In so doing they have created a truly unique and compelling product offering in HC3 which is worthy of better understanding and ultimately wider adoption.
References:
http://www.scalecomputing.com/files/documentation/whitepaper-taneja-group-tech-validation.pdf
https://www.scalecomputing.com/wp-content/uploads/2014/10/whitepaper-hc3-hypercore-theory-of-ops.pdf
http://www.joshodgers.com/2016/05/05/vmware-youre-full-of-it-fud-nutanix-cvmahv-vspherevsan-overheads/
https://developer.ibm.com/storage/videos/hadoop-hdfs-vs-spectrum-scale-gpfs/
https://community.spiceworks.com/topic/752353-scale-computing-references
https://community.spiceworks.com/topic/753863-anybody-using-scale-for-their-virtualization-platform?page=1
Thanks for the review. Found it helpful and well written.
Thanks Dan,
I need to write a followup post as I have spent several more months with our cluster now. I still find it to be a solid solution, although I have run into the occasional quirk and learned a few things “NOT” to do :). To date I haven’t had any production outages for any systems hosted on Scale HC3. In our environment an outage typically means an application is unavailable for > 15 minutes in a given week (or day, depending on SLA). I actually took a vacation for the first time in ages where no one called me with a crisis. Ironically, it was during that vacation when our Scale solution had an issue where essentially one node became unavailable for several minutes due to a bug involving HyperCore and one of our guest systems… and the cluster just hummed along, sent out alert emails, crashed the VMs on that node, brought them up within 5 minutes on the other nodes in the cluster, and no one even noticed the issue. That was with zero staff intervention on our side. Kind of wonderful. The Scale tech support team is still doing a deep dive; I have a feeling I am one of their biggest PIA customers lol :)… and they are still very nice, professional, attentive, responsive. So yes, I am still a happy camper 9 months in.
Hi,
Thank you for the in-depth review. Did you ever write a follow-up post?
Hi Francesco, I need to write a followup post for sure. We have had our cluster in production for almost a year now and I am repeatedly told by support that we are in the top 10 of their clients as far as “aggressive” use cases go. Why? We abuse the heck out of our cluster 🙂 – lots and lots of SQL servers. I think this perhaps gives me a bit of a unique take on the whole thing, and I would love to do a summary write-up when I have time.