When a company looks to build a major new IT system, it has a set of business goals in mind. Mapping those goals to specific pieces of technology to acquire is not always easy. This is a discussion of how EMC worked with one customer to take some of the guesswork out of what they were buying and, at the same time, increase their confidence in the final solution.
The Problem
This customer deals with continuous data feeds, both coming in and going out. They need to have the data integrated into their overall information framework within very specific time windows. They know how much data they need to be able to move - after all, they are in the information business. But how does that translate into a storage solution?
They mapped their performance needs to a set of hardware requirements. They took these requirements to multiple vendors, and they got very different solutions from each. They have a long enough relationship with their vendors to know that all of the solutions would eventually meet the need. However, it was possible that some were not going to support the full (planned) production workload with only the gear being proposed. Price was the major factor in the purchase decision between workable solutions, and the vendors were well aware of this. Might some be planning to fix things later with upgrades, or hoping that the planned workload would never appear? (Customers have to consider this possibility, even if it only causes them to do enough checking to 'trust but verify' - thus the pictured fake news clipping.)
The customer started out by asking for a collection of specific storage components - drives, cache, etc. The vendor responses either had all of those components, or had substitutes and explanations for why the substitutes were suitable. The differences in the architecture of the systems they were looking at purchasing turned this into comparing apples, oranges, and grapefruit (as a general metaphor for the problem, not a specific comparison of these products). So how were they really supposed to compare the proposed solutions?
Sure, the solutions all had to have the right usable capacity. But their application had serious performance needs. If they got into production and then found a problem with the system, would they be asking questions about the ability of the storage to keep up with the rest of the environment? They needed a way to ensure that the solution was mapped back to their true business requirements.
This was also a challenge for the vendors. Each had their own solution which they believed was well suited to the need as it was described to them. (I assume that the competition has the same goal of a satisfied customer at the end of the day.) So how could they convince the customer that their solution was complete enough to meet the business need?
As a part of the discussion with the customer, we noticed that they were making a leap from the performance that they needed for their application to the storage components that they were looking to buy. This is a necessary step, but one we were convinced they should not be handling (or at least not alone).
The Solution
EMC proposed that the customer let the storage vendors do the translation from the application performance needs to the solution. Since the solutions were not going to be simple to compare at the hardware level, and the customer did not have a specific hardware need (it was a business need), why not make the purchase based directly on the business need?
We worked with the customer to detail their application performance profile. They not only had their older production system as a model, but their large-scale testing of the new version was far enough along to give them a very good handle on their I/O profile. From that, we built a model of the workload.
Now the customer had a new set of requirements: T IOPS at 8 KB that are 20% write and 95% random, L IOPS at 8 KB that are 60% write and 100% sequential, and W IOPS at 32 KB that are 15% write and 80% random. (The model here is a rough set of requirements, not the actual requirements of the customer.) They could then ask the vendors to propose both a solution to meet these requirements and a method to demonstrate that the requirements were met.
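To make the idea concrete, here is a minimal sketch (in Python, not in the iorate input format) of how a workload profile like this can be written down as structured data that both the customer and the vendors can work from. The IOPS counts stay as the placeholders T, L, and W from the model above, and the component names are made up for illustration.

```python
# A minimal sketch of the workload profile as structured data.
# The IOPS targets T, L, and W are placeholders, just as in the text;
# this is not the customer's actual specification or the iorate file format.
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadComponent:
    name: str            # illustrative label only
    target_iops: int     # required I/Os per second (placeholder value)
    io_size_kb: int      # I/O transfer size in KB
    write_pct: int       # percentage of I/Os that are writes
    random_pct: int      # percentage of I/Os that are random (vs. sequential)


def build_profile(T: int, L: int, W: int) -> list[WorkloadComponent]:
    """Return the three-component profile described in the text."""
    return [
        WorkloadComponent("feed_random",     T, io_size_kb=8,  write_pct=20, random_pct=95),
        WorkloadComponent("feed_sequential", L, io_size_kb=8,  write_pct=60, random_pct=0),
        WorkloadComponent("batch_large",     W, io_size_kb=32, write_pct=15, random_pct=80),
    ]
```

Writing the profile down this precisely is what lets every vendor size and test against the same target instead of their own interpretation of it.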
The Results
The change in the discussion was immediate. Now the vendors had a specific performance target to meet. We believed that the configuration EMC had been proposing already met the new performance requirements. We offered guarantee language that put a performance test in as part of the acceptance criteria, specifying the use of the iorate performance tool and detailing the performance model translated into the iorate input files.
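Once the profile is written down that way, the acceptance criteria become a simple comparison of measured results against targets. The sketch below assumes the measured per-component IOPS have already been pulled from the test tool's output; it does not show the actual iorate input or report formats.

```python
# A sketch of checking acceptance criteria once the test tool (iorate here,
# but any I/O generator) has reported measured throughput per component.
# The measured numbers are inputs to this function; how they are collected
# depends on the tool being used.

def meets_requirements(profile, measured_iops: dict[str, float],
                       tolerance: float = 0.0) -> bool:
    """Return True only if every workload component hit its IOPS target."""
    passed = True
    for component in profile:
        achieved = measured_iops.get(component.name, 0.0)
        required = component.target_iops * (1.0 - tolerance)
        status = "PASS" if achieved >= required else "FAIL"
        print(f"{status}: {component.name}: {achieved:.0f} of "
              f"{component.target_iops} required IOPS")
        if achieved < required:
            passed = False
    return passed
```

The point is not the code itself but that pass/fail is defined by the business-level profile, not by a debate over drive counts or cache sizes.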
This was also much simpler for the customer. Rather than trying to reason out all of the vendor claims about how the work was done, they were able to work from their specification of what they needed the array to handle. It was then up to the vendors to propose configurations to meet the need, and to discuss how to document for the customer that the need was met.
Additional Information
It turns out that there was additional value in actually performing the storage performance test. There were some host settings that were not optimized for multi-path I/O. As a result, the array was not able to meet the performance needs during the initial test - we were waiting on one HBA to do the work of 4. By finding this early, the customer eliminated a problem that might not otherwise have been found until production was live, when it would be both harder to diagnose and more complex to change (as changes to a live production system always are). This gave the team confidence in what they could expect out of the array, and out of the path from an individual server to the array. When questions came up later in the pre-production testing, the team knew whether the storage was getting close to their tested values - and if it was not, then the problem was probably not in the storage area.
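For illustration only, a sanity check along these lines is enough to flag the kind of imbalance we saw, where one path carried essentially all of the I/O. The per-HBA numbers here are made up, and on a real host the counters would come from the operating system or the multipathing software rather than a hard-coded dictionary.

```python
# A sketch of a host-side sanity check: if one HBA/path is carrying nearly
# all of the I/O while the others sit idle, the multipathing setup is not
# spreading the load. The counters below are hypothetical inputs.

def check_path_balance(path_iops: dict[str, float],
                       max_share: float = 0.5) -> list[str]:
    """Return warnings for any path carrying more than max_share of total I/O."""
    total = sum(path_iops.values())
    if total == 0:
        return ["no I/O observed on any path"]
    warnings = []
    for path, iops in path_iops.items():
        share = iops / total
        if share > max_share:
            warnings.append(f"{path} is carrying {share:.0%} of the I/O "
                            "- check multipath configuration")
    return warnings


# Example: roughly the situation found in the initial test - one of four
# HBAs doing essentially all of the work (numbers are illustrative only).
print(check_path_balance({"hba0": 9500, "hba1": 50, "hba2": 40, "hba3": 60}))
```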
As a proper disclaimer, those who read through the details of iorate, or read my about page or LinkedIn profile, will find that I am the author of the iorate performance tool discussed here. That has limited impact (beyond my being familiar with it) on the choice of this tool for general storage testing. The tool is array independent, and was developed to help another customer do storage performance profile testing back in the late 1990s. Since then it has become fairly widely used across the industry (I have seen IBM propose it as a tool to test their storage, among others) and around the world (I get e-mail questions from all over). I don't ask anyone to trust that the code is independent - feel free to read through the source code, or even compile your own version.
One additional note on storage performance testing is that not all arrays test the same. For example, the IBM XIV array does not save blocks of all zeros to disk (they log it as empty tracks of data). So a performance tool that was only writing zeros (and then reading them back) would be very skewed if testing XIV against other platforms, since the XIV would not be doing any I/O to disk. Since the real data will be non-zero, the testing data needs to be as well (iorate uses non-zero data by default). Similarly, the NetApp arrays with WAFL work to turn all writes into sequential writes (much easier on the back-end disk drives). As a result, the write performance seen from a 'fresh' NetApp file system can be dramatically different from one that is very full/fragmented (so that the array has to find smaller places to put writes, forcing the RAID read/modify/write workload onto the drives). Customers need to understand such attributes of their arrays to be able to model the performance that will be sustained for their actual production workloads.
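As a generic illustration of that point (and not how iorate itself builds its buffers), a test that writes random, non-zero data avoids handing the array an optimization that a real workload would never offer:

```python
# A sketch of why test data should be non-zero: filling write buffers with
# random bytes keeps an array's zero-detection (or similar shortcuts) from
# skipping the back-end I/O that a real, non-zero workload would cause.
# This is a generic illustration, not the buffer logic of any specific tool.
import os


def make_test_buffer(size_bytes: int) -> bytes:
    """Return a buffer of non-zero random data for test writes."""
    buf = bytearray(os.urandom(size_bytes))
    # Guard against the (vanishingly unlikely) all-zero buffer.
    if not any(buf):
        buf[0] = 1
    return bytes(buf)


zero_buffer = bytes(8 * 1024)             # what a naive test might write
real_buffer = make_test_buffer(8 * 1024)  # what a realistic test should write
```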
This customer has a need to provide replicated data for disaster recovery. Their current choice for this system is to use the database tools to perform the replication. However, they have a complex (3-site) recovery model, and they made sure that they understood the storage-based options that would be available should the database replication not meet their needs in the future.
Conclusion
Customers can benefit from documenting the business/application requirements for their systems and leaning on their vendors to meet those needs. The vendors understand the abilities of their systems much better than most customers ever will. And, of course, the customer needs to 'trust but verify.' In the process, everyone will learn more about the final solution before it goes into production.