Many customers evaluating a new storage array focus on the work required to set up a new server with storage. That is an important item to research, and some storage vendors make it simpler than others to connect the host HBA WWNs to the capacity in the array. However, I submit that there is a much more common storage management challenge, and one that can be made a lot simpler.
The Problem
Customers are constantly adding more capacity to their systems. Applications retain more data. New applications are added to servers. Upgrades need more space. It seems that every few months each server gets some additional storage capacity. So how hard should that be? After all, the servers and storage are already connected.
Now let's make this a little more complicated. Many customers are rolling out VMware ESX clusters of 8 or more servers, and some of those servers have more than 2 HBA ports. So when adding a device to the cluster, there may be 30 or more HBAs whose access all needs to be updated.
The Solution
The EMC Symmetrix VMAX has a unique way of managing host access to LUNs. Most arrays manage that access in one of two ways:
- Manage LUNs to each HBA, or
- Manage LUNs to groups of HBAs (on one or more servers)
The EMC Symmetrix DMX falls into the first camp: each LUN-to-HBA relationship is managed independently. For large cluster environments (ESX or otherwise), managing access to each HBA is both time consuming and risky. It takes careful processes to ensure that each new LUN is matched with all of the right HBAs when it is added. Are those processes always updated when an HBA is replaced, or when a server is added to or removed from the cluster? This was a very workable design when SANs were smaller (and so were the clusters), but a better option is needed.
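To see the scale of the bookkeeping, here is a minimal Python sketch (a toy model, not array syntax; the WWN names and counts are illustrative) of per-HBA masking, where every LUN/HBA pairing is an independent record:

```python
# Toy model of per-HBA LUN masking (the first camp): each LUN-to-HBA
# relationship is a separate record, so adding one LUN to a cluster
# means touching every HBA individually.
cluster_hbas = [f"wwn-{srv}-{port}" for srv in range(8) for port in range(4)]

masking = set()  # {(lun, hba_wwn)} -- one entry per relationship

def add_lun_per_hba(lun, hbas):
    """Grant one LUN to each HBA, one record at a time."""
    for wwn in hbas:
        masking.add((lun, wwn))

add_lun_per_hba("LUN-0100", cluster_hbas)
print(len(masking))  # 32 separate entries for a single new LUN
```

A process that misses even one of those 32 entries leaves a host without all of its paths, which is exactly the risk described above.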
The EMC CLARiiON line has been a great example in the second camp for years. Host groups are defined, and a new host can fairly easily be added to or removed from the group. HBA changes are made on the host and reflected in the group, so that the group view stays up to date. Adding a new LUN to the group takes seconds, and is reflected across all of the hosts in the group.
The EMC Symmetrix VMAX with Auto-provisioning Groups (AP) takes this one step further. AP adds the idea of cascaded initiator groups - like nested host groups. So ESX servers can boot from the SAN, and each set of HBAs will be stored in an initiator group for that server. These are paired with private LUN(s) to create a storage view. Then a cluster-wide initiator group is created which contains the initiator group of each server - not by WWN, but using the (cascaded) names of the server initiator groups. The cluster-wide LUNs are then associated with the cluster-wide initiator group to expand the storage 'view' for each host in the cluster. In this way, it remains easy to manage both the host access to private LUNs and the cluster access to shared LUNs.
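The cascading behavior can be sketched in a few lines of Python. This is a toy model of the structure only (the class, group names, and WWNs are all illustrative, not SYMCLI syntax): a group holds either WWNs directly or other groups by reference, and a view resolves the nesting at access time.

```python
# Toy model of VMAX cascaded initiator groups: a cluster-wide group
# contains the per-server groups by name/reference, not by WWN.

class InitiatorGroup:
    def __init__(self, name, wwns=None, children=None):
        self.name = name
        self.wwns = list(wwns or [])          # directly listed HBA WWNs
        self.children = list(children or [])  # nested initiator groups

    def resolve(self):
        """Flatten the group: its own WWNs plus those of all nested groups."""
        wwns = list(self.wwns)
        for child in self.children:
            wwns.extend(child.resolve())
        return wwns

# One initiator group per ESX server, holding that server's HBAs.
esx1 = InitiatorGroup("esx1_ig", ["wwn-a1", "wwn-a2"])
esx2 = InitiatorGroup("esx2_ig", ["wwn-b1", "wwn-b2"])

# The cluster-wide group references the server groups, not raw WWNs.
cluster = InitiatorGroup("cluster_ig", children=[esx1, esx2])
print(cluster.resolve())  # ['wwn-a1', 'wwn-a2', 'wwn-b1', 'wwn-b2']

# Replacing an HBA in one server group flows up to the cluster view
# automatically -- no edit to the cluster group is needed.
esx1.wwns[0] = "wwn-a9"
print("wwn-a9" in cluster.resolve())  # True
```

Because the cluster group holds references rather than copies of the WWNs, there is exactly one place to update when an HBA changes, which is the whole point of the design.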
A nice description of all the details of such an implementation with VMware can be found in the EMC White Paper HERE.
The Results
VMAX customers can easily manage large amounts of storage even in complex clustering environments. Where cluster hosts need private LUNs (for boot information, local configuration storage, logging, etc.), each server can easily have its own private storage view. Adding a private LUN just means adding it to that host's private view, and changing an HBA is a simple update to that host's initiator group.
For the shared LUNs, adding capacity is just as easy. Add the LUN to the shared view, and all servers in the cluster get access on all of their defined HBAs. And since the initiator group is cascaded, updates to the HBA information for a given server automatically flow up to the cluster view.
Additional Information
I am confident that some folks reading this will say that the answer is easy: use 'Thin Provisioning' (for VMAX, Virtual Provisioning) to give the servers huge LUNs, and let them consume the space as they need it. For some environments this works very well. However, it also presents a new risk: storage bankruptcy. That is, an application (like an Oracle database) goes to write to some of the 'free' space it sees on its drive, only to have the write fail because the array (or at least that thin pool) is out of space. Hopefully the application can crash cleanly and be recovered when new storage is added. Maybe not. The point is that under Thin Provisioning, storage management changes from something that can slow down the implementation of a project into something that must be monitored 24x7, because a lapse can create catastrophic data loss.
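Storage bankruptcy is simple arithmetic, which a short Python sketch can make concrete (a toy model with illustrative sizes, not how any array implements thin pools): two thin LUNs are subscribed against a pool smaller than their combined size, and a write fails even though the LUN itself still reports free space.

```python
# Toy model of thin-pool oversubscription: LUN capacity is promised
# up front, but physical space is only consumed from the shared pool
# as data is actually written.

class ThinPool:
    def __init__(self, physical_gb):
        self.free_gb = physical_gb

    def write(self, gb):
        if gb > self.free_gb:
            # The host still sees free space on its LUN, but the
            # pool has nothing left to back the write.
            raise IOError("thin pool exhausted: write fails")
        self.free_gb -= gb

pool = ThinPool(physical_gb=100)
# Two 80 GB thin LUNs: 160 GB subscribed against 100 GB of real disk.
pool.write(80)       # first LUN fills completely; 20 GB of pool remains
try:
    pool.write(80)   # second LUN 'has' 80 GB free, but the pool does not
except IOError as e:
    print(e)
```

The subscription total (160 GB) never appears in any single LUN's view, which is why this failure mode has to be caught by monitoring the pool, not the hosts.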
Beyond that, some customers have a very tough time ordering 'just in time' storage. They need a purchasing process that allows them to buy, at a later time, the storage they did not buy when the applications were implemented. If the project was justified based on the initial costs, then subsequent growth costs may land on the IT budget rather than being paid by the business unit that is using the application. This presents a major process and accounting challenge for some businesses, one they will need to adjust to as IT moves into a service delivery model.
There is also the challenge that handing out huge LUNs invites the users of those systems to use the space. After all, who would not like to have space for an extra database export, or room to rebuild a set of image files, or.... Unused disk space has a way of filling up, and unless there is a reason to be careful about what it fills up with, there can be a lot of waste.
Hopefully this won't be taken the wrong way. I believe that Thin Provisioning is a great thing for customers (would we make it if we did not see the value?). I also think that it cannot live up to the industry hype ("all systems at 100% utilization with less administrative work and no risk" - if your thin pools are 100% full, there is a lot of risk...). I recommend that customers not over-subscribe their production 'thin' space until they have run out of space twice in a non-production setting. Once that has happened, they will understand what to watch for, how serious it is, and what needs to be done to restore normal operations (even if that means deleting some devices so that others have the space to function).
Better storage management tools are another option. Some applications, such as EMC Ionix ControlCenter, can mask the complexity of storage management from the users. By creating cluster entities and allocating storage to those entities rather than to individual servers, customers can achieve a simplified view of ongoing operations. The actual array views of the access, however, are still those of the native array tools (where the VMAX excels).
Conclusion
VMAX customers can more easily manage both private and shared LUN access to clustered servers thanks to Auto-provisioning Groups and the cascaded initiator groups they support. And simpler storage management is always helpful.