Indiana Answers To UC Davis SAN Questions

Response from Troy Williams @ Indiana

Jon,

We do have a SAN involved for our Sakai storage, but we are not delivering fiber channel storage directly to the application servers.  The application servers are virtual machines hosted on an ESX installation, which does use fiber channel based storage.

In January 2007 we replaced our existing storage area network through a purchasing RFP.  Several vendors bid, with Hitachi Data Systems (HDS) winning.  The HDS system replaced our existing IBM fiber channel storage subsystems.  The bid also included a provision for NAS replacement; HDS again won, replacing a 4-node NetApp cluster.

1) Please describe your configuration, including hardware and software.

Our "Enterprise" SAN consists of:
Hitachi TagmaStore USP 600
128GB cache
92TB RAID5 FC disk

Hitachi AMS500
8GB cache
40TB RAID5 SATA disk


NAS solution:
Hitachi BlueArc 2100 NAS gateway cluster

Application Server:
The application servers are hosted on 4 HP physical servers running ESX.  Each physical server is an 8-core AMD-based system with 48GB of RAM.  The ESX servers have dual fiber channel attachments to 2 physical fiber channel switches (Brocade model 48000).  The ESX servers have an active-active connection to the HDS SAN.

The virtual machines are running Red Hat Linux 4 with 4 CPUs and 12GB RAM.  ESX lets us dynamically provision additional AppServer resources, clone AppServers, and provide disaster recovery for the base operating systems.

Application Server Storage:
The base operating system is installed on virtualized disk presented to the AppServer via the ESX layer.  Only the operating system resides on this storage.

The application tier is hosted via the NAS solution (gateway cluster), delivered via NFS over a private non-routed network.

We perform backups to a Tivoli Storage Manager backup server and also take file-level snapshots.  The Sakai Application group can recover file content via CIFS access to the NAS (limited to the support desk via Active Directory group assignment).

Database Server:
The database server is currently hosted on an IBM pSeries server running AIX.  The server is an LPAR, which allows dynamic CPU and RAM allocation for peak workloads.  The server has fiber channel connections to the HDS SAN for the Oracle storage needs.  The connections are active-active for high availability and load balancing.


2) What protocol do you use?
-       NFS for the Sakai Application Server tier
-       Fiber channel for the NAS tier
-       Fiber channel for the ESX tier
-       Fiber channel for the Database tier

3) Can, and do you use multipath-ing for high availability?
Yes, multipath-ing is used for all tiers of fiber connected devices:
-       ESX servers
-       Hitachi NAS
-       Oracle Database Server
Each device is connected with at least 2 physical fiber connections to 2 physical fiber switches.
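
As a rough illustration of the "at least 2 connections to 2 switches" rule above, the following Python sketch checks a path inventory for devices that fall short of it.  The device, port, and switch names are hypothetical placeholders, not Indiana's actual configuration, and the inventory format is an assumption made for this example.

```python
from collections import defaultdict

def find_single_points_of_failure(paths, min_paths=2, min_switches=2):
    """Return devices lacking redundant fiber paths.

    `paths` is an iterable of (device, hba_port, switch) tuples.
    This tuple layout is an assumption made for the example.
    """
    by_device = defaultdict(list)
    for device, hba_port, switch in paths:
        by_device[device].append((hba_port, switch))

    problems = {}
    for device, links in by_device.items():
        # Count the distinct switches, not just the number of cables:
        # two links into the same switch still share a failure point.
        switches = {switch for _, switch in links}
        if len(links) < min_paths or len(switches) < min_switches:
            problems[device] = {"paths": len(links), "switches": len(switches)}
    return problems

# Hypothetical inventory: names are illustrative only.
inventory = [
    ("esx-host-1", "hba0", "switch-a"),
    ("esx-host-1", "hba1", "switch-b"),
    ("nas-node-1", "hba0", "switch-a"),
    ("nas-node-1", "hba1", "switch-a"),  # both links land on one switch
    ("db-server", "hba0", "switch-a"),   # only a single path
]

for device, detail in find_single_points_of_failure(inventory).items():
    print(device, detail)
```

A check like this only validates the cabling plan; whether the paths are actually used active-active depends on the multipath driver on each host.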

4) How do your application servers connect to the data?  Are they all
directly connected to the SAN, does one node connect and share the data?
Described above; in summary: the virtual machines connect to a NAS server via NFS.  The data sharing occurs over NFS, delivered via a private non-routed network.


5) What strategy do you use to keep snapshots in sync with the Sakai database?
I'm not directly involved with this portion, but I can provide you with a contact if desired.  The help desk can restore file-level snapshots; I'm assuming there may be some metadata updates that need to occur to complete a file-level recovery.


6) Are the filesystem snapshots done via the SAN hardware or some external software?
Performed via NAS tools.


7) Do you have any advice or warnings for moving from using AFS to a SAN?
Not a lot to add, just a few random thoughts:
-       Fiber channel SANs can be complex to manage
-       Appropriate training should be considered before implementing a SAN.
-       Be sure that you match your workload with appropriate hardware (you need an appropriate amount of CACHE and SPINDLES for your workload)
-       If the SAN will be a shared SAN, ideally have the capability to segment resources for applications (devote cache for application delivery)
-       Ensure you match the correct disk to the workload (a 7200RPM 750GB SATA drive will not perform adequately for most random I/O workloads)
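
To make the spindle and disk-type points above concrete, here is a minimal back-of-the-envelope sketch in Python.  It uses the common rule of thumb that a single drive's random IOPS is roughly 1 / (average seek time + average rotational latency), and that a small random write on RAID5 costs about 4 back-end I/Os.  The seek times, the 2000 IOPS target, and the 70% read mix are assumptions chosen for illustration, not measurements from this installation, and the estimate ignores cache entirely.

```python
import math

def drive_iops(rpm, avg_seek_ms):
    """Rule-of-thumb random IOPS for one spindle."""
    # Average rotational latency is half a revolution, in milliseconds.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)

def spindles_needed(target_iops, read_fraction, per_drive_iops, write_penalty=4):
    """Estimate the drive count for a random I/O workload.

    write_penalty=4 is the usual RAID5 approximation
    (read data, read parity, write data, write parity).
    """
    reads = target_iops * read_fraction
    writes = target_iops * (1.0 - read_fraction)
    backend_iops = reads + writes * write_penalty
    return math.ceil(backend_iops / per_drive_iops)

# Assumed seek times: 8.5 ms for 7200RPM SATA, 3.5 ms for 15000RPM FC.
sata = drive_iops(7200, 8.5)    # about 79 IOPS per drive
fc = drive_iops(15000, 3.5)     # about 182 IOPS per drive

# Hypothetical workload: 2000 front-end IOPS, 70% reads, RAID5.
print("SATA spindles:", spindles_needed(2000, 0.7, sata))
print("FC spindles:  ", spindles_needed(2000, 0.7, fc))
```

Under these assumptions the SATA option needs more than twice as many spindles as the FC option for the same random workload, so capacity per drive says little about whether a disk type suits the workload.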

Hopefully this information will be useful.
-Troy

Troy Williams
Storage and Virtualization
UITS, Indiana University