Michigan Answers To UC Davis SAN Questions

Answer from R.P. Aditya @ Michigan

Hi Joncarlo,

On Wed, Jun 18, 2008 at 03:18:51PM -0700, Joncarlo Ruggieri wrote:
> 1) Please describe your configuration, including hardware and software.

Our primary production appserver cluster for CTools (our local branding of
Sakai) consists of six Dell 1950s (64bit) plus one Dell 2650 (32bit) acting as
a Sakai search server. They run Redhat 5, and each is connected to a public
VLAN and also to a private VLAN with GigE (one interface each). The private
VLAN is used for NFS to a Netapp 3020 filer. The same filer is used for both
Oracle database and Sakai file (content resource) storage. We have a single
Sun T2000 running Solaris 10 as the Oracle 10g database server.

We also have a secondary (failover) production cluster that is in a different
datacenter with the exact same hardware backed by another Netapp 3020. The
file storage is snapmirrored every 10 minutes from the primary netapp to the
secondary; Oracle standby replication is used between the two sites every 10
minutes too.

You can see stats on our primary filer at:

https://ctstats.ds.itd.umich.edu/stats-bin/drraw.cgi?Mode=view&Dashboard=1151462293.28892&View=3&Filter=

95% of what is stored on the filer and what the filer does is CTools related,
though we do store a few other things on there to support CTools (Oracle
backups, RHEL5 images, etc.).

We copied everything over from AFS and switched to using NFS for content
resources in late May of this year (though we had already begun storing new
files in NFS in 2008 in anticipation of the move).

> 2) What protocol do you use?

NFS

> 3) Can, and do you use multipath-ing for high availability?

The filer is connected redundantly (2 interfaces) to the same switch for
load-sharing and failover, however the appservers are singly connected to the
switch.

> 4) How do your application servers connect to the data?  Are they all
> directly connected to the SAN, does one node connect and share the data?

They all mount the same volume off of the filer using NFS.

> 5) What strategy do you use to keep snapshots in sync with the sakai database?

We back up the database nightly with RMAN. As long as the Sakai database
backup image is older than the filesystem snapshot, everything is fine (apart
from a few orphans in the filesystem).
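To make that ordering rule concrete, here is a minimal sketch in Python. It is
not something we run; the function name and the timestamps are made up for
illustration. The idea is that a database backup taken before the filesystem
snapshot can only reference files the snapshot already contains, so the worst
case is extra (orphaned) files on disk.

```python
from datetime import datetime


def restore_pair_is_safe(db_backup_time: datetime, fs_snapshot_time: datetime) -> bool:
    """Return True when the database backup is older than the filesystem snapshot.

    Hypothetical helper: in that case every file the database references should
    already be in the snapshot, leaving at most some orphaned files on disk.
    """
    return db_backup_time < fs_snapshot_time


if __name__ == "__main__":
    # Made-up example times, not taken from our actual schedule.
    nightly_backup = datetime(2008, 6, 18, 2, 0)
    later_snapshot = datetime(2008, 6, 18, 2, 10)
    print(restore_pair_is_safe(nightly_backup, later_snapshot))
```

Swapping the two arguments gives the unsafe case: the database would then
reference files that the older snapshot does not have.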

> 6) Are the filesystem snapshots done via the SAN hardware or some external software?

The NAS hardware (Netapp) does the filesystem snapshots as part of the
snapmirror license.

> 7) Do you have any advice or warnings for moving from using AFS to a SAN?

On the appservers, we used to have in sakai.properties:

bodyPath@org.sakaiproject.content.api.ContentHostingService = /afs/umich.edu/group/ctfs
bodyVolumes@org.sakaiproject.content.api.ContentHostingService = fs1,fs2,fs3...fs99

where ... stands for the rest of a comma-separated list of all 99 volumes.
Sakai spread the files over all of those (we did that to limit AFS volume
size). Now, in NFS, it looks like:

bodyPath@org.sakaiproject.content.api.ContentHostingService = /ctfs
bodyVolumes@org.sakaiproject.content.api.ContentHostingService = fs2008

We'll change the bodyVolumes property to fs2009 next year, and so on. However,
to preserve access and to do the file migration one volume at a time, /ctfs
has a bunch of symlinks that look like this now:

lrwxrwxrwx   1 root    root     16 May 18 23:41 fs1 -> /ctfs/fs2007/fs1
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs10 -> /ctfs/fs2007/fs10
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs11 -> /ctfs/fs2007/fs11
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs12 -> /ctfs/fs2007/fs12
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs13 -> /ctfs/fs2007/fs13
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs14 -> /ctfs/fs2007/fs14
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs15 -> /ctfs/fs2007/fs15
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs16 -> /ctfs/fs2007/fs16
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs17 -> /ctfs/fs2007/fs17
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs18 -> /ctfs/fs2007/fs18
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs19 -> /ctfs/fs2007/fs19
lrwxrwxrwx   1 root    root     16 May 18 23:41 fs2 -> /ctfs/fs2007/fs2
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs20 -> /ctfs/fs2007/fs20
drwxr-x--- 102 tomcat5 tomcat 4096 May 20 23:17 fs2007
drwxr-x---   5 tomcat5 tomcat 4096 Dec 31 19:00 fs2008
lrwxrwxrwx   1 root    root     17 May 18 23:41 fs21 -> /ctfs/fs2007/fs21
....

While we were copying files over, each symlink pointed to the corresponding
AFS volume instead.
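As a rough illustration of the symlink layout in the listing above, here is a
short Python sketch that creates links of the form fsN -> /ctfs/fs2007/fsN.
This is not the script we used; the function name and its parameters are
hypothetical, and the demo runs in a temporary directory rather than against
/ctfs.

```python
import os
import tempfile


def link_legacy_volumes(root, year_dir, count, target_root=None):
    """Create root/fsN -> target_root/year_dir/fsN for N in 1..count.

    Hypothetical helper for illustration only. Links that already exist are
    left alone, so the function can be re-run safely.
    """
    target_root = target_root or root
    links = []
    for n in range(1, count + 1):
        name = f"fs{n}"
        link = os.path.join(root, name)
        target = os.path.join(target_root, year_dir, name)
        if not os.path.lexists(link):
            os.symlink(target, link)
        links.append((link, target))
    return links


if __name__ == "__main__":
    # Demo in a scratch directory; only three volumes instead of all 99.
    with tempfile.TemporaryDirectory() as demo_root:
        for link, _ in link_legacy_volumes(demo_root, "fs2007", 3, target_root="/ctfs"):
            print(os.path.basename(link), "->", os.readlink(link))
```

Repointing one volume at a time (for example from an AFS path to the NFS path
once its files are copied) would then just mean replacing that one link.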

If I haven't answered your questions or you have others, feel free to ask.

Thanks,
Adi