Sakai Data Storage - Campus Installations

Existing File System - AFS

Currently, UC Davis users have home directories in AFS space, and CourseManagement also uses this space.

  1. User's AFS Space Allocation
    User space in AFS takes the form of the directory structure below:
    Ex.
    /afs/home.ucdavis.edu/home/sn/samerson
    Above, sn is the first initial of the first name followed by the last initial of the last name.
  2. Course AFS Space Allocation
    AFS space for courses is allocated by CRN, and CourseManagement in MyUCDavis writes data to this directory. The ACL is the MyUCDavis user.
    Ex.
    /afs/.ucdavis.edu/myucd/class/200603/ARE112001/
    Above uses TermCodeCRN, and class uniqueness

Problem Statement

To store content in AFS and enable clustering of the Sakai app servers, we must consider several questions. The questions are listed here with brief answers.
Further information and discussion on the answers to these questions may be found on child pages (e.g. AFS File Mappings, etc).

1. How do we integrate Sakai's resources and our current AFS file system?
*Answer: By replacing ContentHostingService with our own implementation for AFS storage. The current ContentHostingService, backed by DbContentService, provides minimal file storage capability.

2. When users create worksites, how does this get mapped?
*Answer:
User worksites would resolve to their current AFS home mapping scheme (e.g. home/sn/samerson).
For other worksites, the /afs/ root will be used as a starting point to store content. There will be 2 types of project sites (user project sites and institutional project sites), and course sites will be stored via a /TermCode/CRN/Sakai instance structure. See the AFS File Mapping documentation for more details.

3. What location will content get mapped to?
*Answer: Under /afs/ as the root path, according to the type of site. A user workspace will reference the home cell. A course site will map content under the CRN, per given instance. User project sites will be mapped to user home space according to the ownership of the project (e.g. creator). Other sites, including project sites used for institutional purposes, will be mapped to another designated area in the content system (to be determined).

4. Do course-type worksites get stored in existing AFS CourseManagement space?
*Answer: Yes, this content will be saved in the existing AFS structure; however, each instance of Sakai (e.g. SmartSite, CERE) will be able to write to the content appropriately.

5. Where do project, research, or any other non-user and non-course Sakai sites get stored?
*Answer: Possibly under the /afs/ root path, but we need to determine a volume and whether the site name resolves to anything meaningful over time. We also need to decide whether to allow personal project sites. Institutional project sites are already a requirement.

How Sakai stores content

The current Sakai architecture supports storing course and user content either inside or outside of a database. The content path can be mapped to any path given in the sakai.properties configuration file:

ref:
http://bugs.sakaiproject.org/confluence/display/FAQ/2.2.7.1+Configuration

The content below is taken from bugs.sakaiproject.org/confluence:

The best place for configuring this is the sakai.properties file.


# the file system root for content hosting's external stored files (default is null, i.e. store them in the db)
bodyPath@org.sakaiproject.service.legacy.content.ContentHostingService =${sakai.home}content


Enable the above line, and point at the root folder for the files to be stored.


# when storing content hosting's body bits in files, an optional set of folders just within the content.filesystem.root
# to act as volumes to distribute the files among - a comma separated list of folders.  If left out, no volumes will be used.
bodyVolumes@org.sakaiproject.service.legacy.content.ContentHostingService = v1,v2,v3


Enable the above line, and set the list of "volumes" for storage.  You can specify one or more volume names, comma separated on this line.  These are folders under the file system root.  Files will be distributed among these volumes.

If you are going to use multiple volume devices, you need to map them to these volume names that live "under" the root.  We have done this with our AFS file storage system at the University of Michigan.  If you are not using separate devices, then you can use any folder names for the volumes.  Provide at least one.

Files will be stored under each volume in a way so that there are not too many in any one folder.  The folder structure we use is:

YYYY/DDD/HH/id, where YYYY=year, DDD=day of year, HH=hour of day, and id=an id-based file name

for example,

2005/070/03/3223479379834-2343

or, using the above root and volumes, it might be:

/usr/local/tomcat/sakai/content/v2/2005/070/03/3223479379834-2343

Note that the resource name and type are not encoded here at all.  The date/time used to form the file name is the date/time of file creation.
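
The folder layout described above can be sketched in a few lines of Java. This is only an illustration of the YYYY/DDD/HH/id pattern, not Sakai's actual code: the class and method names are hypothetical, and the volume is passed in by the caller because the quoted text does not say how a volume is chosen for a given file.

```java
import java.time.LocalDateTime;
import java.util.Locale;

/** Hypothetical helper illustrating the YYYY/DDD/HH/id layout; not part of Sakai. */
final class BodyPathExample {

    /**
     * Builds root/[volume/]YYYY/DDD/HH/id from the file's creation time.
     * The volume may be null, which models the "no volumes" configuration.
     */
    static String bodyFilePath(String root, String volume, LocalDateTime created, String id) {
        // Zero-padded year, day of year, and hour of day, followed by the id-based name.
        String dated = String.format(Locale.ROOT, "%04d/%03d/%02d/%s",
                created.getYear(), created.getDayOfYear(), created.getHour(), id);
        String base = root.endsWith("/") ? root : root + "/";
        return volume == null ? base + dated : base + volume + "/" + dated;
    }

    public static void main(String[] args) {
        // 11 March 2005 is day 070 of the year; the file was created in hour 03.
        LocalDateTime created = LocalDateTime.of(2005, 3, 11, 3, 0);
        System.out.println(bodyFilePath(
                "/usr/local/tomcat/sakai/content", "v2", created, "3223479379834-2343"));
        // prints /usr/local/tomcat/sakai/content/v2/2005/070/03/3223479379834-2343
    }
}
```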

Proposed solution(s)

For both spaces, use the ACL associated with the current MyUCDavis user, but make the reference the Sakai user. Use the same IPs registered with the MyUCDavis user for the Sakai user.

This solution should be targeted for Fall Quarter 2006. However, in the meantime a proposed solution is to use /afs/ as the root path for file content storage so that a common file system can be utilized. Until the AFS solution can be fully implemented, we are to utilize this strategy.

See current AFS File Path Mappings documentation for further specific details about the AFS file mappings for each one of the spaces below:

a. user's space
Create a .sakai directory within the user's AFS space that the Sakai user account can write to. Users will not be allowed to browse this directory, since it is only pertinent to Sakai. Also add a Sakai instance directory to the path, which only the specific instance (e.g. smartsite or cere) would be able to write to.

b. course space
This would fall under a root sakai directory, and utilize University of Michigan's current file path logic for setting course content file path, and to avoid name collisions.

c. projects/research space
Sakai has many types of sites, and each install can configure these. There will be sites related to projects and research that can be expected. User's space, and possibly another space in AFS, will be utilized for these types of sites. Content stored in user's AFS space will count against the user's AFS quota, specifically the user who owns the site.
Two alternatives for project sites can be defined:
1) For each personal project site, store the content in the user's space. For each institutional project site, store content in the AFS project space. This would allow us to utilize AFS quotas on both user and project space. The metadata in the Sakai database would point to the given AFS paths above.
2) The same pattern as #1, except that symbolic links are written in the institutional project space directory for personal projects. Institutional project content would be written the same way as above. This option would give us the flexibility of not having to update the Sakai database in the future (links would stay the same), and would also let us quickly find dead links (e.g. users who may have left the institution).
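
Alternative 2 can be illustrated with a minimal Java sketch using the standard java.nio.file API. The class name, method names, and directory arguments are hypothetical; the sketch only shows how a link could be written into the project space and how a dead link (one whose target has been removed) could be detected. It assumes the underlying file system permits symbolic links.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for alternative 2; names and layout are assumptions. */
final class ProjectLinks {

    /** Creates projectSpace/siteId as a symbolic link to content kept in the owner's space. */
    static Path linkPersonalProject(Path projectSpace, String siteId, Path ownerContentDir)
            throws IOException {
        Path link = projectSpace.resolve(siteId);
        return Files.createSymbolicLink(link, ownerContentDir);
    }

    /** A link is dead when the link itself is present but its target no longer exists. */
    static boolean isDeadLink(Path link) {
        // isSymbolicLink inspects the link itself; exists follows the link to its target.
        return Files.isSymbolicLink(link) && !Files.exists(link);
    }
}
```

A periodic scan of the institutional project space with isDeadLink would surface personal project sites whose owner directories have been removed.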

Tool-specific file system storage

Some Sakai tools use custom paths to store assets in the file system, outside the ContentHostingService. Each case will have to be addressed accordingly.

  • Samigo (Some 2.2 release notes remark on this topic)
    • Pre-2.2 Samigo can be configured to store content in either the database, which is not recommended by the Samigo team, or in the file system. In reality, the non-DB option simply removes the second step of a two-part process for file uploading:
      • files are uploaded to the system's configured temp directory (/tmp in Linux). Refs to those files are functional
      • if DB storage is enabled, those files are then moved to the DB and refs are updated
    • the following refers to 2.2 but may also apply to prior versions:
      • for general questionType media, sakai.properties has
        samigo.answerUploadRepositoryPath=/sakaitmp/
        samigo.sizeThreshold=512
        samigo.sizeMax=20480
        samigo.saveMediaToDb=false
        
      • for QTI imports, the com.corejsf.UploadFilter.repositoryPath parameter in web.xml is set to /tmp by default and can be overridden in the sakai.properties file as well:
        samigo.answerUploadRepositoryPath=/sakaitmp/
        
  • Melete
    • Melete uses a property to configure the file system location for documents. However, in pre-2.2 this configuration does not appear to be functional. The assumed location is in /var

Some Implementation details/status

The default registered service that handles content in Sakai is the DbContentService, which extends the BaseContentHostingService. A modification to the path generation in the DbContentService will allow the path to be configured for proper content storage in AFS. Creating a UCDavisContentService, which extends the functionality of the DbContentService, will enable custom configurations and "overrides" of methods which establish the file path used to store content. There are several considerations for this to be handled properly:

1. The site type must be determined, so the user's space may be used or that of MyUCDavis Course Management.
This is currently in a somewhat working state in code (e.g. prototype); the only open question is the project site types and where to map them.

2. The user's quota should be checked prior to storage?
Needs to be implemented; the quota check should occur when a file save is attempted.

3. The path must be customizable
Needs to be implemented. Currently, this customization can be implemented by overriding the bodyPath method in the DbContentService. The path can later be passed in, via parameterization, to an existing shell/perl script. Each method has advantages and disadvantages, and error handling is one of them.

4. Error handling considerations
The content hosting implementation must be able to handle file write exceptions appropriately. File read exceptions should not occur, as they should be handled internally by Sakai (Java) exception handling methods. A file write error may be a quota extension error, an AFS downtime error, or another error. The ability to trap these types of errors and respond to them well, which the current CM system (MyUCDavis) does not do, is crucial.

5. Script language considerations
The current script that handles volume creation, quota extension, etc. is a Korn shell script. We can choose to modify this existing script; however, there are alternatives using Perl (e.g. AFS Perl) or Java (e.g. OpenAFS APIs). The latter two (Perl, Java) are preferred, with Java the most likely because the existing Sakai code base is developed in Java, which would minimize future maintenance issues. Also, using Java APIs and JNI would allow the flexibility of getting back stronger error messages from AFS errors.

A preliminary UCDavisDbContentService file is attached, which will replace the DbContentService as the registered service for storing content using Spring injection. This is a Sakai 2.1 example.
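
Consideration 1 above (choosing a storage location by site type) can be illustrated with a small sketch. This is not the attached UCDavisDbContentService and it does not use any Sakai classes: SiteKind, AfsPathResolver, and the resolve signature are hypothetical names introduced here. It assumes the owner's AFS home directory is already known, uses the .sakai and instance directories proposed earlier, and leaves institutional project sites unmapped because their location is still to be determined.

```java
import java.util.Optional;

/** Hypothetical site categories, named after the cases discussed on this page. */
enum SiteKind { USER_WORKSPACE, COURSE, USER_PROJECT, INSTITUTIONAL_PROJECT }

/** Hypothetical resolver; a stand-in for the path logic, not a Sakai API. */
final class AfsPathResolver {
    private final String courseRoot; // e.g. "/afs/.ucdavis.edu/myucd/class"
    private final String instance;   // e.g. "smartsite" or "cere"

    AfsPathResolver(String courseRoot, String instance) {
        this.courseRoot = courseRoot;
        this.instance = instance;
    }

    /**
     * Returns the content directory for a site, or empty when no mapping is defined yet.
     * ownerHome is used for user sites; termCode and crn are used for course sites.
     */
    Optional<String> resolve(SiteKind kind, String ownerHome, String termCode, String crn) {
        switch (kind) {
            case USER_WORKSPACE:
            case USER_PROJECT:
                // Content counts against the owner's quota, under a Sakai-only directory.
                return Optional.of(ownerHome + "/.sakai/" + instance);
            case COURSE:
                return Optional.of(courseRoot + "/" + termCode + "/" + crn + "/" + instance);
            default:
                // Institutional project space is still to be determined.
                return Optional.empty();
        }
    }
}
```

For example, a resolver built with the course root above and the instance smartsite would map a course site with term code 200603 and course directory ARE112001 to /afs/.ucdavis.edu/myucd/class/200603/ARE112001/smartsite.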

Implementation Algorithm In Summary

*Note: The algorithm below refers to file writes into AFS from Sakai. File reads involve a separate process of reading file paths from the Sakai internal database and trying to access the file in AFS. Currently, standard Sakai error handling will apply to file read errors.

  1. Determine bodypath for file storage, from ContentHostingService (e.g. append instance name, etc.)
    • Determine if bodypath exists via ContentHostingService. In order to determine existence, find:
      • Site type, from a Sakai reference object (that ContentHostingService uses)
      • Depending upon the type of site, the appropriate volume relative to the bodypath that should be created
      • The userid from the reference path (e.g. /content/user), the instance, or other pertinent information from the reference depending on site type (e.g. siteid, etc)
    • If bodypath doesn't exist, try to run the volume create script based on the volume previously determined
  2. Save file given by ContentHostingService, and run quota extension check against current size of resource (e.g. byte length) vs. volume quota
    • If the volume quota is greater than the resource bytes, store content. Otherwise, increase the quota by a factor of x
  3. Handle errors (checked exceptions) at the Java level: AFS errors either bubble up from the shell/perl script to Java or are reported by the Java OpenAFS APIs.
  4. If errors occur, log them and either try again (e.g. quota extend) or fail and throw an exception (e.g. AFS down). Nothing is logged if no errors are captured
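
The write steps above can be sketched in Java as follows. This is a sketch under stated assumptions, not the planned implementation: AfsVolumes is a hypothetical facade standing in for either the existing volume script or OpenAFS bindings, and none of its methods are real OpenAFS or Sakai calls. The sketch compares the resource size with the space still available in the volume, and it caps the number of quota extensions so that a persistent failure ends in an exception rather than an endless retry.

```java
import java.io.IOException;
import java.util.logging.Logger;

/** Hypothetical facade over the volume script or OpenAFS bindings (an assumption). */
interface AfsVolumes {
    boolean exists(String volume) throws IOException;
    void create(String volume) throws IOException;
    long availableBytes(String volume) throws IOException;
    void extendQuota(String volume, double factor) throws IOException;
    void write(String volume, String relativePath, byte[] body) throws IOException;
}

/** Hypothetical writer following the numbered steps; AFS failures surface as IOException. */
final class AfsContentWriter {
    private static final Logger LOG = Logger.getLogger(AfsContentWriter.class.getName());

    private final AfsVolumes afs;
    private final double growthFactor; // the "factor of x" from step 2
    private final int maxExtensions;   // retry limit, an assumption of this sketch

    AfsContentWriter(AfsVolumes afs, double growthFactor, int maxExtensions) {
        this.afs = afs;
        this.growthFactor = growthFactor;
        this.maxExtensions = maxExtensions;
    }

    void store(String volume, String relativePath, byte[] body) throws IOException {
        // Step 1: create the volume if the body path does not exist yet.
        if (!afs.exists(volume)) {
            afs.create(volume);
        }
        // Step 2: extend the quota until the resource fits, up to the retry limit.
        int extensions = 0;
        while (afs.availableBytes(volume) < body.length) {
            if (extensions == maxExtensions) {
                // Step 4: give up and fail with an exception.
                throw new IOException("Quota for " + volume + " still too small after "
                        + extensions + " extensions");
            }
            LOG.warning("Extending quota for volume " + volume);
            afs.extendQuota(volume, growthFactor);
            extensions++;
        }
        // Steps 2 and 3: store the content; any AFS error propagates to the caller.
        afs.write(volume, relativePath, body);
    }
}
```

Keeping the AFS operations behind one interface means the choice between the Korn shell script, Perl, and Java bindings does not affect the calling code.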

Further Implementation Considerations

1. There will need to be a mechanism (tool?) to display how much of a user's space is taken by Sakai, etc. This will reduce the number of support calls and be consistent with the current CourseManagement tools available for MySpace. The tool ideally would be part of the user's resource area, integrated in their message of the day, etc.

2. The path for storing content should be highly configurable via properties, or other variables.

3. We could utilize fixed quotas on course and project sites (~1GB) in order to minimize quota extend checks during the process of saving content to the file system.