Tracking known Sakai performance issues that we need to address
sakai_2-5-x
sakai_2-4-x
- Lance's response to "UC Berkeley in crisis"
There has been a lot of excellent input on this issue. I would add the following comments:
1) Do not discount the possibility of SAK-8932, as Stephen Marquard suggested. This issue is not limited to Chat but can be triggered by any JSF application AFAIK. The stuck threads could lead to the kind of request backlogs you are reporting. We have only seen this crop up once or twice, but it does seem to be load related, so you may be seeing it in your environment.
2) Hardware is cheap - since we upgraded to eight servers with 10GB heaps (i.e. 80GB total heap), Sakai is behaving *MUCH* better under load. Our hardware change included moving from a 32-bit to a 64-bit OS, cutting the number of app servers in half (16 --> 8), keeping the total number of CPUs in the cluster at 32, and going from 32GB total heap to 80GB.
3) Are you seeing any OutOfMemoryExceptions? Before our 64-bit upgrade, we were seeing 10 - 15 of these a day. Since the upgrade, I have not seen one OOM error.
4) Turning off quotas does significantly reduce the amount of XML DOM parsing you will do, but it was not a major contributing factor to our stability.
Let us know what we can do to help... L
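For orientation, a minimal sketch of the kind of JVM flags such a setup implies (the email gives only the heap size and the 64-bit move; the flag names and values below are assumptions, not Lance's actual startup options):
    # Assumed example only: 64-bit JVM with a fixed 10GB heap per app server
    JAVA_OPTS="-server -d64 -Xms10g -Xmx10g -XX:MaxPermSize=256m"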
- Stanford: Critical performance problems in production. Please help!
- Email from Lance Speelmon
- https://oncourse.iu.edu/access/wiki/site/3001b886-1069-4fb7-00d5-8db4b3a85f74/home.html
Adi, Let me see if I can outline the changes:
1) DBCP settings we have been running for 2+ years: minSize=10 initialSize=10 maxSize=50
2) When we started seeing DBCP having problems establishing new database connections, we switched to: minSize=50 initialSize=50 maxSize=50
* These settings served us pretty well until we saw the 2x load increase the first week of classes.
3) Once the load really hit we tried: minSize=150 initialSize=150 maxSize=150
* We were still seeing errors with creating new database connections and DBCP deadlocks.
4) Our current settings after switching to c3p0: minSize=150 initialSize=150 maxSize=150
* We still saw connection errors, but c3p0 was able to cope without any deadlocking.
5) Now that we think we have resolved our Oracle connection issues, we are considering moving to the following settings for c3p0: minSize=10 initialSize=10 maxSize=150
* The changes that we think resolved the Oracle connection issues were increasing the number of dispatchers and disabling automatic memory management.
Thanks, L
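As a point of reference, a hedged sketch of how pool bounds like these are commonly expressed: the DBCP-style keys follow the @javax.sql.BaseDataSource pattern Adi uses below, and the c3p0 names (initialPoolSize, minPoolSize, maxPoolSize) are c3p0's own properties. Neither is Lance's actual configuration, and the exact keys depend on the datasource bean in your Sakai version:
    # Assumed DBCP-style overrides in sakai.properties
    initialSize@javax.sql.BaseDataSource=10
    maxActive@javax.sql.BaseDataSource=150
    # Rough c3p0 equivalents: initialPoolSize=10, minPoolSize=10, maxPoolSize=150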
Do you have minIdle and maxIdle set? And does maxIdle = maxActive? That will ensure you don't create new db connections and will help you scale much better. We have 8 appservers and use:
minIdle@javax.sql.BaseDataSource=1
maxIdle@javax.sql.BaseDataSource=14
initialSize@javax.sql.BaseDataSource=15
maxActive@javax.sql.BaseDataSource=14
With 400 requests per second peak, I don't see why you would need 2400 db pool connections -- maybe 400 * 2 for safety, but you are just eating PGA unnecessarily with all those connections, and that memory could be used for SGA instead (we reduced our PGA from 512m to 256m and haven't seen problems). Adi
- Email exchange with R. P. Aditya : aditya@grot.org
On Fri, Aug 31, 2007 at 11:50:34AM -0700, Thomas Amsler wrote:
> Are the 15 connections in the DBCP connection pool your max setting? I think
> the default is max=50 in OOTB.
On our 8 appservers, we use:
minIdle@javax.sql.BaseDataSource=1
maxIdle@javax.sql.BaseDataSource=14
initialSize@javax.sql.BaseDataSource=15
maxActive@javax.sql.BaseDataSource=14
and in typical use, even at peak, we only see 2-3 active via Oracle. The most important thing for Oracle is that maxIdle = maxActive, so that the pool connections are never dropped or recycled, since setting up new connections is terribly expensive... Adi
Adi, Would you mind sharing your Oracle memory settings? We are currently running with:
db_cache_size = 4096M (from 5120M)
shared_pool_size = 3072M (from 4096M)
java_pool_size = 250M (no change)
large_pool_size = 2048M (from 4096M)
sga_max_size = 20480M (from 24576M)
Thanks, L
Hi Lance, We are using automatic shared memory management in Oracle. Based on your settings and ours, I think the key points are as follows:
1. Your shared_pool is too large. The Sakai application code does not need such a large shared pool; 1 GB can be a good starting point (unless you have other applications in the same database).
2. You can set db_cache_size much higher. We have a total SGA of 6560M, and Oracle has assigned 5872M of it to the buffer cache.
3. 256M of PGA is enough based on our settings.
4. If you can set sga_max_size = 20480M (or even higher, as you had before), try to use automatic shared memory management and set sga_target to at least 18 GB.
The following are our parameter settings:
sga_max_size=6560M
sga_target=6560M
pga_aggregate_target=256M
The following pools are sized automatically by Oracle based on our targets:
Shared Pool 624M, Buffer Cache 5872M, Large Pool 16M, Java Pool 32M, Other 16M
Luke has created a site with the parameters as a reference: http://confluence.sakaiproject.org/confluence/display/ENC/Oracle+Admini All the parameters can be seen there. Thanks, Drew Zhu, Oracle DBA, ITCS, University of Michigan
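A hedged sketch of applying targets like Drew's (the parameter names are standard Oracle; the ALTER SYSTEM form and the assumption of an spfile are illustrative, not taken from the email):
    -- Illustrative only; values are Drew's, the commands are one assumed way to apply them
    ALTER SYSTEM SET sga_max_size = 6560M SCOPE = SPFILE;       -- static parameter, needs a restart
    ALTER SYSTEM SET sga_target = 6560M SCOPE = BOTH;           -- enables automatic shared memory management
    ALTER SYSTEM SET pga_aggregate_target = 256M SCOPE = BOTH;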
What is your "cursor_sharing" parameter set to? Setting it to FORCE or SIMILAR will force the sharing of similar SQL statements and may help reduce the shared_pool_size. We use FORCE, as you can see in the parameter file. Also, if you are using more tools than we do, the shared pool should be larger. Thanks, Drew
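For completeness, a hedged example of inspecting and changing this parameter (standard Oracle / SQL*Plus syntax; whether FORCE suits your workload should be tested first):
    -- Check the current value (SQL*Plus)
    SHOW PARAMETER cursor_sharing
    -- Share cursors for statements that differ only in literals (illustrative; test before production use)
    ALTER SYSTEM SET cursor_sharing = FORCE SCOPE = BOTH;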
- SAK-9860 : Excessive db queries generated from Site Info / user service
- From Ian:
Just committed a fix against SAK-9860. It's not a total fix, but you should be able to patch 2.4.x (once fully tested). The profiler is saying the number of queries for a single request is now 1 rather than 4 the first time per user, and then 0 after that. Needs testing though, and it only eliminates the EID/ID SQL.
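As a rough way to see whether the EID/ID lookups are still hitting the database, a hedged sketch (SAKAI_USER_ID_MAP is the standard Sakai mapping table, but verify the name against your schema; the v$sql query is only one assumed way to observe this in Oracle):
    -- Illustrative only; requires access to v$sql
    SELECT sql_text, executions
      FROM v$sql
     WHERE UPPER(sql_text) LIKE '%SAKAI_USER_ID_MAP%'
     ORDER BY executions DESC;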
- SAK-11279 : Spurious presence events
- From Stephen:
Hi all, If you're running 2.4.0 or 2-4-x in production with presence enabled, you will probably want to apply the fix to presence/courier from: http://jira.sakaiproject.org/jira/browse/SAK-11279 This is a bug that logs 2 presence events every time a presence refresh is made (every 30s per user). Fixing this reduced the volume of presence events in our production system by a factor of 10 or more. Regards Stephen
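To gauge the effect in your own database, a hedged sketch (SAKAI_EVENT with EVENT and EVENT_DATE columns is the standard Sakai events table, and 'pres.begin'/'pres.end' are the usual presence event types, but verify both against your schema):
    -- Illustrative only; counts presence events per day
    SELECT TRUNC(EVENT_DATE) AS day, COUNT(*) AS presence_events
      FROM SAKAI_EVENT
     WHERE EVENT IN ('pres.begin', 'pres.end')
     GROUP BY TRUNC(EVENT_DATE)
     ORDER BY day;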