Thursday, 6 August 2009

WebLogic JMS Performance Tuning Tips

Here are a couple of real-life tips on tuning JMS performance.

Problem:
The application creates and sends JMS messages to an outgoing queue for consumption by another application.

We observed that when the consuming application was offline for a period of time, the queue could retain only a small number of messages before the JVM heap filled up. In testing this was roughly 7000 messages, after which OutOfMemory exceptions began to occur. The consuming application could realistically be offline for a period (hence the use of an asynchronous queue), so we needed to increase the number of messages that the queue could hold.

The exception we get is shown below:


Start server side stack trace:
java.lang.OutOfMemoryError:

Start server side stack trace:
java.lang.OutOfMemoryError
<<no stack trace available>>
End server side stack trace
at weblogic.rmi.internal.BasicOutboundRequest.sendReceive(BasicOutboundRequest.java:109)
at weblogic.rmi.internal.BasicRemoteRef.invoke(BasicRemoteRef.java:127)
at weblogic.jms.dispatcher.DispatcherImpl_WLStub.dispatchSyncFuture(Unknown Source)
at weblogic.jms.dispatcher.DispatcherWrapperState.dispatchSync(DispatcherWrapperState.java:286)
at weblogic.jms.client.JMSSession.createProducer(JMSSession.java:1484)
at weblogic.jms.client.JMSSession.createSender(JMSSession.java:1335)
...


The GC logs also show frequent Full GCs before the server goes out of memory.
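As a complement to reading the GC logs, the same picture (how many collections each collector has run, and how full the heap is) can be sampled from inside the JVM with the standard java.lang.management API. This is only a minimal sketch, not what was used in the original investigation; the class name GcSnapshot is made up for illustration, and the collector names printed depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcSnapshot {

    /** Builds one line per collector (name, collection count, total time) plus a heap summary. */
    public static String summarize() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        // Current heap usage; getMax() may be -1 if the maximum is undefined.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        sb.append("heap used=").append(heap.getUsed() / mb).append(" MB")
          .append(", max=").append(heap.getMax() / mb).append(" MB\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summarize());
    }
}
```

Sampling this periodically while messages accumulate shows whether the collection count of the old-generation collector is climbing much faster than that of the young-generation collector, which is the pattern described above.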


Solution Steps:


1. Enabling JMS Paging

Paging had not been enabled for the queue. Although the queue was persistent, this meant that every message was held in the JVM heap in its entirety. Enabling message paging for this queue means that only the headers of paged messages are kept in memory, significantly reducing the amount of heap used.

The messages were already being persisted to a database via a JDBCStore, which functioned as a paging store as well. However, a FileStore must still be specified as the paging store for the JMS Server, or the JMS Server will not deploy at WLS server start time. This is presumably because of the need to cater for any non-persistent destinations when paging is enabled. If non-persistent messages are not paged, the size of this FileStore will be negligible. Paged messages still occupy some space on the heap, as their headers are kept in memory.

With paging enabled for the specific queue, the test could cater for roughly 15000 messages before OutOfMemory exceptions occurred. The threshold at which paging began was deliberately set low, at 100. Reading messages back from the paging store does incur a certain performance cost, so in Production the threshold was set to a more reasonable number based on the peak number of messages expected in the queue under normal conditions.

Despite the gain in the number of messages that could be catered for, the heap utilisation graphs were very similar to those from before paging was enabled. They showed no minor GCs, only full GCs at fairly frequent intervals.


2. JVM Settings and Garbage Collection Tuning

The untuned JVM heap size was 512 MB. This value could be increased, but test results after tuning the JVM settings indicated that it was probably more than adequate.

Examining the current JVM settings uncovered some settings that needed to be changed.

The most significant issue with the JVM settings was the NewSize value. This was set very high, at 384 MB out of the total heap of 512 MB.
A reasonable New Generation area would normally be 20-25% of the total heap size, and setting it larger than the Tenured Generation area is guaranteed to cause unhealthy GC operations. In addition, it is good practice to use NewRatio rather than NewSize to avoid fixing an absolute size. A NewRatio of 3 (1:3, i.e. 25% of heap) or 4 is considered the most appropriate for WebLogic Server applications. NewSize was therefore dropped and NewRatio was set to 4 (20% of total heap).
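The percentages quoted above follow from the definition of NewRatio as the ratio of tenured to new generation, so the new generation is heap / (NewRatio + 1). The small sketch below just does that arithmetic for the 512 MB heap discussed here; the class and method names are invented for illustration, and a real JVM will round these sizes to its own alignment boundaries.

```java
public class GenerationSizing {

    /** New generation size implied by NewRatio, where new : tenured = 1 : newRatio. */
    public static double youngMb(double heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    /** Tenured generation is whatever is left of the heap. */
    public static double tenuredMb(double heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    public static void main(String[] args) {
        double heapMb = 512; // heap size from the test environment
        for (int ratio : new int[] {3, 4}) {
            System.out.printf("NewRatio=%d: new=%.1f MB, tenured=%.1f MB%n",
                    ratio, youngMb(heapMb, ratio), tenuredMb(heapMb, ratio));
        }
        // The original setting: a fixed NewSize of 384 MB leaves the tenured area far smaller.
        System.out.printf("NewSize=384 MB: tenured=%.1f MB%n", heapMb - 384);
    }
}
```

For 512 MB this works out to a 128 MB new generation with NewRatio=3 and 102.4 MB with NewRatio=4, compared with the 384 MB that had been configured, which left only 128 MB for the tenured generation.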

The SurvivorRatio value was set to a reasonable value of 3.
TargetSurvivorRatio, however, was unset, meaning that the default of 50 applies. 80 would probably be a better setting, meaning that the JVM would aim to keep a survivor space up to 80% full after a minor GC, rather than 50%, before promoting objects to the tenured area early. The improvement from this should be noticeable in the GC behaviour, though not dramatic.
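To put rough numbers on these two settings: SurvivorRatio is the ratio of eden to one survivor space, and the new generation holds eden plus two survivor spaces, so one survivor space is new / (SurvivorRatio + 2). TargetSurvivorRatio is then a percentage of that space. The sketch below applies this to a new generation of 102.4 MB (20% of a 512 MB heap); the class and method names are made up for illustration, and actual JVM sizes will differ slightly because of alignment.

```java
public class SurvivorSizing {

    /** Size of ONE survivor space: new = eden + 2 * survivor, eden = survivorRatio * survivor. */
    public static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Occupancy the JVM aims for in a survivor space after a minor GC. */
    public static double targetOccupancyMb(double survivorMb, int targetSurvivorRatio) {
        return survivorMb * targetSurvivorRatio / 100.0;
    }

    public static void main(String[] args) {
        double youngMb = 512.0 / 5;   // NewRatio=4 on a 512 MB heap
        int survivorRatio = 3;        // value already configured
        double survivor = survivorMb(youngMb, survivorRatio);
        System.out.printf("one survivor space: %.2f MB%n", survivor);
        for (int target : new int[] {50, 80}) {
            System.out.printf("TargetSurvivorRatio=%d: target occupancy %.2f MB%n",
                    target, targetOccupancyMb(survivor, target));
        }
    }
}
```

With these inputs each survivor space is about 20.5 MB, and raising TargetSurvivorRatio from 50 to 80 lifts the target occupancy from roughly 10.2 MB to roughly 16.4 MB.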

The PermSize values were high, with PermSize and MaxPermSize both set to 384 MB. These were reduced to 64 MB and 128 MB respectively, which should be more than adequate. Though these changes are unlikely to improve performance, having PermSize set too high will needlessly consume memory.

The effect of setting a good NewRatio value was dramatic. Frequent minor GCs became the rule, with only infrequent full GCs. 60,000 messages were added to the queue before the heap approached full. We ran an overnight test, and the OutOfMemory exceptions occurred somewhere between 60,000 and 70,000 messages.

Scaling this up to the Production environment, which has a 1 GB heap, shows that the server could easily cater for the expected load.
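As a back-of-the-envelope check on that scaling, the sketch below extrapolates the tested capacity to the Production heap. It assumes capacity grows linearly with heap size, which is only an approximation (fixed overheads such as the server's own footprint do not scale with the heap), and the class and method names are invented for illustration.

```java
public class QueueCapacityEstimate {

    /** Linear extrapolation of queue capacity from one heap size to another (an assumption). */
    public static long scaleLinearly(long observedMessages, long observedHeapMb, long targetHeapMb) {
        return observedMessages * targetHeapMb / observedHeapMb;
    }

    public static void main(String[] args) {
        long observedMessages = 60000; // lower bound seen in the tuned test
        long testHeapMb = 512;         // test environment heap
        long prodHeapMb = 1024;        // Production heap (1 GB)
        System.out.println("estimated capacity: "
                + scaleLinearly(observedMessages, testHeapMb, prodHeapMb) + " messages");
    }
}
```

Under that linear assumption the estimate is about 120,000 messages, which should be treated as a rough guide rather than a measured figure.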