OC4J Blocked threads

Infrastructure Setup

It all started back in November 2007, when my colleagues at Foreach asked me to come and help with a performance review. The site in question was that of De Tijd (or L’Echo in French), which was being ported from Microsoft SQL Server and ASP to a mix of Oracle Database (for articles) and Microsoft SQL Server 2005 (for quotations). The frontend would be handled by Oracle Application Servers (OAS), with the usual Oracle Containers for Java (OC4J, version 10.1.3.3.0) and Apache servers in front of them. In front of that sat the usual setup of load balancers and Oracle WebCache servers.

Performance review

Of course you start with the usual performance bottlenecks: check primary keys and integrity constraints, add unobtrusive logging to the DAO objects (log4j), start caching relevant objects with reasonable expiry times (ehcache), do more stress tests, and so on. Performance was reasonably satisfactory for launch, and then the day finally came: the site was launched.

Blocked threads

It wasn’t long before we started seeing strange behaviour on the origin servers (the servers behind the webcaches). At random times they would just lock up, and the OC4J container needed a restart about once a day. Each time we took a thread dump, we would notice several blocked threads like these:

frame: 0: com.evermind.server.http.HttpSite.getApplication(HttpSite.java:418)
frame: 1: com.evermind.server.http.AJPRequestHandler.initAJP(AJPRequestHandler.java:1011)
frame: 2: com.evermind.server.http.AJPRequestHandler.initRequest(AJPRequestHandler.java:615)
frame: 3: com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:243)
frame: 4: com.evermind.server.http.AJPRequestHandler.run(AJPRequestHandler.java:187)
frame: 5: oracle.oc4j.network.ServerSocketReadHandler$SafeRunnable.run(ServerSocketReadHandler.java:260)
frame: 6: com.evermind.util.ReleasableResourcePooledExecutor$MyWorker.run(ReleasableResourcePooledExecutor.java:303)
frame: 7: java.lang.Thread.run(Thread.java:595)
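
A dump like the one above is normally taken from outside the JVM. As a rough in-process equivalent, here is a minimal sketch that lists the threads currently in the BLOCKED state together with their top frame. Thread.getAllStackTraces() and Thread.getState() are standard JDK calls (Java 5 and later); the class and method names are mine and have nothing to do with OC4J.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BlockedThreadReport {

    /** Returns one summary line per thread that is currently BLOCKED on a monitor. */
    public static List<String> blockedThreads() {
        List<String> report = new ArrayList<String>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (thread.getState() != Thread.State.BLOCKED) {
                continue; // only threads waiting to enter a synchronized block
            }
            StackTraceElement[] stack = entry.getValue();
            String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
            report.add(thread.getName() + " blocked at " + top);
        }
        return report;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

If many lines point at the same top frame, as they did for us, that frame is the place to start digging.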

Most of the blocked threads were serving the URL of an article detail page. We found no relevant bottleneck in our own application, but we did notice a lot of BLOCKED threads whenever we stress-tested the article page. If only I could look at HttpSite.getApplication(). I tried finding the source code for Orion, but of course that failed. So I resorted to looking at the bytecode from oc4j-internal.jar *coughs*.

Once you head over there you will see that this method looks up an entry in a contextCache keyed by the path. The path is just the URI in this case (without the query string). You will also notice a JVM lock there: when the path is not found, the lock is taken while a new entry is created in the contextCache. So, erm, wait … we have pretty URLs … does that mean we are “polluting” the contextCache with an entry for every pretty URL, and locking the whole thing down for no reason?
It seems we were. We needed a quick fix, so we changed the backend to work with non-pretty URLs and used ProxyPass in Apache to turn pretty URLs into non-pretty ones.
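
To make the effect concrete, here is a minimal sketch of a path-keyed cache that takes a lock on every miss. It is not the OC4J code, only my reading of the behaviour described above; the class name, the URL shapes and the use of a plain Object as the cached value are all assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ContextCacheSketch {
    // Hypothetical stand-in for the container's contextCache.
    private final Map<String, Object> contextCache = new ConcurrentHashMap<String, Object>();
    private final Object lock = new Object();

    /** The cache key is the URI without its query string. */
    public static String pathOf(String uri) {
        int q = uri.indexOf('?');
        return q < 0 ? uri : uri.substring(0, q);
    }

    public Object getApplication(String uri) {
        String path = pathOf(uri);
        Object app = contextCache.get(path);
        if (app == null) {
            // Every unseen path ends up here, one thread at a time.
            synchronized (lock) {
                app = contextCache.get(path);
                if (app == null) {
                    app = new Object(); // placeholder for the resolved application
                    contextCache.put(path, app);
                }
            }
        }
        return app;
    }

    public int size() {
        return contextCache.size();
    }

    public static void main(String[] args) {
        ContextCacheSketch pretty = new ContextCacheSketch();
        ContextCacheSketch plain = new ContextCacheSketch();
        for (int i = 0; i < 1000; i++) {
            pretty.getApplication("/article/" + i + "/some-title"); // hypothetical pretty URL
            plain.getApplication("/article.jsp?id=" + i);           // hypothetical non-pretty URL
        }
        System.out.println(pretty.size()); // 1000: one entry, and one trip through the lock, per article
        System.out.println(plain.size());  // 1: the query string is not part of the key
    }
}
```

With the id in the query string, every article shares a single cache entry and the lock is only taken once.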

The quick fix leaves you with two virtual hosts: one that serves the frontend with pretty URLs, and ProxyPasses to … the same machine (the other vhost), which works with non-pretty URLs.
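
In our setup the translation lives in the Apache configuration, not in Java. Purely to illustrate the mapping the frontend vhost performs, here is a small sketch in Java; the URL shapes (/article/<id>/<title> and /article.jsp?id=<id>) are made up for the example and are not the real ones.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrettyUrlMapper {
    // Hypothetical pretty URL shape: /article/<numeric id>/<anything without a slash>
    private static final Pattern PRETTY_ARTICLE = Pattern.compile("^/article/(\\d+)/[^/?]*$");

    /** Returns the internal (non-pretty) URL, or null if the path is not a pretty article URL. */
    public static String toInternal(String prettyPath) {
        Matcher m = PRETTY_ARTICLE.matcher(prettyPath);
        if (!m.matches()) {
            return null;
        }
        // The id moves into the query string, so the path seen by the backend is constant.
        return "/article.jsp?id=" + m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(toInternal("/article/12345/some-title")); // /article.jsp?id=12345
    }
}
```

The point is only that everything variable ends up in the query string before the request reaches OC4J.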

Result

No more blocked threads. The contextCache size was kept to a minimum. Apache handles the ProxyPasses quite well. The sites have been running stably since then.
Are there drawbacks? Yes:

  • ProxyPassing means each request is handled more than once.
  • You lose your pretty URLs and need to rebuild them when generating a page … but hey, that can be done pretty quickly, performance-wise.
  • You have to be careful how you configure your applications and mount points.
  • You have to be careful how you redirect … You don’t want to redirect to a non-pretty URL (which might only resolve internally anyway).
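
Rebuilding the pretty URL at page-generation time can be as simple as the sketch below. The /article/<id>/<slug> shape and the slug rules are assumptions for the sake of the example, not what the real site uses.

```java
import java.text.Normalizer;
import java.util.Locale;

public class PrettyUrlBuilder {

    /** Turns a title into a lowercase, hyphen-separated ASCII slug. */
    public static String slug(String title) {
        // Decompose accented characters, then drop the combining marks.
        String ascii = Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Collapse every run of other characters into a single hyphen.
        String hyphenated = ascii.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "-");
        // Trim hyphens left over at either end.
        return hyphenated.replaceAll("^-+|-+$", "");
    }

    /** Builds the hypothetical public URL for an article. */
    public static String articleUrl(long id, String title) {
        return "/article/" + id + "/" + slug(title);
    }

    public static void main(String[] args) {
        System.out.println(articleUrl(42, "Hello, World!")); // /article/42/hello-world
    }
}
```

Links and redirects generated by the application should go through a helper like this, so that an internal URL never leaks out.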

So it seems that having your pretty URLs handled by OC4J was not a good idea. I can’t help but wonder whether Tomcat would have had any issue with this…
