Tuesday 24 March 2009

Oracle Coherence experiences from the field

Last week I had a short conversation about Oracle Coherence on Twitter (#OraCohMDM). Here is a more in-depth description of our experience with this product. Last year we came across a couple of projects that were excellent cases for a grid database in the middle tier.

The first case was about processing bulk data in a short time frame. Every minute, some 150 MB of XML data was delivered (zipped) to our application via a Web Service. The data had to be processed within 20 seconds, and selected parts of it had to be sent to a number of subscribers. The architecture had to scale with both the number of consumers and the data load. For history purposes, all the data had to be stored in a database.

The second case could be described as having a high CIA constraint (Confidentiality, Integrity and Availability). Multiple providers sent data via a Web Service; the messages were, in contrast to the first case, small in size. The number of messages, however, was much higher: some 10-100 messages per second.

Since (Oracle) databases are known to be stable, why can't we implement this with a database alone? The answer is simple: there is a performance bottleneck between the middle tier and the database. In a 'normal' situation, for instance a web-based transactional application, the JDBC connection does its work properly. But when faced with a high data volume or a high message volume, JDBC clogs up.

So how does a Grid database do a better job? Elements of success for a Grid database are the ability to scale up and down easily and to minimize the risk of losing data when, for instance, a server goes down. Preferably there is no master of the grid; all grid elements are equal.

Simply put, the Grid works as follows. A Grid is a series of interconnected nodes that communicate via an optimized network protocol. Every Grid node has knowledge of the grid, of the kind of work to do and of the existence of its neighbours, just like bees in a beehive. When a new data object is entered into the Grid, one Grid node takes responsibility for it and contacts another Grid node to be its backup (preferably a node on another server). When a new node enters the grid, it announces itself with some sort of "Hello, I'm here", just like when you enter a party. All grid nodes start communicating with the new node, and (some) data is redistributed across the grid in order to minimize the loss of data. When a node leaves the grid, (some) data is redistributed again, so that no data object is available on only one node.

Oracle acquired Coherence (Tangosol) in 2007 because of its excellent middle-tier Grid capabilities. We've performed tests with Oracle Coherence in different hardware and network configurations. Our tests have shown that Coherence scales nearly linearly, with an optimum of around one node (or JVM) per CPU core. Adding the JRockit JVM improved performance a little further.
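
To make the node behaviour described above a bit more concrete (one node takes responsibility, another node on a different server acts as backup, and data is redistributed when nodes join or leave), here is a minimal sketch in plain Java. It is a toy model only: the names ToyGrid, Node and Placement are made up for this post, the hash-based placement is an assumption, and none of this is how Coherence actually partitions data internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of a grid; NOT Coherence's actual partitioning logic. */
class ToyGrid {
    /** A grid member: one node (JVM) running on a named server. */
    static final class Node {
        final String name;
        final String server;
        Node(String name, String server) { this.name = name; this.server = server; }
    }

    /** Primary owner and backup owner for one key (backup is null on a one-node grid). */
    static final class Placement {
        final Node primary;
        final Node backup;
        Placement(Node primary, Node backup) { this.primary = primary; this.backup = backup; }
    }

    private final List<Node> nodes = new ArrayList<>();
    private final Map<String, Placement> placements = new HashMap<>();

    /** A node says "Hello, I'm here"; ownership is redistributed. */
    void join(Node node) { nodes.add(node); rebalance(); }

    /** A node leaves; ownership is redistributed over the remaining nodes. */
    void leave(Node node) { nodes.remove(node); rebalance(); }

    /** Enter a new data object: pick a primary owner and a backup for it. */
    void put(String key) { placements.put(key, place(key)); }

    Placement ownerOf(String key) { return placements.get(key); }

    private void rebalance() {
        if (nodes.isEmpty()) { placements.clear(); return; } // no nodes left: data is gone
        placements.replaceAll((key, old) -> place(key));
    }

    private Placement place(String key) {
        if (nodes.isEmpty()) throw new IllegalStateException("grid has no nodes");
        int start = Math.floorMod(key.hashCode(), nodes.size());
        Node primary = nodes.get(start);
        Node backup = null;
        // Walk the other nodes: stop at the first one on another server,
        // otherwise keep the first other node seen as a fallback.
        for (int i = 1; i < nodes.size(); i++) {
            Node candidate = nodes.get((start + i) % nodes.size());
            if (!candidate.server.equals(primary.server)) { backup = candidate; break; }
            if (backup == null) backup = candidate;
        }
        return new Placement(primary, backup);
    }
}
```

With two nodes on one server and a third on another, every key ends up with its backup on the other server. After the primary leaves, the key is owned by one of the remaining nodes and backed up on the other.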

So how do you start with Oracle Coherence? Coherence is Java based, so a good knowledge of Java is essential. Data is stored as key/value pairs in maps (comparable to database tables). A Listener can be placed on each map, so that an object change results in an event. By adding Object-Relational mapping (for instance with TopLink) you can propagate newly added data to the database, or load data from the database. By adding an expiration time to the data within Coherence, you don't have to clean up your data yourself.
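
The three ideas in the paragraph above (key/value maps, listeners that fire on a change, and expiration) can be sketched in plain Java without any Coherence dependency. ToyCache and its methods are hypothetical names invented for this illustration and are not the Coherence API; the clock is passed in as a parameter purely to keep the sketch deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical stand-in for a grid map: key/value entries, put listeners, per-entry expiry. */
class ToyCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final List<BiConsumer<K, V>> listeners = new ArrayList<>();

    /** Register a callback that fires on every put, e.g. to write the value to a database. */
    void addListener(BiConsumer<K, V> listener) { listeners.add(listener); }

    /** Store a value that expires ttlMillis after nowMillis, then notify all listeners. */
    void put(K key, V value, long ttlMillis, long nowMillis) {
        entries.put(key, new Entry<>(value, nowMillis + ttlMillis));
        for (BiConsumer<K, V> listener : listeners) listener.accept(key, value);
    }

    /** Return the value, or null if absent or expired; expired entries are dropped on access. */
    V get(K key, long nowMillis) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        if (nowMillis >= e.expiresAtMillis) { entries.remove(key); return null; }
        return e.value;
    }

    int size() { return entries.size(); }
}
```

A listener registered with addListener could, for instance, hand each new entry to an O/R mapping layer. Because expired entries disappear on access, no separate clean-up job is needed in this toy version.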

Where to get more information

- SOA Patterns, deferred service state
- Oracle Coherence
- Oracle Coherence Incubator

Thanks to Oracle PTS (especially Flavius Sana & Kevin Li)
