Chapter 10.  Caching

Table of Contents

1. Data Cache
1.1. Data Cache Configuration
1.1.1. Distributing instances across cache partitions
1.2. Data Cache Usage
1.2.1. Using the JPA standard Cache interface
1.2.2. Using the OpenJPA StoreCache extensions
1.3. Cache Statistics
1.4. Query Cache
1.5. Cache Extension
1.6. Important Notes
1.7. Known Issues and Limitations
2. Query Compilation Cache
3. Prepared SQL Cache

OpenJPA uses several configurable caches to maximize performance. This chapter explores OpenJPA's data cache, query cache, query compilation cache, and prepared SQL cache.

1.  Data Cache

The OpenJPA data cache is an optional cache of persistent object data that operates at the EntityManagerFactory level. This cache is designed to significantly increase performance while remaining in full compliance with the JPA standard. This means that turning on the caching option can transparently increase the performance of your application, with no changes to your code.

OpenJPA's data cache is not related to the EntityManager cache dictated by the JPA specification. The JPA specification mandates behavior for the EntityManager cache aimed at guaranteeing transaction isolation when operating on persistent objects.

OpenJPA's data cache is designed to provide significant performance increases over cacheless operation, while guaranteeing that behavior will be identical in both cache-enabled and cacheless operation.

There are five ways to access data via the OpenJPA APIs: standard relation traversal, large result set relation traversal, queries, looking up an object by id, and iteration over an Extent. OpenJPA's cache plugin accelerates three of these mechanisms. It does not provide any caching of large result set relations or Extent iterators. If you find yourself in need of higher-performance Extent iteration, see Example 10.22, “ Query Replaces Extent ”.

Table 10.1.  Data access methods

Access method                          Uses cache
Standard relation traversal            Yes
Large result set relation traversal    No
Query                                  Yes
Lookups by object id                   Yes
Iteration over an Extent               No

When enabled, the cache is checked before making a trip to the datastore. Data is stored in the cache when objects are committed and when persistent objects are loaded from the datastore.
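The lookup order described above is the familiar read-through pattern. The following minimal sketch illustrates it with standard Java collections only; `ReadThroughCache` and the `loader` function that stands in for a datastore trip are hypothetical names, and the code is not OpenJPA's implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache illustrating the lookup order described above.
// This is not OpenJPA's DataCache implementation.
class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;              // stands in for a datastore trip
    private final AtomicInteger datastoreTrips = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Check the cache first; go to the datastore only on a miss, and keep
    // what was loaded so the next lookup for the same id is served from memory.
    V find(K id) {
        V cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        datastoreTrips.incrementAndGet();
        V loaded = loader.apply(id);
        if (loaded != null) {
            cache.put(id, loaded);
        }
        return loaded;
    }

    // Committed state is also written into the cache.
    void onCommit(K id, V committedState) {
        cache.put(id, committedState);
    }

    int datastoreTrips() {
        return datastoreTrips.get();
    }
}
```

In OpenJPA itself these checks happen transparently behind the standard JPA calls, which is why enabling the data cache requires no code changes.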

OpenJPA's data cache can operate in both single-JVM and multi-JVM environments. Multi-JVM caching is achieved through the use of the distributed event notification framework described in Section 2, “ Remote Event Notification Framework ”, or through custom integrations with a third-party distributed cache.

The single JVM mode of operation maintains and shares a data cache across all EntityManager instances obtained from a particular EntityManagerFactory. This is not appropriate for use in a distributed environment, as caches in different JVMs or created from different EntityManagerFactory objects will not be synchronized.

1.1.  Data Cache Configuration

To enable the basic single-factory cache, set the openjpa.DataCache property to true:

Example 10.1.  Single-JVM Data Cache

<property name="openjpa.DataCache" value="true"/>

To configure the data cache to remain up-to-date in a distributed environment, set the openjpa.RemoteCommitProvider property appropriately, or integrate OpenJPA with a third-party caching solution. Remote commit providers are described in Section 2, “ Remote Event Notification Framework ”.

OpenJPA's default implementation maintains a map of object ids to cached data. By default, 1000 elements are kept in the cache. When the cache overflows, random entries are evicted. You can adjust the maximum cache size by setting the CacheSize property in the plugin string; see Example 10.3, “ Data Cache Size ”. Objects pinned into the cache do not count toward the maximum cache size.

Entries evicted when the cache overflows are moved to a soft reference map, so they may remain available a little longer, until the garbage collector reclaims them. The SoftReferenceSize property controls how many soft references OpenJPA keeps. Soft references are unlimited by default; set SoftReferenceSize to 0 to disable them completely.
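The overflow behavior described in the two paragraphs above can be sketched as follows. This is a hypothetical, single-threaded illustration, not OpenJPA's implementation: `BoundedRandomCache` is an invented name, and using -1 to mean an unlimited soft reference map is an assumption of the sketch. Pinned keys are excluded from the size check, unpinned overflow entries are chosen at random, and evicted entries are kept only through `java.lang.ref.SoftReference`, so the garbage collector may reclaim them at any time.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical, single-threaded sketch of the overflow behavior described above;
// not OpenJPA's implementation. Eviction scans all keys, which is fine for a demo.
class BoundedRandomCache<K, V> {
    private final int cacheSize;         // like CacheSize: limit on unpinned entries
    private final int softReferenceSize; // like SoftReferenceSize; -1 (assumed) = unlimited, 0 = off
    private final Map<K, V> hard = new HashMap<>();
    private final Set<K> pinned = new HashSet<>();
    private final Map<K, SoftReference<V>> soft = new LinkedHashMap<>();
    private final Random random = new Random();

    BoundedRandomCache(int cacheSize, int softReferenceSize) {
        this.cacheSize = cacheSize;
        this.softReferenceSize = softReferenceSize;
    }

    void pin(K key) {
        pinned.add(key);
    }

    void put(K key, V value) {
        hard.put(key, value);
        soft.remove(key);
        // Pinned entries do not count toward the limit.
        List<K> unpinned = unpinnedKeys();
        while (unpinned.size() > cacheSize) {
            K victim = unpinned.remove(random.nextInt(unpinned.size())); // random eviction
            moveToSoft(victim, hard.remove(victim));
        }
    }

    V get(K key) {
        V value = hard.get(key);
        if (value != null) {
            return value;
        }
        SoftReference<V> ref = soft.get(key);
        return ref == null ? null : ref.get(); // null if the GC already cleared it
    }

    int hardSize() {
        return hard.size();
    }

    private List<K> unpinnedKeys() {
        List<K> keys = new ArrayList<>();
        for (K k : hard.keySet()) {
            if (!pinned.contains(k)) {
                keys.add(k);
            }
        }
        return keys;
    }

    private void moveToSoft(K key, V value) {
        if (softReferenceSize == 0) {
            return; // soft references disabled: the entry is simply dropped
        }
        soft.put(key, new SoftReference<>(value));
        if (softReferenceSize > 0 && soft.size() > softReferenceSize) {
            Iterator<K> oldest = soft.keySet().iterator();
            oldest.next();
            oldest.remove(); // drop the oldest soft entry
        }
    }
}
```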

Both the QueryCache and the DataCache can be configured to use a backing LRU (least-recently-used) map, enabled with the Lru option, instead of the default concurrent HashMap. Note that enabling the LRU cache can hurt performance, because this map is not as scalable as the default map.
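The scalability trade-off can be illustrated with the JDK's `LinkedHashMap` in access order, a common way to build an LRU map. This sketch shows the general technique only and is not the map OpenJPA uses internally; `LruMap` is a hypothetical name. Because an access-ordered `LinkedHashMap` reorders its entries on every `get`, even reads modify the map, so concurrent use requires a single lock around reads and writes alike, while a concurrent hash map lets readers proceed in parallel.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a generic LRU map built on LinkedHashMap; not the class OpenJPA uses for Lru=true.
class LruMap<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    LruMap(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.maxSize = maxSize;
    }

    // Called by put(): drop the least recently used entry once the limit is exceeded.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
```

To share such a map between threads, you would wrap it, for example with `Collections.synchronizedMap(new LruMap<>(1000))`; that single lock is the serialization point that makes an LRU map less scalable.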

Example 10.2.  Lru Cache

<property name="openjpa.DataCache" value="true(Lru=true)"/>
<property name="openjpa.QueryCache" value="true(Lru=true)"/>

Example 10.3.  Data Cache Size

<property name="openjpa.DataCache" value="true(CacheSize=5000, SoftReferenceSize=0)"/>

You can specify a cache timeout for a class by setting the timeout attribute of the DataCache metadata extension (the @DataCache annotation) to the number of milliseconds that the class's data remains valid. A value of -1, which is the default, means the data never expires.

Example 10.4.  Data Cache Timeout

Timeout Employee objects after 10 seconds.

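import javax.persistence.Entity;
import javax.persistence.Id;
import org.apache.openjpa.persistence.DataCache;

@Entity
@DataCache(timeout=10000) // cached Employee data expires after 10,000 ms
public class Employee {
    @Id
    private long id;
    // other persistent fields ...
}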
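A timeout like the one above can be pictured as a timestamp stored with each cached entry and checked on every read. The sketch below is hypothetical and independent of OpenJPA; it assumes that an entry's age is measured from the moment it entered the cache, and it takes the clock as a `LongSupplier` so the behavior can be tested without waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of timeout-based expiration; not OpenJPA code.
// Assumption: an entry's age is measured from the moment it was cached.
class TimedCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long storedAt;

        Entry(V value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long timeoutMillis;  // -1 means the data never expires
    private final LongSupplier clock;  // current time in ms; injectable for testing

    TimedCache(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong()));
    }

    // Returns null for a miss or for data older than the timeout.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        boolean expired = timeoutMillis >= 0
                && clock.getAsLong() - entry.storedAt > timeoutMillis;
        if (expired) {
            entries.remove(key); // treat expired data as a cache miss
            return null;
        }
        return entry.value;
    }
}
```

Outside a test, the clock would typically be `System::currentTimeMillis`, as in `new TimedCache<>(10_000, System::currentTimeMillis)`.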

Entities may be explicitly excluded from the cache by providing a semicolon-separated list of fully qualified class names in the ExcludedTypes argument. Entities listed in ExcludedTypes are not cached, regardless of their DataCache annotation.

Example 10.5.  Excluding entities

Exclude entities foo.bar.Person and foo.bar.Employee from the cache.

<property name="openjpa.DataCache" value="true(ExcludedTypes=foo.bar.Person;foo.bar.Employee)"/>


Entities may be explicitly included in the cache by providing a list of fully qualified class names in the Types argument. Entities that are not in this list are not cached.

Example 10.6.  Including entities

Include only entity foo.bar.FullTimeEmployee in the cache.

<property name="openjpa.DataCache" value="true(Types=foo.bar.FullTimeEmployee)"/>
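The effect of the two lists can be summarized as a small decision function. The sketch below is a hypothetical illustration, not OpenJPA's code: it parses the semicolon-separated values shown in the examples above, and it assumes that an exclusion wins if a class appears in both lists and that class names are matched exactly, without considering subclasses.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of how Types / ExcludedTypes lists could decide cacheability.
// Assumptions: exclusion wins over inclusion; names are matched exactly (no subclasses).
class TypeFilter {
    private final Set<String> types;         // from Types; empty means "no include list"
    private final Set<String> excludedTypes; // from ExcludedTypes

    TypeFilter(String typesValue, String excludedValue) {
        this.types = parse(typesValue);
        this.excludedTypes = parse(excludedValue);
    }

    // Splits a value such as "foo.bar.Person;foo.bar.Employee".
    private static Set<String> parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> names = new HashSet<>();
        for (String name : value.split(";")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    boolean isCacheable(Class<?> type) {
        return isCacheable(type.getName());
    }

    boolean isCacheable(String className) {
        if (excludedTypes.contains(className)) {
            return false;
        }
        return types.isEmpty() || types.contains(className);
    }
}
```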


See the org.apache.openjpa.persistence.DataCache Javadoc for more information on the DataCache annotation.

A cache can specify that it should be cleared at certain times rather than using data timeouts. The EvictionSchedule property of OpenJPA's cache implementation accepts two formats. The first is a cron-style eviction schedule: a whitespace-separated list of five tokens, where the * symbol (asterisk) matches all values. The tokens are, in order:

  • Minute

  • Hour of Day

  • Day of Month

  • Month

  • Day of Week

For example, the following openjpa.DataCache setting schedules the default cache to evict values from the cache at 15 and 45 minutes past 3 PM on Sunday.

true(EvictionSchedule='15,45 15 * * 1')

The second format is an interval-style eviction schedule: a + followed by the number of minutes between evictions.

For example, the following openjpa.DataCache setting schedules the default cache to evict values from the cache every 120 minutes.

true(EvictionSchedule='+120')
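The two schedule formats can be illustrated with a small parser and matcher. This hypothetical sketch is not OpenJPA's implementation. Two numbering conventions are assumptions: days of the week run from 1 (Sunday) to 7 (Saturday), which agrees with the example above where 1 means Sunday, and months run from 1 to 12. Only plain numbers, comma-separated lists, and * are handled, and the schedule string is passed without the surrounding single quotes.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical parser/matcher for the two EvictionSchedule formats; not OpenJPA code.
class EvictionScheduleSketch {
    private final List<Set<Integer>> cronFields; // null element means "*" (match all)
    private final int intervalMinutes;           // > 0 only for "+N" schedules

    EvictionScheduleSketch(String schedule) {
        String s = schedule.trim();
        if (s.startsWith("+")) {
            intervalMinutes = Integer.parseInt(s.substring(1));
            cronFields = null;
        } else {
            String[] tokens = s.split("\\s+");
            if (tokens.length != 5) {
                throw new IllegalArgumentException("expected five tokens: " + s);
            }
            List<Set<Integer>> fields = new ArrayList<>();
            for (String token : tokens) {
                fields.add(parseToken(token));
            }
            cronFields = fields;
            intervalMinutes = 0;
        }
    }

    private static Set<Integer> parseToken(String token) {
        if (token.equals("*")) {
            return null;
        }
        Set<Integer> values = new HashSet<>();
        for (String part : token.split(",")) {
            values.add(Integer.parseInt(part));
        }
        return values;
    }

    // Cron style: true if the given minute matches all five tokens.
    boolean matches(LocalDateTime t) {
        if (cronFields == null) {
            throw new IllegalStateException("interval schedule");
        }
        // Assumed numbering: Sunday = 1 ... Saturday = 7; months 1..12.
        int dayOfWeek = t.getDayOfWeek().getValue() % 7 + 1;
        int[] actual = {t.getMinute(), t.getHour(), t.getDayOfMonth(), t.getMonthValue(), dayOfWeek};
        for (int i = 0; i < actual.length; i++) {
            Set<Integer> allowed = cronFields.get(i);
            if (allowed != null && !allowed.contains(actual[i])) {
                return false;
            }
        }
        return true;
    }

    // Interval style: the next eviction time after the previous one.
    LocalDateTime nextAfter(LocalDateTime lastEviction) {
        if (intervalMinutes <= 0) {
            throw new IllegalStateException("cron schedule");
        }
        return lastEviction.plusMinutes(intervalMinutes);
    }
}
```

A scheduler could call `matches` once per minute for a cron-style schedule, or wait until `nextAfter(lastEviction)` for an interval schedule.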

Example 10.7.  Bulk updates and cache eviction

Setting EvictOnBulkUpdate to false tells OpenJPA not to evict data from the data cache when executing a bulk UPDATE or DELETE statement. The default value is true.

<property name="openjpa.DataCache" value="true(EvictOnBulkUpdate=false)"/>


1.1.1. Distributing instances across cache partitions

OpenJPA also supports a partitioned cache configuration, in which cached instances are distributed across partitions according to an application-defined policy. Each partition behaves as a data cache in its own right: it is identified by its name and can be configured individually. The distribution policy determines which partition stores the state of a given managed instance. The default distribution policy distributes instances by type, using the name attribute of the @DataCache annotation. The cache distribution policy is a simple interface that an application can implement to distribute instances among partitions on a per-instance basis. To enable a partitioned cache, set the openjpa.DataCache property to partitioned and configure the individual partitions as follows:

Example 10.8.  Partitioned Data Cache

<property name="openjpa.CacheDistributionPolicy" value="org.acme.foo.DistributionPolicy"/>
<property name="openjpa.DataCache" value="partitioned(PartitionType=concurrent,partitions=
                '(name=a,cacheSize=100),(name=b,cacheSize=200)')"/>

The distribution policy is configured by the fully qualified name of a class that implements org.apache.openjpa.datacache.CacheDistributionPolicy. The partitions are specified as the value of the partitions attribute, as a series of individually configurable plug-in strings. As the example shows, each partition's plug-in configuration must be enclosed in parentheses, the partition configurations must be separated by commas, and the complete set must be enclosed in single quotes. Each partition is a data cache in its own right, and the class that implements the partitions is configured with the PartitionType attribute. The example above configures a partitioned cache with two partitions, named a and b, with cache sizes of 100 and 200, respectively. The partitions are of type concurrent, which is an alias for org.apache.openjpa.datacache.ConcurrentDataCache. PartitionType defaults to concurrent; it is specified explicitly in this example only for illustration.
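The way a distribution policy routes instances to independently sized partitions can be sketched with plain collections. This is a hypothetical illustration, not OpenJPA's partitioned cache: a `java.util.function.Function` stands in for the org.apache.openjpa.datacache.CacheDistributionPolicy implementation, and the convention that a `null` partition name means "do not cache" is an assumption of the sketch. Each partition evicts its oldest entry when it exceeds its own cache size.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a partitioned cache; not OpenJPA's implementation.
// The policy function maps an instance to a partition name, or null (assumed) to skip caching.
class PartitionedCacheSketch {
    // Each partition is an independent, size-bounded cache (oldest entry evicted first).
    private static final class Partition extends LinkedHashMap<Object, Object> {
        private static final long serialVersionUID = 1L;
        private final int cacheSize;

        Partition(int cacheSize) {
            this.cacheSize = cacheSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Object, Object> eldest) {
            return size() > cacheSize;
        }
    }

    private final Map<String, Partition> partitions = new HashMap<>();
    private final Function<Object, String> policy;

    PartitionedCacheSketch(Function<Object, String> policy) {
        this.policy = policy;
    }

    // Comparable to one "(name=a,cacheSize=100)" entry in the partitions attribute.
    void addPartition(String name, int cacheSize) {
        partitions.put(name, new Partition(cacheSize));
    }

    // Stores the instance in the partition chosen by the policy; returns false if not cached.
    boolean put(Object id, Object instance) {
        String name = policy.apply(instance);
        Partition partition = name == null ? null : partitions.get(name);
        if (partition == null) {
            return false;
        }
        partition.put(id, instance);
        return true;
    }

    // A lookup by id alone does not know the partition, so this sketch checks each one.
    Object get(Object id) {
        for (Partition partition : partitions.values()) {
            Object value = partition.get(id);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int size(String name) {
        Partition partition = partitions.get(name);
        return partition == null ? 0 : partition.size();
    }
}
```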

1.2.  Data Cache Usage

The org.apache.openjpa.datacache package defines OpenJPA's data caching framework. While you may use this framework directly (see its Javadoc for details), its APIs are meant primarily for service providers. In fact, Section 1.5, “ Cache Extension ” below has tips on how to use this package to extend OpenJPA's caching service yourself.

Rather than use the low-level org.apache.openjpa.datacache package APIs, JPA users should typically access the data cache through the JPA standard javax.persistence.Cache interface, or OpenJPA's high-level org.apache.openjpa.persistence.StoreCache facade.

Both interfaces provide methods to evict data from the cache and to detect whether an entity is in the cache. The OpenJPA facade adds methods to pin and unpin records and additional methods to evict data, and provides basic statistics on the number of read and write requests and the hit ratio of the cache.

1.2.1. Using the JPA standard Cache interface

You may obtain the javax.persistence.Cache through the EntityManagerFactory.getCache() method.

Example 10.9.  Accessing the Cache

import javax.persistence.Cache;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
. . .
EntityManagerFactory emf = 
    Persistence.createEntityManagerFactory("myPersistenceUnit");
Cache cache = emf.getCache();
. . .
        

Example 10.10. Using the javax.persistence.Cache interface

// Check whether the cache contains an entity with a provided ID
Cache cache = emf.getCache();
boolean contains = cache.contains(MyEntity.class, entityID);

// evict a specific entity from the cache
cache.evict(MyEntity.class, entityID);

// evict all instances of an entity class from the cache
cache.evict(AnotherEntity.class);

// evict everything from the cache 
cache.evictAll();
        

1.2.2. Using the OpenJPA StoreCache extensions

You obtain the StoreCache through the OpenJPAEntityManagerFactory.getStoreCache method.

Example 10.11.  Accessing the StoreCache

import org.apache.openjpa.persistence.*;
...
OpenJPAEntityManagerFactory oemf = OpenJPAPersistence.cast(emf);
StoreCache cache = oemf.getStoreCache();
...
Alternatively, you can cast the object returned by the EntityManagerFactory.getCache() method:
import org.apache.openjpa.persistence.StoreCache;
...
StoreCache cache = (StoreCache) emf.getCache();

public void evict(Class cls, Object oid);
public void evictAll();
public void evictAll(Class cls, Object... oids);
public void evictAll(Class cls, Collection oids);

The evict methods tell the cache to release data. Except for the no-argument evictAll, each method takes an entity class and one or more identity values, and releases the cached data for the corresponding persistent instances. The no-argument evictAll method clears the entire cache. Eviction is useful when the datastore is changed by a separate process outside OpenJPA's control. In this scenario, you typically have to evict the data from the data cache manually; otherwise, the OpenJPA runtime, unaware of the changes, will keep its stale copy.

public void pin(Class cls, Object oid);
public void pinAll(Class cls, Object... oids);
public void pinAll(Class cls, Collection oids);
public void unpin(Class cls, Object oid);
public void unpinAll(Class cls, Object... oids);
public void unpinAll(Class cls, Collection oids);

Most caches are of limited size. Pinning an identity to the cache ensures that the cache will not evict the data for the corresponding instance, unless you evict it manually. Note that even after manual eviction, the data is pinned again the next time it is fetched from the store. Only the unpin methods remove a pin and make the data once again subject to normal overflow eviction. Use pinning when you want a guarantee that a certain object will always be available from the cache, rather than requiring a datastore trip.
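The rule that pins outlive manual eviction follows naturally if pins are recorded by identity, separately from the cached data. The following hypothetical sketch, which is not OpenJPA's implementation, models that: `evict` drops the data but leaves the pin, so the next `load` of the same id is protected from overflow eviction again until `unpin` is called. For simplicity it also assumes that an id can be pinned before its data is loaded.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of pin semantics; not OpenJPA code.
class PinnableCache<K, V> {
    private final Map<K, V> data = new HashMap<>();
    private final Set<K> pins = new HashSet<>(); // pinned ids, kept even when data is evicted

    void pin(K id) {
        pins.add(id);
    }

    void unpin(K id) {
        pins.remove(id);
    }

    // Called when data is fetched from the store.
    void load(K id, V value) {
        data.put(id, value);
    }

    // Manual eviction drops the data but leaves the pin in place.
    void evict(K id) {
        data.remove(id);
    }

    // Stand-in for overflow eviction: drops every unpinned entry
    // (a real cache would drop only as many entries as needed).
    void evictUnpinned() {
        data.keySet().removeIf(id -> !pins.contains(id));
    }

    boolean contains(K id) {
        return data.containsKey(id);
    }
}
```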

Example 10.12.  StoreCache Usage

import org.apache.openjpa.persistence.*;
...
OpenJPAEntityManagerFactory oemf = OpenJPAPersistence.cast(emf);
StoreCache cache = oemf.getStoreCache();
cache.pin(Magazine.class, popularMag.getId());
cache.evict(Magazine.class, changedMag.getId());

See the StoreCache Javadoc for information on additional functionality it provides. Also, Chapter 9, Runtime Extensions discusses OpenJPA's other extensions to the standard set of JPA runtime interfaces.

The examples above include calls to evict to manually remove data from the data cache. Rather than evicting objects from the data cache directly, you can also configure OpenJPA to automatically evict objects from the data cache when you use the OpenJPAEntityManager's eviction APIs.

Example 10.13.  Automatic Data Cache Eviction

<property name="openjpa.BrokerImpl" value="EvictFromDataCache=true"/>

import org.apache.openjpa.persistence.*;

...

OpenJPAEntityManager oem = OpenJPAPersistence.cast(em);
oem.evict(changedMag);  // will evict from data cache also

1.3.  Cache Statistics

The number of read and write requests and the hit ratio of the data cache are available through the org.apache.openjpa.datacache.CacheStatistics interface. Collection of cache statistics is disabled by default and must be enabled on a per-cache basis; until it is enabled, all counts returned by the CacheStatistics interface are 0.

Example 10.14.  Configuring CacheStatistics

<property name="openjpa.DataCache" value="true(EnableStatistics=true)"/>


Once cache statistics are enabled, you can access them through the StoreCache:

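import org.apache.openjpa.datacache.CacheStatistics;
import org.apache.openjpa.persistence.OpenJPAEntityManagerFactory;
import org.apache.openjpa.persistence.OpenJPAPersistence;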
...
OpenJPAEntityManagerFactory oemf = OpenJPAPersistence.cast(emf);
CacheStatistics statistics = oemf.getStoreCache().getCacheStatistics();

The statistics include the number of read, hit, and write requests made to the cache since start and since the last reset. Statistics can also be obtained on a per-class basis.

package org.apache.openjpa.datacache;

public interface CacheStatistics extends java.io.Serializable {
    // Statistics since last reset
    public long getReadCount();
    public long getHitCount();
    public long getWriteCount();

    // Statistics since start
    public long getTotalReadCount();
    public long getTotalHitCount();
    public long getTotalWriteCount();

    // Per-class statistics since last reset
    public long getReadCount(Class cls);
    public long getHitCount(Class cls);
    public long getWriteCount(Class cls);

    // Per-class statistics since start
    public long getTotalReadCount(Class cls);
    public long getTotalHitCount(Class cls);
    public long getTotalWriteCount(Class cls);

    // Time of the last reset (since) and of the start (start)
    public java.util.Date since();
    public java.util.Date start();

    // Resets the statistics.
    public void reset();

    // Returns whether or not statistics will be collected.
    public boolean isEnabled();
}

Collecting per-class statistics depends on determining the runtime type of a cached data element. When the context does not permit determining the exact runtime type, the statistics are registered against the generic java.lang.Object. Likewise, each method that accepts a Class argument treats a null argument as java.lang.Object.
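The counters behind an interface like this are straightforward. The hypothetical `SimpleCacheStats` class below is not OpenJPA code; it mirrors a subset of the interface shown above, counts a `null` class against `java.lang.Object` as just described, and computes the hit ratio as hits divided by reads since the last reset. It is not thread-safe.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical counter mirroring part of the CacheStatistics interface; not OpenJPA code.
class SimpleCacheStats {
    private static final int READS = 0, HITS = 1, WRITES = 2;

    private final long[] sinceReset = new long[3];
    private final long[] sinceStart = new long[3];
    private final Map<Class<?>, long[]> perClass = new HashMap<>(); // since last reset
    private final Date start = new Date();
    private Date since = start;

    // A null (undeterminable) runtime type is counted against java.lang.Object.
    private static Class<?> key(Class<?> cls) {
        return cls == null ? Object.class : cls;
    }

    private void count(Class<?> cls, int kind) {
        sinceReset[kind]++;
        sinceStart[kind]++;
        perClass.computeIfAbsent(key(cls), k -> new long[3])[kind]++;
    }

    void recordRead(Class<?> cls, boolean hit) {
        count(cls, READS);
        if (hit) {
            count(cls, HITS);
        }
    }

    void recordWrite(Class<?> cls) {
        count(cls, WRITES);
    }

    long getReadCount() {
        return sinceReset[READS];
    }

    long getTotalReadCount() {
        return sinceStart[READS];
    }

    long getReadCount(Class<?> cls) {
        long[] counts = perClass.get(key(cls));
        return counts == null ? 0 : counts[READS];
    }

    // Hit ratio since the last reset; 0.0 when nothing has been read yet.
    double hitRatio() {
        return sinceReset[READS] == 0 ? 0.0 : (double) sinceReset[HITS] / sinceReset[READS];
    }

    void reset() {
        Arrays.fill(sinceReset, 0);
        perClass.clear();
        since = new Date();
    }

    Date since() {
        return since;
    }

    Date start() {
        return start;
    }
}
```

With the real interface, the same ratio could be computed as `(double) statistics.getHitCount() / statistics.getReadCount()`, guarding against a read count of 0.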

1.4.  Query Cache

In addition to the data cache, the org.apache.openjpa.datacache package defines service provider interfaces for a query cache. The query cache is disabled by default and needs to be enabled separately from the data cache. The query cache stores the object ids returned by query executions. When you run a query, OpenJPA assembles a key based on the query properties and the parameters used at execution time, and checks for a cached query result. If one is found, the object ids in the cached result are looked up, and the resultant persistence-capable objects are returned. Otherwise, the query is executed against the database, and the object ids loaded by the query are put into the cache. The object id list is not cached until the list returned at query execution time is fully traversed.
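The key construction and the full-traversal rule can be sketched as follows. The classes `ResultKey` and `QueryCacheSketch` are hypothetical and unrelated to OpenJPA's own query cache classes; the key here is simply the query text plus the parameter values, and the id list is published to the cache only when the wrapped iterator reports that no elements remain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache key: the query text plus the parameter values used at execution.
final class ResultKey {
    private final String query;
    private final List<Object> params;

    ResultKey(String query, List<Object> params) {
        this.query = query;
        this.params = Collections.unmodifiableList(new ArrayList<>(params));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ResultKey)) {
            return false;
        }
        ResultKey other = (ResultKey) o;
        return query.equals(other.query) && params.equals(other.params);
    }

    @Override
    public int hashCode() {
        return Objects.hash(query, params);
    }
}

// Hypothetical query result cache that stores only object ids; not OpenJPA code.
class QueryCacheSketch {
    private final Map<ResultKey, List<Object>> cache = new ConcurrentHashMap<>();

    // Returns the cached id list, or null if this query/parameter combination is not cached.
    List<Object> lookup(ResultKey key) {
        return cache.get(key);
    }

    // Wraps the ids produced by a datastore query. The id list is published to
    // the cache only after the caller has traversed the whole result.
    Iterator<Object> cachingIterator(ResultKey key, Iterator<Object> ids) {
        List<Object> seen = new ArrayList<>();
        return new Iterator<Object>() {
            @Override
            public boolean hasNext() {
                boolean more = ids.hasNext();
                if (!more) {
                    cache.putIfAbsent(key, Collections.unmodifiableList(new ArrayList<>(seen)));
                }
                return more;
            }

            @Override
            public Object next() {
                Object id = ids.next();
                seen.add(id);
                return id;
            }
        };
    }
}
```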

OpenJPA exposes a high-level interface to the query cache through the org.apache.openjpa.persistence.QueryResultCache class. You can access this class through the OpenJPAEntityManagerFactory.

Example 10.15.  Accessing the QueryResultCache

import org.apache.openjpa.persistence.*;

...

OpenJPAEntityManagerFactory oemf = OpenJPAPersistence.cast(emf);
QueryResultCache qcache = oemf.getQueryResultCache();

The default query cache implementation caches 100 query executions in a least-recently-used cache. This can be changed by setting the cache size in the CacheSize plugin property. Like the data cache, the query cache also has a backing soft reference map. The SoftReferenceSize property controls the size of this map. It is disabled by default.

Example 10.16.  Query Cache Size

<property name="openjpa.QueryCache" value="true(CacheSize=1000, SoftReferenceSize=100)"/>

The query cache is disabled by default. To disable it explicitly, set the openjpa.QueryCache property to false:

Example 10.17.  Disabling the Query Cache

<property name="openjpa.QueryCache" value="false"/>

By default, when an entity is modified, the query cache evicts every cached query whose access path includes that entity's class. Scanning the whole query cache on each entity update slows down the update. The alternative "timestamp" eviction policy instead tracks the time each query result was cached and the time of the last update to each entity class, and compares the two when a cached query result is retrieved for reuse. If the query result is older than the last update of any entity class in the query's access path, the result is not reused and is evicted from the query cache. To set EvictPolicy to timestamp:

Example 10.18.  Query Cache Eviction Policy

<property name="openjpa.QueryCache" value="true(EvictPolicy='timestamp')"/>
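The cost difference between the two policies shows up clearly in a small model. The hypothetical sketch below is not OpenJPA's implementation, and it uses caller-supplied logical timestamps for determinism: with the default behavior every update scans all cached results, while with the timestamp policy an update only records a time, and staleness is detected lazily when a cached result is looked up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical comparison of the two query cache eviction strategies; not OpenJPA code.
// Times are logical timestamps passed in by the caller to keep the sketch deterministic.
class QueryEvictionSketch {
    private static final class CachedResult {
        final Set<Class<?>> accessPath; // entity classes the query reads
        final Object ids;
        final long cachedAt;

        CachedResult(Set<Class<?>> accessPath, Object ids, long cachedAt) {
            this.accessPath = accessPath;
            this.ids = ids;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, CachedResult> results = new HashMap<>();
    private final Map<Class<?>, Long> lastUpdate = new HashMap<>();
    private final boolean timestampPolicy;

    QueryEvictionSketch(boolean timestampPolicy) {
        this.timestampPolicy = timestampPolicy;
    }

    void put(String queryKey, Set<Class<?>> accessPath, Object ids, long now) {
        results.put(queryKey, new CachedResult(accessPath, ids, now));
    }

    void onUpdate(Class<?> entityClass, long now) {
        if (timestampPolicy) {
            lastUpdate.put(entityClass, now); // constant work per update, no scan
        } else {
            // Default behavior: scan every cached query on each update.
            results.values().removeIf(r -> r.accessPath.contains(entityClass));
        }
    }

    Object get(String queryKey) {
        CachedResult r = results.get(queryKey);
        if (r == null) {
            return null;
        }
        if (timestampPolicy) {
            for (Class<?> cls : r.accessPath) {
                Long updated = lastUpdate.get(cls);
                if (updated != null && updated > r.cachedAt) {
                    results.remove(queryKey); // stale: evicted lazily, at lookup time
                    return null;
                }
            }
        }
        return r.ids;
    }

    int size() {
        return results.size();
    }
}
```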

There are certain situations in which the query cache is bypassed:

  • Caching is not used for in-memory queries (queries in which the candidates are a collection instead of a class or Extent).

  • Caching is not used in transactions that have IgnoreChanges set to false and in which modifications to classes in the query's access path have occurred. If none of the classes in the access path have been touched, then cached results are still valid and are used.

  • Caching is not used in pessimistic transactions, since OpenJPA must go to the database to lock the appropriate rows.

  • Caching is not used when the data cache does not have any cached data for an id in a query result.

  • Queries that use persistence-capable objects as parameters are only cached if the parameter is directly compared to a field, as in:

    select e from Employee e where e.company.address = :addr
    

    If you extract field values from the parameter in your query string, or if the parameter is used in collection element comparisons, the query is not cached.

  • Queries that result in projections of custom field types or BigDecimal or BigInteger fields are not cached.

Cache results are removed from the cache when instances of classes in a cached query's access path are touched. That is, if a query accesses data in class A, and instances of class A are modified, deleted, or inserted, then the cached query result is dropped from the cache.

It is possible to tell the query cache that a class has been altered. This is only necessary when the changes occur via direct modification of the database outside of OpenJPA's control. You can also evict individual queries, or clear the entire cache.

public void evict(Query q);
public void evictAll(Class cls);
public void evictAll();

For JPA queries with parameters, set the desired parameter values on the Query instance before calling the above methods.

Example 10.19.  Evicting Queries

import javax.persistence.EntityManager;
import javax.persistence.Query;
import org.apache.openjpa.persistence.*;

...

OpenJPAEntityManagerFactory oemf = OpenJPAPersistence.cast(emf);
QueryResultCache qcache = oemf.getQueryResultCache();

// evict all queries that can be affected by changes to Magazines
qcache.evictAll(Magazine.class);

// evict an individual query with parameters
EntityManager em = emf.createEntityManager();
Query q = em.createQuery(...).
    setParameter(0, paramVal0).
    setParameter(1, paramVal1);
qcache.evict(q);

When using one of OpenJPA's distributed cache implementations, you must perform this eviction in every JVM, because the change notification is not propagated automatically. When using a third-party coherent caching solution, it is not necessary to do this in every JVM (although it does no harm), because the cached results are stored directly in the coherent cache.

Queries can also be pinned and unpinned through the QueryResultCache. The semantics of these operations are the same as pinning and unpinning data from the data cache.

public void pin(Query q);
public void unpin(Query q);

For JPA queries with parameters, set the desired parameter values on the Query instance before calling the above methods.

The following example shows these APIs in action.

Example 10.20.  Pinning, and Unpinning Query Results

import javax.persistence.EntityManager;
import javax.persistence.Query;
import org.apache.openjpa.persistence.*;

...

OpenJPAEntityManagerFactory oemf = OpenJPAPersistence.cast(emf);
QueryResultCache qcache = oemf.getQueryResultCache();
EntityManager em = emf.createEntityManager();

Query pinQuery = em.createQuery(...).
    setParameter(0, paramVal0).
    setParameter(1, paramVal1);
qcache.pin(pinQuery);
Query unpinQuery = em.createQuery(...).
    setParameter(0, paramVal0).
    setParameter(1, paramVal1);
qcache.unpin(unpinQuery);

Pinning data into the cache instructs the cache to not expire the pinned results when cache flushing occurs. However, pinned results will be removed from the cache if an event occurs that invalidates the results.

You can disable caching on a per-EntityManager or per-Query basis:

Example 10.21.  Disabling and Enabling Query Caching

import org.apache.openjpa.persistence.*;

...

// temporarily disable query caching for all queries created from em
OpenJPAEntityManager oem = OpenJPAPersistence.cast(em);
oem.getFetchPlan().setQueryResultCacheEnabled(false);

// re-enable caching for a particular query
OpenJPAQuery oq = oem.createQuery(...);
oq.getFetchPlan().setQueryResultCacheEnabled(true);

1.5.  Cache Extension

The provided data cache classes can be easily extended to add additional functionality. If you are adding new behavior, you should extend org.apache.openjpa.datacache.ConcurrentDataCache. To use your own storage mechanism, extend org.apache.openjpa.datacache.AbstractDataCache (preferred), or implement org.apache.openjpa.datacache.DataCache directly. If you want to implement a distributed cache that uses an unsupported method for communications, create an implementation of org.apache.openjpa.event.RemoteCommitProvider. This process is described in greater detail in Section 2.2, “ Customization ”.

The query cache is just as easy to extend. Add functionality by extending the default org.apache.openjpa.datacache.ConcurrentQueryCache. Implement your own storage mechanism for query results by extending org.apache.openjpa.datacache.AbstractQueryCache (preferred) or implementing the org.apache.openjpa.datacache.QueryCache interface directly.

1.6.  Important Notes

  • The default cache implementations do not automatically refresh objects in other EntityManagers when the cache is updated or invalidated. This behavior would not be compliant with the JPA specification.

  • Invoking OpenJPAEntityManager.evict does not result in the corresponding data being dropped from the data cache, unless you have set the proper configuration options as explained above (see Example 10.13, “ Automatic Data Cache Eviction ”). Other methods related to the EntityManager cache also do not affect the data cache.

    The data cache assumes that it is up-to-date with respect to the datastore, so it is effectively an in-memory extension of the database. To manipulate the data cache, you should generally use the data cache facades presented in this chapter.

1.7.  Known Issues and Limitations

  • When using datastore (pessimistic) transactions in concert with the distributed caching implementations, it is possible to read stale data when reading data outside a transaction.

    For example, if you have two JVMs (JVM A and JVM B) both communicating with each other, and JVM A obtains a data store lock on a particular object's underlying data, it is possible for JVM B to load the data from the cache without going to the datastore, and therefore load data that should be locked. This will only happen if JVM B attempts to read data that is already in its cache during the period between when JVM A locked the data and JVM B received and processed the invalidation notification.

    This problem is impossible to solve without a two-phase commit system for cache notifications, which would add significant overhead to the caching implementation. As a result, we recommend using optimistic locking together with data caching. If you do not, be aware that some of your non-transactional data may not be consistent with the datastore.

    Note that when loading objects in a transaction, the appropriate datastore transactions will be obtained. So, transactional code will maintain its integrity.

  • Extents are not cached. So, if you plan on iterating over a list of all the objects in an Extent on a regular basis, you will only benefit from caching if you do so with a Query instead:

    Example 10.22.  Query Replaces Extent

    import java.util.Iterator;
    import org.apache.openjpa.persistence.*;
    
    ...
    
    OpenJPAEntityManager oem = OpenJPAPersistence.cast(em);
    Extent extent = oem.createExtent(Magazine.class, false);
    
    // This iterator does not benefit from caching...
    Iterator uncachedIterator = extent.iterator();
    
    // ... but this one does.
    OpenJPAQuery extentQuery = oem.createQuery(...);
    extentQuery.setSubclasses(false);
    Iterator cachedIterator = extentQuery.getResultList().iterator();