Chapter 15. Optimization Guidelines


There are numerous techniques you can use in order to ensure that OpenJPA operates in the fastest and most efficient manner. Following are some guidelines. Each describes what impact it will have on performance and scalability. Note that general guidelines regarding performance or scalability issues are just that - guidelines. Depending on the particular characteristics of your application, the optimal settings may be considerably different than what is outlined below.

In the following table, each row is labeled with a list of italicized keywords. These keywords identify what characteristics the row in question may improve upon. Many of the rows are marked with one or both of the performance and scalability labels. It is important to bear in mind the differences between performance and scalability (for the most part, we are referring to system-wide scalability, and not necessarily only scalability within a single JVM). The performance-related hints will probably improve the performance of your application for a given user load, whereas the scalability-related hints will probably increase the total number of users that your application can service. Sometimes, increasing performance will decrease scalability, and vice versa. Typically, options that reduce the amount of work done on the database server will improve scalability, whereas those that push more work onto the server will have a negative impact on scalability.

Table 15.1. Optimization Guidelines

Plug in a Connection Pool performance, scalability	OpenJPA's built-in datasource does not perform connection pooling or prepared statement caching. Plugging in a third-party pooling datasource may drastically improve performance.
Optimize database indexes performance, scalability	The default set of indexes created by OpenJPA's mapping tool may not always be the most appropriate for your application. Manually setting indexes in your mapping metadata or manually manipulating database indexes to include frequently-queried fields (as well as dropping indexes on rarely-queried fields) can yield significant performance benefits. A database must do extra work on insert, update, and delete to maintain an index. This extra work will benefit selects with WHERE clauses, which will execute much faster when the terms in the WHERE clause are appropriately indexed. So, for a read-mostly application, appropriate indexing will slow down updates (which are rare) but greatly accelerate reads. This means that the system as a whole will be faster, and also that the database will experience less load, meaning that the system will be more scalable. Bear in mind that over-indexing is a bad thing, both for scalability and performance, especially for applications that perform lots of inserts, updates, or deletes.
JVM optimizations performance, reliability	Tuning parameters of the Java Virtual Machine (such as HotSpot compilation modes and the maximum heap size) can result in performance improvements. For more details about optimizing the JVM execution environment, please see http://java.sun.com/docs/hotspot/HotSpotFAQ.html.
Use the data cache performance, scalability	Using OpenJPA's data and query caching features can often result in a dramatic improvement in performance. Additionally, these caches can significantly reduce the amount of load on the database, increasing the scalability characteristics of your application.
Set `LargeTransaction` to true, or set `PopulateDataCache` to false performance vs. scalability	When using OpenJPA's data caching features in a transaction that will delete, modify, or create a very large number of objects, you can set `LargeTransaction` to true and perform periodic flushes during your transaction to reduce its memory requirements. See the Javadoc: OpenJPAEntityManager.setTrackChangesByType. Note that transactions in large mode have to flush items from the data cache more aggressively. If your transaction will visit objects that you know are very unlikely to be accessed by other transactions, for example an exhaustive report run only once a month, you can turn off population of the data cache so that the transaction doesn't fill the entire data cache with objects that won't be accessed again. Again, see the Javadoc: OpenJPAEntityManager.setPopulateDataCache.
Run the OpenJPA enhancer on your persistent classes, either at build-time or deploy-time. performance, scalability, memory footprint	OpenJPA performs best when your persistent classes have been run through the OpenJPA post-compilation bytecode enhancer. When dealing with enhanced classes, OpenJPA can make a number of assumptions that reduce memory footprint and accelerate persistent data access. When evaluating OpenJPA's performance, build-time or deploy-time enhancement should be enabled. See Section 2, “ Enhancement ” for details.
Disable logging, performance tracking performance	Developer options such as verbose logging and the JDBC performance tracker can result in serious performance hits for your application. Before evaluating OpenJPA's performance, these options should all be disabled.
Set `IgnoreChanges` to true, or set `FlushBeforeQueries` to true performance vs. scalability	When both the `openjpa.IgnoreChanges` and `openjpa.FlushBeforeQueries` properties are set to false, OpenJPA needs to consider in-memory dirty instances during queries. This can sometimes result in OpenJPA needing to evaluate the entire extent of objects in order to return the correct query results, which can have drastic performance consequences. If it is appropriate for your application, configuring `FlushBeforeQueries` to automatically flush before queries involving dirty objects will ensure that this never happens. Setting `IgnoreChanges` to false will result in a small performance hit even if `FlushBeforeQueries` is true, as incremental flushing is not as efficient overall as delaying all flushing to a single operation during commit. Setting `IgnoreChanges` to `true` will help performance, since dirty objects can be ignored for queries, meaning that incremental flushing or client-side processing is not necessary. It will also improve scalability, since overall database server usage is diminished. On the other hand, setting `IgnoreChanges` to `false` will have a negative impact on scalability, even when using automatic flushing before queries, since more operations will be performed on the database server.
Configure `openjpa.ConnectionRetainMode` appropriately performance vs. scalability	The `ConnectionRetainMode` configuration option controls when OpenJPA will obtain a connection, and how long it will hold that connection. The optimal settings for this option will vary considerably depending on the particular behavior of your application. You may even benefit from using different retain modes for different parts of your application. The default setting of `on-demand` minimizes the amount of time that OpenJPA holds onto a datastore connection. This is generally the best option from a scalability standpoint, as database resources are held for a minimal amount of time. However, if you are not using connection pooling, or if your `DataSource` is not efficient at managing its pool, then this default value could cause undesirable pool contention.
Use flat inheritance performance, scalability vs. disk space	Mapping inheritance hierarchies to a single database table is faster for most operations than other strategies employing multiple tables. If it is appropriate for your application, you should use this strategy whenever possible. However, this strategy will require more disk space on the database side. Disk space is relatively inexpensive, but if your object model is particularly large, it can become a factor.
High sequence increment performance, scalability	For applications that perform large bulk inserts, the retrieval of sequence numbers can be a bottleneck. Increasing sequence increments and using table-based rather than native database sequences can reduce or eliminate this bottleneck. In some cases, implementing your own sequence factory can further optimize sequence number retrieval.
Use optimistic transactions performance, scalability	Using datastore transactions translates into pessimistic database row locking, which can be a performance hit (depending on the database). If appropriate for your application, optimistic transactions are typically faster than datastore transactions. Optimistic transactions provide the same transactional guarantees as datastore transactions, except that you must handle a potential optimistic verification exception at the end of a transaction instead of assuming that a transaction will successfully complete. In many applications, it is unlikely that different concurrent transactions will operate on the same set of data at the same time, so optimistic verification increases the concurrency, and therefore both the performance and scalability characteristics, of the application. A common approach to handling optimistic verification exceptions is to simply present the end user with the fact that concurrent modifications happened, and require that the user redo any work.
Use query aggregates and projections performance, scalability	Using aggregates to compute reporting data on the database server can drastically speed up queries. Similarly, using projections when you are interested in specific object fields or relations rather than the entire object state can reduce the amount of data OpenJPA must transfer from the database to your application.
Always close resources scalability	Under certain settings, `EntityManager`s, OpenJPA `Extent` iterators, and `Query` results may be backed by resources in the database. For example, if you have configured OpenJPA to use scrollable cursors and lazy object instantiation by default, each query result will hold open a `ResultSet` object, which, in turn, will hold open a `Statement` object (preventing it from being re-used). Garbage collection will clean up these resources, so it is never necessary to explicitly close them, but it is always faster if it is done at the application level.
Use detached state managers performance	Attaching and even persisting instances can be more efficient when your detached objects use detached state managers. By default, OpenJPA does not use detached state managers when serializing an instance across tiers. See Section 1.3, “ Defining the Detached Object Graph ” for how to force OpenJPA to use detached state managers across tiers, and for other options for more efficient attachment. The downside of using a detached state manager across tiers is that your enhanced persistent classes and the OpenJPA libraries must be available on the client tier.
Utilize the `EntityManager` cache performance, scalability	When possible and appropriate, re-using `EntityManager`s and setting the `RetainState` configuration option to `true` may result in significant performance gains, since the `EntityManager`'s built-in object cache will be used.
Enable multithreaded operation only when necessary performance	OpenJPA respects the `openjpa.Multithreaded` option in that it does not impose as much synchronization overhead for applications that do not set this value to `true`. If your application is guaranteed to only use single-threaded access to OpenJPA resources and persistent objects, leaving this option as `false` will reduce synchronization overhead, and may result in a modest performance increase.
Enable large data set handling performance, scalability	If you execute queries that return large numbers of objects or have relations (collections or maps) that are large, and if you often only access parts of these data sets, enabling large result set handling where appropriate can dramatically speed up your application, since OpenJPA will bring the data sets into memory from the database only as necessary.
Disable large data set handling performance, scalability	If you have enabled scrollable result sets and on-demand loading but you do not require it, consider disabling it again. Some JDBC drivers and databases (SQL Server for example) are much slower when used with scrolling result sets.
Use the `DynamicSchemaFactory` performance, validation	If you are using an `openjpa.jdbc.SchemaFactory` setting of something other than the default of `dynamic`, consider switching back. While other factories can ensure that object-relational mapping information is valid when a persistent class is first used, this can be a slow process. Though the validation is only performed once for each class, switching back to the `DynamicSchemaFactory` can reduce the warm-up time for your application.
Do not use XA transactions performance, scalability	XA transactions can be orders of magnitude slower than standard transactions. Unless distributed transaction functionality is required by your application, use standard transactions. Recall that XA transactions are distinct from managed transactions - managed transaction services such as that provided by EJB declarative transactions can be used both with XA and non-XA transactions. XA transactions should only be used when a given business transaction involves multiple different transactional resources (an Oracle database and an IBM transactional message queue, for example).
Use `Set`s instead of `List`s or `Collection`s performance, scalability	There is a small amount of extra overhead for OpenJPA to maintain collections where each element is not guaranteed to be unique. If your application does not require duplicates for a collection, you should always declare your fields to be of type `Set`, `SortedSet`, `HashSet`, or `TreeSet`.
Use query parameters instead of encoding search data in filter strings performance	If your queries depend on parameter data only known at runtime, you should use query parameters rather than dynamically building different query strings. OpenJPA performs aggressive caching of query compilation data, and the effectiveness of this cache is diminished if multiple query filters are used where a single one could have sufficed.
Tune your fetch groups appropriately performance, scalability	The fetch groups used when loading an object control how much data is eagerly loaded, and by extension, which fields must be lazily loaded at a future time. The ideal fetch group configuration loads all the data that is needed in one fetch, and no extra fields - this minimizes both the amount of data transferred from the database, and the number of trips to the database. If extra fields are specified in the fetch groups (in particular, large fields such as binary data, or relations to other persistence-capable objects), then network overhead (for the extra data) and database processing (for any necessary additional joins) will hurt your application's performance. If too few fields are specified in the fetch groups, then OpenJPA will have to make additional trips to the database to load additional fields as necessary.
Use eager fetching performance, scalability	Using eager fetching when loading subclass data or traversing relations for each instance in a large collection of results can speed up data loading by orders of magnitude.
Disable BrokerImpl finalization performance, scalability	Outside of a Java EE 5 application server or other JPA persistence container, OpenJPA's EntityManagers use finalizers to ensure that resources get cleaned up. If you are properly managing your resources, this finalization is not necessary, and will introduce unneeded synchronization, leading to scalability problems. You can disable this protective behavior by setting the `openjpa.BrokerImpl` property to `non-finalizing`. See Section 1.1, “ Broker Finalization ” for details.
Preload MetaDataRepository scalability	By default, the MetaDataRepository is lazily loaded, which means that a fair amount of locking is used to ensure that metadata is processed properly. Enabling preloading allows OpenJPA to load metadata upfront and remove locking. See Section 2, “Metadata Repository” for details.
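
Several rows of the table come down to setting a configuration property. The following is a minimal sketch, not a recommended configuration, of how some of those settings might be collected into a map of overrides. The class and method names (`TuningProperties`, `readMostlyDefaults`) are hypothetical; the property names and values are the ones mentioned in the table, plus `openjpa.RemoteCommitProvider`, which is assumed here to be needed alongside the data cache in a single-JVM deployment. A map like this can be passed as the second argument to `Persistence.createEntityManagerFactory(String, Map)`, or the same entries can be written as `<property>` elements in `persistence.xml`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; only the property names and values relate to OpenJPA. */
final class TuningProperties {

    /** Builds property overrides for a read-mostly, single-threaded application. */
    static Map<String, String> readMostlyDefaults() {
        Map<String, String> props = new LinkedHashMap<>();
        // "Use the data cache": enable the cache; sjvm is assumed for a single JVM.
        props.put("openjpa.DataCache", "true");
        props.put("openjpa.RemoteCommitProvider", "sjvm");
        // "Set IgnoreChanges to true": only if queries may ignore unflushed changes.
        props.put("openjpa.IgnoreChanges", "true");
        // "Configure ConnectionRetainMode": the default, stated explicitly.
        props.put("openjpa.ConnectionRetainMode", "on-demand");
        // "Enable multithreaded operation only when necessary".
        props.put("openjpa.Multithreaded", "false");
        // "Disable BrokerImpl finalization": the application closes its own resources.
        props.put("openjpa.BrokerImpl", "non-finalizing");
        // "Use the DynamicSchemaFactory": the default, stated explicitly.
        props.put("openjpa.jdbc.SchemaFactory", "dynamic");
        // "Preload MetaDataRepository".
        props.put("openjpa.MetaDataRepository", "Preload=true");
        return props;
    }

    public static void main(String[] args) {
        // Print the overrides in insertion order.
        readMostlyDefaults().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Whether each value is appropriate depends on the application; `IgnoreChanges`, in particular, changes what queries return inside a transaction with unflushed modifications.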
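
The "High sequence increment" row can be illustrated with a small simulation. This sketch does not use any OpenJPA class: `BlockSequence` is a hypothetical stand-in in which an `AtomicLong` plays the role of the database sequence, and each refill of the local block counts as one round trip. It only shows why a larger increment reduces trips during bulk inserts.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical simulation of block-wise sequence allocation; not an OpenJPA class. */
final class BlockSequence {
    // Stands in for the sequence value stored in the database.
    private final AtomicLong database = new AtomicLong(0);
    private final int increment;
    private long next;       // next id to hand out
    private long limit;      // first id beyond the current block
    private int roundTrips;  // simulated database trips so far

    BlockSequence(int increment) {
        if (increment < 1) {
            throw new IllegalArgumentException("increment must be >= 1");
        }
        this.increment = increment;
    }

    /** Returns the next id, reserving a new block only when the current one is used up. */
    synchronized long nextId() {
        if (next == limit) {
            // One simulated database trip reserves `increment` ids at once.
            limit = database.addAndGet(increment);
            next = limit - increment;
            roundTrips++;
        }
        return next++;
    }

    synchronized int roundTrips() {
        return roundTrips;
    }

    /** Counts the simulated trips needed to obtain ids for the given number of inserts. */
    static int tripsFor(int increment, int inserts) {
        BlockSequence seq = new BlockSequence(increment);
        for (int i = 0; i < inserts; i++) {
            seq.nextId();
        }
        return seq.roundTrips();
    }

    public static void main(String[] args) {
        System.out.println(tripsFor(1, 1000));   // prints 1000
        System.out.println(tripsFor(100, 1000)); // prints 10
    }
}
```

The trade-off is that ids reserved in a block but never used are lost, so larger increments leave larger gaps in the key space.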
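
The optimistic verification described in the "Use optimistic transactions" row can be sketched without a database. In the simulation below, `VersionedRow` and `StaleVersionException` are hypothetical names invented for illustration; in a JPA application the version check is performed by the provider, and a failed check typically surfaces as `javax.persistence.OptimisticLockException`. The sketch assumes a simple numeric version that is incremented on every successful commit.

```java
/** Hypothetical in-memory stand-in for a versioned database row; not an OpenJPA API. */
final class VersionedRow {

    /** Signals that the row changed after the caller read it. */
    static final class StaleVersionException extends RuntimeException {
        StaleVersionException(String message) {
            super(message);
        }
    }

    private String value;
    private long version;

    VersionedRow(String value) {
        this.value = value;
    }

    synchronized long version() {
        return version;
    }

    synchronized String value() {
        return value;
    }

    /** Applies the change only if nobody has committed since versionRead was read. */
    synchronized void commit(long versionRead, String newValue) {
        if (versionRead != version) {
            throw new StaleVersionException(
                "row changed: read v" + versionRead + ", now v" + version);
        }
        value = newValue;
        version++;
    }

    public static void main(String[] args) {
        VersionedRow row = new VersionedRow("draft");
        long seenByA = row.version();
        long seenByB = row.version();
        row.commit(seenByA, "edited by A");      // succeeds, version becomes 1
        try {
            row.commit(seenByB, "edited by B");  // fails: B read version 0
        } catch (StaleVersionException e) {
            System.out.println("B must redo its work: " + e.getMessage());
        }
    }
}
```

No lock is held between reading and committing, which is where the concurrency gain comes from; the cost is that the losing transaction has to be retried or reported to the user.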
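
The reasoning behind the "Use query parameters" row is that a compilation cache keyed by query string can only be reused when the string repeats. The sketch below simulates such a cache with a plain `HashMap`; it is not OpenJPA's implementation. `QueryCompilationCacheDemo`, the `Magazine` entity, and the `:title` parameter are assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical simulation of a query compilation cache keyed by query string. */
final class QueryCompilationCacheDemo {
    private final Map<String, String> compiled = new HashMap<>();
    private int compilations;

    /** Returns the cached compilation, compiling only on a cache miss. */
    String compile(String queryString) {
        return compiled.computeIfAbsent(queryString, q -> {
            compilations++;
            return "compiled[" + q + "]";
        });
    }

    int compilations() {
        return compilations;
    }

    /** Embeds each search value in the query string: every string is distinct. */
    static int compilationsWithLiterals(int queries) {
        QueryCompilationCacheDemo cache = new QueryCompilationCacheDemo();
        for (int i = 0; i < queries; i++) {
            cache.compile("SELECT m FROM Magazine m WHERE m.title = 'Title " + i + "'");
        }
        return cache.compilations();
    }

    /** Uses one parameterized string; the value would be bound separately at execution. */
    static int compilationsWithParameter(int queries) {
        QueryCompilationCacheDemo cache = new QueryCompilationCacheDemo();
        for (int i = 0; i < queries; i++) {
            cache.compile("SELECT m FROM Magazine m WHERE m.title = :title");
        }
        return cache.compilations();
    }

    public static void main(String[] args) {
        System.out.println(compilationsWithLiterals(500));  // prints 500
        System.out.println(compilationsWithParameter(500)); // prints 1
    }
}
```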
