2. Salient Features


2.1. Transparency

The primary design objective for Slice is to keep the user application unaware of the change in storage strategy when data resides in multiple (possibly heterogeneous) databases instead of a single database. Slice achieves this transparency by virtualizing multiple databases as a single database, so that the OpenJPA object management kernel continues to interact with the storage layer in exactly the same manner. Similarly, neither the existing application nor the persistent domain model requires any change to move from a single database to a distributed database environment.

An existing application developed for a single database can be adapted to work with multiple databases purely by configuring a persistence unit via META-INF/persistence.xml.
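
As a rough illustration of what configuration-only adoption might look like, the sketch below builds the persistence-unit properties as a Java map instead of XML. The property keys (openjpa.BrokerFactory, openjpa.slice.Names, openjpa.slice.One.ConnectionURL and so on) are taken from OpenJPA's Slice configuration reference rather than from this section, and the slice names and JDBC URLs are placeholders; all of them should be checked against the OpenJPA version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: settings that would normally live in META-INF/persistence.xml,
// expressed as a property map. Slice names and URLs are placeholders.
class SliceConfigSketch {
    static Map<String, String> sliceProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("openjpa.BrokerFactory", "slice");   // select the Slice runtime
        props.put("openjpa.slice.Names", "One,Two");   // logical slice identifiers
        props.put("openjpa.slice.One.ConnectionURL", "jdbc:mysql://host1/orders"); // placeholder URL
        props.put("openjpa.slice.Two.ConnectionURL", "jdbc:mysql://host2/orders"); // placeholder URL
        return props;
    }
}
```

The same key/value pairs would normally appear as property elements of the persistence unit in META-INF/persistence.xml. A map like this can alternatively be passed as the second argument of javax.persistence.Persistence.createEntityManagerFactory(String, Map).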

2.2. Scaling

The primary performance goal of Slice is to scale with growing data volume by horizontally partitioning data across many databases.

Slice executes database operations such as query or flush in parallel across each physical database. Hence, scaling characteristics against data volume are bound by the size of the largest data partition instead of the size of the entire data set. Use cases where the data is naturally amenable to horizontal partitioning, for example by temporal interval (e.g. Purchase Orders per month) or by geographical region (e.g. Customer by Zip Code), can derive significant performance benefit and favorable scaling behavior from Slice.

2.3. Distributed Query

Queries are executed in parallel across one or more slices, and the individual query results are merged into a single list before being returned to the calling application. The merge operation is more complex for queries that involve sorting and/or specify a range. Slice supports both sorting and range queries.
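
One way to picture the merge for a sorted, ranged query is a k-way merge over per-slice result lists that are each already sorted, followed by skipping the first rows and keeping at most a maximum number. The sketch below illustrates that general technique only; it is not Slice's actual implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class SortedRangeMergeSketch {
    /** Merges per-slice lists (each already sorted by order) and keeps rows [first, first + max). */
    static <T> List<T> merge(List<List<T>> perSlice, Comparator<T> order, int first, int max) {
        // Each heap entry is {slice index, position within that slice's list}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> order.compare(perSlice.get(a[0]).get(a[1]), perSlice.get(b[0]).get(b[1])));
        for (int s = 0; s < perSlice.size(); s++) {
            if (!perSlice.get(s).isEmpty()) heap.add(new int[] {s, 0});
        }
        List<T> merged = new ArrayList<>();
        int seen = 0;
        while (!heap.isEmpty() && merged.size() < max) {
            int[] head = heap.poll();
            T item = perSlice.get(head[0]).get(head[1]);
            if (seen++ >= first) merged.add(item);   // skip rows before the range start
            if (head[1] + 1 < perSlice.get(head[0]).size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }
}
```

With per-slice results [1, 4, 7] and [2, 3, 9], natural ordering, first = 1 and max = 3, the sketch returns [2, 3, 4]. In a scheme like this, each slice has to supply its own first + max leading rows, because any of them may fall inside the global range.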

Slice also supports aggregate queries where the aggregate operation commutes with partitioning, that is, where the per-slice results can be combined into the overall result, such as COUNT() or MAX(), but not AVG().
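
The arithmetic behind this restriction can be sketched as follows. Per-slice COUNT and MAX values combine into the correct overall answer, whereas averaging per-slice averages does not, because it ignores how many rows each slice holds. The class and method names are hypothetical and the code illustrates the arithmetic only, not Slice's internals.

```java
import java.util.Arrays;

class PartitionedAggregateSketch {
    /** Overall COUNT is the sum of per-slice counts. */
    static long mergeCounts(long... perSlice) {
        return Arrays.stream(perSlice).sum();
    }

    /** Overall MAX is the maximum of per-slice maxima (requires at least one slice). */
    static int mergeMax(int... perSlice) {
        return Arrays.stream(perSlice).max().getAsInt();
    }

    /** Naive combination of per-slice averages; generally NOT the overall average. */
    static double averageOfAverages(double... perSlice) {
        return Arrays.stream(perSlice).average().getAsDouble();
    }
}
```

For example, if one slice holds the values 10, 20, 30 and another holds only 100, the true average is 160 / 4 = 40, but averageOfAverages(20.0, 100.0) yields 60.0.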

By default, any query is executed against all available slices. However, the application can target the query to a subset of slices by setting a hint on javax.persistence.Query. The hint key is openjpa.hint.slice.Target and the hint value is an array of slice identifiers. The following example shows how to target a query to a pair of slices with the logical identifiers "One" and "Two".

              EntityManager em = ...;
              em.getTransaction().begin();
              String hint = "openjpa.hint.slice.Target";
              Query query = em.createQuery("SELECT p FROM PObject p")
                              .setHint(hint, new String[]{"One", "Two"});
              List result = query.getResultList();
              // verify that each instance is originating from the hinted slices
              for (Object pc : result) {
                 String sliceOrigin = SlicePersistence.getSlice(pc);
                 assertTrue ("One".equals(sliceOrigin) || "Two".equals(sliceOrigin));
              }

Confining queries to a subset of slices by setting query hints can be considered intrusive to an existing application. An alternative means of targeting queries is to configure a Query Target Policy. This policy is configured via the plug-in property openjpa.slice.QueryTargetPolicy, whose value is the fully-qualified class name of an implementation of the org.apache.openjpa.slice.QueryTargetPolicy interface. This interface contract allows a user application to target a query to a subset of slices based on the query and its bound parameters. The query target policy is consulted only when no explicit target hint is set on the query. By default, the policy executes a query on all available slices.
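
A policy of this kind might look roughly like the sketch below. This section does not list the method of org.apache.openjpa.slice.QueryTargetPolicy, so the nested interface here is a local stand-in whose method shape is an assumption to be verified against the OpenJPA Javadoc. The "region" parameter name and the RegionTargetPolicy class are hypothetical.

```java
import java.util.List;
import java.util.Map;

class QueryTargetSketch {
    /** Local stand-in for the real interface; the method shape is an assumption. */
    interface QueryTargetPolicy {
        String[] getTargets(String query, Map<Object, Object> params, String language,
                            List<String> slices, Object context);
    }

    /** Routes a query bound to a hypothetical "region" parameter to the slice of that name. */
    static class RegionTargetPolicy implements QueryTargetPolicy {
        @Override
        public String[] getTargets(String query, Map<Object, Object> params, String language,
                                   List<String> slices, Object context) {
            Object region = (params == null) ? null : params.get("region");
            if (region != null && slices.contains(region.toString())) {
                return new String[] {region.toString()};
            }
            // No usable parameter: fall back to every configured slice.
            return slices.toArray(new String[0]);
        }
    }
}
```

Because the decision is made from the query and its bound parameters, the application code that creates and runs the query does not need to change.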

A similar policy interface, org.apache.openjpa.slice.FinderTargetPolicy, is available to target queries that originate from find() by primary key. This finder target policy is consulted only when no explicit target hint is set on the current fetch plan. By default, the policy executes a query on all available slices to find an instance by its primary key.

2.4. Data Distribution

The user application decides how newly persistent instances are distributed across the slices. The user application specifies the data distribution policy by implementing org.apache.openjpa.slice.DistributionPolicy. The DistributionPolicy interface is simple, with a single method. The complete listing of the documented interface follows:

 
		    
              public interface DistributionPolicy {
                  /**
                   * Gets the name of the slice where the given newly persistent
                   * instance will be stored.
                   *
                   * @param pc The newly persistent or to-be-merged object.
                   * @param slices name of the configured slices.
                   * @param context persistence context managing the given instance.
                   *
                   * @return identifier of the slice. This name must match one of the
                   * configured slice names.
                   * @see DistributedConfiguration#getSliceNames()
                   */
                  String distribute(Object pc, List<String> slices, Object context);
              }

The Slice runtime invokes this user-supplied method for each newly persistent instance that is an explicit argument of the javax.persistence.EntityManager.persist(Object pc) method. The user application must return a valid slice name from this method to designate the target slice for the given instance. The data distribution policy may be based on attributes of the data itself. For example, all Customer instances whose first name begins with a character from 'A' to 'M' are stored in one slice, while those whose first name begins with 'N' to 'Z' are stored in another. Note that any attribute values that participate in the distribution policy logic must be set before EntityManager.persist() is invoked.
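
The first-name example above could be sketched as follows. To keep the block compilable without OpenJPA on the classpath, it declares a local copy of the documented DistributionPolicy method; a real policy would implement org.apache.openjpa.slice.DistributionPolicy instead. The Customer class and the assumption that at least two slices are configured are hypothetical.

```java
import java.util.List;

class NameRangeDistributionSketch {
    /** Local copy of the documented interface so the sketch compiles on its own. */
    interface DistributionPolicy {
        String distribute(Object pc, List<String> slices, Object context);
    }

    /** Hypothetical entity; only the first name matters for distribution. */
    static class Customer {
        final String firstName;
        Customer(String firstName) { this.firstName = firstName; }
    }

    /** Sends first names starting with A-M to the first slice, everything else to the second. */
    static class NameRangePolicy implements DistributionPolicy {
        @Override
        public String distribute(Object pc, List<String> slices, Object context) {
            // The first name must already be set (and non-empty) when persist() is called.
            String name = ((Customer) pc).firstName;
            char initial = Character.toUpperCase(name.charAt(0));
            return (initial >= 'A' && initial <= 'M') ? slices.get(0) : slices.get(1);
        }
    }
}
```

With slices "One" and "Two", a Customer named "Alice" is assigned to "One" and one named "nora" to "Two". The returned value is always taken from the supplied list, so it matches a configured slice name.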

The user application needs to specify the target slice only for the root instance, i.e. the explicit argument of the EntityManager.persist(Object pc) method. Slice computes the transitive closure of the graph, i.e. the set of all instances directly or indirectly reachable from the root instance, and stores them in the same target slice.
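
The notion of a transitive closure can be illustrated with a simple graph traversal. The sketch below is not how Slice walks persistent relationships; it models references as a plain map from an object to the objects it refers to, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ClosureSketch {
    /** Returns the root plus every object reachable from it through the reference map. */
    static Set<Object> closure(Object root, Map<Object, List<Object>> references) {
        // Identity-based set: instances are tracked by reference, not by equals().
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Object current = pending.pop();
            if (!seen.add(current)) continue;   // already visited; also guards against cycles
            for (Object next : references.getOrDefault(current, Collections.emptyList())) {
                pending.push(next);
            }
        }
        return seen;
    }
}
```

Every member of the returned set would be stored in the slice chosen for the root instance.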

Slice tracks the original database for existing instances. When an application issues a query, the resultant instances can be loaded from different slices. As Slice tracks the original slice for each instance, any subsequent update to an instance is committed to the appropriate original database slice.

Note

You can find the original slice of an instance pc with the static utility method SlicePersistence.getSlice(pc). This method returns the slice identifier associated with the given managed instance. If the instance is not managed, the method returns null, because an unmanaged or detached instance is not associated with any slice.

2.5. Data Replication

While Slice ensures that the transitive closure is stored in the same slice, there can be data elements that are commonly referred to by many instances, such as Country or Currency code. Such quasi-static master data can be stored as identical copies in multiple slices. The user application must enumerate the replicated entity type names in openjpa.slice.ReplicatedTypes as a comma-separated list and implement the org.apache.openjpa.slice.ReplicationPolicy interface. The ReplicationPolicy interface is quite similar to the DistributionPolicy interface, except that it returns an array of target slice names instead of a single slice.

 
             
			 String[] replicate(Object pc, List<String> slices, Object context);

The default implementation assumes that replicated instances are stored in all available slices. If any such replicated instance is modified, the modification is propagated to all target slices to maintain the critical assumption that the state of a replicated instance is identical across all its target slices.
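
A policy with the behavior described for the default, replicating to every configured slice, could be sketched as below. The nested interface is a local copy of the replicate signature shown above so that the block compiles without OpenJPA; the class names are hypothetical, and this is not the source of OpenJPA's own default implementation.

```java
import java.util.List;

class ReplicateEverywhereSketch {
    /** Local copy of the documented method so the sketch compiles on its own. */
    interface ReplicationPolicy {
        String[] replicate(Object pc, List<String> slices, Object context);
    }

    /** Stores each replicated instance in all configured slices. */
    static class AllSlices implements ReplicationPolicy {
        @Override
        public String[] replicate(Object pc, List<String> slices, Object context) {
            return slices.toArray(new String[0]);
        }
    }
}
```

A more selective policy would return only a subset of the supplied slice names, for example the slices that hold data referring to the replicated instance.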

2.6. Heterogeneous Database

Each slice can be configured independently with its own JDBC driver and other connection parameters. Hence the target database environment can consist of heterogeneous databases.

2.7. Distributed Transaction

The database slices participate in a global transaction provided each slice is configured with an XA-compliant JDBC driver, even when the persistence unit is configured for RESOURCE_LOCAL transactions.

Warning

If any of the configured slices is not XA-compliant and the persistence unit is configured for RESOURCE_LOCAL transactions, then each slice is committed without a two-phase commit protocol. If the commit fails on any slice, the atomicity of the transaction is not ensured.

2.8. Collocation Constraint

No relationship can exist across database slices. In O-R mapping parlance, this condition translates to the limitation that the transitive closure of an object graph must be collocated in the same database. For example, consider a domain model where Person relates to Address. Person X refers to Address A while Person Y refers to Address B. The collocation constraint means that both X and A must be stored in the same database slice. Similarly, Y and B must be stored in a single slice.

Slice, however, helps to maintain the collocation constraint automatically. The instances in the closure set of any newly persistent instance, reachable via cascaded relationships, are stored in the same slice. The user-defined distribution policy needs to supply the slice for the root instance only.