The primary design objective of Slice is to make the user application transparent to the change in storage strategy, where data resides in multiple (possibly heterogeneous) databases instead of a single database. Slice achieves this transparency by virtualizing multiple databases as a single database, so that the OpenJPA object management kernel continues to interact with the storage layer in exactly the same manner. Similarly, the existing application and its persistent domain model require no change to upgrade from a single database to a distributed database environment.
An existing application developed for a single database can be
adapted to work with multiple databases purely by configuring
a persistence unit via META-INF/persistence.xml.
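As a rough illustration of configuration-only adoption, the sketch below assembles Slice-related properties in Java. Such a map could be passed to Persistence.createEntityManagerFactory(unitName, props) as an alternative to listing the same properties in META-INF/persistence.xml. The property keys openjpa.BrokerFactory, openjpa.slice.Names and openjpa.slice.<name>.ConnectionURL come from the OpenJPA Slice configuration reference rather than from this section and should be checked against it; the slice names, the unit name "my-unit" and the JDBC URLs are hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SliceConfigSketch {
    /** Builds persistence-unit properties for two hypothetical slices. */
    static Map<String, String> sliceProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Switch OpenJPA to the Slice broker factory.
        props.put("openjpa.BrokerFactory", "slice");
        // Logical slice identifiers (hypothetical names).
        props.put("openjpa.slice.Names", "One,Two");
        // Per-slice connection settings; the URLs are placeholders.
        props.put("openjpa.slice.One.ConnectionURL", "jdbc:derby:target/database/slice1");
        props.put("openjpa.slice.Two.ConnectionURL", "jdbc:derby:target/database/slice2");
        return props;
    }

    public static void main(String[] args) {
        // With JPA on the classpath, the map could be handed to
        // javax.persistence.Persistence.createEntityManagerFactory("my-unit", props).
        sliceProperties().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The application code that obtains an EntityManager and persists or queries entities stays as it was; only the properties differ.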
The primary performance characteristic of Slice is its ability to scale against growing data volume by horizontally partitioning data across many databases.
Slice executes database operations such as query or flush in parallel across each physical database. Hence, scaling against data volume is bound by the size of the largest data partition instead of the size of the entire data set. Use cases where the data is naturally amenable to horizontal partitioning, for example by temporal interval (e.g. Purchase Orders per month) or by geographical region (e.g. Customer by Zip Code), can derive significant performance benefit and favorable scaling behavior by using Slice.
The queries are executed in parallel across one or more slices and the individual query results are merged into a single list before being returned to the caller application. The merge operation is more complex for the queries that involve sorting and/or specify a range. Slice supports both sorting and range queries.
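To make the extra work for sorted range queries concrete, here is a minimal sketch of a k-way merge over per-slice results that are each already sorted, followed by the selection of a range. It is not Slice's actual implementation; the class name, the method name and the parameters first and max are chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class SortedRangeMergeSketch {
    /**
     * Merges per-slice lists, each already sorted by {@code order}, and returns
     * the elements at positions [first, first + max) of the merged ordering.
     */
    static <T> List<T> merge(List<List<T>> perSlice, Comparator<T> order, int first, int max) {
        // Heap entries are {sliceIndex, positionInSlice}, ordered by the element they point at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> order.compare(perSlice.get(a[0]).get(a[1]), perSlice.get(b[0]).get(b[1])));
        for (int s = 0; s < perSlice.size(); s++) {
            if (!perSlice.get(s).isEmpty()) heads.add(new int[] {s, 0});
        }
        List<T> out = new ArrayList<>();
        int seen = 0;
        while (!heads.isEmpty() && out.size() < max) {
            int[] h = heads.poll();
            List<T> slice = perSlice.get(h[0]);
            // Skip the first 'first' merged elements, then start collecting.
            if (seen++ >= first) out.add(slice.get(h[1]));
            // Advance within the slice that supplied the smallest element.
            if (h[1] + 1 < slice.size()) heads.add(new int[] {h[0], h[1] + 1});
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> slices = List.of(List.of(1, 4, 9), List.of(2, 3, 10), List.of(5));
        System.out.println(merge(slices, Comparator.naturalOrder(), 2, 3)); // prints [3, 4, 5]
    }
}
```

In a scheme like this, the range cannot simply be applied on each slice independently: any slice might hold the leading rows of the merged order, so each slice has to contribute up to first + max rows before the range is cut from the merged result.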
Slice also supports aggregate queries where the aggregate operation
is commutative to partitioning such as COUNT() or MAX() but not AVG().
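The following sketch shows why COUNT() and MAX() can be merged from per-slice results while AVG() cannot be merged from per-slice averages alone. The class and method names are illustrative and are not part of Slice.

```java
import java.util.List;

class AggregateMergeSketch {
    /** COUNT over the whole data set is the sum of the per-slice counts. */
    static long mergeCount(List<Long> perSliceCounts) {
        return perSliceCounts.stream().mapToLong(Long::longValue).sum();
    }

    /** MAX over the whole data set is the maximum of the per-slice maxima. */
    static long mergeMax(List<Long> perSliceMax) {
        return perSliceMax.stream().mapToLong(Long::longValue).max().getAsLong();
    }

    /** Naive merge that is incorrect in general: the mean of the per-slice averages. */
    static double naiveMergeAvg(List<Double> perSliceAvg) {
        return perSliceAvg.stream().mapToDouble(Double::doubleValue).average().getAsDouble();
    }

    public static void main(String[] args) {
        // Slice "One" holds the values {10, 20, 30}; slice "Two" holds {100}.
        System.out.println(mergeCount(List.of(3L, 1L)));   // 4
        System.out.println(mergeMax(List.of(30L, 100L)));  // 100
        // The true average is (10 + 20 + 30 + 100) / 4 = 40.0, but averaging
        // the per-slice averages 20.0 and 100.0 yields 60.0.
        System.out.println(naiveMergeAvg(List.of(20.0, 100.0)));
    }
}
```

An application that needs an average across slices could query SUM() and COUNT() separately and divide the merged totals, assuming SUM() is supported in its configuration.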
By default, any query is executed against all available slices.
However, the application can target the query to a subset of
slices by setting a hint on javax.persistence.Query.
The hint key is openjpa.hint.slice.Target and the
hint value is an array of slice identifiers. The following
example shows how to target a query to a pair of slices
with the logical identifiers "One" and "Two".
    EntityManager em = ...;
    em.getTransaction().begin();
    String hint = "openjpa.hint.slice.Target";
    // JPQL requires an identification variable ("p") for the queried entity
    Query query = em.createQuery("SELECT p FROM PObject p")
                    .setHint(hint, new String[]{"One", "Two"});
    List result = query.getResultList();
    // verify that each instance originates from one of the hinted slices
    for (Object pc : result) {
        String sliceOrigin = SlicePersistence.getSlice(pc);
        assertTrue("One".equals(sliceOrigin) || "Two".equals(sliceOrigin));
    }
Confining queries to a subset of slices by setting query hints can be considered
intrusive to an existing application. The alternative means of targeting queries is to
configure a Query Target Policy. This policy is configured
via the plug-in property openjpa.slice.QueryTargetPolicy. The
value of the plug-in property is the fully-qualified class name of an implementation
of the org.apache.openjpa.slice.QueryTargetPolicy interface.
This interface contract allows a user application to target a query to a subset
of slices based on the query and its bound parameters. The query target policy is consulted
only when no explicit target hint is set on the query. By default, the policy
executes a query on all available slices.
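A query target policy might look roughly like the sketch below. So that it compiles without OpenJPA, it declares a local stand-in interface; the method name getTargets and its parameter list are assumptions modeled on the description above and should be checked against the Javadoc of org.apache.openjpa.slice.QueryTargetPolicy. The routing rule (a bound parameter named "region" whose value matches a slice name) is hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Local stand-in for org.apache.openjpa.slice.QueryTargetPolicy; the signature is an assumption. */
interface QueryTargetPolicy {
    String[] getTargets(String query, Map<Object, Object> params, String language,
                        List<String> slices, Object context);
}

/** Hypothetical policy: route by a bound parameter named "region". */
class RegionQueryTargetPolicy implements QueryTargetPolicy {
    @Override
    public String[] getTargets(String query, Map<Object, Object> params, String language,
                               List<String> slices, Object context) {
        Object region = (params == null) ? null : params.get("region");
        if (region != null && slices.contains(region.toString())) {
            // The parameter names a configured slice: query only that slice.
            return new String[] {region.toString()};
        }
        // No usable parameter: fall back to every configured slice.
        return slices.toArray(new String[0]);
    }

    public static void main(String[] args) {
        QueryTargetPolicy policy = new RegionQueryTargetPolicy();
        String[] targets = policy.getTargets("SELECT c FROM Customer c WHERE c.region = :region",
                Map.<Object, Object>of("region", "EU"), "JPQL", List.of("EU", "US"), null);
        System.out.println(String.join(",", targets)); // prints EU
    }
}
```

With the real interface on the classpath, the fully-qualified name of such a class would be the value of openjpa.slice.QueryTargetPolicy.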
A similar policy interface, org.apache.openjpa.slice.FinderTargetPolicy,
is available to target queries that originate from find()
by primary key. This finder target policy is consulted
only when no explicit target hint is set on the current fetch plan. By default, the policy
executes a query on all available slices to find an instance by its primary key.
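A finder target policy can avoid searching every slice when the primary key alone determines the slice. The sketch below uses a local stand-in interface; the getTargets signature is an assumption and should be checked against the Javadoc of org.apache.openjpa.slice.FinderTargetPolicy. It also assumes that the identifier arrives as a java.lang.Number and that instances were originally distributed by the same modulo rule, neither of which is stated in this section.

```java
import java.util.List;

/** Local stand-in for org.apache.openjpa.slice.FinderTargetPolicy; the signature is an assumption. */
interface FinderTargetPolicy {
    String[] getTargets(Class<?> cls, Object oid, List<String> slices, Object context);
}

/** Hypothetical policy: numeric keys map to a slice by key modulo slice count. */
class ModuloFinderTargetPolicy implements FinderTargetPolicy {
    @Override
    public String[] getTargets(Class<?> cls, Object oid, List<String> slices, Object context) {
        if (oid instanceof Number) {
            // floorMod keeps the index non-negative even for negative keys.
            int index = (int) Math.floorMod(((Number) oid).longValue(), (long) slices.size());
            return new String[] {slices.get(index)};
        }
        // Unknown key type: search all slices, as the default policy does.
        return slices.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] targets = new ModuloFinderTargetPolicy()
                .getTargets(Object.class, 5L, List.of("One", "Two"), null);
        System.out.println(targets[0]); // prints Two (5 mod 2 = 1)
    }
}
```

Such a policy is only correct if it mirrors the rule used when the instances were first stored; otherwise find() may look in the wrong slice and return nothing.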
The user application decides how newly persistent instances are
distributed across the slices. The user application specifies the
data distribution policy by implementing
org.apache.openjpa.slice.DistributionPolicy.
The DistributionPolicy interface
is simple, with a single method. The complete listing of the
documented interface follows:
    public interface DistributionPolicy {
        /**
         * Gets the name of the slice where the given newly persistent
         * instance will be stored.
         *
         * @param pc The newly persistent or to-be-merged object.
         * @param slices name of the configured slices.
         * @param context persistence context managing the given instance.
         *
         * @return identifier of the slice. This name must match one of the
         * configured slice names.
         * @see DistributedConfiguration#getSliceNames()
         */
        String distribute(Object pc, List<String> slices, Object context);
    }
The Slice runtime invokes this user-supplied method for each newly
persistent instance that is the explicit argument of the
javax.persistence.EntityManager.persist(Object pc)
method. The user application must return a valid slice name from
this method to designate the target slice for the given instance.
The data distribution policy may be based on the attributes
of the data itself. For example, all Customers whose first name
begins with a character from 'A' to 'M' are stored in one slice,
while those whose names begin with 'N' to 'Z' are stored in another
slice. The noteworthy aspect of such a policy implementation is
that the attribute values that participate in
the distribution policy logic must be set before invoking the
EntityManager.persist() method.
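The name-based rule described above can be sketched as follows. The DistributionPolicy interface is repeated locally so that the example compiles without OpenJPA; the Customer class, its firstName attribute and the choice of the first two configured slices are hypothetical.

```java
import java.util.List;

/** Local copy of the interface listed above, so the sketch compiles without OpenJPA. */
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

/** Hypothetical entity; only the attribute used for distribution is shown. */
class Customer {
    private final String firstName;
    Customer(String firstName) { this.firstName = firstName; }
    String getFirstName() { return firstName; }
}

/** Sends names starting with A-M to the first slice and all other names to the second. */
class NameRangeDistributionPolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        // Assumes at least two slices are configured.
        if (pc instanceof Customer) {
            String name = ((Customer) pc).getFirstName();
            if (name == null || name.isEmpty()) {
                // The attribute must be set before persist() is called.
                throw new IllegalStateException("firstName must be set before persist()");
            }
            char initial = Character.toUpperCase(name.charAt(0));
            return (initial >= 'A' && initial <= 'M') ? slices.get(0) : slices.get(1);
        }
        // Other entity types default to the first slice in this sketch.
        return slices.get(0);
    }

    public static void main(String[] args) {
        DistributionPolicy policy = new NameRangeDistributionPolicy();
        System.out.println(policy.distribute(new Customer("Alice"), List.of("One", "Two"), null)); // One
        System.out.println(policy.distribute(new Customer("Nina"), List.of("One", "Two"), null));  // Two
    }
}
```

The exception for a missing first name makes the ordering requirement explicit: if the attribute is still unset when persist() is called, the policy has nothing to decide on.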
The user application needs to specify the target slice only
for the root instance, i.e. the explicit argument for the
EntityManager.persist(Object pc) method. Slice computes
the transitive closure of the graph, i.e. the set of all instances
directly or indirectly reachable from the root instance, and stores
them in the same target slice.
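The closure computation can be pictured with the sketch below, which walks a reference graph from the root and assigns every reachable node to the root's slice. It illustrates the idea only and is not how Slice is implemented; nodes are represented by hypothetical string labels and the reference graph by a plain map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ClosureCollocationSketch {
    /**
     * Assigns {@code rootSlice} to the root and to every node reachable from it.
     * {@code references} maps each node to the nodes it refers to directly.
     */
    static Map<String, String> collocate(String root, String rootSlice,
                                         Map<String, List<String>> references) {
        Map<String, String> sliceOf = new LinkedHashMap<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            String node = pending.poll();
            if (sliceOf.containsKey(node)) continue; // already visited (handles cycles)
            sliceOf.put(node, rootSlice);
            pending.addAll(references.getOrDefault(node, List.of()));
        }
        return sliceOf;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "personX", List.of("addressA"),
            "addressA", List.of("cityC", "personX"));
        // Every node reachable from personX ends up in slice "One".
        System.out.println(collocate("personX", "One", refs));
    }
}
```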
Slice tracks the original database for existing instances. When an application issues a query, the resultant instances can be loaded from different slices. As Slice tracks the original slice for each instance, any subsequent update to an instance is committed to the appropriate original database slice.
You can find the original slice of an instance pc by
the static utility method SlicePersistence.getSlice(pc).
This method returns the slice identifier associated with the
given managed instance. If the instance is not
being managed, the method returns null because an unmanaged or
detached instance is not associated with any slice.
While Slice ensures that the transitive closure is stored in the
same slice, there can be data elements that are commonly referred to by
many instances, such as Country or Currency code. Such quasi-static
master data can be stored as identical copies in multiple slices.
The user application must enumerate the replicated entity type names in
openjpa.slice.ReplicatedTypes as a comma-separated list
and implement the org.apache.openjpa.slice.ReplicationPolicy
interface. The ReplicationPolicy interface
is quite similar to the DistributionPolicy
interface, except that it returns an array of target slice names instead
of a single slice.
String[] replicate(Object pc, List<String> slices, Object context);
The default implementation assumes that replicated instances are stored in all available slices. If any such replicated instance is modified, the modification is propagated to all target slices to maintain the critical assumption that the state of a replicated instance is identical across all its target slices.
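A replication policy with the behavior described for the default (replicate to every slice) could be sketched as follows. The interface is repeated locally, using the replicate signature shown above, so that the example compiles without OpenJPA; the class name ReplicateToAllPolicy is hypothetical and this is not the library's own default class.

```java
import java.util.List;

/** Local copy of the method shown above, so the sketch compiles without OpenJPA. */
interface ReplicationPolicy {
    String[] replicate(Object pc, List<String> slices, Object context);
}

/** Stores each replicated instance in every configured slice. */
class ReplicateToAllPolicy implements ReplicationPolicy {
    @Override
    public String[] replicate(Object pc, List<String> slices, Object context) {
        // Copy the configured slice names into a new array.
        return slices.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] targets = new ReplicateToAllPolicy()
                .replicate("USD", List.of("One", "Two", "Three"), null);
        System.out.println(String.join(",", targets)); // prints One,Two,Three
    }
}
```

A policy that returns only a subset of the slice names would replicate the master data to just those slices.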
Each slice can be configured independently with its own JDBC driver and other connection parameters. Hence the target database environment can consist of heterogeneous databases.
The database slices participate in a global transaction provided
each slice is configured with an XA-compliant JDBC driver, even
when the persistence unit is configured for RESOURCE_LOCAL
transaction.
If any slice is not XA-compliant and the persistence unit is configured for
RESOURCE_LOCAL transaction, then each slice is committed without any two-phase
commit protocol. If the commit on any slice fails, the atomic nature of
the transaction is not ensured.
No relationship can exist across database slices. In O-R mapping parlance, this condition translates to the limitation that the transitive closure of an object graph must be collocated in the same database. For example, consider a domain model where Person relates to Address. Person X refers to Address A while Person Y refers to Address B. Collocation Constraint means that both X and A must be stored in the same database slice. Similarly Y and B must be stored in a single slice.
Slice, however, helps to maintain the collocation constraint automatically. The instances in the closure set of any newly persistent instance that are reachable via cascaded relationships are stored in the same slice. The user-defined distribution policy needs to supply the slice for the root instance only.