
JCR Repository Synchronization API

Original post: 2013-12-22 06:59:37 9 2 java/ content-management-system/ jackrabbit/ jcr/ sling

I'm looking for an API to synchronize two different JCR repositories.

The synchronization will run frequently (e.g., every hour).
Only specified subtrees must be synchronized.
One repository is the master and the other is the slave.
The slave repository is read-only and must be accessible in emergencies.

Is there any API to do such a synchronization operation?

Any suggestions are appreciated.

2 Answers

I can think of several ways to do this using just the JCR API in any JCR implementation:

Create and register an event listener on the master repository that monitors events on the specific subtrees of interest, and record these events in some persisted form (e.g., in a queue, on the file system, or in a third repository; whatever works best in your environment). Then periodically process those recorded events and "replay" them by manipulating the nodes in the slave repository.
Create and register an event listener on the master repository that monitors events on the specific subtrees of interest, and then immediately connect to the slave repository and "replay" these events.
Periodically connect to the master repository and use the journaling feature (if supported) to obtain what has changed in the master repository since the last time this was done, and then connect to the slave repository and "replay" those events that apply to the specific subtrees of interest.
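
To make the record-then-replay idea from the options above a bit more concrete, here is a minimal sketch in plain Java. It deliberately does not touch a real repository: `Change`, `ChangeType`, and `ChangeRecorder` are hypothetical names invented for this illustration, and the slave is modelled as a simple map from property path to value. In real code, `record` would presumably be fed from a `javax.jcr.observation.EventListener` registered via `ObservationManager.addEventListener` on the master, and `replayInto` would write through a `Session` on the slave instead of a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SubtreeReplaySketch {

    /** Hypothetical, simplified stand-in for the JCR event types. */
    enum ChangeType { PROPERTY_SET, PROPERTY_REMOVED }

    /** Hypothetical stand-in for the data carried by an observed event. */
    static final class Change {
        final ChangeType type;
        final String path;
        final String value; // ignored for PROPERTY_REMOVED

        Change(ChangeType type, String path, String value) {
            this.type = type;
            this.path = path;
            this.value = value;
        }
    }

    /** Queues changes that fall under the configured subtrees and replays them later. */
    static final class ChangeRecorder {
        private final List<String> subtreeRoots;
        // Listeners are typically called asynchronously, so use a thread-safe queue.
        private final Queue<Change> pending = new ConcurrentLinkedQueue<>();

        ChangeRecorder(List<String> subtreeRoots) {
            this.subtreeRoots = new ArrayList<>(subtreeRoots);
        }

        /** True if the path is one of the subtree roots or lies below one of them. */
        boolean isInScope(String path) {
            for (String root : subtreeRoots) {
                String prefix = root.endsWith("/") ? root : root + "/";
                if (path.equals(root) || path.startsWith(prefix)) {
                    return true;
                }
            }
            return false;
        }

        /** Master side: keep only the changes that belong to a synchronized subtree. */
        void record(Change change) {
            if (isInScope(change.path)) {
                pending.add(change);
            }
        }

        /** Slave side: apply queued changes in arrival order; returns how many were applied. */
        int replayInto(Map<String, String> slave) {
            int applied = 0;
            Change change;
            while ((change = pending.poll()) != null) {
                if (change.type == ChangeType.PROPERTY_SET) {
                    slave.put(change.path, change.value);
                } else {
                    slave.remove(change.path);
                }
                applied++;
            }
            return applied;
        }
    }

    public static void main(String[] args) {
        ChangeRecorder recorder = new ChangeRecorder(Arrays.asList("/content/site"));
        recorder.record(new Change(ChangeType.PROPERTY_SET, "/content/site/title", "Home"));
        recorder.record(new Change(ChangeType.PROPERTY_SET, "/tmp/scratch", "ignored"));
        recorder.record(new Change(ChangeType.PROPERTY_SET, "/content/sitemap/x", "ignored"));

        Map<String, String> slave = new TreeMap<>();
        int applied = recorder.replayInto(slave);
        System.out.println(applied + " change(s) replayed: " + slave);
    }
}
```

Note that the scope check compares against the root plus a trailing slash, so a sibling such as `/content/sitemap` is not mistaken for part of `/content/site`. The in-memory queue is only for illustration; for the hourly schedule described in the question it would need to be persisted so that recorded changes survive a restart of the master.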

Another option might be to keep the master and slave repositories completely in sync by clustering them. Jackrabbit and ModeShape can both do this, but they do it in completely different ways, as clustering is not defined in the JCR specification.

For example, with ModeShape (disclosure: I'm the project lead) you can create small clusters of just 2 processes or larger clusters with many processes. You can choose up front whether each process in the cluster has a complete copy of all of the content (i.e., "replicated" and "invalidation" modes) or just some of the content (i.e., "distributed" mode). See the documentation for details. These clusters can also span multiple sites, helping to increase fault tolerance. ModeShape is elastic, so you can add more processes to the cluster at any time, and you can also remove them. The best part is that client applications still just use the JCR API, yet see the whole repository content just as they would with a non-clustered repository.

The (brand new and still unreleased) Apache Sling replication module does that out of the box. It requires running Sling on top of your repositories, but that's fairly lightweight and brings lots of useful functionality for JCR applications.