
Implementation of Infinispan cache in Wildfly 9 AS (considering clustering)

The situation:

I have a clearing table with multiple thousands of records. They are split into packages of e.g. 500 records. Each packet is then sent to the AS via Message Driven Beans. The AS calculates a key from the contents (e.g. currency, validStart, validEnd) of each record and needs to store this key in the database (together with the combination of the contents).

The request:

To avoid duplicates I want a centralized "tool" which calculates the key and stores it, reducing communication with the database by caching those keys together with the records.

Now I tried to use a local Infinispan cache, accessed in a utility-class implementation, for each package-processing thread. This resulted in multiple packages calculating the same key, so duplicates were inserted into the database. Sometimes I also got deadlocks.

I tried to implement a "lock" via a static variable to block access to the cache during a database insert, but without success. The next attempt was to use a replicated respectively distributed Infinispan cache. This did not change the AS's behavior.

My last idea would be to implement a bean-managed singleton session bean to acquire a transaction lock during the insert into the database.

The AS currently runs in standalone mode, but will be moved to a cluster in the near future, so a High Availability solution is preferred.

Resuming:

What's the correct way to lock Infinispan cache access during creation of (Key, Value) pairs to avoid duplicates?

Update:

@cruftex: My request is: I have a set of (Key, Value) pairs which shall be cached. If an insert of a new record should happen, an algorithm is applied to it and the Key is calculated. Then the cache is checked whether the Key already exists, and the Value is appended to the new record. If the Value does not exist, it shall be created and stored in the database.

The cache needs to be realized using Infinispan because the AS shall run in a cluster. The algorithm for creating the Keys exists, and so does the code for inserting the Values into the database (via JDBC or entities). But I have the problem that with Message Driven Beans (and thus multithreading in the AS) the same (Key, Value) pair is calculated in different threads, so each thread tries to insert the Values into the database (which I want to avoid!).

@Dave:

public class Cache {
     private static final Logger log = Logger.getLogger(Cache.class);
     private final org.infinispan.Cache<Key, FullValueViewer> fullCache;   // qualified to avoid a clash with this class's own name
     private HomeCache homes;        // wraps EntityManager
     private final Session session;

     public Cache(Session session, EmbeddedCacheManager cacheContainer, HomeCache homes) {
         this.session = session;
         this.homes = homes;
         fullCache = cacheContainer.getCache(Const.CACHE_CONDCOMBI);
     } 

     public Long getId(FullValueViewer viewerWithoutId) {
         Long result = null;

         final Key key = new Key(viewerWithoutId);
         FullValueViewer view = fullCache.get(key);

         if(view == null) {
             view = checkDatabase(viewerWithoutId);
             if(view != null) {
                 fullCache.put(key, view);
             }
         }

         if(view == null) {
             view = createValue(viewerWithoutId);

             // 1. Try
             fullCache.put(key, view);

             // 2. Try
             //      if(!fullCache.containsKey(key)) {
             //           fullCache.put(key, view);
             //       } else {
             //           try {
             //               homes.condCombi().remove(view.idnr);
             //           } catch (Exception e) {
             //               log.error("remove", e);
             //           }
             //       }

             // 3. Try
             //       synchronized(fullCache) {
             //           view = createValue(viewerWithoutId);
             //           fullCache.put(key, view);
             //       }
         }
         result = view.idnr;
         return result;
     }

     private FullValueViewer checkDatabase(FullValueViewer newView) {
         FullValueViewer result = null;
         try {
             CondCombiBean bean = homes.condCombi().findByTypeAndKeys(_parameters_);
             result = bean.getAsView();
         } catch (FinderException e) {
             // not found in the database; fall through and return null
         }
         return result;
     }

     private FullValueViewer createValue(FullValueViewer newView) {
         FullValueViewer result = null;
         try {
             CondCombiBean bean = homes.condCombi().create(session.subpk);
             bean.setFromView(newView);
             result = bean.getAsView();
         } catch (Exception e) {
             log.error("createValue", e);
         }
         return result;
     }

     private class Key {

         private final FullValueViewer view;

         public Key(FullValueViewer v) {
            this.view = v;
         }

         @Override
         public int hashCode() {
             _omitted_
         }

         @Override
         public boolean equals(Object obj) {
             _omitted_
         }
     }
 }

The cache configurations I tried with Wildfly:

<cache-container name="server" default-cache="default" module="org.wildfly.clustering.server">
   <local-cache name="default">
      <transaction mode="BATCH"/>
   </local-cache>
</cache-container>

<cache-container name="server" default-cache="default" module="org.wildfly.clustering.server">
   <transport lock-timeout="60000"/>
   <distributed-cache name="default" mode="ASYNC"/>
</cache-container>

I'll react only to the resuming question:

You can't lock the whole cache; that wouldn't scale. The best way would be to use the cache.putIfAbsent(key, value) operation, and generate a different key if the entry is already there (or use a list as the value and replace it using the conditional cache.replace(key, oldValue, newValue)).
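Since org.infinispan.Cache implements ConcurrentMap, the get-or-create idiom can be sketched against a plain ConcurrentHashMap; lookupInDatabase and insertIntoDatabase below are hypothetical placeholders for the real JDBC/entity calls:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class GetOrCreateSketch {
    // Stand-in for the Infinispan cache; org.infinispan.Cache extends ConcurrentMap.
    private final ConcurrentMap<String, Long> cache = new ConcurrentHashMap<>();

    // Hypothetical database access; replace with the real lookup/insert.
    private Long lookupInDatabase(String key) { return null; }
    private Long insertIntoDatabase(String key) { return 42L; }

    public Long getId(String key) {
        Long id = cache.get(key);
        if (id != null) {
            return id;                    // fast path: already cached
        }
        id = lookupInDatabase(key);
        if (id == null) {
            id = insertIntoDatabase(key); // may race with another thread/node
        }
        // putIfAbsent decides the winner atomically: a losing thread
        // adopts the value that is already cached instead of its own.
        Long winner = cache.putIfAbsent(key, id);
        return winner != null ? winner : id;
    }
}
```

One gap remains: two threads may both reach insertIntoDatabase before putIfAbsent resolves the race, so a unique constraint in the database (or Infinispan's pessimistic per-key lock) is still needed to close it completely.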

If you want to really prohibit writes to some key, you can use a transactional cache with the pessimistic locking strategy, and issue cache.getAdvancedCache().lock(key). Note that there's no unlock: all locks are released when the transaction is committed/rolled back through the transaction manager.
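In the Wildfly subsystem, the pessimistic-locking variant of the configurations shown in the question would look roughly like this (a sketch; the transaction mode and locking attribute follow the same subsystem schema as the examples above):

```xml
<cache-container name="server" default-cache="default" module="org.wildfly.clustering.server">
   <transport lock-timeout="60000"/>
   <distributed-cache name="default" mode="SYNC">
      <transaction mode="NON_XA" locking="PESSIMISTIC"/>
   </distributed-cache>
</cache-container>
```

Note the SYNC mode: with an ASYNC distributed cache, as in the second configuration from the question, conditional operations and locks cannot give cluster-wide guarantees.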

You cannot generate your own key and use it to detect duplicates at the same time.

Either each data row is guaranteed to arrive only once, or it needs to embody a unique identifier from the external system that generates it.

If there is a unique identifier in the data (which, if all else fails and no id is present, is just all the properties concatenated), then you need to use it to check for duplicates.

Now you can either use that unique identifier directly, or generate your own internal identifier. If you do the latter, you need a translation from the external id to the internal id.

If duplicates arrive, you need to lock based on the external id while you generate the internal id, and then record which internal id you assigned.

To generate a unique sequence of long values in a cluster, you can use the CAS operations of the cache. For example, something like this:

@NotThreadSafe
class KeyGeneratorForOneThread {

  final String KEY = "keySequenceForXyRecords";
  final int INTERVAL = 100;
  Cache<String, Long> cache = ...;
  long nextKey = 0;
  long upperBound = -1;

  void requestNewInterval() {
    do {
      cache.putIfAbsent(KEY, 0L);   // seed the counter on first use
      nextKey = cache.get(KEY);
      upperBound = nextKey + INTERVAL;
      // CAS: only one thread/node succeeds in claiming [nextKey, upperBound)
    } while (!cache.replace(KEY, nextKey, upperBound));
  }

  long generateKey() {
    if (nextKey >= upperBound) {
      requestNewInterval();
    }
    return nextKey++;
  }
}

Every thread has its own key generator and can generate 100 keys without needing coordination.
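A self-contained illustration of the same CAS idiom, using a ConcurrentHashMap in place of the clustered Infinispan cache (both offer the same putIfAbsent/replace contract):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class IntervalKeyGenerator {
    static final String KEY = "keySequence";
    static final int INTERVAL = 100;
    final ConcurrentMap<String, Long> cache;  // stand-in for the clustered cache
    long nextKey = 0;
    long upperBound = -1;

    IntervalKeyGenerator(ConcurrentMap<String, Long> cache) {
        this.cache = cache;
    }

    void requestNewInterval() {
        do {
            cache.putIfAbsent(KEY, 0L);        // seed on first use
            nextKey = cache.get(KEY);
            upperBound = nextKey + INTERVAL;
        } while (!cache.replace(KEY, nextKey, upperBound)); // CAS claims the interval
    }

    long generateKey() {
        if (nextKey >= upperBound) {
            requestNewInterval();
        }
        return nextKey++;
    }
}
```

Two generators sharing the same map claim disjoint intervals: the first hands out 0..99, the second 100..199, with no coordination needed inside an interval.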

You may need separate caches for:

  • locking by the external id
  • lookup from the external to the internal id
  • the sequence number (attention: that is actually not a cache, since it must know the last number after a restart)
  • internal id to data

We found a solution that works in our case and might be helpful for somebody else out there:

We have two main components: a cache class and a singleton bean.
The cache contains a copy of all records currently present in the database and a lot of logic.
The singleton bean has access to the Infinispan cache and is used for creating new records.

Initially the cache fetches a copy of the Infinispan cache from the singleton bean. Then, if we search a record in the cache, we first apply a kind of hash method which calculates a unique key for the record. Using this key we can identify whether the record needs to be added to the database. If so, the cache calls the singleton bean using a create method with a @Lock(WRITE) annotation. The create method first checks whether the value is contained in the Infinispan cache and, if not, creates a new record.

Using this approach we can guarantee that even if the cache is used in multiple threads and each thread sends a request to create the same record in the database, the create process is locked and all subsequent requests are not processed further, because the value was already created by a previous request.
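The container serializes all calls to a @Lock(WRITE) method of a singleton bean; the same check-then-create discipline can be sketched in plain Java with an explicit lock standing in for the container (insertRecord and the key format are hypothetical placeholders):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Plain-Java analogue of the singleton bean: the EJB container would
// serialize calls to a @Lock(WRITE) method; here a ReentrantLock plays that role.
class RecordCreator {
    private final ConcurrentMap<String, Long> cache = new ConcurrentHashMap<>();
    private final ReentrantLock writeLock = new ReentrantLock();
    final AtomicInteger databaseInserts = new AtomicInteger(); // counts real inserts

    // Hypothetical database insert; returns the generated id.
    private Long insertRecord(String key) {
        databaseInserts.incrementAndGet();
        return (long) key.hashCode();
    }

    Long create(String key) {
        writeLock.lock();                 // analogue of @Lock(WRITE)
        try {
            Long id = cache.get(key);     // re-check inside the lock
            if (id == null) {
                id = insertRecord(key);   // only the first caller inserts
                cache.put(key, id);
            }
            return id;
        } finally {
            writeLock.unlock();
        }
    }
}
```

The decisive point is the re-check inside the lock: a thread that lost the race finds the value already cached and returns it instead of inserting a duplicate. Note that in a cluster a per-JVM lock no longer suffices; there the pessimistic per-key lock of a transactional Infinispan cache takes over this role.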
