A multimap with good performance

In my code I have a map that is used heavily: several thousand times within a few seconds. Originally I had a TreeMap, but when testing with 9,000 entries I watched my old processor melt. And this needs to scale. So I moved to a HashMap and performance was excellent.

Now I am changing my design and am looking for a MultiMap. However, I'm afraid of the performance impact on the get() side, since it must iterate over that large map picking out the matching keys, and when called many, many times, even synchronized, it seems like it would be slow.

Is there a good MultiMap that can handle this many entries with great performance? Performance is critical in this application, as there could be many large, separate maps handling a very large workload, which turns "small" performance losses into very big issues.

Bonus points if it can be extracted to work alone, without any dependencies.

The one that was recommended to me in one of my questions was the Apache Commons MultiMap: http://commons.apache.org/collections/api-3.2.1/org/apache/commons/collections/MultiHashMap.html

It's free software, so you can at least get the source and look at it, and, depending on your license situation, you can modify it or use it standalone.

It uses an ArrayList internally for each key's values, but I imagine you can change it to use a HashSet or something similar. I would look at the createCollection(Collection coll) method.

UPDATE: Actually, Guava's HashMultimap appears to already be what I was talking about: https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/Multimap.java

I looked at the source, and it seems that each collection of values is in fact backed by a HashSet.

I had a requirement for a Map<Comparable, Set<Comparable>> where insertion into the Map, and into the corresponding Set, had to be concurrent, but once a key was consumed from the Map it had to be deleted. Think of it as a job that runs every two seconds and consumes the whole Set<Comparable> for a specific key, while insertion stays fully concurrent so that most values are already buffered when the job kicks in. Here is my implementation:

Note: I use Guava's MapMaker helper class to create the concurrent maps. This solution also emulates Listing 5.19 of Java Concurrency in Practice:

import com.google.common.collect.MapMaker;

import java.util.concurrent.ConcurrentMap;

/**
 * Created by IntelliJ IDEA.
 * User: gmedina
 * Date: 18-Sep-2012
 * Time: 09:17:50
 */
public class LockMap<K extends Comparable>
{
  private final ConcurrentMap<K, Object> locks;

  public LockMap()
  {
    this(16, 64);
  }

  public LockMap(final int concurrencyLevel)
  {
    this(concurrencyLevel, 64);
  }

  public LockMap(final int concurrencyLevel, final int initialCapacity)
  {
    locks=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(initialCapacity).weakValues().makeMap();
  }

  public Object getLock(final K key)
  {
    // Atomically register a fresh lock for the key, or reuse the one another thread registered first.
    final Object object=new Object();
    Object lock=locks.putIfAbsent(key, object);
    return lock == null ? object : lock;
  }

}


import com.google.common.collect.MapMaker;
import com.google.common.collect.Sets;

import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;

/**
 * A general purpose Multimap implementation for delayed processing and concurrent insertion/deletes.
 *
 * @param <K> A comparable Key
 * @param <V> A comparable Value
 */
public class ConcurrentMultiMap<K extends Comparable, V extends Comparable>
{
  private final int initialCapacity;
  private final LockMap<K> locks;
  private final ConcurrentMap<K, Set<V>> cache;

  public ConcurrentMultiMap()
  {
    this(16, 64);
  }

  public ConcurrentMultiMap(final int concurrencyLevel)
  {
    this(concurrencyLevel, 64);
  }

  public ConcurrentMultiMap(final int concurrencyLevel, final int initialCapacity)
  {
    this.initialCapacity=initialCapacity;
    cache=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(initialCapacity).makeMap();
    locks=new LockMap<K>(concurrencyLevel, initialCapacity);
  }

  public void put(final K key, final V value)
  {
    synchronized(locks.getLock(key)){
      Set<V> set=cache.get(key);
      if(set == null){
        set=Sets.newHashSetWithExpectedSize(initialCapacity);
        cache.put(key, set);
      }
      set.add(value);
    }
  }

  public void putAll(final K key, final Collection<V> values)
  {
    synchronized(locks.getLock(key)){
      Set<V> set=cache.get(key);
      if(set == null){
        set=Sets.newHashSetWithExpectedSize(initialCapacity);
        cache.put(key, set);
      }
      set.addAll(values);
    }
  }

  public Set<V> remove(final K key)
  {
    synchronized(locks.getLock(key)){
      return cache.remove(key);
    }
  }

  public Set<K> getKeySet()
  {
    return cache.keySet();
  }

  public int size()
  {
    return cache.size();
  }

}
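
For comparison, here is a minimal sketch of the same "concurrent insert, drain a whole key" pattern using only the JDK. It assumes Java 8 or later, where ConcurrentHashMap.compute runs atomically per key, so no separate lock map is needed. The class name DrainableMultimap is hypothetical, and unlike the code above, remove() here returns an empty set rather than null for an absent key.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical name; a JDK-only sketch, assuming Java 8+.
class DrainableMultimap<K, V> {
    private final ConcurrentHashMap<K, Set<V>> cache = new ConcurrentHashMap<>();

    // compute() is atomic per key, so the plain HashSet is only mutated
    // while the map holds that key's internal lock.
    void put(K key, V value) {
        cache.compute(key, (k, set) -> {
            if (set == null) {
                set = new HashSet<>();
            }
            set.add(value);
            return set;
        });
    }

    // Atomically detaches the whole set; later puts for the key start a new set.
    Set<V> remove(K key) {
        Set<V> drained = cache.remove(key);
        return drained == null ? Collections.<V>emptySet() : drained;
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        DrainableMultimap<String, Integer> jobs = new DrainableMultimap<>();
        jobs.put("batch", 1);
        jobs.put("batch", 2);
        System.out.println(jobs.remove("batch").size());
    }
}
```

Once remove() has returned, the drained set belongs to the caller alone, so the consuming job can iterate it without further locking.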

I've been using Google Guava as a replacement for Apache Commons whenever possible. Here's an example using its Multimap implementation HashMultimap; notice that each value of the map is a collection of values instead of a single reference. The contains() method is called on the result of get(key).

private Multimap<Phase, ResultingState> phaseResults = HashMultimap.create();

/**
 * @param withState is the state to be verified.
 * @param onPhase is the phase to be verified.
 * @return Whether the given result was reported in the given phase.
 */
public boolean wasReported(ResultingState withState, Phase onPhase) {
    return phaseResults.containsKey(onPhase) && phaseResults.get(onPhase).contains(withState);
}

/**
 * @param resultingState is the resulting state.
 * @return Whether the given resulting state has ever been reported.
 */
public boolean anyReported(ResultingState resultingState) {
    return phaseResults.values().contains(resultingState);
}

The choice largely depends on what you want to do. There are many data structures, and each is better than the others in some areas and worse in others.

I can recommend some potential candidates. If the workload is entirely reads, ImmutableMultimap might be a good fit.

If you need concurrent reads and writes, then I'd implement my own multimap, perhaps using ConcurrentHashMap and ConcurrentSkipListSet (be careful: the semantics of a synchronized multimap differ from those of a multimap built this way from non-blocking data structures). If you use ConcurrentSkipListSet, lookups take logarithmic time, much like a binary search, which is faster than just iterating.
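
A minimal sketch of that combination, assuming Java 8 or later and values that implement Comparable. The class and method names are hypothetical. The comment in put() points at one place where the semantics differ from a fully synchronized multimap.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical name; a non-blocking sketch, not a drop-in replacement
// for a synchronized multimap.
class SkipListMultimap<K, V extends Comparable<V>> {
    private final ConcurrentMap<K, ConcurrentSkipListSet<V>> map = new ConcurrentHashMap<>();

    boolean put(K key, V value) {
        // Creating the set is atomic, but add() happens afterwards without a lock.
        // A removeAll(key) that runs in between detaches the set, so this value
        // can land in a set that is no longer in the map.
        return map.computeIfAbsent(key, k -> new ConcurrentSkipListSet<V>()).add(value);
    }

    // Hash lookup for the key, then a logarithmic search in the skip list.
    boolean containsEntry(K key, V value) {
        ConcurrentSkipListSet<V> values = map.get(key);
        return values != null && values.contains(value);
    }

    // Read-only, weakly consistent view of the key's values (sorted order).
    Set<V> get(K key) {
        ConcurrentSkipListSet<V> values = map.get(key);
        return values == null ? Collections.<V>emptySet() : Collections.unmodifiableSet(values);
    }

    // Detaches and returns every value for the key, or null if the key is absent.
    Set<V> removeAll(K key) {
        return map.remove(key);
    }

    public static void main(String[] args) {
        SkipListMultimap<String, Integer> m = new SkipListMultimap<>();
        m.put("k", 3);
        m.put("k", 1);
        System.out.println(m.containsEntry("k", 3));
    }
}
```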

If you have a lot of rows, you could also start by just using a ConcurrentHashMap with a synchronized list per key. That could significantly reduce contention, which might be enough to resolve your performance problem, and it's simple.
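
The simpler variant might look like the sketch below, again assuming Java 8 or later. The class name SyncListMultimap and the snapshot() method are hypothetical. Threads contend only when they touch the same key's list, not on one map-wide lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical name; each key gets its own synchronized list.
class SyncListMultimap<K, V> {
    private final ConcurrentMap<K, List<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value) {
        // The list is created atomically per key; add() locks only that list.
        map.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<V>()))
           .add(value);
    }

    // Returns a private copy, so callers can iterate without holding a lock.
    List<V> snapshot(K key) {
        List<V> values = map.get(key);
        if (values == null) {
            return new ArrayList<V>();
        }
        synchronized (values) {
            return new ArrayList<V>(values);
        }
    }

    public static void main(String[] args) {
        SyncListMultimap<String, String> m = new SyncListMultimap<>();
        m.put("row", "x");
        m.put("row", "y");
        System.out.println(m.snapshot("row"));
    }
}
```

Because the values live in a list, checking whether a key holds a particular value is a linear scan; if that check is frequent, a set-based variant is likely the better trade-off.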

When you mention that you "iterate over said large map picking out the matching keys", that makes me wonder whether you're using the best data structure. Is there a way you could avoid that iteration?
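
To illustrate how that iteration can be avoided: in a hash-based multimap, get(key) is a single hash lookup that returns the key's whole collection, with no scan over the other keys. Below is a minimal, dependency-free sketch (not thread-safe) that assumes Java 8 or later; the class name SimpleHashMultimap and its methods are hypothetical, not part of any library.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical name; a single-threaded sketch with no dependencies.
class SimpleHashMultimap<K, V> {
    private final Map<K, Set<V>> map = new HashMap<>();

    // Adds the pair; returns false if it was already present.
    boolean put(K key, V value) {
        return map.computeIfAbsent(key, k -> new HashSet<V>()).add(value);
    }

    // One hash lookup, independent of how many other keys exist.
    Set<V> get(K key) {
        Set<V> values = map.get(key);
        return values == null ? Collections.<V>emptySet() : Collections.unmodifiableSet(values);
    }

    // Two hash lookups: the key, then the value inside that key's set.
    boolean containsEntry(K key, V value) {
        Set<V> values = map.get(key);
        return values != null && values.contains(value);
    }

    // Removes one pair and drops the key once its set is empty.
    boolean remove(K key, V value) {
        Set<V> values = map.get(key);
        if (values == null || !values.remove(value)) {
            return false;
        }
        if (values.isEmpty()) {
            map.remove(key);
        }
        return true;
    }

    int keyCount() {
        return map.size();
    }

    public static void main(String[] args) {
        SimpleHashMultimap<String, Integer> m = new SimpleHashMultimap<>();
        m.put("a", 1);
        m.put("a", 2);
        System.out.println(m.get("a").size());
    }
}
```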

Note that Guava includes multiple multimap implementations with different performance characteristics. As Zwei mentioned, an ImmutableMultimap has better performance than the mutable multimaps. The SetMultimaps are faster if your code checks whether the multimap contains a particular value; otherwise an ArrayListMultimap performs better.
