
Threading issues in a Java HashMap

Something happened that I'm not sure should be possible. Obviously it is, because I've seen it, but I need to find the root cause and I was hoping you could all help.

We have a system that looks up the latitude and longitude for a zip code. Rather than access it every time, we cache the results in a cheap in-memory hash table cache, since the lat and long of a zip code tend to change less often than we release.

Anyway, the hash is wrapped in a class that has a "get" and an "add" method, both of which are synchronized. We access this class as a singleton.

I'm not claiming this is the best setup, but it's where we're at. (I plan to change it to wrap the Map in a Collections.synchronizedMap() call ASAP.)

We use this cache in a multi-threaded environment, where we run two calls on separate threads for two zip codes (so we can calculate the distance between the two). These sometimes happen at very nearly the same time, so it's very possible that both calls access the map at the same time.

Just recently we had an incident where two different zip codes returned the same value. Assuming that the initial values were actually different, is there any way that writing the values into the Map would cause the same value to be written for two different keys? Or is there any way that two "gets" could cross wires and accidentally return the same value?

The only other explanation I have is that the initial data was corrupt (wrong values), but that seems very unlikely.

Any ideas would be appreciated. Thanks, Peter

(PS: Let me know if you need more info, code, etc.)

public class InMemoryGeocodingCache implements GeocodingCache
{
    private Map cache = new HashMap();
    private static GeocodingCache instance = new InMemoryGeocodingCache();

    public static GeocodingCache getInstance()
    {
        return instance;
    }

    public synchronized LatLongPair get(String zip)
    {
        return (LatLongPair) cache.get(zip);
    }

    public synchronized boolean has(String zip)
    {
        return cache.containsKey(zip);
    }

    public synchronized void add(String zip, double lat, double lon)
    {
        cache.put(zip, new LatLongPair(lat, lon));
    }
}


public class LatLongPair {
    double lat;
    double lon;

    LatLongPair(double lat, double lon)
    {
        this.lat = lat;
        this.lon = lon;
    }

    public double getLatitude()
    {
        return this.lat;
    }

    public double getLongitude()
    {
        return this.lon;
    }
}

The code looks correct.

The only concern is that lat and lon are package-visible, so the following is possible for code in the same package:

LatLongPair llp = InMemoryGeocodingCache.getInstance().get(ZIP1);
llp.lat = x;
llp.lon = y;

which will obviously modify the in-cache object.

So make lat and lon final too.
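A minimal sketch of what that could look like, assuming the class keeps the same name and accessors as in the question. The fields become private and final, so the assignment shown above no longer compiles:

```java
// Sketch only: an immutable variant of the LatLongPair class from the question.
final class LatLongPair {
    // private + final: the values are fixed at construction time
    private final double lat;
    private final double lon;

    LatLongPair(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    public double getLatitude() {
        return lat;
    }

    public double getLongitude() {
        return lon;
    }
}
```

With this version, a caller that receives a pair from the cache can read it but cannot change what other callers will see.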

PS Since your key (the zip code) is unique and small, there is no need to compute a hash on every operation. It's easier to use a TreeMap (wrapped in Collections.synchronizedMap()).

PPS Practical approach: write a test with two threads doing put/get operations in a never-ending loop, validating the result on every get. You would need a multi-CPU machine for that, though.
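A rough sketch of such a test follows. It is not the original system: the nested Cache class is a stand-in for the cache in the question, the zip codes and coordinates are made-up sample values, and the loop is bounded by an iteration count rather than running forever so that it terminates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class CacheStressTest {
    // Hypothetical stand-in for the question's cache: a HashMap behind synchronized methods.
    static final class Cache {
        private final Map<String, double[]> map = new HashMap<>();

        synchronized void add(String zip, double lat, double lon) {
            map.put(zip, new double[] {lat, lon});
        }

        synchronized double[] get(String zip) {
            return map.get(zip);
        }
    }

    // Runs two threads against one cache and returns how many reads came back wrong.
    static int run(int iterations) {
        Cache cache = new Cache();
        AtomicInteger mismatches = new AtomicInteger();
        Thread a = worker(cache, "10001", 40.75, -73.99, iterations, mismatches);
        Thread b = worker(cache, "94105", 37.79, -122.39, iterations, mismatches);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return mismatches.get();
    }

    // Each worker repeatedly writes its own zip code and validates what it reads back.
    private static Thread worker(Cache cache, String zip, double lat, double lon,
                                 int iterations, AtomicInteger mismatches) {
        return new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                cache.add(zip, lat, lon);
                double[] got = cache.get(zip);
                if (got == null || got[0] != lat || got[1] != lon) {
                    mismatches.incrementAndGet();
                }
            }
        });
    }
}
```

Calling CacheStressTest.run(1_000_000) and checking for a non-zero result is one way to use it. If the synchronized cache never produces a mismatch, that points the investigation toward the callers or the source data.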

Why it's happening is hard to tell. More code could help.

You should probably just be using a ConcurrentHashMap anyway. In general, this will be more efficient than a synchronized Map. You don't synchronize access to it yourself; it handles that internally (more efficiently than you could).
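As an illustration, here is a small sketch of a cache built on ConcurrentHashMap. The class and method names (ConcurrentGeocodingCache, LatLong, addIfAbsent) are invented for this example and are not from the question:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConcurrentGeocodingCache {
    // Hypothetical immutable value type used only in this sketch.
    static final class LatLong {
        final double lat;
        final double lon;

        LatLong(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    private final ConcurrentMap<String, LatLong> cache = new ConcurrentHashMap<>();

    // No synchronized keyword: the map handles concurrent readers and writers itself.
    LatLong get(String zip) {
        return cache.get(zip);
    }

    // putIfAbsent checks and inserts in one atomic step; returns true if this call inserted.
    boolean addIfAbsent(String zip, double lat, double lon) {
        return cache.putIfAbsent(zip, new LatLong(lat, lon)) == null;
    }
}
```

Note that ConcurrentHashMap rejects null keys and values, so a missing zip code still shows up as a null return from get.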

One thing to look out for is whether the key or the value might be changing. For instance, instead of making a new object for each insertion, you might just be changing the values of an existing object and re-inserting it.
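To make that failure mode concrete, here is a small sketch. MutablePair and the sample coordinates are hypothetical, and the code in the question does not do this (it creates a new LatLongPair on every add), but calling code elsewhere might:

```java
import java.util.HashMap;
import java.util.Map;

final class SharedValueDemo {
    // Hypothetical mutable value type, for illustration only.
    static final class MutablePair {
        double lat;
        double lon;
    }

    // Inserts the SAME object under two keys, mutating it in between.
    static Map<String, MutablePair> buildWithReusedObject() {
        Map<String, MutablePair> cache = new HashMap<>();
        MutablePair scratch = new MutablePair();

        scratch.lat = 40.75;
        scratch.lon = -73.99;
        cache.put("10001", scratch);

        // This also changes what "10001" returns, because both keys share one object.
        scratch.lat = 37.79;
        scratch.lon = -122.39;
        cache.put("94105", scratch);

        return cache;
    }
}
```

After this runs, both zip codes return the second pair of coordinates, which matches the symptom described in the question without any threading involved.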

You also want to make sure that the key object defines both hashCode and equals in such a way that you don't violate the HashMap contract (i.e. if equals returns true, the hashCodes need to be the same, but not necessarily vice versa).
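The question uses String keys, which already satisfy this contract. For a custom key type, a sketch might look like the following; ZipKey is a hypothetical class that is not part of the original code:

```java
import java.util.Objects;

// Hypothetical key type: equals and hashCode are derived from the same field,
// so two equal keys always produce the same hash code.
final class ZipKey {
    private final String zip;

    ZipKey(String zip) {
        this.zip = Objects.requireNonNull(zip, "zip");
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ZipKey)) {
            return false;
        }
        return zip.equals(((ZipKey) other).zip);
    }

    @Override
    public int hashCode() {
        return zip.hashCode();
    }
}
```

Because the field is final, the hash code cannot change while the key sits in a map, which avoids entries becoming unreachable.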

Is it possible the LatLongPair is being modified? I'd suggest making the lat and lon fields final so that they are not accidentally modified elsewhere in the code.

Note: you should also make your singleton "instance" and the map reference "cache" final.

James is correct. Since you are handing back an Object, its internals could be modified, and anything holding a reference to that Object (such as the Map) will reflect that change. Making the fields final is a good answer.

Here is the Javadoc for HashMap:

http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

Map m = Collections.synchronizedMap(new HashMap(...));

Or better, use java.util.concurrent.ConcurrentHashMap.

I don't really see anything wrong with the code you posted that would cause the problem you described. My guess is that the problem lies in the client of your geocode cache.

Other things to consider (some of these are pretty obvious, but I figured I'd point them out anyway):

  1. Which two zip codes were you having problems with? Are you sure they don't have identical geocodes in the source system?
  2. Are you sure you aren't accidentally comparing two identical zip codes?

The presence of the has(String ZIP) method implies that you have something like the following in your code:

GeocodingCache cache = InMemoryGeocodingCache.getInstance();

if (!cache.has(ZIP)) {
    cache.add(ZIP, x, y);
}

Unfortunately, this opens a window between has() returning false and add() inserting, during which another thread can run, and that could result in the issue you described.

A better solution would be to move the check inside the add method, so that the check and the update are covered by the same lock, like this:

public synchronized void add(String zip, double lat, double lon) {
    if (cache.containsKey(zip)) return;
    cache.put(zip, new LatLongPair(lat, lon));
}

The other thing I should mention is that if you are using getInstance() as a singleton, you should have a private constructor to prevent additional caches from being created with new InMemoryGeocodingCache().
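A minimal sketch of the singleton with a private constructor, assuming a simplified stand-in class. SingletonGeocodingCache is a hypothetical name, and the double[] value replaces LatLongPair only to keep the example self-contained:

```java
import java.util.HashMap;
import java.util.Map;

final class SingletonGeocodingCache {
    // final: the single instance can never be swapped out after class initialization
    private static final SingletonGeocodingCache INSTANCE = new SingletonGeocodingCache();

    private final Map<String, double[]> cache = new HashMap<>();

    // Private constructor: code outside this class cannot create a second cache.
    private SingletonGeocodingCache() {
    }

    static SingletonGeocodingCache getInstance() {
        return INSTANCE;
    }

    // Check and insert under one lock, so an existing entry is never overwritten.
    synchronized void add(String zip, double lat, double lon) {
        if (cache.containsKey(zip)) {
            return;
        }
        cache.put(zip, new double[] {lat, lon});
    }

    // Returns a copy so callers cannot modify the cached values.
    synchronized double[] get(String zip) {
        double[] found = cache.get(zip);
        return found == null ? null : found.clone();
    }
}
```

With the constructor private, every caller is forced through getInstance() and therefore shares one map and one lock.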
