String key is not removed from HashMap

I have a database that stores URLs together with their MD5 hashes. I need to check whether the links in a given set of URLs are already present in the database. The following code runs in multiple threads. Each thread handles one specific MD5 key, which is the first three digits of the URL's MD5:

String[] urls = new String[linksMap.size()];
String[] md5s = new String[linksMap.size()];
boolean expectingMittcom = false;
boolean hadMittcom = false;
int i = 0;
for (String url : linksMap.keySet()) {
  urls[i] = url;
  if (url.equals("http://mittcom.com/")) {
    expectingMittcom = true;
  }
  md5s[i++] = linksMap.get(url).variationMd5;
}

int offset = 0;
while (offset < urls.length) {

  Array arMd5 = pqm.getConnection().createArrayOf("text",
                Arrays.copyOfRange(md5s, offset, Math.min(offset + MAX_NUM,
                urls.length)));
  Array arUrl = pqm.getConnection().createArrayOf("text",
                Arrays.copyOfRange(urls,
                offset, Math.min(offset + MAX_NUM, urls.length)));

  PreparedStatement ps = pqm.getConnection().prepareStatement(
                         "select url from links.links_" + key
                       + " where md5=any(?) and url=any(?)");

  ps.setArray(1, arMd5);
  ps.setArray(2, arUrl);
  ResultSet rs = ps.executeQuery();

  while (rs.next()) {
    String url = rs.getString(1);
    boolean printDebug = false;
    if (url.equals("http://mittcom.com/")) {
      hadMittcom = true;
      printDebug = true;
    }
    LinkVariation r = linksMap.remove(url);
    if (printDebug) {
      logger.info("Link variation: " + r);
    }
    if (r != null) {
      Map<String, String[]> linksMapOriginal = 
           linksByMD5MapOriginal.get(r.original[INDEX_MD5].substring(0, 3));
      if (printDebug) {
        logger.info("will try to fliter out ["
                    + r.original[INDEX_URL] + "]");
      }
      String[] remove = linksMapOriginal.remove(r.original[INDEX_URL]);
      if (remove != null) {
        if (printDebug) {
          logger.info("Filtered mittcom");
          filtered.incrementAndGet();
          checkStillHere();
        }
      } else {
        if (printDebug) {
          logger.info("Did not filter mittcom");
        }
      }
    }
  }

  rs.close();

  ps.close();

  offset += MAX_NUM;
}
if (expectingMittcom) {
  if (hadMittcom) {
    logger.info("was expecting mittcom and found");
  } else {
    logger.info("was expecting mittcom but didn't find");
  }
}
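
The offset arithmetic in the loop above can be looked at in isolation. The following is a minimal sketch of the same batching, assuming MAX_NUM is a positive int; the class name BatchSketch and the method batches are hypothetical and not part of the original code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original code.
class BatchSketch {
    // Splits items into consecutive chunks of at most maxNum elements,
    // using the same offset / Math.min arithmetic as the query loop.
    // Assumes maxNum > 0; otherwise the loop would never advance.
    static List<String[]> batches(String[] items, int maxNum) {
        List<String[]> result = new ArrayList<>();
        for (int offset = 0; offset < items.length; offset += maxNum) {
            int end = Math.min(offset + maxNum, items.length);
            result.add(Arrays.copyOfRange(items, offset, end));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] urls = {"a", "b", "c", "d", "e"};
        // Prints [a, b], then [c, d], then [e], one batch per line.
        for (String[] batch : batches(urls, 2)) {
            System.out.println(Arrays.toString(batch));
        }
    }
}
```

Every element lands in exactly one batch, so the batching by itself should not cause a URL to be skipped.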

The problem is that the URL "http://mittcom.com" (and some others; I am only debugging this one in particular) still stays in the linksByMD5MapOriginal map. I can see in the log file that it was removed and filtered, but after the threads finish running it is still there! I do not understand how that can happen. I would suspect a problem with inconsistent hashCode values or the like, but the keys are plain Strings, so there should be no such problem. I am really puzzled.

I check it like this after all the threads finish running:

    for (Map.Entry<String, Map<String, String[]>> entrySet : linksByMD5MapOriginal.entrySet()) {
        String key = entrySet.getKey();
        Map<String, String[]> value = entrySet.getValue();
        if (value.containsKey("http://mittcom.com/")) {
            logger.info("STILL HERE in " + key);
        }
    }

The map is initialized as follows:

    protected Map<String, Map<String, String[]>> linksByMD5MapOriginal = new TreeMap<>();

... ...

    linksByMD5MapOriginal.put(md5Key, linksByKeyMap = Collections.synchronizedMap(new TreeMap<String, String[]>()));

The TreeMap here is only for easier debugging; the map does not have to be ordered. The inner maps are wrapped with Collections.synchronizedMap, so modifying them concurrently should not be a problem. Nothing is added to the maps while the threads are running. Another strange thing is that I cannot use a remote debugger (the program runs on a remote server): the program eventually hangs if I try, so I am forced to debug with log output. But that is not the issue I am asking about here. The problem is that filtered URLs still remain in the map!
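
To make the setup concrete, here is a minimal sketch of that structure: an outer TreeMap keyed by the first three hex digits of the URL's MD5, with inner maps wrapped by Collections.synchronizedMap, and several threads removing entries at once. The class Md5BucketSketch, its methods, the example.com URLs and the way URLs are split across threads are all assumptions made for illustration; the real code partitions the work differently.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the original program.
class Md5BucketSketch {
    // Lower-case 32-character hex MD5 of a string (UTF-8 bytes).
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Outer map: MD5 prefix -> synchronized inner map of url -> {url, md5}.
    static Map<String, Map<String, String[]>> buildBuckets(List<String> urls) {
        Map<String, Map<String, String[]>> buckets = new TreeMap<>();
        for (String url : urls) {
            String md5 = md5Hex(url);
            buckets.computeIfAbsent(md5.substring(0, 3),
                    k -> Collections.synchronizedMap(new TreeMap<String, String[]>()))
                   .put(url, new String[] {url, md5});
        }
        return buckets;
    }

    // Removes every URL using several threads; returns how many entries remain.
    // The outer map is only read while the threads run, never modified.
    static int removeAll(Map<String, Map<String, String[]>> buckets,
                         List<String> urls, int threads) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = id; i < urls.size(); i += threads) {
                    String url = urls.get(i);
                    buckets.get(md5Hex(url).substring(0, 3)).remove(url);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int remaining = 0;
        for (Map<String, String[]> inner : buckets.values()) {
            remaining += inner.size();
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            urls.add("http://example.com/page" + i);
        }
        Map<String, Map<String, String[]>> buckets = buildBuckets(urls);
        System.out.println("remaining: " + removeAll(buckets, urls, 8));
    }
}
```

In this sketch every key that is removed is exactly a key that was inserted, so nothing is left over. If entries do survive in a setup like this, the keys passed to remove are a more likely suspect than the synchronization.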

Sorry if my question seems unclear; I will update the post if there are any follow-up questions. Any help will be greatly appreciated.

UPD: log output:

[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: routines.queue.CheckUnique$LinkVariation@3f89fc46
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] will try to fliter out [http://www.mittcom.com/]
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Filtered mittcom
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] Link variation: null
[2017-10-04 07:25:57,580] [INFO ] [CheckUnique] [Thread-46229] was expecting mittcom and found

... ...

[2017-10-04 07:46:35,337] [INFO ] [CheckUnique] [main] STILL HERE in cd2

I fixed the error; it was in a piece of code unrelated to this. In short, the link variation checking mechanism was broken. Can I delete this question?
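
For readers who see the same symptom: in the log above, the filter step reports removing [http://www.mittcom.com/], while the final check looks for "http://mittcom.com/". The post does not say whether this was the actual cause. The sketch below only illustrates, under the assumption that both spellings were present as keys, that removing one String key leaves a different String key in place. The class KeyMismatchSketch and its method are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; assumes both URL spellings were stored as keys.
class KeyMismatchSketch {
    // Removes removedKey from a small map and reports whether checkedKey is still present.
    static boolean stillPresentAfterRemoving(String removedKey, String checkedKey) {
        Map<String, String[]> links = new TreeMap<>();
        links.put("http://mittcom.com/", new String[] {"http://mittcom.com/"});
        links.put("http://www.mittcom.com/", new String[] {"http://www.mittcom.com/"});

        // remove() returns the old value, so a non-null result only proves
        // that removedKey was present, not that checkedKey is gone.
        String[] removed = links.remove(removedKey);
        System.out.println("removed " + removedKey + ": " + (removed != null));
        return links.containsKey(checkedKey);
    }

    public static void main(String[] args) {
        boolean still = stillPresentAfterRemoving(
                "http://www.mittcom.com/", "http://mittcom.com/");
        System.out.println("STILL HERE: " + still);
    }
}
```

Logging the exact key passed to remove next to the key used in the final check makes this kind of mismatch visible.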
