Comparing two LinkedHashMaps with values as a list

I've asked this question in different ways a couple of times already. Each time I make a breakthrough, I run into another issue. This is partly because I am not yet proficient in Java and have difficulty with collections like Maps, so please bear with me.

I have two maps like this:

Map1 - {ORGANIZATION=[Fulton Tax Commissioner 's Office, Grady Hospital, Fulton Health Department], LOCATION=[Bellwood, Alpharetta]}

Map2 - {ORGANIZATION=[Atlanta Police Department, Fulton Tax Commissioner, Fulton Health Department], LOCATION=[Alpharetta], PERSON=[Bellwood, Grady Hospital]}

The maps are defined as: LinkedHashMap<String, List<String>> sampleMap = new LinkedHashMap<String, List<String>>();

I am comparing these two maps based on their values, and there are only 3 keys, i.e. ORGANIZATION, PERSON and LOCATION. Map1 is my gold set that I am comparing Map2 against. The problem I am facing is this: when I iterate over the values of the ORGANIZATION key in Map1 and check for matches in Map2, my first entry does have a partial match in Map2 (Fulton Tax Commissioner), but because the first entry of Map2 (Atlanta Police Department) is not a match, I get an incorrect result (I am looking for both exact and partial matches). The result here is incrementing the true positive, false positive and false negative counters, which ultimately let me calculate precision and recall for this task, i.e. Named Entity Recognition.
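For reference, the counters relate to precision and recall as precision = TP / (TP + FP) and recall = TP / (TP + FN). The following is only a minimal sketch of that arithmetic; the class name PrecisionRecall and the choice to return 0.0 when a denominator is zero are assumptions made for illustration, not part of the original code.

```java
// Minimal sketch: precision and recall from raw counts.
// The class name and the zero-denominator convention are illustrative assumptions.
final class PrecisionRecall {

    // precision = TP / (TP + FP); returns 0.0 when nothing was predicted
    static double precision(int truePositives, int falsePositives) {
        int predicted = truePositives + falsePositives;
        return predicted == 0 ? 0.0 : (double) truePositives / predicted;
    }

    // recall = TP / (TP + FN); returns 0.0 when the gold set is empty
    static double recall(int truePositives, int falseNegatives) {
        int expected = truePositives + falseNegatives;
        return expected == 0 ? 0.0 : (double) truePositives / expected;
    }

    public static void main(String[] args) {
        // ORGANIZATION counts from the expected result: TP = 2, FN = 1, FP = 1
        System.out.println("precision = " + precision(2, 1));
        System.out.println("recall    = " + recall(2, 1));
    }
}
```

With the expected ORGANIZATION counts (TP = 2, FP = 1, FN = 1), both values come out as 2/3.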

EDIT

The result I am expecting for this is:

Organization: 
True Positive Count = 2
False Negative Count = 1
False Positive Count = 1

Person:
False Positive Count = 2

Location:
True Positive Count = 1
False Negative Count = 1

The output I am currently getting is:

Organization:
True Positive Count = 1
False Negative Count = 2
False Positive Count = 0

Person:
True Positive Count = 0
False Negative Count = 0
False Positive Count = 2

Location:
True Positive Count = 0
False Negative Count = 1
False Positive Count = 0

CODE

private static List<Integer> compareMaps(LinkedHashMap<String, List<String>> annotationMap, LinkedHashMap<String, List<String>> rageMap) 
    {
        List<Integer> compareResults = new ArrayList<Integer>();  

         if (!annotationMap.entrySet().containsAll(rageMap.entrySet())){
               for (Entry<String, List<String>> rageEntry : rageMap.entrySet()){
                   if (rageEntry.getKey().equals("ORGANIZATION") && !(annotationMap.containsKey(rageEntry.getKey()))){
                       for (int j = 0; j< rageEntry.getValue().size(); j++) {
                           orgFalsePositiveCount++;
                       }
               }
                   if (rageEntry.getKey().equals("PERSON") && !(annotationMap.containsKey(rageEntry.getKey()))){
                      // System.out.println(rageEntry.getKey());
                      // System.out.println(annotationMap.entrySet());
                       for (int j = 0; j< rageEntry.getValue().size(); j++) {
                           perFalsePositiveCount++;
                       }
               }
                   if (rageEntry.getKey().equals("LOCATION") && !(annotationMap.containsKey(rageEntry.getKey()))){
                       for (int j = 0; j< rageEntry.getValue().size(); j++) {
                           locFalsePositiveCount++;
                     }
                 }
              }
           }



               for (Entry<String, List<String>> entry : annotationMap.entrySet()){

                   int i_index = 0;
                   if (rageMap.entrySet().isEmpty()){
                       orgFalseNegativeCount++;
                       continue;
                   }

                  // for (Entry<String, List<String>> rageEntry : rageMap.entrySet()){

                   if (entry.getKey().equals("ORGANIZATION")){
                       for(String val : entry.getValue()) {
                           if (rageMap.get(entry.getKey()) == null){
                               orgFalseNegativeCount++;
                               continue;
                       }
            recusion:      for (int i = i_index; i< rageMap.get(entry.getKey()).size();){
                                String rageVal = rageMap.get(entry.getKey()).get(i);
                               if(val.equals(rageVal)){
                                   orgTruePositiveCount++;
                                   i_index++;
                                   break recusion;
                       }

                           else if((val.length() > rageVal.length()) && val.contains(rageVal)){  //|| dataB.get(entryA.getKey()).contains(entryA.getValue())){
                               orgTruePositiveCount++;
                               i_index++;
                               break recusion;
                       }
                           else if((val.length() < rageVal.length()) && rageVal.contains(val)){
                               orgTruePositiveCount++;
                                i_index++;
                                break recusion;
                           }

                           else if(!val.contains(rageVal)){
                               orgFalseNegativeCount++;
                               i_index++;
                               break recusion;
                           }
                           else if(!rageVal.contains(val)){
                                 orgFalsePositiveCount++;
                                 i_index++;
                                 break recusion;
                             }


                      }
                    }
                   }

                  ......................... //(Same for person and location)


                    compareResults.add(orgTruePositiveCount); 
                    compareResults.add(orgFalseNegativeCount); 
                    compareResults.add(orgFalsePositiveCount);  
                    compareResults.add(perTruePositiveCount); 
                    compareResults.add(perFalseNegativeCount);  
                    compareResults.add(perFalsePositiveCount); 
                    compareResults.add(locTruePositiveCount); 
                    compareResults.add(locFalseNegativeCount);  
                    compareResults.add(locFalsePositiveCount); 

                    System.out.println(compareResults);
                    return compareResults;

            }  

If I understood this correctly, the following may help.

I created a custom string wrapper class that overrides equals to allow partial matches:

import java.util.Objects;

public class MyCustomString {

    private String myString;

    public MyCustomString(String myString) {
        this.myString = myString;
    }

    public String getMyString() {
        return myString;
    }

    public void setMyString(String myString) {
        this.myString = myString;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final MyCustomString other = (MyCustomString) obj;
        if (!Objects.equals(this.myString, other.myString) && !other.myString.contains(this.myString)) {
            return false;
        }
        return true;
    }

    // add getter and setter for myString 
    // or delegate needed methods to myString object.
    @Override
    public int hashCode() {
        int hash = 3;
        hash = 47 * hash + Objects.hashCode(this.myString);
        return hash;
    }
}

And here is the code I tried with the first part of your map:

LinkedHashMap<String, List<MyCustomString>> sampleMap1 = new LinkedHashMap<String, List<MyCustomString>>();
        sampleMap1.put("a", new ArrayList<>());
        sampleMap1.get("a").add(new MyCustomString("Fulton Tax Commissioner 's Office"));
        sampleMap1.get("a").add(new MyCustomString("Grady Hospital"));
        sampleMap1.get("a").add(new MyCustomString("Fulton Health Department"));

        LinkedHashMap<String, List<MyCustomString>> sampleMap2 = new LinkedHashMap<String, List<MyCustomString>>();
        sampleMap2.put("a", new ArrayList<>());
        sampleMap2.get("a").add(new MyCustomString("Atlanta Police Department"));
        sampleMap2.get("a").add(new MyCustomString("Fulton Tax Commissioner"));
        sampleMap2.get("a").add(new MyCustomString("Fulton Health Department"));

        HashMap<String, Integer> resultMap = new HashMap<String, Integer>();

        for (Map.Entry<String, List<MyCustomString>> entry : sampleMap1.entrySet()) {
            String key1 = entry.getKey();
            List<MyCustomString> value1 = entry.getValue();
            List<MyCustomString> singleListOfMap2 = sampleMap2.get(key1);
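            if (singleListOfMap2 == null) {
                // key missing from map2: all entries are false negatives
                System.out.println("Number of false N: " + value1.size());
                continue; // skip the loop below, which would otherwise throw a NullPointerException
            }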
            for (MyCustomString singleStringOfMap2 : singleListOfMap2) {
                if (value1.contains(singleStringOfMap2)) {
                    //True positive
                    System.out.println("true");
                } else {
                    // value from map2 with no match in map1 (the gold set): false positive
                    System.out.println("false P");
                }
            }
            int size = singleListOfMap2.size();
            System.out.println(size + " - number of true");
            //false positive = size - true
        }
        for (String string : sampleMap2.keySet()) {
            if (sampleMap1.get(string) == null) {
                //all these are false positive
                System.out.println("number of false P: " + sampleMap2.get(string).size());
            }
        }
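One caveat about the custom equals above: it is not symmetric (a.equals(b) can be true while b.equals(a) is false), and two objects it considers equal can have different hash codes, which goes against the general contract of Object.equals and Object.hashCode. A possible alternative is to keep plain Strings and do the partial matching in a separate helper. The sketch below is only an illustration; the class name PartialMatch and its method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper: symmetric partial matching on plain Strings,
// so that equals/hashCode of the stored values stay untouched.
final class PartialMatch {

    // True when one string contains the other (this also covers exact equality).
    static boolean matches(String a, String b) {
        if (a == null || b == null || a.isEmpty() || b.isEmpty()) {
            return false; // treat missing or empty values as non-matching
        }
        return a.contains(b) || b.contains(a);
    }

    // True when at least one candidate matches the given value.
    static boolean anyMatch(List<String> candidates, String value) {
        for (String candidate : candidates) {
            if (matches(candidate, value)) {
                return true;
            }
        }
        return false;
    }
}
```

Because matches checks both directions, the argument order does not matter, unlike List.contains combined with the overridden equals.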

I wrote this class to compare the maps:

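import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MapComparison<K, V> {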
    private final Map<K, Collection<ValueCounter>> temp;
    private final Map<K, Collection<V>> goldMap;
    private final Map<K, Collection<V>> comparedMap;
    private final BiPredicate<V, V> valueMatcher;

    public MapComparison(Map<K, Collection<V>> mapA, Map<K, Collection<V>> mapB, BiPredicate<V, V> valueMatcher) {
        this.goldMap = mapA;
        this.comparedMap = mapB;
        this.valueMatcher = valueMatcher;

        this.temp = new HashMap<>();

        goldMap.forEach((key, valueList) -> {
            temp.put(key, valueList.stream().map(value -> new ValueCounter(value, true)).collect(Collectors.toList()));
        });

        comparedMap.entrySet().stream().forEach(entry -> {

            K key = entry.getKey();
            Collection<V> valueList = entry.getValue();

            if(temp.containsKey(key)) {
                Collection<ValueCounter> existingMatches = temp.get(key);

                Stream<V> falsePositives = valueList.stream().filter(v -> existingMatches.stream().noneMatch(mv -> mv.match(v)));

                falsePositives.forEach(fp -> existingMatches.add(new ValueCounter(fp, false)));
            } else {
                temp.putIfAbsent(key, valueList.stream().map(value -> new ValueCounter(value, false)).collect(Collectors.toList()));
            }
        });
    }

    public String formatMatchedCounters() {
        StringBuilder sb = new StringBuilder();

        for(Entry<K, Collection<ValueCounter>> e : temp.entrySet()) {
            sb.append(e.getKey()).append(":");

            int[] counters = e.getValue().stream().collect(() -> new int[3], (a, b) -> {
                a[0] += b.truePositiveCount;
                a[1] += b.falsePositiveCount;
                a[2] += b.falseNegativeCount;
            }, (c, d) -> {
                c[0] += d[0];
                c[1] += d[1];
                c[2] += d[2];
            });
            sb.append(String.format("\ntruePositiveCount=%s\nfalsePositiveCount=%s\nfalseNegativeCount=%s\n\n", counters[0], counters[1], counters[2]));
        }
        return sb.toString();
    }


    private class ValueCounter {
        private final V goldValue;

        private int truePositiveCount = 0;
        private int falsePositiveCount = 0;
        private int falseNegativeCount = 0;

        ValueCounter(V value, boolean isInGoldMap) {
            this.goldValue = value;

            if(isInGoldMap) {
                falseNegativeCount = 1;
            } else {
                falsePositiveCount = 1;
            }
        }

        boolean match(V otherValue) {
            boolean result = valueMatcher.test(goldValue, otherValue);

            if(result) {
                truePositiveCount++;

                falseNegativeCount = 0;
            }
            return result;
        }
    }
}

What it does is basically create a union of the map items, where each item has its own mutable counter to track matching values. The method formatMatchedCounters() just iterates over these counters and sums them for each of the keys.

The following test:

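import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

import org.junit.Before;
import org.junit.Test;

public class MapComparisonTest {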

    private Map<String, Collection<String>> goldMap;
    private Map<String, Collection<String>> comparedMap;
    private BiPredicate<String, String> valueMatcher;

    @Before
    public void initMaps() {
        goldMap = new HashMap<>();
        goldMap.put("ORGANIZATION", Arrays.asList("Fulton Tax Commissioner", "Grady Hospital", "Fulton Health Department"));
        goldMap.put("LOCATION", Arrays.asList("Bellwood", "Alpharetta"));

        comparedMap = new HashMap<>();
        comparedMap.put("ORGANIZATION", Arrays.asList("Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department"));
        comparedMap.put("LOCATION", Arrays.asList("Alpharetta"));
        comparedMap.put("PERSON", Arrays.asList("Bellwood", "Grady Hospital"));

        valueMatcher = String::equalsIgnoreCase;
    }

    @Test
    public void test() {
        MapComparison<String, String> comparison = new MapComparison<>(goldMap, comparedMap, valueMatcher);

        System.out.println(comparison.formatMatchedCounters());
    }
}

has the result of:

ORGANIZATION:
truePositiveCount=2
falsePositiveCount=1
falseNegativeCount=1

LOCATION:
truePositiveCount=1
falsePositiveCount=0
falseNegativeCount=1

PERSON:
truePositiveCount=0
falsePositiveCount=2
falseNegativeCount=0

Note that I don't know how you want to compare similar values (e.g. "Fulton Tax Commissioner" vs "Fulton Tax Commissioner s"), so I decided to put that decision in the signature (in this case, a BiPredicate parameter).

For example, the String comparison could be implemented using the Levenshtein distance:

valueMatcher = (s1, s2) -> StringUtils.getLevenshteinDistance(s1, s2) < 5;
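The StringUtils call above presumably comes from Apache Commons Lang. If adding that dependency is not an option, the distance can also be computed in plain Java. The following is a minimal sketch of the classic dynamic-programming algorithm; the class name EditDistance and the factory method within are hypothetical names chosen for this example.

```java
import java.util.function.BiPredicate;

// Hypothetical plain-Java replacement for a library Levenshtein call.
final class EditDistance {

    // Levenshtein distance via dynamic programming, keeping only two rows.
    static int levenshtein(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // distance from the empty prefix of s is j insertions
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // distance to the empty prefix of t is i deletions
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // Matcher that accepts two strings whose distance is at most maxDistance.
    static BiPredicate<String, String> within(int maxDistance) {
        return (a, b) -> levenshtein(a, b) <= maxDistance;
    }
}
```

Under these assumptions, valueMatcher = EditDistance.within(4); would behave like the "< 5" threshold shown above.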

I came up with a simplified version. This is the output that I get:

Organization:
    False Positive: Atlanta Police Department
    True Positive: Fulton Tax Commissioner
    True Positive: Fulton Health Department
    False Negative: Grady Hospital

Person:
    False Positive: Bellwood
    False Positive: Grady Hospital

Location:
    True Positive: Alpharetta
    False Negative: Bellwood

[2, 1, 1, 0, 0, 2, 1, 1, 0]

Here is the code that I created:


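import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;

public class MapCompare {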

    public static boolean listContains(List<String> annotationList, String value) {
        if(annotationList.contains(value)) {
            // 100% Match
            return true;
        }
        for(String s: annotationList) {
            if (s.contains(value) || value.contains(s)) {
                // Partial Match
                return true;
            }
        }
        return false;
    }

    public static List<Integer> compareLists(List<String> annotationList, List<String> rageList){
        List<Integer> compareResults = new ArrayList<Integer>();
        if(annotationList == null || rageList == null) return Arrays.asList(0, 0, 0);
        Integer truePositiveCount = 0;
        Integer falseNegativeCount = 0;
        Integer falsePositiveCount = 0;

        for(String r: rageList) {
            if(listContains(annotationList, r)) {
                System.out.println("\tTrue Positive: " + r);
                truePositiveCount ++;
            } else {
                System.out.println("\tFalse Positive: " + r);
                falsePositiveCount ++;
            }
        }

        for(String s: annotationList) {
            if(listContains(rageList, s) == false){
                System.out.println("\tFalse Negative: " + s);
                falseNegativeCount ++;
            }
        }

        compareResults.add(truePositiveCount);
        compareResults.add(falseNegativeCount);
        compareResults.add(falsePositiveCount);

        System.out.println();

        return compareResults;
    }

    private static List<Integer> compareMaps(LinkedHashMap<String, List<String>> annotationMap, LinkedHashMap<String, List<String>> rageMap) {
        List<Integer> compareResults = new ArrayList<Integer>();
        System.out.println("Organization:");
        compareResults.addAll(compareLists(annotationMap.get("ORGANIZATION"), rageMap.get("ORGANIZATION")));
        System.out.println("Person:");
        compareResults.addAll(compareLists(annotationMap.get("PERSON"), rageMap.get("PERSON")));
        System.out.println("Location:");
        compareResults.addAll(compareLists(annotationMap.get("LOCATION"), rageMap.get("LOCATION")));
        System.out.println(compareResults);
        return compareResults;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, List<String>> Map1 = new LinkedHashMap<>();
        List<String> m1l1 = Arrays.asList("Fulton Tax Commissioner's Office", "Grady Hospital", "Fulton Health Department");
        List<String> m1l2 = Arrays.asList("Bellwood", "Alpharetta");
        List<String> m1l3 = Arrays.asList();
        Map1.put("ORGANIZATION", m1l1);
        Map1.put("LOCATION", m1l2);
        Map1.put("PERSON", m1l3);

        LinkedHashMap<String, List<String>> Map2 = new LinkedHashMap<>();
        List<String> m2l1 = Arrays.asList("Atlanta Police Department", "Fulton Tax Commissioner", "Fulton Health Department");
        List<String> m2l2 = Arrays.asList("Alpharetta");
        List<String> m2l3 = Arrays.asList("Bellwood", "Grady Hospital");

        Map2.put("ORGANIZATION", m2l1);
        Map2.put("LOCATION", m2l2);
        Map2.put("PERSON", m2l3);

        compareMaps(Map1, Map2);

    }

}

Hope this helps!
