Comparing keys in HashMap and Values
I have a HashMap as follows:
HashMap<String, Integer> BC = new HashMap<String, Integer>();
which stores "token/tag" strings as keys and the frequency of each token/tag pair as values.
Example:
"the/at" 153
"that/cs" 45
"Ann/np" 3
I now go through each key and check whether the same token, say "the", is associated with more than one tag. If it is, I keep the entry with the largest frequency.
Example:
"the/at" 153
"the/det" 80
Then I take the key "the/at" with value 153.
The code that I have written to do so is as follows:
private HashMap<String, Integer> Unigram_Tagger = new HashMap<String, Integer>();
for(String curr_key: BC.keySet())
{
    for(String next_key: BC.keySet())
    {
        if(curr_key.equals(next_key))
            continue;
        else
        {
            String[] split_key_curr_key = curr_key.split("/");
            String[] split_key_next_key = next_key.split("/");
            //out.println("CK- " + curr_key + ", NK- " + next_key);
            if(split_key_curr_key[0].equals(split_key_next_key[0]))
            {
                int ck_v = 0, nk_v = 0;
                ck_v = BC.get(curr_key);
                nk_v = BC.get(next_key);
                if(ck_v > nk_v)
                    Unigram_Tagger.put(curr_key, BC.get(curr_key));
                else
                    Unigram_Tagger.put(next_key, BC.get(next_key));
            }
        }
    }
}
But this code takes too long to run: the original HashMap 'BC' has 68442 entries, so the nested loops perform roughly 68442 squared = 4684307364 comparisons.
My question is this: can I produce the same output using a more efficient method?
Thanks!
Create a new
Map<String,Integer> highCount = new HashMap<>();
that will map tokens to their largest count.
Make a single pass through the keys. Split each key into its component tokens. For each token, look in highCount. If the key does not exist, add it with its count. If the entry already exists and the current count is greater than the previous maximum, replace the maximum in the map.
When you are done with the single pass, highCount will contain all the unique tokens along with the highest count seen for each token.
Note: This answer is intended to give you a starting point from which to develop a complete solution. The key concept is that you create and populate a new map from token to some "value" type (not necessarily just Integer) that provides you with the functionality you need. Most likely the value type will be a new custom class that stores the tag and the count.
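The single pass described above could look roughly like the following. This is only a sketch: the class HighCountSketch, the value type TagCount, and the method highestPerToken are hypothetical names, the token is assumed to be everything before the first "/" (matching split("/")[0] in the question), keys without a "/" are skipped, and on a tie the entry seen first is kept.

```java
import java.util.HashMap;
import java.util.Map;

class HighCountSketch {

    /** Hypothetical value type: the winning tag for a token and its count. */
    static final class TagCount {
        final String tag;
        final int count;

        TagCount(String tag, int count) {
            this.tag = tag;
            this.count = count;
        }
    }

    /** One pass over the map: keep, per token, the tag with the highest count. */
    static Map<String, TagCount> highestPerToken(Map<String, Integer> bc) {
        Map<String, TagCount> highCount = new HashMap<>();
        for (Map.Entry<String, Integer> e : bc.entrySet()) {
            String key = e.getKey();
            int slash = key.indexOf('/');
            if (slash < 0) {
                continue; // assumption: ignore keys that have no tag part
            }
            String token = key.substring(0, slash);
            String tag = key.substring(slash + 1);
            TagCount best = highCount.get(token);
            // Insert when the token is new, replace only on a strictly larger count
            if (best == null || e.getValue() > best.count) {
                highCount.put(token, new TagCount(tag, e.getValue()));
            }
        }
        return highCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> bc = new HashMap<>();
        bc.put("the/at", 153);
        bc.put("the/det", 80);
        bc.put("that/cs", 45);
        bc.put("Ann/np", 3);
        highestPerToken(bc).forEach((token, tc) ->
                System.out.println(token + "/" + tc.tag + " " + tc.count));
    }
}
```

Each entry is visited once and every map operation is expected constant time, so the work grows linearly with the number of entries instead of quadratically.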
The slowest part of your current method is due to the pairwise comparison of keys. First, define a Tuple class:
public class Tuple<X, Y> {
    public final X x;
    public final Y y;
    public Tuple(X x, Y y) {
        this.x = x;
        this.y = y;
    }
}
Thus you can try an algorithm that does:
1. Initialize a HashMap<String, Tuple<String, Integer>> result.
2. Given some (key, value) pair from the old map, where key = "a/b", check whether result.keySet().contains(a) and result.keySet().contains(b).
3. If neither a nor b is present, result.put(a, new Tuple<String, Integer>(b, value)) and result.put(b, new Tuple<String, Integer>(a, value)).
4. If a is present, compare value with the count stored in v = result.get(a) (that is, v.y). If value is greater, remove a and b from result and do step 3. Do the same for b. Otherwise, get the next key-value pair.
After you have iterated through the old hash map and inserted everything, you can easily reconstruct the output you want by transforming the key-values in result.
A basic thought on the algorithm:
You should get the entrySet() of the HashMap and convert it to a List:
ArrayList<Map.Entry<String, Integer>> list = new ArrayList<>(map.entrySet());
Now you should sort the list by the keys in alphabetical order. We do that because the HashMap has no order, so corresponding keys might be far apart. Once they are sorted, all keys that share a prefix are directly next to each other.
Collections.sort(list, Comparator.comparing(e -> e.getKey()));
The entries "the/at" and "the/det" will be next to each other, thanks to sorting alphabetically.
Now you can iterate over the entire list while remembering the best item, until you find a better one or you find the first item that does not have the same prefix (e.g. "the").
ArrayList<Map.Entry<String, Integer>> bestList = new ArrayList<>();
// The first entry of the list is considered the currently best item for its group
Map.Entry<String, Integer> currentBest = list.get(0);
String key = currentBest.getKey();
String currentPrefix = key.substring(0, key.indexOf('/'));
for (int i = 1; i < list.size(); i++) {
    // The item we compare the current best with
    Map.Entry<String, Integer> next = list.get(i);
    String nkey = next.getKey();
    String nextPrefix = nkey.substring(0, nkey.indexOf('/'));
    if (currentPrefix.equals(nextPrefix)) {
        // If both items have the same prefix, then we want to keep the best one
        // as the current best item
        if (currentBest.getValue() < next.getValue()) {
            currentBest = next;
        }
    } else {
        // If the prefix is different we add the current best to the best list and
        // consider the current item the best one for the next group
        bestList.add(currentBest);
        currentBest = next;
        currentPrefix = nextPrefix;
    }
}
// The last one must be added here, or we would forget it
bestList.add(currentBest);
Now you should have a list of Map.Entry objects representing the desired entries. The overall complexity should be O(n log n), dominated by the sort, while grouping/collecting the items is O(n).
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;
import java.util.stream.Collectors;
public class Point {
    public static void main(String[] args) {
        HashMap<String, Integer> BC = new HashMap<>();
        //some random values
        BC.put("the/at",5);
        BC.put("Ann/npe",6);
        BC.put("the/atx",7);
        BC.put("that/cs",8);
        BC.put("the/aty",9);
        BC.put("Ann/np",1);
        BC.put("Ann/npq",2);
        BC.put("the/atz",3);
        BC.put("Ann/npz",4);
        BC.put("the/atq",0);
        BC.put("the/atw",12);
        BC.put("that/cs",14);
        BC.put("that/cs1",16);
        BC.put("the/at1",18);
        BC.put("the/at2",100);
        BC.put("the/at3",123);
        BC.put("that/det",153);
        BC.put("xyx",123);
        BC.put("xyx/w",2);
        System.out.println("\nUnsorted Map......");
        printMap(BC);
        System.out.println("\nSorted Map......By Key");
        //sort original map using TreeMap, it will sort the Map by keys automatically.
        Map<String, Integer> sortedBC = new TreeMap<>(BC);
        printMap(sortedBC);
        // find all distinct prefixes by splitting the keys at "/"
        List<String> uniquePrefixes = sortedBC.keySet().stream().map(i->i.split("/")[0]).distinct().collect(Collectors.toList());
        System.out.println("\nuniquePrefixes: "+uniquePrefixes);
        TreeMap<String,Integer> mapOfMaxValues = new TreeMap<>();
        // for each prefix from the list above filter the entries from the sorted map
        // whose key has exactly this prefix before the first "/"
        // (startsWith would also match longer tokens, e.g. "then/..." for prefix "the"),
        // sort them by value in descending order and get the first, which has the highest value
        uniquePrefixes.stream().forEach(i->{
            Entry <String,Integer> e =
                sortedBC.entrySet().stream().filter(j->j.getKey().split("/")[0].equals(i))
                    .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder())).findFirst().get();
            mapOfMaxValues.put(e.getKey(), e.getValue());
        });
        System.out.println("\nmapOfMaxValues...\n");
        printMap(mapOfMaxValues);
    }
    //pretty print a map
    public static <K, V> void printMap(Map<K, V> map) {
        map.entrySet().stream().forEach((entry) -> {
            System.out.println("Key : " + entry.getKey()
                + " Value : " + entry.getValue());
        });
    }
}
// note: only tested with random values provided in the code
// behavior for large maps untested
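The program above rescans the whole map once per prefix, which may still be slow for a map with tens of thousands of entries. As a hedged alternative, the same grouping can probably be done in one stream pass with Collectors.toMap and a merge function that keeps the larger entry. The sketch below assumes Java 8 or later; the class MaxPerPrefixSketch and the method maxPerPrefix are made-up names, the prefix is assumed to be the exact text before the first "/", and on a tie whichever entry the merge function receives first is kept.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class MaxPerPrefixSketch {

    /** Maps each prefix (text before the first "/") to its highest-valued entry. */
    static Map<String, Map.Entry<String, Integer>> maxPerPrefix(Map<String, Integer> bc) {
        return bc.entrySet().stream()
                .collect(Collectors.toMap(
                        e -> e.getKey().split("/")[0],               // group by exact prefix
                        e -> e,                                      // keep the whole entry
                        (a, b) -> b.getValue() > a.getValue() ? b : a)); // keep the larger one
    }

    public static void main(String[] args) {
        Map<String, Integer> bc = new HashMap<>();
        bc.put("the/at", 153);
        bc.put("the/det", 80);
        bc.put("then/rb", 200);
        bc.put("that/cs", 45);
        maxPerPrefix(bc).values().forEach(e ->
                System.out.println("Key : " + e.getKey() + " Value : " + e.getValue()));
    }
}
```

Because entries are grouped by the exact prefix, "the" and "then" stay in separate groups, and each entry is handled once.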