I'm trying to process a large amount of data and I'm a bit stuck on the best way to process the final calculation.
I have a HashMap. Each Book object has a data value called COUNT that holds how many times that book appears in my particular context. I want to iterate through the entire HashMap and record the top ten most-appearing books in an array. At the same time, I also want to remove those top ten books from the HashMap. What is the best way to do this?
I would copy the map into a SortedMap, such as TreeMap, using a comparator that compares the count.
The rest should be obvious.
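In case the rest is not obvious, here is a rough sketch of the sorted-copy idea. Note that it uses a TreeSet of map entries rather than a TreeMap keyed by count, because a sorted map keyed by count alone would keep only one book per distinct count. The Book class, the String key type, and the removeTop name are assumptions made for illustration; the question does not show them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class SortedTopBooks {
    // Hypothetical stand-in for the asker's Book class; only the count matters here.
    static class Book {
        final String title;
        final int count;
        Book(String title, int count) { this.title = title; this.count = count; }
    }

    // Removes the k most frequent books from the map and returns them, highest count first.
    static List<Book> removeTop(Map<String, Book> books, int k) {
        // Highest count first; ties fall back to the key so equal counts are not collapsed.
        Comparator<Map.Entry<String, Book>> byCountDesc = (a, b) -> {
            int c = Integer.compare(b.getValue().count, a.getValue().count);
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };
        TreeSet<Map.Entry<String, Book>> sorted = new TreeSet<>(byCountDesc);
        sorted.addAll(books.entrySet());

        List<String> topKeys = new ArrayList<>();
        List<Book> top = new ArrayList<>();
        for (Map.Entry<String, Book> e : sorted) {
            if (top.size() >= k) break;
            topKeys.add(e.getKey());
            top.add(e.getValue());
        }
        // Remove only after the scan, so the map is not modified while its entries are read.
        for (String key : topKeys) books.remove(key);
        return top;
    }

    public static void main(String[] args) {
        Map<String, Book> books = new HashMap<>();
        books.put("a", new Book("a", 5));
        books.put("b", new Book("b", 9));
        books.put("c", new Book("c", 5));
        for (Book b : removeTop(books, 2)) System.out.println(b.title + " " + b.count);
    }
}
```

Sorting everything costs O(n log n), which is usually fine unless the map is very large.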
There is a tournament algorithm that runs in O(n) time and can be useful for large data:
Optimal algorithm for returning top k values from an array of length N
If the data is not very large, I would recommend copying the values of your Map into a List, sorting it with Collections.sort, and taking a subList of the first ten elements.
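A minimal sketch of the Collections.sort plus subList route, assuming a hypothetical Book class with a count field and String keys (neither is shown in the question):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortAndSubList {
    // Hypothetical stand-in for the asker's Book class.
    static class Book {
        final String title;
        final int count;
        Book(String title, int count) { this.title = title; this.count = count; }
    }

    // Removes the k most frequent books from the map and returns them, highest count first.
    static List<Book> removeTop(Map<String, Book> books, int k) {
        List<Book> all = new ArrayList<>(books.values());
        // Sort by count, highest first.
        Collections.sort(all, (a, b) -> Integer.compare(b.count, a.count));
        // subList returns a view, so copy it into an independent list.
        int end = Math.max(0, Math.min(k, all.size()));
        List<Book> top = new ArrayList<>(all.subList(0, end));
        // values() is a live view of the map, so this also removes the matching entries.
        books.values().removeAll(top);
        return top;
    }

    public static void main(String[] args) {
        Map<String, Book> books = new HashMap<>();
        for (int i = 1; i <= 15; i++) books.put("book" + i, new Book("book" + i, i));
        List<Book> top = removeTop(books, 10);
        System.out.println(top.size() + " removed, " + books.size() + " left");
    }
}
```

Since Book does not override equals here, removeAll matches by object identity, which is what we want when the list holds the very same instances as the map.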
Another option is to keep them in a TreeMap and implement Comparable in your Book object; that way your Map is always sorted. This is particularly useful if you are adding to your Map, as you don't want to sort it every time you change an object.
Yes, you can't remove elements inside a for-each loop like this:
for(Book curBook: yourMap.values())
You will get a ConcurrentModificationException. To remove elements while iterating, you have to use an iterator, for example:
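    // The key type is not shown in the question; String is assumed here.
    HashMap<String, Book> yourMap;
    Collection<Book> entries = yourMap.values();
    Iterator<Book> iterator = entries.iterator();
    while (iterator.hasNext()) {
        Book curBook = iterator.next();
        if (yourConditionToRemove) {
            iterator.remove(); // also removes the entry from yourMap
        }
    }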
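For completeness, a self-contained version of the same idea that compiles as is. The Book class, the String keys, and the "count at least minCount" condition are made up for illustration; substitute your own condition.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class IteratorRemoval {
    // Hypothetical stand-in for the asker's Book class.
    static class Book {
        final String title;
        final int count;
        Book(String title, int count) { this.title = title; this.count = count; }
    }

    // Removes every book whose count is at least minCount; returns how many were removed.
    static int removeAtLeast(Map<String, Book> books, int minCount) {
        int removed = 0;
        Iterator<Book> it = books.values().iterator();
        while (it.hasNext()) {
            Book cur = it.next();
            if (cur.count >= minCount) {
                it.remove(); // removal goes through the iterator, so no ConcurrentModificationException
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Book> books = new HashMap<>();
        books.put("a", new Book("a", 3));
        books.put("b", new Book("b", 7));
        books.put("c", new Book("c", 10));
        System.out.println(removeAtLeast(books, 7) + " removed, " + books.size() + " left");
    }
}
```

On Java 8 or later, books.values().removeIf(b -> b.count >= minCount) should have the same effect on the map in one line.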
If this is a frequent operation, consider using a TreeMap as suggested by Bohemian, or at least keep a separate Map with the most-read Books.
I am not that proficient at Java, but I can think of the following algorithm, assuming that the HashMap stores books according to their unique identifier (i.e. it gives you no ordering hints about COUNT). You can:
1. Define a sequence that holds up to 10 books, ordered by COUNT. For clarity, I will call this sequence O10S (Ordered 10-element sequence).
2. For each element e in HashMap:
   - If O10S is not full yet, insert e in O10S.
   - Otherwise, if e has a COUNT higher than the element o in O10S with the minimum COUNT (which should be easily identifiable since O10S is ordered): remove o from O10S, insert e in O10S.
3. For each element o in O10S, remove o from HashMap.
The algorithm is linear with respect to the number of elements in HashMap (you only need to traverse the HashMap once).
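In Java, one way to get such an ordered bounded sequence is a PriorityQueue used as a min-heap, so the weakest of the current candidates is always at the head. The sketch below follows the steps above under that assumption; the Book class, the String keys, and the removeTop name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class BoundedTopK {
    // Hypothetical stand-in for the asker's Book class.
    static class Book {
        final String title;
        final int count;
        Book(String title, int count) { this.title = title; this.count = count; }
    }

    // Removes the k most frequent books from the map and returns them, highest count first.
    static List<Book> removeTop(Map<String, Book> books, int k) {
        List<Book> top = new ArrayList<>();
        if (k <= 0) return top;
        // Min-heap on count: peek() is the candidate with the lowest count (the "o" above).
        PriorityQueue<Map.Entry<String, Book>> o10s = new PriorityQueue<>(
                k, (a, b) -> Integer.compare(a.getValue().count, b.getValue().count));
        for (Map.Entry<String, Book> e : books.entrySet()) {
            if (o10s.size() < k) {
                o10s.add(e);                 // O10S is not full yet
            } else if (e.getValue().count > o10s.peek().getValue().count) {
                o10s.poll();                 // drop the current minimum
                o10s.add(e);
            }
        }
        // Drain the heap (lowest count first) before touching the map.
        List<String> topKeys = new ArrayList<>();
        while (!o10s.isEmpty()) {
            Map.Entry<String, Book> e = o10s.poll();
            topKeys.add(e.getKey());
            top.add(e.getValue());
        }
        Collections.reverse(top);            // highest count first
        for (String key : topKeys) books.remove(key);
        return top;
    }

    public static void main(String[] args) {
        Map<String, Book> books = new HashMap<>();
        for (int i = 1; i <= 25; i++) books.put("book" + i, new Book("book" + i, i));
        for (Book b : removeTop(books, 10)) System.out.println(b.title + " " + b.count);
    }
}
```

Each heap operation costs O(log k), so the whole pass is O(n log k); with k fixed at 10 that is effectively linear in the size of the map.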