I have this code, which prints a list of words sorted alphabetically by key from counts, a ConcurrentHashMap that stores words as keys and their frequencies as values.
// Method to create a stopword list with the most frequent words from the lemmas key in the json file
private static List<String> StopWordsFile(ConcurrentHashMap<String, String> lemmas) {
    // counts stores each word and its frequency
    ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<String, Integer>();
    // corpus is an array list for all the individual words
    ArrayList<String> corpus = new ArrayList<String>();
    for (Entry<String, String> entry : lemmas.entrySet()) {
        String line = entry.getValue().toLowerCase();
        line = line.replaceAll("\\p{Punct}", " ");
        line = line.replaceAll("\\d+", " ");
        line = line.replaceAll("\\s+", " ");
        line = line.trim();
        String[] value = line.split(" ");
        List<String> words = new ArrayList<String>(Arrays.asList(value));
        corpus.addAll(words);
    }
    // count all the words in the corpus and store the words with each frequency in counts
    for (String word : corpus) {
        if (counts.keySet().contains(word)) {
            counts.put(word, counts.get(word) + 1);
        } else {
            counts.put(word, 1);
        }
    }
    // Create a list to store all the words with their frequency and sort it by values.
    List<Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
    List<String> stopwordslist = new ArrayList<>(counts.keySet()); // this works but counts.values() gives an error
    Collections.sort(stopwordslist);
    System.out.println("List after sorting: " + stopwordslist);
So the output is:
List after sorting: [a, abruptly, absent, abstractmap, accept,...]
How can I sort them by values as well? When I use
List<String> stopwordslist = new ArrayList<>(counts.values());
I get an error:
Cannot infer type arguments for ArrayList<>
I guess that is because ArrayList can store <String> but not <String, Integer>, and it gets confused.
I have also tried to do it with a custom Comparator like so:
Comparator<Entry<String, Integer>> valueComparator = new Comparator<Entry<String,Integer>>() {
    @Override
    public int compare(Entry<String, Integer> e1, Entry<String, Integer> e2) {
        String v1 = e1.getValue();
        String v2 = e2.getValue();
        return v1.compareTo(v2);
    }
};
List<Entry<String, Integer>> stopwordslist = new ArrayList<Entry<String, Integer>>();
// sorting HashMap by values using comparator
Collections.sort(counts, valueComparator)
which gives me another error,
The method sort(List<T>, Comparator<? super T>) in the type Collections is not applicable for the arguments (ConcurrentHashMap<String,Integer>, Comparator<Map.Entry<String,Integer>>)
How can I sort my list by values?
My expected output is something like:
[the, of, value, v, key, to, given, a, k, map, in, for, this, returns, if, is, super, null, specified, u, function, and, ...]
Let's go through all the issues in your code:
Naming conventions. Method names should start with a lowercase letter.
Unnecessary use of ConcurrentHashMap. For purely local use, like within your method, an ordinary HashMap will do. For parameters, just use the Map interface, to allow the caller to use whatever Map implementation fits.
Unnecessarily iterating over the entrySet(). When you're only interested in the values, you don't need to use entrySet() and call getValue() on every entry; you can iterate over values() in the first place. Likewise, you would use keySet() when you're interested in the keys only. Only iterate over entrySet() when you need both key and value (or want to perform updates).
Don't replace pattern matches with spaces only to split on the spaces afterwards. Specify the (combined) pattern directly to split, i.e. line.split("[\\p{Punct}\\d\\s]+").
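As a minimal sketch of the combined pattern (the class name SplitDemo, the helper tokenize and the sample sentence are made up for illustration): one caveat worth knowing is that String.split keeps a leading empty string when the input starts with a delimiter, so the sketch skips empty tokens. Whether you want that filter in your own code is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original code.
class SplitDemo {
    // Splits a line on any run of punctuation, digits, or whitespace
    // and skips empty tokens.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.toLowerCase().split("[\\p{Punct}\\d\\s]+")) {
            if (!word.isEmpty()) { // a delimiter at the very start yields one empty token
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [returns, the, value, or, null, if, keys]
        System.out.println(tokenize("(Returns the value, or null if 42 keys...)"));
    }
}
```

Without the isEmpty() check, the sample line above would contribute an empty string as its first "word", because it begins with a parenthesis.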
Don't use List<String> words = new ArrayList<String>(Arrays.asList(value)); unless you specifically need the features of an ArrayList. Otherwise, just use List<String> words = Arrays.asList(value); But when the only thing you're doing with the list is addAll to another collection, you can use Collections.addAll(corpus, value); without the List detour.
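A small sketch of the Collections.addAll variant, in case it helps to see it in isolation. The class AddAllDemo and the method collect are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the original code.
class AddAllDemo {
    // Appends the elements of every array to one list,
    // without wrapping each array in an intermediate List.
    static List<String> collect(String[]... arrays) {
        List<String> corpus = new ArrayList<>();
        for (String[] value : arrays) {
            Collections.addAll(corpus, value);
        }
        return corpus;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(collect(new String[] {"a", "b"}, new String[] {"c"}));
    }
}
```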
Don't use counts.keySet().contains(word) as you can simply use counts.containsKey(word). But you can simplify the entire
if (counts.containsKey(word)) {
    counts.put(word, counts.get(word) + 1);
} else {
    counts.put(word, 1);
}
to
counts.merge(word, 1, Integer::sum);
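To make the behaviour of merge concrete, here is a minimal sketch (MergeDemo and countWords are illustrative names). When the key is absent, merge stores the given value 1; when it is present, it combines the old value and 1 with Integer::sum and stores the result.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of the original code.
class MergeDemo {
    // Counts how often each word occurs.
    static Map<String, Integer> countWords(String... words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // absent key: store 1; present key: store oldValue + 1
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the", "map", "the");
        System.out.println(counts.get("the")); // prints 2
        System.out.println(counts.get("map")); // prints 1
    }
}
```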
The points above yield
ArrayList<String> corpus = new ArrayList<>();
for (String line : lemmas.values()) {
    String[] value = line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+");
    Collections.addAll(corpus, value);
}
for (String word : corpus) {
    counts.merge(word, 1, Integer::sum);
}
But there is no point in performing two loops, where the first only stores everything in a potentially large list that is then iterated over a single time. You can perform the second loop's operation right in the first (now the only) loop and get rid of the list.
for (String line : lemmas.values()) {
    for (String word : line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
        counts.merge(word, 1, Integer::sum);
    }
}
In your first variant, you already acknowledged that you can't sort a map directly, by copying the map into a list and sorting the list. In the second variant, you created a List<Entry<String, Integer>> but then didn't use it at all; instead, you tried to pass the map to sort. (By the way, since Java 8, you can invoke sort directly on a List; there is no need to call Collections.sort.)
You have to keep copying the map data into a list and sorting the list. For example,
List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
list.sort(Map.Entry.comparingByValue());
Now, you have to decide whether to change the return type to List<Map.Entry<String, Integer>> or copy the keys of the sorted entries to a new list.
Taking all points together and staying with the original return type, the fixed code looks like
private static List<String> stopWordsFile(Map<String, String> lemmas) {
    Map<String, Integer> counts = new HashMap<>();
    for (String line : lemmas.values()) {
        for (String word : line.toLowerCase().trim().split("[\\p{Punct}\\d\\s]+")) {
            counts.merge(word, 1, Integer::sum);
        }
    }
    List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
    list.sort(Map.Entry.comparingByValue());
    List<String> stopwordslist = new ArrayList<>();
    for (Map.Entry<String, Integer> e : list) stopwordslist.add(e.getKey());
    // System.out.println("List after sorting: " + stopwordslist);
    return stopwordslist;
}
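Note that Map.Entry.comparingByValue() on its own sorts in ascending order, so the least frequent words come first. The expected output in the question appears to start with the most frequent words, so you may want the reversed order. The following is only a sketch under that assumption; the class StopWordsDemo, the method byDescendingFrequency and the alphabetical tie-break for equal counts are my additions, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original code.
class StopWordsDemo {
    // Returns the keys ordered by descending count; ties are ordered alphabetically.
    static List<String> byDescendingFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(counts.entrySet());
        Comparator<Map.Entry<String, Integer>> order =
                Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey());
        list.sort(order);
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Integer> e : list) {
            words.add(e.getKey());
        }
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("the", 3);
        counts.put("of", 3);
        counts.put("map", 2);
        // prints [of, the, map, a]
        System.out.println(byDescendingFrequency(counts));
    }
}
```

The explicit type arguments on comparingByValue() are there because calling reversed() on the result otherwise leaves the compiler without enough information to infer the entry type.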