Word frequency counter: issue with logic (Java)
I'm building a basic word frequency counter. The code is listed below:
public static List<Frequency> computeWordFrequencies(List<String> words)
{
    List<Frequency> list_of_frequency = new ArrayList<Frequency>();
    List<String> list_of_words = words;
    int j = 0; // never changed, so get(j) below always refers to the first entry
    for(int i=0; i<list_of_words.size(); i++)
    {
        String current_word = list_of_words.get(i);
        boolean added = false;
        if(list_of_frequency.size() == 0)
        {
            list_of_frequency.add(new Frequency(current_word, 1));
            System.out.println("added " + current_word);
        }
        else
        {
            System.out.println("Current word: " + current_word);
            System.out.println("Current Frequency: " + list_of_frequency.get(j).getText());
            // The list holds Frequency objects, not Strings, so this is always false
            if(list_of_frequency.contains(current_word))
            {
                list_of_frequency.get(j).incrementFrequency();
                System.out.println("found... incremented " + list_of_frequency.get(j).getText() + " frequency");
                added = true;
            }
            else
            {
                list_of_frequency.add(new Frequency(current_word, 1));
                System.out.println("added " + current_word);
                added = true;
            }
        }
    }
    return list_of_frequency;
}
The output I get is:
added I
Current word: am
Current Frequency: I
added am
Current word: very
Current Frequency: I
added very
Current word: good
Current Frequency: I
added good
Current word: at
Current Frequency: I
added at
Current word: being
Current Frequency: I
added being
Current word: good
Current Frequency: I
added good
Total item count: 7
Unique item count: 7
I:1
am:1
very:1
good:1
at:1
being:1
good:1
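So I need a for loop to iterate over list_of_frequency, but if I do that I run into other problems, such as words being added more than once. Is my logic on the right track here, and is there a better approach for this project? Thanks in advance!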
You can do this with the frequency method of the Collections class (Collections.frequency).
Here is an example:
public void wordFreq() {
    String text = "hello bye hello a bb a bye hello";
    List<String> list = Arrays.asList(text.split(" "));
    // A Set keeps exactly one copy of each distinct word
    Set<String> uniqueWords = new HashSet<String>(list);
    for (String word : uniqueWords) {
        // Collections.frequency scans the whole list once per distinct word
        System.out.println(word + ": " + Collections.frequency(list, word));
    }
}
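A self-contained version of the example above, with the imports it needs. As a sketch, it collects the counts into a TreeMap (an assumption, chosen only so the output is printed in alphabetical order) instead of printing inside the loop; the class name is illustrative. Because Collections.frequency walks the whole list for every distinct word, this runs in O(n × u) for n words and u distinct words: fine for short texts, slower than the map-based answers on large inputs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class CollectionsFrequencyDemo {

    // Counts each distinct word with Collections.frequency.
    // TreeMap keeps the keys sorted so the printed output is deterministic.
    static Map<String, Integer> wordFreq(List<String> words) {
        Set<String> uniqueWords = new HashSet<String>(words);
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (String word : uniqueWords) {
            // One full pass over the list per distinct word.
            result.put(word, Collections.frequency(words, word));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hello bye hello a bb a bye hello";
        List<String> list = Arrays.asList(text.split(" "));
        System.out.println(wordFreq(list)); // prints {a=2, bb=1, bye=2, hello=3}
    }
}
```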
You're overcomplicating things.
You only need a few lines:
public static Map<String, Integer> getFrequencies(List<String> words) {
    Map<String, Integer> freq = new HashMap<String, Integer>();
    for (String word : words) {
        Integer i = freq.get(word);             // null if the word hasn't been seen yet
        freq.put(word, i == null ? 1 : i + 1);  // first occurrence -> 1, otherwise add 1
    }
    return freq;
}
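On Java 8 or later, the null check in the loop above can be folded into Map.merge, which stores the given value (1) for a new key and otherwise combines the old value with it using the supplied function, here Integer::sum. A minimal sketch, assuming Java 8+; the class name is illustrative:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MergeFrequencies {

    // Same result as getFrequencies above, using Map.merge (Java 8+).
    static Map<String, Integer> getFrequencies(List<String> words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String word : words) {
            // Puts 1 if the word is absent, otherwise stores oldValue + 1.
            freq.merge(word, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("I", "am", "very", "good", "at", "being", "good");
        Map<String, Integer> freq = getFrequencies(words);
        System.out.println(freq.get("good")); // prints 2
        System.out.println(freq.size());      // prints 6
    }
}
```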
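Put this code in the else branch, in place of the contains check. What you should do is loop over list_of_frequency and, if the word has already been encountered, increment its frequency; otherwise, add it to the list with a frequency of 1:
boolean found = false;
for (j = 0; j < list_of_frequency.size(); j++) {
    // word already encountered before: increment its frequency
    if (list_of_frequency.get(j).getText().equals(current_word)) {
        list_of_frequency.get(j).incrementFrequency();
        found = true;
        break;
    }
}
if (!found) list_of_frequency.add(new Frequency(current_word, 1)); // new word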
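Put together, the asker's method with this scan could look like the sketch below. The question never shows the Frequency class, so the minimal nested version here is an assumption based on the calls the question makes (the constructor, getText() and incrementFrequency()); getFrequency() and toString() are hypothetical additions for testing and printing. The explicit "list is empty" branch is no longer needed, because the scan simply finds nothing in an empty list. Each word still triggers a scan of the list, so this is O(n × u), but it keeps the List<Frequency> return type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LinearScanCounter {

    // Hypothetical stand-in for the question's Frequency class.
    static class Frequency {
        private final String text;
        private int frequency;

        Frequency(String text, int frequency) {
            this.text = text;
            this.frequency = frequency;
        }

        String getText() { return text; }
        int getFrequency() { return frequency; }   // assumed accessor
        void incrementFrequency() { frequency++; }

        @Override
        public String toString() { return text + ":" + frequency; }
    }

    static List<Frequency> computeWordFrequencies(List<String> words) {
        List<Frequency> list_of_frequency = new ArrayList<Frequency>();
        for (String current_word : words) {
            boolean found = false;
            // Look for an existing entry with the same text.
            for (Frequency f : list_of_frequency) {
                if (f.getText().equals(current_word)) {
                    f.incrementFrequency();
                    found = true;
                    break;
                }
            }
            // Only add the word if no entry matched.
            if (!found) {
                list_of_frequency.add(new Frequency(current_word, 1));
            }
        }
        return list_of_frequency;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("I", "am", "very", "good", "at", "being", "good");
        System.out.println(computeWordFrequencies(words));
        // prints [I:1, am:1, very:1, good:2, at:1, being:1]
    }
}
```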
I think that to make it run faster, you should use a different algorithm that starts by sorting the list:
1) Sort your list of strings (cf. java.util.Collections.sort()).
2) In pseudocode:
oldWord = null, counter = 0
iterate over your sorted list
    current_word = word of the current iteration
    if current_word.equals(oldWord)
        counter++
    else (it's a new word)
        if oldWord != null, create Frequency(oldWord, counter) and add it to the list of frequencies
        oldWord = current_word
        counter = 1
after the loop, if the list was not empty, create Frequency(oldWord, counter) for the last word and add it too
This way you don't need to search the frequency list for every word; each distinct word is added to it exactly once.
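The sort-then-scan idea might be implemented as below. This is a sketch: it uses a hypothetical minimal Frequency class (the question does not show the real one) and copies the input before sorting so the caller's list is left untouched. Sorting costs O(n log n), after which a single pass produces the counts, and the resulting list comes out in sorted order (natural String order, so uppercase words come before lowercase ones).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedCounter {

    // Hypothetical minimal stand-in for the question's Frequency class.
    static class Frequency {
        final String text;
        final int count;
        Frequency(String text, int count) { this.text = text; this.count = count; }
        @Override public String toString() { return text + ":" + count; }
    }

    static List<Frequency> computeWordFrequencies(List<String> words) {
        List<String> sorted = new ArrayList<String>(words); // copy, don't mutate input
        Collections.sort(sorted);

        List<Frequency> result = new ArrayList<Frequency>();
        String oldWord = null;
        int counter = 0;
        for (String current : sorted) {
            if (current.equals(oldWord)) {
                counter++;                                   // same run of equal words
            } else {
                if (oldWord != null) {
                    result.add(new Frequency(oldWord, counter)); // previous run ended
                }
                oldWord = current;                           // start a new run
                counter = 1;
            }
        }
        if (oldWord != null) {
            result.add(new Frequency(oldWord, counter));     // flush the last run
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("I", "am", "very", "good", "at", "being", "good");
        System.out.println(computeWordFrequencies(words));
        // prints [I:1, am:1, at:1, being:1, good:2, very:1]
    }
}
```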
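Since all entries of list_of_frequency are unique words, you could also use a Set instead of a List for list_of_frequency.
Replace your method with this. Using a Map while analyzing the data gives you better performance, because finding a word becomes a hash lookup instead of a scan of the whole list.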
public static List<Frequency> computeWordFrequencies(List<String> words) {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (String word : words) {
        Integer current = counts.get(word);
        if (current != null) {
            counts.put(word, current + 1);
        } else {
            counts.put(word, 1);
        }
    }
    // Then, if you really need that list of Frequency
    List<Frequency> list_of_frequency = new ArrayList<Frequency>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
        list_of_frequency.add(new Frequency(entry.getKey(), entry.getValue()));
    }
    return list_of_frequency;
}
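On Java 8 or later, the counting loop and the conversion to List<Frequency> can also be written with streams: Collectors.groupingBy(Function.identity(), Collectors.counting()) builds a Map<String, Long> of word counts. A sketch, again with a hypothetical minimal Frequency class; counting() yields Long, so the value is narrowed to int here on the assumption that no word occurs more than Integer.MAX_VALUE times.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamCounter {

    // Hypothetical minimal stand-in for the question's Frequency class.
    static class Frequency {
        final String text;
        final int count;
        Frequency(String text, int count) { this.text = text; this.count = count; }
        @Override public String toString() { return text + ":" + count; }
    }

    static List<Frequency> computeWordFrequencies(List<String> words) {
        // word -> number of occurrences (counting() produces Long)
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Convert each map entry into a Frequency object.
        return counts.entrySet().stream()
                .map(e -> new Frequency(e.getKey(), e.getValue().intValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("foo", "bar", "qux", "foo");
        // The grouping map is hash-based, so the order of the entries is not guaranteed.
        System.out.println(computeWordFrequencies(words));
    }
}
```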
I would do it like this:
List<String> words = Arrays.asList("foo", "bar", "qux", "foo");
Map<String, AtomicInteger> frequencyMap = new HashMap<String, AtomicInteger>();
for (String word : words)
{
    AtomicInteger freq = frequencyMap.get(word);
    if (freq == null)
    {
        frequencyMap.put(word, new AtomicInteger(1));
    }
    else
    {
        freq.incrementAndGet();
    }
}
for (Map.Entry<String, AtomicInteger> entry : frequencyMap.entrySet())
{
    System.out.println(entry.getKey() + " :" + entry.getValue());
}
By using AtomicInteger, you can easily increment the frequency counter in place, without putting a new value back into the map.
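On Java 8 or later, the get / null-check / put sequence can be collapsed with Map.computeIfAbsent, which creates the counter the first time a word is seen and returns the existing one afterwards. A minimal sketch; the class name is illustrative, and the map is a TreeMap only so the printout is sorted. AtomicInteger serves here simply as a mutable int holder; its thread safety is not needed for this single-threaded loop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounter {

    static Map<String, AtomicInteger> count(List<String> words) {
        Map<String, AtomicInteger> frequencyMap = new TreeMap<>();
        for (String word : words) {
            // Creates a counter starting at 0 for a new word, then increments it.
            frequencyMap.computeIfAbsent(word, k -> new AtomicInteger()).incrementAndGet();
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("foo", "bar", "qux", "foo");
        for (Map.Entry<String, AtomicInteger> e : count(words).entrySet()) {
            System.out.println(e.getKey() + " :" + e.getValue());
        }
        // prints:
        // bar :1
        // foo :2
        // qux :1
    }
}
```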