
Calculate document frequency using HashMap [Java]

I am trying to calculate document frequency (i.e., in how many documents each word appears). For example:

Doc1: this phone is the greatest phone ever.
Doc2: what's your phone number.

Result:

this              1
phone             2
is                1
the               1
greatest          1
ever              1
what's            1
your              1
number            1
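Stated formally, for a collection of documents D, the document frequency of a term t is the number of documents that contain t at least once, regardless of how often it repeats inside any one of them:

```latex
\mathrm{df}(t) = \left|\{\, d \in D : t \in d \,\}\right|
```

So df(phone) = 2 here, even though "phone" occurs three times in total.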

I have the following code in Java:

HashMap<String, String> wordDoc = new HashMap<String, String>();
HashMap<String, Integer> countDfIndex = new HashMap<String, Integer>();

if (!wordDoc.containsKey(word)) {
    wordDoc.put(word,docno);
    countDfIndex.put(word, 1);
}
if (wordDoc.get(word)!=null) {
    if(!wordDoc.containsValue(docno)) {
        wordDoc.put(word,docno);
        countDfIndex.put(word, countDfIndex.get(word)+1);
    }
}

I am not getting the right result. Kindly help!
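One likely cause of the wrong counts: wordDoc.containsValue(docno) asks whether docno is stored as the value of any word in the map, not of the current word. With the example input, once what's has been stored with Doc2's number, the check for phone in Doc2 is already true, so phone is never incremented and stays at 1. Below is a minimal sketch that keeps the question's two-map idea but compares only against the document stored for the current word. DfCounter and record are hypothetical names, and it assumes documents are processed one after another (all words of Doc1, then all words of Doc2, and so on).

```java
import java.util.HashMap;
import java.util.Map;

class DfCounter {
    // word -> id of the last document in which the word was counted
    private final Map<String, String> wordDoc = new HashMap<>();
    // word -> number of distinct documents containing the word
    private final Map<String, Integer> countDfIndex = new HashMap<>();

    // Call once per word occurrence. Assumes documents are fed sequentially,
    // so "seen in the current document" == "last stored docno equals docno".
    void record(String word, String docno) {
        // Look only at the value stored for THIS word, instead of
        // containsValue(), which searches the values of every word.
        if (!docno.equals(wordDoc.get(word))) {
            wordDoc.put(word, docno);
            countDfIndex.merge(word, 1, Integer::sum); // start at 1 or add 1
        }
    }

    Map<String, Integer> counts() {
        return countDfIndex;
    }
}
```

If documents can arrive interleaved, remembering only the last document per word is not enough; a per-document set of already-counted words avoids that limitation.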

I assume you're trying to count the number of documents containing each word, rather than its total number of occurrences.

If so:

Map<String, Integer> countDfIndex = new HashMap<String, Integer>();

for (... document : documents) {
    Set<String> alreadyAdded = new HashSet<String>(); // new empty set for each document

    ... // iterate over the words of the current document; for each word:

    if (!alreadyAdded.contains(word)) {
        if (!countDfIndex.containsKey(word)) {
            countDfIndex.put(word, 1);
        } else {
            countDfIndex.put(word, countDfIndex.get(word) + 1);
        }
        alreadyAdded.add(word); // don't add the word anymore if found again in the document
    }

}

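A runnable version of the per-document set approach above, as a sketch. The original leaves the document type and the word loop open, so this assumes each document is a String and uses a hypothetical tokenizer (drop '.' and split on whitespace); the class name DocumentFrequency is also illustrative. It uses the boolean returned by Set.add in place of a separate contains check, and Map.merge (Java 8+) in place of the containsKey/put pair.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DocumentFrequency {
    // Returns word -> number of documents containing that word at least once.
    static Map<String, Integer> count(List<String> documents) {
        Map<String, Integer> countDfIndex = new HashMap<>();
        for (String document : documents) {
            Set<String> alreadyAdded = new HashSet<>(); // new empty set for each document
            // Assumed tokenizer: drop '.' and split on runs of whitespace.
            for (String word : document.replace(".", "").split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // leading whitespace yields an empty first token
                }
                // Set.add returns false when the word was already seen in this document.
                if (alreadyAdded.add(word)) {
                    countDfIndex.merge(word, 1, Integer::sum);
                }
            }
        }
        return countDfIndex;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "this phone is the greatest phone ever.",
                "what's your phone number.");
        // HashMap iteration order is unspecified, so the printed order may vary.
        System.out.println(count(docs));
    }
}
```

With the question's two documents, phone maps to 2 and each of the other eight words maps to 1.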
public static void add(Map<String, Integer> map, String word) {
    map.put(word, map.containsKey(word) ? map.get(word) + 1 : 1);
}

for (String i : s.replace(".", "").split(" ")) add(map, i);

where:

  • map = new HashMap<String, Integer>();
  • s = "this phone is the greatest phone ever. what's your phone number."

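Finally, the map contains the following (note that this counts total occurrences, hence phone=3, rather than the number of documents each word appears in):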

{the=1, ever=1, number=1, phone=3, this=1, what's=1, is=1, your=1, greatest=1}
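To turn the same add helper into a document-frequency counter, one option is to pass it each distinct word of a document only once, by collecting the document's words into a HashSet first. A minimal sketch; the wrapper class AddPerDocument and method documentFrequency are hypothetical, and tokenization follows the answer's replace(".", "").split(" ").

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AddPerDocument {
    // Same helper as in the answer: increments the count for one word.
    public static void add(Map<String, Integer> map, String word) {
        map.put(word, map.containsKey(word) ? map.get(word) + 1 : 1);
    }

    // Hypothetical wrapper: each document contributes each distinct word once.
    static Map<String, Integer> documentFrequency(String... documents) {
        Map<String, Integer> map = new HashMap<>();
        for (String doc : documents) {
            // The set drops repeated words within this document.
            Set<String> distinct = new HashSet<>(Arrays.asList(doc.replace(".", "").split(" ")));
            for (String i : distinct) add(map, i);
        }
        return map;
    }
}
```

Called with the two example documents separately, this gives phone=2 instead of 3; the other counts are unchanged.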
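
HashMap<String, Integer> countDfIndex = new HashMap<String, Integer>();

// Increment the count for this word (start at 1 if it has not been seen yet).
// This adds 1 on every call: to get document frequency rather than total
// occurrences, run it only once per distinct word in each document.
if (!countDfIndex.containsKey(word)) {
    countDfIndex.put(word, 1);
} else {
    int i = countDfIndex.get(word);
    countDfIndex.put(word, i + 1);
}

// Print every word with its count.
for (Map.Entry<String, Integer> pair : countDfIndex.entrySet()) {
    int count = pair.getValue();
    String entryWord = pair.getKey(); // renamed so it does not clash with `word` above
    System.out.println("word is " + entryWord + ", count is " + count);
}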
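For comparison, the same document-frequency count can be written compactly with the Java 8 Stream API: de-duplicate the words inside each document, flatten, then group and count. This is an alternative sketch, not part of the answer above; StreamDf and the tokenization rule (drop '.', split on whitespace) are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamDf {
    // word -> number of documents containing it.
    static Map<String, Long> documentFrequency(List<String> documents) {
        return documents.stream()
                // Assumed tokenizer; distinct() runs per document, so each
                // document contributes a given word at most once.
                .flatMap(doc -> Arrays.stream(doc.replace(".", "").trim().split("\\s+"))
                        .filter(w -> !w.isEmpty())
                        .distinct())
                // Group identical words and count how many documents emitted each.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

Collectors.counting() produces Long values, so the result type is Map<String, Long> rather than Map<String, Integer>.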
