Finding the number of times each word in a hashset occurs in text document

Original 2012-06-03 09:08:34 9 2 java/ text/ classification/ bayesian

I'm implementing a Naive Bayes text classification algorithm in Java.

So far I have declared a HashSet called Vocabulary, which stores all the unique words from a given text file (the test file).

One of the steps in the algorithm is to concatenate the contents of all the test files into a single text file. This turns out to be a fairly big file containing the words from every file.

Now I have to count the number of occurrences of each Vocabulary word in the concatenated text file. My first guess is to keep some sort of array structure holding the frequency of each word, but then I would have far too many entries.

Could anyone please give me better suggestions?

2 answers

Use a dictionary (HashMap) where the words are the keys and the values are the number of occurrences. If the HashSet fits into memory, the HashMap should as well.
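A minimal sketch of that idea, assuming the concatenated file is tokenized on whitespace and that tokens must match Vocabulary entries exactly (no lowercasing or punctuation stripping). The class name `VocabularyCounter` and the method `countOccurrences` are made up for illustration and do not come from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

class VocabularyCounter {
    // Returns a map from each vocabulary word to the number of times it
    // appears among the whitespace-separated tokens read from the scanner.
    static Map<String, Integer> countOccurrences(Set<String> vocabulary, Scanner tokens) {
        Map<String, Integer> counts = new HashMap<>();
        // Start every vocabulary word at zero so absent words are reported too.
        for (String word : vocabulary) {
            counts.put(word, 0);
        }
        while (tokens.hasNext()) {
            // Tokens outside the vocabulary have no key, so they are skipped.
            counts.computeIfPresent(tokens.next(), (word, n) -> n + 1);
        }
        return counts;
    }
}
```

To run it over the concatenated file, something like `countOccurrences(vocabulary, new Scanner(new File("concatenated.txt")))` should work, where the file name is a placeholder. The map holds one entry per Vocabulary word, so it grows with the vocabulary rather than with the size of the text.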

You could try using a trie, where the leaf nodes store each word's frequency.
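A rough sketch of the trie approach, with hypothetical names (`WordTrie`, `add`, `frequency`) chosen for illustration. One assumption differs slightly from the answer's wording: the count is kept on the node where a word ends, which is not always a leaf, since one word can be a prefix of another ("bay" and "bayes").

```java
import java.util.HashMap;
import java.util.Map;

class WordTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // how many times the word ending at this node was added
    }

    private final Node root = new Node();

    // Walks (and creates, if needed) the path for the word, then bumps its count.
    void add(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.count++;
    }

    // Returns how many times the word was added, or 0 if it was never seen.
    int frequency(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return 0;
            }
        }
        return node.count;
    }
}
```

Calling `add` for every token of the concatenated file and then `frequency` for each Vocabulary word gives the same counts as a HashMap. Words with common prefixes share nodes, though the per-node child maps carry their own memory overhead.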


