Word Frequency Counter - Java

import java.io.EOFException;

public interface ICharacterReader {
    char GetNextChar() throws EOFException;
    void Dispose();
}

import java.io.EOFException;
import java.util.Random;

public class SimpleCharacterReader implements ICharacterReader {
    private int m_Pos = 0;

    public static final char lf = '\n';

    private String m_Content = "It was the best of times, it was the worst of times," +
            lf +
            "it was the age of wisdom, it was the age of foolishness," +
            lf +
            "it was the epoch of belief, it was the epoch of incredulity," +
            lf +
            "it was the season of Light, it was the season of Darkness," +
            lf +
            "it was the spring of hope, it was the winter of despair," +
            lf +
            "we had everything before us, we had nothing before us," +
            lf +
            "countries it was clearer than crystal to the lords of the State" +
            lf +
            "preserves of loaves and fishes, that things in general were" +
            lf +
            "settled for ever";

    Random m_Rnd = new Random();

    public char GetNextChar() throws EOFException {
        if (m_Pos >= m_Content.length()) {
            throw new EOFException();
        }

        return m_Content.charAt(m_Pos++);
    }

    public void Dispose() {
        // do nothing
    }
}

Basically, I have created an interface called ICharacterReader that returns the next character of a text and throws an EOFException once there are no more characters. Below it is a class called SimpleCharacterReader that implements the interface and holds some sample sentences whose word frequencies need to be counted. Now I am trying to write a separate class that takes an ICharacterReader as an argument and simply returns the word frequencies. I'm a beginner at programming, so I'm not really sure what to do here; any simple suggestion would be appreciated.

Your task can be done in two parts:

1. Reading the char data and combining it into a String

Just use a StringBuilder and append chars until you get an EOFException.

ICharacterReader reader = ...
StringBuilder sb = new StringBuilder();
try {
    while (true) {
        sb.append(reader.GetNextChar());
    }
} catch (EOFException ex) {
    // Expected: the reader signals the end of its input with EOFException.
}
String stringData = sb.toString();
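
If you want this step as a reusable method, it could be wrapped roughly as follows. This is only a sketch: the class name CharacterReaders and the method name readAll are made up for illustration, the interface is repeated (without the public modifier) so the snippet compiles on its own, and calling Dispose() in a finally block assumes the caller has no further use for the reader.

```java
import java.io.EOFException;

// Copied from the question so that this sketch compiles on its own.
interface ICharacterReader {
    char GetNextChar() throws EOFException;
    void Dispose();
}

// Hypothetical helper class; the name is not part of the original code.
final class CharacterReaders {
    private CharacterReaders() {
    }

    // Reads every remaining character from the reader into one String.
    static String readAll(ICharacterReader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            while (true) {
                sb.append(reader.GetNextChar());
            }
        } catch (EOFException endOfInput) {
            // Expected: this is how the reader reports that it is exhausted.
        } finally {
            // Assumption: the reader is not needed after it has been drained.
            reader.Dispose();
        }
        return sb.toString();
    }
}
```

With the classes from the question, the call would then be String stringData = CharacterReaders.readAll(new SimpleCharacterReader());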

2. Counting word frequencies

Split the text into words using a regular expression, then count how often each word occurs. You can do this easily with the Stream API:

Map<String, Long> frequencies = Arrays.stream(stringData.split(" +|\n"))
                                      .collect(Collectors.groupingBy(Function.identity(),
                                                                     Collectors.counting()));
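
One thing to be aware of: splitting only on spaces and newlines keeps punctuation attached to the words and treats "It" and "it" as different words, so for the sample text "times," (with the comma) becomes its own key. If that is not what you want, a variant along these lines might help. It is a sketch, not the only way to do it: the class name WordFrequencies and the method name count are made up for illustration, and the choice to lower-case everything and to treat any run of characters other than letters, digits and apostrophes as a separator is an assumption about what should count as a word.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; the name is not part of the original answer.
final class WordFrequencies {
    private WordFrequencies() {
    }

    // Counts how often each word occurs, ignoring case and punctuation.
    static Map<String, Long> count(String text) {
        return Arrays.stream(
                        // Assumption: a word is a run of letters, digits or apostrophes.
                        text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+"))
                // split() can yield an empty first element if the text starts with a separator.
                .filter(word -> !word.isEmpty())
                // TreeMap keeps the result sorted alphabetically by word.
                .collect(Collectors.groupingBy(Function.identity(),
                                               TreeMap::new,
                                               Collectors.counting()));
    }
}
```

Calling WordFrequencies.count(stringData) then returns a map from each word to its count, and frequencies.forEach((word, n) -> System.out.println(word + ": " + n)); prints one line per word.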
