Word Frequency Counter - Java

import java.io.EOFException;

public interface ICharacterReader {
    char GetNextChar() throws EOFException;
    void Dispose();
}

import java.io.EOFException;
import java.util.Random;

public class SimpleCharacterReader implements ICharacterReader {
    private int m_Pos = 0;

    public static final char lf = '\n';

    private String m_Content = "It was the best of times, it was the worst of times," +
        lf +
        "it was the age of wisdom, it was the age of foolishness," +
        lf +
        "it was the epoch of belief, it was the epoch of incredulity," +
        lf +
        "it was the season of Light, it was the season of Darkness," +
        lf +
        "it was the spring of hope, it was the winter of despair," +
        lf +
        "we had everything before us, we had nothing before us," +
        lf +
        "countries it was clearer than crystal to the lords of the State" +
        lf +
        "preserves of loaves and fishes, that things in general were" +
        lf +
        "settled for ever";

    Random m_Rnd = new Random();

    public char GetNextChar() throws EOFException {
        if (m_Pos >= m_Content.length()) {
            throw new EOFException();
        }
        return m_Content.charAt(m_Pos++);
    }

    public void Dispose() {
        // do nothing
    }
}

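Basically, I have created an interface called ICharacterReader that returns the next character of the text and throws an exception when there are no more characters. Below it, I have created a class called SimpleCharacterReader that holds a set of sentences whose word frequencies need to be counted. Now I am trying to create a separate class that takes an ICharacterReader as a parameter and simply returns the word frequencies. I am a beginner at programming, so I am not sure what to do here; any simple suggestions would be much appreciated.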

Your task can be done in two parts:

1. Read the char data and combine it into a String

Just use a StringBuilder and append each char until the EOFException is thrown.

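ICharacterReader reader = ...
StringBuilder sb = new StringBuilder();
try {
    // Keep appending characters; the loop ends only via the exception below.
    while (true) {
        sb.append(reader.GetNextChar());
    }
} catch (EOFException ex) {
    // Expected: EOFException signals that the reader has no more characters.
}
String stringData = sb.toString();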
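
The read loop above can also be wrapped in a small helper that releases the reader when it is done. This is only a sketch: the names CharacterReaderUtil, readAll and StringCharacterReader are hypothetical and do not come from the original post. The interface is repeated so the snippet compiles on its own, and StringCharacterReader is a stand-in for SimpleCharacterReader.

```java
import java.io.EOFException;

interface ICharacterReader {
    char GetNextChar() throws EOFException;
    void Dispose();
}

// Hypothetical stand-in reader, used only to make this sketch runnable.
class StringCharacterReader implements ICharacterReader {
    private final String content;
    private int pos = 0;

    StringCharacterReader(String content) {
        this.content = content;
    }

    public char GetNextChar() throws EOFException {
        if (pos >= content.length()) {
            throw new EOFException();
        }
        return content.charAt(pos++);
    }

    public void Dispose() {
        // nothing to release for an in-memory string
    }
}

// Hypothetical helper class; not part of the original post.
class CharacterReaderUtil {
    // Drains the reader into a String and always calls Dispose() afterwards.
    static String readAll(ICharacterReader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            while (true) {
                sb.append(reader.GetNextChar());
            }
        } catch (EOFException ex) {
            // Expected: the reader has no more characters.
        } finally {
            reader.Dispose();
        }
        return sb.toString();
    }
}
```

Calling Dispose() in a finally block is a design choice of this sketch, so the reader is released even if GetNextChar fails with an unexpected runtime exception.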

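2. Count the word frequencies

Split the text into words with a regular expression, then count how often each word occurs. You can do this easily with the Stream API: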

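// Requires java.util.Arrays, java.util.Map, java.util.function.Function, java.util.stream.Collectors.
// "\\s+" splits on any run of whitespace; punctuation stays attached, so "times," differs from "times".
Map<String, Long> frequencies = Arrays.stream(stringData.split("\\s+"))
                                      .collect(Collectors.groupingBy(Function.identity(),
                                                                     Collectors.counting()));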
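
Putting both parts together gives the kind of class the question asks for: one that takes an ICharacterReader and returns the word frequencies. The following is a minimal sketch, not a definitive implementation. The class name WordFrequencyCounter and the method name count are hypothetical. It also assumes that words should be compared case-insensitively and that any run of characters other than a-z separates words, which goes beyond the whitespace split shown above.

```java
import java.io.EOFException;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

interface ICharacterReader {
    char GetNextChar() throws EOFException;
    void Dispose();
}

// Hypothetical class name; the original post does not name the counter class.
class WordFrequencyCounter {
    private final ICharacterReader reader;

    WordFrequencyCounter(ICharacterReader reader) {
        this.reader = reader;
    }

    // Reads every character from the reader, then counts the words.
    // Assumption: words are lower-cased and split on any non-letter run,
    // so "It" and "it" share one count and "times," counts as "times".
    Map<String, Long> count() {
        StringBuilder sb = new StringBuilder();
        try {
            while (true) {
                sb.append(reader.GetNextChar());
            }
        } catch (EOFException ex) {
            // Expected: end of input reached.
        } finally {
            reader.Dispose();
        }

        String text = sb.toString().toLowerCase(Locale.ROOT);
        return Arrays.stream(text.split("[^a-z]+"))
                // split can yield an empty first token if the text starts with a separator
                .filter(word -> !word.isEmpty())
                // TreeMap keeps the result sorted alphabetically by word
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }
}
```

With the reader from the question, this would be called as new WordFrequencyCounter(new SimpleCharacterReader()).count(). If punctuation or capitalization should be kept distinct, the toLowerCase call and the split pattern are the two places to change.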
