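How would I search for a user-specified word and count its occurrences in a text file using Java?

I've got to the point where I can read the file and print its text, but I'm not sure how to go on from there to search for a specific word and display how many times it occurs.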
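There are many ways to do this. If you read the file line by line, you can use the indexOf method of the String class to search for the text in each line. To find further occurrences in the same line, call it repeatedly, each time starting the search after the previous match.

See the documentation for indexOf at: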
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String,%20int)
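A minimal sketch of that loop, assuming you want non-overlapping matches; the class and method names (LineCounter, countInLine) are hypothetical. indexOf(String, int) returns the index of the next match at or after fromIndex, or -1 when there is none, so the loop moves past each match until it gets -1.

```java
public class LineCounter {

    // Hypothetical helper: counts non-overlapping occurrences of `word` in `line`
    // by calling String.indexOf(String, int) repeatedly.
    static int countInLine(String line, String word) {
        if (word.isEmpty()) {
            return 0; // an empty search string would match at every position
        }
        int count = 0;
        int from = 0;
        int index;
        while ((index = line.indexOf(word, from)) != -1) {
            count++;
            from = index + word.length(); // resume the search just after this match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countInLine("the cat and the hat", "the")); // prints 2
        System.out.println(countInLine("aaaa", "aa"));                 // prints 2 (non-overlapping)
    }
}
```

Calling this once per line read with BufferedReader.readLine() and summing the results gives the count for the whole file.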
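As I understand your question, if you are reading the text line by line, you can use recursion to count how many times the word appears in each line.
The following method counts the occurrences of a word within a single line: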
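```java
// Per-line counter; the caller (searchWord) must reset it to 0 before each new line.
private static int numberOfLineOccurences;

// Recursively counts non-overlapping occurrences of `word` in `line`.
public static int countNumberOfTimesInALine(String line, String word) {
    if (word.isEmpty()) {
        return 0; // guard: an empty word would otherwise recurse forever
    }
    int index = line.indexOf(word);
    if (index == -1) {
        return numberOfLineOccurences; // no further match in this part of the line
    }
    numberOfLineOccurences++;
    int next = index + word.length();
    if (next >= line.length()) {
        return numberOfLineOccurences; // the match ends at the end of the line
    }
    // Search again in the rest of the line, after the current match
    return countNumberOfTimesInALine(line.substring(next), word);
}
```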
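Because the method above accumulates into the static field numberOfLineOccurences, its result is only correct if the caller resets that field to 0 before every line. A stateless recursive variant avoids the shared field by returning 1 plus the count for the rest of the line. This is a sketch; RecursiveCounter and countRecursive are hypothetical names, not part of the original answer.

```java
public class RecursiveCounter {

    // Hypothetical stateless variant: no static counter, so calls are independent.
    static int countRecursive(String line, String word) {
        if (word.isEmpty()) {
            return 0; // guard: "" would match at index 0 forever
        }
        int index = line.indexOf(word);
        if (index == -1) {
            return 0; // no match left in this part of the line
        }
        // One match here, plus however many appear after it
        return 1 + countRecursive(line.substring(index + word.length()), word);
    }

    public static void main(String[] args) {
        String line = "a concept, another concept";
        System.out.println(countRecursive(line, "concept")); // prints 2
        System.out.println(countRecursive(line, "concept")); // prints 2 again: no leftover state
    }
}
```

Each match adds one stack frame, so for very long lines with many matches an iterative loop over indexOf(String, int) is the safer choice.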
To keep track of where the word first occurs in the file and how many times it occurs, I created a WordInfo class:
```java
class WordInfo {
    private int firstOccurenceLineNumber;
    private int firstOccurenceColumnNumber;
    private String word;
    private int numberOfOccurences;

    public String getWord() {
        return word;
    }

    public int getNumberOfOccurences() {
        return numberOfOccurences;
    }

    public WordInfo(String word) {
        this.word = word;
    }

    public void upOccurrence() {
        numberOfOccurences++;
    }

    public void upOccurrence(int numberOfTimes) {
        numberOfOccurences += numberOfTimes;
    }

    public int getFirstOccurenceLineNumber() {
        return firstOccurenceLineNumber;
    }

    public void setFirstOccurenceLineNumber(int firstOccurenceLineNumber) {
        this.firstOccurenceLineNumber = firstOccurenceLineNumber;
    }

    public int getFirstOccurenceColumnNumber() {
        return firstOccurenceColumnNumber;
    }

    public void setFirstOccurenceColumnNumber(int firstOccurenceColumnNumber) {
        this.firstOccurenceColumnNumber = firstOccurenceColumnNumber;
    }
}
```
Now I can write the searchWord method. It takes the word to look for, the file path, and a WordInfo object that it fills in:
```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Scans the file line by line, fills in wInfo, and returns true if the word occurs at least once.
public static boolean searchWord(String word, String filePath, WordInfo wInfo) throws IOException {
    boolean result = false;
    boolean firstOccurenceFound = false;
    int lineNumber = 0;
    // try-with-resources closes the reader even if readLine() throws
    try (BufferedReader reader = new BufferedReader(new FileReader(new File(filePath)))) {
        String line;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            numberOfLineOccurences = 0; // reset the per-line counter used by countNumberOfTimesInALine
            int index = line.indexOf(word);
            if (index != -1) {
                result = true;
                if (!firstOccurenceFound) {
                    firstOccurenceFound = true;
                    wInfo.setFirstOccurenceLineNumber(lineNumber);
                    wInfo.setFirstOccurenceColumnNumber(index + 1); // 1-based column
                }
                wInfo.upOccurrence(countNumberOfTimesInALine(line, word));
            }
        }
    }
    return result;
}
```
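Note that indexOf matches substrings, so searching for concept also counts the "concept" inside words such as "concepts" or "conceptual". If only whole words should count, one option (a swapped-in technique, not what the answer above uses) is a regular expression with word boundaries from java.util.regex. This sketch assumes the search word begins and ends with letters, digits, or underscores, since \b is defined relative to those characters; WholeWordCounter and countWholeWord are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordCounter {

    // Hypothetical helper: counts occurrences of `word` as a whole word only.
    // Pattern.quote escapes any regex metacharacters in the search word;
    // \b is a word boundary, so "concepts" does not match "concept".
    static int countWholeWord(String line, String word) {
        if (word.isEmpty()) {
            return 0; // an empty pattern would match at every boundary
        }
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher matcher = pattern.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "A concept, two concepts, one more concept.";
        System.out.println(countWholeWord(line, "concept")); // prints 2
    }
}
```

When scanning a whole file, compiling the Pattern once before the read loop, rather than once per line, avoids repeating the same work.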
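Here is an example, with the results below.
I have the following content in a file named DemoFile.txt:
Then I test the code with the following main method (in this example I am searching for the word concept):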
```java
public static void main(String[] args) throws IOException {
    // FILE_PATH: a constant defined elsewhere, holding the path to the input file (DemoFile.txt here)
    WordInfo wInfo = new WordInfo("concept");
    if (searchWord("concept", FILE_PATH, wInfo)) {
        System.out.println("Searching for " + wInfo.getWord());
        System.out.println("First line where found: " + wInfo.getFirstOccurenceLineNumber());
        System.out.println("First column found: " + wInfo.getFirstOccurenceColumnNumber());
        System.out.println("Number of occurrences: " + wInfo.getNumberOfOccurences());
    }
}
```
And I get the following results: