Counting occurrences of a specific string in a file

Question

This is the code I am working with:

while ((lineContents = tempFileReader.readLine()) != null)
{
    String lineByLine = lineContents.replaceAll("/\\.", System.getProperty("line.separator")); //for matching /. and replacing it by new line
    changer.write(lineByLine);
    Pattern pattern = Pattern.compile("\\r?\\n"); //Find new line
    Matcher matcher = pattern.matcher(lineByLine);
    while(matcher.find())
    {
        Pattern tagFinder = Pattern.compile("word"); //Finding the word required
        Matcher tagMatcher = tagFinder.matcher(lineByLine);
        while(tagMatcher.find())
        {
            score++;
        }
        scoreTracker.add(score);
        score = 0;
    }
}

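My sample input contains 6 lines, and the occurrences of word per line are [0,1,0,3,0,0]. So when I print scoreTracker (which is an ArrayList), I want the output above. Instead I get [4,4,4,4,4,4], which is the total number of occurrences of word rather than the per-line counts. Please help.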

Answer 1

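lineByLine refers to the full content of the file. That is why you get [4,4,4,4,4,4]. You need to store each line in a separate variable line and then search it with tagFinder.matcher(line). The final code would look like this: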

while ((lineContents = tempFileReader.readLine()) != null)
{
    String lineByLine = lineContents.replaceAll("/\\.", System.getProperty("line.separator")); //for matching /. and replacing it by new line
    changer.write(lineByLine);
    Pattern pattern = Pattern.compile(".*\\r?\\n"); //Find new line
    Matcher matcher = pattern.matcher(lineByLine);
    while(matcher.find())
    {
        Pattern tagFinder = Pattern.compile("word"); //Finding the word required
        //matcher.group() returns the input subsequence matched by the previous match.
        Matcher tagMatcher = tagFinder.matcher(matcher.group());
        while(tagMatcher.find())
        {
            score++;
        }
        scoreTracker.add(score);
        score = 0;
    }
}

Answer 2

Maybe this code can help you:

    String str = "word word\n \n word word\n \n word\n";
    Pattern pattern = Pattern.compile("(.*)\\r?\\n"); //Find new line
    Matcher matcher = pattern.matcher(str);
    while(matcher.find())
    {
        Pattern tagFinder = Pattern.compile("word"); //Finding the word required
        Matcher tagMatcher = tagFinder.matcher(matcher.group());
        int score = 0;
        while(tagMatcher.find())
        {
            score++;
        }
        System.out.print(score + " ");
    }

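The output is 2 0 2 0 1. It is not highly optimized, but your problem was that you never restricted the inner match and instead always scanned the whole string.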
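One way to restrict the inner match without creating a new matcher for every line is Matcher.region. The following is only a sketch under assumptions: the class name RegionCount and the helper countPerLine are made up for illustration, and Pattern.quote is added so the search word is treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionCount {
    // Hypothetical helper: counts occurrences of `word` in each
    // newline-terminated line of `text`.
    static List<Integer> countPerLine(String text, String word) {
        Pattern linePattern = Pattern.compile(".*\\r?\\n");         // one line plus its terminator
        Pattern wordPattern = Pattern.compile(Pattern.quote(word)); // treat `word` literally
        Matcher lineMatcher = linePattern.matcher(text);
        Matcher wordMatcher = wordPattern.matcher(text);            // reused for every line
        List<Integer> counts = new ArrayList<>();
        while (lineMatcher.find()) {
            // Limit the inner search to the span of the current line.
            wordMatcher.region(lineMatcher.start(), lineMatcher.end());
            int score = 0;
            while (wordMatcher.find()) {
                score++;
            }
            counts.add(score);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [2, 0, 2, 0, 1]
        System.out.println(countPerLine("word word\n \n word word\n \n word\n", "word"));
    }
}
```

Note that the line pattern only matches lines that end with a newline, so a final line without a terminator is not counted in this sketch.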

Answer 3

This is because you search the same string (lineByLine) every time. What you probably want is to search each line separately. I suggest you do this:

    Pattern tagFinder = Pattern.compile("word"); //Finding the word required
    for(String line : lineByLine.split("\\n"))
    {
        Matcher tagMatcher = tagFinder.matcher(line);
        while(tagMatcher.find())
            score++;
        scoreTracker.add(score);
        score = 0;
    }
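The split-based idea can be tried in isolation with a small self-contained sketch. The class name SplitCount, the helper countPerLine, and the sample text are assumptions made for illustration; the separator is widened to "\\r?\\n" so Windows line endings are handled as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitCount {
    // Hypothetical helper: splits `text` into lines and counts `word` in each one.
    static List<Integer> countPerLine(String text, String word) {
        Pattern tagFinder = Pattern.compile(Pattern.quote(word)); // compiled once, reused per line
        List<Integer> scoreTracker = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            Matcher tagMatcher = tagFinder.matcher(line);
            int score = 0; // local counter, so no manual reset is needed
            while (tagMatcher.find()) {
                score++;
            }
            scoreTracker.add(score);
        }
        return scoreTracker;
    }

    public static void main(String[] args) {
        String sample = "alpha\nword\nbeta\nword word word\ngamma\ndelta";
        // prints [0, 1, 0, 3, 0, 0]
        System.out.println(countPerLine(sample, "word"));
    }
}
```

String.split drops trailing empty strings by default, so empty lines at the very end of the text do not produce entries here.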

Answer 4

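The original code reads the input one line at a time with tempFileReader.readLine() and then uses matcher to look for line endings within each line. Since lineContents holds only a single line, matcher will never find a newline, so the rest of the code is skipped. Why do you need two different pieces of code to split the input into lines? You can remove one of them. For example: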

while ((lineContents = tempFileReader.readLine()) != null)
{
      Pattern tagFinder = Pattern.compile("word"); //Finding the word required
      Matcher tagMatcher = tagFinder.matcher(lineContents);
      while(tagMatcher.find())
      {
          score++;
      }
      scoreTracker.add(score);
      score = 0;

}

I tried the code above on Windows with a file test.txt read through a BufferedReader. For example:

BufferedReader tempFileReader = new BufferedReader(new FileReader("c:\\test\\test.txt"));

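For a file with the contents you describe, scoreTracker contains [0, 1, 0, 3, 0, 0]. If the sample input is an actual file as described and tempFileReader is a BufferedReader, then I do not understand how you got [4,4,4,4,4,4] from the original code. It would be useful to see the code that sets up tempFileReader.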
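For anyone who wants to reproduce this without a file on disk, here is a minimal sketch of the same loop. It assumes a StringReader stands in for the FileReader, and the class name ReaderCount, the helper countPerLine, and the sample text are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReaderCount {
    // Hypothetical helper: reads lines from `reader` and counts `word` in each one.
    static List<Integer> countPerLine(BufferedReader reader, String word) {
        Pattern tagFinder = Pattern.compile(Pattern.quote(word));
        List<Integer> scoreTracker = new ArrayList<>();
        try {
            String lineContents;
            // readLine() already strips the line terminator, so no second split is needed.
            while ((lineContents = reader.readLine()) != null) {
                Matcher tagMatcher = tagFinder.matcher(lineContents);
                int score = 0;
                while (tagMatcher.find()) {
                    score++;
                }
                scoreTracker.add(score);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scoreTracker;
    }

    public static void main(String[] args) throws IOException {
        String sample = "alpha\nword\nbeta\nword word word\ngamma\ndelta\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            // prints [0, 1, 0, 3, 0, 0]
            System.out.println(countPerLine(reader, "word"));
        }
    }
}
```

To run it against a real file, the StringReader can be replaced with a FileReader as in the line above.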

Answer 5

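You can use the Scanner class. Initialize the scanner on the input you want to search, set the text you are looking for as its delimiter, and then simply count the tokens the scanner finds.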

You can also initialize the Scanner directly from a FileInputStream.

The resulting code is only 9 lines:

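File file = new File(fileName);
Scanner scanner = new Scanner(file);
scanner.useDelimiter("your text here"); // the delimiter string is interpreted as a regular expression
int occurrences = 0; // a local variable must be initialized before it is incremented
while (scanner.hasNext()) {
    scanner.next(); // each token is the text between two delimiters, so this loop counts tokens
    occurrences++;
}
scanner.close();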
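Counting tokens between delimiters is not quite the same as counting matches (one delimiter in the middle of the text yields two tokens). If the goal is the number of occurrences itself, a possible variation is Scanner.findWithinHorizon, sketched below. This swaps in a different Scanner method than the delimiter approach described above; the class name ScannerCount and the helper countOccurrences are hypothetical, and a String source is used so the example runs without a file.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerCount {
    // Hypothetical helper: counts how often `text` occurs in the scanner's input.
    static int countOccurrences(Scanner scanner, String text) {
        Pattern target = Pattern.compile(Pattern.quote(text)); // match `text` literally
        int occurrences = 0;
        // A horizon of 0 means the search is unbounded; delimiters are ignored.
        while (scanner.findWithinHorizon(target, 0) != null) {
            occurrences++;
        }
        return occurrences;
    }

    public static void main(String[] args) {
        try (Scanner scanner = new Scanner("word a word\nword")) {
            // prints 3
            System.out.println(countOccurrences(scanner, "word"));
        }
    }
}
```

This gives a total for the whole input rather than per-line counts, which matches what the Scanner approach above is aiming for.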


Answer 1: 3, accepted, 2012-03-13 18:35:33
Answer 2: 1, 2012-03-13 18:36:03
Answer 3: 1, 2012-03-13 18:43:52
Answer 4: 1, 2012-03-13 18:51:44
Answer 5: 0, 2012-03-13 18:38:36
