
Java - Counting words, lines, and characters from a file

I'm trying to read words from a file. I need to count the words, lines, and characters in the text file. The word count should only include words made up entirely of alphabetic letters (no punctuation, spaces, or other non-alphabetic characters). The character count should only include the characters inside those words.

This is what I have so far. I'm unsure how to count the characters. Every time I run the program, it jumps to the catch block as soon as I enter the file name (and there should be no issue with the file path, as I've used it before). I tried writing the program without the try/catch to see what the error was, but it wouldn't work without it.

Why does it jump to the catch block when I enter the file name? How can I fix this program so that it correctly counts the words, lines, and characters in the text file?

I don't get any exception with your code if I give it a valid file name. To count characters, you need to modify the logic a little. Instead of only adding up the token count, create a new tokenizer for each line, StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");, then iterate through all the tokens and sum the length of each one. That gives you the number of characters. Note that the second argument is a set of delimiter characters, not a regular expression: [, ] and + are simply treated as extra delimiters. Something like this:

while (fileScan.hasNextLine()) {
    lineC++;
    tempo = fileScan.nextLine();
    StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");
    wordC += st.countTokens();
    while (st.hasMoreTokens()) {
        String stt = st.nextToken();
        System.out.println(stt); // Print each token to confirm the line is split as expected
        charC += stt.length();
    }
}
// Print the totals once, after every line has been read
System.out.println("Lines: " + lineC + "\nWords: " + wordC + "\nChars: " + charC);
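For reference, here is a self-contained sketch of the same tokenizer approach that compiles and runs as-is. The class name TokenizerCount, the count helper, and the delimiter string are illustrative assumptions; a String stands in for the file so the example needs no input. The brackets and + are dropped from the delimiter set, since StringTokenizer treats them as ordinary delimiter characters anyway.

```java
import java.util.Scanner;
import java.util.StringTokenizer;

public class TokenizerCount {
    // Hypothetical delimiter set: each character is a separator on its own,
    // because StringTokenizer does not interpret this string as a regex.
    private static final String DELIMS = " \t.,:;()?!";

    /** Returns {lines, words, chars}, where chars is the sum of token lengths. */
    static int[] count(Scanner fileScan) {
        int lineC = 0, wordC = 0, charC = 0;
        while (fileScan.hasNextLine()) {
            lineC++;
            StringTokenizer st = new StringTokenizer(fileScan.nextLine(), DELIMS);
            while (st.hasMoreTokens()) {
                wordC++;
                charC += st.nextToken().length();
            }
        }
        return new int[] {lineC, wordC, charC};
    }

    public static void main(String[] args) {
        // A String stands in for the file here; for a real file,
        // pass new Scanner(new File(name)) instead.
        int[] r = count(new Scanner("Hello, world!\nHow are you?"));
        System.out.println("Lines: " + r[0] + "\nWords: " + r[1] + "\nChars: " + r[2]);
    }
}
```

With the sample input it should print 2 lines, 5 words, and 19 characters. A token is whatever lies between delimiters, so something like abc123 still counts as one word; for the strict letters-only definition, a regular expression is a better fit.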

Note: escape sequences do not work with StringTokenizer. For example, you might expect "\\s" to delimit on any whitespace character, but StringTokenizer will instead split on the literal characters \ and s. If you need regular-expression delimiters, I suggest using java.util.regex.Pattern and java.util.regex.Matcher and calling matcher.find() to identify words and characters.
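A minimal sketch of the Pattern/Matcher alternative, assuming a word is a maximal run of ASCII letters ([A-Za-z]+), as the question describes. The class and method names are hypothetical. Each call to matcher.find() advances to the next word, and end() - start() gives that word's length, so the character count includes only the letters inside words.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWordCount {
    // Assumed definition of a word: one or more consecutive ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Returns {words, chars}, where chars counts only letters inside words. */
    static int[] countWordsAndChars(String text) {
        Matcher matcher = WORD.matcher(text);
        int words = 0, chars = 0;
        while (matcher.find()) {
            words++;
            chars += matcher.end() - matcher.start(); // length of this match
        }
        return new int[] {words, chars};
    }

    public static void main(String[] args) {
        int[] r = countWordsAndChars("It's 9 a.m., time to go!");
        System.out.println("Words: " + r[0] + "\nChars: " + r[1]);
    }
}
```

For the sample sentence this should report 7 words and 13 characters: the apostrophe and periods split It's into It and s, and a.m. into a and m.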

I tried your code but didn't get any exception. However, I suspect that when you entered the file name, you may have forgotten the file extension.
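To see why the program lands in the catch block, it helps to print the real reason instead of a generic message. The sketch below is one way to do that; the openOrReport helper and its missing-extension hint are assumptions for illustration, not part of the original code.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class OpenFileCheck {
    /** Opens the named file, or prints the actual reason and returns null. */
    static Scanner openOrReport(String name) {
        File file = new File(name.trim());
        try {
            return new Scanner(file);
        } catch (FileNotFoundException ex) {
            // Report the absolute path Java really looked at, not just "Error."
            System.out.println("Could not open " + file.getAbsolutePath() + ": " + ex.getMessage());
            if (!file.getName().contains(".")) {
                // Heuristic only: the name has no extension at all.
                System.out.println("Did you forget the file extension (for example .txt)?");
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Scanner userInput = new Scanner(System.in);
        System.out.println("Please enter the name of the file.");
        // nextLine() (unlike next()) also accepts names that contain spaces.
        Scanner fileScan = openOrReport(userInput.nextLine());
        if (fileScan != null) {
            System.out.println("File opened successfully.");
            fileScan.close();
        }
    }
}
```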

You probably forgot the file extension when entering the file name, but there is a much simpler way of doing this. You also mentioned that you don't know how to count the characters. You can try something like this:

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.stream.*;

public class WordCount
{
    public static void main(String[] args)
    {
        Scanner userInput = new Scanner(System.in);

        try {
            // Input file (Files.readString and Path.of require Java 11 or later)
            System.out.println("Please enter the name of the file.");
            String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
            System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
        }
        catch (IOException ex1) {
            System.out.println("Error.");
            System.exit(0);
        }
    }
}

Going through the code

import java.util.stream.*;

Note that we use the streams package to filter out empty strings when finding words. Now let's skip ahead a bit.

String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));

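The line above reads all of the text in the file and stores it in a single String. Files.readString throws an IOException if something goes wrong, for example a NoSuchFileException when the name is wrong (such as a missing extension) or a MalformedInputException when the file is not valid UTF-8, and that is what sends the program into the catch block.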
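If you want to tell these cases apart, you could catch the more specific exceptions before the general IOException. This is a sketch; the readOrExplain helper and the default file name input.txt are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.MalformedInputException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class ReadDiagnostics {
    /** Reads the whole file, or prints why it failed and returns null. */
    static String readOrExplain(Path path) {
        try {
            return Files.readString(path); // Java 11+, decodes the bytes as UTF-8
        } catch (NoSuchFileException ex) {
            System.out.println("File not found: " + ex.getFile() + " (check the name and extension)");
        } catch (MalformedInputException ex) {
            System.out.println("Not valid UTF-8; try Files.readString(path, charset) with the right charset");
        } catch (IOException ex) {
            System.out.println("I/O error: " + ex);
        }
        return null;
    }

    public static void main(String[] args) {
        // "input.txt" is only a placeholder default for this sketch.
        String content = readOrExplain(Path.of(args.length > 0 ? args[0] : "input.txt"));
        if (content != null) {
            System.out.println(content.length() + " characters read");
        }
    }
}
```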

System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());

Okay, this is a long line. Let's break it down.

"Lines: %d\nWords: %d\nCharacters: %d" is a format string, where each %d is replaced by the corresponding argument passed to printf. The first %d is replaced by content.split("\n").length, which is the number of lines: splitting the string on newline characters gives one element per line. Trailing empty strings are discarded, so a final newline does not add an extra line.

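The sketch below (hypothetical class LineCountDemo) compares split("\n").length with String.lines().count(), which is available from Java 11, on a few edge cases. The two agree on ordinary text but differ for an empty file and for text that uses a bare \r as the line separator.

```java
public class LineCountDemo {
    // Approach from the answer: split on '\n'. Trailing empty strings are dropped,
    // so a final newline adds nothing, but an empty string still yields 1.
    static int bySplit(String content) {
        return content.split("\n").length;
    }

    // Alternative (Java 11+): lines() recognizes \n, \r and \r\n as terminators
    // and returns an empty stream for an empty string.
    static long byLines(String content) {
        return content.lines().count();
    }

    public static void main(String[] args) {
        String[] samples = {"one\ntwo\n", "one\rtwo", ""};
        for (String s : samples) {
            System.out.println(bySplit(s) + " vs " + byLines(s));
        }
    }
}
```

It should print 2 vs 2, then 1 vs 2, then 1 vs 0.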
The second %d is replaced by Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(). Stream.of creates a stream from an array; here the array holds the pieces left after splitting on every non-alphabetic character (you said words contain only alphabetic characters, so anything else acts as a separator). Next, we filter out the empty strings, because String.split keeps them, for example between two consecutive separators such as ", ". Finally, .count() returns the number of words left after filtering.
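To see why the filter is needed, this sketch (hypothetical class SplitWordDemo) prints the raw result of the split for a short sample string and then the filtered count.

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class SplitWordDemo {
    // Same expression as in the answer: split on any non-letter, drop empties, count.
    static long countWords(String content) {
        return Stream.of(content.split("[^A-Za-z]"))
                .filter(x -> !x.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        String text = "Hi, there!";
        // The raw split keeps the empty string between ',' and ' '.
        System.out.println(Arrays.toString(text.split("[^A-Za-z]")));
        System.out.println(countWords(text));
    }
}
```

This should print [Hi, , there] followed by 2. Splitting on "[^A-Za-z]+" instead would merge consecutive separators, but a leading separator still produces one empty string, so the filter is worth keeping either way.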

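The third and last %d is the simplest: it is replaced by content.length(), the length of the whole string. Be aware that this counts every character in the file, including spaces, punctuation, digits, and line breaks, so it does not match your requirement of counting only the characters inside words.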
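One way to count only the characters inside words is to count the letters directly. This sketch (hypothetical class LetterCountDemo) treats only ASCII A-Z and a-z as word characters, matching the [^A-Za-z] split used for the word count, so the two counts stay consistent.

```java
public class LetterCountDemo {
    // Counts only ASCII letters, i.e. the characters that can appear inside a "word"
    // as the question defines it; spaces, digits, punctuation and newlines are skipped.
    static long countLetters(String content) {
        return content.chars()
                .filter(c -> (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
                .count();
    }

    public static void main(String[] args) {
        String content = "Hi, there!\nBye.";
        System.out.println(content.length());      // every character
        System.out.println(countLetters(content)); // letters inside words only
    }
}
```

For the sample it should print 15 and then 10. In the program above, the same filter expression could be passed to printf in place of content.length().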

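I left your catch block intact, but I think the System.exit(0) is redundant, since main ends right after the catch block anyway. If you do want to exit there, a non-zero status such as System.exit(1) is the usual way to signal failure.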

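Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.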

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM