Java - Counting words, lines, and characters from a file
I'm trying to read in words from a file. I need to count the words, lines, and characters in the text file. The word count should only include words (containing only alphabetic letters, with no punctuation, spaces, or other non-alphabetic characters). The character count should only include the characters inside those words.
This is what I have so far. I'm unsure how to count the characters. Every time I run the program, it jumps to the catch block as soon as I enter the file name (and the file path should not be the issue, as I've used it before). I tried writing the program without the try/catch to see what the error was, but it wouldn't work without it.
Why is it jumping to the catch block when I enter the file name? How can I fix this program to properly count words, lines, and characters in the text file?
I don't get any exception with your code if I give a proper file name. As for counting the characters, you should modify the logic a little. Instead of only adding the token count to the word total, create a new instance with StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+"); then iterate through all the tokens and sum the length of each one. This should give you the number of characters. Something like below:
while (fileScan.hasNextLine()) {
    lineC++;
    tempo = fileScan.nextLine();
    // Every character in the delimiter string is treated as a separate delimiter
    StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");
    wordC += st.countTokens();
    while (st.hasMoreTokens()) {
        String stt = st.nextToken();
        System.out.println(stt); // Display each token to confirm the line is split as expected
        charC += stt.length();
    }
    // Running totals after each line
    System.out.println("Lines: " + lineC + "\nWords: " + wordC + " \nChars: " + charC);
}
Note: Escape sequences will not work with StringTokenizer, because it treats the delimiter string as a set of literal characters rather than as a regular expression. For example, you would expect \\s to delimit on any whitespace character, but it will instead delimit on the literal character s. If you want regex-style matching, I suggest you use java.util.regex.Pattern and java.util.regex.Matcher, and call matcher.find() to identify words and characters.
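The Pattern/Matcher suggestion above could look roughly like the following. This is a minimal sketch, assuming a word is a maximal run of ASCII letters; the class name RegexWordCount and the method count are made up for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWordCount {
    // Assumption: a "word" is a maximal run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Returns {wordCount, charCount} for one line of text. */
    static int[] count(String line) {
        int words = 0;
        int chars = 0;
        Matcher matcher = WORD.matcher(line);
        // Each successful find() advances to the next run of letters.
        while (matcher.find()) {
            words++;
            chars += matcher.group().length();
        }
        return new int[] {words, chars};
    }

    public static void main(String[] args) {
        int[] result = count("Hello, world! It's 2024.");
        System.out.println("Words: " + result[0] + ", Chars: " + result[1]);
    }
}
```

Under this definition an apostrophe splits a word, so "It's" counts as two words ("It" and "s"). Whether that is acceptable depends on how strictly "alphabetic only" is meant.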
I tried your code but I didn't get any exception here. However, I suspect that when you entered the file name, you may have forgotten the file's extension.
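One way to check this suspicion is to look at the path before reading it and to print the actual exception instead of a generic message. The following is only a sketch: the class name FileCheck, the helper describeProblem, and the default file name input.txt are hypothetical, and it assumes Java 11 or later for Path.of and Files.readString.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FileCheck {
    /** Returns a short description of the problem, or null if the path looks readable. */
    static String describeProblem(Path path) {
        if (!Files.exists(path)) {
            return "No such file: " + path.toAbsolutePath();
        }
        if (Files.isDirectory(path)) {
            return "Path is a directory: " + path.toAbsolutePath();
        }
        if (!Files.isReadable(path)) {
            return "File is not readable: " + path.toAbsolutePath();
        }
        return null;
    }

    public static void main(String[] args) {
        // "input.txt" is a hypothetical default used only for illustration.
        Path path = Path.of(args.length > 0 ? args[0] : "input.txt");
        String problem = describeProblem(path);
        if (problem != null) {
            System.out.println(problem);
            return;
        }
        try {
            String content = Files.readString(path);
            System.out.println("Read " + content.length() + " characters.");
        } catch (IOException ex) {
            // Printing the exception itself shows why the read failed.
            System.out.println("Could not read file: " + ex);
        }
    }
}
```

Printing the absolute path makes a missing extension such as "notes" instead of "notes.txt" easy to spot.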
You probably forgot the file extension when entering the file name, but there is a much simpler way of doing this. You also mention that you don't know how to count the characters. You can try something like this:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.stream.*;

public class WordCount
{
    public static void main(String[] args)
    {
        Scanner userInput = new Scanner(System.in);
        try {
            // Input file
            System.out.println("Please enter the name of the file.");
            // Files.readString and Path.of require Java 11 or later
            String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
            System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
        }
        catch (IOException ex1) {
            System.out.println("Error.");
            System.exit(0);
        }
    }
}
import java.util.stream.*;
Note that we use the streams package to filter out empty strings while finding words. Now let's skip forward a bit.
String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
The above line reads all of the text in the file and stores it as a single string.
System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
Okay, this is a long line. Let's break it down.
"Lines: %d\nWords: %d\nCharacters: %d" is a format string, where each %d is replaced with the corresponding argument passed to printf. The first %d is replaced by content.split("\n").length, which is the number of lines. We get the number of lines by splitting the string on newline characters.
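Counting lines by splitting has a few edge cases worth knowing about: String.split drops trailing empty strings, and an empty string still yields a one-element array. The sketch below compares it with String.lines(), which is available from Java 11. The class and method names are made up for illustration.

```java
class LineCountDemo {
    // Line count as used in the answer: split on "\n" and take the array length.
    static int bySplit(String content) {
        return content.split("\n").length;
    }

    // Alternative (Java 11+): String.lines() streams the lines of the string.
    static long byLines(String content) {
        return content.lines().count();
    }

    public static void main(String[] args) {
        String[] samples = {"a\nb\n", "a\n\n", ""};
        for (String sample : samples) {
            System.out.println(bySplit(sample) + " vs " + byLines(sample));
        }
    }
}
```

The two approaches agree on ordinary text such as "a\nb\n", but differ when the file ends in blank lines or is empty, so which one is "right" depends on how blank trailing lines should be counted.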
The second %d is replaced by Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(). Stream.of creates a stream from an array, and here the array holds the strings you get after splitting on every non-alphabetic character (you said words contain only alphabetic characters). Next, we filter out all the empty strings, since String.split produces an empty string wherever two delimiters are adjacent. The .count() is self-explanatory: it returns the number of words left after filtering.
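To see why the filter step is needed, it helps to print what the split returns before and after filtering. This is a small sketch; the class name SplitDemo, the method words, and the sample text are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SplitDemo {
    /** Splits on every non-letter and drops the empty strings left between adjacent delimiters. */
    static List<String> words(String text) {
        return Stream.of(text.split("[^A-Za-z]"))
                .filter(x -> !x.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Hi, there!  Bye";
        // The raw split contains empty strings, e.g. between "," and the following space.
        System.out.println(Arrays.toString(text.split("[^A-Za-z]")));
        // After filtering, only the actual words remain.
        System.out.println(words(text));
    }
}
```

For this sample the raw split has six elements, three of them empty, while the filtered list has only the three words.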
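The third and last %d is the simplest. It is replaced by the length of the string; content.length() should be self-explanatory. Note that this counts every character in the file, including spaces, punctuation, and line breaks. To count only the characters inside words, sum the lengths of the filtered words instead.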
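The question asks for only the characters inside words, which content.length() does not give. A minimal sketch of a letters-only count, reusing the same split expression, might look like this. The class name WordCharCount and the method countWordChars are hypothetical.

```java
import java.util.stream.Stream;

class WordCharCount {
    /** Sums the lengths of all runs of ASCII letters in the text. */
    static long countWordChars(String content) {
        return Stream.of(content.split("[^A-Za-z]"))
                .filter(x -> !x.isEmpty())
                .mapToLong(String::length)
                .sum();
    }

    public static void main(String[] args) {
        String content = "One, two.\nThree!";
        System.out.println("All characters: " + content.length());
        System.out.println("Characters in words: " + countWordChars(content));
    }
}
```

For the sample text, the total length is 16 while the letters inside words add up to 11; the difference is the punctuation, the space, and the line break.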
I left your catch block intact, but I feel like the System.exit(0) is a bit redundant.