
Java - Counting words, lines, and characters from a file

I'm trying to read in words from a file. I need to count the words, lines, and characters in the text file. The word count should only include words (containing only alphabetic letters, no punctuation, spaces, or non-alphabetic characters). The character count should only include the characters inside those words.

This is what I have so far. I'm unsure of how to count the characters. Every time I run the program, it jumps to the catch mechanism as soon as I enter the file name (and it should have no issues with the file path, as I've tried using it before). I tried to create the program without the try/catch to see what the error was, but it wouldn't work without it.

Why is it jumping to the catch function when I enter the file name? How can I fix this program to properly count words, lines, and characters in the text file?

I don't get any exception with your code when I give it a valid file name. To count the characters, change the logic slightly. Instead of only adding the token count to the word total, create a tokenizer for each line, StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+"); then iterate through all of its tokens and sum the length of each one. That gives you the number of characters. Something like this:

while (fileScan.hasNextLine()) {
    lineC++;
    tempo = fileScan.nextLine();
    // StringTokenizer treats every character in the second argument as a delimiter
    StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");
    wordC += st.countTokens();
    while (st.hasMoreTokens()) {
        String stt = st.nextToken();
        System.out.println(stt); // Display each token to confirm the line is split as expected
        charC += stt.length();
    }
}
// Print the totals once, after the whole file has been read
System.out.println("Lines: " + lineC + "\nWords: " + wordC + "\nChars: " + charC);

Note: Regex escapes do not work with StringTokenizer, because it treats the delimiter string as a set of literal characters rather than as a regular expression. For example, you might expect "\\s" to split on any whitespace character, but it splits on the literal characters \ and s instead. If you need regex character classes, I suggest using java.util.regex.Pattern and java.util.regex.Matcher, calling matcher.find() to identify words and characters.
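As a rough sketch of the Pattern/Matcher route mentioned in the note, the following counts words and their letters in a single line. It assumes a "word" is a maximal run of ASCII letters; the class name RegexWordCount and the method countWordsAndLetters are made up for illustration and are not part of the original program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWordCount {
    // Assumption: a "word" is a maximal run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns {wordCount, letterCount} for one line of text.
    static int[] countWordsAndLetters(String line) {
        int words = 0;
        int letters = 0;
        Matcher matcher = WORD.matcher(line);
        while (matcher.find()) {
            words++;                             // one more run of letters
            letters += matcher.group().length(); // add the length of that run
        }
        return new int[] {words, letters};
    }

    public static void main(String[] args) {
        int[] counts = countWordsAndLetters("Hello, world! It's 2024.");
        System.out.println("Words: " + counts[0] + ", letters: " + counts[1]);
    }
}
```

Under this definition an apostrophe splits a word, so "It's" counts as two words ("It" and "s"). Whether that is acceptable depends on how strictly the assignment defines a word.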

I tried your code and did not get any exception. I suspect that when you entered the file name, you left off the file extension.
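To find out why the catch block is reached, it may help to check the path before reading and to print the actual exception instead of a generic message. The sketch below is one way to do that, assuming Java 11 or later for Path.of; the class FileCheck, the helper whyUnreadable, and the default file name input.txt are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FileCheck {
    // Hypothetical helper: returns null if the path looks readable, otherwise a short reason.
    static String whyUnreadable(Path path) {
        if (!Files.exists(path)) {
            return "No such file: " + path.toAbsolutePath();
        }
        if (Files.isDirectory(path)) {
            return "Path is a directory: " + path.toAbsolutePath();
        }
        if (!Files.isReadable(path)) {
            return "File is not readable: " + path.toAbsolutePath();
        }
        return null;
    }

    public static void main(String[] args) {
        // "input.txt" is only a placeholder default for this sketch.
        Path path = Path.of(args.length > 0 ? args[0] : "input.txt");
        String problem = whyUnreadable(path);
        if (problem != null) {
            System.out.println(problem);
            return;
        }
        try {
            System.out.println(Files.readAllLines(path).size() + " lines");
        } catch (IOException ex) {
            // Printing the exception shows the real cause.
            System.out.println("Could not read file: " + ex);
        }
    }
}
```

Printing the absolute path makes a missing extension easy to spot, because the message shows exactly which file the program tried to open.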

You probably forgot the file extension while giving input, but there is a much simpler way of doing this. You also mention you don't know how to count the characters. You can try something like this:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.stream.*;

public class WordCount
{
    public static void main(String[] args)
    {
        Scanner userInput = new Scanner(System.in);

        try {
            // Input file
            System.out.println("Please enter the name of the file.");
            // Files.readString and Path.of require Java 11 or later
            String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
            System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
        }
        catch (IOException ex1) {
            System.out.println("Error.");
            System.exit(0);
        }
    }
}

Going through the code

import java.util.stream.*;

Note we use the streams package, for filtering out empty strings while finding words. Now let's skip forward a bit.

String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));

The above part gets all of the text in the file and stores it as a string.

System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());

Okay, this is a long line. Let's break it down.

"Lines: %d\nWords: %d\nCharacters: %d" is a format string, where each %d is replaced by the corresponding argument passed to printf. The first %d is replaced by content.split("\n").length, which is the number of lines: splitting the string on newline characters yields one array element per line.

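One caveat about counting lines with split: String.split drops trailing empty strings, so blank lines at the end of the file are not counted, and an empty file is reported as one line. The sketch below compares it with String.lines(), available since Java 11; the class and method names are invented for this example.

```java
class LineCountDemo {
    // Line count as computed in the answer: split on newline, take the array length.
    static int bySplit(String content) {
        return content.split("\n").length;
    }

    // Alternative: String.lines() (Java 11+) streams the lines, including trailing blank ones.
    static long byLines(String content) {
        return content.lines().count();
    }

    public static void main(String[] args) {
        String[] samples = {"a\nb\n", "a\n\n\n", ""};
        for (String sample : samples) {
            System.out.println(bySplit(sample) + " vs " + byLines(sample));
        }
    }
}
```

For ordinary text files that end with a single newline, the two approaches agree. They differ only for trailing blank lines and for empty input.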
The second %d is replaced by Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(). Stream.of creates a stream from an array, and here the array holds the strings left after splitting on every non-alphabetic character (you said words contain only alphabetic letters). Next, we filter out the empty strings, since String.split produces one between any two adjacent delimiters. Finally, .count() returns the number of words left after filtering.
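To see why the filter step is needed, here is a small sketch that applies the same split and filter to a sample string and collects the surviving words into a list. The class name SplitDemo and the method words are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SplitDemo {
    // Same split-and-filter steps as in the answer, but collecting the words instead of counting them.
    static List<String> words(String text) {
        return Stream.of(text.split("[^A-Za-z]"))
                .filter(x -> !x.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Hi, there... friend!";
        // The raw split contains empty strings wherever two delimiters are adjacent.
        System.out.println(Arrays.toString(text.split("[^A-Za-z]")));
        System.out.println(words(text));
    }
}
```

The first line of output shows the empty strings produced between adjacent delimiters such as ", " and "... "; the second shows only the three words.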

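The third and last %d is the simplest: it is replaced by content.length(), the length of the whole string. Note that this counts every character in the file, including spaces, punctuation, and line breaks, so it will be larger than the count of letters inside words that the question asks for.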
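If the character count should include only the letters inside words, as the question describes, one possible adjustment is to sum the lengths of the pieces produced by the same split. This is a sketch under that reading of the requirement; LetterCount and lettersInWords are names invented here.

```java
import java.util.stream.Stream;

class LetterCount {
    // Sums the lengths of the alphabetic runs; empty strings from split contribute 0.
    static int lettersInWords(String content) {
        return Stream.of(content.split("[^A-Za-z]"))
                .mapToInt(String::length)
                .sum();
    }

    public static void main(String[] args) {
        String content = "Hi, there!\n";
        System.out.println("All characters: " + content.length());
        System.out.println("Letters in words: " + lettersInWords(content));
    }
}
```

No filter is needed here, because an empty string adds nothing to the sum.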

I left your catch block intact, but the System.exit(0) is redundant, since main returns right after the catch block anyway.
