
How to correctly identify words when reading from a file with Java's Scanner?

I'm trying to do an exercise where I need to create a class that reads the words from a .txt file and puts them in a HashSet. The problem is that if the text reads "I am Daniel, Daniel I am.", I end up with separate entries for "am" and "am.", and for "Daniel," and "Daniel". How do I fix this?

Here's my code. (I tried to use regex, but I'm getting an exception):

import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;

public class WordCount {

    public static void main(String[] args) {
        try {
            File file = new File(args[0]);
            HashSet<String> set = readFromFile(file);
            set.forEach(word -> System.out.println(word));
        }
        catch(FileNotFoundException e) {
            System.err.println("File Not Found!");
        }

    }

    private static HashSet<String> readFromFile(File file) throws FileNotFoundException {
        HashSet<String> set = new HashSet<String>();
        Scanner scanner = new Scanner(file);
        while(scanner.hasNext()) {
            String s = scanner.next("[a-zA-Z]");
            set.add(s.toUpperCase());
        }
        scanner.close();
        return set;
    }


}

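The exception is thrown when the Scanner reaches a token that does not match the regex. next(pattern) returns the next token only if the whole token matches, and "[a-zA-Z]" matches exactly one letter, so a token such as "am" or "Daniel," makes this call throw an InputMismatchException: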

String s = scanner.next("[a-zA-Z]");
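To see the failure in isolation, here is a minimal sketch that uses a String as the Scanner source instead of a file. The class and method names (NextPatternDemo, rejectsFirstToken) are made up for illustration and are not part of the question's code.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class NextPatternDemo {

    // Returns true if scanner.next(pattern) rejects the first token of the input.
    static boolean rejectsFirstToken(String input, String pattern) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.next(pattern);
            return false;
        } catch (InputMismatchException e) {
            // The token was available but did not match the pattern as a whole.
            return true;
        }
    }

    public static void main(String[] args) {
        // "[a-zA-Z]" matches exactly one letter, so only one-letter tokens pass.
        System.out.println(rejectsFirstToken("I am", "[a-zA-Z]"));  // false: "I" is a single letter
        System.out.println(rejectsFirstToken("am I", "[a-zA-Z]"));  // true: "am" has two letters
        // Even with "+", trailing punctuation keeps the token from matching.
        System.out.println(rejectsFirstToken("am.", "[a-zA-Z]+"));  // true: "." is not a letter
    }
}
```

This also explains why the loop in the question gets past the first word: hasNext() accepts any token, "I" happens to match, and the failure only appears at "am".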

Instead of passing the regex to the Scanner, read the whole token and then strip the non-letter characters, as shown below.

String s = scanner.next();
s = s.replaceAll("[^a-zA-Z]", "");
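Put together, the reading loop might look like the sketch below. It is one possible arrangement, not the only one: the class name WordSet and the method readWords are hypothetical, a String stands in for the file so the example runs on its own, and the empty-string check is an addition that the two lines above do not include.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;

class WordSet {

    // Collects the distinct words from any Scanner source, ignoring case and punctuation.
    static Set<String> readWords(Scanner scanner) {
        Set<String> words = new HashSet<>();
        while (scanner.hasNext()) {
            // Take the raw whitespace-delimited token, then drop every non-letter.
            String word = scanner.next().replaceAll("[^a-zA-Z]", "");
            // Tokens such as "--" or "42" become empty after stripping and are skipped.
            if (!word.isEmpty()) {
                words.add(word.toUpperCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // In the question's program this would be new Scanner(file).
        try (Scanner scanner = new Scanner("I am Daniel, Daniel I am.")) {
            // The set holds I, AM and DANIEL; HashSet does not guarantee an order.
            readWords(scanner).forEach(System.out::println);
        }
    }
}
```

Note that stripping characters also joins the parts of tokens like "don't" into "dont"; whether that is acceptable depends on what the exercise counts as a word.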
