Read words from a file and count the number of unique words

I am supposed to read from a file, count the total number of words, and then count the number of unique words. For example, "I am Happy" has 3 unique words.

I tried doing this with a HashMap, but I get an error when running it, and I don't think I was supposed to use a HashMap for this exercise anyway. Is there a way to read from a file and count the number of unique words with just arrays and ArrayLists?

Error: Exception in thread "main" java.lang.NullPointerException

Here's my code using hash maps that doesn't work:

```java
public static void main(String[] args) throws IOException {
    Scanner in = new Scanner(new File("Lincoln.txt"));
    int totalWords = 0;

    while (in.hasNext()) {
        String word = in.next();
        String[] spaces = word.split(" ");
        String[] comma = word.split(",");

        totalWords++;
    }
    System.out.println("The number of words are " + totalWords);

    Map<String, Integer> words = new HashMap<String, Integer>();
    countWords("D:\\Desktop\\CPS\\Lab11\\Lincoln.txt", words);
    in.close();
}

public static void countWords(String filename, Map<String, Integer> words) throws FileNotFoundException {
    Scanner file = new Scanner(new File(filename));
    while (file.hasNext()) {
        String word = file.next();
        int count = words.get(word);

        if (count != 0) {
            count++;
        }
        else {
            count = 1;
            words.put(word, count);
        }
    }
    file.close();
}
```

> Is there a way to read in from a file and count the number of unique characters with just array's and ArrayList's?

Your question is confusing. First you talk of words, then you hop over to characters. Which one is it?

Counting unique characters with an array is possible, if we go back to the late '80s and pretend we live in a world where only ASCII characters exist.
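For the ASCII-only case, here is a minimal sketch of the array approach: one boolean slot per character code 0 to 127, set the first time that character shows up. The class and method names (`UniqueAsciiChars`, `countUniqueAscii`) are hypothetical, and the sketch assumes the input is pure ASCII; it throws on anything else rather than silently miscounting.

```java
public class UniqueAsciiChars {
    // Counts distinct characters in text, assuming every char is ASCII (0-127).
    static int countUniqueAscii(String text) {
        boolean[] seen = new boolean[128]; // one flag per ASCII code
        int unique = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 128) {
                throw new IllegalArgumentException("Non-ASCII character: " + c);
            }
            if (!seen[c]) { // first time we see this character
                seen[c] = true;
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(countUniqueAscii("I am Happy")); // prints 7 (the space counts too)
    }
}
```

This only works because the alphabet is tiny and fixed; a Unicode `char` alone has 65,536 possible values, and words have no fixed alphabet at all.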

Counting unique words with an array or ArrayList, or counting unique characters in a Unicode world, is not practical in the slightest. You can of course do it, but only by using those lists to hand-roll a crappy implementation of a hash map, or by writing an excruciatingly inefficient algorithm (for example, scanning the whole list of words seen so far for every new word, which takes quadratic time).
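Since the question explicitly asks about `ArrayList`, here is a minimal sketch of the inefficient-but-simple route: keep an `ArrayList` of words seen so far and call `contains` before adding. Each `contains` is a linear scan, so the whole thing is O(n²) in the number of words. That is fine for a short speech and slow for a large file. The class name is hypothetical, and the sketch assumes words are whitespace-separated and compared case-sensitively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class UniqueWordsList {
    // Returns {totalWords, uniqueWords} using only an ArrayList (O(n^2) overall).
    static int[] countWords(Scanner in) {
        List<String> seen = new ArrayList<>();
        int total = 0;
        while (in.hasNext()) {
            String word = in.next();
            total++;
            if (!seen.contains(word)) { // linear scan over every word seen so far
                seen.add(word);
            }
        }
        return new int[] {total, seen.size()};
    }

    public static void main(String[] args) {
        // A String source keeps the example self-contained; new Scanner(new File("Lincoln.txt"))
        // works the same way, but its constructor throws FileNotFoundException.
        int[] counts = countWords(new Scanner("I am Happy I am"));
        System.out.println("Total: " + counts[0] + ", unique: " + counts[1]); // Total: 5, unique: 3
    }
}
```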

So let's just assume that you are, in fact, meant to use maps for this.

There are a bunch of code style issues with this code (such as repeating Lincoln.txt, once as a relative path and once as an absolute path). Your 'the number of words are' counter is also broken: you split on spaces (useless; Scanner already does that) and on commas (useful), but then do absolutely nothing with the results of those split operations. Presumably you want something like totalWords += comma.length in place of totalWords++.

Or get rid of that aspect entirely: define 'a word' as 'stuff separated by whitespace' and forget about commas. If you don't want to forget about commas, update the delimiter of your scanner and tell it that words are the things between whitespace OR commas: scanner.useDelimiter("[\\s,]+"). That regexp means: a delimiter is any sequence of one or more characters, each of which is either whitespace or a comma. (A narrower pattern such as "[ ,]+" only covers plain spaces, so a line break would no longer separate two words.)
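A minimal sketch of the delimiter approach, assuming 'a word' means a run of characters between whitespace and/or commas. It uses `[\\s,]+` rather than `[ ,]+` so that line breaks and tabs still separate words. The class name is hypothetical; only `Scanner.useDelimiter`, `hasNext` and `next` come from the JDK.

```java
import java.util.Scanner;

public class TotalWords {
    // Counts words, treating any run of whitespace and/or commas as one separator.
    static int countTotal(Scanner in) {
        in.useDelimiter("[\\s,]+");
        int total = 0;
        while (in.hasNext()) {
            in.next(); // leading delimiters are skipped before each token
            total++;
        }
        return total;
    }

    public static void main(String[] args) {
        String text = "Four score, and seven\nyears ago";
        System.out.println(countTotal(new Scanner(text))); // prints 6
    }
}
```

With `[ ,]+` the same input would yield 5, because `seven\nyears` would come back as a single token.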

But the bug is this line:

```java
int count = words.get(word);
```

words starts out empty, so initially words.get(word) asks the map for the value associated with a key that is not yet in the map. In that case, get returns null. You then assign the result to an int, a primitive that cannot hold null, so Java auto-unboxes the value by invoking .intValue() on whatever words.get(word) returned. That causes the NullPointerException you observe, because calling a method on a null reference does exactly that. What you really wanted was: "Hey, words map? Please give me the Integer associated with the key word, but if you don't have a mapping for it in the first place, don't return null; return 0 instead. Thanks!"

Which is possible and easy:

```java
int count = words.getOrDefault(word, 0);
```
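A tiny sketch that reproduces the failure in isolation and contrasts it with `getOrDefault`: `get` on a missing key returns `null`, assigning that to an `int` unboxes it and throws, while `getOrDefault` hands back the default. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MissingKeyDemo {
    // Returns true if unboxing get()'s result for this key throws a NullPointerException.
    static boolean getThenUnboxThrows(Map<String, Integer> words, String word) {
        try {
            int count = words.get(word); // compiled as words.get(word).intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> words = new HashMap<>();
        System.out.println(words.get("Lincoln"));                 // null: no mapping yet
        System.out.println(getThenUnboxThrows(words, "Lincoln")); // true
        System.out.println(words.getOrDefault("Lincoln", 0));     // 0: the default is used instead
    }
}
```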

Note also that you then write 1 into the map if the word wasn't there yet, but do nothing if it was: count++ is not going to change the map, because Java is pass-by-value everywhere. The count you get from calling words.get(word) is a copy. Modifying it does nothing to the map; you have to put the updated value back with words.put(word, count).
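Putting `getOrDefault` and the re-`put` together gives a minimal sketch of a corrected `countWords`. The `countWords(String, Map)` overload keeps the question's signature and only opens the file; the `Scanner` overload and the `main` reading from a `String` are hypothetical additions so the logic can be run without `Lincoln.txt`.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class CountWords {
    // Adds one to the stored count for each word the scanner produces.
    static void countWords(Scanner file, Map<String, Integer> words) {
        while (file.hasNext()) {
            String word = file.next();
            int count = words.getOrDefault(word, 0); // 0 instead of null for unseen words
            count++;                                  // changes only this local copy...
            words.put(word, count);                   // ...so write it back to the map
        }
    }

    // Same signature as in the question: opens the file, delegates, and closes it.
    public static void countWords(String filename, Map<String, Integer> words) throws FileNotFoundException {
        try (Scanner file = new Scanner(new File(filename))) {
            countWords(file, words);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> words = new HashMap<>();
        countWords(new Scanner("I am Happy I am"), words);
        System.out.println("Unique words: " + words.size()); // Unique words: 3
        System.out.println(words.get("am"));                 // 2
    }
}
```

The number of unique words is simply `words.size()` once the loop finishes.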

If you want, you can do the whole update in a single call to merge, but that gets into lambdas, which are probably beyond your current level.
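For completeness, a sketch of the `merge` variant. `Map.merge(key, value, remappingFunction)` stores `value` when the key is absent and otherwise stores `remappingFunction.apply(oldValue, value)`; `Integer::sum` is a method reference standing in for the lambda `(a, b) -> a + b`. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class MergeCount {
    // Builds a word -> count map with a single merge call per word.
    static Map<String, Integer> countWords(Scanner in) {
        Map<String, Integer> words = new HashMap<>();
        while (in.hasNext()) {
            // Absent: store 1. Present: store oldCount + 1.
            words.merge(in.next(), 1, Integer::sum);
        }
        return words;
    }

    public static void main(String[] args) {
        Map<String, Integer> words = countWords(new Scanner("I am Happy I am"));
        System.out.println("Unique words: " + words.size()); // Unique words: 3
        System.out.println(words.get("I"));                  // 2
    }
}
```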
