How to count unique words in a text file?

I have implemented code that counts the number of characters, words, lines, and bytes in a text file. But how do I count the dictionary size, i.e. the number of different words used in the file? Also, how can I implement an iterator that iterates over letters only (ignoring whitespace)?

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class wc {
    public static void main(String[] args) throws IOException {
        // counters
        int charsCount = 0;
        int wordsCount = 0;
        int linesCount = 0;

        File file = new File("Sample.txt");

        // try-with-resources closes the scanner and the underlying reader
        try (Scanner scanner = new Scanner(new BufferedReader(new FileReader(file)))) {

            while (scanner.hasNextLine()) {
                // trim first so leading whitespace does not produce an empty first token
                String tmpStr = scanner.nextLine().trim();
                if (!tmpStr.isEmpty()) {
                    // characters are counted without whitespace
                    String replaceAll = tmpStr.replaceAll("\\s+", "");
                    charsCount += replaceAll.length();
                    wordsCount += tmpStr.split("\\s+").length;
                }
                ++linesCount;
            }

            System.out.println("# of chars: " + charsCount);
            System.out.println("# of words: " + wordsCount);
            System.out.println("# of lines: " + linesCount);
            System.out.println("# of bytes: " + file.length());
        }
    }
}

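To get the unique words and how many of them there are:
1. Split each line read from the file into a string array.
2. Add the contents of that array to a HashSet.
3. Repeat steps 1 and 2 until the end of the file.
4. The HashSet now holds the unique words, and its size() is the number of different words. (If you also need how often each word occurs, use a HashMap from word to count instead.)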

I prefer to post the logic and pseudocode, since it helps the OP learn by solving the problem themselves.
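For readers who want to check their own attempt, the four steps above could be sketched roughly as follows. This is only one possible reading of them: the class name `UniqueWordCounter` and the helper `uniqueWords` are made up for illustration, the file name `Sample.txt` is taken from the question, and treating words case-insensitively is an assumption the steps do not state.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class UniqueWordCounter {

    // Steps 1-3: split every line on whitespace and collect the tokens in a set.
    static Set<String> uniqueWords(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                // A blank line splits into a single empty token; skip it.
                if (!token.isEmpty()) {
                    // Assumption: "The" and "the" count as the same word.
                    words.add(token.toLowerCase(Locale.ROOT));
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("Sample.txt"), StandardCharsets.UTF_8);
        // Step 4: the size of the set is the number of different words.
        System.out.println("# of unique words: " + uniqueWords(lines).size());
    }
}
```

Note that punctuation is not stripped here, so "word," and "word" would be counted as two different words. Whether that matters depends on the input.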

Hey @JeyKey, you can use a HashMap. I am using an Iterator here too. Take a look at this code.

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Scanner;

    public class CountUniqueWords {

        public static void main(String[] args) throws FileNotFoundException {

            File f = new File("File Name");
            List<String> arr = new ArrayList<>();
            Map<String, Integer> listOfWords = new HashMap<>();

            // Read every whitespace-separated token from the file
            try (Scanner in = new Scanner(f)) {
                while (in.hasNext()) {
                    arr.add(in.next());
                }
            }

            // Count how many times each word occurs
            Iterator<String> itr = arr.iterator();
            while (itr.hasNext()) {
                listOfWords.merge(itr.next(), 1, Integer::sum);
                //System.out.println(listOfWords);    //for printing the words
            }

            // Each distinct word is exactly one key in the map
            System.out.println("The number of unique words: " + listOfWords.size());
        }
    }
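The second part of the question, iterating over letters only, could be approached with a small custom `Iterator<Character>`. The following is a minimal sketch, not a definitive implementation: the class name `LetterIterator` is hypothetical, and it assumes "letter" means whatever `Character.isLetter(char)` accepts, so digits and punctuation are skipped along with whitespace, and characters outside the Basic Multilingual Plane are not handled.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LetterIterator implements Iterator<Character> {
    private final String text;
    private int pos = 0;

    public LetterIterator(String text) {
        this.text = text;
        skipNonLetters();
    }

    // Advance pos to the next letter, or to the end of the text.
    private void skipNonLetters() {
        while (pos < text.length() && !Character.isLetter(text.charAt(pos))) {
            pos++;
        }
    }

    @Override
    public boolean hasNext() {
        return pos < text.length();
    }

    @Override
    public Character next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        char c = text.charAt(pos++);
        skipNonLetters();
        return c;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        Iterator<Character> it = new LetterIterator("a b, c1\td");
        while (it.hasNext()) {
            sb.append(it.next());
        }
        System.out.println(sb); // prints "abcd"
    }
}
```

To use it on a file, one option would be to create one iterator per line read by the Scanner.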


 