Calculating word frequency using StreamTokenizer, HashMap, and HashSet in core Java

import java.io.*;
import java.util.*;
class A {
    public static void main(String args[]) throws Exception {
        Console con = System.console();
        String str;
        int i=0;

        HashMap map = new HashMap();
        HashSet set = new HashSet();

        System.out.println("Enter File Name : ");
        str = con.readLine();
        File f = new File(str);
        f.createNewFile();

        FileInputStream fis = new FileInputStream(str);
        StreamTokenizer st = new StreamTokenizer(fis);
        while(st.nextToken()!=StreamTokenizer.TT_EOF) {
            String s;

            switch(st.ttype) {
                case StreamTokenizer.TT_NUMBER:
                    s = st.nval+"";
                    break;
                case StreamTokenizer.TT_WORD:
                    s = st.sval;
                    break;
                default:
                    s = ""+((char)st.ttype);
            }

            map.put(i+"",s);
            set.add(s);
            i++;
        }

        Iterator iter = set.iterator();
        System.out.println("Frequency Of Words :");
        while(iter.hasNext()) {
            String word;
            int count=0;
            word=(String)iter.next();

            for(int j=0; j<i ; j++) {
                String word2;
                word2=(String)map.get(j+"");
                if(word.equals(word2))
                    count++;
            }
            System.out.println(" WORD : "+ word+" = "+count);
        }
        System.out.println("Total Words In Files: "+i);
    }
}

Before running this code, I created a text file that contains the following data:

@ Hello Hii World # * c++ java salesforce

And the output of this code is:

Frequency Of Words :
WORD : # = 1
WORD : @ = 1
WORD : c = 1
WORD : salesforce = 1
WORD : * = 1
WORD : Hii = 1
WORD : + = 2
WORD : java = 1
WORD : World = 1
WORD : Hello = 1
Total Words In Files: 11

I am unable to work out why this shows c++ as separate words. I want c++ to be counted as a single word in the output.

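By default, StreamTokenizer treats '+' as an ordinary character and returns each one as a separate token, which is why c++ is split. You can keep it together by declaring '+' a word character with wordChars, like this: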

    // Create the file at path specified in the String str
    // ...
    HashMap<String, Integer> map = new HashMap<>();
    InputStream fis = new FileInputStream(str);
    Reader bufferedReader = new BufferedReader(new InputStreamReader(fis));

    StreamTokenizer st = new StreamTokenizer(bufferedReader);
    st.wordChars('+', '+');
    while(st.nextToken() != StreamTokenizer.TT_EOF) {
        String s;

        switch(st.ttype) {
            case StreamTokenizer.TT_NUMBER:
                s = String.valueOf(st.nval);
                break;
            case StreamTokenizer.TT_WORD:
                s = st.sval;
                break;
            default:
                s = String.valueOf((char)st.ttype);
        }
        Integer val = map.get(s);
        if(val == null)
            val = 1;
        else
            val++;
        map.put(s, val);
    }

    Set<String> keySet = map.keySet();
    Iterator<String> iter = keySet.iterator();
    System.out.println("Frequency Of Words :");
    int sum = 0;
    while(iter.hasNext()) {
        String word = iter.next();
        int count = map.get(word);
        sum += count;
        System.out.println(" WORD : " + word + " = " + count);
    }
    System.out.println("Total Words In Files: " + sum);

Note that I've updated your code to use generics instead of the raw HashMap and Iterator types. The StreamTokenizer(InputStream) constructor you used is also deprecated, so the tokenizer is now built from a Reader. Keeping both a map and a set was unnecessary, because you can iterate over the map's keys with the keySet() method. The map now goes from String (the word) to Integer (the number of times that word occurs).
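As a possible simplification of the counting loop above, the null check can be collapsed with Map.merge (available since Java 8). The following is only a sketch: the class name TokenFrequency and the method name count are hypothetical, a LinkedHashMap is used so that tokens print in first-seen order, and the input is read from a StringReader rather than a file so that the example runs on its own.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name, used only for this illustration.
class TokenFrequency {

    // Counts every token the tokenizer produces, keyed by the token's text.
    static Map<String, Integer> count(Reader reader) throws IOException {
        StreamTokenizer st = new StreamTokenizer(reader);
        st.wordChars('+', '+'); // treat '+' as part of a word, so "c++" stays whole
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            String token;
            switch (st.ttype) {
                case StreamTokenizer.TT_NUMBER:
                    token = String.valueOf(st.nval);
                    break;
                case StreamTokenizer.TT_WORD:
                    token = st.sval;
                    break;
                default:
                    token = String.valueOf((char) st.ttype);
            }
            // Insert 1 for a new token, otherwise add 1 to the existing count.
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Sample line from the question, read from memory instead of a file.
        Map<String, Integer> counts =
                count(new StringReader("@ Hello Hii World # * c++ java salesforce"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(" WORD : " + e.getKey() + " = " + e.getValue());
        }
    }
}
```

Iterating over entrySet() gives the word and its count together, so no second lookup with map.get(word) is needed.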

That said, for the example you gave, I think a simple split method would have been more appropriate.
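A minimal sketch of the split idea, assuming the words are separated by whitespace as in the sample line from the question. The class name SplitWordCount and the method name count are made up for this illustration, and a TreeMap is used only to get sorted output.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name, used only for this illustration.
class SplitWordCount {

    // Splits the text on runs of whitespace and counts each resulting piece.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by key
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) { // an empty input yields one empty string
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("@ Hello Hii World # * c++ java salesforce");
        int total = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(" WORD : " + e.getKey() + " = " + e.getValue());
            total += e.getValue();
        }
        System.out.println("Total Words In Files: " + total);
    }
}
```

Because only whitespace separates tokens here, c++ stays in one piece without any extra configuration. The trade-off is that punctuation attached to a word, such as a trailing comma, stays attached to it, whereas StreamTokenizer would return it as its own token.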

For more information about the wordChars method of StreamTokenizer, take a look at the documentation for wordChars(int, int).
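To see the effect of wordChars(int, int) in isolation, the sketch below tokenizes the same string twice, once with the default settings and once with '+' marked as a word character. The class name WordCharsDemo and the method name tokens are hypothetical; only the StreamTokenizer calls come from the JDK.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class WordCharsDemo {

    // Returns the tokens of the text, optionally treating '+' as a word character.
    static List<String> tokens(String text, boolean plusIsWordChar) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (plusIsWordChar) {
            st.wordChars('+', '+'); // the range low..hi is inclusive, here a single char
        }
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                out.add(st.sval);
            } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                out.add(String.valueOf(st.nval));
            } else {
                out.add(String.valueOf((char) st.ttype)); // ordinary character
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("c++ java", false)); // prints [c, +, +, java]
        System.out.println(tokens("c++ java", true));  // prints [c++, java]
    }
}
```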
