How to create a triple using a Map

I want to build an inverted index in Java. I have the cran data set of 1400 text files. I can count the frequency of each term, and I can return the number of times a word appears in the entire collection, but I have not been able to create a triple (t, d, f), where t = term, d = document, and f = frequency.

I want the output in the following form:

term1: doc1:2, 
term2: doc2:3, 
term1: doc3:1 

Here, term1 is a word in a document file, and doc1:2 means that term1 appears in doc 1 two times.

This is the code I have so far:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// The original snippet omitted the class declaration; the name here is a placeholder.
public class TermCount {
    public static void main(String[] args) throws FileNotFoundException {
        // term -> frequency within the document currently being read
        Map<String, Integer> m = new HashMap<>();
        String wrd;

        for (int i = 1; i <= 2; i++) {
            Scanner tdsc = new Scanner(new File("D:\\logs\\steem" + i + ".txt"));
            while (tdsc.hasNext()) {
                wrd = tdsc.next();
                Integer freq = m.get(wrd);
                m.put(wrd, (freq == null) ? 1 : freq + 1);
            }

            System.out.println(m.size() + " distinct words" + " steem" + i);
            System.out.println("Doc" + i + "" + m);
            m.clear(); // reset the counts before the next document
            tdsc.close();
        }
    }
}

What about storing it as a List<Map<String, Integer>>? Create a new Map for each document, mapping each term to its frequency in that document.

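import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// The class name is a placeholder; the original snippet showed only the method body.
public class TermFrequencies {
    public static void main(String[] args) throws FileNotFoundException {
        // One map per document: term -> frequency within that document
        List<Map<String, Integer>> list = new ArrayList<>();

        // Iterate over documents
        for (int i = 1; i <= 2; i++) {
            Map<String, Integer> map = new HashMap<>();
            // try-with-resources closes the Scanner even if reading fails
            try (Scanner tdsc = new Scanner(new File("D:\\logs\\steem" + i + ".txt"))) {
                // Iterate over words
                while (tdsc.hasNext()) {
                    String word = tdsc.next();
                    Integer freq = map.get(word);
                    map.put(word, (freq == null) ? 1 : freq + 1);
                }
            }
            list.add(map);
        }

        // Print result as term:docN:frequency
        int documentNumber = 1; // start at 1 to match the file numbering above
        for (Map<String, Integer> document : list) {
            for (Map.Entry<String, Integer> entry : document.entrySet()) {
                System.out.println(entry.getKey() + ":doc" + documentNumber + ":" + entry.getValue());
            }
            documentNumber++;
        }
    }
}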
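
If the goal is an inverted index keyed by term rather than by document, one possible alternative is a nested map: term -> (document id -> frequency). The sketch below is not part of the answer above. The class name InvertedIndexSketch and the helpers addDocument and toTriples are hypothetical, and each document is passed in as a String so the example runs without the files on disk. With real files, the same update could be applied to each token returned by Scanner.next().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: an inverted index stored as term -> (docId -> frequency).
class InvertedIndexSketch {

    // Adds every whitespace-separated term of one document to the index.
    public static void addDocument(Map<String, Map<Integer, Integer>> index, int docId, String text) {
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input produces a single empty token
            }
            // Create the per-term posting map on first sight, then bump the count for this document.
            index.computeIfAbsent(term, t -> new TreeMap<>())
                 .merge(docId, 1, Integer::sum);
        }
    }

    // Flattens the index into one "term: docN:frequency" line per (t, d, f) triple.
    public static List<String> toTriples(Map<String, Map<Integer, Integer>> index) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Integer>> termEntry : index.entrySet()) {
            for (Map.Entry<Integer, Integer> posting : termEntry.getValue().entrySet()) {
                lines.add(termEntry.getKey() + ": doc" + posting.getKey() + ":" + posting.getValue());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // TreeMap keeps terms sorted, which makes the printed output deterministic.
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        addDocument(index, 1, "flow of air flow"); // made-up sample text
        addDocument(index, 2, "air pressure");     // made-up sample text
        toTriples(index).forEach(System.out::println);
    }
}
```

With the two sample strings above, this prints air: doc1:1, air: doc2:1, flow: doc1:2, of: doc1:1 and pressure: doc2:1, one per line. Keying by term means all documents containing a word can be looked up directly, which the List<Map<String, Integer>> layout does not allow without scanning every document.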
