
How can I store the frequency of each tag on a web page in a HashMap?

I am using a HashMap to store the tags and their frequencies, but I am unsure how to keep track of each tag's frequency in a variable. The code is as follows:

package z;
import java.awt.List;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Collector;
import org.jsoup.select.Elements;
import org.jsoup.select.Evaluator;
import org.jsoup.nodes.Element;


public class crawler {

    static String url="";

        public static void main(String[] args) {
            int val=0;
            String URL = "http://stackoverflow.com/";
            HashMap<Integer, String> myMap = new HashMap<Integer, String>();
            myMap.clear();  
            try {
                Document document = Jsoup.connect(URL).get();
                ArrayList<String> tags = new ArrayList<String>();

                System.out.println("Number of tags by select(\"*\") method =" + document.select("*").size());
                for(Element e : document.getAllElements()){
                    tags.add(e.tagName().toLowerCase());
                    myMap.put(val,tags.toString());
                    val++;
                }
                System.out.println("The tags = " + tags);
                System.out.println("Distinct tags = " + new HashSet<String>(tags));
                System.out.println("Distinct tags = " + myMap);
            } catch (IOException e) {
               System.out.println(e);
            }



    }


}

How can I increment val so that the map stores the frequency of every tag? Do I need more than one variable?

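I'd suggest using the tag name as the key and its frequency as the value (that is, a HashMap<String, Integer>), rather than an integer index as the key. Your loop would then look like this: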

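// The map must be keyed by tag name, with the running count as the value
HashMap<String, Integer> myMap = new HashMap<String, Integer>();
String tagN;
for (Element e : document.getAllElements()) {
    tagN = e.tagName().toLowerCase();
    int val = 1;
    // If the tag was seen before, add its previous count
    if (myMap.containsKey(tagN)) {
        val += myMap.get(tagN);
    }
    myMap.put(tagN, val);
}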
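
For reference, the same counting idea can be sketched as a small standalone program. This is only a minimal sketch: the class name TagFrequency, the helper countTags, and the hard-coded sample list are hypothetical stand-ins so that it runs without jsoup, and it uses Map.merge (Java 8+) in place of the explicit containsKey check.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TagFrequency {

    // Hypothetical helper: counts how often each tag name occurs.
    // Names are lower-cased first so that "A" and "a" share one entry.
    static Map<String, Integer> countTags(Iterable<String> tagNames) {
        Map<String, Integer> counts = new HashMap<>();
        for (String name : tagNames) {
            // merge() stores 1 for a new key, otherwise adds 1 to the existing count
            counts.merge(name.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for the names returned by e.tagName()
        List<String> sample = Arrays.asList("html", "body", "div", "a", "div", "A");
        Map<String, Integer> counts = countTags(sample);
        System.out.println("div = " + counts.get("div")); // div = 2
        System.out.println("a = " + counts.get("a"));     // a = 2
    }
}
```

With jsoup, you would collect e.tagName() for every element from document.getAllElements() and pass those names in, or call merge directly inside that loop. Only one counter per tag is needed, and the map holds it, so no extra variables are required.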


 