HashMap: count the number of times each word is used

The question below is about Java.

Sample data : https://tartarus.org/martin/PorterStemmer/output.txt

I have a String array, tokenizationString, that contains words similar to the list above, with many duplicates.

I have to convert that array into a HashMap and then use the HashMap to count the number of times each word is used (that is, count the duplicate values in the array). I have to use HashMap-related methods to do this.

I am thinking of doing it this way:

Map<Integer, String> hashMap = new HashMap<Integer, String>();
for (int i = 0; i < tokenizationString.length; i++) {
    hashMap.put(i, tokenizationString[i]);
}

After that, I have to sort the words by the number of times each one is used.

In the end, I want to print the results like this:

the was used 502 times
i was used 50342 times
apple was used 50 times

Firstly, your map should be a Map<String, Integer> (each word mapped to its frequency). Here is a Java 8 Stream solution:

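import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        // Assumes one word per line, as in the sample file; the stream is closed automatically
        try (Stream<String> lines = Files.lines(Paths.get("out.txt"))) {
            Map<String, Long> frequency = lines
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                    .entrySet()
                    .stream()
                    .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                    .collect(Collectors.toMap(
                            Map.Entry::getKey,
                            Map.Entry::getValue,
                            (o, n) -> o,
                            LinkedHashMap::new
                    ));
            // Print in descending order of frequency
            frequency.forEach((word, count) ->
                    System.out.println(word + " was used " + count + " times"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}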

The code above reads the file line by line and collects the lines into a frequency map. It then streams that map's entry set, sorts the entries by value in reverse (descending) order, and finally collects them into a LinkedHashMap. A LinkedHashMap is used because it maintains insertion order, so the sorted order is preserved. Take a look at the Java 8 Stream API for more details.
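The question starts from a String array rather than a file. A minimal sketch, assuming the words are already in `tokenizationString`, feeds the same pipeline from `Arrays.stream` instead of `Files.lines`; the class name `ArrayWordFrequency`, the method `countAndSort`, and the sample words are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ArrayWordFrequency {

    // Counts each word in the array and returns the counts ordered by descending frequency
    static Map<String, Long> countAndSort(String[] tokenizationString) {
        return Arrays.stream(tokenizationString)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet()
                .stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (o, n) -> o,            // keys come from a map, so they are unique and this never runs
                        LinkedHashMap::new));   // LinkedHashMap keeps the sorted order
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for the question's tokenizationString array
        String[] tokenizationString = {"the", "apple", "the", "i", "the", "i"};
        countAndSort(tokenizationString).forEach((word, count) ->
                System.out.println(word + " was used " + count + " times"));
    }
}
```

With the sample array this prints "the" (3), then "i" (2), then "apple" (1).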

Instead of

hashMap.put(i, tokenizationString[i]);

declare hashMap as a Map<String, Integer> (word to count), first check whether the word is already present, and then increment the corresponding entry:

int count = hashMap.containsKey(tokenizationString[i]) ? hashMap.get(tokenizationString[i]) : 0;
hashMap.put(tokenizationString[i], count + 1);
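On Java 8 and later, `Map.merge` performs the same check-and-increment in one call: it stores 1 for a new key and otherwise combines the old and new values with `Integer::sum`. The sketch below (hypothetical class and method names, made-up sample words) counts with `merge` and then sorts the entries by count, highest first, which also covers the sorting step from the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeWordCount {

    // Builds a word -> count map; merge() inserts 1 for a new word or adds 1 to the existing count
    static Map<String, Integer> countWords(String[] tokenizationString) {
        Map<String, Integer> hashMap = new HashMap<>();
        for (String word : tokenizationString) {
            hashMap.merge(word, 1, Integer::sum);
        }
        return hashMap;
    }

    // Copies the entries into a list and sorts them by count, highest first
    static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        String[] tokenizationString = {"apple", "the", "i", "the", "i", "the"};
        for (Map.Entry<String, Integer> e : sortByCount(countWords(tokenizationString))) {
            System.out.println(e.getKey() + " was used " + e.getValue() + " times");
        }
    }
}
```

The explicit type witness on `comparingByValue` is needed because chaining `.reversed()` prevents the compiler from inferring the entry types.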

You can also achieve this with the Multimap class from Google's Guava library, as shown below. A working example is available at this link: https://gist.github.com/dkalawadia/8d06fba1c2c87dd94ab3e803dff619b0

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

// try-with-resources closes the reader (and the underlying stream) automatically
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("C:\\temp\\output.txt")))) {
    String strLine;
    Multimap<String, String> multimap = ArrayListMultimap.create();
    // Read the file line by line; each line is one word
    while ((strLine = br.readLine()) != null) {
        multimap.put(strLine, strLine);
    }
    // The number of values stored under a key is the number of times that word occurred
    for (String key : multimap.keySet()) {
        System.out.println(key + " was used " + multimap.get(key).size() + " times");
    }
} catch (IOException e) {
    // Also covers FileNotFoundException, which is a subclass of IOException
    e.printStackTrace();
}
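If adding Guava is not an option, the same multimap idea can be approximated with plain JDK collections: a `Map<String, List<String>>` filled with `computeIfAbsent`. This is a sketch of the pattern, not Guava's implementation; the class name `JdkMultimapCount`, the method `group`, and the sample words are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JdkMultimapCount {

    // Groups every occurrence of a word under that word, like ArrayListMultimap.put(word, word)
    static Map<String, List<String>> group(String[] words) {
        Map<String, List<String>> multimap = new LinkedHashMap<>();
        for (String word : words) {
            multimap.computeIfAbsent(word, k -> new ArrayList<>()).add(word);
        }
        return multimap;
    }

    public static void main(String[] args) {
        String[] words = {"the", "apple", "the", "i"};
        // The size of each word's list is the number of times it occurred
        for (Map.Entry<String, List<String>> e : group(words).entrySet()) {
            System.out.println(e.getKey() + " was used " + e.getValue().size() + " times");
        }
    }
}
```

Keeping every occurrence costs memory proportional to the total number of words, whereas a counting map such as `Map<String, Integer>` stores only one number per distinct word.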
