Find frequency of a character in an array of Strings

Question

Given an array of Strings, find the frequency of occurrence of a particular character.

eg. Given array {"hon","bhig","zzz","hello"} and character 'h', the output is 3.

Here's how I solved it: Approach 1: Iterate through every string in the array, increment a counter every time that character occurs in the current String. Run time is O(n), where n is the cumulative length of all strings in the array.

Approach 2: This can be optimized using a HashMap; this is particularly helpful if the strings are repeated in the array. Here's what I did: take a HashMap where key = string and value = number of times that string occurs in the array. Put all strings in the given array into the HashMap along with their counts. Then iterate over each key-value pair in the HashMap, count the number of times the given character appears in the key(string) and increment it by its corresponding value in the HashMap.

My question is: Is there a better way to do this?

Here's the code:

NOTE: PLEASE READ THE ENTIRE ACCEPTED ANSWER.

public static int findFreq(String[] arr,char c) {
    Map<String,Integer> map  = new HashMap<String,Integer>();
    for(int i=0;i<arr.length;i++) {
        if(map.containsKey(arr[i])) 
            map.put(arr[i],map.get(arr[i])+1);
        else
            map.put(arr[i], 1);
    }
    int freq=0;
    for(Entry<String,Integer> entr:map.entrySet()) {
        String s = entr.getKey();
        for(int i=0;i<s.length();i++) {
            if(s.charAt(i)==c)
                freq += entr.getValue();
        }
    }
    return freq;
}

Answer 1

Sorry, I think Approach 2 slows things down. In order to add each string to the HashMap , the method computes the hash code, which looks at every character in the string. So setting up the HashMap already looks at every character in every string, which takes as long as what you'd have to do with approach 1, plus then you have to make another pass through the map.

Answer 2

Approach 1 is preferable here. The cost is O(N) to either of them in the worst case. The second approach using HashMap<String> for remembering old visited string (with inherent hashing cost) would not bring improvement to performance worthy to be mentioned. We should avoid premature optimization, as approach 1 is simpler .

Answer 3

Approach 2 is not very optimised, what you should really do is create a Map<Character,Integer> then you don't the second loop to count but you need to then loop each character in each String.

Approach 1, depending on your implementation also only counts for each character occurring in the String, does it consider if the character occurs twice, eg "hash" ?

Either approach needs to compare EACH character in EACH String and then count

This is how approach 2 should be

public static int findFreq(String[] arr,char c) {
    Map<Character,Integer> map  = new HashMap<Character,Integer>();
    for(int i=0;i<arr.length;i++) {
        for(Character ch : arr[i].toCharArray()){
            if(map.containsKey(ch)) 
                map.put(ch,map.get(ch)+1);
            else
                map.put(ch, 1);
        }
    }
    return map.get(Character.valueOf(c));
 }

Either way both approaches will be O(n), from the docs for HashMap

This implementation provides constant-time performance for the basic operations (get and put)

But that said even with the approach I provided above this requires additional get when populating the map.

So Approach 1 is better if using for a single search, if using repeatedly then approach 2 is the way to go (but populate the map outside the method)

Some metrics for you:

Number of Words  |    Array (approach 1)   |   Map (My approach 2)  |  Map (your approach 2)
                 |       (time in ms)      |     (time in ms)       |      (time in ms) 
                 |     (groovy)/(java)     |     (groovy)/(java)    |     (groovy)/(java)     
-------------------------------------------------------------------------------------------
      43303      |         118 /  5        |         229 / 34       |             / 16     
     417221      |         852 / 10        |        1088 / 120      |             / 49
    2086705      |        2929 / 45        |        5064 / 731      |             / 219

I retract my method, it appears your Map approach is faster!

This was my array method (in case yours differs)

private static int findFreqArray(String[] arr, char c){
    int count = 0;
    for(int i=0;i<arr.length;i++) {
        for(char ch : arr[i].toCharArray()){
            if(ch == c)
                count++;
        }
    }
    return count;  
}

Answer 4

Not necessarily. Yet another possibility would be to "flatten" your array into a single string and search for a single character in it (fast the same as your variant 1). This would maybe speed thinks a little bit, but it would not necessarily make the code "better". Example for a char search in a string can be found in this SO answer .

Answer 5

No, you'll never do better than O(n) for just one search. But if you're going to be searching many times against the same array, for different characters, you could start by running through the array and building a hash map from each character to its number of occurrences. Then, for each search, you just have to do a simple constant-time look-up, not an O(n) search.

Answer 6

Hashmap is even more slow than the first one. Both algorithms needs to pass from each character once, so both needs O(n) time. But the first one is much simpler, and fewer lines of code would be executed.

Nice try though :)

Find frequency of a character in an array of Strings

Question

6 answers

solution1
3 2013-10-16 21:37:55

solution2
2 2013-10-16 21:37:16

solution3
2 ACCPTED 2013-10-16 21:42:45

solution4
1 2013-10-16 21:36:28

solution5
1 2013-10-16 21:43:50

solution6
1 2013-10-16 21:45:26

Find frequency of a character in an array of Strings

Question

6 answers

solution1 3 2013-10-16 21:37:55

solution2 2 2013-10-16 21:37:16

solution3 2 ACCPTED 2013-10-16 21:42:45

solution4 1 2013-10-16 21:36:28

solution5 1 2013-10-16 21:43:50

solution6 1 2013-10-16 21:45:26

solution1
3 2013-10-16 21:37:55

solution2
2 2013-10-16 21:37:16

solution3
2 ACCPTED 2013-10-16 21:42:45

solution4
1 2013-10-16 21:36:28

solution5
1 2013-10-16 21:43:50

solution6
1 2013-10-16 21:45:26