
Find frequency of a character in an array of Strings

Given an array of Strings, find the frequency of occurrence of a particular character.

For example, given the array {"hon","bhig","zzz","hello"} and the character 'h', the output is 3.

Here's how I solved it:

Approach 1: Iterate through every string in the array and increment a counter every time the character occurs in the current String. Run time is O(n), where n is the cumulative length of all strings in the array.

Approach 2: This can be optimized using a HashMap; this is particularly helpful if strings are repeated in the array. Here's what I did: take a HashMap where key = string and value = number of times that string occurs in the array. Put all strings in the given array into the HashMap along with their counts. Then iterate over each key-value pair in the HashMap, count the number of times the given character appears in the key (string), multiply that count by the corresponding value in the HashMap, and add it to the total.

My question is: Is there a better way to do this?

Here's the code:

NOTE: PLEASE READ THE ENTIRE ACCEPTED ANSWER.

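import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

public static int findFreq(String[] arr,char c) {
    // Pass 1: count how many times each distinct string occurs in the array.
    Map<String,Integer> map  = new HashMap<String,Integer>();
    for(int i=0;i<arr.length;i++) {
        if(map.containsKey(arr[i]))
            map.put(arr[i],map.get(arr[i])+1);
        else
            map.put(arr[i], 1);
    }
    // Pass 2: scan each distinct string once; every match of c
    // contributes the number of times that string occurs.
    int freq=0;
    for(Entry<String,Integer> entr:map.entrySet()) {
        String s = entr.getKey();
        for(int i=0;i<s.length();i++) {
            if(s.charAt(i)==c)
                freq += entr.getValue();
        }
    }
    return freq;
}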

Sorry, I think Approach 2 slows things down. In order to add each string to the HashMap, the method computes the hash code, which looks at every character in the string. So setting up the HashMap already looks at every character in every string, which takes as long as what you'd have to do with Approach 1, and then you still have to make another pass through the map.
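To make the hashing cost concrete, here is a minimal sketch that reproduces the polynomial formula documented for String.hashCode() with an explicit loop. The names HashCostSketch and polyHash are made up for illustration and are not from the original answer. The point is only that one pass over the characters is needed before a string can serve as a HashMap key.

```java
// Hypothetical illustration; the class and method names are assumptions.
final class HashCostSketch {
    // Mirrors the formula documented for String.hashCode():
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], in int arithmetic.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // one step per character
        }
        return h;
    }
}
```

For any string s, HashCostSketch.polyHash(s) should equal s.hashCode(), so filling the map already reads about as many characters as Approach 1 does in total.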

Approach 1 is preferable here. The cost is O(N) for either of them in the worst case. The second approach, which uses a HashMap<String, Integer> to remember previously visited strings (with its inherent hashing cost), would not bring a performance improvement worth mentioning. We should avoid premature optimization, as Approach 1 is simpler.

Approach 2 is not very optimised. What you should really do is create a Map<Character,Integer>; then you don't need the second loop to count, but you do need to loop over each character in each String.

Approach 1, depending on your implementation, may count each character only once per String. Does it handle a character that occurs twice in one String, e.g. "hash"?

Either approach needs to compare EACH character in EACH String and then count.

This is how Approach 2 should be:

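public static int findFreq(String[] arr,char c) {
    // Count every character of every string in a single pass.
    Map<Character,Integer> map  = new HashMap<Character,Integer>();
    for(int i=0;i<arr.length;i++) {
        for(Character ch : arr[i].toCharArray()){
            if(map.containsKey(ch))
                map.put(ch,map.get(ch)+1);
            else
                map.put(ch, 1);
        }
    }
    // Return 0 when c never occurs, instead of unboxing a null Integer.
    Integer count = map.get(Character.valueOf(c));
    return count == null ? 0 : count;
}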

Either way, both approaches will be O(n). From the docs for HashMap:

This implementation provides constant-time performance for the basic operations (get and put)

But that said, even the approach I provided above requires an additional get when populating the map.

So Approach 1 is better for a single search; if you are searching repeatedly, then Approach 2 is the way to go (but populate the map outside the method).
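The "populate the map outside the method" idea might look like the sketch below. CharFrequencyIndex and its method names are hypothetical, not part of the original answer, and the sketch assumes the array contains no null entries and does not change after the index is built.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds the character counts once, then answers lookups.
final class CharFrequencyIndex {
    private final Map<Character, Integer> counts = new HashMap<>();

    // O(n) set-up over the cumulative length of all strings.
    CharFrequencyIndex(String[] arr) {
        for (String s : arr) {
            for (int i = 0; i < s.length(); i++) {
                // merge inserts 1 for a new key or adds 1 to the existing count
                counts.merge(s.charAt(i), 1, Integer::sum);
            }
        }
    }

    // Single map lookup; characters that never occurred count as 0.
    int frequencyOf(char c) {
        return counts.getOrDefault(c, 0);
    }
}
```

A caller would construct the index once, for example new CharFrequencyIndex(arr), and then call frequencyOf for each character of interest, so the O(n) pass is paid only once across all searches.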

Some metrics for you:

Number of Words  |    Array (approach 1)   |   Map (My approach 2)  |  Map (your approach 2)
                 |       (time in ms)      |     (time in ms)       |      (time in ms) 
                 |     (groovy)/(java)     |     (groovy)/(java)    |     (groovy)/(java)     
-------------------------------------------------------------------------------------------
      43303      |         118 /  5        |         229 / 34       |             / 16     
     417221      |         852 / 10        |        1088 / 120      |             / 49
    2086705      |        2929 / 45        |        5064 / 731      |             / 219

I retract my method; it appears your Map approach is faster!

This was my array method (in case yours differs):

private static int findFreqArray(String[] arr, char c){
    int count = 0;
    for(int i=0;i<arr.length;i++) {
        for(char ch : arr[i].toCharArray()){
            if(ch == c)
                count++;
        }
    }
    return count;  
}

Not necessarily. Yet another possibility would be to "flatten" your array into a single string and search for a single character in it (about as fast as your variant 1). This might speed things up a little, but it would not necessarily make the code "better". An example of a char search in a string can be found in this SO answer.
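A rough sketch of the flattening idea, assuming the array holds no null entries (String.join would render a null as the text "null"). The names FlattenSketch and findFreqFlattened are made up here and are not taken from the linked answer.

```java
// Hypothetical illustration of the "flatten, then scan" idea.
final class FlattenSketch {
    static int findFreqFlattened(String[] arr, char c) {
        // String.join copies every character once into a new string.
        String flat = String.join("", arr);
        int count = 0;
        // indexOf returns -1 once there is no further match.
        for (int i = flat.indexOf(c); i >= 0; i = flat.indexOf(c, i + 1)) {
            count++;
        }
        return count;
    }
}
```

The join is itself an O(n) copy, so this stays O(n) overall and allocates an extra string; whether it is any faster in practice would need measuring.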

No, you'll never do better than O(n) for just one search. But if you're going to be searching many times against the same array, for different characters, you could start by running through the array and building a hash map from each character to its number of occurrences. Then, for each search, you just have to do a simple constant-time look-up, not an O(n) search.

The HashMap approach is even slower than the first one. Both algorithms need to pass over each character once, so both need O(n) time. But the first one is much simpler, and fewer lines of code would be executed.

Nice try though :)
