
Optimizing counting characters within a string

I just created a simple method to count occurrences of each character within a string, ignoring case.

static List<int> charactercount(string input)
{
    char[] characters = "abcdefghijklmnopqrstuvwxyz".ToCharArray();
    input = input.ToLower();

    List<int> counts = new List<int>();
    foreach (char c in characters)
    {
        int count = 0;
        foreach (char c2 in input)
        {
            if (c2 == c)
            {
                count++;
            }
        }

        counts.Add(count);
    }

    return counts;
}

Is there a cleaner way to do this (i.e. without creating a character array to hold every character in the alphabet) that would also take into account numbers, other characters, caps, etc.?

Conceptually, I would prefer to return a Dictionary<string,int> of counts. Assuming it is acceptable to learn that a character occurs zero times from its absence rather than from an explicit count of 0, you can do it via LINQ. @Oded has given you a good start on how to do that. All you would need to do is replace the Select() with ToDictionary( k => k.Key, v => v.Count() ). See my comment on his answer about doing the case-insensitive grouping. Note: you should decide whether you care about cultural differences in characters and adjust the ToLower method accordingly.
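A minimal sketch of that LINQ approach, assuming invariant-culture lower-casing is acceptable for the case-insensitive grouping. The class name CharCountLinq is made up for illustration and is not from the answers here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named for illustration only.
public static class CharCountLinq
{
    public static Dictionary<string, int> CountCharacters(string input)
    {
        return input
            // Assumption: lower-case with the invariant culture so 'A' and 'a' share a group.
            .GroupBy(c => char.ToLowerInvariant(c).ToString())
            // Key is the (lower-cased) character, value is how often it occurred.
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Calling CharCountLinq.CountCharacters("Hello") yields counts of 1 for "h", "e" and "o" and 2 for "l". Characters that never occur simply have no entry.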

You can also do this without LINQ:

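// Requires: using System; using System.Collections.Generic;
public static Dictionary<string,int> CountCharacters(string input)
{
     // String keys let the built-in case-insensitive comparer be used.
     var counts = new Dictionary<string,int>(StringComparer.OrdinalIgnoreCase);

     foreach (var c in input)
     {
          var key = c.ToString();
          int count = 0;
          if (counts.ContainsKey(key))
          {
              count = counts[key];
          }
          counts[key] = count + 1;
     }

     return counts;
}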

Note that if you wanted a Dictionary<char,int>, you could easily get one by creating a case-insensitive character comparer and using it as the IEqualityComparer<T> for a dictionary of the required type. I've used string for simplicity in the example.
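One way such a comparer might look, as a sketch only: the names CaseInsensitiveCharComparer and CharCountDictionary are hypothetical, and comparing via invariant-culture upper-casing is an assumption you may want to change depending on how you handle culture.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: treats 'A' and 'a' as the same key
// by comparing their invariant-culture upper-case forms.
public sealed class CaseInsensitiveCharComparer : IEqualityComparer<char>
{
    public bool Equals(char x, char y) =>
        char.ToUpperInvariant(x) == char.ToUpperInvariant(y);

    // Must agree with Equals: equal characters produce equal hash codes.
    public int GetHashCode(char c) =>
        char.ToUpperInvariant(c).GetHashCode();
}

// Hypothetical helper class, named for illustration only.
public static class CharCountDictionary
{
    public static Dictionary<char, int> CountCharacters(string input)
    {
        var counts = new Dictionary<char, int>(new CaseInsensitiveCharComparer());

        foreach (var c in input)
        {
            // count stays 0 when the key is not present yet.
            counts.TryGetValue(c, out var count);
            counts[c] = count + 1;
        }

        return counts;
    }
}
```

With this comparer, CharCountDictionary.CountCharacters("AaB") produces two entries, and looking up either 'a' or 'A' returns 2.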

Again, adjust the type of the comparer to be consistent with how you want to handle culture.

Using GroupBy and Select:

aString.GroupBy(c => c).Select(g => new { Character = g.Key, Num = g.Count() })

The returned sequence of anonymous-type objects will contain each character and the number of times it appears in the string.

You can then filter it in any way you wish, using the static methods defined on Char.
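For instance, a sketch that keeps only letters by filtering with char.IsLetter. The class name LetterCounts and the final ToDictionary step are illustrative assumptions, not part of the one-liner above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named for illustration only.
public static class LetterCounts
{
    public static Dictionary<char, int> CountLetters(string aString)
    {
        return aString
            .GroupBy(c => c)
            .Select(g => new { Character = g.Key, Num = g.Count() })
            // Keep letters only; digits, punctuation and whitespace are dropped.
            .Where(x => char.IsLetter(x.Character))
            .ToDictionary(x => x.Character, x => x.Num);
    }
}
```

LetterCounts.CountLetters("a1b2a") returns 2 for 'a' and 1 for 'b', with no entries for the digits. Other predicates such as char.IsDigit or char.IsWhiteSpace can be swapped in the same way.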

Your code is kind of slow because you loop through the whole input once for each letter in the range a-z, instead of looping through the input just once.

If you only need to count letters (as your code suggests), the fastest way to do it would be:

int[] CountCharacters(string text)
{
    var counts = new int[26];

    for (var i = 0; i < text.Length; i++)
    {
        // Map 'a'..'z' to indexes 0..25; no range check is done here.
        var charIndex = text[i] - 'a';
        counts[charIndex] = counts[charIndex] + 1;
    }

    return counts;
}

Note that you need to add a few things, such as verifying that the character is in range and converting it to lowercase when needed, or this code might throw exceptions. I'll leave those for you to add. :)

Based on +Ran's answer, modified to avoid IndexOutOfRangeException:

static readonly int differ = 'a';
int[] CountCharacters(string text) {
    text = text.ToLower();
    var counts = new int[26];

    for (var i = 0; i < text.Length; i++) {
        var charIndex = text[i] - differ;
        // only count chars between 'a' and 'z':
        if(charIndex >= 0 && charIndex < 26)
            counts[charIndex] += 1;
    }
    return counts;
}

In practice, using Dictionary and/or LINQ is not as fast as counting the characters with a low-level array.
