Find count of characters within the string in Python

I am trying to create a dictionary that maps each character to the number of times it repeats in a string. Suppose the string is as below:

str1 = "aabbaba"

I want to create a dictionary like this:

word_count = {'a':4,'b':3}

I am trying to use a dictionary comprehension to do this. I did:

dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}

This ends up giving an error saying:

  File "<stdin>", line 1
    dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
                                               ^
SyntaxError: invalid syntax

Can anybody tell me what's wrong with the syntax? Also, how can I create such a dictionary using a dictionary comprehension?

The ideal way to do this is to use collections.Counter:

>>> from collections import Counter
>>> str1 = "aabbaba"
>>> Counter(str1)
Counter({'a': 4, 'b': 3})

You cannot achieve this with a simple dict comprehension, because you need to reference the previous count of each element. As mentioned in Dawg's answer, as a workaround you can use str.count(e) inside the dict comprehension to find the count of each element from the set of the string. But the time complexity will be n*m, since it traverses the complete string for each unique element (where m is the number of unique elements), whereas with Counter it is n.

As others have said, this is best done with a Counter.

You can also do:

>>> {e:str1.count(e) for e in set(str1)}
{'a': 4, 'b': 3}

But that traverses the string 1+m times, where m is the number of unique characters (once to create the set, and once for each unique character to count its occurrences). When most of the characters are unique, this has quadratic runtime complexity, which is a bad result if you have a lot of unique characters in a long string. A Counter only traverses the string once.
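To make the difference concrete, here is a rough sketch that checks both approaches agree and times them on an input with many unique characters. The helper names and the test string are made up for illustration, and the timings will vary by machine, so no expected numbers are shown.

```python
from collections import Counter
import timeit


def count_with_str_count(text):
    # One pass to build the set, then one full pass per unique character.
    return {ch: text.count(ch) for ch in set(text)}


def count_with_counter(text):
    # A single pass over the string.
    return dict(Counter(text))


# Hypothetical input: 2000 distinct characters, repeated 5 times.
text = "".join(chr(code) for code in range(0x100, 0x100 + 2000)) * 5

# Both approaches produce the same counts.
assert count_with_str_count(text) == count_with_counter(text)

for func in (count_with_str_count, count_with_counter):
    seconds = timeit.timeit(lambda: func(text), number=5)
    print(f"{func.__name__}: {seconds:.4f} s")
```

On short strings with few unique characters, such as "aabbaba", the difference is unlikely to matter.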

If you want a no-import version that is more efficient than using .count, you can use .setdefault to make a counter:

>>> count={}
>>> for c in str1:
...    count[c]=count.setdefault(c, 0)+1
... 
>>> count
{'a': 4, 'b': 3}

That traverses the string only once, no matter how long it is or how many unique characters it contains.


You can also use defaultdict if you prefer:

>>> from collections import defaultdict
>>> count=defaultdict(int)
>>> for c in str1:
...    count[c]+=1
... 
>>> count
defaultdict(<type 'int'>, {'a': 4, 'b': 3})
>>> dict(count)
{'a': 4, 'b': 3}

But if you are going to import collections, use a Counter!

This is a nice case for collections.Counter:

>>> from collections import Counter
>>> Counter(str1)
Counter({'a': 4, 'b': 3})

It's a dict subclass, so you can work with the object much like a standard dictionary:

>>> c = Counter(str1)
>>> c['a']
4

You can do this without using the Counter class as well. Simple and efficient Python code for this would be:

>>> d = {}
>>> for x in str1:
...     d[x] = d.get(x, 0) + 1
... 
>>> d
{'a': 4, 'b': 3}

Note that this is not the correct way to do it, since it won't count repeated characters more than once (apart from losing other characters from the original dict), but it answers the original question of whether if-else is possible in comprehensions and demonstrates how it can be done.

To answer your question: yes, it's possible, but the approach is like this:

dic = {x: (dic[x] + 1 if x in dic else 1) for x in str1}

The condition is applied to the value only, not to the key:value mapping.

The above can be made clearer using dict.get:

dic = {x: dic.get(x, 0) + 1 for x in str1}

0 is returned if x is not in dic.

Demo:
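As a quick check that the two spellings are equivalent, the sketch below runs both against a pre-existing dictionary. The name seen and its contents are hypothetical, chosen only so that one key is present and one is missing.

```python
str1 = "aabbaba"
seen = {"a": 10}  # hypothetical pre-existing dict, defined before the comprehension

# Conditional expression on the value side of the colon.
with_conditional = {x: (seen[x] + 1 if x in seen else 1) for x in str1}

# Same result using dict.get with a default of 0.
with_get = {x: seen.get(x, 0) + 1 for x in str1}

print(with_conditional)  # {'a': 11, 'b': 1}
print(with_get)          # {'a': 11, 'b': 1}
```

In both cases the key stays a plain expression and only the value carries the if-else logic.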

In [78]: s = "abcde"

In [79]: dic = {}

In [80]: dic = {x: (dic[x] + 1 if x in dic else 1) for x in s}

In [81]: dic 
Out[81]: {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1}

In [82]: s = "abfg"

In [83]: dic = {x: dic.get(x, 0) + 1 for x in s}

In [84]: dic
Out[84]: {'a': 2, 'b': 2, 'f': 1, 'g': 1}
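To see why this cannot count repeats, it may help to run the same comprehension on the string from the question. While the comprehension is being evaluated, dic still refers to the old dictionary (empty here), so every lookup falls back to the default. The sketch below assumes dic starts out empty.

```python
str1 = "aabbaba"

dic = {}
# dic is still the empty dict while the comprehension runs, so each
# character gets 0 + 1, however many times it appears in str1.
dic = {x: dic.get(x, 0) + 1 for x in str1}
print(dic)  # {'a': 1, 'b': 1}

# A plain loop updates the same dict as it goes, so the counts accumulate.
counts = {}
for x in str1:
    counts[x] = counts.get(x, 0) + 1
print(counts)  # {'a': 4, 'b': 3}
```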
