
Find count of characters within the string in Python

I am trying to create a dictionary mapping each character to the number of times it repeats in a string. Suppose the string is:

str1 = "aabbaba"

I want to create a dictionary like this

word_count = {'a':4,'b':3}

I am trying to use a dictionary comprehension to do this. I tried:

dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}

This ends up giving an error:

  File "<stdin>", line 1
    dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
                                               ^
SyntaxError: invalid syntax

Can anybody tell me what's wrong with the syntax? Also, how can I create such a dictionary using a dictionary comprehension?

The ideal way to do this is with collections.Counter:

>>> from collections import Counter
>>> str1 = "aabbaba"
>>> Counter(str1)
Counter({'a': 4, 'b': 3})

You cannot achieve this with a simple dict comprehension, because each value would need to refer to the previous count of that element, and the dictionary being built is not accessible until the comprehension has finished. As mentioned in Dawg's answer, as a workaround you can call str.count(e) for each element of the set of characters inside your dict comprehension. But the time complexity will then be O(n*m), because it traverses the complete string once for each unique element (where n is the length of the string and m is the number of unique elements), whereas with Counter it is O(n).
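
A minimal sketch of the underlying problem, assuming the target name has not been assigned beforehand (running_counts is a hypothetical name): a comprehension cannot read the dictionary it is building, because the name is only bound after the whole comprehension has been evaluated, so the lookup fails with NameError.

```python
str1 = "aabbaba"

raised = False
try:
    # running_counts is only assigned once the comprehension completes,
    # so looking it up inside the comprehension fails
    running_counts = {x: running_counts.get(x, 0) + 1 for x in str1}
except NameError as exc:
    raised = True
    print("NameError:", exc)
```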

As others have said, this is best done with a Counter.

You can also do:

>>> {e:str1.count(e) for e in set(str1)}
{'a': 4, 'b': 3}

But that traverses the string 1 + (number of unique characters) times: once to create the set, and once more for each unique character to count how many times it appears. The runtime is therefore proportional to the string length times the number of unique characters, which becomes quadratic when most characters are distinct. That is a bad result for a long string with many unique characters. A Counter only traverses the string once.
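
A rough way to check the difference yourself is the sketch below, which uses timeit to compare the set-plus-count comprehension with Counter. The helper names and the test string (20,000 characters drawn from 1,000 distinct CJK code points) are hypothetical, and the absolute timings will depend on your machine and Python version; only the relative gap is meaningful.

```python
import timeit
from collections import Counter


def count_with_str_count(s):
    # one pass to build the set, then one full scan of s per unique character
    return {e: s.count(e) for e in set(s)}


def count_with_counter(s):
    # a single pass over s
    return dict(Counter(s))


# Hypothetical test input: a long string with many distinct characters
text = "".join(chr(0x4E00 + i % 1000) for i in range(20_000))

# Both approaches must agree on the result (dict equality ignores order)
assert count_with_str_count(text) == count_with_counter(text)

for fn in (count_with_str_count, count_with_counter):
    seconds = timeit.timeit(lambda: fn(text), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s")
```

With many distinct characters the count-based version should be noticeably slower, since each of the 1,000 str.count calls rescans all 20,000 characters.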

If you want a version with no imports that is more efficient than using .count, you can use .setdefault to build a counter:

>>> count={}
>>> for c in str1:
...    count[c]=count.setdefault(c, 0)+1
... 
>>> count
{'a': 4, 'b': 3}

That only traverses the string once no matter how long or how many unique characters.


You can also use defaultdict if you prefer:

>>> from collections import defaultdict
>>> count=defaultdict(int)
>>> for c in str1:
...    count[c]+=1
... 
>>> count
defaultdict(<class 'int'>, {'a': 4, 'b': 3})
>>> dict(count)
{'a': 4, 'b': 3}

But if you are going to import collections anyway, use a Counter!

This is a nice case for collections.Counter:

>>> from collections import Counter
>>> Counter(str1)
Counter({'a': 4, 'b': 3})

It's a dict subclass, so you can work with the object just like a standard dictionary:

>>> c = Counter(str1)
>>> c['a']
4
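
Because Counter subclasses dict, ordinary dict operations work, and it adds a few conveniences. The sketch below shows two documented behaviors: a missing key reads as 0 instead of raising KeyError (without inserting the key), and most_common returns the highest counts first.

```python
from collections import Counter

str1 = "aabbaba"
c = Counter(str1)

print(isinstance(c, dict))  # True: Counter subclasses dict
print(c["a"])               # 4
print(c["z"])               # 0: a missing key counts as zero instead of raising KeyError
print("z" in c)             # False: the lookup above did not insert the key
print(c.most_common(1))     # [('a', 4)]
print(dict(c))              # {'a': 4, 'b': 3}
```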

You can also do this without the Counter class. Simple and efficient Python code for this would be:

>>> d = {}
>>> for x in str1:
...     d[x] = d.get(x, 0) + 1
... 
>>> d
{'a': 4, 'b': 3}

Note that the comprehension approach below is not the correct way to count characters: it won't count a repeated character more than once within the same string (and it drops any keys of the original dict that don't appear in the new string). It does, however, answer the original question of whether if-else is possible in a comprehension and demonstrates how to write it.

To answer your question: yes, it's possible, but the syntax looks like this:

dic = {x: (dic[x] + 1 if x in dic else 1) for x in str1}

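The conditional expression applies only to the value, not to the whole key: value pair. That is why the original version fails: its else branch tries to produce another x:1 pair, which is not valid syntax.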

The above can be made clearer using dict.get:

dic = {x: dic.get(x, 0) + 1 for x in str1}

dic.get(x, 0) returns 0 if x is not in dic.
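
One way to confirm where the question's syntax breaks is to pass both versions to the built-in compile, which only parses the code (nothing is executed, so dic and str1 need not exist). The helper name compiles is hypothetical.

```python
# The question's comprehension: the else branch tries to produce a whole
# "key: value" pair, which the grammar does not allow
bad = "dic = {x: dic[x] + 1 if x in dic.keys() else x: 1 for x in str1}"
# The conditional expression wrapped around the value only
good = "dic = {x: (dic[x] + 1 if x in dic else 1) for x in str1}"


def compiles(source):
    """Return True if source is syntactically valid Python (it is not run)."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(bad))   # False
print(compiles(good))  # True
```

The parentheses around the conditional are optional; they only make it obvious that the if/else belongs to the value.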

Demo:

In [78]: s = "abcde"

In [79]: dic = {}

In [80]: dic = {x: (dic[x] + 1 if x in dic else 1) for x in s}

In [81]: dic 
Out[81]: {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1}

In [82]: s = "abfg"

In [83]: dic = {x: dic.get(x, 0) + 1 for x in s}

In [84]: dic
Out[84]: {'a': 2, 'b': 2, 'f': 1, 'g': 1}
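
The note about repeated characters can be checked with the question's own string: dic on the right-hand side is the old dictionary for the whole comprehension, so each key gets its old count plus one, no matter how often the character occurs. If the intent is to accumulate counts across several strings, as the demo above does, Counter.update is one standard-library option, since it adds to the existing counts:

```python
from collections import Counter

str1 = "aabbaba"

# The comprehension reads the *old* dic for every character, so repeated
# characters within one string are not accumulated
dic = {}
dic = {x: dic.get(x, 0) + 1 for x in str1}
print(dic)  # {'a': 1, 'b': 1}

# Counter.update adds to existing counts, so calling it once per string
# accumulates totals across strings (and within each string)
totals = Counter()
totals.update("abcde")
totals.update("abfg")
print(dict(totals))  # {'a': 2, 'b': 2, 'c': 1, 'd': 1, 'e': 1, 'f': 1, 'g': 1}
```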
