I have a lengthy Python list and would like to count the number of occurrences of a single character. For example, how many total times does 'o' occur? I want N=4.
lexicon = ['yuo', 'want', 'to', 'sioo', 'D6', 'bUk', 'lUk'], etc.
list.count() is the obvious solution. However, it consistently returns 0. It doesn't matter which character I look for. I have double checked my file - the characters I am searching for are definitely there. I happen to be calculating count() in a for loop:
for i in range(100):
    # random sample 500 words
    sample = list(set(random.sample(lexicon, 500)))
    C1 = ['k']
    total = sum(len(i) for i in sample) # total words
    sample_count_C1 = sample.count(C1) / total
But it returns 0 outside of the for loop, over the list 'lexicon' as well. I don't want a list of overall counts so I don't think Counter will work.
Ideas?
If we take your list (the shortened version you supplied):
lexicon = ['yu', 'want', 'to', 'si', 'D6', 'bUk', 'lUk']
then we can get the count using sum() and a generator expression:
count = sum(s.count(c) for s in lexicon)
so if c were, say, 'k', this would give 2, as there are two occurrences of 'k'.
This works the same inside or outside a for loop, so you should be able to incorporate it into your wider code.
With your latest edit, I can confirm that this produces a count of 4 for 'o' in your modified list.
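As a self-contained sketch, the same expression can be wrapped in a small helper. The function name count_char is only an illustrative assumption, and the list is the edited one from the question.

```python
def count_char(words, char):
    """Return how many times `char` occurs across all strings in `words`."""
    # str.count() counts occurrences inside one word;
    # sum() adds the per-word counts together.
    return sum(word.count(char) for word in words)


lexicon = ['yuo', 'want', 'to', 'sioo', 'D6', 'bUk', 'lUk']

print(count_char(lexicon, 'o'))  # 4
print(count_char(lexicon, 'k'))  # 2
```

Note that the count is case-sensitive: 'U' and 'u' are counted separately.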
If I understand your question correctly, you would like to count the number of occurrences of each character across all the words in the list. This is known as a frequency distribution.
Here is a simple implementation using Counter:
from collections import Counter
lexicon = ['yu', 'want', 'to', 'si', 'D6', 'bUk', 'lUk']
chars = [char for word in lexicon for char in word]
freq_dist = Counter(chars)
Counter({'t': 2, 'U': 2, 'k': 2, 'a': 1, 'u': 1, 'l': 1, 'i': 1, 'y': 1, 'D': 1, '6': 1, 'b': 1, 's': 1, 'w': 1, 'n': 1, 'o': 1})
Using freq_dist, you can return the number of occurrences for a character.
freq_dist.get('a')
1
# get() method returns None if character is not in dict
freq_dist.get('4')
None
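If a count of 0 is more convenient than None for characters that never appear, indexing the Counter directly is an option, since a Counter returns 0 for missing keys instead of raising KeyError. A minimal sketch on the same list:

```python
from collections import Counter

lexicon = ['yu', 'want', 'to', 'si', 'D6', 'bUk', 'lUk']

# Counter accepts any iterable, so the characters can be fed in lazily.
freq_dist = Counter(char for word in lexicon for char in word)

print(freq_dist['k'])      # 2
print(freq_dist['4'])      # 0, missing keys count as zero
print(freq_dist.get('4'))  # None, .get() falls back to its default
```

This matters if the count is used in arithmetic afterwards, where None would raise a TypeError.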
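It returns zero because sample.count('k') only counts list elements that are exactly equal to 'k'. It does not look inside 'bUk' or 'lUk'. (In your code C1 is the list ['k'], which no element of sample equals either.) If you want the frequency of a character, do it like this: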
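import random

for i in range(100):
    # random sample of 500 words, de-duplicated
    sample = list(set(random.sample(lexicon, 500)))
    C1 = 'k'  # a string, not a list: str.count() needs a str argument
    total = sum(len(w) for w in sample)  # total characters in the sample
    sample_count = sum(x.count(C1) for x in sample)  # occurrences of C1 across all words
    sample_count_C1 = sample_count / total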
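To make the difference concrete, here is a small sketch on the short list from the question (no random sampling, so it runs as-is). The variable names are illustrative only.

```python
lexicon = ['yuo', 'want', 'to', 'sioo', 'D6', 'bUk', 'lUk']

# list.count() compares whole elements, so a single character never matches a word.
whole_element_matches = lexicon.count('k')    # no element equals 'k'
list_argument_matches = lexicon.count(['k'])  # no element equals the list ['k']
exact_word_matches = lexicon.count('to')      # one element equals 'to'

# str.count() looks inside each word, which is what the question needs.
char_matches = sum(word.count('k') for word in lexicon)

# Relative frequency: occurrences divided by the total number of characters.
total_chars = sum(len(word) for word in lexicon)
relative_freq = char_matches / total_chars

print(whole_element_matches, list_argument_matches, exact_word_matches)  # 0 0 1
print(char_matches, total_chars)  # 2 21
```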