Adding all occurrences of a string

I am stuck on a section of code. I am trying to count all the occurrences of certain characters (amino acids) in a string (the protein). There are two letters (['M', 'L']) that I need to find in the string. When I use .count() I get 1 for "M" and 10 for "L". The problem is that I cannot find the right way to add the two counts together to get 11.

protein = "MSRSLLLRFLLFLLLLPPLP"
aa = ['M', 'L']

for aminos in aa:
    if aminos in protein:
        # The count is computed here, but it is never stored or added up
        protein.count(aminos)

There are many ways to do this. Here is one that uses collections.Counter:

from collections import Counter

protein = "MSRSLLLRFLLFLLLLPPLP"
aminos = ['M', 'L']

# Count occurrences of all characters
amino_counter = Counter(protein)
total_count = 0

# Only consider the counts of aminos that matter
for amino in aminos:
    total_count += amino_counter.get(amino, 0)

print(total_count)
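
As a small variation on the above, a Counter returns 0 when it is indexed with a key it has never seen, so the loop can be collapsed into a single sum() without .get(). The sketch below adds a hypothetical third letter, 'W', which does not occur in the protein, purely to show the missing-key behaviour.

```python
from collections import Counter

protein = "MSRSLLLRFLLFLLLLPPLP"
# 'W' is a hypothetical extra letter, added only to show the missing-key case
aminos = ['M', 'L', 'W']

# Count occurrences of every character in a single pass
amino_counter = Counter(protein)

# Indexing a Counter with a missing key gives 0 instead of raising KeyError
per_amino = {amino: amino_counter[amino] for amino in aminos}
total_count = sum(per_amino.values())

print(per_amino)    # {'M': 1, 'L': 10, 'W': 0}
print(total_count)  # 11
```

Keeping the per-letter dictionary around is optional. If only the total matters, sum(amino_counter[amino] for amino in aminos) gives the same number.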

There are lots of ways to do this. For example,

  1. You could keep a running total:
total = 0
for aminos in aa:
    # No need to check whether aminos is in protein, because .count() returns 0 when it is not
    total += protein.count(aminos)
  2. You could write a generator expression and use sum() to add up the count() of each amino in aa.
total = sum(protein.count(amino) for amino in aa)
  3. You could iterate over the protein and check whether each character is in aa. First convert aa to a set to make the membership checks less expensive.
s_aa = set(aa)
total = sum(p in s_aa for p in protein)

This works because p in s_aa evaluates to True if p is in s_aa, and False otherwise. True counts as one and False counts as zero, so summing a bunch of True/False values gives the number of True values.

  4. Count all characters in protein, then sum the counts for the ones you care about:
counts = {}
for p in protein:
    ct = counts.get(p, 0) # get counts[p], default to 0 if not exists
    counts[p] = ct + 1

total = sum(counts.get(amino, 0) for amino in aa)

Vignesh's collections.Counter technique is the same as this approach. It is preferable to Hamza's approach for counting the elements because it iterates over the protein string just once instead of once for each element of aa, which matters more as aa grows. This is also the reason my third and fourth approaches are better than #1 and #2.
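
To make the comparison concrete, here is a minimal sketch that wraps the sum-of-count() approach and the set-membership approach in two helper functions and checks that they agree. The function names are made up for illustration. It also shows one difference that only appears under the assumption that aa may contain a repeated letter: the per-letter version counts that letter once per repetition, while the set-based version does not.

```python
def count_per_letter(protein, aa):
    """One full pass over protein for each letter in aa."""
    return sum(protein.count(amino) for amino in aa)


def count_single_pass(protein, aa):
    """One pass over protein, with a set membership check per character."""
    wanted = set(aa)
    # Each check is True (1) or False (0), so sum() counts the matches
    return sum(p in wanted for p in protein)


protein = "MSRSLLLRFLLFLLLLPPLP"

print(count_per_letter(protein, ['M', 'L']))   # 11
print(count_single_pass(protein, ['M', 'L']))  # 11

# With a repeated letter in aa (an assumed edge case), the results differ
print(count_per_letter(protein, ['M', 'L', 'L']))   # 21
print(count_single_pass(protein, ['M', 'L', 'L']))  # 11
```

If aa could contain duplicates, passing set(aa) to the per-letter version would bring the two back in line.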

The easiest way might be:

sum(protein.count(a) for a in aa)

You can also get the individual counts:

all_counts = {a: protein.count(a) for a in aa}

This results in: {'M': 1, 'L': 10}

You can then sum these if you only need the total count:

sum(all_counts.values())

This results in: 11
