
How to find the most common string(s) in a Python list?

I am dealing with ancient DNA data. I have an array with n different base pair calls for a given coordinate.

For example: ['A','A','C','C','G']

I need to set up a step in my script that identifies the most frequent call(s). If there is a single most frequent call, the script should use it. If two (or three) calls are tied (e.g., A and C here), it should randomly pick one of them.

I have been looking for a solution but cannot find anything satisfactory. The most common suggestion I see is Counter, but c.most_common(1) on its own does not help me, because it returns only one item and does not show that the first and second most common items are tied.

You can first get the maximum count from the mapping returned by Counter with the max function, and then use a list comprehension to keep only the keys whose counts equal that maximum. Since Counter, max, and the list comprehension each take linear time, the overall time complexity stays at O(n):

from collections import Counter
import random
lst = ['A','A','C','C','G']
counts = Counter(lst)
greatest = max(counts.values())
print(random.choice([item for item, count in counts.items() if count == greatest]))

This outputs either A or C.
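To make the difference concrete, here is a small sketch comparing most_common(1) with the maximum-count filter on the example list from the question. The variable names top_one and tied are only illustrative.

```python
from collections import Counter

calls = ['A', 'A', 'C', 'C', 'G']
counts = Counter(calls)

# most_common(1) returns a list with a single (item, count) pair,
# so it cannot show that another item has the same count.
top_one = counts.most_common(1)

# Filtering on the maximum count keeps every tied item.
greatest = max(counts.values())
tied = sorted(item for item, count in counts.items() if count == greatest)

print(top_one)
print(tied)
```

The second list is what random.choice should draw from when ties need to be broken at random.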

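Alternatively, you can do this without Counter: count each distinct item with list.count, find the maximum count, and collect every item that reaches it. Note that list.count rescans the whole list once per distinct value, which is fine when there are only a few possible bases: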

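import random

calls = ['A','A','C','C','G']

# Count each distinct call (list.count rescans the list for every distinct value).
counts = {}
for x in set(calls):
    counts[x] = calls.count(x)

max_value = max(counts.values())

# Collect every call whose count equals the maximum, so ties are kept.
tied = []
for key, value in counts.items():
    if value == max_value:
        tied.append(key)

# Pick one of the tied calls at random.
print(random.choice(tied))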
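If this needs to run for many coordinates, either approach can be wrapped in a function. The following is only a sketch: the name pick_most_common and the optional rng parameter are hypothetical and not part of the answers above. Passing a seeded random.Random instance is one way to make the tie-breaking reproducible, and the empty-list check is an assumption about how a coordinate with no calls should be handled.

```python
import random
from collections import Counter


def pick_most_common(calls, rng=random):
    """Return one of the most frequent items in calls, breaking ties at random.

    rng is anything with a choice() method: the random module by default,
    or a seeded random.Random instance for reproducible results.
    """
    if not calls:
        # Assumption: an empty list of calls is treated as an error.
        raise ValueError("calls must not be empty")
    counts = Counter(calls)
    greatest = max(counts.values())
    tied = [item for item, count in counts.items() if count == greatest]
    return rng.choice(tied)


# A and C are tied, so either may be printed.
print(pick_most_common(['A', 'A', 'C', 'C', 'G']))

# With a seeded generator, repeated runs give the same pick.
print(pick_most_common(['A', 'A', 'C', 'C', 'G'], rng=random.Random(42)))
```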


 