How to find the most common string(s) in a Python list?

Question

I am dealing with ancient DNA data. I have an array with n different base pair calls for a given coordinate.

e.g., ['A','A','C','C','G']

I need to set up a bit in my script whereby the most frequent call(s) are identified. If there is one, it should use that one. If there are two (or three) that are tied (e.g., A and C here), I need it to randomly pick one of them.

I have been looking for a solution but cannot find anything satisfactory. The most frequent suggestion I see is Counter, but Counter is useless for me, as c.most_common(1) will not reveal that the first and second most common items are tied.

Answer 1

You can first get the maximum count from the mapping returned by Counter with the max function, and then use a list comprehension to output only the keys whose counts equal the maximum count. Since Counter, max, and the list comprehension each take linear time, the overall time complexity of the code stays at O(n):

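from collections import Counter
import random

lst = ['A','A','C','C','G']
counts = Counter(lst)  # maps each call to the number of times it occurs
greatest = max(counts.values())  # the highest count
# Keep every call tied for the highest count, then pick one at random.
print(random.choice([item for item, count in counts.items() if count == greatest]))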

This outputs either A or C.
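Since the question mentions most_common specifically, here is a minimal sketch of a variant built on it. Calling most_common() with no argument returns every (item, count) pair sorted from highest to lowest count, so the tied leaders sit at the front of the list. The helper name tied_most_common is made up for illustration, and the sketch assumes the list is not empty. Note that the sort makes this slightly more than linear in the number of distinct items.

```python
from collections import Counter
from itertools import takewhile
import random

def tied_most_common(calls):
    """Return every item that shares the highest count in `calls` (hypothetical helper)."""
    # (item, count) pairs, sorted from highest count to lowest.
    ranked = Counter(calls).most_common()
    top_count = ranked[0][1]
    # Take pairs from the front for as long as they match the top count.
    leaders = takewhile(lambda pair: pair[1] == top_count, ranked)
    return [item for item, count in leaders]

calls = ['A', 'A', 'C', 'C', 'G']
tied = tied_most_common(calls)
print(tied)                 # the tied calls, here A and C
print(random.choice(tied))  # one of them, chosen at random
```

Returning the full list of tied calls keeps the random pick separate, which may help if you also want to record how often ties occur.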

Answer 2

Something like this would work:

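import random

calls = ['A','A','C','C','G']

# Count how often each distinct call occurs.
counts = {}
for x in set(calls):
    counts[x] = calls.count(x)

max_value = max(counts.values())

# Collect every call whose count ties for the maximum.
tied = []
for key, value in counts.items():
    if value == max_value:
        tied.append(key)

# Pick one of the tied calls at random.
print(random.choice(tied))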
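Both answers assume the list holds at least one call; max raises a ValueError when given an empty sequence. If a coordinate can have no calls at all, a guard along these lines might help. This is only a sketch: the function name pick_call, the rng parameter, and the choice to return None for an empty list are assumptions, not part of either answer.

```python
from collections import Counter
import random

def pick_call(calls, rng=random):
    """Pick one of the most frequent calls at random (hypothetical helper).

    Returns None when `calls` is empty. `rng` can be the random module
    or a random.Random instance, which allows reproducible runs.
    """
    if not calls:
        return None
    counts = Counter(calls)
    greatest = max(counts.values())
    candidates = [item for item, count in counts.items() if count == greatest]
    return rng.choice(candidates)

print(pick_call([]))                                  # None
print(pick_call(['A', 'A', 'C', 'C', 'G']))           # A or C
print(pick_call(['A', 'A', 'C', 'C', 'G'], rng=random.Random(0)))  # seeded, repeatable pick
```

Passing a seeded random.Random instance makes the tie-breaking repeatable between runs, which can be useful when results need to be reproduced.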
