
How to find the most common string(s) in a Python list?

I am dealing with ancient DNA data. I have an array with n different base pair calls for a given coordinate.

e.g., ['A','A','C','C','G']

I need to set up a part of my script that identifies the most frequent call(s). If there is only one, it should use that one. If two (or three) are tied (e.g., A and C here), I need it to randomly pick one of them.

I have been looking for a solution but cannot find anything satisfactory. The most common suggestion I see is Counter, but Counter on its own does not help me, as c.most_common(1) will not tell me that the first and second entries are tied.

You can first get the maximum count from the mapping returned by Counter with the max function, and then use a list comprehension to output only the keys whose counts equal that maximum. Since Counter, max, and the list comprehension each take linear time, the overall time complexity of the code stays at O(n):

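from collections import Counter
import random

lst = ['A','A','C','C','G']
# Count how often each call occurs
counts = Counter(lst)
# Highest count among all calls
greatest = max(counts.values())
# Keep every call that reaches the highest count, then pick one at random
print(random.choice([item for item, count in counts.items() if count == greatest]))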

This outputs either A or C.
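If the same logic is needed for many coordinates, the snippet above can be wrapped in a small helper. The following is only a sketch: the function names `most_common_items` and `pick_most_common` and the optional `rng` parameter are hypothetical and not part of the original answer, and raising `ValueError` for an empty list is an assumption about the desired behaviour.

```python
from collections import Counter
import random


def most_common_items(calls):
    """Return every item that shares the highest count (hypothetical helper)."""
    if not calls:
        # Assumption: an empty list of calls is treated as an error
        raise ValueError("calls must not be empty")
    counts = Counter(calls)
    greatest = max(counts.values())
    # Counter keeps first-seen order, so tied items come back in that order
    return [item for item, count in counts.items() if count == greatest]


def pick_most_common(calls, rng=None):
    """Pick one of the most frequent items, breaking ties at random."""
    rng = rng or random  # pass random.Random(seed) for reproducible picks
    return rng.choice(most_common_items(calls))


print(pick_most_common(['A', 'A', 'C', 'C', 'G']))
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking repeatable, which may be useful when the results need to be reproduced later.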

Something like this would work:

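import random

calls = ['A','A','C','C','G']

# Count each distinct call; list.count rescans the list once per distinct value
dct = {}
for x in set(calls):
    dct[x] = calls.count(x)

max_value = max(dct.values())

# Collect every call whose count equals the maximum
lst = []
for key, value in dct.items():
    if value == max_value:
        lst.append(key)

# Break ties by picking one of the top calls at random
print(random.choice(lst))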
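Regarding the concern in the question about `most_common(1)`: calling `Counter.most_common()` with no argument returns all (item, count) pairs sorted by descending count, so ties at the top can be read off the front of that list. The following is a sketch of that idea rather than either answerer's method; the variable names `ranked`, `top_count`, and `tied` are illustrative only.

```python
from collections import Counter
from itertools import takewhile
import random

calls = ['A', 'A', 'C', 'C', 'G']

# All (item, count) pairs, highest count first
ranked = Counter(calls).most_common()
top_count = ranked[0][1]

# Take pairs from the front for as long as they match the top count
tied = [item for item, count in
        takewhile(lambda pair: pair[1] == top_count, ranked)]

print(random.choice(tied))
```

This sorts the distinct items, so it costs slightly more than the max-based approach, but with only a handful of possible base calls the difference should be negligible.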

