How do I find the predominant letters in a list of strings

I want to check, for each position in the strings, which character appears most often at that position. If several characters have the same frequency, keep the first one. All strings in the list are guaranteed to be the same length!!!

I tried the following:

print(max(((letter, strings.count(letter)) for letter in strings), key=lambda x:[1])[0])

But I get: mistul or qagic

and I can't figure out what's wrong with my code.

My list of strings looks like this:

Input: strings = ['mistul', 'aidteh', 'mhfjtr', 'zxcjer']

Output: mister

Explanation: at the first position, m appears twice. At the second, i appears twice. At the third there is no predominant character, so we choose the first one, s. At the fourth position we have t twice and j twice, but t comes first, so we keep it. At the fifth position we have e twice, and at the last position r twice.

More examples:

Input: ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']

Output: magic

Input: ['sacbkt', 'tnqaex', 'vhcrhl', 'obotnq', 'vevleg', 'rljnlv', 'jdcjrk', 'zuwtee', 'xycbvm', 'szgczt', 'imhepi', 'febybq', 'pqkdfg', 'swwlds', 'ecmrut', 'buwruy', 'icjwet', 'gebgbq', 'djtfzr', 'uenleo']

Expected output: secret

Any help?
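As for what goes wrong in the attempt from the question, there seem to be two problems: iterating over strings yields whole words rather than letters (so each word is counted once within the list), and key=lambda x:[1] is missing the x (presumably x[1] was intended), so every item gets the same key and max() simply returns the first one. A small reproduction using the first input:

```python
strings = ['mistul', 'aidteh', 'mhfjtr', 'zxcjer']

# Problem 1: iterating over the list gives whole words, and each word occurs
# once in the list, so every pair is (word, 1)
pairs = [(letter, strings.count(letter)) for letter in strings]
print(pairs)

# Problem 2: lambda x: [1] ignores x and always returns the list [1], so all
# keys are equal and max() keeps the first item it saw
print(max(pairs, key=lambda x: [1])[0])   # mistul
```

Even with the key corrected to x[1], every count is 1, so the result would still be the first word; the fix is to count letters within each position instead, which is what the answers below do.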

Finally a use case for zip() :-)
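To see what zip(*strings) produces: the * unpacks the list so each string becomes a separate argument, and zip() then yields one tuple per character position. A quick look with the first example input:

```python
strings = ['mistul', 'aidteh', 'mhfjtr', 'zxcjer']

# zip(*strings) is the same as zip('mistul', 'aidteh', 'mhfjtr', 'zxcjer'):
# it yields one tuple per character position
columns = list(zip(*strings))
print(len(columns))   # 6
print(columns[0])     # ('m', 'a', 'm', 'z')
print(columns[3])     # ('t', 't', 'j', 'j')
```

zip() stops at the shortest input, which is harmless here because the question guarantees that all strings have the same length.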

If you like cryptic code, it could even be done in one statement:

def solve(strings):
    return ''.join([max([(letter, letters.count(letter)) for letter in letters], key=lambda x: x[1])[0] for letters in zip(*strings)])

But I prefer a more readable version:

def solve(strings):
    result = ''
    # "zip" the strings, so in the first iteration `letters` is a tuple
    # containing the first letter of each word, in the second iteration it is
    # a tuple of all second letters of each word, and so on...
    for letters in zip(*strings):
        # Create a list of (letter, count) pairs:
        letter_counts = [(letter, letters.count(letter)) for letter in letters]
        # Get the first letter with the highest count, and append it to result:
        result += max(letter_counts, key=lambda x: x[1])[0]
    return result

# Test function with input data from question:
assert solve(['mistul', 'aidteh', 'mhfjtr', 'zxcjer']) == 'mister'
assert solve(['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn',
              'mzsev', 'saqbl', 'myead']) == 'magic'
assert solve(['sacbkt', 'tnqaex', 'vhcrhl', 'obotnq', 'vevleg', 'rljnlv',
              'jdcjrk', 'zuwtee', 'xycbvm', 'szgczt', 'imhepi', 'febybq',
              'pqkdfg', 'swwlds', 'ecmrut', 'buwruy', 'icjwet', 'gebgbq',
              'djtfzr', 'uenleo']) == 'secret'

UPDATE

@dun suggested a smarter way of using the max() function, which makes the one-liner actually quite readable :-)

def solve(strings):
    return ''.join([max(letters, key=letters.count) for letters in zip(*strings)])
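This version keeps the first letter on ties because max() returns the first maximal item it encounters, and each tuple from zip(*strings) follows the order of the input strings. A small check on two columns of the mister example (the variable names are just for illustration):

```python
# Columns 4 and 3 of the 'mister' example, as produced by zip(*strings)
tie = ('t', 't', 'j', 'j')           # 't' and 'j' both appear twice
no_majority = ('s', 'd', 'f', 'c')   # every letter appears once

# max() returns the first item with the largest key, so ties go to the earliest letter
print(max(tie, key=tie.count))                  # t
print(max(no_majority, key=no_majority.count))  # s
```

Note that letters.count is called once per letter, so each column costs O(k²) for k strings; that is negligible for short lists like these, while Counter (as in the next answer) counts each column in a single pass.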

Using collections.Counter() is a nice strategy here. Here's one way to do it:

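from collections import Counter

def most_freq_at_index(strings, idx):
    # Collect the character at position idx from every string
    chars = [s[idx] for s in strings]
    char_counts = Counter(chars)
    # most_common(1) -> [(char, count)]; equal counts keep first-seen order (Python 3.7+)
    return char_counts.most_common(n=1)[0][0]

strings = ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih',
           'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']

# Use the actual string length instead of hard-coding 5
result = ''.join(most_freq_at_index(strings, idx) for idx in range(len(strings[0])))
print(result)
# magic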
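A compact variant that combines this Counter approach with zip(*strings) from the first answer, so the string length never has to be spelled out. solve_counter is a hypothetical name used only for this sketch, and it relies on Counter.most_common() listing equal counts in the order first encountered, which holds for Python 3.7+:

```python
from collections import Counter

def solve_counter(strings):
    # hypothetical helper: one Counter per position (column from zip);
    # on ties most_common() keeps the character that was encountered first
    return ''.join(Counter(column).most_common(1)[0][0] for column in zip(*strings))

print(solve_counter(['mistul', 'aidteh', 'mhfjtr', 'zxcjer']))  # mister
```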

If you want something more manual, without the magic of Python libraries, you can do something like this:

def f(strings):
    # dic maps each position i to a {character: count} dict for that position
    dic = {}
    for string in strings:
        for i in range(len(string)):
            word_dic = dic.get(i, {})
            word_dic[string[i]] = word_dic.get(string[i], 0) + 1
            dic[i] = word_dic
    largest_string = max(strings, key=len)
    result = ""
    for i in range(len(largest_string)):
        # dicts keep insertion order (Python 3.7+), so on ties max() returns
        # the character that was seen first at this position
        result += max(dic[i], key=lambda x: dic[i][x])
    return result

strings = ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
print(f(strings))
# magic

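Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.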

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM