How do I find the predominant letters in a list of strings?
For each position in the strings, I want to find the character that appears most often at that position. If several characters share the highest frequency, keep the first one. All strings in the list are guaranteed to be of identical length.
I tried the following:
print(max(((letter, strings.count(letter)) for letter in strings), key=lambda x:[1])[0])
But I get: mistul or qagic
And I cannot figure out what's wrong with my code.
My list of strings looks like this:
Input: strings = ['mistul', 'aidteh', 'mhfjtr', 'zxcjer']
Output: mister
Explanation: In the first position, m appears twice. In the second, i appears twice. In the third, there is no predominant character, so we choose the first one, which is s. In the fourth position, t and j each appear twice, but t comes first, so we keep t. In the fifth position, e appears twice, and in the last position, r appears twice.
More examples:
Input: ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
Output: magic
Input: ['sacbkt', 'tnqaex', 'vhcrhl', 'obotnq', 'vevleg', 'rljnlv', 'jdcjrk', 'zuwtee', 'xycbvm', 'szgczt', 'imhepi', 'febybq', 'pqkdfg', 'swwlds', 'ecmrut', 'buwruy', 'icjwet', 'gebgbq', 'djtfzr', 'uenleo']
Expected output: secret
Can anyone help?
Finally a use case for zip() :-)
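As for what goes wrong in the attempt from the question, here is a small sketch of my reading of it. The loop iterates over whole strings rather than letters, and the key function `lambda x:[1]` ignores its argument and always returns the list `[1]`, so every candidate compares equal and max() returns the first one. The names `pairs` and `first` are mine, used only for illustration.

```python
strings = ['mistul', 'aidteh', 'mhfjtr', 'zxcjer']

# `for letter in strings` yields whole strings, so each count is the number
# of times that complete string occurs in the list (1 for every entry here).
pairs = [(letter, strings.count(letter)) for letter in strings]
print(pairs[0])   # ('mistul', 1)

# The key ignores x and always returns [1]. All keys are equal, so max()
# returns the first element it encountered.
first = max(pairs, key=lambda x: [1])
print(first[0])   # mistul
```

The fix therefore needs two things: look at the strings column by column, and compare by the actual count (`x[1]`).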
If you like cryptic code, it could even be done in one statement:
def solve(strings):
    return ''.join([max([(letter, letters.count(letter)) for letter in letters], key=lambda x: x[1])[0] for letters in zip(*strings)])
But I prefer a more readable version:
def solve(strings):
    result = ''
    # "zip" the strings, so in the first iteration `letters` would be a tuple
    # containing the first letter of each word, in the second iteration it
    # would be a tuple of all second letters of each word, and so on...
    for letters in zip(*strings):
        # Create a list of (letter, count) pairs:
        letter_counts = [(letter, letters.count(letter)) for letter in letters]
        # Get the first letter with the highest count, and append it to result:
        result += max(letter_counts, key=lambda x: x[1])[0]
    return result
# Test function with input data from question:
assert solve(['mistul', 'aidteh', 'mhfjtr', 'zxcjer']) == 'mister'
assert solve(['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn',
              'mzsev', 'saqbl', 'myead']) == 'magic'
assert solve(['sacbkt', 'tnqaex', 'vhcrhl', 'obotnq', 'vevleg', 'rljnlv',
              'jdcjrk', 'zuwtee', 'xycbvm', 'szgczt', 'imhepi', 'febybq',
              'pqkdfg', 'swwlds', 'ecmrut', 'buwruy', 'icjwet', 'gebgbq',
              'djtfzr', 'uenleo']) == 'secret'
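To make the zip() step concrete, here is a minimal sketch of what `zip(*strings)` produces for the first example from the question. The name `columns` is mine and only serves the illustration.

```python
strings = ['mistul', 'aidteh', 'mhfjtr', 'zxcjer']

# zip(*strings) is equivalent to zip('mistul', 'aidteh', 'mhfjtr', 'zxcjer'):
# it yields one tuple per position, holding that position's letter from
# every string.
columns = list(zip(*strings))

print(len(columns))  # 6, one tuple per character position
print(columns[0])    # ('m', 'a', 'm', 'z')
print(columns[2])    # ('s', 'd', 'f', 'c')
```

Because every string has the same length here, no letters are dropped. With strings of unequal length, zip() would stop at the shortest one.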
UPDATE
@dun suggested a smarter way of using the max() function, which makes the one-liner actually quite readable :-)
def solve(strings):
    return ''.join([max(letters, key=letters.count) for letters in zip(*strings)])
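The "keep the first one" rule from the question comes for free here, because max() returns the first of several equally large items. A small sketch using the fourth position of the first example (the variable names are mine):

```python
letters = ('t', 't', 'j', 'j')   # fourth letters of 'mistul', 'aidteh', 'mhfjtr', 'zxcjer'

# 't' and 'j' both occur twice; max() keeps the first maximal item it sees.
winner = max(letters, key=letters.count)
print(winner)            # t

# Reversing the order flips the result, which shows that order decides ties.
reversed_letters = letters[::-1]
reversed_winner = max(reversed_letters, key=reversed_letters.count)
print(reversed_winner)   # j
```

Note that `letters.count` scans the whole tuple on every call, so the work per position grows quadratically with the number of strings. That is fine for lists of this size.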
Using collections.Counter() is a nice strategy here. Here's one way to do it:
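from collections import Counter

def most_freq_at_index(strings, idx):
    # Collect the character at position idx from every string
    chars = [s[idx] for s in strings]
    char_counts = Counter(chars)
    # most_common(n=1) returns a one-element list of (char, count) pairs
    return char_counts.most_common(n=1)[0][0]

strings = ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih',
           'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
# All strings have the same length, so the first one gives the number of positions
result = ''.join(most_freq_at_index(strings, idx) for idx in range(len(strings[0])))
print(result)
## 'magic'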
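The Counter approach can also be combined with zip(), which avoids indexing and the need to know the string length up front. This is only a sketch, and `most_frequent_letters` is a name I made up for it. It relies on Counter keeping elements with equal counts in the order first encountered, which the documentation states for Python 3.7 and later.

```python
from collections import Counter

def most_frequent_letters(strings):
    """Return the most common letter at each position (first one wins ties)."""
    return ''.join(
        # most_common(1) gives [(letter, count)]; take the letter.
        Counter(column).most_common(1)[0][0]
        for column in zip(*strings)
    )

print(most_frequent_letters(['mistul', 'aidteh', 'mhfjtr', 'zxcjer']))  # mister
```

Each column is counted in a single pass, so this scales linearly with the number of strings.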
If you want something more manual, without the magic of Python libraries, you can do something like this:
def f(strings):
    # dic maps each position to a {character: count} dictionary
    dic = {}
    for string in strings:
        for i in range(len(string)):
            word_dic = dic.get(i, { string[i]: 0 })
            word_dic[string[i]] = word_dic.get(string[i], 0) + 1
            dic[i] = word_dic
    largest_string = max(strings, key = len)
    result = ""
    for i in range(len(largest_string)):
        # Pick the character with the highest count at position i
        result += max(dic[i], key = lambda x : dic[i][x])
    return result

strings = ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
f(strings)
## 'magic'