
Retrieving the top value in a dictionary that has multiple values under a single key

I am somewhat new to Python and I have a problem. I have a file with 5 results for each unique identifier. Each result has a percent match and various other pieces of data. My goal is to find the result with the greatest percent match, and then retrieve more information from that original line. For example:

Name    Organism    Percent Match     Misc info
1        Human        100              xxx     
1        Goat          95              yyy
1        Pig           90              zzz   

I am attempting to solve this problem by building a dictionary keyed by name, where the values are the percent matches recorded for that name (i.e. multiple values for every key). The only way I can think to proceed is to convert the values in this dictionary to a list, then sort the list. I then want to retrieve the greatest value in the list (list[0] or list[-1]) and then retrieve more info from the original line. Here is my code thus far:

list = []  
if "1" in line: 
    id = line
    bsp = id.split("\t")
    uid = bsp[0]
    per = bsp[2]

    if not dict.has_key(uid):
        dict[uid] = []
    dict[uid].append(per)
    list = dict[uid]
    list.sort()
if list[0] in dict:
    print key

This ends up just printing every key, as opposed to only the one with the greatest percent. Any thoughts? Thanks!

You could use csv to parse the tab-delimited data file (though the data you posted looks to be column-spaced!?).

Since the first line in your data file gives field names, a DictReader is convenient, so you can refer to the columns by human-readable names.

csv.DictReader returns an iterable of rows (dicts). If you take the max of the iterable using the Percent Match column as the key, you can find the row with the highest percent match.

Using this (tab-delimited) data as test.dat:

Name    Organism    Percent Match   Misc    info
1   Human   100 xxx
1   Goat    95  yyy
1   Pig 90  zzz
2   Mouse   95  yyy
2   Moose   90  zzz
2   Manatee 100 xxx

the program

import csv

maxrows = {}
with open('test.dat', 'rb') as f:
    for row in csv.DictReader(f, delimiter = '\t'):
        name = row['Name']
        percent = int(row['Percent Match'])
        if int(maxrows.get(name,row)['Percent Match']) <= percent:
            maxrows[name] = row

print(maxrows)

yields

{'1': {'info': None, 'Percent Match': '100', 'Misc': 'xxx', 'Organism': 'Human', 'Name': '1'}, '2': {'info': None, 'Percent Match': '100', 'Misc': 'xxx', 'Organism': 'Manatee', 'Name': '2'}}
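
The program above targets Python 2 (the file is opened with 'rb'). Under Python 3 the csv module expects a text-mode file, ideally opened with newline=''. Here is a minimal sketch of the same idea for Python 3. It assumes the last header field is the single tab-delimited name `Misc info`; the helper name `best_rows` and the in-memory sample are illustrative and not part of the original answer. Unlike the original, it keeps the first row when two rows tie.

```python
import csv
import io

# Hypothetical in-memory stand-in for test.dat (tab-delimited).
SAMPLE = (
    "Name\tOrganism\tPercent Match\tMisc info\n"
    "1\tHuman\t100\txxx\n"
    "1\tGoat\t95\tyyy\n"
    "1\tPig\t90\tzzz\n"
    "2\tMouse\t95\tyyy\n"
    "2\tMoose\t90\tzzz\n"
    "2\tManatee\t100\txxx\n"
)


def best_rows(f):
    """Return {name: row} holding the row with the highest Percent Match."""
    best = {}
    for row in csv.DictReader(f, delimiter="\t"):
        name = row["Name"]
        current = best.get(name)
        # Compare as integers; the csv module yields every field as a string.
        if current is None or int(row["Percent Match"]) > int(current["Percent Match"]):
            best[name] = row
    return best


# With a real file, pass open("test.dat", newline="") instead of io.StringIO.
result = best_rows(io.StringIO(SAMPLE))
for name, row in result.items():
    print(name, row["Organism"], row["Percent Match"], row["Misc info"])
```

Each value in `result` is the full row, so any other column (here `Misc info`) can be read straight from it.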

You should be able to do something like this:

lines = []
with open('data.txt') as file:
    for line in file:
        if line.startswith('1'):
            lines.append(line.split())

best_match = max(lines, key=lambda k: int(k[2]))

After reading the file, lines would look something like this:

>>> pprint.pprint(lines)
[['1', 'Human', '100', 'xxx'],
 ['1', 'Goat', '95', 'yyy'],
 ['1', 'Pig', '90', 'zzz']]

And then you want to get the entry from lines where the int value of the third item is the highest, which can be expressed like this:

>>> max(lines, key=lambda k: int(k[2]))
['1', 'Human', '100', 'xxx']

So at the end of this, best_match will be a list with the data from the line you are interested in.
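
The int() conversion in the key matters: split() returns strings, and strings compare character by character, so '95' ranks above '100'. This may be part of why sorting the raw values in the question did not give the expected order. A small sketch using the sample rows (the variable names are illustrative):

```python
rows = [
    ["1", "Human", "100", "xxx"],
    ["1", "Goat", "95", "yyy"],
    ["1", "Pig", "90", "zzz"],
]

# Comparing the raw strings is lexicographic: "95" > "90" > "100".
by_string = max(rows, key=lambda r: r[2])

# Converting to int compares the numeric values instead.
by_int = max(rows, key=lambda r: int(r[2]))

print(by_string[1])  # Goat
print(by_int[1])     # Human
```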

Or if you wanted to get really tricky, you could get the line in one (complicated) step:

with open('data.txt') as file:
    best_match = max((s.split() for s in file if s.startswith('1')),
                     key=lambda k: int(k[2]))
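
If the best row is needed for every identifier rather than only '1', one possible extension uses itertools.groupby. This sketch assumes the rows for each identifier are contiguous in the file, as in the sample data, and that any header line has already been skipped (for example with next(file)). `best_per_name` is a hypothetical helper name.

```python
from itertools import groupby


def best_per_name(lines):
    """Yield the highest-percent row for each run of lines sharing a name."""
    # Skip blank lines, then split each remaining line on whitespace.
    rows = (line.split() for line in lines if line.strip())
    # groupby only merges adjacent rows, so the input must be grouped by name.
    for _name, group in groupby(rows, key=lambda r: r[0]):
        yield max(group, key=lambda r: int(r[2]))


# Illustrative input; a file object opened on the data would work the same way.
sample = [
    "1 Human 100 xxx",
    "1 Goat 95 yyy",
    "2 Pig 85 zzz",
    "2 Goat 70 yyy",
]
best = list(best_per_name(sample))
print(best)
```

If the rows for one identifier can be scattered through the file, a dictionary keyed by name is the safer choice.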

I think you may be looking for something like:

from collections import defaultdict

results = defaultdict(list)
with open('data.txt') as f:
    next(f)  # skip the header line (remove this if the file has no header)
    for line in f:
        splitted = line.split()
        results[splitted[0]].append(splitted[1:])

maxs = {}
for uid,data in results.items():
    maxs[uid] =  max(data, key=lambda k: int(k[1]))

I've tested it on a file like:

Name    Organism    Percent Match     Misc info
1        Human        100              xxx     
1        Goat          95              yyy
1        Pig           90              zzz   
2        Pig           85              zzz   
2        Goat          70              yyy

And the result was:

{'1': ['Human', '100', 'xxx'], '2': ['Pig', '85', 'zzz']}

with open('datafile.txt', 'r') as f:
    lines = f.read().split('\n')

# Maps each percent match to the rest of the data on that line
matchDict = {}

for line in lines:
    # startswith() is safe on empty lines, unlike line[0]
    if line.startswith('1'):
        uid, organism, percent, misc = line.split('\t')
        matchDict[int(percent)] = (organism, uid, misc)

highestMatch = max(matchDict.keys())

print('{0} is the highest match at {1} percent'.format(matchDict[highestMatch][0], highestMatch))
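
One caveat with keying a dictionary by percent: two rows with the same percent overwrite each other, so only the last one survives. If ties matter, a possible alternative is to find the top percent first and then keep every row that reaches it. In this sketch the 'Chimp' row is made-up data added only to show a tie.

```python
# (uid, organism, percent, misc); the Chimp row is hypothetical.
rows = [
    ("1", "Human", 100, "xxx"),
    ("1", "Chimp", 100, "www"),
    ("1", "Goat", 95, "yyy"),
]

# Highest percent among all rows.
top = max(percent for _uid, _organism, percent, _misc in rows)

# Every row that shares the highest percent, in file order.
ties = [row for row in rows if row[2] == top]

for uid, organism, percent, misc in ties:
    print('{0} is a top match at {1} percent'.format(organism, percent))
```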

