Retrieving the highest value in a dictionary with multiple values under a single key

Question

I am somewhat new to Python and I have a problem. I have a file with 5 results for each unique identifier. Each result has a percent match and various other pieces of data. My goal is to find the result with the greatest percent match, and then retrieve more information from that original line. For example:

Name    Organism    Percent Match     Misc info
1        Human        100              xxx     
1        Goat          95              yyy
1        Pig           90              zzz

I am attempting to solve this problem by putting each key in a dictionary, with the values being each percent match for the given name (i.e. multiple values for every key). The only way I can think to proceed is to convert the values in this dictionary to a list, then sort the list. I then want to retrieve the greatest value in the list (list[0] or list[-1]) and then retrieve more info from the original line. Here is my code thus far:

list = []  
if "1" in line: 
    id = line
    bsp = id.split("\t")
    uid = bsp[0]
    per = bsp[2]

    if not dict.has_key(uid):
        dict[uid] = []
    dict[uid].append(per)
    list = dict[uid]
    list.sort()
if list[0] in dict:
    print key

This ends up just printing every key, as opposed to only the one with the greatest percent. Any thoughts? Thanks!

Answer 1

You could use csv to parse the tab-delimited data file (though the data you posted looks to be column-spaced data!?).

Since the first line in your data file gives field names, a DictReader is convenient, so you can refer to the columns by human-readable names.

csv.DictReader returns an iterable of rows (dicts). If you take the max of the iterable using the Percent Match column as the key, you can find the row with the highest percent match:

Using this (tab-delimited) data as test.dat:

Name    Organism    Percent Match   Misc    info
1   Human   100 xxx
1   Goat    95  yyy
1   Pig 90  zzz
2   Mouse   95  yyy
2   Moose   90  zzz
2   Manatee 100 xxx

the program

import csv

maxrows = {}
# Python 3: open in text mode with newline=''. On Python 2, use open('test.dat', 'rb').
with open('test.dat', newline='') as f:
    for row in csv.DictReader(f, delimiter = '\t'):
        name = row['Name']
        percent = int(row['Percent Match'])
        if int(maxrows.get(name,row)['Percent Match']) <= percent:
            maxrows[name] = row

print(maxrows)

yields

{'1': {'info': None, 'Percent Match': '100', 'Misc': 'xxx', 'Organism': 'Human', 'Name': '1'}, '2': {'info': None, 'Percent Match': '100', 'Misc': 'xxx', 'Organism': 'Manatee', 'Name': '2'}}
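The max-with-key idea described above can also be applied directly, one group of rows at a time. This is a minimal sketch, not part of the original answer: the in-memory `data` string stands in for test.dat (with a single `Misc` header column, which is an assumption), and the helper name `best_rows` is made up for illustration.

```python
import csv
import io
from itertools import groupby

# Hypothetical in-memory stand-in for test.dat (tab-delimited).
data = (
    "Name\tOrganism\tPercent Match\tMisc\n"
    "1\tHuman\t100\txxx\n"
    "1\tGoat\t95\tyyy\n"
    "1\tPig\t90\tzzz\n"
    "2\tMouse\t95\tyyy\n"
    "2\tMoose\t90\tzzz\n"
    "2\tManatee\t100\txxx\n"
)

def best_rows(f):
    """Return {name: row with the highest 'Percent Match'}."""
    # groupby only merges adjacent rows, so sort by name first.
    rows = sorted(csv.DictReader(f, delimiter="\t"), key=lambda r: r["Name"])
    return {
        name: max(group, key=lambda r: int(r["Percent Match"]))
        for name, group in groupby(rows, key=lambda r: r["Name"])
    }

best = best_rows(io.StringIO(data))
for name, row in best.items():
    print(name, row["Organism"], row["Percent Match"], row["Misc"])
```

Because each value in `best` is the whole row, the other columns of the winning line stay available without going back to the file.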

Answer 2

You should be able to do something like this:

lines = []
with open('data.txt') as file:
    for line in file:
        if line.startswith('1'):
            lines.append(line.split())

best_match = max(lines, key=lambda k: int(k[2]))

After reading the file, lines would look something like this:

>>> pprint.pprint(lines)
[['1', 'Human', '100', 'xxx'],
 ['1', 'Goat', '95', 'yyy'],
 ['1', 'Pig', '90', 'zzz']]

And then you want to get the entry from lines where the int value of the third item is the highest, which can be expressed like this:

>>> max(lines, key=lambda k: int(k[2]))
['1', 'Human', '100', 'xxx']

So at the end of this, best_match will be a list with the data from the line you are interested in.

Or, if you wanted to get really tricky, you could get the line in one (complicated) step:

with open('data.txt') as file:
    best_match = max((s.split() for s in file if s.startswith('1')),
                     key=lambda k: int(k[2]))
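The int() call in the key function is doing real work: the fields come out of split() as strings, and strings compare character by character. The short sketch below reuses the sample rows from this answer to show the difference. The same effect may be part of what goes wrong when percentages are sorted as strings, as in the question.

```python
percents = ["100", "95", "90"]

# Compared as strings, "95" beats "100" because "9" > "1".
print(max(percents))           # 95
# Compared as integers, 100 is the largest.
print(max(percents, key=int))  # 100

rows = [["1", "Human", "100", "xxx"],
        ["1", "Goat", "95", "yyy"],
        ["1", "Pig", "90", "zzz"]]

# Without int(), the Goat row would win; with it, the Human row does.
best_match = max(rows, key=lambda k: int(k[2]))
print(best_match)
```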

Answer 3

I think you may be looking for something like:

from collections import defaultdict

results = defaultdict(list)
with open('data.txt') as f:
    #next(f)      # you may need this to skip the header
    for line in f:
        splitted = line.split()
        results[splitted[0]].append(splitted[1:])

maxs = {}
for uid,data in results.items():
    maxs[uid] =  max(data, key=lambda k: int(k[1]))

I've tested it on a file like:

Name    Organism    Percent Match     Misc info
1        Human        100              xxx     
1        Goat          95              yyy
1        Pig           90              zzz   
2        Pig           85              zzz   
2        Goat          70              yyy

And the result was:

{'1': ['Human', '100', 'xxx'], '2': ['Pig', '85', 'zzz']}
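For a file that does start with the header line shown above, the header has to be skipped, since otherwise int() would be applied to the word "Percent". Here is a self-contained sketch of the same approach: the `sample` string is a stand-in for data.txt, and unpacking into three names assumes every data line has exactly four whitespace-separated fields.

```python
import io
from collections import defaultdict

# In-memory stand-in for data.txt (whitespace-separated columns).
sample = """\
Name    Organism    Percent Match     Misc info
1        Human        100              xxx
1        Goat          95              yyy
1        Pig           90              zzz
2        Pig           85              zzz
2        Goat          70              yyy
"""

results = defaultdict(list)
f = io.StringIO(sample)
next(f)  # skip the header line
for line in f:
    fields = line.split()
    if fields:  # ignore blank lines
        results[fields[0]].append(fields[1:])

# For each identifier, keep the row whose percent (index 1) is largest.
maxs = {uid: max(data, key=lambda k: int(k[1])) for uid, data in results.items()}

# The rest of the winning line is available alongside the percent.
for uid, (organism, percent, misc) in sorted(maxs.items()):
    print(uid, organism, percent, misc)
```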

Answer 4

with open('datafile.txt', 'r') as f:
    lines = f.read().split('\n')

matchDict = {}

for line in lines:
    if line.startswith('1'):  # also safe on the empty string left by a trailing newline
        uid, organism, percent, misc = line.split('\t')
        matchDict[int(percent)] = (organism, uid, misc)

highestMatch = max(matchDict.keys())

print('{0} is the highest match at {1} percent'.format(matchDict[highestMatch][0], highestMatch))
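One caveat with keying the dictionary by percent: two results with the same percent share a key, so the later one replaces the earlier one. The sketch below uses made-up rows (the "Cow" entry is hypothetical, added only to create a tie) and shows an alternative that takes max over the rows themselves.

```python
rows = [
    ("1", "Human", 100, "xxx"),
    ("1", "Goat", 95, "yyy"),
    ("1", "Cow", 95, "www"),  # hypothetical tie with Goat
]

# Keyed by percent: the second 95 silently replaces the first.
by_percent = {}
for uid, organism, percent, misc in rows:
    by_percent[percent] = (organism, uid, misc)
print(len(by_percent))  # 2

# Taking max over the rows themselves keeps every row intact.
uid, organism, percent, misc = max(rows, key=lambda r: r[2])
print('{0} is the highest match at {1} percent'.format(organism, percent))
```

For finding only the single highest match the collision is harmless, but it matters if the lower-ranked rows are needed later.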

4 answers

Answer 1: 2, 2012-02-15 23:12:57
Answer 2: 1, 2012-02-15 23:06:49
Answer 3: 1, 2012-02-15 23:16:44
Answer 4: 0, 2012-02-15 23:14:35
