
How to retrieve minimum unique values from a list?

I have a list of dictionaries. I want only one result for each unique api, and the result should be chosen according to priority: 0, 1, 2. How should I approach this?

Data:

[
{'api':'test1', 'result': 0},
{'api':'test2', 'result': 1},
{'api':'test3', 'result': 2},
{'api':'test3', 'result': 0},
{'api':'test3', 'result': 1},
]

Expected output:

[
{'api':'test1', 'result': 0},
{'api':'test2', 'result': 1},
{'api':'test3', 'result': 0},
]

Assuming the input is in data, you can do a classic SQL-style groupby:

from itertools import groupby

# in case your data is sorted already by api skip the below line
data = sorted(data, key=lambda x: x['api'])

res = [
    {'api': g, 'result': min(v, key=lambda x: x['result'])['result']} 
    for g, v in groupby(data, lambda x: x['api'])
]

Output:

[{'api': 'test1', 'result': 0}, {'api': 'test2', 'result': 1}, {'api': 'test3', 'result': 0}]
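
A caveat worth illustrating: itertools.groupby only groups consecutive items with equal keys, which is why the sort matters unless the data is already ordered by api. The sketch below uses a hypothetical unsorted list (unsorted_data) and a hypothetical helper name (min_per_api), neither of which comes from the answer, to compare the two cases.

```python
from itertools import groupby

# Hypothetical unsorted input: the two 'test3' entries are not adjacent.
unsorted_data = [
    {'api': 'test3', 'result': 2},
    {'api': 'test1', 'result': 0},
    {'api': 'test3', 'result': 0},
]

def min_per_api(items):
    # Hypothetical helper: minimum result per run of equal 'api' values.
    return [
        {'api': api, 'result': min(item['result'] for item in group)}
        for api, group in groupby(items, key=lambda x: x['api'])
    ]

# Without sorting, 'test3' shows up as two separate groups.
without_sort = min_per_api(unsorted_data)

# With sorting, each api forms exactly one group.
with_sort = min_per_api(sorted(unsorted_data, key=lambda x: x['api']))

print(without_sort)
print(with_sort)
```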

You can pass through the list once and keep the best item seen so far for each group. This is time and space efficient.

def get_min_unique(items, id_key, value_key):
    lowest = {}
    for item in items:
        key = item[id_key]
        if key not in lowest or lowest[key][value_key] > item[value_key]:
            lowest[key] = item
    return list(lowest.values())

For example, with your own data:

data = [
  {'api':'test1', 'result': 0},
  {'api':'test2', 'result': 1},
  {'api':'test3', 'result': 2},
  {'api':'test3', 'result': 0},
  {'api':'test3', 'result': 1},
]

assert get_min_unique(data, 'api', 'result') == [
  {'api': 'test1', 'result': 0},
  {'api': 'test2', 'result': 1},
  {'api': 'test3', 'result': 0},
]

data = [
    {'api': 'test1', 'result': 0},
    {'api': 'test3', 'result': 2},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 1},
    {'api': 'test3', 'result': 0}
]

def find(data):
    step1 = sorted(data, key=lambda k: k['result'])
    print('step1', step1)

    step2 = {}
    for each in step1:
        if each['api'] not in step2:
            step2[each['api']] = each
    print('step2', step2)

    step3 = list(step2.values())
    print('step3', step3)
    print('\n')
    return step3

find(data)

Try this; it will give you:

step1 [{'api': 'test1', 'result': 0}, {'api': 'test3', 'result': 0}, {'api': 'test2', 'result': 1}, {'api': 'test3', 'result': 1}, {'api': 'test3', 'result': 2}]
step2 {'test1': {'api': 'test1', 'result': 0}, 'test3': {'api': 'test3', 'result': 0}, 'test2': {'api': 'test2', 'result': 1}}
step3 [{'api': 'test1', 'result': 0}, {'api': 'test3', 'result': 0}, {'api': 'test2', 'result': 1}]

Sort everything first, then take the first item for each "api", and that is your result.
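
The same sort-then-keep-first idea can be written more compactly without the debugging prints. This is only a sketch: the helper name min_per_api_sorted is hypothetical, and it relies on dictionaries preserving insertion order (Python 3.7+). Note that the output order follows the sorted result values, not the original list order.

```python
def min_per_api_sorted(items):
    # Hypothetical helper name; same idea as find() above.
    first_seen = {}
    for item in sorted(items, key=lambda d: d['result']):
        # setdefault stores the item only if this api has not been seen yet,
        # so the lowest result (first after sorting) wins.
        first_seen.setdefault(item['api'], item)
    return list(first_seen.values())

data = [
    {'api': 'test1', 'result': 0},
    {'api': 'test3', 'result': 2},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 1},
    {'api': 'test3', 'result': 0},
]

result = min_per_api_sorted(data)
print(result)
```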

Indulging in code golf:

from itertools import groupby
dut = [
    {'api':'test1', 'result': 0},
    {'api':'test2', 'result': 1},
    {'api':'test3', 'result': 2},
    {'api':'test3', 'result': 0},
    {'api':'test3', 'result': 1},
]

res = [
    next(g)
    for _,g in groupby(
        sorted(dut, key=lambda d: tuple(d.values())),
        key=lambda i: i['api']
    )
]

Result:

Out[45]:
[{'api': 'test1', 'result': 0},
 {'api': 'test2', 'result': 1},
 {'api': 'test3', 'result': 0}]

Using the itertools.groupby utility, the iterable passed as the first argument is sorted in ascending order by api and then result (using sorted), and grouped by api only.

groupby returns an iterable of pairs, each holding the key and an iterable of the items in that group, as seen here:

In [56]: list(groupby(sorted(dut, key=lambda i: tuple(i.values())), key=lambda i: i['api']))
Out[56]:
[('test1', <itertools._grouper at 0x10af4c550>),
 ('test2', <itertools._grouper at 0x10af4c400>),
 ('test3', <itertools._grouper at 0x10af4cc88>)]

Using a list comprehension, since each group is already sorted, next is used to fetch the first item in the group, and the group key is discarded.
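
To make the grouping visible, the group iterators can be materialized into lists. This is only a sketch, and it deviates from the answer in one respect: it sorts on an explicit (api, result) tuple instead of tuple(d.values()), so it does not depend on the key order inside each dictionary. The names ordered, groups, and firsts are illustrative.

```python
from itertools import groupby

dut = [
    {'api': 'test1', 'result': 0},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 2},
    {'api': 'test3', 'result': 0},
    {'api': 'test3', 'result': 1},
]

# Sort by api first, then by result, so each group starts with its minimum.
ordered = sorted(dut, key=lambda d: (d['api'], d['result']))

# Materialize each lazy group into a list so it can be inspected.
groups = [(api, list(group)) for api, group in groupby(ordered, key=lambda d: d['api'])]

# Taking the first item of each group is what next(g) does in the answer.
firsts = [items[0] for _, items in groups]

for api, items in groups:
    print(api, items)
```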

The existing answers are fine if you need to store every api at every priority and only periodically filter down to the highest priority. If you're only ever going to need the highest priority of each api, however, I'd argue you're using the wrong data structure.

>>> from collections import UserDict
>>> 
>>> class DataContainer(UserDict):
...     def __setitem__(self, key, value):
...         cur = self.get(key)
...         if cur is None or value < cur:
...             super().__setitem__(key, value)
...     def __str__(self):
...         return '\n'.join(("'api': {}, 'result': {}".format(k, v) for k, v in self.items()))
... 
>>> data = DataContainer()
>>> data['test1'] = 0
>>> data['test2'] = 1
>>> data['test3'] = 2
>>> data['test3'] = 0
>>> data['test3'] = 1
>>> print(data)
'api': test1, 'result': 0
'api': test2, 'result': 1
'api': test3, 'result': 0

This container will only ever contain the highest priority for each api. Advantages include:

  • Clearly expresses what you're doing
  • No need for code golf
  • Keeps memory footprint to a minimum
  • Faster than periodically sorting, grouping, and filtering
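
If the data still arrives as the list of dictionaries from the question, it could be fed into such a container and converted back afterwards. The following is a sketch under that assumption; the names records, container, and as_list are illustrative, and the class is repeated (without __str__) only to keep the example self-contained.

```python
from collections import UserDict

class DataContainer(UserDict):
    def __setitem__(self, key, value):
        # Keep the new value only if the key is new or the value is lower.
        cur = self.get(key)
        if cur is None or value < cur:
            super().__setitem__(key, value)

records = [
    {'api': 'test1', 'result': 0},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 2},
    {'api': 'test3', 'result': 0},
    {'api': 'test3', 'result': 1},
]

container = DataContainer()
for record in records:
    container[record['api']] = record['result']

# Convert back to the list-of-dicts shape used in the question.
as_list = [{'api': api, 'result': result} for api, result in container.items()]
print(as_list)
```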

Not as clean a solution as the others, but I think it is step-wise and easy to understand:

l = [
{'api':'test1', 'result': 0},
{'api':'test2', 'result': 1},
{'api':'test3', 'result': 2},
{'api':'test3', 'result': 0},
{'api':'test3', 'result': 1},
]

j = {'api':[], 'result':[]}
for i in l:
    if i['api'] not in j['api']:
        j['api'].append(i['api'])
        j['result'].append(i['result']) 
    else:
        index = j['api'].index(i['api'])
        # keep the smaller result for this api
        if j['result'][index] > i['result']:
            j['result'][index] = i['result']

result = []

for i in range(len(j['api'])):
    result.append({'api': j['api'][i], 'result': j['result'][i]})
    
print(result)

Output:

[{'api': 'test1', 'result': 0},
 {'api': 'test2', 'result': 1},
 {'api': 'test3', 'result': 0}]

You could pick another, more efficient data structure: a dict of Counters.

You retain the distribution of results for each api, and the code is relatively straightforward:

data = [
{'api':'test1', 'result': 0},
{'api':'test2', 'result': 1},
{'api':'test3', 'result': 2},
{'api':'test3', 'result': 0},
{'api':'test3', 'result': 1},
]

from collections import Counter

results = {}
for d in data:
    counter = results.setdefault(d['api'], Counter())
    counter[d['result']] += 1

results
# {'test1': Counter({0: 1}),
#  'test2': Counter({1: 1}),
#  'test3': Counter({2: 1, 0: 1, 1: 1})}

[{'api': api, 'result':min(v.keys())} for api, v in results.items()]
# [{'api': 'test1', 'result': 0},
#  {'api': 'test2', 'result': 1},
#  {'api': 'test3', 'result': 0}]

Should you want to get the maximum or the count of results, you'd just need to change the last line.
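
As a sketch of what changing the last line might look like, the variants below compute the maximum result and the total number of results per api from the same dict of Counters. The 'count' key and the variable names highest and counts are illustrative choices, not part of the answer.

```python
from collections import Counter

data = [
    {'api': 'test1', 'result': 0},
    {'api': 'test2', 'result': 1},
    {'api': 'test3', 'result': 2},
    {'api': 'test3', 'result': 0},
    {'api': 'test3', 'result': 1},
]

# Build one Counter of result values per api, as in the answer.
results = {}
for d in data:
    results.setdefault(d['api'], Counter())[d['result']] += 1

# Maximum result per api: iterate the Counter's keys with max().
highest = [{'api': api, 'result': max(c)} for api, c in results.items()]

# Number of results per api: sum the Counter's counts.
counts = [{'api': api, 'count': sum(c.values())} for api, c in results.items()]

print(highest)
print(counts)
```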

Here's the cleanest solution (if you are willing to use external libraries):

import pandas as pd
df = pd.DataFrame(data)
dfMin = df.groupby(by='api').min()

dfMin is a pandas DataFrame indexed by api, with result holding the minimum value for each api.

Yet another solution:

result = {}

for d in data:
    result[d['api']] = min(result.get(d['api'], d['result']), d['result'])

new_data = [ {'api' : k, 'result': v} for k, v in result.items() ]

print (new_data)

Prints:

#[{'api': 'test1', 'result': 0}, {'api': 'test2', 'result': 1}, {'api': 'test3', 'result': 0}]
