Pythonic way to find a dictionary that matches key, value pairs in another dictionary

I'm trying to find a way to match the key, value pairs of one dictionary against another. The first dictionary, record, has a fixed set of keys that does not change (although the value for each key can of course change). The second dictionary, potential_outputs, is user-defined and has variable keys and values. The user chooses which keys from the record they want to match on, assigns each a value, and then assigns an output value that is used when a match is found.

Example:

record = [
    {"Name": "John Smith", "Class": "c1", "Plan": "p1",},
    {"Name": "Jane Doe", "Class": "c2", "Plan": "p2",},
]
potential_outputs = [
    {"Class": "c1", "Plan": "p1", "Output": "o11"},
    {"Class": "c1", "Plan": "p2", "Output": "o12"},
    {"Class": "c2", "Plan": "p1", "Output": "o21"},
    {"Class": "c2", "Plan": "p2", "Output": "o22"},
]

The program needs to loop through each dictionary in the record list, determine which dictionary in potential_outputs matches its key, value pairs, and then return the "Output" from the matching potential_outputs dictionary.

The expected output would be something along the lines of:

[
    {"Name": "John Smith", "Output": "o11"},
    {"Name": "Jane Doe", "Output": "o22"},
]

I also want to note that I am not committed to using dictionaries to solve this problem.

Thank you!

You could group your outputs under a (Class, Plan) tuple key, then build the output dictionaries using a list comprehension.

Using an output lookup dictionary with O(1) lookups allows the solution to be O(N + M) instead of O(N * M), where N is the number of dictionaries in record and M is the number of dictionaries in potential_outputs.

record = [
    {"Name": "John Smith", "Class": "c1", "Plan": "p1",},
    {"Name": "Jane Doe", "Class": "c2", "Plan": "p2",},
]

potential_outputs = [
    {"Class": "c1", "Plan": "p1", "Output": "o11"},
    {"Class": "c1", "Plan": "p2", "Output": "o12"},
    {"Class": "c2", "Plan": "p1", "Output": "o21"},
    {"Class": "c2", "Plan": "p2", "Output": "o22"},
]

outputs = {(output["Class"], output["Plan"]): output["Output"] for output in potential_outputs}

result = [{"Name": r["Name"], "Output": outputs[r["Class"], r["Plan"]]} for r in record]

print(result)

Output:

[{'Name': 'John Smith', 'Output': 'o11'}, {'Name': 'Jane Doe', 'Output': 'o22'}]
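One caveat with the lookup above: outputs[...] raises a KeyError for any record whose (Class, Plan) pair has no entry. A minimal sketch, assuming unmatched records should simply be skipped (the question does not say what should happen), uses dict.get instead. The "Sam Roe" record is made up purely to illustrate the unmatched case.

```python
record = [
    {"Name": "John Smith", "Class": "c1", "Plan": "p1"},
    {"Name": "Sam Roe", "Class": "c3", "Plan": "p1"},  # hypothetical record with no match
]
potential_outputs = [
    {"Class": "c1", "Plan": "p1", "Output": "o11"},
    {"Class": "c2", "Plan": "p2", "Output": "o22"},
]

# Build the O(1) lookup table keyed by (Class, Plan).
outputs = {(o["Class"], o["Plan"]): o["Output"] for o in potential_outputs}

result = []
for r in record:
    # .get returns None instead of raising KeyError when the pair is unknown.
    output = outputs.get((r["Class"], r["Plan"]))
    if output is not None:
        result.append({"Name": r["Name"], "Output": output})

print(result)
```

If a default output is preferable to skipping, the second argument of get could supply it.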

To avoid nested looping and M*N complexity, you can preprocess record

from collections import defaultdict

rec = defaultdict(lambda: defaultdict(list))
for r in record:
    rec[r['Class']][r['Plan']].append(r['Name'])

before looping through the potential_outputs

result = [{"Name": name, "Output": po["Output"]} 
          for po in potential_outputs 
          for name in rec[po['Class']][po['Plan']]]
result
# [{'Name': 'John Smith', 'Output': 'o11'}, {'Name': 'Jane Doe', 'Output': 'o22'}]

It is possible to do this with better-than-linear lookup performance by creating a third dictionary to be used as an index. The keys of the index dictionary should be sets of key/value pairs that are valid identifiers for the desired output record. It looks like this works if you generate the index with frozensets containing tuples, something like:


def make_index(data):
    result_index = {}
    for row in data:
        work_row = row.copy()
        work_row.pop("Output")
        while work_row:
            key = frozenset((key, value) for key, value in work_row.items())
            result_index.setdefault(key, []).append(row)
            work_row.pop(next(iter(work_row))) 
    return result_index


def search(index, row_key):
    row_key = row_key.copy()
    row_key.pop("Name", None)
    key = frozenset((key, value) for key, value in row_key.items())
    return index[key]

And this works if "potential_outputs" has all the keys except "Name":

In [36]: index = make_index(potential_outputs)

In [37]: search(index, record[0])
Out[37]: [{'Class': 'c1', 'Plan': 'p1', 'Output': 'o11'}]

If you want matches on fewer keys than everything except the name, the same index works, but the "search" code has to be changed. We would then have to know exactly which matches are desired in order to query accordingly. If "Class" and "Plan" match different records, should both be returned? Or none? You will likely find something in itertools to generate all the keys you want to search for, given a row in record.

Meanwhile, this code can already return multiple results when several rows match:


In [39]: search(index, {"Plan": "p2"})                                                                                                               
Out[39]: 
[{'Class': 'c1', 'Plan': 'p2', 'Output': 'o12'},
 {'Class': 'c2', 'Plan': 'p2', 'Output': 'o22'}]
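The itertools suggestion above could be sketched roughly as follows. This is one possible reading, not the answer's own code: it assumes each potential output is indexed once under the frozenset of all its criteria (every key except "Output"), and that a record should be tried against every subset of its own key/value pairs, largest first. The names build_index and find_outputs, the ignore parameter, and the "any_p2" rule are hypothetical.

```python
from itertools import combinations


def build_index(potential_outputs, output_key="Output"):
    """Map frozenset of (key, value) criteria -> list of output values."""
    index = {}
    for po in potential_outputs:
        criteria = frozenset((k, v) for k, v in po.items() if k != output_key)
        index.setdefault(criteria, []).append(po[output_key])
    return index


def find_outputs(index, row, ignore=("Name",)):
    """Try every subset of the row's key/value pairs, largest subsets first."""
    items = [(k, v) for k, v in row.items() if k not in ignore]
    found = []
    for size in range(len(items), 0, -1):
        for subset in combinations(items, size):
            found.extend(index.get(frozenset(subset), []))
    return found


potential_outputs = [
    {"Class": "c1", "Plan": "p1", "Output": "o11"},
    {"Plan": "p2", "Output": "any_p2"},  # hypothetical rule that only constrains Plan
]
index = build_index(potential_outputs)

john = find_outputs(index, {"Name": "John Smith", "Class": "c1", "Plan": "p1"})
jane = find_outputs(index, {"Name": "Jane Doe", "Class": "c2", "Plan": "p2"})
print(john, jane)
```

The number of subsets grows exponentially with the number of keys in a record, so this only suits records with a handful of fields. The values must also be hashable to be stored in a frozenset.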

Here is a really simple way to handle it using pandas:

import pandas as pd

# Read your list of dicts into DataFrames.
dfr = pd.DataFrame(record)
dfp = pd.DataFrame(potential_outputs)

# Merge the two DataFrames on `Class` and `Plan` and return the result.
result = pd.merge(dfr, 
                  dfp, 
                  how='inner', 
                  on=['Class', 'Plan']).drop(['Class', 'Plan'], axis=1)

Output1:

As a DataFrame:

         Name Output
0  John Smith    o11
1    Jane Doe    o22

Output2:

As a list:

result2 = result.to_dict(orient="records")

[{'Name': 'John Smith', 'Output': 'o11'}, {'Name': 'Jane Doe', 'Output': 'o22'}]

If you make potential_outputs a dict of the form {("c1","p1"): "o11"}, you could do this:

result = []
for a in record:
    if (a["Class"], a["Plan"]) in potential_outputs:
        result.append({"Name": a["Name"], "Output": potential_outputs[(a["Class"], a["Plan"])]})

That may not be the best way, but it is a pure Python way.

If you're interested in a one-liner:

result = [{"Name": r["Name"], "Output": o["Output"]} for r in record for o in potential_outputs if r["Class"] == o["Class"] and r["Plan"] == o["Plan"]]
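The one-liner hard-codes "Class" and "Plan", while the question says the keys in potential_outputs are user-defined. A minimal sketch of a more general test, assuming "Output" is the only key that is not a matching criterion: dictionary item views support set-style comparison, so a rule matches when its criteria are a subset of the record's items. The helper name matches and the Plan-only rule are hypothetical.

```python
record = [
    {"Name": "John Smith", "Class": "c1", "Plan": "p1"},
    {"Name": "Jane Doe", "Class": "c2", "Plan": "p2"},
]
potential_outputs = [
    {"Class": "c1", "Plan": "p1", "Output": "o11"},
    {"Plan": "p2", "Output": "o_p2"},  # hypothetical rule that only constrains Plan
]


def matches(rule, row, output_key="Output"):
    """True if every criterion in the rule appears in the row with the same value."""
    criteria = {k: v for k, v in rule.items() if k != output_key}
    # dict item views behave like sets, so <= is a subset test.
    return criteria.items() <= row.items()


result = [
    {"Name": r["Name"], "Output": o["Output"]}
    for r in record
    for o in potential_outputs
    if matches(o, r)
]
print(result)
```

Like the one-liner, this still compares every record with every rule, so it stays O(N * M).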

You could restructure your potential_outputs as a dictionary:

potential_output_dict = {
    f"{o['Class']}_{o['Plan']}": o['Output'] for o in potential_outputs
}

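output = []
for r in record:
    plan_key = f"{r['Class']}_{r['Plan']}"
    # Look up the output value; None means there is no matching entry.
    matched_output = potential_output_dict.get(plan_key)
    if matched_output is None:
        continue

    output.append({
        "Name": r['Name'],
        "Output": matched_output,
    })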

print(output)

This way you are using get(), which is a bit nicer than iterating over the list of dictionaries multiple times.

(code not tested)
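One thing to watch with keys built by joining strings with "_": two different (Class, Plan) pairs can produce the same key if the values themselves contain an underscore. A small sketch of the problem, using made-up values and hypothetical helper names, and showing that a tuple key does not have it:

```python
def string_key(class_, plan):
    # Same key format as the f-string approach above.
    return f"{class_}_{plan}"


def tuple_key(class_, plan):
    # A tuple keeps the two fields separate, so no ambiguity is possible.
    return (class_, plan)


a = ("c_1", "p1")   # hypothetical values containing an underscore
b = ("c", "1_p1")

string_collides = string_key(*a) == string_key(*b)
tuple_collides = tuple_key(*a) == tuple_key(*b)
print(string_collides, tuple_collides)
```

With values like "c1" and "p1" from the question the string key is safe; the tuple form just removes the need to reason about separators.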
