Python Dictionary: Return keys based on matching values in list

I have a dictionary in which each key is a unique name and each value is a list of non-unique names. For domain context, the keys are Tableau workbooks and each value is the list of tables that workbook connects to.

What I am trying to do is return, for each key, every other key that has at least three matching values. Doing so will let me find workbooks whose data overlaps because they use the same tables.

Currently, I am able to find all keys that match a specific value by doing the following:

keys = [key for key, value in intersect_dict.items() if 'VOLUME_DIMENSION' in value]
keys

values = [value for key, value in intersect_dict.items() if 'VOLUME_DIMENSION' in value]
values

The output of keys is:

['(SAN) STORAGE GROUP INVENTORY AND CAPACITY',
 '(SAN) STORAGE GROUP INVENTORY AND CAPACITY V2',
 'SAN INVENTORY AND CAPACITY']

And the output of values is:

[['VOLUME_DIMENSION',
  'EXTENDED_DATA',
  'VOLUME_HISTORY_CAPACITY_FACT',
  'HOST_DIMENSION',
  'STORAGE_DIMENSION',
  'DATE_DIMENSION'],
 ['STORAGE_DIMENSION',
  'DATE_DIMENSION',
  'VOLUME_DIMENSION',
  'HOST_DIMENSION',
  'VOLUME_HISTORY_CAPACITY_FACT',
  'EXTENDED_DATA'],
 ['VOLUME_HISTORY_CAPACITY_FACT',
  'HOST_DIMENSION',
  'EXTENDED_DATA',
  'DATE_DIMENSION',
  'STORAGE_DIMENSION',
  'VOLUME_DIMENSION']]

Is there a way to do essentially the same thing, except that instead of the condition

if 'VOLUME_DIMENSION' in value

I have a condition along the lines of "if the values in value match three times or more"?

Please let me know if more info is needed.

Edit 1: Below is the requested excerpt of the input dictionary:

{'(SAN) STORAGE GROUP INVENTORY AND CAPACITY': ['VOLUME_DIMENSION',
  'EXTENDED_DATA',
  'VOLUME_HISTORY_CAPACITY_FACT',
  'HOST_DIMENSION',
  'STORAGE_DIMENSION',
  'DATE_DIMENSION'],
 '(SAN) STORAGE GROUP INVENTORY AND CAPACITY V2': ['STORAGE_DIMENSION',
  'DATE_DIMENSION',
  'VOLUME_DIMENSION',
  'HOST_DIMENSION',
  'VOLUME_HISTORY_CAPACITY_FACT',
  'EXTENDED_DATA'],

The requested output would be something like:

{'(SAN) STORAGE GROUP INVENTORY AND CAPACITY': ['workbook1', 'workbook7', 'workbook8']}

The "workbooks" shown as values would be the workbooks that have three or more values in common with that key.

Edit 2: Sorry for the poor explanation of the data format. I will try to clarify it here.

d = { 
    'item1': ['A', 'B', 'C'], 
    'item2': ['A', 'B', 'C', 'D'], 
    'item3': ['A', 'C', 'D'], 
    'item4': ['B', 'C', 'D', 'E'], 
    'item5': ['A', 'B', 'C'], 
    'item6': ['A', 'B', 'C', 'E'], 
    }

Results = {
    'item1': ['item2', 'item5', 'item6'],
    'item2': ['item1', 'item5', 'item6'],
    }

In the above example, d is my overall dataset in dictionary form and Results is what I would like the output to be. It would let me identify which items are sharing data or, in this case, sharing letters.

I would use set:

d = {
    'item1': ['A', 'B', 'C'],
    'item2': ['A', 'B', 'C', 'D'],
    'item3': ['A', 'C', 'D'],
    'item4': ['B', 'C', 'D', 'E'],
}

search_items = {'A', 'B', 'C'}
keys = [key for key, value in d.items() if len(search_items & set(value)) >= 3]
print(keys)

values = [value for key, value in d.items() if len(search_items & set(value)) >= 3]
print(values)

Output:

['item1', 'item2']
[['A', 'B', 'C'], ['A', 'B', 'C', 'D']]
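The two comprehensions above apply the same filter twice. As a minimal sketch, assuming Python 3.7+ (where dictionaries keep insertion order), the matching keys and values can be collected in a single pass; the name matches is hypothetical and not part of the original answer:

```python
d = {
    'item1': ['A', 'B', 'C'],
    'item2': ['A', 'B', 'C', 'D'],
    'item3': ['A', 'C', 'D'],
    'item4': ['B', 'C', 'D', 'E'],
}

search_items = {'A', 'B', 'C'}

# Keep only the entries that share at least three items with search_items.
# "matches" is an illustrative name, not from the original answer.
matches = {
    key: value
    for key, value in d.items()
    if len(search_items & set(value)) >= 3
}

print(list(matches))           # the matching keys
print(list(matches.values()))  # the matching value lists
```

Here list(matches) gives the same keys as the keys list above, and list(matches.values()) gives the same lists as values.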

To get all keys that share three or more items, you can do:

common_items = [
    (search_key, key, set(search_values) & set(values))
    for search_key, search_values in d.items()
    for key, values in d.items()
    if search_key != key and len(set(search_values) & set(values)) >= 3
]
print(common_items)

Output (shown one tuple per line for readability; the order of elements inside each set may vary):

[('item1', 'item2', {'C', 'B', 'A'}),
 ('item2', 'item1', {'C', 'B', 'A'}),
 ('item2', 'item3', {'C', 'D', 'A'}),
 ('item2', 'item4', {'C', 'D', 'B'}),
 ('item3', 'item2', {'C', 'D', 'A'}),
 ('item4', 'item2', {'C', 'D', 'B'})]
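To get from these pairs to the dictionary shape asked for in the question (each key mapped to the other keys it overlaps with), one possible sketch is below. It uses the six-item example from the question, converts each list to a set once, and visits each unordered pair a single time with itertools.combinations. The function name overlapping_keys and the min_shared parameter are hypothetical names chosen for illustration:

```python
from collections import defaultdict
from itertools import combinations

d = {
    'item1': ['A', 'B', 'C'],
    'item2': ['A', 'B', 'C', 'D'],
    'item3': ['A', 'C', 'D'],
    'item4': ['B', 'C', 'D', 'E'],
    'item5': ['A', 'B', 'C'],
    'item6': ['A', 'B', 'C', 'E'],
}


def overlapping_keys(data, min_shared=3):
    """Map each key to the other keys sharing at least min_shared values.

    Hypothetical helper for illustration; keys with no overlaps are omitted.
    """
    # Build each set once instead of on every comparison.
    sets = {key: set(values) for key, values in data.items()}
    result = defaultdict(list)
    # combinations() yields each unordered pair once, so record both directions.
    for (key_a, set_a), (key_b, set_b) in combinations(sets.items(), 2):
        if len(set_a & set_b) >= min_shared:
            result[key_a].append(key_b)
            result[key_b].append(key_a)
    return dict(result)


results = overlapping_keys(d)
for key, others in results.items():
    print(key, others)
```

With this data, item1 maps to ['item2', 'item5', 'item6'], as in the question's expected Results. item2 also picks up item3 and item4, because each of them shares exactly three letters with it.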
