Python Dictionary: Return keys based on matching values in list

I have a dictionary in which each key is a unique name and each value is a list of non-unique names. For context, the keys are Tableau workbooks and each value is the list of tables that workbook connects to.

What I am trying to do is return, for each key, every other key that has at least three matching values. This would let me find workbooks whose data overlaps because they use the same tables.

Currently, I am able to find all keys that match a specific value by doing the following:

keys = [key for key, value in intersect_dict.items() if 'VOLUME_DIMENSION' in value]
keys

values = [value for key, value in intersect_dict.items() if 'VOLUME_DIMENSION' in value]
values

The output of keys is:

['(SAN) STORAGE GROUP INVENTORY AND CAPACITY',
 '(SAN) STORAGE GROUP INVENTORY AND CAPACITY V2',
 'SAN INVENTORY AND CAPACITY']

And the output of values is:

[['VOLUME_DIMENSION',
  'EXTENDED_DATA',
  'VOLUME_HISTORY_CAPACITY_FACT',
  'HOST_DIMENSION',
  'STORAGE_DIMENSION',
  'DATE_DIMENSION'],
 ['STORAGE_DIMENSION',
  'DATE_DIMENSION',
  'VOLUME_DIMENSION',
  'HOST_DIMENSION',
  'VOLUME_HISTORY_CAPACITY_FACT',
  'EXTENDED_DATA'],
 ['VOLUME_HISTORY_CAPACITY_FACT',
  'HOST_DIMENSION',
  'EXTENDED_DATA',
  'DATE_DIMENSION',
  'STORAGE_DIMENSION',
  'VOLUME_DIMENSION']]

Is there a way to do essentially the same thing, but instead of the condition if 'VOLUME_DIMENSION' in value, test whether three or more of one key's values also appear in value?

Please let me know if more info is needed.

Edit1: Below is the input dictionary excerpt requested:

{'(SAN) STORAGE GROUP INVENTORY AND CAPACITY': ['VOLUME_DIMENSION',
  'EXTENDED_DATA',
  'VOLUME_HISTORY_CAPACITY_FACT',
  'HOST_DIMENSION',
  'STORAGE_DIMENSION',
  'DATE_DIMENSION'],
 '(SAN) STORAGE GROUP INVENTORY AND CAPACITY V2': ['STORAGE_DIMENSION',
  'DATE_DIMENSION',
  'VOLUME_DIMENSION',
  'HOST_DIMENSION',
  'VOLUME_HISTORY_CAPACITY_FACT',
  'EXTENDED_DATA'],

The requested output would be something like:

{'(SAN) STORAGE GROUP INVENTORY AND CAPACITY': ['workbook1', 'workbook7', 'workbook8']}

The "workbooks" shown as values would be the workbooks that have three or more matching values with that key.

Edit2: Sorry for the unclear explanation of the data format. Here is an attempt to clarify it.

d = { 
    'item1': ['A', 'B', 'C'], 
    'item2': ['A', 'B', 'C', 'D'], 
    'item3': ['A', 'C', 'D'], 
    'item4': ['B', 'C', 'D', 'E'], 
    'item5': ['A', 'B', 'C'], 
    'item6': ['A', 'B', 'C', 'E'], 
    }

Results = {
    'item1': ['item2', 'item5', 'item6'],
    'item2': ['item1', 'item5', 'item6'],
    }

In the above example, d is my overall dataset in dictionary form and Results is what I would like the output to look like. It would let me see which items are sharing data, or in this case, sharing letters.

I would use a set:

d = {
    'item1': ['A', 'B', 'C'],
    'item2': ['A', 'B', 'C', 'D'],
    'item3': ['A', 'C', 'D'],
    'item4': ['B', 'C', 'D', 'E'],
}

search_items = {'A', 'B', 'C'}
keys = [key for key, value in d.items() if len(search_items & set(value)) >= 3]
print(keys)

values = [value for key, value in d.items() if len(search_items & set(value)) >= 3]
print(values)

Output:

['item1', 'item2']
[['A', 'B', 'C'], ['A', 'B', 'C', 'D']]

To get all keys that share three or more items, you can do:

common_items = [
    (search_key, key, set(search_values) & set(values))
    for search_key, search_values in d.items()
    for key, values in d.items()
    if search_key != key and len(set(search_values) & set(values)) >= 3
]
print(common_items)

Output (the order of elements inside each set may vary):

[('item1', 'item2', {'C', 'B', 'A'}),
 ('item2', 'item1', {'C', 'B', 'A'}),
 ('item2', 'item3', {'C', 'D', 'A'}),
 ('item2', 'item4', {'C', 'D', 'B'}),
 ('item3', 'item2', {'C', 'D', 'A'}),
 ('item4', 'item2', {'C', 'D', 'B'})]
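The list of tuples above can also be reshaped into the dictionary format the question asks for. The following is a minimal sketch, not the only way to do it: the function name find_overlaps and the threshold parameter are made up for illustration, and it uses the six-item example from Edit2 of the question. It converts each list to a set once and compares every pair of keys only once with itertools.combinations, recording the match in both directions.

```python
from itertools import combinations


def find_overlaps(d, threshold=3):
    """Map each key to the other keys that share at least `threshold` values.

    `find_overlaps` and `threshold` are illustrative names, not part of the
    original question or answer.
    """
    # Convert every value list to a set once, so intersections are cheap.
    sets = {key: set(values) for key, values in d.items()}
    overlaps = {key: [] for key in d}

    # Visit each unordered pair of keys once and record the match both ways.
    for a, b in combinations(sets, 2):
        if len(sets[a] & sets[b]) >= threshold:
            overlaps[a].append(b)
            overlaps[b].append(a)

    # Keep only the keys that overlap with at least one other key.
    return {key: matches for key, matches in overlaps.items() if matches}


d = {
    'item1': ['A', 'B', 'C'],
    'item2': ['A', 'B', 'C', 'D'],
    'item3': ['A', 'C', 'D'],
    'item4': ['B', 'C', 'D', 'E'],
    'item5': ['A', 'B', 'C'],
    'item6': ['A', 'B', 'C', 'E'],
}

results = find_overlaps(d)
for key, matches in results.items():
    print(key, matches)
```

With this data, item1 maps to ['item2', 'item5', 'item6'], as in the question. Note that item2 also picks up item3 and item4, since it shares three letters with each of them, so the full result has more entries than the abbreviated Results shown in the question.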
