
Iterate through a list of dictionaries and identify similar values in Python

Suppose I have a list of dictionaries as below:

[{'name': 'User_ORDERS1234', 'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 'users': ['User_2']}, {'name': 'User_ORDERS1235', 'expressions': [{'exp': '"table"."ORDERS"."STATUS"  = \'Shipped\''}], 'users': ['User_1']}, {'name': 'User_ORDERS1236', 'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 'users': ['User_3']}]

While iterating over this list, I want to check whether the value of the expressions key (a sub-list) in one dictionary is the same as the expressions value in another dictionary. In the example above, the dictionary whose users key contains User_2 has the same expressions value as the one containing User_3. In that case I want to delete the entire User_3 dictionary and append User_3 to the User_2 list (so that 'users': ['User_2', 'User_3']).

Expected output:

[{'name': 'User_ORDERS1234', 'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 'users': ['User_2','User_3']}, {'name': 'User_ORDERS1235', 'expressions': [{'exp': '"table"."ORDERS"."STATUS"  = \'Shipped\''}], 'users': ['User_1']}]

You can track the expressions you have already seen in a dictionary. scanned_exp is a dictionary with each unique expression as the key and, as the value, the first order in the list (d, the list from the question) in which that expression was found. While iterating, we check whether the current expression has already been scanned, i.e., whether it is in scanned_exp. If it has, we extend the users list of the first order that had that expression with the users of the current order, and we drop the current order. Otherwise we record the expression and keep the order. The kept orders are collected in a new list instead of being removed from d in place, because removing items from a list while looping over it makes the loop skip elements.

scanned_exp = {}  # expression -> first order seen with that expression
merged = []       # orders that are kept
for order in d:
    exp = order["expressions"][0]["exp"]
    if exp in scanned_exp:
        # Duplicate expression: add its users to the first occurrence
        scanned_exp[exp]["users"].extend(order["users"])
    else:
        scanned_exp[exp] = order
        merged.append(order)
d = merged

Your output then becomes:

[
    {
        'name': 'User_ORDERS1234', 
        'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 
        'users': ['User_2', 'User_3']
    }, 
    {
        'name': 'User_ORDERS1235', 
        'expressions': [{'exp': '"table"."ORDERS"."STATUS"  = \'Shipped\''}], 
        'users': ['User_1']
    }
]
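
A caveat worth checking with any version that removes items from the list it is looping over: Python's list iterator advances by index, so a removal shifts the following item into the slot just visited and that item is skipped. The sketch below uses made-up, simplified orders (a flat exp string instead of the nested expressions list) and hypothetical function names to show the difference when the same expression occurs three times.

```python
def merge_in_place(orders):
    """Merge duplicates while removing from the list being iterated."""
    first_seen = {}
    for order in orders:
        if order["exp"] in first_seen:
            first_seen[order["exp"]]["users"].extend(order["users"])
            orders.remove(order)  # shifts later items left; the next one is skipped
        else:
            first_seen[order["exp"]] = order
    return orders


def merge_into_new_list(orders):
    """Merge duplicates by collecting the kept orders in a new list."""
    first_seen = {}
    merged = []
    for order in orders:
        if order["exp"] in first_seen:
            first_seen[order["exp"]]["users"].extend(order["users"])
        else:
            first_seen[order["exp"]] = order
            merged.append(order)
    return merged


def sample():
    # Hypothetical data: three orders sharing one expression
    return [
        {"exp": "A", "users": ["User_1"]},
        {"exp": "A", "users": ["User_2"]},
        {"exp": "A", "users": ["User_3"]},
    ]


in_place = merge_in_place(sample())
new_list = merge_into_new_list(sample())
print(len(in_place))   # 2: the third order was never visited
print(len(new_list))   # 1: all three were merged
```

With only one duplicate at the end of the list, as in the sample data from the question, both variants give the same result, which is why the problem is easy to miss.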

Edit

Okay, let's make this dynamic. First, a list cannot be used as a dictionary key (it is an unhashable type), so using the whole expressions list as the key would break the original implementation above. A collection that can be used as a key is a tuple (unless the tuple itself contains unhashable types such as list or dict). What we can do is build a tuple containing all of the strings stored under the exp key of each expression.

So, you can replace this:

exp = order["expressions"][0]["exp"]

with this:

exp = tuple(e["exp"] for e in order["expressions"])
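
Putting the pieces together, a minimal sketch of the dynamic version might look like this. The function name merge_by_expressions and the demo data are assumptions for illustration, and the sketch collects the kept orders in a new list rather than removing from the list during iteration. The try block shows why a raw list cannot serve as the key.

```python
def merge_by_expressions(orders):
    """Merge orders whose 'exp' strings match; hypothetical helper name."""
    first_seen = {}   # tuple of exp strings -> first order with that tuple
    merged = []
    for order in orders:
        key = tuple(e["exp"] for e in order["expressions"])
        if key in first_seen:
            first_seen[key]["users"].extend(order["users"])
        else:
            first_seen[key] = order
            merged.append(order)
    return merged


# A list is unhashable, so it cannot be a dictionary key; a tuple of strings can.
try:
    {["a", "b"]: 0}
except TypeError as err:
    print("list as key:", err)

# Made-up orders; the first and third share the same two expressions
demo = [
    {"name": "n1", "expressions": [{"exp": "x"}, {"exp": "y"}], "users": ["User_2"]},
    {"name": "n2", "expressions": [{"exp": "z"}], "users": ["User_1"]},
    {"name": "n3", "expressions": [{"exp": "x"}, {"exp": "y"}], "users": ["User_3"]},
]
result = merge_by_expressions(demo)
print([(o["name"], o["users"]) for o in result])
# [('n1', ['User_2', 'User_3']), ('n2', ['User_1'])]
```
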
orders = [{
    'name': 'User_ORDERS1234',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
    'users': ['User_2']
},{
    'name': 'User_ORDERS1235',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS"  = \'Shipped\''}],
    'users': ['User_1']
},{
    'name': 'User_ORDERS1236',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
    'users': ['User_3']
}]

for i, order in enumerate(orders):                # loop through the orders
    exp1 = order['expressions']                   # 'expressions' list of this order

    for next_order in orders[i+1:]:               # loop through the orders that follow
        exp2 = next_order['expressions']          # 'expressions' list of a later order

        if exp1 == exp2:                          # if the two lists are equal:
            order['users'] += next_order['users'] # add the later order's 'users' to this order
            next_order['users'] = []              # empty the later order's 'users'

orders = [o for o in orders if o['users']]        # keep only the orders that still have 'users'

print(orders)

Output

[{
    'name': 'User_ORDERS1234',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
    'users': ['User_2', 'User_3']
},{
    'name': 'User_ORDERS1235',
    'expressions': [{'exp': '"table"."ORDERS"."STATUS"  = \'Shipped\''}],
    'users': ['User_1']
}]
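
Note that exp1 == exp2 compares the two expressions lists element by element, so the same expressions listed in a different order do not count as equal. If order should not matter, one option (an assumption about the desired behaviour, not something the question specifies) is to compare a sorted tuple of the exp strings instead. The helper name expression_key is hypothetical.

```python
def expression_key(order):
    """Order-insensitive key built from the order's 'exp' strings."""
    return tuple(sorted(e["exp"] for e in order["expressions"]))


# Hypothetical orders holding the same two expressions in different order
a = {"expressions": [{"exp": "x"}, {"exp": "y"}], "users": ["User_1"]}
b = {"expressions": [{"exp": "y"}, {"exp": "x"}], "users": ["User_2"]}

print(a["expressions"] == b["expressions"])      # False: list order differs
print(expression_key(a) == expression_key(b))    # True
```
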
dictt = [{'name': 'User_ORDERS1234', 'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 'users': ['User_2']}, {'name': 'User_ORDERS1235', 'expressions': [{'exp': '"table"."ORDERS"."STATUS"  = \'Shipped\''}], 'users': ['User_1']}, {'name': 'User_ORDERS1236', 'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 'users': ['User_3']}]



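def sorting_it(d):
    duplicates = []  # indices of the dictionaries merged into d
    for n,c in enumerate([x['expressions'] for x in dictt]):
        if c == d['expressions'] and dictt[n] != d and d['users']:
            d['users'] = d['users'] + dictt[n]['users']
            duplicates.append(n)
    for n in reversed(duplicates):  # delete from the end so the remaining indices stay valid
        del dictt[n]
f = list(map(sorting_it,dictt))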

print(dictt)

>>> [{'name': 'User_ORDERS1234', 'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 'users': ['User_2', 'User_3']}, {'name': 'User_ORDERS1235', 'expressions': [{'exp': '"table"."ORDERS"."STATUS"  = \'Shipped\''}], 'users': ['User_1']}]

Explanation:

f = list(map(sorting_it,dictt))

Using the map function, every dictionary in dictt is passed through the function sorting_it one at a time as the variable d, so the first one is:

{'name': 'User_ORDERS1234', 'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 'users': ['User_2']}

Inside the function I loop over the values of the 'expressions' key; [x['expressions'] for x in dictt] builds that list.

If the 'expressions' value of d is equal to an 'expressions' value in [x['expressions'] for x in dictt], I take its index n, use it to look up the corresponding dictionary in dictt, and add that dictionary's 'users' values to those of d.

I then do del dictt[n], since the users of that dictionary have already been added to another one. In this case the dictionary for 'User_3' is deleted because its user was added to the dictionary for 'User_2'.

The dictt[n] != d part of the condition makes sure a dictionary is not compared with itself, and d['users'] skips any d whose users list is empty.
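
One subtlety: != compares dictionaries by content, not identity. Two separate entries that happen to hold identical keys and values compare equal, so dictt[n] != d would treat them as the same dictionary and skip the merge. If exact duplicates can occur in the data (an assumption; the sample has none), an identity check with is not distinguishes them. A small illustration with made-up entries:

```python
# Two distinct dictionaries with identical content (hypothetical data)
first = {"name": "User_ORDERS1", "expressions": [{"exp": "x"}], "users": ["User_1"]}
second = {"name": "User_ORDERS1", "expressions": [{"exp": "x"}], "users": ["User_1"]}

print(first != second)       # False: the contents are equal
print(first is not second)   # True: they are different objects
print(first is not first)    # False: the same object
```
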

def function_1(values):
  merged = []
  for value in values:
    for kept in merged:
      if kept['expressions'] == value['expressions']:
        # Same expressions as an order already kept: merge the users and drop this one
        kept['users'] = kept['users'] + value['users']
        break
    else:
      merged.append(value)  # no match found, so keep this order
  return merged

# Example input

list_values = [{'name': 'User_ORDERS1234', 'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 'users': ['User_2']}, {'name': 'User_ORDERS1235', 'expressions': [{'exp': '"table"."ORDERS"."STATUS"  = \'Shipped\''}], 'users': ['User_1']}, {'name': 'User_ORDERS1236', 'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}], 'users': ['User_3']}]

# Call the function

function_1(list_values)

[{'expressions': [{'exp': '"table"."ORDERS"."STATUS" IN (\'Canceled\',\'Pending\')'}],
  'name': 'User_ORDERS1234',
  'users': ['User_2', 'User_3']},
 {'expressions': [{'exp': '"table"."ORDERS"."STATUS"  = \'Shipped\''}],
  'name': 'User_ORDERS1235',
  'users': ['User_1']}]
