
Python: deduplicate a list of dictionaries by the value of a key

I have a pretty basic (but not quite working) function that dedupes a list of dictionaries by the value of 'key', keeping track of the key values already seen in a list.

def dedupe(rs):
    delist = []
    for r in rs:
        if r['key'] not in delist:
            delist.append(r['key'])
        else:
            rs.remove(r)
    return rs

It is used in the script below on two lists of dictionaries:

from pprint import pprint

records = [
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:05:55', '00:07:54'],
                ['00:00:00', '00:05:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:16:47', '00:20:04'],
                ['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:11:24', '00:16:46'], ['00:07:55', '00:11:23']]},
 {'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]

records2 = [
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:05:55', '00:07:54'],
                ['00:00:00', '00:05:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:16:47', '00:20:04'],
                ['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:11:24', '00:16:46'], ['00:07:55', '00:11:23']]},
 {'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]


def dedupe(rs):
    delist = []
    for r in rs:
        if r['key'] not in delist:
            delist.append(r['key'])
        else:
            rs.remove(r)
    return rs

if __name__ == '__main__':
    res = dedupe(records)
    res2 = dedupe(records2)
    pprint(res)
    pprint(res2)

For either records or records2, I would expect to get:

[
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
 {'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]

But instead I get the following, first for records and then for records2:

[
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:16:47', '00:20:04'],
                ['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
 {'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]

[
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:11:24', '00:16:46'], ['00:07:55', '00:11:23']]},
 {'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]

I keep staring at this and tweaking it, but I can't see why it fails to delete the third instance when the duplicates are consecutive (records), or why, when the three Item 1 instances are broken up (records2), it handles Item 1 correctly but fails on Item 2, which has only two instances.
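One way to see what is going on is to instrument the question's dedupe() so it records which elements the loop actually visits. This is a minimal sketch: the stand-in data keeps only the 'key' order of records and records2, and the 'n' field and the make() helper are hypothetical, added only so each dict is distinguishable.

```python
def dedupe_traced(rs):
    """The question's dedupe(), plus a log of which records the loop visits."""
    delist = []
    visited = []
    for r in rs:
        visited.append(r['n'])
        if r['key'] not in delist:
            delist.append(r['key'])
        else:
            rs.remove(r)  # shifts every later element one slot to the left
    return rs, visited


def make(keys):
    """Build stand-in records; 'n' is a hypothetical 1-based position tag."""
    return [{'key': k, 'n': i} for i, k in enumerate(keys, start=1)]


# Same key order as `records` and `records2` in the question
sample1 = make(['Item 1', 'Item 1', 'Item 1', 'Item 2', 'Item 2', 'Item 3'])
sample2 = make(['Item 1', 'Item 1', 'Item 2', 'Item 1', 'Item 2', 'Item 3'])

for sample in (sample1, sample2):
    result, visited = dedupe_traced(sample)
    print('visited:', visited, 'kept:', [r['n'] for r in result])
# visited: [1, 2, 4, 5] kept: [1, 3, 4, 6]
# visited: [1, 2, 4, 6] kept: [1, 3, 5, 6]
```

In the first run record 3 (the third Item 1) is never examined; in the second, records 3 and 5 (both Item 2) are never examined, which matches the two outputs shown above.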

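I wouldn't remove elements from a list while iterating over it. A for loop over a list advances by index, so when rs.remove(r) deletes the current element, every later element shifts one slot to the left and the element that slides into the current slot is skipped. In records, the third Item 1 slides into the slot of the removed second one and is never examined. In records2, each removal of an Item 1 makes the loop skip the Item 2 right after it, so neither Item 2 is ever checked and both survive.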
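If the caller really does need the original list modified in place (the question's version mutates records itself), a common workaround is to iterate over a shallow copy and remove from the original. A minimal sketch, with the hypothetical name dedupe_in_place. Note that list.remove() deletes the first element that compares equal to its argument and rescans the list on every call, so if two records are fully equal the surviving copy can end up at the later position; that is one more reason to prefer building a new list.

```python
def dedupe_in_place(rs):
    """Remove later records whose 'key' was already seen; mutates rs."""
    seen = set()
    for r in list(rs):          # iterate over a snapshot, so removals don't shift it
        if r['key'] in seen:
            rs.remove(r)        # deletes the first element equal to r
        else:
            seen.add(r['key'])
    return rs


data = [{'key': 'Item 1', 'n': 1}, {'key': 'Item 1', 'n': 2},
        {'key': 'Item 1', 'n': 3}, {'key': 'Item 2', 'n': 4}]
dedupe_in_place(data)
print([r['n'] for r in data])   # [1, 4]
```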

Instead, build a new list that keeps only the first record for each key:

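def dedupe(rs):
    seen = set()   # keys already kept; a set gives fast membership tests
    new_rs = []    # build a new list instead of mutating rs mid-iteration
    for r in rs:
        if r['key'] not in seen:
            seen.add(r['key'])
            new_rs.append(r)   # keep only the first record for each key

    return new_rs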
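Another option, assuming Python 3.7+ where dicts preserve insertion order, is to let a dict do the bookkeeping: dict.setdefault() stores a record only the first time its key appears. This is a sketch with a hypothetical helper name, dedupe_first_by_key; the sample reuses the key order of records2, with each positions list trimmed to its first interval for brevity.

```python
def dedupe_first_by_key(rs, key='key'):
    """Return a new list holding the first record for each value of r[key]."""
    first = {}
    for r in rs:
        # setdefault only inserts when r[key] is not already in the dict
        first.setdefault(r[key], r)
    # dicts keep insertion order (Python 3.7+), so first occurrences stay in order
    return list(first.values())


sample = [
    {'key': 'Item 1', 'positions': [['00:00:00', '00:05:54']]},
    {'key': 'Item 1', 'positions': [['00:05:55', '00:07:54']]},
    {'key': 'Item 2', 'positions': [['00:07:55', '00:11:23']]},
    {'key': 'Item 1', 'positions': [['00:16:47', '00:20:04']]},
    {'key': 'Item 2', 'positions': [['00:11:24', '00:16:46']]},
    {'key': 'Item 3', 'positions': [['00:20:05', '00:25:56']]},
]

# Slice assignment replaces the contents of `sample` itself, if in-place is needed
sample[:] = dedupe_first_by_key(sample)
print([(r['key'], r['positions'][0][0]) for r in sample])
# [('Item 1', '00:00:00'), ('Item 2', '00:07:55'), ('Item 3', '00:20:05')]
```

Replacing setdefault with plain assignment (first[r[key]] = r) would instead keep the last record for each key, still in first-occurrence order.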
