I have a pretty basic (but not quite working) function to dedupe a list of dictionaries by their 'key' value. It keeps track of the keys it has already seen by adding each one to a list:
def dedupe(rs):
    delist = []
    for r in rs:
        if r['key'] not in delist:
            delist.append(r['key'])
        else:
            rs.remove(r)
    return rs
It is used in the script below on two lists of dictionaries:
from pprint import pprint

records = [
    {'key': 'Item 1',
     'name': 'Item 1',
     'positions': [['00:00:00', '00:05:54'],
                   ['00:05:55', '00:07:54'],
                   ['00:16:47', '00:20:04']]},
    {'key': 'Item 1',
     'name': 'Item 1',
     'positions': [['00:05:55', '00:07:54'],
                   ['00:00:00', '00:05:54'],
                   ['00:16:47', '00:20:04']]},
    {'key': 'Item 1',
     'name': 'Item 1',
     'positions': [['00:16:47', '00:20:04'],
                   ['00:00:00', '00:05:54'],
                   ['00:05:55', '00:07:54']]},
    {'key': 'Item 2',
     'name': 'Item 2',
     'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
    {'key': 'Item 2',
     'name': 'Item 2',
     'positions': [['00:11:24', '00:16:46'], ['00:07:55', '00:11:23']]},
    {'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]
records2 = [
    {'key': 'Item 1',
     'name': 'Item 1',
     'positions': [['00:00:00', '00:05:54'],
                   ['00:05:55', '00:07:54'],
                   ['00:16:47', '00:20:04']]},
    {'key': 'Item 1',
     'name': 'Item 1',
     'positions': [['00:05:55', '00:07:54'],
                   ['00:00:00', '00:05:54'],
                   ['00:16:47', '00:20:04']]},
    {'key': 'Item 2',
     'name': 'Item 2',
     'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
    {'key': 'Item 1',
     'name': 'Item 1',
     'positions': [['00:16:47', '00:20:04'],
                   ['00:00:00', '00:05:54'],
                   ['00:05:55', '00:07:54']]},
    {'key': 'Item 2',
     'name': 'Item 2',
     'positions': [['00:11:24', '00:16:46'], ['00:07:55', '00:11:23']]},
    {'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]
def dedupe(rs):
    delist = []
    for r in rs:
        if r['key'] not in delist:
            delist.append(r['key'])
        else:
            rs.remove(r)
    return rs

if __name__ == '__main__':
    res = dedupe(records)
    res2 = dedupe(records2)
    pprint(res)
    pprint(res2)
For either records or records2, I would expect to get:
[
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
 {'key': 'Item 3',
  'name': 'Item 3',
  'positions': [['00:20:05', '00:25:56']]}
]
But instead I get the following (first for records, then for records2):
[
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:16:47', '00:20:04'],
                ['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
 {'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]
[
 {'key': 'Item 1',
  'name': 'Item 1',
  'positions': [['00:00:00', '00:05:54'],
                ['00:05:55', '00:07:54'],
                ['00:16:47', '00:20:04']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:07:55', '00:11:23'], ['00:11:24', '00:16:46']]},
 {'key': 'Item 2',
  'name': 'Item 2',
  'positions': [['00:11:24', '00:16:46'], ['00:07:55', '00:11:23']]},
 {'key': 'Item 3', 'name': 'Item 3', 'positions': [['00:20:05', '00:25:56']]}
]
I keep staring at and tweaking this, but it's not clear to me why it fails to delete the third instance when the duplicates are consecutive (records), or why, when the three instances are split up (records2), it handles the key with three copies but fails on the key with two.
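I wouldn't remove elements from a list while iterating over it. Removing an element shifts every later element one position to the left, but the for loop's internal index still advances, so the element immediately after each removed one is never examined. That is why the third 'Item 1' survives in records and the second 'Item 2' survives in records2.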
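A minimal sketch that makes the skipping visible. It replays the question's remove-while-iterating loop on a simplified stand-in for records: the hypothetical demo list uses (key, tag) tuples instead of dictionaries, and the tags keep each element distinct so that list.remove() drops the current element, just as it does with the original dictionaries. The hypothetical trace_dedupe() also logs every element the loop actually visits.

```python
def trace_dedupe(items):
    """Hypothetical helper: the question's loop, plus a log of visited elements."""
    seen = []      # keys already encountered
    visited = []   # every element the for loop actually hands us
    for item in items:
        visited.append(item)
        key = item[0]
        if key not in seen:
            seen.append(key)
        else:
            items.remove(item)  # shifts later elements left under the running loop
    return visited, items


# Simplified stand-in for `records`: (key, tag) pairs instead of dictionaries.
demo = [('Item 1', 'a'), ('Item 1', 'b'), ('Item 1', 'c'),
        ('Item 2', 'a'), ('Item 2', 'b'), ('Item 3', 'a')]

visited, remaining = trace_dedupe(demo)
print(visited)    # ('Item 1', 'c') and ('Item 3', 'a') are never visited
print(remaining)  # [('Item 1', 'a'), ('Item 1', 'c'), ('Item 2', 'a'), ('Item 3', 'a')]
```

The surviving ('Item 1', 'c') mirrors the extra 'Item 1' in the first output above; 'Item 3' is kept only because the loop never looks at it.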
Instead do this:
def dedupe(rs):
    delist = set()   # keys already kept; a set makes the membership test O(1)
    new_rs = []
    for r in rs:
        if r['key'] not in delist:
            delist.add(r['key'])
            new_rs.append(r)  # keep only the first record seen for each key
    return new_rs
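If other code holds a reference to the original list and expects it to be deduplicated in place (as the question's version attempted), one option is to build the kept records separately and then copy them back with slice assignment. This is a sketch under that assumption; dedupe_in_place is a hypothetical name, not part of the answer above.

```python
def dedupe_in_place(rs):
    """Hypothetical variant: keeps the first record per 'key' and updates rs itself."""
    seen = set()
    kept = []
    for r in rs:              # rs is only read here, so nothing shifts mid-loop
        if r['key'] not in seen:
            seen.add(r['key'])
            kept.append(r)
    rs[:] = kept              # slice assignment replaces the contents of the same list object
    return rs


data = [{'key': 'A', 'v': 1}, {'key': 'A', 'v': 2}, {'key': 'B', 'v': 3}]
alias = data                  # a second reference to the same list
dedupe_in_place(data)
print(alias)                  # [{'key': 'A', 'v': 1}, {'key': 'B', 'v': 3}]
```

Because the list is mutated only once, after the loop has finished, the iteration sees every element.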