I have a list of dictionaries, where each dictionary holds information about an article. Sometimes the same article "title" repeats across dictionaries. I want to remove these duplicate dictionaries so that each article in the list is unique by title, i.e. no title repeats across the dictionaries.
I have:
data = [{'title':'abc','source':'x','url':'abcx.com'},
{'title':'abc','source':'y','url':'abcy.com'},
{'title':'def','source':'g','url':'defg.com'}]
Expected result:
data = [{'title':'abc','source':'x','url':'abcx.com'},
{'title':'def','source':'g','url':'defg.com'}]
A quick way is to keep track of the titles you have already seen:
titles_seen = set() #thank you @Mark Meyer
data = [{'title':'abc','source':'x','url':'abcx.com'},
{'title':'abc','source':'y','url':'abcy.com'},
{'title':'def','source':'g','url':'defg.com'}]
new_data = []
for item in data:
    if item['title'] not in titles_seen:
        new_data.append(item)
        titles_seen.add(item['title'])
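If the same de-duplication is needed in more than one place, the loop above could be wrapped in a small helper. This is only a sketch: the name `unique_by_key` and its `key` parameter are made up for illustration and do not come from the original answer.

```python
def unique_by_key(items, key):
    """Return the items whose items[key] value has not been seen before.

    The first occurrence of each value is kept and the original order
    is preserved. (Hypothetical helper, not part of the original answer.)
    """
    seen = set()
    result = []
    for item in items:
        value = item[key]
        if value not in seen:
            seen.add(value)
            result.append(item)
    return result


data = [{'title': 'abc', 'source': 'x', 'url': 'abcx.com'},
        {'title': 'abc', 'source': 'y', 'url': 'abcy.com'},
        {'title': 'def', 'source': 'g', 'url': 'defg.com'}]

new_data = unique_by_key(data, 'title')
print(new_data)
```

Because membership tests on a set take constant time on average, this stays linear in the length of the list.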
As @Mark Meyer points out in the comments, you can use the title as the key of a dictionary, which eliminates duplicates because keys are hashed and must be unique. Alternatively, one may define an Entry class that hashes and compares by title, and then simply use frozenset (potential overkill, and note that a set does not preserve the original order):
>>> data
[<Entry title=abc source=x url=abcx.com />, <Entry title=abc source=y url=abcy.com />, <Entry title=def source=g url=defg.com />]
>>> frozenset(data)
frozenset({<Entry title=def source=g url=defg.com />, <Entry title=abc source=x url=abcx.com />})
class Entry:
    def __init__(self, title, source, url):
        self.title = title
        self.source = source
        self.url = url
    def __hash__(self):
        return hash(self.title)
    def __eq__(self, other):
        if isinstance(other, Entry):
            return self.title == other.title
        return False
    def __ne__(self, other):
        return (not self.__eq__(other))
    def __repr__(self):
        return "<Entry title={} source={} url={} />".format(self.title, self.source, self.url)
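The first suggestion above, using the title as a dictionary key, is not shown in code. A minimal sketch might look like the following; the name `by_title` is an assumption, and `setdefault` is used so that the first occurrence of each title wins (a plain `by_title[item['title']] = item` would keep the last one instead). It relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.

```python
data = [{'title': 'abc', 'source': 'x', 'url': 'abcx.com'},
        {'title': 'abc', 'source': 'y', 'url': 'abcy.com'},
        {'title': 'def', 'source': 'g', 'url': 'defg.com'}]

by_title = {}  # hypothetical name: maps each title to the first item seen
for item in data:
    # setdefault stores the item only if the title is not already a key
    by_title.setdefault(item['title'], item)

unique = list(by_title.values())
print(unique)
```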
But a better way is simply to check whether the title already exists before adding an item to the list in the first place.
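How that check looks depends on where the articles come from, which the question does not say. As a rough sketch, assuming items arrive one at a time, a hypothetical `add_article` function could guard the list like this (the names `articles`, `incoming` and `add_article` are illustrative only):

```python
articles = []        # the de-duplicated list being built
titles_seen = set()  # titles already present in `articles`


def add_article(article):
    """Append the article unless its title is already present.

    Returns True if the article was added, False if it was skipped.
    """
    if article['title'] in titles_seen:
        return False
    titles_seen.add(article['title'])
    articles.append(article)
    return True


incoming = [{'title': 'abc', 'source': 'x', 'url': 'abcx.com'},
            {'title': 'abc', 'source': 'y', 'url': 'abcy.com'},
            {'title': 'def', 'source': 'g', 'url': 'defg.com'}]

added = [add_article(a) for a in incoming]
print(articles)
```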
Two lines with set:
tmp = set()
result = [tmp.add(i['title']) or i for i in data if i['title'] not in tmp]
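The comprehension above works because `set.add` always returns `None`, which is falsy, so `tmp.add(...) or i` evaluates to `i` after recording the title as a side effect. A short demonstration of just that mechanism:

```python
tmp = set()

print(tmp.add('abc'))            # prints None: set.add has no return value
print(tmp.add('def') or 'kept')  # prints kept: None is falsy, so `or` yields the right operand
print(sorted(tmp))               # both titles were still added to the set
```

Relying on a side effect inside a comprehension is compact, but some readers find the explicit loop easier to follow.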