Append unique values in JSON
I have the following list:
[
{
"title": "title1",
"url": "https://myurl/entry/1",
"author": "john",
"count": 5
},
{
"title": "title1",
"url": "https://myurl/entry/2",
"author": "marry",
"count": 19
},
{
"title": "title1",
"url": "https://myurl/entry/1",
"author": "john",
"count": 45
},
{
"title": "title2",
"url": "https://myurl/entry/5",
"author": "jane",
"count": 34
}
]
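I am trying to append this list to a JSON file, but I only want to append unique values. As you can see, my first and third items have exactly the same title, url, and author. The only difference is the count. I want to append only one of these two items, regardless of their count: if title, url, and author are the same, append the first and ignore the others. The final JSON file will be sorted by count in descending order.
I tried the code below, but it still appends non-unique values.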
import json

newlist = []
# originallist is the list shown above
[newlist.append(x) for x in originallist if x not in newlist]
newlist = sorted(newlist, key=lambda k: k.get('count', 0), reverse=True)
ofile = "final.json"
with open(ofile, 'w') as outfile:
    json.dump(newlist, outfile, indent=2)
My final JSON file should look like the following: sorted by count, with only unique values inserted.
[
{
"title": "title2",
"url": "https://myurl/entry/5",
"author": "jane",
"count": 34
},
{
"title": "title1",
"url": "https://myurl/entry/2",
"author": "marry",
"count": 19
},
{
"title": "title1",
"url": "https://myurl/entry/1",
"author": "john",
"count": 7
}
]
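Any idea what I am missing here?
You can use a temporary dictionary with tuple keys that contain the fields you want to check for uniqueness, for example: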
originallist = [
{
"title": "title1",
"url": "https://myurl/entry/1",
"author": "john",
"count": 5
},
{
"title": "title1",
"url": "https://myurl/entry/2",
"author": "marry",
"count": 19
},
{
"title": "title1",
"url": "https://myurl/entry/1",
"author": "john",
"count": 45
},
{
"title": "title2",
"url": "https://myurl/entry/5",
"author": "jane",
"count": 34
}
]
unique_dict = {(d["title"], d["url"], d["author"]): d for d in originallist}
newlist = list(unique_dict.values())
The variable newlist should now contain 3 unique dictionaries.
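To get all the way to the output described in the question, the deduplicated list still has to be sorted by count and serialized. The following is a minimal sketch under the same field names; `dedupe_and_sort` and `sample` are hypothetical names used only for illustration. Note that a plain dict comprehension keeps the last entry seen for each key.

```python
import json

def dedupe_and_sort(items):
    """Deduplicate on (title, url, author), then sort by count, descending."""
    # Later entries overwrite earlier ones that share the same key.
    unique = {(d["title"], d["url"], d["author"]): d for d in items}
    return sorted(unique.values(), key=lambda d: d.get("count", 0), reverse=True)

sample = [
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 5},
    {"title": "title1", "url": "https://myurl/entry/2", "author": "marry", "count": 19},
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 45},
    {"title": "title2", "url": "https://myurl/entry/5", "author": "jane", "count": 34},
]

result = dedupe_and_sort(sample)
print(json.dumps(result, indent=2))
```

To write a file instead of printing, the same `result` can be passed to `json.dump(result, outfile, indent=2)` inside a `with open(...)` block, as in the question.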
Any idea what I am missing here? This:
[newlist.append(x) for x in originallist if x not in newlist]
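treats dictionaries as different if any of their values differ, so dictionaries with a different "count" are treated as different. To me, your task looks like a job for unique_everseen from the itertools recipes. Just do: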
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Then:
original_list = [
{
"title": "title1",
"url": "https://myurl/entry/1",
"author": "john",
"count": 5
},
{
"title": "title1",
"url": "https://myurl/entry/2",
"author": "marry",
"count": 19
},
{
"title": "title1",
"url": "https://myurl/entry/1",
"author": "john",
"count": 45
},
{
"title": "title2",
"url": "https://myurl/entry/5",
"author": "jane",
"count": 34
}
]
unique_list = list(unique_everseen(original_list, lambda x:(x['title'],x['url'],x['author'])))
print(unique_list)
Output:
[{'title': 'title1', 'url': 'https://myurl/entry/1', 'author': 'john', 'count': 5}, {'title': 'title1', 'url': 'https://myurl/entry/2', 'author': 'marry', 'count': 19}, {'title': 'title2', 'url': 'https://myurl/entry/5', 'author': 'jane', 'count': 34}]
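Note that I use lambda x:(x['title'],x['url'],x['author']), which says that two elements should be considered the same if their title, url, and author values are equal. Note also that this solution assumes every element in the list has title, url, and author.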
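If some entries might lack one of these fields, the key can be built with dict.get so that a missing field does not raise KeyError. This is a minimal sketch, not part of the itertools recipe; `unique_by_fields` and `records` are hypothetical names, and it assumes that treating a missing field the same as an explicit None is acceptable.

```python
def unique_by_fields(items, fields=("title", "url", "author")):
    """Yield the first item seen for each combination of the given fields."""
    seen = set()
    for item in items:
        # dict.get returns None for a missing field instead of raising KeyError.
        key = tuple(item.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            yield item

records = [
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 5},
    {"title": "title1", "url": "https://myurl/entry/1", "count": 8},  # no author
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 45},
    {"title": "title1", "url": "https://myurl/entry/1", "count": 3},  # no author again
]

unique_records = list(unique_by_fields(records))
print(unique_records)
```

Here the two entries without an author collapse into one, and the two "john" entries collapse into another, with the first of each kept.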
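This is the same as @Selcuk's answer, slightly modified because of the behavior you asked for:
If title, url, and author are the same, append the first and ignore the others.
@Selcuk's solution updates the entry in unique_dict instead of keeping the original one. In your case this means that count is updated to the later value in originallist (title1 with https://myurl/entry/1 ends up with count = 45). By simply reversing originallist in the iteration, this update becomes equivalent to keeping the first entry and ignoring the rest. With the list reversed, the final newlist will have count = 5 rather than count = 45.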
Edit: following @Selcuk's feedback, use reversed(originallist) instead of originallist[::-1].
originallist = [
{
"title": "title1",
"url": "https://myurl/entry/1",
"author": "john",
"count": 5
},
{
"title": "title1",
"url": "https://myurl/entry/2",
"author": "marry",
"count": 19
},
{
"title": "title1",
"url": "https://myurl/entry/1",
"author": "john",
"count": 45
},
{
"title": "title2",
"url": "https://myurl/entry/5",
"author": "jane",
"count": 34
}
]
unique_dict = {(d["title"], d["url"], d["author"]): d for d in reversed(originallist)}
newlist = list(unique_dict.values())
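One side effect of iterating in reverse is that newlist no longer follows the order of the input, which does not matter here because the result is sorted by count afterwards. If the input order should be preserved while still keeping the first occurrence, dict.setdefault is one alternative. This is a minimal sketch; `keep_first` and `data` are hypothetical names.

```python
def keep_first(items):
    """Keep the first item for each (title, url, author), in input order."""
    unique = {}
    for d in items:
        key = (d["title"], d["url"], d["author"])
        # setdefault stores d only if the key is not present yet.
        unique.setdefault(key, d)
    return list(unique.values())

data = [
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 5},
    {"title": "title1", "url": "https://myurl/entry/2", "author": "marry", "count": 19},
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 45},
    {"title": "title2", "url": "https://myurl/entry/5", "author": "jane", "count": 34},
]

first_wins = keep_first(data)
by_count = sorted(first_wins, key=lambda d: d.get("count", 0), reverse=True)
print(by_count)
```

After sorting, the counts come out as 34, 19, 5, which matches the "first wins, sorted descending" behavior requested in the question.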
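Well, I think the problem is that the in operator you rely on compares whole dictionaries by value, so two entries that differ only in count are not considered equal. If you can rely on the url alone to establish uniqueness, you could express your list comprehension as follows:
seen_urls = set()
# set.add returns None, so "not seen_urls.add(...)" records the url and evaluates to True
newlist = [x for x in originallist if x["url"] not in seen_urls and not seen_urls.add(x["url"])]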
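A quick way to check how in treats dictionaries inside a list is to compare a dictionary with a copy of itself and with a variant that differs in one value. This is a minimal sketch with illustrative variable names; membership tests on a list use equality (after an identity shortcut), and dictionaries compare equal when all of their key/value pairs match.

```python
a = {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 5}
b = dict(a)              # same content, different object
c = {**a, "count": 45}   # differs only in count

same_value = (a == b)    # dictionaries compare by content
same_object = (a is b)   # identity check: b is a separate object
found_copy = b in [a]    # list membership relies on ==
found_other = c in [a]   # count differs, so c is not equal to a

print(same_value, same_object, found_copy, found_other)
```

This is why the original comprehension keeps both "john" entries: they are unequal because their count values differ.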