
Append unique values in json


I have the following list:

[
  {
    "title": "title1",
    "url": "https://myurl/entry/1",
    "author": "john",
    "count": 5
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/2",
    "author": "marry",
    "count": 19
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/1",
    "author": "john",
    "count": 45
  },
  {
    "title": "title2",
    "url": "https://myurl/entry/5",
    "author": "jane",
    "count": 34
  }
]

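I am trying to append this list to a JSON file, but I only want to append unique values. As you can see, my first and third items have exactly the same title, url and author. The only difference is the count. I only want to append one of these two items, regardless of their count: if the title, url and author are the same, append the first one and ignore the others. The final JSON file should be sorted by count in descending order.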

I tried the code below, but it still appends non-unique values.

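import json

# originallist is the list shown above
newlist = []
# Keep an item only if an identical dict is not already in newlist
[newlist.append(x) for x in originallist if x not in newlist]
# Sort by count, highest first
newlist = sorted(newlist, key=lambda k: k.get('count', 0), reverse=True)

ofile = "final.json"

with open(ofile, 'w') as outfile:
    json.dump(newlist, outfile, indent=2)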
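To see why the membership check above keeps both "john" entries, it helps to look at how Python compares dictionaries. The small illustration below uses two of the entries from the question. It shows that "in" on a list relies on ==, and == compares dict contents, including count, so a single differing value is enough to make two entries unequal.

```python
# Two entries from the question that differ only in "count".
first = {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 5}
third = {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 45}

# Dicts compare by keys and values, so one differing value makes them unequal.
print(first == third)        # False
# `in` on a list uses ==, which is exactly what `x not in newlist` checks.
print(third in [first])      # False

# A separate dict object with identical contents *is* found by `in`.
copy_of_first = dict(first)
print(copy_of_first is first)    # False (different objects)
print(copy_of_first in [first])  # True (equal contents)
```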

My final JSON file should look like the following: sorted by count, with only unique values inserted.

[
  {
    "title": "title2",
    "url": "https://myurl/entry/5",
    "author": "jane",
    "count": 34
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/2",
    "author": "marry",
    "count": 19
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/1",
    "author": "john",
    "count": 5
  }
]

Any idea what I am missing here?

You can use a temporary dictionary with tuple keys made up of the fields you want to check for uniqueness, for example:

originallist = [
  {
    "title": "title1",
    "url": "https://myurl/entry/1",
    "author": "john",
    "count": 5
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/2",
    "author": "marry",
    "count": 19
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/1",
    "author": "john",
    "count": 45
  },
  {
    "title": "title2",
    "url": "https://myurl/entry/5",
    "author": "jane",
    "count": 34
  }
]

unique_dict = {(d["title"], d["url"], d["author"]): d for d in originallist}
newlist = list(unique_dict.values())

The variable newlist should now contain 3 unique dictionaries.
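One detail worth checking is which duplicate survives. In a dict comprehension, a later entry with the same key replaces the earlier value while the key keeps its original position, so the last duplicate wins. The sketch below compares that with dict.setdefault, which only stores a value the first time a key is seen. The setdefault version is a variant, not part of the answer above, and key is a hypothetical helper name.

```python
originallist = [
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 5},
    {"title": "title1", "url": "https://myurl/entry/2", "author": "marry", "count": 19},
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 45},
    {"title": "title2", "url": "https://myurl/entry/5", "author": "jane", "count": 34},
]

def key(d):
    # Hypothetical helper: the fields that decide whether two entries are duplicates.
    return (d["title"], d["url"], d["author"])

# Dict comprehension: a later duplicate overwrites the earlier value (last one wins).
last_wins = list({key(d): d for d in originallist}.values())

# setdefault stores a value only the first time a key is seen (first one wins).
first_seen = {}
for d in originallist:
    first_seen.setdefault(key(d), d)
first_wins = list(first_seen.values())

print([d["count"] for d in last_wins])   # [45, 19, 34]
print([d["count"] for d in first_wins])  # [5, 19, 34]
```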

Any idea what I am missing here? This:

[newlist.append(x) for x in originallist if x not in newlist]

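does treat dicts as different if any value differs, so dicts with different "count" values are considered different. To me, your task looks like a job for the unique_everseen recipe from the itertools recipes. Just do: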

from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Then:

original_list = [
  {
    "title": "title1",
    "url": "https://myurl/entry/1",
    "author": "john",
    "count": 5
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/2",
    "author": "marry",
    "count": 19
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/1",
    "author": "john",
    "count": 45
  },
  {
    "title": "title2",
    "url": "https://myurl/entry/5",
    "author": "jane",
    "count": 34
  }
]
unique_list = list(unique_everseen(original_list, lambda x:(x['title'],x['url'],x['author'])))
print(unique_list)

Output:

[{'title': 'title1', 'url': 'https://myurl/entry/1', 'author': 'john', 'count': 5}, {'title': 'title1', 'url': 'https://myurl/entry/2', 'author': 'marry', 'count': 19}, {'title': 'title2', 'url': 'https://myurl/entry/5', 'author': 'jane', 'count': 34}]

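Note that I use lambda x:(x['title'],x['url'],x['author']) to say that two elements should be considered the same if their title, url and author values are the same. Also note that this solution assumes every element in the list has title, url and author.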
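If some entries might lack one of these fields, the x['author'] lookups in the key function raise KeyError. One possible workaround, sketched here under the assumption that a missing field should simply count as None, is to build the key with dict.get. Be aware that all entries missing the same field then share None for it. unique_by is a hypothetical, trimmed-down version of the keyed branch of unique_everseen, and the last two sample entries are made up to show the missing-field case.

```python
def unique_by(iterable, key):
    """Yield items whose key has not been seen before (first occurrence wins)."""
    seen = set()
    for item in iterable:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item

def dedupe_key(d):
    # dict.get returns None for a missing field instead of raising KeyError.
    return (d.get("title"), d.get("url"), d.get("author"))

items = [
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 5},
    {"title": "title1", "url": "https://myurl/entry/1", "count": 7},  # hypothetical: no "author"
    {"title": "title1", "url": "https://myurl/entry/1", "count": 9},  # hypothetical: no "author"
]
print([d["count"] for d in unique_by(items, dedupe_key)])  # [5, 7]
```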

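Same as @Selcuk's answer, but slightly modified because of the behavior you asked for:

Append the first one if the title, url and author are the same, and ignore the others.

@Selcuk's solution updates the unique_dict entries instead of keeping the original ones. In your case, this means the count ends up as the later value from originallist (the title1 entry with https://myurl/entry/1 gets count = 45). Iterating over originallist in reverse turns this update into "keep the first one and ignore the rest". With the list reversed, the final newlist has count = 5 instead of count = 45.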

Edit: following @Selcuk's feedback, use reversed(originallist) instead of originallist[::-1].

originallist = [
  {
    "title": "title1",
    "url": "https://myurl/entry/1",
    "author": "john",
    "count": 5
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/2",
    "author": "marry",
    "count": 19
  },
  {
    "title": "title1",
    "url": "https://myurl/entry/1",
    "author": "john",
    "count": 45
  },
  {
    "title": "title2",
    "url": "https://myurl/entry/5",
    "author": "jane",
    "count": 34
  }
]

unique_dict = {(d["title"], d["url"], d["author"]): d for d in reversed(originallist)}
newlist = list(unique_dict.values())
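Combining this with the sorting and JSON step from the question gives the end-to-end sketch below. It prints the JSON instead of writing final.json so it can run as-is. Replacing the print with the question's json.dump(newlist, outfile, indent=2) inside the with open(...) block should write the file instead.

```python
import json

originallist = [
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 5},
    {"title": "title1", "url": "https://myurl/entry/2", "author": "marry", "count": 19},
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 45},
    {"title": "title2", "url": "https://myurl/entry/5", "author": "jane", "count": 34},
]

# Iterate in reverse so that, for duplicate keys, the first original entry
# is assigned last and therefore is the one that remains.
unique_dict = {(d["title"], d["url"], d["author"]): d for d in reversed(originallist)}

# Sort by count, highest first, as the question asks.
newlist = sorted(unique_dict.values(), key=lambda k: k.get("count", 0), reverse=True)

print(json.dumps(newlist, indent=2))
```

Because the result is sorted by count afterwards, the changed insertion order caused by iterating in reverse does not affect the final file.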

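Well, I think the problem is that the in operator you are relying on compares whole dictionaries by value, so two entries that differ only in count are not considered equal. If you can rely on the url alone to establish uniqueness, you could keep track of the URLs you have already seen:

seen_urls = set()
newlist = []
for x in originallist:
    if x["url"] not in seen_urls:  # the first item with this url wins
        seen_urls.add(x["url"])
        newlist.append(x)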
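Whether the url alone is enough depends on the data. The sketch below uses two made-up entries that share a url but have different authors to show how a url-only key and the (title, url, author) key can disagree. dedupe is a hypothetical helper that keeps the first item for each key.

```python
# Hypothetical entries: same url, different authors.
items = [
    {"title": "title1", "url": "https://myurl/entry/1", "author": "john", "count": 5},
    {"title": "title1", "url": "https://myurl/entry/1", "author": "marry", "count": 8},
]

def dedupe(items, key):
    """Keep the first item for each key value, preserving order."""
    seen = set()
    out = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            out.append(item)
    return out

by_url = dedupe(items, lambda d: d["url"])
by_fields = dedupe(items, lambda d: (d["title"], d["url"], d["author"]))
print(len(by_url), len(by_fields))  # 1 2
```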


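Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.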

© 2020-2024 STACKOOM.COM