
Append JSON elements to a list, then remove duplicates efficiently in Python

I have a JSON file that looks like this, for example:

[{"fu": "thejimjams", "su": 232104580}, {"fu": "thejimjams", "su": 216575430}, {"fu": "thejimjams", "su": 184695850}]

I need to collect all the "su" values from a bunch of JSON files into a list. Each file (about 200 of them) will have its own list, and then I'm going to combine the lists and remove duplicates. Is there an advisable way to go about this that saves system resources and time?

I'm thinking of making a list, looping through a JSON file, putting each "su" on the list, moving on to the next file and appending to the list, then scanning through it to remove duplicates.

For removing the duplicates, I'm thinking of following the answer to this question: "Combining two lists and removing duplicates, without removing duplicates in original list", unless that's not efficient.

Basically, I'm open to recommendations on a good way to implement this.

Thanks,

Do you care about order? If not, you can add the numbers to a set(), which automatically removes duplicates. For example, if you have 200 "su" lists:

lists = [
    [...],  # su's for file 1
    [...],  # su's for file 2
    # etc.
]

Then you can combine them into one big set with:

set(su for sus in lists for su in sus)
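If order does matter, one possible variant is sketched below. It assumes Python 3.7 or later, where dict keys keep insertion order, and the sample values in lists are made up for illustration (reusing the numbers from the question).

```python
from itertools import chain

# Hypothetical sample data: one list of "su" values per file.
lists = [
    [232104580, 216575430],
    [216575430, 184695850],
    [232104580],
]

# Unordered: a set drops duplicates but makes no promise about order.
unique_unordered = set(chain.from_iterable(lists))

# Ordered: dict keys are unique and keep first-seen order (Python 3.7+).
unique_ordered = list(dict.fromkeys(chain.from_iterable(lists)))

print(unique_ordered)  # [232104580, 216575430, 184695850]
```

Both versions make a single pass over the data; the dict-based one just remembers the order in which each value first appeared.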

A very straightforward way would be:

json_list = [{"fu": "thejimjams", "su": 232104580}, {"fu": "thejimjams", "su": 216575430}, {"fu": "thejimjams", "su": 184695850}]

new_list = []
for item in json_list:
    if item not in new_list:
        new_list.append(item)
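Note that the loop above compares whole dictionaries and rescans new_list on every iteration, which gets slow as the list grows. A rough sketch of the same idea applied only to the "su" values, using a helper set for the membership test, might look like this (the third record is a duplicate added purely for illustration):

```python
json_list = [
    {"fu": "thejimjams", "su": 232104580},
    {"fu": "thejimjams", "su": 216575430},
    {"fu": "thejimjams", "su": 232104580},  # duplicate added for illustration
]

seen = set()    # values already collected; membership checks are O(1) on average
unique_su = []  # result, in first-seen order

for item in json_list:
    su = item["su"]
    if su not in seen:
        seen.add(su)
        unique_su.append(su)

print(unique_su)  # [232104580, 216575430]
```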

Use a Python set, which is designed to hold a collection of unique elements. Duplicates are dropped as you add elements.

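import json

output = set()
for filename in filenames:  # filenames: a list of paths to the JSON files
    # the with block closes each file as soon as it has been parsed
    with open(filename, 'r') as f:
        data = json.load(f)
    for row in data:
        output.add(row.get('su'))  # note: adds None if a row has no "su" key

# convert back to a list
output = list(output)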
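The same approach could be wrapped in a small function, sketched here under the assumption that every file holds a JSON array of objects as in the question. The name collect_unique_su is made up for illustration, and rows without an "su" key are skipped rather than recorded as None.

```python
import json


def collect_unique_su(paths):
    """Return the set of distinct "su" values found across the given JSON files."""
    unique = set()
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            rows = json.load(f)
        # Skip rows without an "su" key instead of adding None to the set.
        unique.update(row["su"] for row in rows if "su" in row)
    return unique
```

It could then be called with any iterable of file paths, for example the result of glob.glob("*.json"), and the returned set passed to list() or sorted() if a list is needed. Only one file's contents are held in memory at a time, plus the set of unique values.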


 