How to remove duplicates from output with Python?

I'm facing an issue here:

Take the following example:

for item in g_data:
    Header = item.find_all("div", {"class": "InnprodInfos"})
    print(Header[0].contents[0].text.strip())

Output:

DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour
DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour

As you can see above, the output is printed twice. Only the second, duplicated set of lines should be removed.

The result should look like this:

DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour

Can anyone tell me how to remove the duplicates? Any feedback is appreciated.

You can use a list, or a set if the order doesn't matter:

Using a list:

result = []
for item in g_data:
    header = item.find_all("div", {"class": "InnprodInfos"})
    title = header[0].contents[0].text.strip()
    # Append only titles that have not been seen yet (keeps first-seen order)
    if title not in result:
        result.append(title)

print('\n'.join(result))
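
If the titles are already collected in a list, `dict.fromkeys` is another way to drop duplicates while keeping the first-seen order, since dict keys are unique and keep insertion order in Python 3.7+. A minimal sketch; the `titles` list here is made-up sample data standing in for the scraped text:

```python
# Sample data standing in for the scraped titles (an assumption for illustration).
titles = [
    "Panmunjeom Day Tour",
    "Seoul Helicopter Tour",
    "Panmunjeom Day Tour",
    "Seoul City Full Day Tour",
    "Seoul Helicopter Tour",
]

# dict keys are unique and preserve insertion order (Python 3.7+),
# so only the first occurrence of each title survives.
unique_titles = list(dict.fromkeys(titles))

print("\n".join(unique_titles))
```

Unlike the `if title not in result` check on a list, this avoids rescanning the list for every item, which may matter for long inputs.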

Using a set:

result = set()
for item in g_data:
    header = item.find_all("div", {"class": "InnprodInfos"})
    result.add(header[0].contents[0].text.strip())

print('\n'.join(result))
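
Keep in mind that a set has no guaranteed iteration order, so the lines may come out in a different order than they were scraped. If a repeatable order is enough (rather than the original order), sorting the set is one option. A small sketch with made-up sample titles:

```python
# Made-up sample titles (an assumption for illustration).
titles = ["Seoul Helicopter Tour", "Panmunjeom Day Tour", "Seoul Helicopter Tour"]

unique_titles = set(titles)      # duplicates collapse, original order is lost
ordered = sorted(unique_titles)  # alphabetical order, the same on every run

print("\n".join(ordered))
```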

You should store the output in a set so that each value is kept only once, however many times it appears. After that, print out the elements of the set.

g_data = ["foo", "bar", "foo"]
g_unique = set()
for item in g_data:
    g_unique.add(item)  # adding an element that is already in the set has no effect

# g_unique is now {'foo', 'bar'}; iteration order is not guaranteed
for item in g_unique:
    print(item)

You can use a set to keep track of which items you have already printed. This preserves the original order:

already_printed = set()
for item in g_data:
    header = item.find_all("div", {"class": "InnprodInfos"})
    item = header[0].contents[0].text.strip()
    if item not in already_printed:
        print(item)
        already_printed.add(item)
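
The same "seen set" idea can be wrapped in a small generator so it works on any iterable of strings. This is only a sketch: `unique_in_order` is a hypothetical helper name, and `sample` is made-up data standing in for the scraped titles.

```python
def unique_in_order(items):
    """Yield each item the first time it appears, skipping later repeats."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


# Made-up sample data (an assumption for illustration).
sample = [
    "Panmunjeom Day Tour",
    "Seoul Helicopter Tour",
    "Panmunjeom Day Tour",
    "Seoul Helicopter Tour",
    "Korean Folk Village Day Tour",
]

deduped = list(unique_in_order(sample))
for title in deduped:
    print(title)
```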

There is a simple way to do this using a comprehension :)

# Build the set directly; the original added the undefined name `text` instead of the loop variable
s = {text for text in Header[0].contents[0].text.strip().split('\n')}
print('\n'.join(s))
