Remove duplicates from a list in Python

The code below fetches each page with a GET request and appends the resulting rows to the list RESULT:

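import pandas as pd

RESULT = []
for i in url:  # url is an iterable of page URLs
    df = pd.read_html(i, header=0)[0]
    df = df.values.tolist()  # as_matrix() was removed in pandas 1.0
    for item in df:
        RESULT.append(item)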

I use the code below to exclude duplicate entries:

def unique_items(RESULT):
    # Yield each row whose first column has not been seen before.
    found = set()
    for item in RESULT:
        if item[0] not in found:
            yield item
            found.add(item[0])

NOT_DUBLICATE = list(unique_items(RESULT))
print(NOT_DUBLICATE)

This does not seem optimal to me, because all the rows have to be collected into a list before the duplicates can be excluded.

How can I detect duplicates before appending a row to the list RESULT?

For example, these are the rows I write to the list RESULT:

[[55323602, 'system'],
 [55323603, 'system']]
[[55323602, 'system'],
 [55323603, 'system']]

Instead of using a separate method to exclude duplicate entries, append an item to RESULT only if it is not already in the list. Then you don't need unique_items() at all.

You can check for duplicates before appending a row to RESULT like this:

for i in url:
    df = pd.read_html(i, header=0)[0]
    df = df.values.tolist()  # as_matrix() was removed in pandas 1.0
    for item in df:
        if item not in RESULT:  # skip rows that were already collected
            RESULT.append(item)
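
Note that item not in RESULT scans the whole list on every check, and it compares entire rows rather than only the first column, which is what unique_items() in the question uses as the key. A minimal sketch of the same idea with a set of already-seen keys might look like the following. The helper name append_unique and the sample pages data are hypothetical; the nested lists stand in for the rows that pd.read_html would return.

```python
def append_unique(rows, result, seen):
    """Append rows to result, skipping rows whose first column was already seen."""
    for row in rows:
        key = row[0]
        if key not in seen:  # set lookup, average O(1)
            seen.add(key)
            result.append(row)


# Hypothetical sample data: two "pages" that return the same rows.
pages = [
    [[55323602, 'system'], [55323603, 'system']],
    [[55323602, 'system'], [55323603, 'system']],
]

RESULT = []
seen = set()
for page_rows in pages:
    append_unique(page_rows, RESULT, seen)

print(RESULT)  # [[55323602, 'system'], [55323603, 'system']]
```

The rows stay lists and keep the order in which they were first read, so no second pass over RESULT is needed.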

Just use a set instead of a list.

result = set()
for i in url:
    df = pd.read_html(i, header=0)[0]
    df_list = df.values.tolist()  # as_matrix() was removed in pandas 1.0
    for item in df_list:
        result.add(tuple(item))  # lists are unhashable, so store tuples

The code above excludes any duplicate rows. The only difference from your case is that the elements of result will be tuples instead of lists.

At the end, you can convert the set back to a list with:

result = list(result)
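
One caveat: a set does not preserve insertion order, so the list built from it may not be in the order the rows were read. If order matters, a dict can be used in the same way, since its keys are unique and keep insertion order in Python 3.7+. The sketch below assumes that; the sample rows are hypothetical stand-ins for the scraped data.

```python
# Hypothetical sample rows standing in for the scraped data.
rows = [
    [55323603, 'system'],
    [55323602, 'system'],
    [55323603, 'system'],
]

# dict.fromkeys keeps only the first occurrence of each key, in insertion order.
# Rows are converted to tuples because lists are not hashable.
unique_rows = [list(t) for t in dict.fromkeys(tuple(r) for r in rows)]

print(unique_rows)  # [[55323603, 'system'], [55323602, 'system']]
```

Converting each tuple back with list() also restores the original row type, if the rest of the code expects lists.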
