Remove duplicates from a list in Python

The code below fetches each page with a GET request and appends the resulting rows to the list RESULT:

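import pandas as pd

RESULT = []
for i in url:  # url is a list of page URLs
    df = pd.read_html(i, header=0)[0]  # first table on the page
    rows = df.values.tolist()  # as_matrix() was removed in pandas 1.0
    for item in rows:
        RESULT.append(item)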

I use the code below to exclude duplicate entries:

def unique_items(RESULT):
    found = set()  # first-column values seen so far
    for item in RESULT:
        if item[0] not in found:
            yield item
            found.add(item[0])

NOT_DUBLICATE = list(unique_items(RESULT))
print(NOT_DUBLICATE)

This does not seem optimal to me, because I first have to collect all the rows in a list before I can exclude the duplicates.

How can I detect duplicates before adding a row to the list RESULT?

For example, these are the rows I write to the list RESULT:

[[55323602, 'system'], [55323603, 'system']]
[[55323602, 'system'], [55323603, 'system']]

Instead of using a separate function to exclude duplicate entries, append an item only if it does not already exist in the list RESULT. Then you don't need unique_items().

You can check for duplicates before adding a row to the list RESULT like this:

for i in url:
    df = pd.read_html(i, header=0)[0]
    rows = df.values.tolist()  # as_matrix() was removed in pandas 1.0
    for item in rows:
        if item not in RESULT:  # skip rows that are already stored
            RESULT.append(item)
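
Note that a membership test on a list scans the whole list on every check, so it gets slower as RESULT grows, and it compares entire rows rather than only the first column as unique_items() does. A minimal sketch of a variant that tracks the first-column values in a set while appending; the helper name append_unique and the sample pages data are made up for illustration and stand in for the rows returned by pd.read_html:

```python
def append_unique(rows, result, seen):
    """Append rows to result, skipping rows whose first element is already in seen."""
    for row in rows:
        key = row[0]  # same key that unique_items() uses in the question
        if key not in seen:
            seen.add(key)
            result.append(row)


# Hypothetical sample data: two "pages" that contain the same rows.
pages = [
    [[55323602, 'system'], [55323603, 'system']],
    [[55323602, 'system'], [55323603, 'system']],
]

RESULT = []
seen = set()
for page in pages:
    append_unique(page, RESULT, seen)

print(RESULT)  # [[55323602, 'system'], [55323603, 'system']]
```

The set lookup takes roughly constant time per row, and the rows stay as lists in the order they were first seen.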

Just use a set instead of a list.

result = set()
for i in url:
    df = pd.read_html(i, header=0)[0]
    df_list = df.values.tolist()  # as_matrix() was removed in pandas 1.0
    for item in df_list:
        result.add(tuple(item))  # lists are unhashable, so store tuples

The code above excludes any duplicate rows. The only difference from your case is that the elements of result are tuples instead of lists.

At the end, you can convert the set back to a list:

result = list(result)
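
A set does not keep insertion order, so the resulting list may come out in a different order than the rows were read. If order matters, one option (a sketch, not part of the original answer) is to use a dict, whose keys keep insertion order in Python 3.7+, and to convert the tuples back to lists at the end. The sample rows below are made up:

```python
# Hypothetical sample rows standing in for the rows collected from the pages.
rows = [
    [55323603, 'system'],
    [55323602, 'system'],
    [55323603, 'system'],  # duplicate of the first row
]

# dict keys are unique and keep first-insertion order (Python 3.7+).
unique = dict.fromkeys(tuple(row) for row in rows)

# Convert the tuples back to lists to match the original RESULT format.
result = [list(row) for row in unique]

print(result)  # [[55323603, 'system'], [55323602, 'system']]
```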
