Remove duplicates from two linked arrays in Python

I have two arrays (Python lists), one storing names and the other storing the URL associated with each name. The lists may contain duplicate names and URLs. Normally, to remove duplicates I would convert each list into a set. However, I only want to delete an element (from both lists) if its URL duplicates one that has already appeared.

For example, if these were the initial arrays:

name = ['Bob', 'Mary', 'John', 'John', 'Bob']
url = ['url1', 'url2', 'url3', 'url4', 'url1']

I would want this output:

name = ['Bob', 'Mary', 'John', 'John']
url = ['url1', 'url2', 'url3', 'url4']

In [83]: name = ['Bob', 'Mary', 'John', 'John', 'Bob']

In [84]: url = ['url1', 'url2', 'url3', 'url4', 'url1']

In [85]: urls = set()

In [86]: answer = []

In [87]: for n,u in zip(name, url):
   ....:     if u in urls: continue
   ....:     answer.append((n,u))
   ....:     urls.add(u)
   ....:     

In [88]: answer
Out[88]: [('Bob', 'url1'), ('Mary', 'url2'), ('John', 'url3'), ('John', 'url4')]

In [89]: name, url = zip(*answer)

In [90]: name
Out[90]: ('Bob', 'Mary', 'John', 'John')

In [91]: url
Out[91]: ('url1', 'url2', 'url3', 'url4')
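
The session above can be wrapped in a small reusable function. This is only a minimal sketch: the function name dedupe_by_url and its parameter names are hypothetical, and it assumes both lists have the same length. It keeps the first occurrence of each URL, preserves the original order, and returns lists rather than tuples.

```python
def dedupe_by_url(names, urls):
    """Keep only the first entry for each URL, preserving input order."""
    seen = set()                      # URLs encountered so far
    kept_names, kept_urls = [], []
    for n, u in zip(names, urls):
        if u in seen:                 # duplicate URL: skip the whole entry
            continue
        seen.add(u)
        kept_names.append(n)
        kept_urls.append(u)
    return kept_names, kept_urls


name = ['Bob', 'Mary', 'John', 'John', 'Bob']
url = ['url1', 'url2', 'url3', 'url4', 'url1']
name, url = dedupe_by_url(name, url)
print(name)  # ['Bob', 'Mary', 'John', 'John']
print(url)   # ['url1', 'url2', 'url3', 'url4']
```

Building the two result lists directly also avoids the unpacking error that name, url = zip(*answer) raises when the input is empty.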

Zip the lists together to create (name, url) pairs, then use a set to eliminate duplicate pairs. Note that this removes an entry only when both the name and the URL match an earlier entry.

>>> name = ['Bob', 'Mary', 'John', 'John', 'Bob']
>>> url = ['url1', 'url2', 'url3', 'url4', 'url1']
>>> list(zip(name, url))
[('Bob', 'url1'), ('Mary', 'url2'), ('John', 'url3'), ('John', 'url4'), ('Bob', 'url1')]
>>> x = set(zip(name, url))
>>> x
{('Mary', 'url2'), ('Bob', 'url1'), ('John', 'url4'), ('John', 'url3')}

To get the items back into individual lists, use list comprehensions. The only downside is that you lose the original ordering of the items, because sets are unordered.

>>> a, b = [item[0] for item in x], [item[1] for item in x]
>>> a, b
(['Mary', 'Bob', 'John', 'John'], ['url2', 'url1', 'url4', 'url3'])
>>> 
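
If the ordering matters, one possible variation (not part of the answer above) is to deduplicate the pairs with dict.fromkeys instead of set. This sketch assumes Python 3.7 or later, where dictionaries keep insertion order; the variable name pairs is just for illustration.

```python
name = ['Bob', 'Mary', 'John', 'John', 'Bob']
url = ['url1', 'url2', 'url3', 'url4', 'url1']

# Dict keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of each (name, url) pair survives in place.
pairs = list(dict.fromkeys(zip(name, url)))

a = [n for n, _ in pairs]
b = [u for _, u in pairs]
print(a)  # ['Bob', 'Mary', 'John', 'John']
print(b)  # ['url1', 'url2', 'url3', 'url4']
```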

You can do this:

name, url = map(list, zip(*list(set(zip(name, url)))))

This zips name and url into pairs, removes duplicate pairs using set, and converts the result back into a list. It then unzips the pairs with zip(*...) and maps list over the two resulting tuples to turn them back into lists.

Note: This will not preserve order, but the elements will still be aligned (for example, 'John' will still be paired with 'url3').
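
One more caveat about deduplicating (name, url) pairs: two entries that share a URL but have different names both survive, which may not be what the question asks for. The sketch below uses made-up data (the 'Robert' entry is hypothetical) to show the difference, and keys a dictionary by URL as one possible way to keep only the first name seen for each URL.

```python
name = ['Bob', 'Robert', 'Mary']   # hypothetical data: two names share 'url1'
url = ['url1', 'url1', 'url2']

# Pair-based dedup: ('Bob', 'url1') and ('Robert', 'url1') are different pairs,
# so both are kept. sorted() is used only to make the output deterministic.
by_pair = sorted(set(zip(name, url)))
print(by_pair)  # [('Bob', 'url1'), ('Mary', 'url2'), ('Robert', 'url1')]

# URL-based dedup: setdefault stores a name only the first time a URL is seen.
first_by_url = {}
for n, u in zip(name, url):
    first_by_url.setdefault(u, n)

urls_out = list(first_by_url)            # keys, in first-seen order
names_out = list(first_by_url.values())
print(names_out)  # ['Bob', 'Mary']
print(urls_out)   # ['url1', 'url2']
```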
