
Removing duplicates from a nested list

I am trying to build a new list called new_colors that contains only [['orange', 'green'], ['purple', 'red']], dropping from my original list colors every sublist that repeats a color already seen in an earlier sublist. For example, 'orange' appears in two different sublists.

colors = [
    ['orange', 'green'],
    ['orange', 'yellow'],
    ['purple', 'red'],
    ['brown', 'red']]

This is what I came up with, but it is not working:

new_colors = []

for i in colors:
  if i not in new_colors:
    new_colors.append(i)

print(new_colors)
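
For context, here is a small sketch of why this attempt returns the list unchanged. The check `i not in new_colors` compares whole sublists, and no two sublists in colors are equal as a whole, so every one of them gets appended.

```python
colors = [
    ['orange', 'green'],
    ['orange', 'yellow'],
    ['purple', 'red'],
    ['brown', 'red']]

new_colors = []
for i in colors:
    # list membership uses ==, so this asks "is this exact pair already present?"
    if i not in new_colors:
        new_colors.append(i)

# No pair equals another pair as a whole, so nothing is filtered out.
print(new_colors == colors)  # True
```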

You can maintain a set seen of the elements encountered so far, and use in together with any to test whether a sublist contains any element that is already in seen:

colors = [['orange', 'green'], ['orange', 'yellow'], ['purple', 'red'], ['brown', 'red']] 

seen = set()
output = []
for sublst in colors:
    if not any(x in seen for x in sublst):
        output.append(sublst)
        seen.update(sublst)

print(output) # [['orange', 'green'], ['purple', 'red']]
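
As a possible variation on the same idea, the membership test can be written with set.isdisjoint, which accepts any iterable. The function name drop_overlapping is only an illustrative choice, not something from the answer above.

```python
def drop_overlapping(groups):
    """Keep each sublist only if it shares no element with the sublists kept so far."""
    seen = set()
    output = []
    for sublst in groups:
        # isdisjoint is True when no element of sublst is already in seen
        if seen.isdisjoint(sublst):
            output.append(sublst)
            seen.update(sublst)
    return output

colors = [['orange', 'green'], ['orange', 'yellow'], ['purple', 'red'], ['brown', 'red']]
print(drop_overlapping(colors))  # [['orange', 'green'], ['purple', 'red']]
```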

As your example is ambiguous, here is a solution that tracks duplicates independently for each "column" (position within the sublist). This means that if you had an extra ['red', 'black'], it would be kept, because 'red' has not yet appeared in the first column.

new_colors = []
# one set of already-picked items per column (position within a sublist)
seen = [set() for _ in range(max(map(len, colors), default=0))]

for l in colors:
    # check if any item was already picked
    if any(e in s for e,s in zip(l,seen)):
        continue
    new_colors.append(l)
    # update picked items
    for e,s in zip(l,seen):
        s.add(e)

print(new_colors)

Output:

[['orange', 'green'],
 ['purple', 'red']]
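
To make the difference concrete, here is a sketch that runs the per-column check and a single shared seen set side by side on the data with the extra ['red', 'black'] row. The function names filter_per_column and filter_global are made up for this illustration.

```python
colors = [
    ['orange', 'green'],
    ['orange', 'yellow'],
    ['purple', 'red'],
    ['brown', 'red'],
    ['red', 'black'],  # extra row, added for illustration
]

def filter_per_column(rows):
    # one set per column, so a value only clashes within its own position
    seen = [set() for _ in range(max(map(len, rows), default=0))]
    kept = []
    for row in rows:
        if any(e in s for e, s in zip(row, seen)):
            continue
        kept.append(row)
        for e, s in zip(row, seen):
            s.add(e)
    return kept

def filter_global(rows):
    # one shared set, so a value clashes wherever it appeared before
    seen = set()
    kept = []
    for row in rows:
        if seen.isdisjoint(row):
            kept.append(row)
            seen.update(row)
    return kept

per_column = filter_per_column(colors)
global_result = filter_global(colors)
print(per_column)     # [['orange', 'green'], ['purple', 'red'], ['red', 'black']]
print(global_result)  # [['orange', 'green'], ['purple', 'red']]
```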

More or less based on @j1-lee's answer, you could wrap the whole thing in a generator. Iterate over the color groups and build a set from each one. If the current color set intersects the set of all previously seen colors, do nothing. Otherwise, yield the current group and update the set of previously seen colors:

colors = [
    ['orange', 'green'],
    ['orange', 'yellow'],
    ['purple', 'red'],
    ['brown', 'red']
]

def to_filtered(groups):
    seen = set()
    for current_set, current_group in zip(map(set, groups), groups):
        if not seen & current_set:
            yield current_group
            seen |= current_set

print(list(to_filtered(colors)))

Output:

[['orange', 'green'], ['purple', 'red']]
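
One thing a generator allows is lazy consumption. The sketch below assumes a hypothetical unbounded source, endless_groups, and takes only the first two results with itertools.islice. The generator body is restated in a slightly simpler form so that the example is self-contained.

```python
from itertools import islice

def to_filtered(groups):
    seen = set()
    for current_group in groups:
        current_set = set(current_group)
        if not seen & current_set:
            yield current_group
            seen |= current_set

def endless_groups():
    # hypothetical unbounded source, used only to show lazy consumption
    base = [['orange', 'green'], ['orange', 'yellow'], ['purple', 'red'], ['brown', 'red']]
    while True:
        yield from base

# islice stops after two results, so the endless source is never exhausted
first_two = list(islice(to_filtered(endless_groups()), 2))
print(first_two)  # [['orange', 'green'], ['purple', 'red']]
```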

You can try this:

colors = [
    ['orange', 'green'],
    ['orange', 'yellow'],
    ['purple', 'red'],
    ['brown', 'red']]
    
new_colors = []
isUnique = True

for pair in colors:
    unique_colors = [new_color for new_pair in new_colors for new_color in new_pair]
    for color in pair:
        if color in unique_colors:
            isUnique = False
            break
    if isUnique:
        new_colors.append(pair)
    isUnique = True

print(new_colors)

Here, I first declare an additional variable isUnique as a flag. The list unique_colors, built inside the outer for loop, is a flattened version of new_colors, so it is refreshed each time a new unique pair of colors has been added to new_colors.

Then, inside the inner for loop, each color of the current pair is checked. If any color of the current pair matches a color in unique_colors, isUnique is set to False and the inner loop breaks.

After that, isUnique is checked, and if it is True the current pair is added to new_colors. Finally, isUnique is reset to True for the next iteration.

Output: [['orange', 'green'], ['purple', 'red']]
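
The same logic could also be sketched without the flag by using Python's for ... else, where the else branch runs only when the inner loop was not interrupted by break. The set used_colors is an assumed name, and it is updated incrementally instead of re-flattening new_colors on every iteration.

```python
colors = [
    ['orange', 'green'],
    ['orange', 'yellow'],
    ['purple', 'red'],
    ['brown', 'red']]

new_colors = []
used_colors = set()  # maintained incrementally instead of re-flattening new_colors

for pair in colors:
    for color in pair:
        if color in used_colors:
            break  # a color was already used, so skip this pair
    else:
        # runs only when the inner loop finished without break
        new_colors.append(pair)
        used_colors.update(pair)

print(new_colors)  # [['orange', 'green'], ['purple', 'red']]
```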

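You have to nest the for loop so that it iterates over the individual elements instead of the sublists. Note that this yields a flat list of the unique colors (['orange', 'green', 'yellow', 'purple', 'red', 'brown']) rather than a list of pairs: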

colors = [
  ['orange', 'green'],
  ['orange', 'yellow'],
  ['purple', 'red'],
  ['brown', 'red']
]

new_colors = []

for i in colors:
  for j in i:
    if j not in new_colors:
      new_colors.append(j)

print(new_colors)
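
If a flat list of unique colors is what you want, a shorter sketch is to flatten with itertools.chain.from_iterable and deduplicate with dict.fromkeys, which keeps first-seen order on Python 3.7+. The name flat_unique is just illustrative.

```python
from itertools import chain

colors = [
    ['orange', 'green'],
    ['orange', 'yellow'],
    ['purple', 'red'],
    ['brown', 'red']
]

# dict keys are unique and preserve insertion order, so this deduplicates in order
flat_unique = list(dict.fromkeys(chain.from_iterable(colors)))

print(flat_unique)  # ['orange', 'green', 'yellow', 'purple', 'red', 'brown']
```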
