
Unique elements of sublists depending on specific value in sublist

I am trying to select unique datasets from a very large, quite inconsistent list. My dataset RawData consists of lists of strings of different lengths. Some items occur many times, for example: ['a','b','x','15/30']. The key for comparing items is always the last string, for example '15/30'.

The goal is to get a list UniqueData that keeps only the first item for each key, so every key occurs only once. (I want to keep the order.)

Dataset:

RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]

My desired solution Dataset:

UniqueData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['i','j','k','l','m','n','o','p','20/60']]

I tried many possible solutions for instance:

for index, elem in enumerate(RawData): and appending to a new list if.....

A plain membership test (for element in list) does not work, because the items are not exactly the same.

Can you help me find a solution to my problem?

Thanks!

The best way to remove duplicates is to track the keys you have already seen in a set. Store the last element of each sublist in a set (here called unique). For each sublist, if its last element is already in the set, do nothing; otherwise add that element to the set and append the sublist to the result list (here called new).

Try this.

new = []
unique = set()  # last elements (keys) seen so far
for lst in RawData:
    if lst[-1] not in unique:
        unique.add(lst[-1])
        new.append(lst)

print(new)
# [['a', 'b', 'x', '15/30'],
#  ['d', 'e', 'f', 'g', 'h', '20/30'],
#  ['w', 'x', 'y', 'z', '10/10'],
#  ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]
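
The same idea can be wrapped in a small reusable generator. This is only a sketch: the name unique_by_key and its key parameter are made up for illustration and are not part of the answer above. It assumes every sublist is non-empty and that the key is hashable.

```python
def unique_by_key(rows, key=lambda row: row[-1]):
    """Yield each row whose key has not been seen before, keeping order.

    unique_by_key and its key parameter are hypothetical names used
    for illustration; by default the key is the last element of a row.
    """
    seen = set()
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row


RawData = [['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'],
           ['w', 'x', 'y', 'z', '10/10'], ['a', 'x', 'c', '15/30'],
           ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'],
           ['x', 'b', 'c', '15/30']]

UniqueData = list(unique_by_key(RawData))
print(UniqueData)
```

Because it is a generator, it also works on data that is read lazily, which may help if the list is very large.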

You could set up a new list for the unique data and a second list to track the keys you have seen so far. Then, as you loop through the data, if you have not seen the last element of a sublist before, append the sublist to the unique data and add its last element to the seen list.

RawData = [['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'],
           ['a', 'x', 'c', '15/30'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'], ['x', 'b', 'c', '15/30']]
seen = []
UniqueData = []
for data in RawData:
    if data[-1] not in seen:
        UniqueData.append(data)
        seen.append(data[-1])

print(UniqueData)

OUTPUT

[['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]

Another option is to record the indices of the duplicates and then delete them from RawData in place:

RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]

seen = []
seen_indices = []

for index, row in enumerate(RawData):
    # index -> position of the sublist in RawData
    # row -> individual sublist
    if row[-1] not in seen:
        seen.append(row[-1])
    else:
        seen_indices.append(index)

# Delete from the back so the remaining indices stay valid
for index in sorted(seen_indices, reverse=True):
    del RawData[index]

print(RawData)

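Using a set to filter out entries whose key has already been seen is the most efficient way to go: a membership test on a set takes constant time on average, whereas a list has to be scanned element by element.

Here's a one-liner example using a list comprehension with internal side effects. It relies on seen.add() returning None, so the condition is true only the first time a key is met:

UniqueData = [rd for seen in [set()] for rd in RawData if not (rd[-1] in seen or seen.add(rd[-1]))]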
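
If the side effect inside the comprehension feels too clever, a dictionary keyed on the last element gives the same result. This is a sketch of an alternative, not the method from the answer above; it assumes Python 3.7 or later, where dictionaries preserve insertion order, and the name first_by_key is chosen only for illustration.

```python
RawData = [['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'],
           ['w', 'x', 'y', 'z', '10/10'], ['a', 'x', 'c', '15/30'],
           ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'],
           ['x', 'b', 'c', '15/30']]

# first_by_key is an illustrative name: it maps each key (the last
# element) to the first sublist that carried it.
first_by_key = {}
for rd in RawData:
    # setdefault stores rd only if the key is not present yet,
    # so later duplicates are ignored.
    first_by_key.setdefault(rd[-1], rd)

# Dict values come back in insertion order (Python 3.7+).
UniqueData = list(first_by_key.values())
print(UniqueData)
```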
