
Using list.remove() to sort a list in Python

I am trying to sort keys_list by taking a list that is already in the order I want (sorted_category_list) and removing from it every item that doesn't appear in keys_list.

sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

for category in sorted_category_list:
    if category not in keys_list:
        sorted_category_list.remove(category)

print(sorted_category_list)
print(keys_list)

However, I only get these results. It seems to remove some items but not others, so I'm not sure what I am doing wrong:

['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Women', 'U21 Men', 'U21 Women', 'U17 Men', 'U17 Men', 'U15 Mixed', 'E-Bike']
['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

This is because list.remove() removes only the first matching element, so if the list contains two equal elements, a single call removes only one of them. Building a new list with a comprehension filters every element in one pass:

sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

sorted_category_list = [a for a in sorted_category_list if a in keys_list]
print(sorted_category_list)
print(keys_list)
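
To see the first-match behaviour of list.remove() in isolation, here is a minimal sketch. The list `categories` and the names `once` and `filtered` are illustrative and not part of the question.

```python
# Illustrative data: 'U17 Women' appears twice.
categories = ['U17 Men', 'U17 Women', 'U17 Men', 'U17 Women']

# list.remove() deletes only the first matching element.
once = list(categories)      # work on a copy so the original stays intact
once.remove('U17 Women')
print(once)                  # ['U17 Men', 'U17 Men', 'U17 Women']

# A comprehension drops every occurrence in a single pass.
filtered = [c for c in categories if c != 'U17 Women']
print(filtered)              # ['U17 Men', 'U17 Men']
```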

Convert the keys into a set:

keys_list = {'Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women'}

Then remove using the set:

sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']

sorted_category_list[:] = [i for i in sorted_category_list if i in keys_list]
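
The slice assignment (`[:]`) only matters when another name refers to the same list. The sketch below uses shortened, made-up data and a hypothetical second name `alias` to contrast in-place replacement with plain rebinding.

```python
keys = {'Elite Men', 'U15 Mixed'}           # set: membership tests are O(1) on average
ordered = ['Elite Men', 'Master Men', 'U15 Mixed']
alias = ordered                             # second name for the same list object

# Slice assignment replaces the contents in place, so the alias sees the change.
ordered[:] = [c for c in ordered if c in keys]
print(alias)             # ['Elite Men', 'U15 Mixed']
print(alias is ordered)  # True

# Plain assignment rebinds the name to a new list; the alias keeps the old object.
ordered = [c for c in ordered if c != 'U15 Mixed']
print(ordered)           # ['Elite Men']
print(alias)             # ['Elite Men', 'U15 Mixed']
```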

I would recommend appending the items that appear in both lists to a new list. This way you avoid altering your original list.

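repeats = []
# Iterate over the reference list so the result keeps its order.
for item in sorted_category_list:
    # Skip items missing from keys_list and duplicates already collected.
    if item in keys_list and item not in repeats:
        repeats.append(item)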
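
If the aim is simply to get keys_list in the order defined by sorted_category_list, sorting by position is another option that leaves both lists untouched. This is only a sketch: the name `position` is illustrative, the duplicate entries in the reference list are dropped for brevity, and it assumes every key appears in the reference list.

```python
sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women',
                        'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women',
             'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

# Map each category to its rank in the reference order.
position = {category: rank for rank, category in enumerate(sorted_category_list)}

# sorted() returns a new list; a key missing from `position` would raise KeyError.
ordered_keys = sorted(keys_list, key=position.__getitem__)
print(ordered_keys)
```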

The issue is that you are iterating over a list and modifying it at the same time.
Consider the list ['a','b','c','d'] and the following code:

chars = ['a', 'b', 'c', 'd']  # not named `list`, which would shadow the built-in
for char in chars:
    if char == 'a':
        chars.remove(char)

Here is how the list is iterated:
Loop 1: char = 'a' (index 0)
After 'a' is removed, the loop moves on to index 1.
But the list is now ['b','c','d'], so the element at index 1 is 'c', and 'b' gets skipped.

In your case, the first element deleted is 'Master Men', so the next element, 'Master Women', gets skipped, which is why it is still in the list. Every deletion causes the element that follows it to be skipped.
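
A common way to avoid the skipping is to iterate over a copy while removing from the original. The following is a minimal sketch with illustrative names (`chars`, `buggy`), showing the skipped element first and then the copy-based loop.

```python
chars = ['a', 'a', 'b', 'c']

# Buggy: removing during iteration shifts later items left, so the second 'a' is skipped.
buggy = list(chars)
for ch in buggy:
    if ch == 'a':
        buggy.remove(ch)
print(buggy)   # ['a', 'b', 'c']

# Safer: loop over a shallow copy (chars[:]) and mutate the original.
for ch in chars[:]:
    if ch == 'a':
        chars.remove(ch)
print(chars)   # ['b', 'c']
```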

If you absolutely must modify a list (or iterable) while iterating over it, then do so using backwards iteration as follows:

def clean_dataset(data: list, items_to_remove: list) -> list:
    end_index = len(data) - 1
    #enumerate the reversed list to iterate backwards from the last index
    for index, value in enumerate(reversed(data)):
        if value in items_to_remove:
            del data[end_index - index]
    return data

This works fine on small datasets but quickly becomes unusable as the dataset grows, because each del shifts every element after the deleted one. It could be optimised if you can remove large slices of the list rather than one item at a time. If you can't remove large slices, it is better to append to a new list, as suggested:

def new_dataset(data: list, items_to_remove: list) -> list:
    new_list = []
    for value in data:
        if value not in items_to_remove:
            new_list.append(value)
    # Return the filtered copy, not the untouched input.
    return new_list
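
If the list really must be modified in place, one way to avoid the repeated shifting is to compact it with a write index and then drop the tail with a single slice deletion. This is a sketch rather than the method used above: the function name `compact_in_place` is hypothetical, and converting items_to_remove to a set assumes the items are hashable.

```python
def compact_in_place(data: list, items_to_remove) -> list:
    """Remove unwanted values from data in a single pass, mutating it."""
    unwanted = set(items_to_remove)   # assumes hashable items
    write = 0                         # next slot for a value we keep
    for value in data:
        if value not in unwanted:
            data[write] = value       # overwrite an earlier slot; length is unchanged
            write += 1
    del data[write:]                  # drop the leftover tail in one slice deletion
    return data


sample = ['a', 'x', 'b', 'x', 'x', 'c']
result = compact_in_place(sample, ['x'])
print(result)            # ['a', 'b', 'c']
print(result is sample)  # True
```

Because the write index never runs ahead of the read position, the loop can safely read and overwrite the same list.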

Out of curiosity, I checked the timing with small and larger datasets. Even at just 750,000 items, appending to a new list is considerably faster:

import timeit

sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men',
                        'Open Women', 'Master Men', 'Master Women',
                        'U21 Men', 'U21 Women', 'U17 Men', 'U17 Women',
                        'U17 Men', 'U17 Women', 'U15 Mixed',
                        'Hardtail', 'E-Bike']

# The same 15 categories repeated 50,000 times (750,000 items).
sorted_category_list3 = ['Elite Men', 'Elite Women', 'Open Men',
                         'Open Women', 'Master Men', 'Master Women',
                         'U21 Men', 'U21 Women', 'U17 Men', 'U17 Women',
                         'U17 Men', 'U17 Women', 'U15 Mixed',
                         'Hardtail', 'E-Bike'] * 50000

keys_list = ['Elite Men', 'Elite Women', 'Open Men',
             'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

if __name__ == "__main__":
    print('timing:')

    x1 = timeit.timeit("clean_dataset(sorted_category_list, keys_list)",
                        setup="from __main__ import clean_dataset,\
                             sorted_category_list, keys_list",
                             number=1)
    print(f"removal - small dataset:\t {x1:15.15f}")

    x2 = timeit.timeit("new_dataset(sorted_category_list, keys_list)",
                        setup="from __main__ import new_dataset,\
                             sorted_category_list, keys_list",
                             number=1)
    print(f"append - small dataset: \t {x2:15.15f}")

    y1 = timeit.timeit("clean_dataset(sorted_category_list3, keys_list)",
                        setup="from __main__ import clean_dataset,\
                             sorted_category_list3, keys_list",
                             number=1)
    print(f"removal - large dataset:\t {y1:15.15f}")

    y2 = timeit.timeit("new_dataset(sorted_category_list3, keys_list)",
                        setup="from __main__ import new_dataset,\
                             sorted_category_list3, keys_list",
                             number=1)
    print(f"append - large dataset: \t {y2:15.15f}")

output:

timing:
removal - small dataset:         0.000006600000000
append - small dataset:          0.000005500000000
removal - large dataset:         17.711741400000001
append - large dataset:          0.064716900000001
