Using list.remove() to sort a list in Python

I'm trying to sort keys_list by taking a list that is already sorted the way I want (sorted_category_list) and removing every item that doesn't appear in keys_list.

sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

for category in sorted_category_list:
    if category not in keys_list:
        sorted_category_list.remove(category)

print(sorted_category_list)
print(keys_list)

But I only get this result. It seems to remove some items but not others, so I'm not sure what I'm doing wrong:

['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Women', 'U21 Men', 'U21 Women', 'U17 Men', 'U17 Men', 'U15 Mixed', 'E-Bike']
['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

This is because list.remove() only removes the first matching element, so if the list contains two identical elements, it only removes one of them.
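A quick check of that behaviour: list.remove() deletes only the first element equal to its argument and leaves later duplicates in place. If nothing matches, it raises ValueError rather than silently doing nothing.

```python
items = ['U17 Women', 'U15 Mixed', 'U17 Women']

# Only the first 'U17 Women' (index 0) is deleted; the later duplicate stays.
items.remove('U17 Women')
print(items)  # ['U15 Mixed', 'U17 Women']

# With no matching element, remove() raises ValueError.
try:
    items.remove('Hardtail')
except ValueError:
    print("'Hardtail' not in list")
```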

sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

sorted_category_list = [a for a in sorted_category_list if a in keys_list]
print(sorted_category_list)
print(keys_list)
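The comprehension builds a new list instead of deleting from the list being iterated, so nothing gets skipped. It does keep duplicates, though: 'U17 Men' appears twice in sorted_category_list, so it appears twice in the result. If each category should appear only once, one option (an addition, not part of the original answer) is to pass the result through dict.fromkeys. Dict keys are unique and keep insertion order (guaranteed since Python 3.7), so this removes repeats while preserving the order of first appearance.

```python
sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

# Filter into a new list; the list being iterated is never modified.
filtered = [a for a in sorted_category_list if a in keys_list]

# De-duplicate while keeping first-appearance order.
unique = list(dict.fromkeys(filtered))

print(filtered)
# ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U21 Men',
#  'U21 Women', 'U17 Men', 'U17 Men', 'U15 Mixed']
print(unique)
# ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U21 Men',
#  'U21 Women', 'U17 Men', 'U15 Mixed']
```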

Convert the keys into a set:

keys_list = {'Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women'}

Then filter the list against the set:

sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']

sorted_category_list[:] = [i for i in sorted_category_list if i in keys_list]
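Assigning to the slice sorted_category_list[:] replaces the contents of the existing list object instead of binding the name to a new list, so any other reference to that same list also sees the filtered result. The comprehension on the right is fully evaluated before the assignment happens, so there is no iterate-while-modifying problem. A small sketch; the names categories, keys and alias are hypothetical:

```python
keys = {'Elite Men', 'U17 Men'}  # set: membership tests are hash lookups
categories = ['Elite Men', 'Master Men', 'U17 Men', 'Hardtail']
alias = categories  # a second reference to the same list object

# Replace the list's contents in place.
categories[:] = [c for c in categories if c in keys]

print(alias)                # ['Elite Men', 'U17 Men']
print(alias is categories)  # True: still one and the same list object
```

With a plain `categories = [...]` instead, alias would still point at the old, unfiltered list.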

I'd suggest just appending the items that are in both lists to a new list. That way you avoid modifying the original list.

# Walk the already-ordered list so the result keeps its order.
repeats = []
for item in sorted_category_list:
    if item in keys_list:
        repeats.append(item)
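Since the goal in the question is to put keys_list into the order of sorted_category_list, another way to get a new list without mutating anything is to sort keys_list directly. This is a different technique (sorting rather than filtering) and not one of the answers here; it is a sketch that assumes every key appears somewhere in sorted_category_list and uses each category's first position there as its sort key.

```python
sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

# Map each category to the index of its first appearance;
# setdefault ignores later duplicates such as the second 'U17 Men'.
rank = {}
for position, category in enumerate(sorted_category_list):
    rank.setdefault(category, position)

# Sort by that index. A key missing from sorted_category_list would raise KeyError.
ordered_keys = sorted(keys_list, key=lambda key: rank[key])
print(ordered_keys)
# ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U21 Men',
#  'U21 Women', 'U17 Men', 'U15 Mixed']
```

Each key comes out exactly once, and neither input list is modified.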

The problem is that you are iterating over the list and modifying it at the same time.
Consider a list ['a','b','c','d'] and the following code:

chars = ['a', 'b', 'c', 'd']  # avoid naming a variable `list`, which shadows the built-in
for char in chars:
    if char == 'a':
        chars.remove(char)

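In this case, the list is iterated as follows:
Loop 1: char = 'a' (index 0)
Since that element is removed, the next index the loop reads is 1.
But the list is now ['b','c','d'], so the element at index 1 is 'c', and 'b' is skipped.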
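Recording every element the loop actually hands over makes the skip visible. This is the same loop as above with one extra list, visited, added for illustration:

```python
chars = ['a', 'b', 'c', 'd']
visited = []  # every element the for loop actually yields

for char in chars:
    visited.append(char)
    if char == 'a':
        chars.remove(char)  # shifts 'b' down into index 0, already passed

print(visited)  # ['a', 'c', 'd']  ('b' was never visited)
print(chars)    # ['b', 'c', 'd']
```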

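So in your case, the first element removed is 'Master Men', which means the next element, 'Master Women', is skipped. That is why it is still in the list: every removal causes the element after it to be skipped.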
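The same recording trick applied to the question's data shows which elements are never examined: 'Master Women', the second 'U17 Men', 'U15 Mixed' and 'E-Bike'. Each one sits right after an element that was removed. The visited list is an addition for illustration; the loop is otherwise the question's code.

```python
sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

visited = []  # elements the loop actually yields
for category in sorted_category_list:
    visited.append(category)
    if category not in keys_list:
        sorted_category_list.remove(category)

print(visited)
# ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men',
#  'U21 Men', 'U21 Women', 'U17 Men', 'U17 Women', 'U17 Women', 'Hardtail']
```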

If you absolutely must modify a list (or another mutable sequence) while iterating over it, iterate backwards, like this:

def clean_dataset(data: list, items_to_remove: list) -> list:
    end_index = len(data) - 1
    # Enumerate the reversed list to walk backwards from the last index.
    # Deleting data[end_index - index] only shifts items that were already visited.
    for index, value in enumerate(reversed(data)):
        if value in items_to_remove:
            del data[end_index - index]
    return data
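The same backward walk can also be written with an explicit index range, which avoids the end_index - index arithmetic. A sketch under the assumption that you want the question's result (keep only the items in keys_list); keep_only is a hypothetical name:

```python
def keep_only(data: list, allowed) -> list:
    """Hypothetical helper: delete, in place, every item not in `allowed`."""
    allowed = set(allowed)  # hash lookups instead of scanning a list
    # Walk indices from the last one down to 0. Deleting data[i] only shifts
    # items at positions > i, which have already been checked.
    for i in range(len(data) - 1, -1, -1):
        if data[i] not in allowed:
            del data[i]
    return data


sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

keep_only(sorted_category_list, keys_list)
print(sorted_category_list)
# ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U21 Men',
#  'U21 Women', 'U17 Men', 'U17 Men', 'U15 Mixed']
```

It has the same cost profile as clean_dataset, because each del still shifts every item after the deleted position.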

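This works fine on small datasets but quickly becomes unusable as the dataset grows. It can be optimized if you can delete large slices of the list at once instead of removing items one by one. If you can't delete large slices, you're better off appending to a new list, as suggested: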

def new_dataset(data: list, items_to_remove: list) -> list:
    # Build a new list of the values to keep instead of deleting in place.
    new_list = []
    for value in data:
        if value not in items_to_remove:
            new_list.append(value)
    return new_list
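The slice-deletion optimization mentioned above could look roughly like this: still walking backwards, but collecting each run of consecutive matches and deleting the whole run with a single `del data[start:stop]`, so the items after it are shifted once per run instead of once per item. This is a sketch with a hypothetical name, clean_dataset_runs; how much it helps depends on how clustered the matches are, and with scattered matches the runs are short.

```python
def clean_dataset_runs(data: list, items_to_remove) -> list:
    """Hypothetical variant: delete each run of consecutive matches in one slice."""
    to_remove = set(items_to_remove)
    i = len(data) - 1
    while i >= 0:
        if data[i] in to_remove:
            run_end = i + 1  # exclusive end of the run
            # Extend the run leftwards while items still match.
            while i >= 0 and data[i] in to_remove:
                i -= 1
            # One deletion for the whole run data[i + 1:run_end].
            del data[i + 1:run_end]
        else:
            i -= 1
    return data


data = ['keep', 'x', 'x', 'x', 'keep', 'x', 'keep']
print(clean_dataset_runs(data, ['x']))  # ['keep', 'keep', 'keep']
```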

Out of curiosity, I timed both approaches on a small and a large dataset. Even with only 750,000 items, appending to a new list is much faster:

import timeit

# clean_dataset and new_dataset (defined above) must live in this same module.

sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men',
                        'Open Women', 'Master Men', 'Master Women',
                        'U21 Men', 'U21 Women', 'U17 Men', 'U17 Women',
                        'U17 Men', 'U17 Women', 'U15 Mixed',
                        'Hardtail', 'E-Bike']

# The same 15 items repeated 50,000 times: 750,000 items.
sorted_category_list3 = sorted_category_list * 50000

keys_list = ['Elite Men', 'Elite Women', 'Open Men',
             'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

if __name__ == "__main__":
    print('timing:')

    # globals=globals() lets the timed statements see this module's names.
    x1 = timeit.timeit("clean_dataset(sorted_category_list, keys_list)",
                       number=1, globals=globals())
    print(f"removal - small dataset:\t {x1:15.15f}")

    x2 = timeit.timeit("new_dataset(sorted_category_list, keys_list)",
                       number=1, globals=globals())
    print(f"append - small dataset: \t {x2:15.15f}")

    y1 = timeit.timeit("clean_dataset(sorted_category_list3, keys_list)",
                       number=1, globals=globals())
    print(f"removal - large dataset:\t {y1:15.15f}")

    y2 = timeit.timeit("new_dataset(sorted_category_list3, keys_list)",
                       number=1, globals=globals())
    print(f"append - large dataset: \t {y2:15.15f}")

Output:

timing:
removal - small dataset:         0.000006600000000
append - small dataset:          0.000005500000000
removal - large dataset:         17.711741400000001
append - large dataset:          0.064716900000001
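One caveat about the benchmark above: clean_dataset deletes from its argument in place, and each list is timed with clean_dataset first, so new_dataset is then timed on the already-cleaned list (for the large dataset, 300,000 remaining items rather than 750,000). A sketch of a fairer setup that gives each timed call its own fresh copy. The copy is made in timeit's setup so it isn't counted, and the names BASE, KEYS, REPEAT and time_on_fresh_copy are hypothetical. REPEAT is deliberately smaller than the original 50000 so the quadratic version finishes quickly. Timings will vary by machine.

```python
import timeit


def clean_dataset(data: list, items_to_remove: list) -> list:
    # In-place backward deletion, as in the answer above.
    end_index = len(data) - 1
    for index, value in enumerate(reversed(data)):
        if value in items_to_remove:
            del data[end_index - index]
    return data


def new_dataset(data: list, items_to_remove: list) -> list:
    # Append the values to keep to a new list.
    new_list = []
    for value in data:
        if value not in items_to_remove:
            new_list.append(value)
    return new_list


BASE = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men',
        'Master Women', 'U21 Men', 'U21 Women', 'U17 Men', 'U17 Women',
        'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
KEYS = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women',
        'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']
REPEAT = 2_000  # hypothetical size; the original benchmark used 50000


def time_on_fresh_copy(func, data, keys):
    # `fresh` is created in setup (not timed), so `data` itself is never mutated.
    return timeit.timeit("func(fresh, keys)",
                         setup="fresh = list(data)",
                         number=1,
                         globals={"func": func, "data": data, "keys": keys})


large = BASE * REPEAT
removal_time = time_on_fresh_copy(clean_dataset, large, KEYS)
append_time = time_on_fresh_copy(new_dataset, large, KEYS)
print(f"removal: {removal_time:.6f} s, append: {append_time:.6f} s")
```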
