Using list.remove() to sort a list in Python
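I'm trying to sort keys_list by taking a list that is already ordered the way I want (sorted_category_list) and removing from it every item that does not appear in keys_list.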
sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']
for category in sorted_category_list:
    if category not in keys_list:
        sorted_category_list.remove(category)
print(sorted_category_list)
print(keys_list)
Why do I only get the result below? It seems to remove some items but not others, so I'm not sure what I'm doing wrong:
['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Women', 'U21 Men', 'U21 Women', 'U17 Men', 'U17 Men', 'U15 Mixed', 'E-Bike']
['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']
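Part of the problem is that list.remove() removes only the first matching element, so if the list contains two identical elements, a single call removes just one of them. Filtering with a list comprehension avoids this: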
sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']
sorted_category_list = [a for a in sorted_category_list if a in keys_list]
print(sorted_category_list)
print(keys_list)
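To see the first-match behaviour of list.remove() in isolation, here is a minimal sketch. The short list is trimmed from the question for illustration only.

```python
# list.remove(x) deletes only the first element equal to x.
categories = ['U17 Men', 'U17 Women', 'U17 Men', 'U17 Women']
categories.remove('U17 Women')
print(categories)  # ['U17 Men', 'U17 Men', 'U17 Women']

# Removing a value that is not in the list raises ValueError.
try:
    categories.remove('Hardtail')
    removed = True
except ValueError:
    removed = False
print(removed)  # False
```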
Convert the keys to a set:
keys_list = {'Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women'}
Then filter the list using the set:
sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women', 'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women', 'U15 Mixed', 'Hardtail', 'E-Bike']
sorted_category_list[:] = [i for i in sorted_category_list if i in keys_list]
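The slice assignment on the left-hand side replaces the contents of the existing list object instead of binding the name to a new list, so any other reference to the same list sees the filtered result too. A small sketch of that difference, using shortened, hypothetical data and the illustrative names `keys`, `categories`, and `alias`:

```python
keys = {'Elite Men', 'U15 Mixed'}
categories = ['Elite Men', 'Master Men', 'U15 Mixed', 'Hardtail']
alias = categories  # a second reference to the same list object

# Slice assignment mutates the existing list in place.
categories[:] = [c for c in categories if c in keys]

print(alias)                # ['Elite Men', 'U15 Mixed']
print(alias is categories)  # True
```

Membership tests against a set take roughly constant time on average, which is why converting the keys helps when the lists are long.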
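I'd suggest simply appending the items that appear in both lists to a new list. That way you avoid changing the original lists.
repeats = []
# Note: the result follows the order of keys_list.
for item in keys_list:
    if item in sorted_category_list:
        repeats.append(item)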
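If the goal is to get the keys in the order defined by sorted_category_list, another option is to sort keys_list by each item's position in that list. This is only a sketch: the names `rank` and `ordered_keys` are made up for illustration, and it assumes every key appears in sorted_category_list (a missing key would raise KeyError).

```python
sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women',
                        'Master Men', 'Master Women', 'U21 Men', 'U21 Women',
                        'U17 Men', 'U17 Women', 'U17 Men', 'U17 Women',
                        'U15 Mixed', 'Hardtail', 'E-Bike']
keys_list = ['Elite Men', 'Elite Women', 'Open Men', 'Open Women',
             'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

# Map each category to the index of its first occurrence.
rank = {}
for position, category in enumerate(sorted_category_list):
    rank.setdefault(category, position)

# Sort the keys by that index; neither input list is modified.
ordered_keys = sorted(keys_list, key=rank.__getitem__)
print(ordered_keys)
# ['Elite Men', 'Elite Women', 'Open Men', 'Open Women',
#  'U21 Men', 'U21 Women', 'U17 Men', 'U15 Mixed']
```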
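The problem is that you are iterating over the list and modifying it at the same time.
Consider the list ['a', 'b', 'c', 'd'] and this code:
chars = ['a', 'b', 'c', 'd']
for char in chars:
    if char == 'a':
        chars.remove(char)
In this case the iteration proceeds as follows:
Loop 1: char = 'a' (index 0).
Because 'a' is removed, the next index visited is 1.
But the list is now ['b', 'c', 'd'], so the element at index 1 is 'c' and 'b' is skipped.
So in your case, the first element removed is 'Master Men', which means the next element, 'Master Women', is skipped, and that is why it is still in the list. Every removal skips the element that follows it.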
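The skipping can be made visible by recording which index and value the loop actually visits. A minimal sketch, where `visited` is a helper list introduced only for this illustration:

```python
chars = ['a', 'b', 'c', 'd']
visited = []

for index, char in enumerate(chars):
    visited.append((index, char))
    if char == 'a':
        chars.remove(char)  # shifts every later element one slot left

print(visited)  # [(0, 'a'), (1, 'c'), (2, 'd')]
print(chars)    # ['b', 'c', 'd']
```

'b' never shows up in `visited`: it moved into index 0 after the loop had already advanced past that position.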
If you absolutely must modify a list (or other iterable) while iterating over it, iterate backwards, like this:
def clean_dataset(data: list, items_to_remove: list) -> list:
    end_index = len(data) - 1
    # enumerate the reversed list to iterate backwards from the last index
    for index, value in enumerate(reversed(data)):
        if value in items_to_remove:
            del data[end_index - index]
    return data
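This works fine on small datasets but quickly becomes unusable as the dataset grows. It can be optimized if you are able to delete large slices of the list instead of deleting items one at a time. If you can't delete large slices, it is better to append to a new list, as suggested: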
def new_dataset(data: list, items_to_remove: list) -> list:
    new_list = []
    for value in data:
        if value not in items_to_remove:
            new_list.append(value)
    # return the filtered copy, not the untouched input
    return new_list
Out of curiosity, I timed both approaches on a small and a large dataset. Even with only 750,000 items, appending to a new list is much faster:
import timeit

sorted_category_list = ['Elite Men', 'Elite Women', 'Open Men',
                        'Open Women', 'Master Men', 'Master Women',
                        'U21 Men', 'U21 Women', 'U17 Men', 'U17 Women',
                        'U17 Men', 'U17 Women', 'U15 Mixed',
                        'Hardtail', 'E-Bike']
# the same 15 categories repeated 50,000 times (750,000 items)
sorted_category_list3 = ['Elite Men', 'Elite Women', 'Open Men',
                         'Open Women', 'Master Men', 'Master Women',
                         'U21 Men', 'U21 Women', 'U17 Men', 'U17 Women',
                         'U17 Men', 'U17 Women', 'U15 Mixed',
                         'Hardtail', 'E-Bike'] * 50000
keys_list = ['Elite Men', 'Elite Women', 'Open Men',
             'Open Women', 'U15 Mixed', 'U17 Men', 'U21 Men', 'U21 Women']

if __name__ == "__main__":
    print('timing:')
    x1 = timeit.timeit("clean_dataset(sorted_category_list, keys_list)",
                       setup="from __main__ import clean_dataset, "
                             "sorted_category_list, keys_list",
                       number=1)
    print(f"removal - small dataset:\t {x1:15.15f}")
    x2 = timeit.timeit("new_dataset(sorted_category_list, keys_list)",
                       setup="from __main__ import new_dataset, "
                             "sorted_category_list, keys_list",
                       number=1)
    print(f"append - small dataset: \t {x2:15.15f}")
    y1 = timeit.timeit("clean_dataset(sorted_category_list3, keys_list)",
                       setup="from __main__ import clean_dataset, "
                             "sorted_category_list3, keys_list",
                       number=1)
    print(f"removal - large dataset:\t {y1:15.15f}")
    y2 = timeit.timeit("new_dataset(sorted_category_list3, keys_list)",
                       setup="from __main__ import new_dataset, "
                             "sorted_category_list3, keys_list",
                       number=1)
    print(f"append - large dataset: \t {y2:15.15f}")
Output:
timing:
removal - small dataset: 0.000006600000000
append - small dataset: 0.000005500000000
removal - large dataset: 17.711741400000001
append - large dataset: 0.064716900000001
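One thing to keep in mind when reproducing such a measurement: clean_dataset modifies its argument in place, so whatever runs next on the same list sees a shorter input. The sketch below gives each run a fresh copy instead. The function names `remove_in_place` and `build_new`, the sample data, and the sizes are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import timeit


def remove_in_place(data, items_to_remove):
    # Walk the indices backwards; each del shifts all later elements,
    # so the total cost grows quickly with the list length.
    for index in range(len(data) - 1, -1, -1):
        if data[index] in items_to_remove:
            del data[index]
    return data


def build_new(data, items_to_remove):
    # Single pass that copies the kept items into a new list.
    return [value for value in data if value not in items_to_remove]


base = ['Elite Men', 'Master Men', 'U17 Women', 'Hardtail', 'E-Bike'] * 2000
to_remove = ['Master Men', 'Hardtail']

# list(base) hands every run its own copy, so the in-place version
# cannot shrink the input seen by the other measurement.
t_remove = timeit.timeit(lambda: remove_in_place(list(base), to_remove), number=1)
t_build = timeit.timeit(lambda: build_new(list(base), to_remove), number=1)

print(f"in-place removal: {t_remove:.6f} s")
print(f"new list:         {t_build:.6f} s")
```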