Removing Tuples From A List Removes Some But Not All

I must be missing something really obvious.

I have a list of tuples that are (phrase, number) pairs. I want to remove every tuple whose phrase contains a word from my stopwords list.

stopwords = ['for', 'with', 'and', 'in', 'on', 'down']
tup_list = [('faucet', 5185), ('kitchen', 2719), ('faucets', 2628),
            ('kitchen faucet', 1511), ('shower', 1471), ('bathroom', 1131),
            ('handle', 1048), ('for', 1035), ('cheap', 960), ('bronze', 807),
            ('tub', 797), ('sale', 771), ('sink', 762), ('with', 696),
            ('single', 620), ('kitchen faucets', 615), ('stainless faucet', 613),
            ('pull', 603), ('and', 477), ('in', 447), ('single handle', 430),
            ('for sale', 406), ('bathroom faucet', 392), ('on', 369),
            ('down', 363), ('head', 359), ('pull down', 357), ('wall', 351),
            ('faucet with', 350)]

for p,n in tup_list:
    print('p', p, p.split(), any(phrase in stopwords for phrase in p.split()))

print(len(tup_list))
for p,n in tup_list:
    if any(phrase in stopwords for phrase in p.split()):
        tup_list.remove((p,n))
        print('Removing', p)
print(len(tup_list))

print([item for item in tup_list if item[0] == 'in'])

When I run the above, I get the following print-out:

p faucet ['faucet'] False
p kitchen ['kitchen'] False
p faucets ['faucets'] False
p kitchen faucet ['kitchen', 'faucet'] False
p shower ['shower'] False
p bathroom ['bathroom'] False
p handle ['handle'] False
p for ['for'] True
p cheap ['cheap'] False
p bronze ['bronze'] False
p tub ['tub'] False
p sale ['sale'] False
p sink ['sink'] False
p with ['with'] True
p single ['single'] False
p kitchen faucets ['kitchen', 'faucets'] False
p stainless faucet ['stainless', 'faucet'] False
p pull ['pull'] False
p and ['and'] True
p in ['in'] True
p single handle ['single', 'handle'] False
p for sale ['for', 'sale'] True
p bathroom faucet ['bathroom', 'faucet'] False
p on ['on'] True
p down ['down'] True
p head ['head'] False
p pull down ['pull', 'down'] True
p wall ['wall'] False
p faucet with ['faucet', 'with'] True
29
Removing for
Removing with
Removing and
Removing for sale
Removing on
Removing pull down
Removing faucet with
22
[('in', 447)]

My question: why doesn't the tuple ('in', 447) get removed? The printout shows p in ['in'] True, meaning 'in' is in the stopwords list, so why does tup_list.remove((p,n)) not remove it?

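When you remove an item from a list in place, every later item shifts one position to the left. The for loop keeps track of its position by index, so after a removal it moves on to the next index and skips the item that just slid into the freed slot. In your list, ('in', 447) comes directly after ('and', 477): removing 'and' moves 'in' into the index the loop has already visited, so it is never checked. The same thing happens to ('down', 363), which follows ('on', 369).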
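The skipping can be reproduced on a tiny list. This is only an illustrative sketch; the names items and visited and the 'STOP' marker are made up for the demonstration and do not come from the question.

```python
# Two adjacent items should be removed, mirroring 'and' followed by 'in'.
items = ['a', 'STOP', 'STOP', 'b']
visited = []

for item in items:
    visited.append(item)
    if item == 'STOP':
        # Removing shifts the second 'STOP' into the current index,
        # so the loop's next step jumps straight past it to 'b'.
        items.remove(item)

print(visited)  # the second 'STOP' never shows up here
print(items)    # one 'STOP' is left behind
```

Only three of the four original items are ever visited, which is why one of the two adjacent matches survives.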

Here is one solution. It's not the most efficient one, but may suit your needs.

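# Collect the indices to drop first, so the list is not modified while iterating.
remove_indices = set()

for i, (p, n) in enumerate(tup_list):
    if any(word in stopwords for word in p.split()):
        remove_indices.add(i)
        print('Removing', p)

# Rebuild the list, keeping only the items whose index was not marked.
tup_list = [item for j, item in enumerate(tup_list) if j not in remove_indices]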
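Another common approach is to build the filtered list in a single pass, which avoids mutating the list during iteration altogether. The following is a minimal sketch using a shortened copy of the data from the question; the helper name has_stopword is hypothetical, and turning stopwords into a set is an optional choice made here for faster lookups.

```python
stopwords = {'for', 'with', 'and', 'in', 'on', 'down'}
tup_list = [('faucet', 5185), ('and', 477), ('in', 447), ('for sale', 406),
            ('on', 369), ('down', 363), ('pull down', 357), ('wall', 351)]


def has_stopword(phrase):
    """Return True if any whitespace-separated word of phrase is a stopword."""
    return any(word in stopwords for word in phrase.split())


# Keep only the tuples whose phrase contains no stopword.
filtered = [(p, n) for p, n in tup_list if not has_stopword(p)]
print(filtered)
```

If other variables refer to the same list object and should see the change, assigning to a slice (tup_list[:] = filtered) replaces the contents in place instead of rebinding the name.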
