
Python: remove words in a string that match a list

I want to remove words from a string when they match, or contain, one of the 'seed' words in a list.

Example:

query = "LK936033.1 Babesia assembly 454hybrid_PBjelly scaffold Contig1323  7"
seeds = ["assembly","454","scaffold","contig"]

Expected result:

"LK936033.1 Babesia 7"

I found a way to remove words like this:

' '.join([i for i in query.split() if i not in seeds])

But this only removes words that exactly match an entry in the seed list, not words that contain a seed.
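To make the limitation concrete, here is a small sketch that runs the exact-match filter on the example above. The variable name `exact_only` is only for illustration.

```python
query = "LK936033.1 Babesia assembly 454hybrid_PBjelly scaffold Contig1323  7"
seeds = ["assembly", "454", "scaffold", "contig"]

# Exact-match filter: a word is dropped only if it equals a seed.
exact_only = ' '.join([i for i in query.split() if i not in seeds])

# "454hybrid_PBjelly" and "Contig1323" survive because neither equals a seed.
print(exact_only)  # LK936033.1 Babesia 454hybrid_PBjelly Contig1323 7
```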

You'll need to expand your test so that it checks for substrings; use the any() function for efficiency:

' '.join([i for i in query.split() if not any(w in i.lower() for w in seeds)])

The any(w in i.lower() for w in seeds) test uses a generator expression to check whether the current word, lowercased, contains any of the seed words. Because any() short-circuits, it stops at the first matching seed rather than testing every one. Note that only the word is lowercased, so the seeds themselves must already be lowercase.

Demo:

>>> query = "LK936033.1 Babesia assembly 454hybrid_PBjelly scaffold Contig1323  7"
>>> seeds = ["assembly","454","scaffold","contig"]
>>> ' '.join([i for i in query.split() if not any(w in i.lower() for w in seeds)])
'LK936033.1 Babesia 7'
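If you need this in more than one place, the same idea can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name `remove_seed_words` is hypothetical, and it additionally lowercases the seeds once up front so that mixed-case seeds also match.

```python
def remove_seed_words(query, seeds):
    """Drop every whitespace-separated word that contains any seed,
    ignoring case. (Hypothetical helper name, for illustration only.)"""
    # Lowercase the seeds once instead of once per word.
    lowered = [s.lower() for s in seeds]
    return ' '.join(
        word for word in query.split()
        # any() stops at the first seed found inside the word.
        if not any(seed in word.lower() for seed in lowered)
    )


query = "LK936033.1 Babesia assembly 454hybrid_PBjelly scaffold Contig1323  7"
result = remove_seed_words(query, ["Assembly", "454", "scaffold", "CONTIG"])
print(result)  # LK936033.1 Babesia 7
```

One thing to watch: an empty string in the seed list matches every word, because '' in s is True for any string s, so the result would be empty.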
