In Python, how can I remove items from a list based on a list of strings?
I have a list of strings that I want to remove items from, and a list of keywords that I am searching for in these items. I cannot seem to get the output I am looking for, and I am not sure whether regular expressions are the right way to handle this.
I want the output to be ['/item/page/cat-dog', '/item/page/animal-planet'].
valid = ['/item/page/cat-dog', '/item/page/animal-planet', '/item/page/variable']
keywords = ['cat','planet']
for item in valid:
    # a = re.findall()
    #
Python comes with the handy operators in and not in to test whether an object is or is not in a list.
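As a quick illustration (using the sample values from the question), in tests for an equal element when the right-hand side is a list, and for a substring when it is a string:

```python
keywords = ['cat', 'planet']

# On a list, `in` checks whether some element equals the left operand.
print('cat' in keywords)         # True
print('cat-dog' in keywords)     # False: no element equals 'cat-dog'
print('dog' not in keywords)     # True

# On a string, `in` checks for a substring instead.
print('cat' in '/item/page/cat-dog')  # True
```

The difference matters here: a keyword such as 'cat' is only part of each path, so a list-membership test on the whole item will not find it.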
For your problem, you can simply do:
import os

new_list = []
for item in valid:
    # keep the item when its final path component is not one of the keywords
    if os.path.basename(item) not in keywords:
        new_list.append(item)
os.path.basename returns the final component of a path, without the leading directories. new_list will then contain all the elements of valid whose filenames are not in keywords. Note that this is an exact-match test on the whole filename: in the sample data no filename equals a keyword, so all three items are kept.
As far as I can tell, following @dan-d's comment, what you need is:
[s for s in valid if not any(q in s for q in keywords)]
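A minimal sketch of that comprehension on the question's data. With not, it keeps the items that contain none of the keywords; dropping the not keeps the items that contain at least one keyword, which is the output the question lists. The names without_keywords and with_keywords are illustrative and not part of the original answer.

```python
valid = ['/item/page/cat-dog', '/item/page/animal-planet', '/item/page/variable']
keywords = ['cat', 'planet']

# Keep items that contain none of the keywords (the matches are removed).
without_keywords = [s for s in valid if not any(q in s for q in keywords)]

# Keep items that contain at least one keyword (the output shown in the question).
with_keywords = [s for s in valid if any(q in s for q in keywords)]

print(without_keywords)  # ['/item/page/variable']
print(with_keywords)     # ['/item/page/cat-dog', '/item/page/animal-planet']
```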
As suggested in the comments and other answers, the in operator may be used to check whether a string is a substring of another string. For the example data in the question, using in is the simplest and fastest way to get the desired result.
If the requirement is to match '/item/page/cat-dog' but not '/item/page/catapult', that is, to match only the word 'cat' and not just the character sequence cat, then a regular expression may be used to do the matching.
The pattern to match a single word is '\\bfoo\\b' (equivalently, the raw string r'\bfoo\b'), where '\\b' marks a word boundary.
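To see what the word boundary buys over a plain substring test, here is a small sketch using 'cat' and the two paths mentioned above. The name word_cat is illustrative.

```python
import re

word_cat = re.compile(r'\bcat\b')

# '/' and '-' are non-word characters, so 'cat' stands alone as a word here.
hit = bool(word_cat.search('/item/page/cat-dog'))
# In 'catapult', 'cat' is followed by the word character 'a', so there is no boundary.
miss = bool(word_cat.search('/item/page/catapult'))
# A plain substring test cannot tell the two apart.
substring = 'cat' in '/item/page/catapult'

print(hit, miss, substring)  # True False True
```

Keep in mind that the underscore counts as a word character, so r'\bcat\b' would not match inside 'cat_dog'.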
The alternation operator '|' is used to match one pattern or another; for example, 'foo|bar' matches 'foo' or 'bar'.
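One detail worth illustrating: alternation has the lowest precedence, so each alternative needs its own pair of word boundaries. The sketch below compares two hand-written patterns; the names loose and strict are illustrative.

```python
import re

# Parsed as (\bcat) | (planet\b): only one boundary applies to each alternative.
loose = re.compile(r'\bcat|planet\b')
# Each alternative is wrapped in its own pair of boundaries.
strict = re.compile(r'\bcat\b|\bplanet\b')

loose_hit = bool(loose.search('catapult'))
strict_hit = bool(strict.search('catapult'))

print(loose_hit)   # True: '\bcat' matches the start of 'catapult'
print(strict_hit)  # False: neither whole word is present
```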
Construct a pattern that matches the words in keywords; call re.escape on each keyword in case it contains characters that the regex engine might interpret as metacharacters.
>>> import re
>>> pattern = r'|'.join(r'\b{}\b'.format(re.escape(keyword)) for keyword in keywords)
>>> pattern
'\\bcat\\b|\\bplanet\\b'
Compile the pattern into a regular expression object.
>>> rx = re.compile(pattern)
Find the matches. Using filter is elegant:
>>> matches = list(filter(rx.search, valid))
>>> matches
['/item/page/cat-dog', '/item/page/animal-planet']
But it's common to use a list comprehension:
>>> matches = [word for word in valid if rx.search(word)]
>>> matches
['/item/page/cat-dog', '/item/page/animal-planet']
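Putting the steps together, a hypothetical helper might look like the sketch below. The function name filter_by_words and the extra 'catapult' entry are assumptions added for illustration; the guard for an empty keyword list is also an addition, since an empty pattern would match every string.

```python
import re

def filter_by_words(items, keywords):
    """Return the items that contain at least one keyword as a whole word."""
    if not keywords:
        return []  # '' would compile to a pattern that matches everything
    # Escape each keyword, wrap it in word boundaries, and join with alternation.
    pattern = '|'.join(r'\b{}\b'.format(re.escape(k)) for k in keywords)
    rx = re.compile(pattern)
    return [item for item in items if rx.search(item)]

valid = ['/item/page/cat-dog', '/item/page/animal-planet',
         '/item/page/variable', '/item/page/catapult']
matches = filter_by_words(valid, ['cat', 'planet'])
print(matches)  # ['/item/page/cat-dog', '/item/page/animal-planet']
```

To remove the matching items instead of keeping them, the condition in the comprehension can be negated with not.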