
Filter using a lambda function in Python

I have an array containing invalid strings:

arr_invalid = ['aks', 'rabbbit', 'dog']

I am parsing an RDD with a lambda function and need to skip any input string that contains one of these invalid strings. For example, if the input string is akss or aks, both should be ignored.

How do I achieve this without writing a separate filter for each invalid string?

Unless the words come sorted, you need to compare each string against every invalid substring. You can use any to check whether any of the substrings occurs in each string:

arr_invalid = ['aks', 'rabbbit', 'dog']

strings = ["aks", "akss", "foo", "saks"]

# Keep only the strings that contain none of the invalid substrings
filt = list(filter(lambda x: not any(s in x.lower() for s in arr_invalid), strings))

Output:

 ['foo']
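
Since the question is about an RDD, the predicate can also be pulled out of the lambda into a named function and reused. This is only a minimal sketch: the helper name is_valid and the extra test word "Dogma" are made up for illustration, and the rdd.filter line is left as a comment because it needs a live Spark context.

```python
arr_invalid = ['aks', 'rabbbit', 'dog']

def is_valid(s, invalid=arr_invalid):
    """Return True when none of the invalid substrings occur in s (case-insensitive)."""
    lowered = s.lower()
    return not any(bad in lowered for bad in invalid)

strings = ["aks", "akss", "foo", "saks", "Dogma"]
kept = list(filter(is_valid, strings))
print(kept)  # ['foo']

# With PySpark, the same function could be passed to an RDD (assumes an existing rdd):
# clean_rdd = rdd.filter(is_valid)
```

A named function behaves the same as the lambda, but it is easier to test on its own before handing it to filter.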

If you only want to exclude strings that start with one of the substrings, pass a tuple to str.startswith:

t = tuple(arr_invalid)
filt = list(filter(lambda x: not x.lower().startswith(t), strings))

Output:

['foo', 'saks']
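
To see how the two approaches differ, it may help to run both checks on the same words. The function names contains_invalid and starts_with_invalid are hypothetical, chosen just for this comparison.

```python
arr_invalid = ['aks', 'rabbbit', 'dog']
prefixes = tuple(arr_invalid)  # str.startswith accepts a tuple of prefixes

def contains_invalid(s):
    # True if any invalid substring occurs anywhere in s
    return any(bad in s.lower() for bad in arr_invalid)

def starts_with_invalid(s):
    # True only if s begins with one of the invalid substrings
    return s.lower().startswith(prefixes)

for word in ["aks", "akss", "foo", "saks"]:
    print(word, contains_invalid(word), starts_with_invalid(word))
```

Only "saks" is treated differently: it contains "aks" but does not start with it, so the substring check rejects it while the prefix check keeps it.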

If the input is a single string, just split it first:

st = "foo akss saks aks"
t = tuple(arr_invalid)
filt = list(filter(lambda x: not x.startswith(t),st.lower().split()))

You can also just use a list comprehension:

 [s for s in st.lower().split() if not s.startswith(t)]

As poke commented, you could find exact matches with a set, but you will still need to combine it with either any and in, or str.startswith, to match substrings:

arr_invalid = {'aks', 'rabbbit', 'dog'}

st = "foo akss saks aks"
t = tuple(arr_invalid)

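# Drop exact matches via the set lookup, then drop anything starting with an invalid prefix
filt = list(filter(lambda s: s not in arr_invalid and not s.startswith(t), st.lower().split()))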
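
The set-plus-prefix idea can be sketched as a small helper. The names invalid_set, invalid_prefixes and keep are assumptions made for this example. An exact match is also a prefix match, so the set lookup only acts as a cheap first check before the prefix test.

```python
invalid_set = {'aks', 'rabbbit', 'dog'}
invalid_prefixes = tuple(invalid_set)  # startswith needs a tuple, not a set

def keep(word):
    # Fast path: constant-time lookup for exact matches
    if word in invalid_set:
        return False
    # Otherwise reject words that merely begin with an invalid string
    return not word.startswith(invalid_prefixes)

st = "foo akss saks aks"
result = [w for w in st.lower().split() if keep(w)]
print(result)  # ['foo', 'saks']
```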
