Check if an array of strings contains a substring of another string

Let's say I have an array of strings, along with some other sentences, for example:


fruits = ['apple', 'orange', 'grape']

example1 = 'John really enjoys melons'
example2 = 'John would like an apple'

I want to return True if any of the values inside fruits is a substring of example1 or example2.

So in this case example1 should give False and example2 should give True. Obviously I can iterate over every value in fruits and check whether it is present in each example text, one by one.

However, my actual data has hundreds of examples I want to check, and my fruits array is much longer. Is there a short, concise way to do this in Python? That is, some sort of function that can be called instead of having to iterate over each value in fruits for every example.

You could use any() with a generator expression to iterate over all the fruits. It short-circuits, so it stops as soon as it finds a match:

any(fruit in example1 for fruit in fruits) # False
any(fruit in example2 for fruit in fruits) # True
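To apply the same check across many example strings, the expression can be wrapped in a small helper. This is only a minimal sketch: the function name `contains_any` and the list name `examples` are hypothetical, introduced here for illustration.

```python
def contains_any(text, substrings):
    """Return True if at least one of `substrings` occurs anywhere in `text`."""
    # any() short-circuits: it stops at the first substring that matches.
    return any(s in text for s in substrings)


fruits = ['apple', 'orange', 'grape']
# Hypothetical list holding the example sentences from the question.
examples = ['John really enjoys melons', 'John would like an apple']

results = [contains_any(example, fruits) for example in examples]
print(results)  # [False, True]
```

This still loops over fruits for every example internally, but the call site stays a single line.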

You can try set intersection if you just want code that is easy to read. Internally, it still has to go through every word in each of your example strings, but using a set might optimize it a little. Note that this compares whole space-separated words rather than arbitrary substrings.

fruits = ['apple', 'orange', 'grape']
setFruits = set(fruits)

exampleList = ['John really enjoys melons', 'John really enjoys apple']

for example in exampleList:
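    # Split the sentence into a set of unique words
    setExample = set(example.split(' '))
    # Intersect with the set; the list fruits has no .intersection method
    if len(setFruits.intersection(setExample)) > 0: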
        print('True')
    else:
        print('False')

This might be a little faster because your example strings can contain multiple copies of the same word, and with a set you don't have to check them again and again.
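The two approaches do not always agree, since the `in` operator matches any substring while splitting on spaces and intersecting sets matches whole words only. The sketch below compares them on a hypothetical sentence (`'John likes apples.'` is not from the original question, and the variable names are illustrative).

```python
fruits = ['apple', 'orange', 'grape']
sentence = 'John likes apples.'  # hypothetical example sentence

# Substring test: 'apple' occurs inside 'apples.'
substring_match = any(fruit in sentence for fruit in fruits)

# Whole-word test: the words are 'John', 'likes', 'apples.'; none equals 'apple'
word_match = bool(set(fruits) & set(sentence.split(' ')))

print(substring_match, word_match)  # True False
```

Which behavior is wanted depends on the data: the question asks about substrings, so the any() form matches that requirement most directly.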

