Check if a string contains the list elements

How can I check whether a string contains the elements of a list?

str1 = "45892190"
lis = [89,90]
str1 = "45892190"
lis = [89,90]

for i in lis:
    if str(i) in str1:
        print("The value " + str(i) + " is in the string")

OUTPUT:

The value 89 is in the string

The value 90 is in the string

If you want to check whether all the values in lis are in str1, the code from cricket_007

all(str(l) in str1 for l in lis)
out: True

is what you are looking for.
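As a quick illustration of the membership test above, here is a minimal sketch that contrasts all() with any() and shows that this approach allows overlapping matches. The extra inputs ([12, 90] and "890") are sample values chosen for illustration.

```python
str1 = "45892190"
lis = [89, 90]

# True only if every element occurs somewhere in the string.
contains_all = all(str(n) in str1 for n in lis)

# True if at least one element occurs in the string ("12" does not, "90" does).
contains_any = any(str(n) in str1 for n in [12, 90])

# Matches may overlap: in "890", "89" and "90" share the middle "9".
overlapping = all(str(n) in "890" for n in lis)

print(contains_all, contains_any, overlapping)  # True True True
```

If the shared "9" in the last case should not count twice, a plain membership test is not enough.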

If no overlap is allowed, this problem becomes much harder than it looks at first. As far as I can tell, no other answer is correct (see the test cases at the end).

Recursion is needed because if a substring appears more than once, using one occurrence instead of another could prevent other substrings from being found.
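To make that concrete, here is a minimal sketch of a greedy check that always consumes the first occurrence of each number. The function name greedy_contains is hypothetical and is not part of this answer; it only illustrates the failure mode.

```python
def greedy_contains(string, numbers, separator="x"):
    """Naive check: always consume the first occurrence of each number."""
    for number in numbers:
        substring = str(number)
        index = string.find(substring)
        if index == -1:
            return False
        # Blank out the match so later numbers cannot reuse these characters.
        string = string[:index] + separator + string[index + len(substring):]
    return True


# Same string, same numbers, different order:
print(greedy_contains("8990189290", [8990, 89, 90]))  # True
print(greedy_contains("8990189290", [89, 90, 8990]))  # False
```

In the second call, the first "89" and the first "90" are taken from the leading "8990", so "8990" can no longer be found, even though a valid non-overlapping assignment exists. Trying every occurrence, as the recursive approach does, avoids this.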

This answer uses two functions. The first one finds every occurrence of a substring in a string and returns an iterator of strings in which that occurrence has been replaced by a character that shouldn't appear in any substring.

The second function recursively checks whether there is any way to find all the numbers in the string:

def find_each_and_replace_by(string, substring, separator='x'):
    """
    list(find_each_and_replace_by('8989', '89', 'x'))
    # ['x89', '89x']
    list(find_each_and_replace_by('9999', '99', 'x'))
    # ['x99', '9x9', '99x']
    list(find_each_and_replace_by('9999', '89', 'x'))
    # []
    """
    index = 0
    while True:
        index = string.find(substring, index)
        if index == -1:
            return
        yield string[:index] + separator + string[index + len(substring):]
        index += 1


def contains_all_without_overlap(string, numbers):
    """
    contains_all_without_overlap("45892190", [89, 90])
    # True
    contains_all_without_overlap("45892190", [89, 90, 4521])
    # False
    """
    if len(numbers) == 0:
        return True
    substrings = [str(number) for number in numbers]
    substring = substrings.pop()
    return any(contains_all_without_overlap(shorter_string, substrings)
               for shorter_string in find_each_and_replace_by(string, substring, 'x'))

Here are the test cases:

tests = [
    ("45892190", [89, 90], True),
    ("8990189290", [89, 90, 8990], True),
    ("123451234", [1234, 2345], True),
    ("123451234", [2345, 1234], True),
    ("123451234", [1234, 2346], False),
    ("123451234", [2346, 1234], False),
    ("45892190", [89, 90, 4521], False),
    ("890", [89, 90], False),
    ("8989", [89, 90], False),
    ("8989", [12, 34], False)
]

for string, numbers, should in tests:
    result = contains_all_without_overlap(string, numbers)
    if result == should:
        print("Correct answer for %-12r and %-14r (%s)" % (string, numbers, result))
    else:
        print("ERROR : %r and %r should return %r, not %r" %
              (string, numbers, should, result))

And the corresponding output:

Correct answer for '45892190'   and [89, 90]       (True)
Correct answer for '8990189290' and [89, 90, 8990] (True)
Correct answer for '123451234'  and [1234, 2345]   (True)
Correct answer for '123451234'  and [2345, 1234]   (True)
Correct answer for '123451234'  and [1234, 2346]   (False)
Correct answer for '123451234'  and [2346, 1234]   (False)
Correct answer for '45892190'   and [89, 90, 4521] (False)
Correct answer for '890'        and [89, 90]       (False)
Correct answer for '8989'       and [89, 90]       (False)
Correct answer for '8989'       and [12, 34]       (False)

If you want non-overlapping matches, I'd do it like this:

  • create a copy of the initial string (as we'll modify it)
  • go through each element of the list and, if we find the element in our string, replace it with x
  • at the same time, if we find the number in our string, increment a counter
  • at the end, if the counter equals the length of the list, all of its elements are there
str1 = "45890190"
lis1 = [89, 90]

copy, i = str1, 0
for el in lis1:
    if str(el) in copy:
        copy = copy.replace(str(el), 'x')
        i = i + 1

if i == len(lis1):
    print(True)

Moreover, we don't really need a counter if we add an extra condition that returns False when an element isn't found in the string. That leads to the following final solution:

def all_matches(_list, _string):
    str_copy = _string
    for el in _list:
        if str(el) not in str_copy:
            return False
        str_copy = str_copy.replace(str(el), 'x')
    return True

You can test it by writing:

str1 = "4589190"
lis1 = [89, 90]
print(all_matches(lis1, str1))
> True

This might not be the best solution for what you're looking for, but I guess it serves the purpose.
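One caveat, shown as a quick sketch: str.replace removes every occurrence of an element and the list is processed in a fixed order, so the result can depend on the order of the list. The function is repeated here so the snippet runs on its own; the input is one of the test cases from the recursive answer above.

```python
def all_matches(_list, _string):
    str_copy = _string
    for el in _list:
        if str(el) not in str_copy:
            return False
        # Replaces *every* occurrence of the element, not just one.
        str_copy = str_copy.replace(str(el), 'x')
    return True


print(all_matches([8990, 89, 90], "8990189290"))  # True
print(all_matches([89, 90, 8990], "8990189290"))  # False
```

In the second call, replacing "89" and then "90" destroys the leading "8990" before it is checked.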

You can use the all() function:

In [1]: str1 = "45892190"
   ...: lis = [89,90]
   ...: all(str(l) in str1 for l in lis)
   ...:
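Out[1]: True

def contains(s, elems):
    for elem in elems:
        index = s.find(elem)
        if index == -1:
            return False
        # Replace the match with a separator so its characters cannot be
        # reused and the surrounding text is not joined into a new match.
        s = s[:index] + 'x' + s[index + len(elem):]
    return True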

Usage:

>>> str1 = "45892190"
>>> lis = [89,90]
>>> contains(str1, (str(x) for x in lis))
True
>>> contains("890", (str(x) for x in lis))
False

You can use a regular expression to search.

import re
str1 = "45892190"
lis = [89,90]
for i in lis:
  x = re.search(str(i), str1)
  print(x)
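The loop above prints match objects (or None). A small variation, sketched here under the assumption that only a yes/no answer per element is needed, converts each result to a boolean. The dictionary name found is chosen for illustration.

```python
import re

str1 = "45892190"
lis = [89, 90]

# re.escape is not needed for plain digits, but it keeps the pattern safe
# if an element ever contains regex metacharacters such as "." or "*".
found = {i: re.search(re.escape(str(i)), str1) is not None for i in lis}

print(found)                # {89: True, 90: True}
print(all(found.values()))  # True
```

Like the plain `in` test, this allows overlapping matches.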

It is possible to implement this correctly using regular expressions. Generate all unique permutations of the input; for each permutation, connect the terms with ".*", then connect all of the permutations with "|". For example, [89, 90, 8990] gets turned into 89.*8990.*90| 89.*90.*8990| 8990.*89.*90| 8990.*90.*89| 90.*89.*8990| 90.*8990.*89, where I added a space after each "|" for clarity.

The following passes Eric Duminil's test suite.

import itertools
import re

def create_numbers_regex(numbers):
    # Convert each into a string, and double-check that it's an integer
    numbers = ["%d" % number for number in numbers]

    # Convert to unique regular expression terms
    regex_terms = set(".*".join(permutation)
                            for permutation in itertools.permutations(numbers))
    # Create the regular expression. (Sorted so the order is invariant.)
    regex = "|".join(sorted(regex_terms))
    return regex

def contains_all_without_overlap(string, numbers):
    regex = create_numbers_regex(numbers)
    pat = re.compile(regex)
    m = pat.search(string)
    if m is None:
        return False
    return True

However, and this is a big however, the regular expression size, in the worst case, grows as the factorial of the number of numbers. Even with only 8 unique numbers, that's 40320 regex terms. It takes Python several seconds just to compile that regex.
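The growth is easy to check without compiling anything. The following sketch only counts the distinct permutation terms; count_regex_terms is a hypothetical helper, and the inputs are arbitrary distinct two-digit numbers.

```python
import itertools
import math


def count_regex_terms(numbers):
    """Number of distinct '.*'-joined permutations of the given numbers."""
    strings = [str(number) for number in numbers]
    return len({".*".join(p) for p in itertools.permutations(strings)})


for n in (3, 5, 8):
    numbers = list(range(10, 10 + n))  # n distinct two-digit numbers (sample data)
    print(n, count_regex_terms(numbers), math.factorial(n))
```

For distinct inputs the count equals n!, so each additional number multiplies the size of the pattern.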

The only time this solution might be useful is if you have a handful of numbers and want to search a lot of strings. In that case, you might also look into re2, which I believe could handle that regex without backtracking.
