Regular Expression (find matching characters in order)
Let us say that I have the following string variables:
welcome = "StackExchange 2016"
string_to_find = "Sx2016"
Here, I want to find the string string_to_find inside welcome using regular expressions. I want to see whether each character in string_to_find appears in welcome in the same order.
For instance, this expression would evaluate to True since the 'S' comes before the 'x' in both strings, the 'x' before the '2', the '2' before the '0', and so forth.
Is there a simple way to do this using regex?
The answer is rather simple. The .* character combination matches zero or more characters. For your purpose, you would put it between all the characters of the string to find, as in S.*x.*2.*0.*1.*6. If this pattern matches, then the string obeys your condition.
For a general string you would insert the .* pattern between characters, also taking care to escape special characters such as literal dots, stars, etc. that would otherwise be interpreted by the regex engine.
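The escaping step can be sketched as follows. This is only a minimal illustration, assuming Python's standard re module; the helper name build_in_order_pattern is made up for this example and does not come from the answer.

```python
import re

def build_in_order_pattern(chars):
    """Return a regex that matches the characters of `chars` in order,
    with anything (or nothing) allowed between them."""
    # re.escape makes characters such as '.', '*' or '+' match literally.
    return '.*'.join(re.escape(c) for c in chars)

welcome = "StackExchange 2016"
print(build_in_order_pattern("Sx2016"))                            # S.*x.*2.*0.*1.*6
print(bool(re.search(build_in_order_pattern("Sx2016"), welcome)))  # True
# Without escaping, the '.' in "a.b" would act as a wildcard and match '-'.
print(bool(re.search(build_in_order_pattern("a.b"), "a-b")))       # False
```

re.search is used here so that the first character may appear anywhere in the text; re.match would require it at the very beginning.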
Use wildcard matches with ., repeating with *:
expression = 'S.*x.*2.*0.*1.*6'
You can also assemble this expression with join():
expression = '.*'.join('Sx2016')
Or just find it without a regular expression, checking whether the locations of string_to_find's characters within welcome are in ascending order, and handling the case where a character in string_to_find is not present in welcome by catching the ValueError. Note that index() returns only the first occurrence of each character, so this check assumes that no character is repeated:
>>> welcome = "StackExchange 2016"
>>> string_to_find = "Sx2016"
>>> try:
...     result = [welcome.index(c) for c in string_to_find]
... except ValueError:
...     result = None
...
>>> print(result and result == sorted(result))
True
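Because index() always reports the first occurrence, the check above can misjudge inputs with repeated characters (for example, "ba" inside "aba" yields the positions [1, 0]). A possible variant, sketched here with the hypothetical name appears_in_order, searches for each character only to the right of the previous match by using str.find with a start offset:

```python
def appears_in_order(needle, haystack):
    """Return True if the characters of `needle` occur in `haystack`
    in the same order (not necessarily adjacent)."""
    position = 0
    for char in needle:
        # Search only to the right of the previous match.
        found = haystack.find(char, position)
        if found == -1:
            return False
        position = found + 1
    return True

print(appears_in_order("Sx2016", "StackExchange 2016"))  # True
print(appears_in_order("ba", "aba"))                     # True
print(appears_in_order("xS", "StackExchange 2016"))      # False
```

Taking the earliest possible match for each character is enough here: if any valid placement exists, the leftmost one also leaves the most room for the remaining characters.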
This function might fit your need:
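import re

def check_string(text, pattern):
    # Join the characters with '.*' so that anything may appear between them.
    # re.match anchors at the start of text, so the first character must come first.
    return re.match('.*'.join(pattern), text)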
'.*'.join(pattern) creates a pattern with all your characters separated by '.*'. For instance:
>>> ".*".join("Sx2016")
'S.*x.*2.*0.*1.*6'
Actually, for a sequence of characters like Sx2016, the pattern that best serves your purpose is a more specific one:
S[^x]*x[^2]*2[^0]*0[^1]*1[^6]*6
You can perform this kind of check by defining a function like this:
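import re

def contains_sequence(text, seq):
    # First character, then for each later character c: '[^c]*c'
    # (anything except c, followed by c). Characters are not escaped here.
    pattern = seq[0] + ''.join('[^' + c + ']*' + c for c in seq[1:])
    return re.search(pattern, text)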
This approach adds a layer of complexity but brings a couple of advantages as well:
It is the fastest one, because the regex engine walks down the string only once, while the dot-star approach goes to the end of the string and backtracks each time a .* is used (the comparison was made on the same string of about 1k characters).
It works on multiline input strings as well, since a negated character class such as [^x] also matches newlines, whereas . does not unless re.DOTALL is set.
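The multiline point can be checked with a short sketch. It assumes only Python's standard re module and the two patterns already shown; the variable names are chosen for illustration.

```python
import re

text = "Stack\nExchange\n2016"
dot_star = 'S.*x.*2.*0.*1.*6'
negated = 'S[^x]*x[^2]*2[^0]*0[^1]*1[^6]*6'

# By default '.' does not match a newline, so the dot-star pattern fails here.
dot_default = bool(re.search(dot_star, text))
# With re.DOTALL, '.' matches newlines too.
dot_dotall = bool(re.search(dot_star, text, re.DOTALL))
# A negated class such as [^x] already matches newlines.
negated_default = bool(re.search(negated, text))

print(dot_default, dot_dotall, negated_default)  # False True True
```

So the dot-star pattern can also handle multiline input, provided the re.DOTALL flag is passed.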
Example code:
>>> sequence = 'Sx2016'
>>> inputs = ['StackExchange2015','StackExchange2016','Stack\nExchange\n2015','Stack\nExchange\n2016']
>>> list(map(lambda x: x + ': yes' if contains_sequence(x,sequence) else x + ': no', inputs))
['StackExchange2015: no', 'StackExchange2016: yes', 'Stack\nExchange\n2015: no', 'Stack\nExchange\n2016: yes']
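One caveat: contains_sequence places each character into the pattern unescaped, so a sequence containing regex metacharacters such as '.' or '^' may not behave as intended. A possible hardened variant is sketched below. The name contains_sequence_escaped, the boolean return value, and the handling of an empty sequence are assumptions made for this example, not part of the original answer.

```python
import re

def contains_sequence_escaped(text, seq):
    """Hypothetical variant of contains_sequence that escapes every character."""
    if not seq:
        return True  # assumption: an empty sequence is trivially contained
    parts = [re.escape(seq[0])]
    for c in seq[1:]:
        escaped = re.escape(c)
        # "anything except c", then c itself, both with c escaped
        parts.append('[^' + escaped + ']*' + escaped)
    return re.search(''.join(parts), text) is not None

print(contains_sequence_escaped("a^b", "a^b"))                      # True
print(contains_sequence_escaped("a-b", "a.b"))                      # False
print(contains_sequence_escaped("Stack\nExchange\n2016", "Sx2016")) # True
```

Escaped characters remain valid inside a character class, so the same escaped form can be used both in the negated class and as the literal that follows it.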