
Regular Expression (find matching characters in order)

Let's say I have the following string variables:

welcome = "StackExchange 2016"
string_to_find = "Sx2016"

Here, I want to use regular expressions to find string_to_find inside welcome. Specifically, I want to see whether each character of string_to_find appears in welcome in the same order, though not necessarily next to each other.

For instance, this should evaluate to True, since 'S' comes before 'x' in both strings, 'x' before '2', '2' before '0', and so forth.

Is there a simple way to do this using regex?

The answer is fairly simple. The .* combination matches zero or more characters (any character except a newline, by default). For your purpose, put it between every pair of adjacent characters, as in S.*x.*2.*0.*1.*6. If a search for this pattern succeeds, the string satisfies your condition.
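A minimal sketch of that check with Python's re module, reusing the question's welcome value. It uses re.search so the match may start anywhere in the string, and a second pattern with the last two characters swapped shows a failing case.

```python
import re

welcome = "StackExchange 2016"

# '.*' between adjacent characters allows any gap (without newlines) between them
pattern = r"S.*x.*2.*0.*1.*6"

found = re.search(pattern, welcome) is not None
print(found)  # True

# '1' must come after '6' here, which never happens in welcome
print(re.search(r"S.*x.*2.*0.*6.*1", welcome) is not None)  # False
```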

For a general string, insert the .* pattern between characters, and take care to escape special characters such as literal dots and stars, which the regex engine would otherwise interpret as regex syntax.
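A sketch of building such a pattern for an arbitrary string, assuming Python's re.escape is applied to each character. The helper names subsequence_pattern and has_in_order are hypothetical, introduced only for this illustration. The last line shows what goes wrong without escaping.

```python
import re

def subsequence_pattern(chars):
    """Hypothetical helper: regex matching `chars` in order, with any gap between them.

    Each character goes through re.escape, so '.', '*', '+', etc. are matched
    literally instead of being interpreted as regex syntax.
    """
    return ".*".join(re.escape(c) for c in chars)

def has_in_order(text, chars):
    # Hypothetical helper: True if the characters of `chars` occur in `text` in order.
    return re.search(subsequence_pattern(chars), text) is not None

print(subsequence_pattern("a.b"))   # a.*\..*b
print(has_in_order("a.xb", "a.b"))  # True
print(has_in_order("axxb", "a.b"))  # False: there is no literal '.'

# Unescaped, '.' is a wildcard, so "axxb" is wrongly accepted
print(re.search(".*".join("a.b"), "axxb") is not None)  # True
```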

Use wildcard matches with ., repeated with *:

expression = 'S.*x.*2.*0.*1.*6'

You can also assemble this expression with str.join():

expression = '.*'.join('Sx2016')

Or do it without a regular expression: look up the position of each of string_to_find's characters in welcome and check that the positions are in ascending order, catching the ValueError that str.index raises when a character of string_to_find does not occur in welcome:

>>> welcome = "StackExchange 2016"
>>> string_to_find = "Sx2016"
>>> try:
...     result = [welcome.index(c) for c in string_to_find]
... except ValueError:
...     result = None
...
>>> print(result and result == sorted(result))
True
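One caveat: str.index returns the first occurrence, so the list above can come out unsorted when a character repeats in welcome, even though an in-order match exists. A sketch of a variant that uses the optional start argument of str.index, so each lookup resumes just after the previous match. The name in_order is a hypothetical helper.

```python
def in_order(text, chars):
    # Hypothetical helper: look for each character only after the previous hit.
    pos = -1
    for c in chars:
        try:
            pos = text.index(c, pos + 1)
        except ValueError:  # c does not occur after the previous match
            return False
    return True

text = "abab"
first = [text.index(c) for c in "ba"]
print(first)                # [1, 0]: not ascending, although "ba" occurs in order
print(in_order(text, "ba")) # True
print(in_order("StackExchange 2016", "Sx2016"))  # True
print(in_order("StackExchange 2015", "Sx2016"))  # False
```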

This function might fit your needs:

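import re
def check_string(text, pattern):
    # re.search (not re.match) lets the match start anywhere in text;
    # re.escape makes '.', '*', etc. in pattern match literally.
    return re.search('.*'.join(map(re.escape, pattern)), text)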

'.*'.join(pattern) creates a pattern with all your characters separated by '.*'. For instance:

>>> ".*".join("Sx2016")
'S.*x.*2.*0.*1.*6'
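Before relying on re.match with such a pattern, note that it only tries to match at the start of the text, so it fails whenever the pattern's first character is not the text's first character. A small comparison using an illustrative pattern whose first character is 'x':

```python
import re

welcome = "StackExchange 2016"
pattern = ".*".join("x2016")  # 'x.*2.*0.*1.*6'; 'x' is not at index 0

print(re.match(pattern, welcome))                     # None: match is anchored at position 0
print(re.search(pattern, welcome) is not None)        # True: search scans every position
print(re.match(".*" + pattern, welcome) is not None)  # True: a leading '.*' also works
```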

Actually, for a sequence of characters like Sx2016, a more specific pattern serves your purpose better:

S[^x]*x[^2]*2[^0]*0[^1]*1[^6]*6

You can get this kind of check by defining a function like this:

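import re
def contains_sequence(text, seq):
    # Build seq[0] followed by "[^c]*c" for each later character c: any run of
    # non-c characters, then c. re.escape keeps regex metacharacters literal.
    pattern = re.escape(seq[:1]) + ''.join('[^' + re.escape(c) + ']*' + re.escape(c) for c in seq[1:])
    return re.search(pattern, text)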

This approach adds a layer of complexity, but it brings a couple of advantages as well:

  1. It was the fastest approach when compared on the same string (~1k chars), because the regex engine walks down the string only once, while the dot-star approach runs to the end of the line and backtracks each time a .* is used.

  2. It also works on multiline input strings, because a negated class like [^x] matches newlines, while . does not (unless re.DOTALL is set).
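A minimal illustration of the second point, assuming Python's default regex flags: the dot-star pattern fails on a multiline string unless re.DOTALL is passed, while the negated-class pattern matches as is.

```python
import re

text = "Stack\nExchange\n2016"
dot_star = "S.*x.*2.*0.*1.*6"
negated = "S[^x]*x[^2]*2[^0]*0[^1]*1[^6]*6"

print(re.search(dot_star, text) is not None)             # False: '.' stops at '\n'
print(re.search(dot_star, text, re.DOTALL) is not None)  # True: DOTALL lets '.' match '\n'
print(re.search(negated, text) is not None)              # True: [^x] etc. match '\n'
```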

Example code:

>>> sequence = 'Sx2016'
>>> inputs = ['StackExchange2015', 'StackExchange2016', 'Stack\nExchange\n2015', 'Stack\nExchange\n2016']
>>> [x + (': yes' if contains_sequence(x, sequence) else ': no') for x in inputs]
['StackExchange2015: no', 'StackExchange2016: yes', 'Stack\nExchange\n2015: no', 'Stack\nExchange\n2016: yes']

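Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.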

© 2020-2024 STACKOOM.COM