

Pythonic way to find the last position in a string matching a negative regex

In Python, I am trying to find the last position in an arbitrary string that matches a given pattern, which is specified as a negated character-set regex. For example, with the string uiae1iuae200 and the pattern "not a digit" (in Python, the regex [^0-9]), I would need 8 (the index of the last 'e', just before the '200') as the result.

What is the most Pythonic way to achieve this?

Since it is a little tricky to quickly find method documentation and the best-suited method in the Python docs (the method docs sit somewhere in the middle of the corresponding page, like re.search() on the re page), the best way I quickly found myself is using re.search(), but its current form simply must be a suboptimal way of doing it:

import re
string = 'uiae1iuae200' # the string to investigate
# r = index in the reversed string; the matching index in the original is len(string) - 1 - r
len(string) - 1 - re.search(r'[^0-9]', string[::-1]).start()

I am not satisfied with this for two reasons: a) I need to reverse string with [::-1] before searching it, and b) I also need to map the resulting position back (by subtracting it from the length of the string), because the string was reversed beforehand.

There must be better ways to do this, possibly even still using re.search().

I am aware of re.search(...).end() as an alternative to .start(), but re.search() seems to split the results into groups, and I did not quickly find a non-cumbersome way to apply it to the last matched group. Without specifying a group, .start(), .end(), etc. always seem to refer to the first group, which carries no position information about the last match. Selecting a group, however, seems to require saving the return value in a temporary variable first (which rules out neat one-liners), because I would need to both determine the last group and then call .end() on it.
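A short illustration of how match objects behave may help here (this is standard re module behavior; the second pattern is only an illustration): re.search() returns only the first match in the string, and "groups" are the capturing parentheses inside the pattern, not successive matches. Walking over every match is what re.finditer() is for, as the answers below use.

```python
import re

s = 'uiae1iuae200'

# re.search stops at the FIRST match; it does not collect all matches.
first_match = re.search(r'[^0-9]', s)
print(first_match.start(), first_match.group())  # start() means group 0 of that one match

# Groups are the pattern's capturing parentheses, not a list of matches.
grouped = re.search(r'([a-z]+)(\d+)', s)
print(grouped.groups(), grouped.start(2))  # ('uiae', '1') 4
```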

What's your Pythonic solution to this? I value being Pythonic more than having the most optimized runtime.

Update

The solution should also work in corner cases, like 123 (no position matches the regex), the empty string, etc. It should not crash, e.g. because of selecting the last index of an empty list. However, since even my ugly approach above would need more than one line to handle this, I guess a one-liner might be impossible (simply because one needs to check the return value of re.search() or re.finditer() before using it). For this reason, I'll accept Pythonic multi-line solutions.
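As a sketch of the corner-case handling asked for here, the reversal approach from the question can be guarded against a missing match. The helper name last_pos_reversed is hypothetical, and reversing the string only works for single-character patterns such as [^0-9], since a multi-character pattern would have to be reversed as well.

```python
import re

def last_pos_reversed(pattern, s):
    """Hypothetical helper: the question's reverse-and-search idea with a None guard."""
    m = re.search(pattern, s[::-1])  # first match in the reversed string
    if m is None:
        return None  # covers '' and strings with no match, such as '123'
    # Index r in the reversed string corresponds to index len(s) - 1 - r in s.
    return len(s) - 1 - m.start()

for s in ['', '123', 'uiae1iuae200']:
    print(repr(s), last_pos_reversed(r'[^0-9]', s))
```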

You can use re.finditer to extract the start positions of all matches and return the last one from the list. Try this Python code:

import re
print([m.start(0) for m in re.finditer(r'\D', 'uiae1iuae200')][-1])

Prints:

8
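With a single-character class such as \D every match is one character long, so the last element of the list is the last matching position. For longer patterns this is not guaranteed: re.finditer() returns non-overlapping matches scanned left to right, so the last match it reports can start before the rightmost position where the pattern could match. The pattern 'aa' below is chosen only to expose the difference:

```python
import re

s = 'aaa'
pattern = 'aa'  # multi-character pattern, used only for illustration

# finditer: non-overlapping matches, left to right -> only the match at index 0
finditer_last = [m.start() for m in re.finditer(pattern, s)][-1]

# greedy .* backtracks from the end -> rightmost index where a match can start
greedy = re.match(f'.*({pattern})', s).start(1)

print(finditer_last, greedy)  # 0 1
```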

Edit: To make the solution behave properly for all kinds of inputs, here is the updated code. It now takes two lines, because a check has to be performed: if the list is empty, it prints None; otherwise it prints the index value:

import re

arr = ['', '123', 'uiae1iuae200', 'uiae1iuae200aaaaaaaa']

for s in arr:
    lst = [m.start() for m in re.finditer(r'\D', s)]
    print(s, '-->', lst[-1] if len(lst) > 0 else None)

This prints the following; if no such index exists, None is printed instead of an index:

 --> None
123 --> None
uiae1iuae200 --> 8
uiae1iuae200aaaaaaaa --> 19
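Regarding the one-liner question: since re.finditer() yields matches in increasing order of start position, max() with its default argument (available since Python 3.4) returns the last start, or None when nothing matches, without building a list. A sketch; the function name is hypothetical:

```python
import re

def last_start_oneliner(pattern, s):
    # Starts from finditer increase left to right, so max() is the last one;
    # default=None replaces the IndexError that [-1] raises on an empty list.
    return max((m.start() for m in re.finditer(pattern, s)), default=None)

for s in ['', '123', 'uiae1iuae200', 'uiae1iuae200aaaaaaaa']:
    print(s, '-->', last_start_oneliner(r'\D', s))
```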

Edit 2: As the OP stated in the post, \d was only the example we started with, which is why I came up with a solution that works with any general regex. But if this problem really only has to be solved for \d, there is a better solution that does not require a list comprehension at all: a better regex finds the last occurrence of a non-digit character directly, and we just print its position. The regex .*(\D) finds the last occurrence of a non-digit, and its index can be printed with the following Python code:

import re

arr = ['', '123', 'uiae1iuae200', 'uiae1iuae200aaaaaaaa']

for s in arr:
    m = re.match(r'.*(\D)', s)
    print(s, '-->', m.start(1) if m else None)

This prints each string with the index of its last non-digit character, or None if there is none:

 --> None
123 --> None
uiae1iuae200 --> 8
uiae1iuae200aaaaaaaa --> 19

As you can see, this code does not need a list comprehension, and it is better because it finds the index with a single regex call to match.
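One caveat, assuming the input may contain newlines: without re.DOTALL, '.' does not match a newline, so .* cannot run past the first newline and the reported index can be wrong. A small example (the test string is made up for illustration):

```python
import re

s = 'a\nb1'  # the last non-digit is 'b' at index 2

# Without DOTALL, '.*' stops before the newline, so (\D) matches the newline itself.
m_plain = re.match(r'.*(\D)', s)
print(m_plain.start(1))  # 1, not 2

# With DOTALL, '.*' spans the whole string and backtracks from the end.
m_dotall = re.match(r'.*(\D)', s, re.DOTALL)
print(m_dotall.start(1))  # 2
```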

But if the OP really meant it to work with any general regex pattern, then my code above using the list comprehension is needed. I could even write it as a function that takes the regex (like \d, or a more complex one) as an argument, dynamically generates the negation of the passed regex, and uses that in the code. Let me know if this is indeed needed.
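One possible shape for such a function, as a sketch rather than the answerer's actual code: the negation is built with a negative lookahead, so (?!pattern). consumes one character only where pattern does not match. The name last_not_matching is hypothetical, and for multi-character patterns "does not match" here means "no match of pattern starts at that position".

```python
import re

def last_not_matching(pattern, s):
    """Hypothetical helper: index of the last position where `pattern` does not match."""
    # (?!pattern). matches one character only if pattern cannot match there;
    # the greedy .* (with DOTALL, so newlines are crossed) makes the search
    # settle on the rightmost such character.
    m = re.match(rf'.*((?!{pattern}).)', s, re.DOTALL)
    return m.start(1) if m else None

for s in ['', '123', 'uiae1iuae200', 'uiae1iuae200aaaaaaaa']:
    print(s, '-->', last_not_matching(r'\d', s))
```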

To me it seems that you just want the last position that matches a given pattern (in this case, the "not a digit" pattern).
This is as Pythonic as it gets:

import re

string = 'uiae1iuae200'
pattern = r'[^0-9]'

match = re.match(fr'.*({pattern})', string)
print(match.end(1) - 1 if match else None)

Output:

8

Or the exact same thing as a function, with more test cases:

import re


def last_match(pattern, string):
    match = re.match(fr'.*({pattern})', string)
    return match.end(1) - 1 if match else None


cases = [(r'[^0-9]', 'uiae1iuae200'), (r'[^0-9]', '123a'), (r'[^0-9]', '123'), (r'[^abc]', 'abcabc1abc'), (r'[^1]', '11eea11')]

for pattern, string in cases:
    print(f'{pattern}, {string}: {last_match(pattern, string)}')

Output:

[^0-9], uiae1iuae200: 8
[^0-9], 123a: 3
[^0-9], 123: None
[^abc], abcabc1abc: 6
[^1], 11eea11: 4
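Note that match.end(1) - 1 is the index of the last character of the match. For a single-character class it equals the start of the match, but for longer patterns it does not. A variant that returns the start instead, next to the end(1) - 1 form for comparison (both function names are hypothetical; re.DOTALL is added so '.*' also crosses newlines):

```python
import re

def last_match_start(pattern, string):
    # Same idea as last_match, but reports where the last match *starts*.
    match = re.match(fr'.*({pattern})', string, re.DOTALL)
    return match.start(1) if match else None

def last_match_end_minus_one(pattern, string):
    # Mirrors last_match above: index of the last character of the match.
    match = re.match(fr'.*({pattern})', string)
    return match.end(1) - 1 if match else None

print(last_match_start(r'[^0-9]', 'uiae1iuae200'), last_match_end_minus_one(r'[^0-9]', 'uiae1iuae200'))  # 8 8
print(last_match_start('ab', 'xabyab'), last_match_end_minus_one('ab', 'xabyab'))  # 4 5
```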

This does not look Pythonic because it's not a one-liner and it uses range(len(foo)), but it's pretty straightforward and probably not too inefficient.

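import re

def last_match(pattern, string):
    # Check suffixes from shortest to longest; the first one that starts
    # with a match of pattern gives the last matching position.
    for i in range(1, len(string) + 1):
        substring = string[-i:]
        if re.match(pattern, substring):
            return len(string) - i
    return None  # no suffix matches (also covers the empty string)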

The idea is to iterate over the suffixes of string from the shortest to the longest and check whether each one starts with a match of pattern (re.match only looks for a match at the beginning of the suffix).

Since we check from the end, the first suffix we meet that matches the pattern gives the last matching position.
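The slicing in the loop copies a new substring on every iteration. A sketch of the same right-to-left scan that avoids those copies by passing a start position to a compiled pattern's match() method (the function name is hypothetical). One caveat from the re documentation: with pos, '^' still matches only at the real beginning of the string (or after a newline in MULTILINE mode), and lookbehind can see characters before pos, so patterns using those can behave differently than with slicing.

```python
import re

def last_match_pos(pattern, string):
    """Hypothetical variant of the suffix loop that avoids slicing."""
    compiled = re.compile(pattern)
    # Try start positions from right to left; Pattern.match(string, pos)
    # anchors the attempt at index pos without copying string[pos:].
    for pos in range(len(string) - 1, -1, -1):
        if compiled.match(string, pos):
            return pos
    return None  # nothing matched (also the result for an empty string)

print(last_match_pos(r'[^0-9]', 'uiae1iuae200'))
```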

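Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM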