Regex: smallest substring match

Question

I have URL strings such as:

"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/"

Now I need to capture the slide_3 part, or more specifically the start position of the digit 3, under the constraint that it must be a single digit (neither preceded nor followed by another digit) and must not be preceded by an "=". So pageid=2 shouldn't match, while slide_3 should.

I tried this with Python's re module:

import re

p = re.compile(r'/.*(?<!=)(?<!\d)\d(?!\d).*/')
s = "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/"

for m in p.finditer(s):
    print(m.start(), m.group())

and the result is

6 //facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/

I understand why I get this: the first and the last "/" satisfy the regex, but so does the substring "/slide_3/".

How do I make sure I get the smallest substring that matches the regex?

Why doesn't this work:

'/[^/](?<!=)(?<!\d)\d(?!\d).*/'

The non-greedy operator .*? does not seem to do the trick, since it does not guarantee the shortest possible match.

Strings that should match:

"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/" 
"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/sno3/"
"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/3/"

and the matches should be slide_3, sno3 and 3 respectively.

Strings that shouldn't match:

"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide/"
"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_33/"
"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/33/"

Answer 1

If I understand your question correctly, you can use this to check whether a string matches your expected pattern:

(?:^.*\/)([^\d]*\d)(?:\/?$)

and group 1 (\1) will contain:

slide_3
sno3
3

https://regex101.com/r/h0rNdC/4

This could be useful for getting the index of the match: Python Regex - How to Get Positions and Values of Matches
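As a rough sketch of how this pattern could be used from Python to get both the segment and the position of the digit: the helper name last_segment_digit is made up for illustration and is not part of the answer. It relies on group 1 always ending with the digit, so the digit sits at m.end(1) - 1.

```python
import re

# Pattern from this answer, unchanged (the escaped slashes are harmless in Python).
PATTERN = re.compile(r"(?:^.*\/)([^\d]*\d)(?:\/?$)")

BASE = "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/"


def last_segment_digit(url):
    """Hypothetical helper: return (segment, index of its digit) or None."""
    m = PATTERN.search(url)
    if m is None:
        return None
    # Group 1 ends with the single digit, so the digit is its last character.
    return m.group(1), m.end(1) - 1


for tail in ("slide_3/", "sno3/", "3/", "slide/", "slide_33/", "33/"):
    print(tail, last_segment_digit(BASE + tail))
```

One thing to be aware of: as written, the pattern does not exclude "=", so a URL whose final segment is pageid=2 would also match, with group 1 equal to pageid=2.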

Answer 2

You could match the forward slash, then match zero or more times any character except a digit, /, = or a newline.

Capture a single digit in a capturing group and match the trailing forward slash. 在捕获组中捕获一位数字并匹配尾随的正斜杠。

To get the start and end indices of the match, you could for example use re.search, which returns a match object.

/[^\d/=\r\n]*(\d)/

regex demo | Python demo

For example:

import re

regex = r"/[^\d/=\r\n]*(\d)/"
strings = [
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/sno3/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/3/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_33/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/33/"
]

for s in strings:
    matches = re.search(regex, s)
    if matches:
        # start(1)/end(1) are the indices of the captured digit, not of the whole match
        print(f"Group 1 found at {matches.start(1)}-{matches.end(1)} value:{matches.group(1)}")

Result

Group 1 found at 74-75 value:3
Group 1 found at 71-72 value:3
Group 1 found at 68-69 value:3
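For completeness, here is a small sketch of why the attempts in the question behave the way they do, plus one possible lookaround-based variant. A regex engine reports the leftmost match, and a lazy quantifier only affects how far that match extends to the right, so .*? still starts at the first "/". The second attempt uses [^/] without a quantifier, so it allows exactly one character between the slash and the digit. The SEGMENT pattern and the find_single_digit helper below are illustrative assumptions, not something taken from either answer; unlike the pattern above, it keeps the question's lookarounds and also allows characters after the digit.

```python
import re

BASE = "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/"
s = BASE + "slide_3/"

# Lazy variant of the question's first attempt: still anchored at the first "/".
lazy = re.compile(r"/.*?(?<!=)(?<!\d)\d(?!\d).*?/")
print(lazy.search(s).start())

# Question's second attempt: [^/] matches exactly one character, so no match here.
one_char = re.compile(r"/[^/](?<!=)(?<!\d)\d(?!\d).*/")
print(one_char.search(s))

# Assumed fix: [^/]* keeps the whole match inside a single path segment.
SEGMENT = re.compile(r"/([^/]*(?<!=)(?<!\d)(\d)(?!\d)[^/]*)/")


def find_single_digit(url):
    """Hypothetical helper: return (segment, index of the digit) or None."""
    m = SEGMENT.search(url)
    return (m.group(1), m.start(2)) if m else None


print(find_single_digit(s))
```

Group 2 holds the digit itself, so m.start(2) gives its position directly, while group 1 holds the surrounding segment.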


Answer 1: 0, accepted, 2019-08-27 12:28:04
Answer 2: 0, 2019-08-27 17:41:11
