How to use '?' to extract an optional substring between two matching patterns in Python?
I was answering this question. Consider this string:
str1 = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'
and suppose I want to extract the number after each occurrence of 'from group', together with the shortest matching substring after \\t.
I did this with the following regular expression:
import re
res = re.findall(r'from group (\d+).*?\\t(.*? ALL-..)', str1)
The output is:
[('17', 'Allow ALL-00'), ('18', 'No Allow ALL-00'), ('20', 'Check ALL-00')]
Now, between the two substrings I am extracting (the number and the substring after \\t) there might be an optional substring, Temp, which I want to extract if it is present. For example, between 18 and No Allow ALL-00 there is the substring Temp, which I would like to extract.
I tried using ? as follows:
res = re.findall(r'from group (\d+).*?(Temp)?.*?\\t(.*? ALL-..)', str1)
but the second element of each resulting tuple is always empty:
[('17', '', 'Allow ALL-00'), ('18', '', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]
while I was expecting something like:
[('17', '', 'Allow ALL-00'), ('18', 'Temp', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]
How can I extract the substring in this case? What mistake am I making?
One further question: suppose I want the resulting list to exclude this element (the one containing Temp). Should I just use [^] followed by the corresponding pattern?
Temp is not captured because you have made the group (Temp)? optional. Right after the number, the first .*? matches nothing, (Temp)? finds no Temp at that position and matches the empty string, and the second .*? then consumes everything up to \\t, including Temp. The optional group therefore never gets a chance to capture it.
To solve that problem, you can use a negative lookahead so that the characters skipped before and after the optional group can never run over Temp, using this regex:
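To see where Temp ends up, one can wrap the two lazy gaps of the question's pattern in capturing groups and inspect them. This is only an illustrative sketch: the shortened input string and the extra groups are assumptions added for the demonstration, not part of the original pattern.

```python
import re

# Shortened, hypothetical sample containing only the "Temp" entry.
s = 'from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00'

# Same pattern as in the question, but both lazy gaps are captured
# (groups 2 and 4) so we can see which part consumed what.
m = re.search(r'from group (\d+)(.*?)(Temp)?(.*?)\\t(.*? ALL-..)', s)

print(repr(m.group(2)))  # first lazy gap: matched nothing
print(m.group(3))        # optional group: did not take part in the match
print(repr(m.group(4)))  # second lazy gap: swallowed everything, Temp included
```

Note that re.search reports a group that did not participate as None, whereas re.findall shows it as an empty string, which is why the tuples in the question contain ''.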
from group (\d+)(?:(?!Temp).)*?(Temp)?(?:(?!Temp).)*?\\t(.*? ALL-..)
                ^^^^^^^^^^^^^^^ This matches any characters but refuses to run over Temp
Regex explanation:
from group - matches this text literally
(?:(?!Temp).)*? - ?: makes this a non-capturing group (a group captures by default). (?!Temp). matches any single character as long as Temp does not start at that position, * repeats this zero or more times, and the trailing ? makes the repetition lazy, that is, as few characters as possible. So this matches any string which doesn't contain Temp.
(Temp)? - optionally captures Temp if present
(?:(?!Temp).)*? - again matches zero or more characters without running over Temp, just like above
\\t - matches a literal backslash followed by t
(.*? ALL-..) - captures as few characters as possible, followed by a space, the literal ALL- and any two characters
Hope this clarifies the regex. Let me know in case you have any further queries.
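The breakdown above can also be written directly into the pattern with re.VERBOSE, which lets each part carry its own comment. This is a sketch of the same regex, not a different technique; the name PATTERN and the shortened sample string are assumptions made for illustration. In verbose mode plain spaces are ignored, so literal spaces are written as [ ].

```python
import re

PATTERN = re.compile(r'''
    from[ ]group[ ](\d+)   # literal text, then the number (group 1)
    (?:(?!Temp).)*?        # lazily skip characters, never running over Temp
    (Temp)?                # capture Temp if it is here (group 2)
    (?:(?!Temp).)*?        # same tempered skip after the optional Temp
    \\t                    # a literal backslash followed by t
    (.*?[ ]ALL-..)         # shortest text ending in " ALL-" plus two characters (group 3)
''', re.VERBOSE)

# Shortened, hypothetical sample with one plain entry and one Temp entry.
sample = ('from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\n'
          'School permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00')

print(PATTERN.findall(sample))
```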
Sample Python code:
import re
s = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'
arr = re.findall(r'from group (\d+)(?:(?!Temp).)*?(Temp)?(?:(?!Temp).)*?\\t(.*? ALL-..)',s)
print(arr)
Prints:
[('17', '', 'Allow ALL-00'), ('18', 'Temp', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]
Edit: listing only the tuples that do not contain Temp
To avoid matching any substring that contains Temp within the match, you will need to use this regex:
from group (\d+)(?:(?!Temp).)*\\t(.*? ALL-..)
Sample Python code:
import re
str1 = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'
arr = re.findall(r'from group (\d+)(?:(?!Temp).)*\\t(.*? ALL-..)',str1)
print(arr)
Prints:
[('17', 'Allow ALL-00'), ('20', 'Check ALL-00')]
which does not contain the tuple having Temp.
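One caveat that may be worth checking on your own data: the gap in this last regex uses a greedy *, so when two entries without Temp follow each other, it can run on to the last \\t it is able to reach and pair the first number with a later permission. The input below is hypothetical (it is not the string from the question) and only illustrates that behaviour, together with a lazy *? variant that appears to avoid it.

```python
import re

# Hypothetical input: two consecutive entries without Temp, then one with Temp.
t = ('from group 1:a:\\n\\tAllow ALL-00\\n'
     'from group 2:b:\\n\\tCheck ALL-00\\n'
     'from group 3:c(Temp):\\n\\tNo Allow ALL-00')

# Greedy gap, as in the regex above.
greedy = re.findall(r'from group (\d+)(?:(?!Temp).)*\\t(.*? ALL-..)', t)

# Lazy gap: stops at the first \\t after the number.
lazy = re.findall(r'from group (\d+)(?:(?!Temp).)*?\\t(.*? ALL-..)', t)

print(greedy)  # group 1 gets paired with the permission of group 2
print(lazy)    # each Temp-free entry is reported on its own
```

On the string from the question both variants give the same result, because the entry containing Temp sits between the two others.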