How to use '?' to extract optional substring between two matching pattern in python?

I was answering this question. Consider this string:

str1 = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'

and suppose I want to extract the number after each substring from group, and the substring after \\t, using the shortest possible match.

I did this with the following regular expression:

import re
res = re.findall(r'from group (\d+).*?\\t(.*? ALL-..)', str1)

The output is:

[('17', 'Allow ALL-00'), ('18', 'No Allow ALL-00'), ('20', 'Check ALL-00')]

Now, between the two substrings I am extracting (the number and the substring after \\t), there might be an optional substring, Temp, which I want to extract if present. For example, between 18 and No Allow ALL-00 there is the substring Temp that I would like to extract.

I tried using ? as follows:

res = re.findall(r'from group (\d+).*?(Temp)?.*?\\t(.*? ALL-..)', str1)

but the second element of each resulting tuple is always empty:

[('17', '', 'Allow ALL-00'), ('18', '', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]

while I was expecting something like:

[('17', '', 'Allow ALL-00'), ('18', 'Temp', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]

How can I extract the substrings in this case? What mistake am I making?

One further question: suppose I do not want the element containing Temp in the resulting list at all. Should I just use [^] with the corresponding matching pattern?

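Temp is not captured because the group (Temp)? is optional and sits between two lazy .*? patterns. Right after the digits, the first .*? matches nothing, (Temp)? finds no Temp at that position and is skipped, and the second .*? then consumes everything up to \\t, including Temp. The overall match succeeds without the optional group ever capturing anything.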
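
To see this behaviour directly, here is a small sketch that wraps both lazy parts in extra capturing groups so their contents can be inspected. The shortened sample string and the extra groups are additions for illustration only and are not part of the original pattern.

```python
import re

# Hypothetical shortened sample; the raw string keeps \n and \t as
# literal backslash sequences, like the string in the question.
sample = r'from group 18:library to group 16(Temp):teacher:\n\tNo Allow ALL-00'

# Same shape as the failing pattern, with both .*? parts captured.
m = re.search(r'from group (\d+)(.*?)(Temp)?(.*?)\\t(.*? ALL-..)', sample)

print(repr(m.group(2)))  # text matched by the first lazy .*?
print(m.group(3))        # the optional group
print(m.group(4))        # text matched by the second lazy .*?
```

The first lazy part comes back empty, the optional group did not participate (None here, which re.findall reports as an empty string), and the second lazy part contains Temp.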

To solve that, use a negative lookahead so that the skipping parts of the pattern can match any character except where Temp begins, which leaves Temp for the optional group to capture:

from group (\d+)(?:(?!Temp).)*?(Temp)?(?:(?!Temp).)*?\\t(.*? ALL-..)
                   ^^^^^^^^^ matches one character, but only where Temp does not start

Regex explanation:

  • from group - matches this text literally
  • (?:(?!Temp).)*? - (?: ... ) is a non-capturing group (plain parentheses would capture). Inside it, (?!Temp) is a negative lookahead that succeeds only if the text at the current position does not begin with Temp, and . then matches one character. * repeats this zero or more times, so the construct matches a run of characters that never steps onto Temp, and the trailing ? makes it lazy, matching as few characters as possible
  • (Temp)? - optionally captures Temp if it is present
  • (?:(?!Temp).)*? - again matches zero or more characters without stepping onto Temp, just like above
  • \\t - matches a literal backslash followed by t
  • (.*? ALL-..) - captures as few characters as possible, followed by a space, the literal ALL- and any two characters
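
The breakdown above can also be written into the pattern itself. The following is a sketch using re.VERBOSE, which ignores unescaped whitespace and allows comments inside the pattern; the name PATTERN and the shortened sample are assumptions for illustration. Literal spaces are written as [ ] because verbose mode would otherwise drop them.

```python
import re

PATTERN = re.compile(r"""
    from[ ]group[ ](\d+)   # literal text, then capture the group number
    (?:(?!Temp).)*?        # lazily skip characters, never stepping onto Temp
    (Temp)?                # capture Temp if it starts at this position
    (?:(?!Temp).)*?        # lazily skip again, up to the marker
    \\t                    # a literal backslash followed by t
    (.*?[ ]ALL-..)         # shortest text ending in " ALL-" plus two characters
""", re.VERBOSE)

# Hypothetical shortened sample with one plain record and one Temp record.
sample = (r'from group 17:student to group 16:teacher:\n\tAllow ALL-00\n'
          r'School permissions from group 18:library to group 16(Temp):teacher:\n\tNo Allow ALL-00')

print(PATTERN.findall(sample))
# [('17', '', 'Allow ALL-00'), ('18', 'Temp', 'No Allow ALL-00')]
```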

Hope this clarifies the regex. Let me know if you have any further questions.


Sample Python code:

import re

s = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'

arr = re.findall(r'from group (\d+)(?:(?!Temp).)*?(Temp)?(?:(?!Temp).)*?\\t(.*? ALL-..)',s)
print(arr)

Prints:

[('17', '', 'Allow ALL-00'), ('18', 'Temp', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]

Edit: Listing only the tuples that do not contain Temp

To keep a match from spanning text that contains Temp, use this regex:

from group (\d+)(?:(?!Temp).)*\\t(.*? ALL-..)


Sample Python code:

import re

str1 = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'

arr = re.findall(r'from group (\d+)(?:(?!Temp).)*\\t(.*? ALL-..)',str1)
print(arr)

Prints:

[('17', 'Allow ALL-00'), ('20', 'Check ALL-00')]

The result does not contain the tuple that has Temp.
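
One thing worth checking, offered as an observation rather than as part of the original answer: the exclusion pattern uses a greedy * where the earlier pattern used a lazy *?. On the question's string both forms give the same result, but when two records without Temp sit next to each other, the greedy form can run past the first \\t and pair a group number with a later record's text. The sketch below uses a hypothetical reordered sample to show the difference.

```python
import re

# Hypothetical reordered sample: the Temp record comes last instead of in the middle.
sample = (r'from group 17:student to group 16:teacher:\n\tAllow ALL-00\n'
          r'from group 20:Gym to group 16:teacher:\n\tCheck ALL-00\n'
          r'from group 18:library to group 16(Temp):teacher:\n\tNo Allow ALL-00')

# Greedy skip: runs as far as it can, then backtracks to the LAST \t before Temp.
greedy = re.findall(r'from group (\d+)(?:(?!Temp).)*\\t(.*? ALL-..)', sample)

# Lazy skip: stops at the FIRST \t after the group number.
lazy = re.findall(r'from group (\d+)(?:(?!Temp).)*?\\t(.*? ALL-..)', sample)

print(greedy)  # [('17', 'Check ALL-00')]
print(lazy)    # [('17', 'Allow ALL-00'), ('20', 'Check ALL-00')]
```

If the records can appear in any order, the lazy form is probably the safer choice.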
