How to check an input string against a list of patterns sequentially in Python?

I have specific patterns composed of strings, numbers, and special characters in a specific order. I would like to check whether an input string matches the patterns I created and print an error for incorrect input. I tried using regex, but my code is not neat enough. Could someone help me with this?

Use case

I have the input att2_epic_app_clm1_sub_valid, which I split by _. Here is the list of patterns I want to check; an error should be printed if the input does not match.

Rule:

The input should start with att and a number, like [att][0-6]*, or [ptt][0-6]; after that it should continue with either epic or semi; then it should continue with [app][0-6] or [app][0-6_][clm][0-9_]+[sub|sup]; then it should end with [valid|Invalid].

So I composed this pattern with re, but when I pass invalid input it is not detected, and I expect an error instead.

import re

acceptable_pattern=re.compile(r'([att]+[0-6_])(epic|semi_)([app]+[0-6_]+[clm]+[0-6_])([sub|sup_])([valid|invalid])')
input='att1_epic_app2_clm1_sub_valid'   # this is valid string

wlist=input.split('_')
for each in wlist:
  if any(ext in each for ext in acceptable_pattern): 
     print("valid")
  else:
     print("invalid")

This is not quite working, because I have to check the string from beginning to end: after splitting it by _, each part must match one of the predefined rules, such as:

The input string should start with att|ptt followed by a digit between 1-6; then the next word is either epic or semi; then it should be app or app1~app6 or app{1_6} clm{1~6} {sub|sup_}; then the string ends with {valid|invalid}.

How should I specify those rules with re.compile so that I can check the pattern in the input string and raise an error if the parts are not in sequence? Is there a quick way to do this in Python?

Instead of using split, you could consider writing a pattern that validates the whole string.

If I am reading the requirements correctly, you might use:

^[ap]tt[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$
  • ^ Start of string
  • [ap]tt[0-6] Match att or ptt and a digit 0-6
  • _(?:epic|semi) Match _epic or _semi
  • _app Match _app literally
  • (?: Non-capturing group for the alternation
    • [1-6] Match a digit 1-6
    • | Or
    • [1-6_]clm[0-9]*_su[bp] Match a digit 1-6 or _, then clm followed by optional digits 0-9, and then _sub or _sup
  • )? Close the non-capturing group and make it optional
  • _valid Match _valid literally
  • $ End of string

See a regex demo.
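The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch using re.VERBOSE, which lets whitespace and comments sit inside the pattern; the name PATTERN is only an illustrative choice, and the pattern is the same one as above.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE
# so that each part can carry its own comment.
PATTERN = re.compile(
    r"""
    ^                      # start of string
    [ap]tt[0-6]            # att or ptt, then one digit 0-6
    _(?:epic|semi)         # _epic or _semi
    _app                   # literal _app
    (?:                    # optional part after app
        [1-6]              #   a single digit 1-6
      |                    #   or
        [1-6_]clm[0-9]*    #   a digit 1-6 or _, then clm and optional digits
        _su[bp]            #   then _sub or _sup
    )?
    _valid                 # literal _valid
    $                      # end of string
    """,
    re.VERBOSE,
)

print(bool(PATTERN.match("att2_epic_app_clm1_sub_valid")))
print(bool(PATTERN.match("att12_epic_app_clm1_sub_valid")))
```

Writing the pattern this way does not change what it matches; it only makes later edits to a single rule easier to review.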

If the string can also start with dev, then you can use an alternation:

^(?:[ap]tt|dev)[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$

See another regex demo.
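If the set of allowed prefixes is likely to grow, the alternation could be built from a list instead of being edited by hand. This is only a sketch: the names PREFIXES and prefix_group are hypothetical, and the rest of the pattern is copied from above.

```python
import re

# Hypothetical list of allowed prefixes; extend it as requirements change.
PREFIXES = ["att", "ptt", "dev"]

# re.escape guards against prefixes that contain regex metacharacters.
prefix_group = "|".join(re.escape(p) for p in PREFIXES)

pattern = re.compile(
    rf"^(?:{prefix_group})[0-6]_(?:epic|semi)_app"
    r"(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$"
)

print(pattern.pattern)
print(bool(pattern.match("dev3_semi_app4_valid")))
```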

Then you can check if there was a match:

import re

pattern = r"^(?:[ap]tt|dev)[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid$"

strings = [
    "att2_epic_app_clm1_sub_valid",
    "att12_epic_app_clm1_sub_valid",
    "att2_epic_app_valid",
    "att2_epic_app_clm1_sub_valid"
]

for s in strings:
    m = re.match(pattern, s, re.M)
    if m:
        print("Valid: " + m.group())
    else:
        print("Invalid: " + s)

Output

Valid: att2_epic_app_clm1_sub_valid
Invalid: att12_epic_app_clm1_sub_valid
Valid: att2_epic_app_valid
Valid: att2_epic_app_clm1_sub_valid
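Since the question asks for an error on bad input rather than a printed message, the check could be wrapped in a small helper. The sketch below is one possible shape, not the only one: the function name validate and the choice of ValueError are assumptions, and re.fullmatch is used so the ^ and $ anchors can be dropped.

```python
import re

# Same pattern as above, without ^ and $ because fullmatch
# already requires the whole string to match.
PATTERN = re.compile(
    r"(?:[ap]tt|dev)[0-6]_(?:epic|semi)_app(?:[1-6]|[1-6_]clm[0-9]*_su[bp])?_valid"
)


def validate(name: str) -> str:
    """Return name unchanged if it matches, otherwise raise ValueError."""
    if PATTERN.fullmatch(name) is None:
        raise ValueError(f"unexpected input: {name!r}")
    return name


for candidate in ["att2_epic_app_valid", "att12_epic_app_clm1_sub_valid"]:
    try:
        validate(candidate)
        print("Valid:", candidate)
    except ValueError as exc:
        print("Error:", exc)
```

Note that the sample string from the question, att1_epic_app2_clm1_sub_valid, is rejected by this pattern, because a digit after app followed by _clm is not covered by either alternative. If such strings should pass, the optional group would need to be adjusted.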
