How to Use a Regex to Validate the Format of a String

Let's say I have a string that looks something like:

first_string = "(white cats || 'dogs) && ($1,000 || $500-$900' || 1,000+)"

And I replace each word with the text "replace" by doing:

new_string = re.sub(r"[\w$\-+,'][\w$\-+,'\t ]*[\w$\-+,']|[\w$\-+,']", "replace", first_string, flags=re.IGNORECASE)

And I get:

new_string = "(replace || replace) && (replace || replace || replace)"

This works fine. But I'd like to validate that new_string has a particular format.

For example, is there a way to use a regex to check that new_string fits the general format above? The format is:

  • There are always sets of parens, separated by &&
  • Each paren set contains strings separated by ||
  • The number of strings in each paren set, and the number of paren sets, can vary

Here is an approach that does not use a regex:

def is_valid(s):
    def surrounded_by_parens(s, next_validation):
        s = s.strip()
        return s.startswith('(') and s.endswith(')') and next_validation(s[1:-1])
    def separated_by_bars(s):
        return all(x.strip() == 'replace' for x in s.split('||'))
    return all(surrounded_by_parens(x, separated_by_bars) for x in s.split('&&'))

assert is_valid("(replace || replace) && (replace || replace || replace)")
assert is_valid("(replace || replace)")
assert not is_valid("(replace replace) && (replace || replace || replace)")
assert not is_valid("(replace || replace) (replace || replace || replace)")

A regex can be written to match a format like this, although the pattern needed for some formats gets incredibly long. This one isn't too bad:

re.match(r"\( \w+ (\|\| \w+ )*\)( && \( \w+ (\|\| \w+ )*\))*$", new_string)

This will match:

( replace )
( replace || replace || replace )
( replace || replace ) && ( replace )
( replace || replace ) && ( replace || replace ) && ( replace || replace )
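Note that the pattern above expects a space just inside each parenthesis, so it does not match the question's `(replace || replace)` form as written. The following is a minimal sketch, not part of the original answer: `STRICT` is the pattern exactly as given, and `LOOSE` is an assumed variant that swaps the literal spaces for `\s*` so both spacings are accepted.

```python
import re

# Pattern as given in the answer: requires one space just inside each parenthesis.
STRICT = re.compile(r"\( \w+ (\|\| \w+ )*\)( && \( \w+ (\|\| \w+ )*\))*$")

# Assumed variant (hypothetical, for illustration): whitespace is optional everywhere.
LOOSE = re.compile(
    r"\(\s*\w+(\s*\|\|\s*\w+)*\s*\)"           # first paren set
    r"(\s*&&\s*\(\s*\w+(\s*\|\|\s*\w+)*\s*\))*$"  # further sets joined by &&
)

def matches(pattern, s):
    """Return True if the pattern matches s from the start through the end."""
    return pattern.match(s) is not None

spaced = "( replace || replace ) && ( replace )"
compact = "(replace || replace) && (replace || replace || replace)"

print(matches(STRICT, spaced))    # spaced form is accepted by the original pattern
print(matches(STRICT, compact))   # compact form is rejected by the original pattern
print(matches(LOOSE, compact))    # the assumed variant accepts the compact form
```

Because both patterns end in `$` and `re.match` anchors at the start, a match means the whole string has the expected shape.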

You can check your string structure with this pattern:

^(?:(?:^|\s*[&|]{2}\s*)\([^|)]+(?:\s*\|\|\s*[^|)]+)*\))*$

If && can appear inside the parentheses too, you can use:

^(?:(?:^|\s*[&|]{2}\s*)\([^&|)]+(?:\s*[&|]{2}\s*[^&|)]+)*\))*$

If your replacement pattern is good, you don't need to check whether the parent and the "child" have the same structure.

Note: if you want to allow empty parentheses, replace all the + quantifiers with *
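As a rough sketch of how the first structure pattern from this answer might be used from Python (the function name `has_valid_structure` is hypothetical, and the explicit empty-string check is an added assumption, since the pattern on its own also matches an empty string):

```python
import re

# First pattern from this answer, unchanged, as a Python raw string.
STRUCTURE = re.compile(
    r"^(?:(?:^|\s*[&|]{2}\s*)\([^|)]+(?:\s*\|\|\s*[^|)]+)*\))*$"
)

def has_valid_structure(s):
    """Check the overall paren-set structure of s.

    The outer group is quantified with *, so the bare pattern accepts "".
    Rejecting the empty string here is an assumption about what is wanted.
    """
    return bool(s) and STRUCTURE.match(s) is not None

print(has_valid_structure("(replace || replace) && (replace || replace || replace)"))
print(has_valid_structure("(replace || replace) (replace || replace)"))
```

One thing to keep in mind: `[&|]{2}` accepts any two-character mix of `&` and `|` between paren sets, so `(a) || (b)` also passes. If only `&&` should be allowed there, `&&` can be written out literally instead.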

Description

This regex will match the described format (replace || replace) && (replace || replace || replace) where:

  • There are always sets of parens, separated by &&
  • Each paren set contains strings separated by ||
  • The number of strings in each paren set, and the number of paren sets, can vary

^(?:(?:&&|^)\s*\((?:(?:\|\|\s*)?\S+\s*(?=\|\||\)))+\)\s*(?=(?:&&|$)))+

Input text:

(Areplace || replace) && (replace || replace || replace)
(Breplace || replace) fda && (replace || replace || replace)
(Creplace || replace) && (replace || replace || replace) && (Creplace || replace) 
(whitecats || 'dogs) && ($1,000 || $500-$900' || 1,000+)

Matches

[0] => (Areplace || replace) && (replace || replace || replace)
[1] => (Creplace || replace) && (replace || replace || replace) && (Creplace || replace) 
[2] => (whitecats || 'dogs) && ($1,000 || $500-$900' || 1,000+)
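The listing above can be reproduced with a short Python sketch. This is not from the original answer: the pattern is written with single backslashes as a raw string, and using `fullmatch` per line is an assumption made here because the pattern has no trailing `$`, so a plain `match` could succeed on just a leading portion of a line.

```python
import re

# The pattern from this answer, with single backslashes in a raw string.
PATTERN = re.compile(
    r"^(?:(?:&&|^)\s*\((?:(?:\|\|\s*)?\S+\s*(?=\|\||\)))+\)\s*(?=(?:&&|$)))+"
)

lines = [
    "(Areplace || replace) && (replace || replace || replace)",
    "(Breplace || replace) fda && (replace || replace || replace)",
    "(Creplace || replace) && (replace || replace || replace) && (Creplace || replace) ",
    "(whitecats || 'dogs) && ($1,000 || $500-$900' || 1,000+)",
]

# Keep only the lines that the pattern matches in full.
matched = [line for line in lines if PATTERN.fullmatch(line)]

for i, line in enumerate(matched):
    print(f"[{i}] => {line}")
```

The second input line is dropped because of the stray `fda` between the paren sets. Since each item is matched with `\S+`, items containing spaces (such as `white cats`) would not pass, which is why the last input line uses `whitecats`.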
