How to Use a Regex to Validate the Format of a String
Let's say I have a string that looks something like:
first_string = "(white cats || 'dogs) && ($1,000 || $500-$900' || 1,000+)"
And I replace each word with the text "replace" by doing:
new_string = re.sub(r'[\w$\-+,][\w$\-+,\t ]*[\w$\-+,]|[\w$\-+,],', "replace", first_string, flags=re.IGNORECASE)
And I get:
new_string = "(replace || replace) && (replace || replace || replace)"
This works fine. But I'd like to validate that new_string has a particular format.
For example, is there a way, using a regex, to make sure that new_string fits the above general format, where the parenthesized groups are separated by && and, inside each group, the strings are separated by ||?
Here is an approach that does not use a regex:
def is_valid(s):
    # Each &&-separated part must be wrapped in parentheses.
    def surrounded_by_parens(s, next_validation):
        s = s.strip()
        return s.startswith('(') and s.endswith(')') and next_validation(s[1:-1])
    # Inside the parentheses, every ||-separated item must be 'replace'.
    def separated_by_bars(s):
        return all(x.strip() == 'replace' for x in s.split('||'))
    return all(surrounded_by_parens(x, separated_by_bars) for x in s.split('&&'))
assert is_valid("(replace || replace) && (replace || replace || replace)")
assert is_valid("(replace || replace)")
assert not is_valid("(replace replace) && (replace || replace || replace)")
assert not is_valid("(replace || replace) (replace || replace || replace)")
A regex can be written to match a fixed format like this one, although for some formats the pattern you need gets incredibly long. This one isn't too bad:
re.match(r"\( \w+ (\|\| \w+ )*\)( && \( \w+ (\|\| \w+ )*\))*$", new_string)
This will match:
( replace )
( replace || replace || replace )
( replace || replace ) && ( replace )
( replace || replace ) && ( replace || replace ) && ( replace || replace )
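Note that the pattern above expects a space after each ( and before each ), as in the examples, so it would not match the unpadded new_string from the question. As a rough sketch, assuming you want that padding to be optional, the literal spaces can be swapped for \s* (the names GROUP, FLEXIBLE and matches_format are made up for this example):

```python
import re

# Hypothetical variant of the pattern above: \s* makes the padding optional.
GROUP = r"\(\s*\w+(?:\s*\|\|\s*\w+)*\s*\)"
FLEXIBLE = re.compile(rf"{GROUP}(?:\s*&&\s*{GROUP})*")

def matches_format(s):
    """Return True if s is one or more (a || b ...) groups joined by &&."""
    return FLEXIBLE.fullmatch(s) is not None

print(matches_format("(replace || replace) && (replace || replace || replace)"))  # True
print(matches_format("( replace || replace ) && ( replace )"))  # True
print(matches_format("(replace replace) && (replace)"))  # False
```

re.fullmatch is used here so that trailing text is rejected without needing a $ at the end of the pattern.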
You can check your string structure with this pattern:
^(?:(?:^|\s*[&|]{2}\s*)\([^|)]+(?:\s*\|\|\s*[^|)]+)*\))*$
If && can appear inside the parentheses too, you can use:
^(?:(?:^|\s*[&|]{2}\s*)\([^&|)]+(?:\s*[&|]{2}\s*[^&|)]+)*\))*$
If your replacement pattern is good, you don't need to check whether the parent and the "child" have the same structure.
Note: if you want to allow empty parentheses, replace all the + quantifiers with *.
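To try this from Python, something like the following should work. It is only a sketch: the names STRUCTURE and has_valid_structure are invented here, and the first pattern is copied unchanged. Keep in mind that [^|)]+ accepts any text between the separators, so this checks the layout of parentheses and operators, not the words themselves.

```python
import re

# First pattern from above, copied as-is into a raw string.
STRUCTURE = re.compile(
    r"^(?:(?:^|\s*[&|]{2}\s*)\([^|)]+(?:\s*\|\|\s*[^|)]+)*\))*$"
)

def has_valid_structure(s):
    """Hypothetical helper: True if s is a run of (...) groups
    separated by a two-character operator such as &&."""
    return STRUCTURE.search(s) is not None

print(has_valid_structure("(replace || replace) && (replace || replace || replace)"))  # True
print(has_valid_structure("(replace || replace) (replace || replace || replace)"))  # False
print(has_valid_structure("(replace replace) && (replace)"))  # True: content is not checked
```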
This regex will match the described format (replace || replace) && (replace || replace || replace):
^(?:(?:&&|^)\\s*\\((?:(?:\\|\\|\\s*)?\\S+\\s*(?=\\|\\||\\)))+\\)\\s*(?=(?:&&|$)))+
Input text:
(Areplace || replace) && (replace || replace || replace)
(Breplace || replace) fda && (replace || replace || replace)
(Creplace || replace) && (replace || replace || replace) && (Creplace || replace)
(whitecats || 'dogs) && ($1,000 || $500-$900' || 1,000+)
Matches
[0] => (Areplace || replace) && (replace || replace || replace)
[1] => (Creplace || replace) && (replace || replace || replace) && (Creplace || replace)
[2] => (whitecats || 'dogs) && ($1,000 || $500-$900' || 1,000+)
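The doubled backslashes in the pattern are presumably string-literal escaping; in a Python raw string each would be written once. A minimal sketch for reproducing the result above, assuming each input line is checked separately (the names PATTERN and check are hypothetical):

```python
import re

# Same pattern with single backslashes, as a Python raw string.
PATTERN = re.compile(
    r"^(?:(?:&&|^)\s*\((?:(?:\|\|\s*)?\S+\s*(?=\|\||\)))+\)\s*(?=(?:&&|$)))+"
)

def check(line):
    """Return True if the whole line fits the described format."""
    return PATTERN.fullmatch(line) is not None

lines = [
    "(Areplace || replace) && (replace || replace || replace)",
    "(Breplace || replace) fda && (replace || replace || replace)",
    "(Creplace || replace) && (replace || replace || replace) && (Creplace || replace)",
    "(whitecats || 'dogs) && ($1,000 || $500-$900' || 1,000+)",
]
for line in lines:
    print(check(line), line)  # True, False, True, True in that order
```

re.fullmatch matters here because the pattern has no trailing $: with re.match, a string like "(a) && junk" would still produce a partial match on the first group.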