
How to document a regular expression in itself?

I have a regular expression, e.g. at regex101:

(\/+[^\/\[\]]+(?:\[[^\]']*(?:'[^']*')\])?)+

I have verified that it matches my test string:

//SapButton[@automationId='tbar[0]/btn[15]']

Since the regex cannot be understood right away, I tried to document it with inline (?#) comments, so I changed the regex to the following (also at regex101):

((?# Capturing group for the type name)
\/+(?# Start with / or // )
[^\/\[\]]+(?# Type name excluding start of attribute and next type)
(?:(?# Non-capturing group for the attribute)
\[(?# Start of an attribute)
[^\]']*(?# Anything but end of attribute or start of string)
(?:(?# non-capturing group for string)
'(?# string start)
[^']*(?# anything inside the string, except end of string)
'(?# string end)
)(?# end of string group)
\](?# end of attribute)
)?(?# Attribute can occur 0 or one time)
)+(?# Type can occur once or many times)

But now the regex no longer matches my test string. The reason is the newlines: without further options they are literal characters in the pattern. Changing the regex to

((?# Capturing group for the type name)\/+(?# Start with / or // )[^\/\[\]]+(?# Type name excluding start of attribute and next type)(?:(?# Non-capturing group for the attribute)\[(?# Start of an attribute)[^\]']*(?# Anything but end of attribute or start of string)(?:(?# non-capturing group for string)'(?# string start)[^']*(?# anything inside the string, except end of string)'(?# string end))(?# end of string group)\](?# end of attribute))?(?# Attribute can occur 0 or one time))+(?# Type can occur once or many times)

works, but it is unreadable again.

How do I document a regular expression in itself properly?

Note that I want to avoid doing it in a comment on the C# method, since such a comment is too likely to be left out of date when the regex is changed.

IMHO, it would be best done in a verbatim string with multiple lines (but it still has to work, of course).

There is the Ignore White Space option.

The problem is that you then have to escape literal spaces and # with a \ . The good news is that an unescaped # begins a comment that runs to the end of the line, like // in C#.

You can activate it with RegexOptions.IgnorePatternWhitespace or with (?x) at the beginning of the regex.

(?x) is supported by https://regex101.com/
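As a minimal sketch of this option, the regex from the question could be written as a multi-line verbatim string with # comments. The constant name XPathSegments is made up for illustration, the comment wording is paraphrased from the question, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string, so the pattern can span several lines without C# escapes.
// With IgnorePatternWhitespace, unescaped whitespace in the pattern is skipped
// and an unescaped '#' starts a comment that runs to the end of the line.
// XPathSegments is a hypothetical name, not taken from the question.
const string XPathSegments = @"
(                   # capturing group: one path segment
  /+                # starts with / or //
  [^/\[\]]+         # type name, up to the next segment or attribute
  (?:               # non-capturing group: the attribute
    \[              # start of the attribute
    [^\]']*         # anything but end of attribute or start of string
    (?:             # non-capturing group: the quoted string
      '             # string start
      [^']*         # anything inside the string
      '             # string end
    )
    \]              # end of the attribute
  )?                # the attribute can occur zero or one time
)+                  # a segment can occur once or many times
";

var regex = new Regex(XPathSegments, RegexOptions.IgnorePatternWhitespace);

string input = "//SapButton[@automationId='tbar[0]/btn[15]']";
Match match = regex.Match(input);

// Report whether the whole test string from the question was matched.
Console.WriteLine(match.Success && match.Value == input);
```

Whitespace inside a character class is still taken literally, so only the layout outside the brackets is free. Writing (?x) as the first thing in the pattern string should behave the same as passing the option.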

It isn't exactly a desirable solution, but you can store your documented regexp in a String and, before matching, remove all \r\n from that String.

Then, you will have a readable representation in your code, and a correct regexp at runtime.
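A rough sketch of that idea, using the (?#) comments from the question, might look like this. The names Documented and pattern are made up for illustration, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Every pattern line starts in column 0 and has no trailing spaces: only the
// line breaks are stripped below, so any other whitespace would stay in the
// pattern as literal characters.
// An inline (?# ...) comment ends at the first ')', so it cannot contain one.
// Documented is a hypothetical name, not taken from the answer.
const string Documented = @"((?# capturing group for one path segment)
/+(?# starts with / or //)
[^/\[\]]+(?# type name, up to the next segment or attribute)
(?:(?# non-capturing group for the attribute)
\[(?# start of the attribute)
[^\]']*(?# anything but end of attribute or start of string)
(?:(?# non-capturing group for the quoted string)
'(?# string start)
[^']*(?# anything inside the string)
'(?# string end)
)(?# end of the string group)
\](?# end of the attribute)
)?(?# the attribute can occur zero or one time)
)+(?# a segment can occur once or many times)";

// Remove the line breaks that the verbatim string carries into the pattern.
string pattern = Documented.Replace("\r", "").Replace("\n", "");
var regex = new Regex(pattern);

string input = "//SapButton[@automationId='tbar[0]/btn[15]']";
Match match = regex.Match(input);

// Report whether the whole test string from the question was matched.
Console.WriteLine(match.Success && match.Value == input);
```

The drawback compared with the whitespace-ignoring option is that the lines cannot be indented unless the leading whitespace is stripped as well.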

I don't know of another way, but what you want to do is laudable.

Regards
