
How to document a regular expression in itself?

I have a regular expression (tested, e.g., at regex101):

(\/+[^\/\[\]]+(?:\[[^\]']*(?:'[^']*')\])?)+

I have verified that it matches my test string

//SapButton[@automationId='tbar[0]/btn[15]']

Since the regex cannot be understood right away, I tried the inline comment feature (?#) and changed the regex to the following (also at regex101):

((?# Capturing group for the type name)
\/+(?# Start with / or // )
[^\/\[\]]+(?# Type name excluding start of attribute and next type)
(?:(?# Non-capturing group for the attribute)
\[(?# Start of an attribute)
[^\]']*(?# Anything but end of attribute or start of string)
(?:(?# non-capturing group for string)
'(?# string start)
[^']*(?# anything inside the string, except end of string)
'(?# string end)
)(?# end of string group)
\](?# end of attribute)
)?(?# Attribute can occur 0 or one time)
)+(?# Type can occur once or many times)

But now the regex no longer matches my test string. The reason is the newlines, which are treated as literal characters to match. Changing the regex to

((?# Capturing group for the type name)\/+(?# Start with / or // )[^\/\[\]]+(?# Type name excluding start of attribute and next type)(?:(?# Non-capturing group for the attribute)\[(?# Start of an attribute)[^\]']*(?# Anything but end of attribute or start of string)(?:(?# non-capturing group for string)'(?# string start)[^']*(?# anything inside the string, except end of string)'(?# string end))(?# end of string group)\](?# end of attribute))?(?# Attribute can occur 0 or one time))+(?# Type can occur once or many times)

works. But it is unreadable again.

How do I document a regular expression in itself properly?

Note that I want to avoid documenting it in the comment of the C# method, since such a comment is too likely to be left unchanged when the regex is changed.

IMHO, it would be best done in a verbatim string with multiple lines (but it still has to work, of course).

There is the IgnorePatternWhitespace option.

The catch is that you then have to escape literal spaces and # with a backslash (\). The good news is that an unescaped # begins a comment that runs to the end of the line, like // in C#.

You can activate it with RegexOptions.IgnorePatternWhitespace or with (?x) at the beginning of the regex.
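To illustrate the escaping rule, here is a minimal sketch using the inline (?x) form. The pattern and the name issueRef are made up for this illustration and are not part of the question; the example only assumes a C# version that allows top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern, used only to show the escaping rules.
// (?x) at the start has the same effect as RegexOptions.IgnorePatternWhitespace.
var issueRef = new Regex(@"(?x)
    issue   # the literal word
    \       # an escaped space: matches one real space
    \#      # an escaped hash: a literal #, not the start of a comment
    \d+     # the issue number
    ");

Console.WriteLine(issueRef.IsMatch("issue #42"));  // True
Console.WriteLine(issueRef.IsMatch("issue#42"));   // False: the escaped space is required
```

Without the backslashes, the space would be ignored and everything after the # would be swallowed as a comment.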

(?x) is supported by https://regex101.com/
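Applied to the regex from the question, this could look roughly like the sketch below: a verbatim string spanning several lines, with # comments instead of (?#). The indentation and the variable names are my own choice; the pattern itself is unchanged, and it contains no literal spaces or # characters, so nothing needs extra escaping.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string: backslashes are not doubled and line breaks are kept.
// IgnorePatternWhitespace: unescaped whitespace is ignored and # starts a
// comment that runs to the end of the line.
var pattern = new Regex(@"
    (                 # capturing group for one type, with optional attribute
      \/+             # start with / or //
      [^\/\[\]]+      # type name, excluding start of attribute and next type
      (?:             # non-capturing group for the attribute
        \[            # start of an attribute
        [^\]']*       # anything but end of attribute or start of string
        (?:           # non-capturing group for the string
          '           # string start
          [^']*       # anything inside the string, except end of string
          '           # string end
        )
        \]            # end of attribute
      )?              # attribute can occur 0 or 1 times
    )+                # type can occur once or many times
    ", RegexOptions.IgnorePatternWhitespace);

string input = "//SapButton[@automationId='tbar[0]/btn[15]']";
Match match = pattern.Match(input);
Console.WriteLine(match.Success && match.Value == input);  // True
```

Because the comments live inside the pattern string, they sit next to the part they describe and are more likely to be updated together with it.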

It isn't exactly a desirable solution, but you can store your documented regex in a string and, before matching, remove all \r\n line breaks from that string.

Then, you will have a readable representation in your code, and a correct regexp at runtime.
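A minimal sketch of that idea, assuming the multi-line (?#) version from the question is kept in a verbatim string. The variable names are illustrative. The lines are not indented, because only the line breaks are removed and any leading spaces would remain part of the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Documented form: one element per line, each followed by an inline (?#...) comment.
string documented = @"((?# Capturing group for the type name)
\/+(?# Start with / or // )
[^\/\[\]]+(?# Type name excluding start of attribute and next type)
(?:(?# Non-capturing group for the attribute)
\[(?# Start of an attribute)
[^\]']*(?# Anything but end of attribute or start of string)
(?:(?# Non-capturing group for string)
'(?# String start)
[^']*(?# Anything inside the string, except end of string)
'(?# String end)
)(?# End of string group)
\](?# End of attribute)
)?(?# Attribute can occur 0 or 1 times)
)+(?# Type can occur once or many times)";

// Remove the line breaks so they are not treated as characters to match.
string compact = documented.Replace("\r", "").Replace("\n", "");
var regex = new Regex(compact);

string testInput = "//SapButton[@automationId='tbar[0]/btn[15]']";
Console.WriteLine(regex.Match(testInput).Value == testInput);  // True
```

Removing \r and \n separately keeps this working regardless of which line-ending style the source file uses.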

I don't know of another way, but what you are trying to do is laudable.

Regards
