POSIX, aka "The Open Group Base Specifications Issue 7, 2018 edition", has this to say about regular expression operator precedence:
9.4.8 ERE Precedence
The order of precedence shall be as shown in the following table:
ERE Precedence (from high to low)
Collation-related bracket symbols    [==] [::] [..]
Escaped characters                   \<special character>
Bracket expression                   []
Grouping                             ()
Single-character-ERE duplication     * + ? {m,n}
Concatenation                        ab
Anchoring                            ^ $
Alternation                          |
I am curious why the first two levels are in that order. Being a Unix user from way back, I am accustomed to being able to "throw a backslash in front of it" to escape virtually anything. But it appears that with Collation-Related Bracket Symbols (CRBS), I can't do that. If I want to match a literal [.ch.], I can't just type \[.ch.] and rely on "dot matches dot" to handle things for me. I now have to write something like [[].ch.] (or possibly worse?).
I'm trying, and failing, to imagine what the scenario was when whoever-thought-this-up decided this should be the order. Is there a concrete scenario where having CRBS ranked higher than backslash makes sense, or was this a case of "we don't understand CRBS yet so let's make it higher priority" or... what, exactly?
At least for GNU grep, it looks like lib/dfa.c treats the CRBS as one lexical token, as per the function parse_bracket_exp().
For the example given, escaping the special characters (square brackets and dots) seems to give the results you are looking for. You can also match literal dots with [.], which might be easier to see in a regular expression.
$ (echo c;echo '[.ch.]';echo .ch.;echo xchx)|grep '\[\.ch\.\]'
[.ch.]
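To compare the variants discussed above side by side, here is a rough sketch. It assumes a POSIX shell and a grep that supports -E (GNU grep, for instance). The variable names and the extra input line [xchx] are made up for illustration; that line is there only to show what happens when the dots are left unescaped.

```shell
# Sample input: the literal target plus some near-misses.
input=$(printf '%s\n' 'c' '[.ch.]' '.ch.' 'xchx' '[xchx]')

# Only the opening bracket is escaped; the dots remain wildcards,
# so this pattern also accepts "[xchx]".
loose=$(printf '%s\n' "$input" | grep -E '\[.ch.]')

# Opening bracket and both dots escaped: only the literal text matches.
escaped=$(printf '%s\n' "$input" | grep -E '\[\.ch\.]')

# Same idea with single-character bracket expressions instead of backslashes.
bracketed=$(printf '%s\n' "$input" | grep -E '[[][.]ch[.]]')

printf 'loose:\n%s\n' "$loose"
printf 'escaped:   %s\n' "$escaped"
printf 'bracketed: %s\n' "$bracketed"
```

In this sketch the backslash before the opening bracket is honoured outside a bracket expression, which suggests the [. sequence is only special once a bracket expression is already open.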