Why do POSIX collation-related bracket symbols have higher precedence than backslash?

Question

POSIX, aka "The Open Group Base Specifications Issue 7, 2018 edition", has this to say about regular expression operator precedence:

9.4.8 ERE Precedence

The order of precedence shall be as shown in the following table:

ERE Precedence (from high to low)

Collation-related bracket symbols    [==] [::] [..]
Escaped characters                   \ special-character
Bracket expression                   []
Grouping                             ()
Single-character-ERE duplication     * + ? {m,n}
Concatenation                        ab
Anchoring                            ^ $
Alternation                          |

I am curious as to the reason for the first two levels being in that order. Being a Unix user from way back, I am accustomed to being able to "throw a backslash in front of it" to escape virtually anything. But it appears that with Collation-Related Bracket Symbols (CRBS), I can't do that. If I want to match a literal [.ch.] I can't just type \[.ch.] and rely on "dot matches dot" to handle things for me. I now have to write something like [[].ch.] (or possibly worse?).

I'm trying, and failing, to imagine what the scenario was when whoever-thought-this-up decided this should be the order. Is there a concrete scenario where having CRBS ranked higher than backslash makes sense, or was this a case of "we don't understand CRBS yet so let's make it higher priority" or... what, exactly?

Answer 1

At least for GNU grep, it looks like lib/dfa.c treats the CRBS as one lexical token, as per the function parse_bracket_exp().

For the example given, escaping the special characters (square brackets and dots) seems to give the results you are looking for. You can also match literal dots with [.] which might be easier to see in a regular expression.

$ (echo c;echo '[.ch.]';echo .ch.;echo xchx)|grep '\[\.ch\.\]'
[.ch.]
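
As a rough sketch (assuming GNU grep; the sample() helper and its decoy lines are made up for illustration and are not part of the original question), here are three ways to match the literal text [.ch.] and nothing else. The extra decoy [xchx] is there to show that the dots are matched literally, not as "any character".

```shell
# Hypothetical sample input: one line is the literal text we want,
# the other lines are decoys that a sloppy pattern might also match.
sample() {
  printf '%s\n' 'c' '[.ch.]' '.ch.' 'xchx' '[xchx]'
}

# 1. Backslash-escape the opening bracket and both dots. A closing
#    bracket with no open bracket expression is an ordinary character,
#    so it is left unescaped here.
sample | grep -E '\[\.ch\.]'

# 2. Put each special character in its own one-character bracket
#    expression: [[] is "[", [.] is ".", and []] is "]".
sample | grep -E '[[][.]ch[.][]]'

# 3. Avoid regular expressions altogether with fixed-string matching.
sample | grep -F '[.ch.]'
```

Each of the three commands should print only the [.ch.] line. None of the patterns contains the sequence "[." inside an open bracket expression, so the collating-symbol rule never comes into play.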
