Why does sed match something outside the group as part of the group?

Question

I was trying to use sed recently to generate a bunch of methods from comma-and-newline separated enumeration members. I ran into the following behavior which seems unintuitive:

$ echo 'Hello,' | sed 's/\(.*\),\?/"Hi \1!"/g'
"Hi Hello,!"

Here I'm trying to capture everything before the comma into a group via \(.*\), then I allow an optional comma with ,\?. I expected this to replace \1 with everything before the first comma, namely Hello, but for some reason the comma is getting included in the substitution too, although it is not inside the group. Why is this the case?

Answer 1

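Regular expressions match greedily by default: working from left to right, each quantifier takes as much as it can and gives characters back only if the rest of the pattern would otherwise fail. So in the case of \(.*\),\? the greediest match is for \(.*\) to match all of Hello, and for ,\? to match nothing. Because ,\? is allowed to match nothing, the overall match already succeeds and there is no reason to backtrack.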
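
One way to see how the match is divided is to put the optional comma in a second group and print both groups. This is only a small sketch: it assumes GNU sed (the \? operator is a GNU extension to basic regular expressions), and the square brackets are just markers added for illustration.

```shell
# Put the optional comma in its own group so both captures are visible.
# Assumes GNU sed: \? is a GNU extension to basic regular expressions.
split=$(echo 'Hello,' | sed 's/\(.*\)\(,\?\)/[\1][\2]/')
echo "$split"   # prints [Hello,][]
```

The first group takes the comma and the second group is left empty, which is why \1 expands to Hello, in the question's command.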

POSIX regular expressions, basic or extended, have no non-greedy quantifiers, so sed cannot do this directly. In Perl-style regular expressions (not used by sed), you put a question mark after the quantifier, so you'd use something like (.*?),?$ . The trailing $ matters: without an anchor, the lazy group would be satisfied by matching nothing at all.

The next best thing you can do is to use a negated bracket expression, something like \([^,]*\),\? . Note that the group then stops at the first comma it sees, not the last one on the line.
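
A quick sketch of that workaround, again assuming GNU sed for \? . The second input string is made up here to show what happens when the line contains more than one comma.

```shell
# [^,]* cannot cross a comma, so the group ends just before the first one.
# Assumes GNU sed for the \? operator.
one=$(echo 'Hello,' | sed 's/\([^,]*\),\?/"Hi \1!"/')
echo "$one"     # prints "Hi Hello!"

# Illustrative input with two commas: only the text before the first
# comma is captured, and the rest of the line is left as it was.
two=$(echo 'Hello, world,' | sed 's/\([^,]*\),\?/"Hi \1!"/')
echo "$two"     # prints "Hi Hello!" world,
```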

Answer 2

That's because sed regexes are greedy, and the ? quantifier means "match 0 or 1 of the preceding token", which is the comma in this case.

So here the engine greedily matches to the end of the line with .* , and because the comma is made optional by ? , it ends up inside the captured group \(.*\) .

To get the desired behavior, drop the ? so that the comma is required:

%  echo 'Hello,' | sed 's/\(.*\),\?/"Hi \1!"/g'
"Hi Hello,!"

%  echo 'Hello,' | sed 's/\(.*\),/"Hi \1!"/g' 
"Hi Hello!"

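One caveat with a mandatory comma: a line that has no comma, such as the last member of an enumeration, does not match and passes through unchanged. If that case matters, one possible alternative is to strip a trailing comma first and then wrap whatever remains, using & for the whole match. This is a sketch rather than the answerer's method, and the three-line input below is made up for illustration.

```shell
# Hypothetical input: comma-and-newline separated members,
# with no comma after the last one.
members=$(printf 'Foo,\nBar,\nBaz\n')

# Mandatory comma: the comma-less last line is left untouched.
strict=$(echo "$members" | sed 's/\(.*\),/"Hi \1!"/')
echo "$strict"

# Two steps: drop a trailing comma if present, then wrap the rest of the line.
wrapped=$(echo "$members" | sed -e 's/,$//' -e 's/.*/"Hi &!"/')
echo "$wrapped"
```

The two-step form avoids the optional quantifier entirely, so greediness no longer affects what ends up in the output.
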
solution1
1 ACCEPTED 2017-02-09 03:51:19

solution2
1 2017-02-09 03:53:01
