How to extract nested parentheses in sed?

Question

I am trying to extract whitespace-separated columns with sed. Here is an example with ps:

$ ps | sed -n -E "s/^(\s*([^\s]+)){4}.*$/\0/p"
  PID TTY          TIME CMD
 8446 pts/185  00:00:00 ps
 8447 pts/185  00:00:00 sed
54326 pts/185  00:00:00 bash
$ ps | sed -n -E "s/^(\s*([^\s]+)){4}.*$/\1/p"
D
t
t
t

Why does it behave this way? How do I specify nested parentheses?

I would like to get the column of PIDs (in this example).

I found that I can't process non-nested parentheses either:

$ ps > out.txt
$ cat out.txt
  PID TTY          TIME CMD
14819 pts/185  00:00:00 ps
54326 pts/185  00:00:00 bash
$ cat out.txt | sed -n -E "s/^\s*([^\s]+)\s*([^\s]+)\s*([^\s]+)\s*([^\s]+).*$/\2/p"
C


$

In the last case it prints a line with C and 2 empty lines.

Why?

Answer 1

Suppose the raw file is

a1  a2 a3 a4
b1 b2 b3 b4
c1  c2 c3 c4
d1 d2 d3 d4

(If there is leading whitespace, remove it in a separate operation: 's/^ *//'.)

Without extended regular expressions, you can do this:

sed 's/\([^ ][^ ]* *\)\{3\}.*/\1/'

which will yield

a3
b3
c3
d3

Extended regular expressions might make this a little cleaner, but not all implementations support backreferences, so the logic would be a little more complicated.
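As a rough sketch of what the extended-regex version could look like, assuming a sed that accepts -E (GNU and BSD sed both do): the variable names rows and third are made up for illustration. Note that the group in the basic form above also captures the blanks that follow the field; the sketch below uses a second group so only the field itself is kept.

```shell
# Sample rows from the answer above, held in a variable (hypothetical name: rows)
rows=$(printf '%s\n' 'a1  a2 a3 a4' 'b1 b2 b3 b4' 'c1  c2 c3 c4' 'd1 d2 d3 d4')

# ERE form: skip leading blanks, skip two "field + blanks" repetitions,
# then capture only the third field (without trailing blanks) in group 2.
third=$(printf '%s\n' "$rows" | sed -E 's/^ *([^ ]+ +){2}([^ ]+).*/\2/')

printf '%s\n' "$third"
```

Changing the {2} to N-1 selects the Nth column. Folding the leading-blank removal into the same expression is a design choice here, not something the original command does.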

Answer 2

First, please avoid double quotes unless you want the shell to interpret the contents (see https://mywiki.wooledge.org/Quotes).

awk is better suited for field processing, but I'll try to provide a sed solution with explanations (assuming GNU sed, as \s is used).

$ sed -n -E 's/^(\s*([^\s]+)){4}.*$/\1/p' ip.txt
D
t
t
t

^ start of line anchor
[^\s] this won't work as you intended: it matches any character other than \ and s. \s, \S, \w and \W are not recognized by sed inside bracket expressions; in this case you can simply use \S instead
(\s*([^\s]+)) you probably intended to capture only the field value by using two capture groups
{4} however, when a quantifier is applied to a group, only the last match is available for backreferencing; earlier matches are overwritten (further reading: https://www.regular-expressions.info/captureall.html)
because \s* can match zero characters, a string like CMD is matched as multiple fields in the above case
also, not sure why you are using -n and p instead of leaving them out
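The first and fourth points can be checked on small inputs. This is only an illustrative sketch, assuming GNU sed; the sample strings and the variable names bracket, escape and last are invented for the demonstration.

```shell
# Inside brackets the backslash is literal, so [^\s] excludes only '\' and 's'.
# Here it stops at the first 's' and matches just "pa".
bracket=$(echo 'pass word' | sed -E 's/[^\s]+/X/')

# \S is the GNU sed escape for "any non-whitespace character"; it matches "pass".
escape=$(echo 'pass word' | sed -E 's/\S+/X/')

# A quantified group keeps only its last repetition, so \1 is the third field.
last=$(echo 'a1 a2 a3' | sed -E 's/^(\S+\s*){3}$/\1/')

printf '%s\n' "$bracket" "$escape" "$last"
```

With GNU sed this should print Xss word, X word and a3, one per line.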

To get a specific column, I'd use:

$ sed -E 's/^\s*(\S+).*/\1/' ip.txt
PID
8446
8447
54326

$ sed -E 's/^\s*\S+\s+(\S+).*/\1/' ip.txt
TTY
pts/185
pts/185
pts/185

$ sed -E 's/^\s*\S+\s+\S+\s+(\S+).*/\1/' ip.txt
TIME
00:00:00
00:00:00
00:00:00

Which gives us the following generic formula, where the repeat count in braces is the column number minus one:

$ sed -E 's/^\s*(\S+\s+){0}(\S+).*/\2/' ip.txt
PID
8446
8447
54326
$ sed -E 's/^\s*(\S+\s+){1}(\S+).*/\2/' ip.txt
TTY
pts/185
pts/185
pts/185
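The formula can be wrapped in a small shell function. The helper name nth_col and the variable ps_out are hypothetical, and GNU sed is assumed; this is the one place where double quotes are wanted, because the shell has to expand the repeat count. Since awk was mentioned as the better tool for fields, its equivalent is shown for comparison.

```shell
# nth_col N: print the Nth whitespace-separated field of each line on stdin.
# Hypothetical helper; lines with fewer than N fields pass through unchanged.
nth_col() {
    # Inside double quotes \s, \S and \2 reach sed untouched;
    # only the arithmetic expansion is replaced by the shell.
    sed -E "s/^\s*(\S+\s+){$(( $1 - 1 ))}(\S+).*/\2/"
}

# Sample input modeled on the ps output in the question
ps_out=$(printf '%s\n' '  PID TTY          TIME CMD' \
                       ' 8446 pts/185  00:00:00 ps' \
                       '54326 pts/185  00:00:00 bash')

col3_sed=$(printf '%s\n' "$ps_out" | nth_col 3)
col3_awk=$(printf '%s\n' "$ps_out" | awk '{ print $3 }')

printf '%s\n' "$col3_sed"
```

Both approaches should give the TIME column for this input.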

Answer 3

This might work for you (GNU sed):

sed -nE 's/\S+/\n&\n/1;s/.*\n(.*)\n.*/\1/p' file

This surrounds the nth column (in this example column 1, selected by the 1 flag of the first substitution) with newlines, then uses pattern matching to remove the fields and newlines on either side.

Alternatively:

sed -nE 's/^(\s*(\S+)){4}.*/\2/p' file

This will return the 4th field.
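A sketch of the newline-marker approach with the column number taken from a shell variable, assuming GNU sed (which accepts \n in the replacement text). The names n, sample and col are illustrative only.

```shell
n=2   # column to extract (hypothetical variable)

# Sample input modeled on the ps output in the question
sample=$(printf '%s\n' '  PID TTY          TIME CMD' \
                       ' 8446 pts/185  00:00:00 ps')

# First substitution: wrap the nth run of non-whitespace in newlines.
# Second substitution: keep only what sits between the two newlines.
# Double quotes are used so that $n expands; \n, \S and \1 are left for sed.
col=$(printf '%s\n' "$sample" | sed -nE "s/\S+/\n&\n/$n;s/.*\n(.*)\n.*/\1/p")

printf '%s\n' "$col"
```

Because of -n and p, a line with fewer than n fields prints nothing, since the second substitution finds no newline markers to match.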
