I am trying to extract whitespace-separated columns with sed. Here is an example with ps:
$ ps | sed -n -E "s/^(\s*([^\s]+)){4}.*$/\0/p"
PID TTY TIME CMD
8446 pts/185 00:00:00 ps
8447 pts/185 00:00:00 sed
54326 pts/185 00:00:00 bash
$ ps | sed -n -E "s/^(\s*([^\s]+)){4}.*$/\1/p"
D
t
t
t
Why does it behave this way? How do I refer to nested parentheses?
I would like to get the column of PIDs (in this example).
I found that I can't get non-nested parentheses to work either:
$ ps > out.txt
$ cat out.txt
PID TTY TIME CMD
14819 pts/185 00:00:00 ps
54326 pts/185 00:00:00 bash
$ cat out.txt | sed -n -E "s/^\s*([^\s]+)\s*([^\s]+)\s*([^\s]+)\s*([^\s]+).*$/\2/p"
C
$
In the last case it prints a line with C and 2 empty lines.
Why?
Suppose the raw file is
a1 a2 a3 a4
b1 b2 b3 b4
c1 c2 c3 c4
d1 d2 d3 d4
(If there is leading whitespace, remove it in a separate operation: 's/^ *//')
Without extended regular expressions, you can do this:
sed 's/\([^ ][^ ]* *\)\{3\}.*/\1/'
which will yield
a3
b3
c3
d3
Extended regular expressions might make this a little cleaner, but not all implementations support backreferences, so the logic would be a little more complicated.
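Note that the basic-regex command above captures the third field together with the spaces that follow it. As a rough sketch of the extended-regex variant, assuming a sed that accepts -E (GNU and BSD sed do) and fields separated by plain spaces, the skipped fields and the wanted field can be put in separate groups. The function name third_col is made up for illustration.

```shell
# Hypothetical helper: print the third space-separated column of stdin.
third_col() {
    # ^ *            optional leading spaces
    # ([^ ]+ +){2}   skip two fields together with the spaces after them
    # ([^ ]+)        capture the third field only, without trailing spaces
    sed -E 's/^ *([^ ]+ +){2}([^ ]+).*/\2/'
}

printf 'a1 a2 a3 a4\nb1 b2 b3 b4\n' | third_col
```

Lines with fewer than three fields do not match and are printed unchanged.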
First, please avoid double quotes unless you want the shell to interpret it (see https://mywiki.wooledge.org/Quotes ).
awk is better suited for field processing, but I'll try to provide a sed solution with explanations (assuming GNU sed, as \s is used):
$ sed -n -E 's/^(\s*([^\s]+)){4}.*$/\1/p' ip.txt
D
t
t
t
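Two separate effects combine to produce this output, and each can be checked in isolation. The following is only a small sketch, assuming GNU sed; the variable names and sample strings are chosen for illustration.

```shell
# 1. Inside a bracket expression, \s does not mean "whitespace":
#    [^\s] is "any character except a backslash or the letter s".
bracket=$(echo 'pts/185' | sed -E 's/[^\s]+/X/')         # only "pt" is replaced
posix=$(echo 'pts/185' | sed -E 's/[^[:space:]]+/X/')    # the whole word is replaced

# 2. A quantified group keeps only the text of its last iteration.
last=$(echo 'a b c' | sed -E 's/^(([^ ]+) ?){3}$/\2/')

printf '%s\n' "$bracket" "$posix" "$last"
# prints:
# Xs/185
# X
# c
```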
^ is the start-of-line anchor.
[^\s] won't work as you wanted: it matches any character other than \ and s. \s, \S, \w and \W are not recognized by sed inside character classes; in this case you can simply use \S instead.
(\s*([^\s]+)) was probably intended to capture only the field value by using two capture groups.
{4}: however, when a quantifier is used, only the last match is available for backreferencing; the other matches are overridden (further reading: https://www.regular-expressions.info/captureall.html ).
\s* can match zero whitespace characters, so a string like CMD is matched as multiple fields in the above case.
-n and p: it is unclear why these were used instead of leaving them out.
To get a specific column, I'd use:
$ sed -E 's/^\s*(\S+).*/\1/' ip.txt
PID
8446
8447
54326
$ sed -E 's/^\s*\S+\s+(\S+).*/\1/' ip.txt
TTY
pts/185
pts/185
pts/185
$ sed -E 's/^\s*\S+\s+\S+\s+(\S+).*/\1/' ip.txt
TIME
00:00:00
00:00:00
00:00:00
Which gives us the following generic formula, where the quantifier is the column number minus one:
$ sed -E 's/^\s*(\S+\s+){0}(\S+).*/\2/' ip.txt
PID
8446
8447
54326
$ sed -E 's/^\s*(\S+\s+){1}(\S+).*/\2/' ip.txt
TTY
pts/185
pts/185
pts/185
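The formula can be wrapped in a small shell function. This is a sketch rather than a definitive implementation: the name col is made up, it assumes GNU sed (for \s and \S), and it deliberately uses double quotes because here the shell is meant to expand the arithmetic inside the script. The awk line is included only as a cross-check of the same column.

```shell
# Hypothetical helper: print column N (1-based) of whitespace-separated stdin.
col() {
    # (\S+\s+){N-1} skips the first N-1 fields; (\S+) captures field N.
    # Inside double quotes, \s, \S and \2 reach sed unchanged; only $((...)) expands.
    sed -E "s/^\s*(\S+\s+){$(( $1 - 1 ))}(\S+).*/\2/"
}

data='  PID TTY          TIME CMD
 8446 pts/185  00:00:00 ps'

printf '%s\n' "$data" | col 2
# Same column with awk, which splits on runs of whitespace by default.
printf '%s\n' "$data" | awk -v n=2 '{ print $n }'
```

Unlike the awk version, the sed version leaves a line unchanged when it has fewer than N fields.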
This might work for you (GNU sed):
sed -nE 's/\S+/\n&\n/1;s/.*\n(.*)\n.*/\1/p' file
This surrounds the nth column (in this example column 1) by newlines then uses pattern matching to remove the fields and newlines either side.
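To make the two steps visible, they can be run separately. A minimal sketch, assuming GNU sed (which accepts \n in the replacement and \S in the pattern); the sample line and variable names are illustrative, and column 2 is used here instead of column 1.

```shell
line='a1 a2 a3 a4'

# Step 1 only: wrap the 2nd run of non-whitespace characters in newlines.
marked=$(printf '%s\n' "$line" | sed -E 's/\S+/\n&\n/2')

# Steps 1 and 2: keep only what lies between the two inserted newlines.
field=$(printf '%s\n' "$line" | sed -nE 's/\S+/\n&\n/2;s/.*\n(.*)\n.*/\1/p')

printf '%s\n' "$marked"   # three lines: "a1 ", "a2", " a3 a4"
printf '%s\n' "$field"    # a2
```

Because of -n and p, a line that has no such column produces no output at all.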
Alternatively:
sed -nE 's/^(\s*(\S+)){4}.*/\2/p' file
This will return the 4th field.