
How to extract nested parentheses in sed?

I am trying to extract whitespace-separated columns with sed. Here is an example with ps:

$ ps | sed -n -E "s/^(\s*([^\s]+)){4}.*$/\0/p"
  PID TTY          TIME CMD
 8446 pts/185  00:00:00 ps
 8447 pts/185  00:00:00 sed
54326 pts/185  00:00:00 bash
$ ps | sed -n -E "s/^(\s*([^\s]+)){4}.*$/\1/p"
D
t
t
t

Why does it behave this way? How do I specify nested parentheses?


I would like to get the column of PIDs (in this example).


I found that I can't get non-nested parentheses to work either:

$ ps > out.txt
$ cat out.txt
  PID TTY          TIME CMD
14819 pts/185  00:00:00 ps
54326 pts/185  00:00:00 bash
$ cat out.txt | sed -n -E "s/^\s*([^\s]+)\s*([^\s]+)\s*([^\s]+)\s*([^\s]+).*$/\2/p"
C


$ 

In the last case it prints a line with C and 2 empty lines.

Why?

Suppose the raw file is

a1  a2 a3 a4
b1 b2 b3 b4
c1  c2 c3 c4
d1 d2 d3 d4

(If there is leading whitespace, remove it in a separate operation, 's/^ *//'.)

Without extended regular expressions, you can do this:

sed 's/\([^ ][^ ]* *\)\{3\}.*/\1/'

which will yield

a3
b3
c3
d3
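One detail worth checking: the group above also captures the trailing separator, so each output line ends in a space. A minimal sketch of a variant that captures the field on its own, using only POSIX basic regular expressions (the function names are made up for illustration):

```shell
# The command from above: \1 holds the last repetition, separator included.
third_with_separator() { sed 's/\([^ ][^ ]* *\)\{3\}.*/\1/'; }

# Variant: repeat the first two fields, then capture the third without
# its trailing blanks in a second group.
third_field_only() { sed 's/^\([^ ][^ ]* *\)\{2\}\([^ ][^ ]*\).*/\2/'; }

printf 'a1  a2 a3 a4\nb1 b2 b3 b4\n' | third_field_only
```

Lines with fewer than three fields do not match and are printed unchanged.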

Extended regular expressions might make this a little cleaner, but not all implementations support backreferences, so the logic would be a little more complicated.

First, please avoid double quotes unless you want the shell to interpret the contents (see https://mywiki.wooledge.org/Quotes).

awk is better suited for field processing, but I'll try to provide a sed solution with explanations (assuming GNU sed, since \s is used):

$ sed -n -E 's/^(\s*([^\s]+)){4}.*$/\1/p' ip.txt
D
t
t
t
  • ^ start of line anchor
  • [^\s] this won't work as you intended: it matches any character other than \ and s. \s, \S, \w and \W are not recognized by sed inside character classes; in this case you can simply use \S instead
  • (\s*([^\s]+)) you probably intended to capture only the field value by using two capture groups
  • {4} however, when a quantifier is applied to a group, only the last match is available for backreferencing; earlier matches are overwritten. (further reading: https://www.regular-expressions.info/captureall.html )
  • because of \s*, a string like CMD is matched as multiple fields in the above case
  • also, not sure why you are using -n and p instead of leaving them out
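The points above can be checked on small inputs. This is only a sketch, assuming GNU sed (for \s and \S); the function names are made up for illustration:

```shell
# [^\s] is a bracket expression listing "\" and "s", so every other
# character, whitespace included, matches it.
show_bracket() { sed -E 's/[^\s]/X/g'; }
printf 'a b\\s\n' | show_bracket                     # prints: XXX\s

# A quantified group keeps only its final repetition.
last_iteration() { sed -E 's/^(\s*(\S+)){4}.*$/\2/'; }
echo 'a b c d e' | last_iteration                    # prints: d

# The non-nested command from the question, with \2 wrapped in brackets:
# the groups have to fit before the first "s", so \2 ends up holding a space.
second_group() {
    sed -n -E 's/^\s*([^\s]+)\s*([^\s]+)\s*([^\s]+)\s*([^\s]+).*$/[\2]/p'
}
echo '14819 pts/185  00:00:00 ps' | second_group     # prints: [ ]
```

The last case would explain the seemingly empty lines in the question: they are not empty, each contains the single space captured by the second group.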

To get a specific column, I'd use:

$ sed -E 's/^\s*(\S+).*/\1/' ip.txt
PID
8446
8447
54326

$ sed -E 's/^\s*\S+\s+(\S+).*/\1/' ip.txt
TTY
pts/185
pts/185
pts/185

$ sed -E 's/^\s*\S+\s+\S+\s+(\S+).*/\1/' ip.txt
TIME
00:00:00
00:00:00
00:00:00

Which gives us the following generic formula:

$ sed -E 's/^\s*(\S+\s+){0}(\S+).*/\2/' ip.txt
PID
8446
8447
54326
$ sed -E 's/^\s*(\S+\s+){1}(\S+).*/\2/' ip.txt
TTY
pts/185
pts/185
pts/185
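The generic formula can be wrapped in a small shell function. This is a sketch rather than a polished tool: nth_col is a hypothetical name, GNU sed is assumed, and double quotes are used deliberately here so that the shell expands the repetition count.

```shell
# nth_col N: print the Nth whitespace-separated field of each input line.
# Hypothetical helper; needs GNU sed for \s and \S.
nth_col() {
    n=$1
    # $((n - 1)) leading fields are skipped; the next field is group 2.
    sed -E "s/^\s*(\S+\s+){$((n - 1))}(\S+).*/\2/"
}

printf '  PID TTY          TIME CMD\n 8446 pts/185  00:00:00 ps\n' | nth_col 2
```

Inside double quotes the backslashes before s, S and 2 are left alone by the shell, so only the arithmetic expansion changes the script. Lines with fewer than N fields do not match and pass through unchanged.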

This might work for you (GNU sed):

sed -nE 's/\S+/\n&\n/1;s/.*\n(.*)\n.*/\1/p' file

This surrounds the nth column (column 1 in this example) with newlines, then uses pattern matching to remove the fields and newlines on either side.

Alternatively:
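To pick a different column with the newline technique, the occurrence number on the first substitution can be made a parameter. A minimal sketch, assuming GNU sed; nth_by_newlines is a hypothetical name:

```shell
# nth_by_newlines N: wrap the Nth run of non-blank characters in newlines,
# then keep only what sits between the two newlines.
nth_by_newlines() {
    n=$1
    sed -nE "s/\S+/\n&\n/$n;s/.*\n(.*)\n.*/\1/p"
}

printf 'a1  a2 a3 a4\nb1 b2 b3 b4\n' | nth_by_newlines 3
```

Because of -n and the p flag, lines with fewer than N fields print nothing at all: the first substitution does not apply, so the second finds no newlines to match.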

sed -nE 's/^(\s*(\S+)){4}.*/\2/p' file

This will return the 4th field.
