awk/sed regex: extract a column that contains the delimiter

I have a file in this format: two columns of numbers at the beginning, two columns of numbers at the end, and one column in the middle that holds a name. The name can itself contain spaces, the same character that separates the columns, which messes things up.

Is there a regex that extracts the name column correctly? Alternatively, can I use sed to replace (or remove) the spaces inside that column so that I can pull the column out easily?

Example:

 1 2 name 3 4
 12 12 name1 name2 3 4
 12 12 name1 name2 name3 name4 3 4 
 3 4 name 3 4 

The output I want is:

name 
name1_name2
name1_name2_name3_name4
name

Thanks,

Amir

One solution using awk:

awk '{ for (i = 3; i <= NF-3; i++) printf "%s_", $i; print $i }' foo
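
The loop relies on awk's default field splitting, which ignores leading and trailing blanks, so the stray spaces in the sample input do no harm. A minimal sketch of the same idea wrapped in a shell function follows. The function name extract_name and the choice to skip lines with fewer than five fields are assumptions made for illustration, not part of the answer.

```shell
# extract_name: hypothetical helper; reads stdin or the files given as arguments.
extract_name() {
  awk '
    NF < 5 { next }                    # assumption: skip lines that have no name field
    {
      out = $3                         # first word of the name
      for (i = 4; i <= NF - 2; i++)    # remaining words, stopping before the last two numbers
        out = out "_" $i
      print out
    }
  ' "$@"
}

printf ' 1 2 name 3 4\n 12 12 name1 name2 3 4 \n' | extract_name
```

Building the result in a variable and printing it once avoids using input data as a printf format string, which matters if a name ever contains a percent sign.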

Here is the same thing using sed:

sed -e 's/^[0-9 ]*//' -e 's/ [0-9 ]*$//' -e 's/ /_/g' foo

The same with POSIX character classes, for clarity:

sed -e 's/^[[:digit:][:space:]]*//' -e 's/[[:space:]][[:digit:][:space:]]*$//' -e 's/ /_/g' foo
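
Another sed option captures the name in a single anchored substitution (-E is supported by GNU and BSD sed; optional blanks at either end are tolerated):

sed -E 's/^ *[0-9]+ +[0-9]+ +(.*[^ ]) +[0-9]+ +[0-9]+ *$/\1/; s/ +/_/g' foo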
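
Stripping a run of digits and spaces from each end with a character class also removes digits that belong to the name, for example when the first word of the name starts with a digit. The sketch below compares that approach with a pattern anchored on exactly two numeric columns at each end. Both function names and the input line are made up for illustration, and -E assumes GNU or BSD sed.

```shell
# Both function names are hypothetical, used only for this comparison.

# Class-based stripping: removes any leading/trailing run of digits and spaces.
strip_by_class() {
  sed -e 's/^[0-9 ]*//' -e 's/ [0-9 ]*$//' -e 's/ /_/g' "$@"
}

# Anchored: exactly two numeric columns at each end, name captured in between.
strip_anchored() {
  sed -E -e 's/^ *[0-9]+ +[0-9]+ +(.*[^ ]) +[0-9]+ +[0-9]+ *$/\1/' -e 's/ +/_/g' "$@"
}

line='12 12 3com corp 3 4'                 # made-up input
printf '%s\n' "$line" | strip_by_class     # prints com_corp (the leading 3 is lost)
printf '%s\n' "$line" | strip_anchored     # prints 3com_corp
```

For the sample data in the question, where no name begins or ends with a bare number, both variants give the same result.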

Another awk approach, without a loop:

 awk 'BEGIN{OFS="_"}{$1=$2=$NF=$(NF-1)="";gsub(/__/,"")}1' yourFile

Test:

kent$  cat t
 1 2 name 3 4
 12 12 name1 name2 3 4
 12 12 name1 name2 name3 name4 3 4 
 3 4 name 3 4 

kent$  awk 'BEGIN{OFS="_"}{$1=$2=$NF=$(NF-1)="";gsub(/__/,"")}1' t
name
name1_name2
name1_name2_name3_name4
name
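
Assigning to a field makes awk rebuild $0 using OFS, so the four emptied columns show up as a doubled underscore at each end of the record, and gsub then deletes them. Because gsub(/__/, "") would also delete a double underscore inside a name, a hedged variant anchors the pattern to the two ends of the record. The function name name_noloop and the a__b input are illustrative assumptions.

```shell
# name_noloop: hypothetical helper; reads stdin or the files given as arguments.
name_noloop() {
  awk 'BEGIN { OFS = "_" }
       {
         $1 = $2 = $NF = $(NF-1) = ""   # blank the numeric columns; $0 is rebuilt with OFS
         gsub(/^__|__$/, "")            # remove only the leading and trailing "__"
       }
       1' "$@"
}

printf ' 12 12 name1 name2 3 4 \n 1 2 a__b 3 4\n' | name_noloop
```

With this input the function prints name1_name2 and a__b, whereas the unanchored pattern would turn the second name into ab.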

A couple of Perl options:

perl -lne  '/\d+ \d+ (.+) \d+ \d+/ and do {($_ = $1) =~ s/ /_/g; print}'
perl -lape  'for (1..2) {shift @F; pop @F}; $_ = join "_", @F'
