
awk/sed regex: extract a column that contains the delimiter

I have a file in this format: two columns of numbers at the beginning, two columns of numbers at the end, and one column in the middle that holds a name. The name itself can contain spaces, which are also the column delimiter, and that messes things up.

Is there a regex that extracts the name column correctly? Alternatively, is there a way to use sed to replace (or remove) the spaces in that column so that I can pull the column out easily?

Example:

 1 2 name 3 4
 12 12 name1 name2 3 4
 12 12 name1 name2 name3 name4 3 4 
 3 4 name 3 4 

The output that I want is:

name 
name1_name2
name1_name2_name3_name4
name

Thanks,

Amir,

One solution using awk is:

awk '{ for (i = 3; i < NF - 2; i++) printf "%s_", $i; print $i }' foo
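For reference, here is a minimal sketch of the same loop wrapped in a shell function, assuming the layout from the question (two numbers, the name, two numbers). The function name extract_name is made up for this example. It builds the result in a variable instead of printing piece by piece, and it skips lines with fewer than five fields.

```shell
# extract_name: hypothetical helper name, not part of the original answer.
# Reads "num num name... num num" lines on stdin and prints the name
# words joined with "_".
extract_name() {
  awk 'NF >= 5 {
    out = $3                          # first word of the name
    for (i = 4; i <= NF - 2; i++) {   # remaining name words, if any
      out = out "_" $i
    }
    print out
  }'
}

# Sample lines taken from the question.
printf '%s\n' ' 1 2 name 3 4' ' 12 12 name1 name2 3 4 ' | extract_name
# prints:
# name
# name1_name2
```

Because awk's default field splitting ignores leading and trailing blanks, the stray spaces in the sample lines do not matter here.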

Here is the same thing using sed:

sed -e 's/^[0-9 ]*//' -e 's/ [0-9 ]*$//' -e 's/ /_/g' foo

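The same with POSIX character classes, for clarity:

sed -e 's/^[[:digit:][:space:]]*//' -e 's/[[:space:]][[:digit:][:space:]]*$//' -e 's/ /_/g' foo

Another sed version, which captures the name between the two number pairs (\+ is a GNU sed extension, and the pattern expects lines without leading or trailing blanks):

sed 's/^[0-9]\+ [0-9]\+ \(.*\) [0-9]\+ [0-9]\+$/\1/;s/ /_/g'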
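One thing to watch with the character-class versions: they strip every leading and trailing run of digits and spaces, so a name whose first or last word is a number loses that word. Below is a hedged sketch that removes exactly two numeric columns on each side instead. It assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), and the function name name_with_sed is made up for this example.

```shell
# name_with_sed: hypothetical helper name used only in this sketch.
# Strips exactly two numeric columns from each end, then joins what is
# left with underscores. Assumes sed supports -E (extended regex).
name_with_sed() {
  sed -E \
    -e 's/^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]+//' \
    -e 's/[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]*$//' \
    -e 's/[[:space:]]+/_/g'
}

# The second line has a numeric word inside the name.
printf '%s\n' ' 12 12 name1 name2 3 4 ' ' 1 2 route 66 3 4' | name_with_sed
# prints:
# name1_name2
# route_66
```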

Another awk way, without looping:

awk 'BEGIN{OFS="_"}{$1=$2=$NF=$(NF-1)="";gsub(/__/,"")}1' yourFile

Test:

kent$  cat t
 1 2 name 3 4
 12 12 name1 name2 3 4
 12 12 name1 name2 name3 name4 3 4 
 3 4 name 3 4 

kent$  awk 'BEGIN{OFS="_"}{$1=$2=$NF=$(NF-1)="";gsub(/__/,"")}1' t
name
name1_name2
name1_name2_name3_name4
name
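Why this works: assigning to any field makes awk rebuild the line with OFS, so blanking the four numeric fields leaves "__" at both ends of the line, and gsub then deletes it. Since gsub removes every "__", a name that itself contains a double underscore would be changed as well. A small sketch that only trims the two ends, with strip_edges as a made-up function name:

```shell
# strip_edges: hypothetical helper name used only in this sketch.
strip_edges() {
  awk 'BEGIN { OFS = "_" }
  NF >= 5 {
    $1 = $2 = $NF = $(NF-1) = ""   # assigning to a field rebuilds $0 with OFS
    sub(/^__/, "")                 # drop the two leading separators
    sub(/__$/, "")                 # drop the two trailing separators
    print
  }'
}

printf '%s\n' ' 12 12 name1 name2 3 4 ' ' 1 2 my__name 3 4' | strip_edges
# prints:
# name1_name2
# my__name
```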

A couple of Perl options:

perl -lne  '/\d+ \d+ (.+) \d+ \d+/ and do {($_ = $1) =~ s/ /_/g; print}'
perl -lape  'for (1..2) {shift @F; pop @F}; $_ = join "_", @F'
