
Split file using awk at pattern

Here is an example of the data that I have in a row in example.tsv:

somedata1:data1#||#somedata2:data2#||#somedata1:data3#||#somedata2:data4

I want to do two things:

  1. Split each row on the pattern '#||#' and write the result to another file. The number of columns after splitting is not fixed. I have tried this awk command:

    awk -F"#\\|\\|#" '{print;}' example.tsv > splitted.tsv

    The first file should contain:

    column 1
    somedata1:data1
    somedata2:data2
    somedata1:data3
    somedata2:data4

  2. Next, I want to split the data in splitted.tsv on ':', giving:

    somedata1 data1 data3

    and write that to a file. Is there a way to do both steps in a single awk command?

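You need to escape the | correctly: put the separator in single quotes so the shell passes the backslashes through to awk unchanged. Then use split():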

awk -F'#\\|\\|#' '{split($2,a,":");print a[2]}' file
data2
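
If the backslash escaping is hard to keep straight, a bracket expression is one alternative. The sketch below assumes the sample row from the question is saved as example.tsv; the output file name second_value.txt is made up for illustration.

```shell
# Recreate the sample row from the question.
printf '%s\n' 'somedata1:data1#||#somedata2:data2#||#somedata1:data3#||#somedata2:data4' > example.tsv

# [|] matches one literal pipe, so the separator needs no backslashes at all.
# split() then breaks the second field on ":" and we print the value part.
awk -F'#[|][|]#' '{ split($2, a, ":"); print a[2] }' example.tsv > second_value.txt

cat second_value.txt
```

Because the separator contains no backslashes, it reads the same whichever quoting style the shell command uses.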

To print every field on its own line (output shown for the sample row in the question):

awk -F'#\\|\\|#' '{for (i=1;i<=NF;i++) print $i}' file
somedata1:data1
somedata2:data2
somedata1:data3
somedata2:data4

To split each field further on ':' (output shown for the sample row in the question):

awk -F'#\\|\\|#' '{for (i=1;i<=NF;i++) {split($i,a,":");print a[1],a[2]}}' file
somedata1 data1
somedata2 data2
somedata1 data3
somedata2 data4
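
The question also asks whether both outputs can come from a single awk command. A minimal sketch, assuming the sample row is in example.tsv: awk's `print > "name"` redirection lets one pass write two files. splitted.tsv is the name from the question; pairs.tsv is a hypothetical name for the second file.

```shell
# Recreate the sample row from the question.
printf '%s\n' 'somedata1:data1#||#somedata2:data2#||#somedata1:data3#||#somedata2:data4' > example.tsv

# [|] is a literal pipe; this separator is equivalent to the escaped form '#\\|\\|#'.
awk -F'#[|][|]#' '
{
    for (i = 1; i <= NF; i++) {
        print $i > "splitted.tsv"            # first split: one key:value pair per line
        split($i, kv, ":")                   # second split: key in kv[1], value in kv[2]
        print kv[1], kv[2] > "pairs.tsv"     # pairs.tsv is an assumed file name
    }
}' example.tsv
```

Inside awk, `>` truncates each file once when it is first opened and then appends for the rest of the run, so every row of the input ends up in both files.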

For the first split, you could try (output shown for the sample row in the question):

$ awk 'BEGIN{print "column1"}{gsub(/#\|\|#/,"\n"); print }' file
column1
somedata1:data1
somedata2:data2
somedata1:data3
somedata2:data4

To then split on ':', you could do the following (no -F option is needed, because both substitutions act on the whole line):

$ awk 'BEGIN{print "column1","column2"}
       {gsub(/#\|\|#/,"\n"); gsub(/:/," ");print }' file
column1 column2
somedata1 data1
somedata2 data2
somedata1 data3
somedata2 data4
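
The expected output in the question (`somedata1 data1 data3`) suggests the values may need to be grouped by key, which neither answer above does. The following is only a sketch under that assumption; grouped.tsv is a hypothetical file name and example.tsv is assumed to hold the sample row.

```shell
# Recreate the sample row from the question.
printf '%s\n' 'somedata1:data1#||#somedata2:data2#||#somedata1:data3#||#somedata2:data4' > example.tsv

awk -F'#[|][|]#' '
{
    n = 0
    split("", vals); split("", order)        # portable way to empty both arrays for each row
    for (i = 1; i <= NF; i++) {
        split($i, kv, ":")
        if (kv[1] in vals) {
            vals[kv[1]] = vals[kv[1]] " " kv[2]   # append to an existing key
        } else {
            order[++n] = kv[1]                    # remember first-seen order of keys
            vals[kv[1]] = kv[2]
        }
    }
    for (j = 1; j <= n; j++)
        print order[j], vals[order[j]]
}' example.tsv > grouped.tsv

cat grouped.tsv
```

The `order` array is there because awk's `for (k in vals)` does not guarantee any particular iteration order; tracking keys as they first appear keeps the output stable.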
