Using 'sed' or 'awk' for a data conversion task

I have data in this format (many lines like this):

TASK : Task 1  
TASK : Task 2  
TASK : Task 3  
OWNER : Emp 1  
OWNER : Emp 2  
OWNER : Emp 3  
Deadline : Monday  
Deadline : Tuesday  
Deadline : Wednesday  

I want to convert this to:

TASK          OWNER           Deadline
Task 1       Emp 1            Monday  
Task 2       Emp 2            Tuesday  
Task 3       Emp 3            Wednesday

Even if I can just extract each column without the column header names it'd be good. I can add the column names manually afterwards.

Is there a way to do it using 'awk' or 'sed'?

One way with awk:

 awk -F': *' '{i=NR%3;i=i?i:3;a[i]=a[i]?a[i]"\t"$2:$2}
              END{for(x=1;x<=length(a);x++)print a[x]}' file

It keeps the order but omits the header line:

kent$  cat f
TASK : Task 1  
TASK : Task 2  
TASK : Task 3  
OWNER : Emp 1  
OWNER : Emp 2  
OWNER : Emp 3  
Deadline : Monday  
Deadline : Tuesday  
Deadline : Wednesday 

kent$  awk -F': *' '{i=NR%3;i=i?i:3;a[i]=a[i]?a[i]"\t"$2:$2}END{for(x=1;x<=length(a);x++)print a[x]}' f
Task 1          Emp 1   Monday  
Task 2          Emp 2   Tuesday  
Task 3          Emp 3   Wednesday

explanation

 awk -F': *'             # FS: a colon followed by any number of spaces
 '{i=NR%3;i=i?i:3;       # i = NR%3, with 0 mapped to 3, so every third
                         # line lands in the last output row
                         # (3 = number of lines per label in this input)
 a[i]=a[i]?a[i]"\t"$2:$2}# append the value ($2) to output row i, tab-separated
 END{for and print}' file# print the rows in order at the end

Header

We can save the header in a variable h and print it before looping over the array a:

awk -F': *' '{...h=i==1?(h?h"\t"$1:$1):h;a[i]=..} 
END{print h;for...}' file

Here's a relatively nice awk version:

BEGIN {FS=" : ";OFS="\t"}

/^TASK/     {task [tpos++] = $2}
/^OWNER/    {owner[opos++] = $2}
/^Deadline/ {due  [dpos++] = $2}

END {
  print "TASK", "OWNER", "DEADLINE"
  for (i = 0; i < tpos; i++) {  # "for (i in task)" visits indices in unspecified order
    print task[i],owner[i],due[i]
  }
}

:)

Because it uses " : " as the field separator, it needs no gsub() call to strip the labels, which saves a line per block. Store it in, let's say, test.awk and execute it as follows:

awk -f test.awk input.txt

Update:

The above command leads to unaligned output in the shell:

TASK    OWNER    DEADLINE
Task 1      Emp 1      Monday  
Task 2      Emp 2      Tuesday  
Task 3      Emp 3      Wednesday 

You can fix this using the column command:

awk -f test.awk input.txt | column -t -s $'\t'

Now the output looks clean:

TASK      OWNER    DEADLINE
Task 1    Emp 1    Monday  
Task 2    Emp 2    Tuesday  
Task 3    Emp 3    Wednesday 
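If column is not available, the alignment can also be done inside awk with printf. The following is only a sketch under a few assumptions: the function name pivot_aligned is made up, the column width of 10 is arbitrary (longer values are not truncated and will push the following columns out of line), and the FS ' *: *' is used so that the labels can be compared exactly.

```shell
# Hypothetical helper: same idea as test.awk, but padded with printf
# instead of being piped through column.
pivot_aligned() {
  awk -F' *: *' '
    { sub(/[ \t]+$/, "", $2) }               # drop trailing blanks
    $1 == "TASK"     { task[++t]  = $2 }
    $1 == "OWNER"    { owner[++o] = $2 }
    $1 == "Deadline" { due[++d]   = $2 }
    END {
      fmt = "%-10s%-10s%s\n"                 # two left-aligned 10-char columns
      printf fmt, "TASK", "OWNER", "DEADLINE"
      for (i = 1; i <= t; i++) printf fmt, task[i], owner[i], due[i]
    }
  ' "$@"
}

# Demo on part of the sample data.
printf '%s\n' 'TASK : Task 1' 'TASK : Task 2' \
              'OWNER : Emp 1' 'OWNER : Emp 2' \
              'Deadline : Monday' 'Deadline : Tuesday' | pivot_aligned
```

The loop runs from 1 to t so the rows come out in input order.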

Perl solution:

perl -le 'while (<>) {
              chomp;
              ($h, $t) = split / : /;
              $i++, push @{$ar[0]}, $h if $h ne $ar[0][-1];
              push @{$ar[$i]}, $t;
          };
          $" = "\t";
          print "@{$ar[0]}";
          print join $", map shift @{$ar[$_]}, 1 .. $#ar while @{$ar[1]}'

If the tabs didn't align the text nicely, I'd use Text::Table.
