How do I parse a fixed-width text file with AWK and printf in Bash?

I have a sample.txt file as follows:

Name         City    ST Zip CTY
John Smith   BrooklynNY10050USA
Paul DavidsonQueens  NY10040USA
Michael SmithNY      NY10030USA
George HermanBronx   NY10020USA

The desired output has the fields in separate columns, as shown below:

[Image: Desired Output]

I tried this:

#!/bin/bash
awk '{printf "%13-s %-8s %-2s %-5s %-3s\n", $1, $2, $3, $4, $5}' sample.txt > new.txt

It was unsuccessful and produced this result:

Name          City     ST Zip   CTY
John          Smith    BrooklynNY10050USA
Paul          DavidsonQueens NY10040USA
Michael       SmithNY  NY10030USA
George        HermanBronx NY10020USA

Would appreciate it if anyone could tweak this so the text file will be in delimited format as shown above. Thank you so much!!

With gawk you can set the input field widths in the BEGIN block:

$ gawk 'BEGIN { FIELDWIDTHS = "13 8 2 5 3" } { print $1, $2, $3, $4, $5 }' fw.txt
Name          City     ST  Zip  CTY
John Smith    Brooklyn NY 10050 USA
Paul Davidson Queens   NY 10040 USA
Michael Smith NY       NY 10030 USA
George Herman Bronx    NY 10020 USA

If your awk does not have FIELDWIDTHS, it's a bit tedious, but you can use substr:

$ awk '{ print substr($0,1,13), substr($0,14,8), substr($0,22,2), substr($0,24,5), substr($0,29,3) }' fw.txt
Name          City     ST  Zip  CTY
John Smith    Brooklyn NY 10050 USA
Paul Davidson Queens   NY 10040 USA
Michael Smith NY       NY 10030 USA
George Herman Bronx    NY 10020 USA
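
If the goal is a delimited file rather than re-spaced columns, the same substr calls can be combined with padding removal and an explicit output separator. The following is a minimal sketch in POSIX awk, not part of the answer above; the names fixed_to_delimited and trim and the choice of | as the delimiter are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: cut fixed-width columns with substr, trim the padding,
# and join the fields with an explicit delimiter (POSIX awk).
# fixed_to_delimited and trim are hypothetical names for this example.
fixed_to_delimited() {
    awk -v OFS='|' '
        # Remove leading and trailing spaces from one field.
        function trim(s) { gsub(/^ +| +$/, "", s); return s }
        {
            print trim(substr($0, 1, 13)),
                  trim(substr($0, 14, 8)),
                  trim(substr($0, 22, 2)),
                  trim(substr($0, 24, 5)),
                  trim(substr($0, 29, 3))
        }' "$@"
}

# Demo with two lines of the sample data from the question.
fixed_to_delimited <<'EOF'
Name         City    ST Zip CTY
John Smith   BrooklynNY10050USA
EOF
```

For the two demo lines this prints Name|City|ST|Zip|CTY and John Smith|Brooklyn|NY|10050|USA. Trimming matters here because the Zip header sits inside its column with a space on each side.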

You can use sed to insert spaces at specific positions:

 cat data.txt | sed -e 's#\(.\{13\}\)\(.*\)#\1 \2#g' | sed -e 's#\(.\{22\}\)\(.*\)#\1 \2#g' |sed -e '1s#\(.\{29\}\)\(.*\)#\1 \2#g' | sed -e '2,$s#\(.\{25\}\)\(.*\)#\1 \2#g' | sed -e 's#\(.\{31\}\)\(.*\)#\1 \2#g'
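
Because each sed in that pipeline sees the spaces inserted by the previous one, its offsets have to be shifted to compensate. One way to avoid that bookkeeping, sketched here as an alternative rather than as the answer's own method, is to insert from the rightmost boundary to the leftmost in a single sed call, so every offset refers to the original line. The function name insert_spaces is hypothetical; the offsets 28, 23, 21 and 13 are the running totals of the column widths 13, 8, 2 and 5.

```shell
#!/bin/sh
# Sketch: insert a space at each column boundary with one sed call.
# Working right to left (28, 23, 21, 13) keeps every offset relative
# to the original, unmodified line.
# insert_spaces is a hypothetical name for this example.
insert_spaces() {
    sed -e 's/^.\{28\}/& /' \
        -e 's/^.\{23\}/& /' \
        -e 's/^.\{21\}/& /' \
        -e 's/^.\{13\}/& /' "$@"
}

# Demo with two lines of the sample data from the question.
insert_spaces <<'EOF'
Name         City    ST Zip CTY
John Smith   BrooklynNY10050USA
EOF
```

The header needs no special case with this approach, since every line gets a space at the same four boundaries.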

You can split the field lengths into an array then loop over $0 and gather the substrings in regular awk:

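awk 'BEGIN { n = split("13 8 2 5 3", ar) }   # ar[1..n] holds the field widths
     {
         j = 1        # start position of the current field
         s = ""
         sep = "\t"
         for (i = 1; i < n; i++) {
             s = s substr($0, j, ar[i]) sep
             j += ar[i]
         }
         s = s substr($0, j, ar[i])   # last field, no trailing separator
         print s
     }' file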

That uses a tab to delimit the fields, but you can also use a space if preferred.
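
To switch delimiters without editing the script, the separator and the width list can be passed in with awk's -v option. This is a sketch of that idea, assuming every input line is at least as long as the sum of the widths; split_fixed, widths and sep are names chosen for the example.

```shell
#!/bin/sh
# Sketch: the same split-and-loop idea, with the widths and the
# separator supplied as parameters instead of being hard-coded.
# Usage: split_fixed WIDTHS SEP [FILE...]
# split_fixed is a hypothetical name for this example.
split_fixed() {
    widths=$1 sep=$2
    shift 2
    awk -v widths="$widths" -v sep="$sep" '
        BEGIN { n = split(widths, ar, " ") }
        {
            j = 1; s = ""
            for (i = 1; i <= n; i++) {
                # Append the field, plus the separator unless it is the last one.
                s = s substr($0, j, ar[i]) (i < n ? sep : "")
                j += ar[i]
            }
            print s
        }' "$@"
}

# Demo: comma-delimited output for one sample line.
printf '%s\n' 'John Smith   BrooklynNY10050USA' | split_fixed '13 8 2 5 3' ','
```

Since awk processes escape sequences in -v assignments, passing '\t' as the separator yields a tab. The padding inside each field is kept as-is, just as in the loop above.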
