
awk regular expression in RS

My file looks like this:

A0010 A R G 222
ALBXXXXXLE DRIVE - NO N1 Y 2 C 1 0
A R G BOBBY BEARD 1 NC N N 0 0.00
 AERXXXX 0.00
 NC 22211 

A0013 

A & A SERVICE CENTER P O BOX 113 - NO N1 Y 2 C 1 0

A & A SERVICE CENTER 1 NC N Y 0 0.00

HARRELLSVILLE 0.00
 NC 27942 

A0016 A HOME GARDEN SHOP 111 E MAIN STREET 111-111-1110 NO N1 Y 2 U 1 0
 HOME GARDEN SHOP PAM 1 NC N Y 0 0.00
 AERBDER 0.00
 NC 24520 

A0039 XXXXXXX HILL APTS. P.O. BOX 604 222-7111 NO N1 Y 2 U 1 0
 XXXXXXX HILL APTS. TXXXMAN MORRIS 1 NC Y Y 0 0.00
 AERBDER 0.00
 NC 27510 

I want to split the file into records, using the ID in the first column (A0010, A0013, A0016, A0039) as the record separator, and then load the records into a database. I tried awk, but it treated only the first match as a record separator.

cat temp1 | gawk 'BEGIN {RS="^[A-Z][0-9][0-9][0-9][0-9]";} {print NR,"and RT=" RT}' | sed -e 's/ \+/ /g'

Output:

1 and RT=A0010

2 and RT=

It does not pick up the second match. Please help.

Replace your awk command with the following:

cat temp1 | gawk 'BEGIN {RS="[A-Z][0-9][0-9][0-9][0-9]";} {print NR,"and RT=" RT}'

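The ^ is causing your problem. In an RS regular expression, ^ anchors to the beginning of the file, not to the beginning of each line, so the separator can match only once, at the very start.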

Edit (based on the comments):

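If the pattern can also occur in the middle of a line, first keep only the lines that begin with it. Note that this filter discards the continuation lines of each record: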

 grep -E "^[A-Z][0-9]{3}" temp1 | gawk 'BEGIN {RS="[A-Z][0-9][0-9][0-9][0-9]";} {print NR,"and RT=" RT}'
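An alternative that keeps the continuation lines and ignores mid-line matches is to test the start of each line instead of relying on RS. This is a minimal sketch of that different technique, not the method from the answer above: it uses plain awk, the sample file is made up, and the token B1234 was added to the second line only to show that a mid-line match does not start a new record.

```shell
#!/bin/sh
# Hypothetical sample; B1234 is a mid-line token that must NOT split a record.
sample=$(mktemp)
printf '%s\n' \
  'A0010 A R G 222' \
  ' AERXXXX B1234 0.00' \
  'A0013 A & A SERVICE CENTER' \
  ' NC 27942' > "$sample"

# A line that begins with the ID pattern starts a new record;
# every other line is appended to the current record.
records=$(awk '
  /^[A-Z][0-9][0-9][0-9][0-9]/ {
      if (rec != "") print rec   # flush the previous record
      rec = $0
      next
  }
  { rec = rec " " $0 }           # continuation line
  END { if (rec != "") print rec }
' "$sample" | sed -e 's/  */ /g')  # squeeze runs of spaces

printf '%s\n' "$records"
rm -f "$sample"
```

Each record comes out on a single line, which is usually easier to feed to a database loader than multi-line records.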
