
How to format phone numbers in bash with awk

I'm writing a bash script to format phone numbers to a French standard. Almost everything is done, but I don't know how to change values in a CSV file.

  1. Specifications :

    • Delete all non-digit characters (except "+" if it is in the first position)
    • Substitutions :
      • 06xxx -> +336xxx
      • 07xxx -> +337xxx
      • +3306xxx -> +336xxx
      • +3307xxx -> +337xxx
  2. Sample data (assuming the data is in the third column of my CSV file, with | separators):

     ||0612345678|
     ||+33612345678f|
     ||+33712345678|
     ||+330612345678|
     ||+330712345678|
     ||06.12.34.56.78|
     ||06 12 34 56 78|
     ||06d12d34.h*56-78|
     ||+2258475|
     ||+65823|
  3. Expected result:

     ||+33612345678|
     ||+33612345678|
     ||+33712345678|
     ||+33612345678|
     ||+33712345678|
     ||+33612345678|
     ||+33612345678|
     ||+33612345678|
     ||+2258475|
     ||+65823|
  4. Current State

I tried to do this with sed. It actually works with these expressions:

    sed -e "s/\b[^0-9]//g" sample > test
    sed -e "s/[a-z]//g" test > test2
    sed -e "s/\b[^0-9]//g" test2 > test3
    sed -e "s/^06/+336/g" test3 > test4
    sed -e "s/^07/+337/g" test4 > test5
    sed -e "s/^+3306/+336/g" test5 > test6
    sed -e "s/^+3307/+337/g" test6 > result

BUT I don't know how to apply these substitutions to my CSV file, on the third column only.

Then I tried with awk:

    awk '
    BEGIN {print substr($1,2); }
    {FS=OFS="|"} 
    {   
        gsub("\b[^0-9]","",$1);
        gsub("[a-z]","",$1);
        gsub("\b[^0-9]","",$1);
        gsub("^06","+336",$1);
        gsub("^07","+337",$1);
        gsub("^+3306","+336",$1);
        gsub("^+3307","+337",$1)
    } 1
    ' sample

but awk doesn't understand all of the regular expressions. The result when using awk:

    +33612345678|
    +33612345678|
    +33712345678|
    +33612345678|
    +33712345678|
    +336.12.34.56.78|
    +336 12 34 56 78|
    +3361234.*56-78|
    +2258475|
    +65823|

I would like to use my regular expressions directly on my CSV file. Any advice will be much appreciated!

Sounds like this is all you need:

$ cat tst.awk
BEGIN { FS=OFS="|" }
$3 != "" {
    gsub(/[^0-9]+/,"",$3)
    sub(/^(33)?06/,"336",$3)
    sub(/^(33)?07/,"337",$3)
    $3 = "+" $3
}
{ print }

$ awk -f tst.awk file
||+33612345678|
||+33612345678|
||+33712345678|
||+33612345678|
||+33712345678|
||+33612345678|
||+33612345678|
||+33612345678|
||+2258475|
||+65823|
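
Note that the script above puts a "+" in front of every non-empty value. If you would rather follow the specification literally and keep a "+" only where the input already had one, here is a minimal sketch of a variant. The function name format_phones is just a hypothetical wrapper for illustration, and it assumes the numbers are always in column 3:

```shell
# Hypothetical helper: normalise column 3 of a "|"-separated stream.
format_phones() {
    awk '
    BEGIN { FS = OFS = "|" }
    $3 != "" {
        plus = ($3 ~ /^\+/) ? "+" : ""      # remember a leading "+"
        gsub(/[^0-9]/, "", $3)              # keep digits only
        $3 = plus $3                        # restore the "+" if there was one
        if ($3 ~ /^(\+33)?0[67]/)           # 06..., 07..., +3306..., +3307...
            sub(/^(\+33)?0/, "+33", $3)     # -> +336... / +337...
    }
    { print }
    '
}

printf '%s\n' '||0612345678|' '||+330712345678|' '||06d12d34.h*56-78|' '||+2258475|' | format_phones
```

With this variant a value such as 0145 (no "+", not starting with 06 or 07) is left without a "+", which is the one place it behaves differently from the script above.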

I can get you a little closer. I found a couple of mistakes in your awk script that should be corrected before making more progress. First, the BEGIN statement looks to be in error. Rather than print substr($1,2), it should just set FS and OFS. As you probably already know, BEGIN only gets executed once, before any input is read.

Also, once FS is set to pipe '|', you'll need to modify the third field in each input line. Thus, the target param for all your gsub calls should be $3, not $1.

Well, that's all I've got for you. I suspect the remaining mismatch between your output and the expected results is due to the reason you mention: different regexp handling.
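
To illustrate why the separator belongs in BEGIN, here is a small sketch. The two function names are made up for the demonstration. When FS is assigned in a main rule, POSIX-conforming awks such as gawk apply it only to records read afterwards, so the first line is still split on whitespace:

```shell
# FS assigned in a main rule: it only applies to records read afterwards.
fs_in_rule()  { awk '{ FS = "|" } { print $3 }'; }

# FS assigned in BEGIN: in effect before the first record is split.
fs_in_begin() { awk 'BEGIN { FS = "|" } { print $3 }'; }

printf 'a|b|c\nd|e|f\n' | fs_in_begin
```

Piping the same two lines through fs_in_rule instead should show the difference on the first record.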

awk '
    BEGIN {FS=OFS="|"} 
    {   
        gsub("\b[^0-9]","",$3);
        gsub("[a-z]","",$3);
        gsub("\b[^0-9]","",$3);
        gsub("^06","+336",$3);
        gsub("^07","+337",$3);
        gsub("^+3306","+336",$3);
        gsub("^+3307","+337",$3)
    } 
    1
' sample
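
One likely cause of the remaining mismatch: inside an awk string, "\b" is the backspace character, not a word boundary as in GNU sed, so "\b[^0-9]" never matches anything in this data. A bracket expression in a regex literal avoids the problem. A rough sketch, with made-up function names, to show the difference:

```shell
# "\b" in an awk string is a backspace, so nothing in the input matches.
strip_with_string_regex()  { awk '{ gsub("\b[^0-9]", ""); print }'; }

# A regex literal with a bracket expression: drop all but digits and "+".
strip_with_bracket_regex() { awk '{ gsub(/[^0-9+]/, ""); print }'; }

printf '06.12.34.56.78\n' | strip_with_string_regex
printf '06.12.34.56.78\n' | strip_with_bracket_regex
```

The first call leaves the dots in place, which matches the +336.12.34.56.78 line in your output; the second one prints 0612345678.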

