
How to format phone numbers in bash with awk

I'm writing a new bash script to format phone numbers to the French standard. Almost everything is done, but I don't know how to change values in a CSV file.

  1. Specifications:

    • Delete all non-digit characters (except "+" if it is in the first position)
    • Substitutions:
      • 06xxx -> +336xxx
      • 07xxx -> +337xxx
      • +3306xxx -> +336xxx
      • +3307xxx -> +337xxx
  2. Sample data (assuming the data is in the third column of my CSV file, with | separators):

     ||0612345678|
     ||+33612345678f|
     ||+33712345678|
     ||+330612345678|
     ||+330712345678|
     ||06.12.34.56.78|
     ||06 12 34 56 78|
     ||06d12d34.h*56-78|
     ||+2258475|
     ||+65823|
  3. Expected result:

     ||+33612345678|
     ||+33612345678|
     ||+33712345678|
     ||+33612345678|
     ||+33712345678|
     ||+33612345678|
     ||+33612345678|
     ||+33612345678|
     ||+2258475|
     ||+65823|
  4. Current state

I tried doing this with sed, and it actually works with these expressions:

    sed -e "s/\b[^0-9]//g" sample > test
    sed -e "s/[a-z]//g" test > test2
    sed -e "s/\b[^0-9]//g" test2 > test3
    sed -e "s/^06/+336/g" test3 > test4
    sed -e "s/^07/+337/g" test4 > test5
    sed -e "s/^+3306/+336/g" test5 > test6
    sed -e "s/^+3307/+337/g" test6 > result
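The seven passes can also be written as a single sed call with several -e expressions, which avoids the intermediate files test to test6. A sketch, assuming GNU sed (\b is a GNU extension) and input with one number per line; the function name format_numbers is hypothetical, and the g flag is dropped on the ^-anchored patterns because they can match at most once per line:

```shell
# Hypothetical helper: the seven sed passes above as one sed call.
# Assumes GNU sed, since \b (word boundary) is a GNU extension.
format_numbers() {
    sed -e 's/\b[^0-9]//g' \
        -e 's/[a-z]//g' \
        -e 's/\b[^0-9]//g' \
        -e 's/^06/+336/' \
        -e 's/^07/+337/' \
        -e 's/^+3306/+336/' \
        -e 's/^+3307/+337/'
}

printf '%s\n' '06.12.34.56.78' '+330712345678' '+65823' | format_numbers
```

This still rewrites whole lines, so on its own it does not restrict the changes to the third column of the CSV file.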

But I don't know how to apply these substitutions to my CSV file, only on the third column.

Then I tried with awk:

    awk '
    BEGIN {print substr($1,2); }
    {FS=OFS="|"} 
    {   
        gsub("\b[^0-9]","",$1);
        gsub("[a-z]","",$1);
        gsub("\b[^0-9]","",$1);
        gsub("^06","+336",$1);
        gsub("^07","+337",$1);
        gsub("^+3306","+336",$1);
        gsub("^+3307","+337",$1)
    } 1
    ' sample

but awk doesn't understand all of the regular expressions. Here is the result when using awk:

    +33612345678|
    +33612345678|
    +33712345678|
    +33612345678|
    +33712345678|
    +336.12.34.56.78|
    +336 12 34 56 78|
    +3361234.*56-78|
    +2258475|
    +65823|
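A likely reason for the leftover dots, spaces and punctuation: POSIX awk regular expressions have no word-boundary operator (gawk offers \y as an extension), and inside an awk string "\b" is the backspace escape, so the first and third gsub calls look for a backspace character and never match. A quick check, assuming any POSIX awk:

```shell
# In awk, "\b" is the backspace escape (a single character),
# not a word boundary: this prints 1.
awk 'BEGIN { print length("\b") }'

# So gsub("\b[^0-9]", ...) looks for a backspace followed by a
# non-digit, finds none, and the dots survive.
echo '06.12.34.56.78' | awk '{ gsub("\b[^0-9]", ""); print }'
```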

I would like to use my regular expressions directly on my CSV files; any advice would be much appreciated!

Sounds like this is all you need:

    $ cat tst.awk
    BEGIN { FS=OFS="|" }
    $3 != "" {
        gsub(/[^0-9]+/,"",$3)
        sub(/^(33)?06/,"336",$3)
        sub(/^(33)?07/,"337",$3)
        $3 = "+" $3
    }
    { print }

    $ awk -f tst.awk file
    ||+33612345678|
    ||+33612345678|
    ||+33712345678|
    ||+33612345678|
    ||+33712345678|
    ||+33612345678|
    ||+33612345678|
    ||+33612345678|
    ||+2258475|
    ||+65823|
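One difference from the specification in the question: the script above prepends "+" to every non-empty third field, so a number that had no leading "+" and does not start with 06/07 (for example 0123456789) would come out as +0123456789. A sketch that instead keeps "+" only when it was already the first character, assuming a POSIX awk and the | separator; the function name format_phones is hypothetical:

```shell
# Hypothetical helper: normalises the third |-separated field of each
# line on stdin, following the specification in the question.
format_phones() {
    awk '
    BEGIN { FS = OFS = "|" }
    $3 != "" {
        # Keep a "+" only if it is the first character of the field.
        plus = (substr($3, 1, 1) == "+") ? "+" : ""
        # Drop every non-digit (dots, spaces, letters, stray "+" signs).
        gsub(/[^0-9]/, "", $3)
        $3 = plus $3
        # 06xxx or +3306xxx -> +336xxx, 07xxx or +3307xxx -> +337xxx.
        sub(/^([+]33)?06/, "+336", $3)
        sub(/^([+]33)?07/, "+337", $3)
    }
    { print }
    '
}

printf '%s\n' '||+330612345678|' '||06d12d34.h*56-78|' '||0123456789|' | format_phones
```

Assigning to $3 rebuilds the whole line with OFS, so the other columns are written back unchanged; the usage line above should print ||+33612345678|, ||+33612345678| and ||0123456789|.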

I can get you a little closer. I found a couple of mistakes in your awk script that should be corrected before making more progress. First, the BEGIN block looks wrong: rather than print substr($1,2), it should just set FS and OFS. As you probably already know, BEGIN is executed only once, before any input has been read, so $1 is still empty at that point.

Also, once FS is set to a pipe ('|'), you'll need to modify the third field of each input line, so the target argument of all your gsub calls should be $3, not $1.
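Both points can be checked with a small experiment, assuming any POSIX awk and a made-up input line in the sample format: inside BEGIN no record exists yet, and setting FS in BEGIN makes the very first record split on "|", so the phone number is $3.

```shell
# 1) BEGIN runs before any input is read, so $1 is empty there: prints [].
awk 'BEGIN { print "[" $1 "]" }'

# 2) With FS set in BEGIN, the first record is already split on "|";
#    fields 1 and 2 are empty and the number is field 3.
printf '||0612345678|\n' | awk 'BEGIN { FS = OFS = "|" } { print $3 }'
```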

Well, that's all I've got for you. I suspect the remaining mismatches between your output and the expected results are due to the reason you mention: different regexp handling.

    awk '
        BEGIN { FS=OFS="|" }    # split and rejoin every record on "|"
        {
            gsub("\b[^0-9]","",$3);
            gsub("[a-z]","",$3);
            gsub("\b[^0-9]","",$3);
            gsub("^06","+336",$3);
            gsub("^07","+337",$3);
            gsub("^+3306","+336",$3);
            gsub("^+3307","+337",$3)
        }
        1                       # always-true pattern: print every record
    ' sample
