I'm writing a bash script to format phone numbers to the French standard. Almost everything is done, but I don't know how to change values inside a CSV file.
Specifications:
Sample data (assuming the data is in the third column of my CSV file, with | as the separator):
||0612345678|
||+33612345678f|
||+33712345678|
||+330612345678|
||+330712345678|
||06.12.34.56.78|
||06 12 34 56 78|
||06d12d34.h*56-78|
||+2258475|
||+65823|
Expected result:
||+33612345678|
||+33612345678|
||+33712345678|
||+33612345678|
||+33712345678|
||+33612345678|
||+33612345678|
||+33612345678|
||+2258475|
||+65823|
I tried to do this with sed. It works with these expressions when the file contains only the phone numbers:
sed -e "s/\b[^0-9]//g" sample > test
sed -e "s/[a-z]//g" test > test2
sed -e "s/\b[^0-9]//g" test2 > test3
sed -e "s/^06/+336/g" test3 > test4
sed -e "s/^07/+337/g" test4 > test5
sed -e "s/^+3306/+336/g" test5 > test6
sed -e "s/^+3307/+337/g" test6 > result
BUT I don't know how to apply these substitutions to my CSV file, on the third column only.
Then I tried with awk:
awk '
BEGIN {print substr($1,2); }
{FS=OFS="|"}
{
gsub("\b[^0-9]","",$1);
gsub("[a-z]","",$1);
gsub("\b[^0-9]","",$1);
gsub("^06","+336",$1);
gsub("^07","+337",$1);
gsub("^+3306","+336",$1);
gsub("^+3307","+337",$1)
} 1
' sample
but awk doesn't understand all of the regular expressions. The result when using awk:
+33612345678|
+33612345678|
+33712345678|
+33612345678|
+33712345678|
+336.12.34.56.78|
+336 12 34 56 78|
+3361234.*56-78|
+2258475|
+65823|
I would like to use my regular expressions directly on my CSV file. Any advice will be much appreciated!
Sounds like this is all you need:
$ cat tst.awk
BEGIN { FS=OFS="|" }
$3 != "" {
gsub(/[^0-9]+/,"",$3)
sub(/^(33)?06/,"336",$3)
sub(/^(33)?07/,"337",$3)
$3 = "+" $3
}
{ print }
$ awk -f tst.awk file
||+33612345678|
||+33612345678|
||+33712345678|
||+33612345678|
||+33712345678|
||+33612345678|
||+33612345678|
||+33612345678|
||+2258475|
||+65823|
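The command above only prints the converted lines; the input file itself is left unchanged. If the goal is to update the CSV file itself, one possible approach is to write the output to a temporary file and copy it back once awk has succeeded. The helper name awk_in_place below is hypothetical, not something from the original post, and this is a sketch rather than a hardened tool:

```shell
# Hypothetical helper (not part of the original answer): run an awk program
# file over a data file and overwrite the data file with the result,
# but only if awk exited successfully.
awk_in_place() {
    prog=$1    # path to the awk program, e.g. tst.awk
    file=$2    # path to the data file to rewrite
    tmp=$(mktemp) || return 1
    if awk -f "$prog" "$file" > "$tmp"; then
        # Copy the content back instead of using mv, so the original
        # file keeps its permissions.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as awk_in_place tst.awk file. With GNU awk 4.1 or later, gawk -i inplace -f tst.awk file should achieve the same thing without a helper.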
I can get you a little closer. I found a couple of mistakes in your awk script that should be corrected before making more progress. First, the BEGIN block looks wrong: rather than print substr($1,2), it should just set FS and OFS. As you probably already know, BEGIN is executed only once, before any input is read.
Also, once FS is set to the pipe character '|', the phone number is the third field of each input line. Thus, the target parameter for all your gsub calls should be $3, not $1.
Well, that's all I've got for you. I suspect the remaining mismatches between your output and the expected results are due to the reason you mention: different regexp handling.
awk '
BEGIN {FS=OFS="|"}
{
gsub("\b[^0-9]","",$3);
gsub("[a-z]","",$3);
gsub("\b[^0-9]","",$3);
gsub("^06","+336",$3);
gsub("^07","+337",$3);
gsub("^+3306","+336",$3);
gsub("^+3307","+337",$3)
}
1
' sample
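As for the regexp handling: when an awk regexp is written as a string, the string escapes are processed first, so "\b" becomes a backspace character rather than the word boundary that GNU sed understands (GNU awk spells the word boundary \y). The pattern "\b[^0-9]" therefore never matches the dots and spaces, which is consistent with the output shown in the question. Below is a minimal sketch that avoids \b altogether by using regexp literals. The function name normalize_phones is hypothetical, and the sketch assumes that only a leading 06 or 07, with or without +33 in front, should be rewritten:

```shell
# Hypothetical helper: reads pipe-separated lines on stdin and
# normalizes the phone number in the third field.
normalize_phones() {
    awk '
    BEGIN { FS = OFS = "|" }
    NF >= 3 {
        # Keep only digits and "+" (drops dots, spaces, letters, "*", "-").
        gsub(/[^0-9+]/, "", $3)
        # 06.../07... and +3306.../+3307... become +336.../+337...
        if ($3 ~ /^(\+33)?0[67]/)
            sub(/^(\+33)?0/, "+33", $3)
    }
    { print }
    '
}
```

It could be run as normalize_phones < sample. Numbers that already start with another prefix, such as +2258475, pass through unchanged apart from the removal of stray characters.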