Handling commas inside double quotes + awk

This is my file:

$ cat -v test2
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - $0 -"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G","$29 CARRYOVER PLAN"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - $0 - #2"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD - $1 - #4"

This command adds a column at the end:

$ awk -F, -v OFS=, -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{$8=$4; gsub(/"/,"",$8); $8= q $8/(1024*1024)q}1' test2 | cat -v
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description","Data_Volume_MB"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan","0.131383"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16","0"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - $0 -","0"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan","46.5744"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G","$29 CARRYOVER PLAN","108.486"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - $0 - #2","18.9218"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD - $1 - #4","0"

My problem is that this line

"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"

became this:

"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"

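The "0.139818" ends up in the wrong place, so this row does not look like the others. The cause appears to be the commas inside the double-quoted column "OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006": with -F, awk splits on those commas as well, so field 8 is the fragment OPPO R6001, and the assignment to $8 overwrites it.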
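
The misplacement can be reproduced in isolation. The following is a minimal sketch, assuming only a POSIX awk; the variable names line and diagnosis are made up for this illustration. It splits the problem row on every comma, exactly as -F, does, and prints the field count together with field 8.

```shell
# The problem row from test2, kept in single quotes so the shell leaves $29.95 alone.
line='"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"'

# Split on every comma, as -F, does, then report the field count and field 8.
diagnosis=$(printf '%s\n' "$line" | awk -F, '{ print NF ": " $8 }')
printf '%s\n' "$diagnosis"
```

The row is seen as 11 fields instead of 7, and field 8 is the middle of the device list, which is why the computed value lands there.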

What is the best way to do this, or is it possible at all? This is how I want the line to look, like the others:

"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)","0.139818"

Maybe I need to clean up the data before it goes into awk.


EDIT1: solved by the answer

Change the delimiter from , to ; and add the new column at the end:

$ sed 's/","/";"/g' < test2 | awk -F';' -v OFS=';' -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{n=$4; gsub(/"/,"",n); $8= q n/(1024*1024)q}1'
"Rec Open Date";"MSISDN";"IMEI";"Data Volume (Bytes)";"Device Manufacturer";"Device Model";"Product Description";"Data_Volume_MB"
"2015-10-06";"427";"060";"137765";"Samsung Korea";"Samsung SM-G900I";"$39 Plan";"0.131383"
"2015-10-06";"592";"620";"0";"Apple Inc";"Apple iPhone 6 (A1586)";"PREPAY  STD - TRIAL - #16";"0"
"2015-10-06";"007";"290";"0";"Apple Inc";"Apple iPhone 6 (A1586)";"PREPAY PLUS - $0 -";"0"
"2015-10-06";"592";"050";"48836832";"Apple Inc";"Apple iPhone 5S (A1530)";"Talk and Text Connect Flexi Plan";"46.5744"
"2016-04-27";"498";"220";"146610";"Guangdong Oppo Mobile Telecommunications Corp Ltd";"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006";"$29.95 Carryover Plan (1GB)";"0.139818"
"2015-10-06";"409";"720";"113755347";"Samsung Korea";"Samsung SM-G360G";"$29 CARRYOVER PLAN";"108.486"
"2015-10-06";"742";"620";"19840943";"Apple Inc";"Apple iPhone S (A1530)";"PREPAY STD - $0 - #2";"18.9218"
"2015-10-06";"387";"180";"0";"HUAWEI Technologies Co Ltd";"HUAWEI HUAWEI G526-L11";"PREPAY STD - $1 - #4";"0"

Change the delimiter from , to | and add the new column at the end:

$ sed 's/","/"|"/g' < test2 | awk -F'|' -v OFS='|' -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{n=$4; gsub(/"/,"",n); $8= q n/(1024*1024)q}1'
"Rec Open Date"|"MSISDN"|"IMEI"|"Data Volume (Bytes)"|"Device Manufacturer"|"Device Model"|"Product Description"|"Data_Volume_MB"
"2015-10-06"|"427"|"060"|"137765"|"Samsung Korea"|"Samsung SM-G900I"|"$39 Plan"|"0.131383"
"2015-10-06"|"592"|"620"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY  STD - TRIAL - #16"|"0"
"2015-10-06"|"007"|"290"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY PLUS - $0 -"|"0"
"2015-10-06"|"592"|"050"|"48836832"|"Apple Inc"|"Apple iPhone 5S (A1530)"|"Talk and Text Connect Flexi Plan"|"46.5744"
"2016-04-27"|"498"|"220"|"146610"|"Guangdong Oppo Mobile Telecommunications Corp Ltd"|"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006"|"$29.95 Carryover Plan (1GB)"|"0.139818"
"2015-10-06"|"409"|"720"|"113755347"|"Samsung Korea"|"Samsung SM-G360G"|"$29 CARRYOVER PLAN"|"108.486"
"2015-10-06"|"742"|"620"|"19840943"|"Apple Inc"|"Apple iPhone S (A1530)"|"PREPAY STD - $0 - #2"|"18.9218"
"2015-10-06"|"387"|"180"|"0"|"HUAWEI Technologies Co Ltd"|"HUAWEI HUAWEI G526-L11"|"PREPAY STD - $1 - #4"|"0"

Change the delimiter from , to ; and insert the new column before the second-to-last column:

$ sed 's/","/";"/g' < test2 | awk -F';' -v OFS=';' -v q='"' 'NR==1{$(NF-1)=q"Data_Volume_MB"q FS $(NF-1)} NR>1{n=$4; gsub(/"/,"",n); $(NF-1)= q n/(1024*1024)q FS $(NF-1)}1'
"Rec Open Date";"MSISDN";"IMEI";"Data Volume (Bytes)";"Device Manufacturer";"Data_Volume_MB";"Device Model";"Product Description"
"2015-10-06";"427";"060";"137765";"Samsung Korea";"0.131383";"Samsung SM-G900I";"$39 Plan"
"2015-10-06";"592";"620";"0";"Apple Inc";"0";"Apple iPhone 6 (A1586)";"PREPAY  STD - TRIAL - #16"
"2015-10-06";"007";"290";"0";"Apple Inc";"0";"Apple iPhone 6 (A1586)";"PREPAY PLUS - $0 -"
"2015-10-06";"592";"050";"48836832";"Apple Inc";"46.5744";"Apple iPhone 5S (A1530)";"Talk and Text Connect Flexi Plan"
"2016-04-27";"498";"220";"146610";"Guangdong Oppo Mobile Telecommunications Corp Ltd";"0.139818";"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006";"$29.95 Carryover Plan (1GB)"
"2015-10-06";"409";"720";"113755347";"Samsung Korea";"108.486";"Samsung SM-G360G";"$29 CARRYOVER PLAN"
"2015-10-06";"742";"620";"19840943";"Apple Inc";"18.9218";"Apple iPhone S (A1530)";"PREPAY STD - $0 - #2"
"2015-10-06";"387";"180";"0";"HUAWEI Technologies Co Ltd";"0";"HUAWEI HUAWEI G526-L11";"PREPAY STD - $1 - #4"

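The last command works because assigning to a field makes awk rebuild the record from its fields, but it does not split the new value again. Storing "new value, separator, old value" in $(NF-1) therefore shows up as an extra column in the output. A small sketch on a hypothetical three-field record (a;b;c is not taken from test2, and the name result is made up here):

```shell
# Hypothetical record a;b;c: prepend X and the separator to the second-to-last field.
result=$(printf 'a;b;c\n' | awk -F';' -v OFS=';' '{ $(NF-1) = "X" FS $(NF-1); print NF ": " $0 }')
printf '%s\n' "$result"
```

awk still counts three fields after the assignment, while the printed record has four columns, so any later reference to a field number in the same script would still use the old numbering.
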
I suggest changing your field separator first, like this (here I change it from , to |):

sed 's/","/"|"/g' < test2 > newfile

Then use your awk code on newfile.
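
This only works if the replacement character does not already occur in the data and every field is double-quoted. A quick check along those lines might look like the sketch below; it assumes a POSIX awk, sed and sort, uses two rows copied from test2, and the variable names are made up for the illustration.

```shell
# Two rows copied from test2; the second one has commas inside a quoted field.
row1='"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan"'
row2='"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"'
sample=$(printf '%s\n' "$row1" "$row2")

# 1. Count rows that already contain the candidate delimiter (0 means it is safe to use).
rows_with_pipe=$(printf '%s\n' "$sample" | awk 'index($0, "|") { n++ } END { print n + 0 }')

# 2. After the sed conversion, list the distinct field counts (a single value means consistent rows).
field_counts=$(printf '%s\n' "$sample" | sed 's/","/"|"/g' | awk -F'|' '{ print NF }' | sort -u)

printf 'rows containing |: %s\n' "$rows_with_pipe"
printf 'distinct field counts: %s\n' "$field_counts"
```

If the first number is not 0, or the second line lists more than one value, pick another delimiter or inspect the offending rows before running the awk step.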

Of course, you can put all of this on one line (I am not using your awk code here, just my own as an example):

sed 's/","/"|"/g' < test2 | awk 'BEGIN{FS="|"} {print  $1}'
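
As a variation on the same idea (not part of the commands above), awk can split on the three-character sequence "," directly, which makes the sed step unnecessary. The sketch below assumes every field is double-quoted, as in test2, and that the byte count is in column 4; the function name add_mb_column is hypothetical. Because the new column is appended to $0 as text, the record is never rebuilt and the commas inside quoted fields are left untouched.

```shell
# Append a quoted Data_Volume_MB column, splitting on "," (quote, comma, quote).
# With this separator, interior fields such as $4 carry no quotes at all.
add_mb_column() {
  awk -F'","' -v q='"' '
    NR == 1 { print $0 "," q "Data_Volume_MB" q; next }   # header row
    { mb = $4 / (1024 * 1024); print $0 "," q mb q }      # bytes to MB
  ' "$@"
}

# Usage with the header and the problem row from test2.
header='"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description"'
row='"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)"'
result=$(printf '%s\n' "$header" "$row" | add_mb_column)
printf '%s\n' "$result"
```

On a real file the call would be add_mb_column test2. This does not handle unquoted fields or a literal "," sequence inside a field; for such input a CSV-aware tool is the safer choice.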

In response to the OP's comment, be sure to run your command like this (note that I changed -F, to -F"|"):

    sed 's/","/"|"/g' < test2 | awk -F"|" -v OFS=, -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{$8=$4; gsub(/"/,"",$8); $8= q $8/(1024*1024)q}1'

With your data, this is my result:

"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description","Data_Volume_MB"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I","$39 Plan","0.131383"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16","0"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - $0 -","0"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan","46.5744"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$29.95 Carryover Plan (1GB)","0.139818"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G","$29 CARRYOVER PLAN","108.486"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - $0 - #2","18.9218"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD - $1 - #4","0"

