
How to use sed to remove a special character when it appears inside double quotes in a Linux file

I have a text file delimited by commas (,), with each column enclosed in double quotes.

What I want to do: keep the comma as the delimiter, but remove every comma that appears inside a pair of double quotes (each column is surrounded by double quotes).

Sample input and the output I want:

Input file:

"2022111812160156601777153","","","false","test1",**"here the , issue , that comma comma come inside the column"**

The output I want:

"2022111812160156601777153","","","false","test1",**"here the  issue  that comma comma come inside the column"**

What I tried:

sed -i ':a' -e 's/\("[^"]*\),\([^"]*"\)/\1~\2/;ta' test.txt

But the sed command above replaces every comma, not only the commas that appear inside a column.

Is there a way to do it?

Using sed

$ sed -Ei.bak ':a;s/((^|,)(\*+)?"[^"]*),/\1/;ta' input_file
"2022111812160156601777153","","","false","test1",**"here the  issue  that comma comma come inside the column"**

Any time you find yourself using more than s, g, and p (with -n) in sed, you'd be better off using awk for some combination of clarity, robustness, efficiency, portability, etc.

Using any awk in any shell on every Unix box:

$ awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' file
"2022111812160156601777153","","","false","test1",**"here the  issue  that comma comma come inside the column"**
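
To see why the loop above only touches even-numbered fields, it may help to print what awk sees when the double quote is the field separator. This is a small illustrative sketch; the sample line and the helper name show_fields are made up for the demonstration.

```shell
# Hypothetical sample line: two quoted fields, the second with an embedded comma.
line='"a","b,c"'

# Print each field awk sees when the double quote is the separator.
show_fields() {
  printf '%s\n' "$1" |
    awk -F'"' '{for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i}'
}

show_fields "$line"
# 1=[]
# 2=[a]
# 3=[,]
# 4=[b,c]
# 5=[]
```

The text inside quotes lands in fields 2, 4, ..., while the delimiter commas sit in the odd-numbered fields, so removing commas from the even fields leaves the delimiters alone.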

Just like GNU sed has -i, as in your question, to update the input file with the command's output, GNU awk has -i inplace, or just add > tmp && mv tmp file with any awk or any other Unix command.
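
The temp-file approach could be wrapped up roughly as follows. This is a sketch only: strip_commas_inplace is a hypothetical helper name, and it assumes mktemp is available (it is common but not required by POSIX).

```shell
# Remove commas inside quoted fields of the file named by $1, rewriting it in place.
# strip_commas_inplace is a hypothetical helper, not part of awk or any other tool.
strip_commas_inplace() {
  file=$1
  tmp=$(mktemp) || return 1
  # Only replace the original if awk succeeded; otherwise discard the temp file.
  if awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

It would be called as strip_commas_inplace test.txt. Note that mv replaces the file with the temp file, so the original's permissions and ownership may not be preserved.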

This might work for you (GNU sed):

sed -E ':a;s/^(("[^",]*"\**,?\**)*"[^",]*),/\1/;ta' file

This iterates through each line removing any commas within paired double quoted fields.
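
The effect of the :a ... ta loop can be seen by comparing a single substitution with the looped version. The sample line and the function names below are made up for illustration; the labels are passed as separate -e scripts, which should also be accepted by non-GNU seds that support -E.

```shell
# Hypothetical sample: the second field holds two embedded commas.
line='"a","b, c, d"'

# One substitution, no loop: only the first embedded comma is removed.
one_pass() {
  printf '%s\n' "$1" | sed -E 's/^(("[^",]*"\**,?\**)*"[^",]*),/\1/'
}

# Same substitution inside the :a ... ta loop: repeats until nothing matches.
all_passes() {
  printf '%s\n' "$1" |
    sed -E -e ':a' -e 's/^(("[^",]*"\**,?\**)*"[^",]*),/\1/' -e 'ta'
}

one_pass "$line"     # "a","b c, d"
all_passes "$line"   # "a","b c d"
```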

NB The solution above also caters for double-quoted fields prefixed/suffixed by zero or more *'s. If this does not need to be catered for, here is a simplified solution:

 sed -E ':a;s/^(("[^",]*",?)*"[^",]*),/\1/;ta' file

NB Escaped double quotes and commas would need a more involved regexp.
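
As a sketch of one such case: if quotes inside a field are escaped by doubling them (CSV-style ""), splitting on the double quote as in the awk answer still keeps the quoted text in even-numbered fields, because each doubled quote adds two separators. The input line and the function name strip_quoted_commas are hypothetical, and this does not cover backslash-escaped quotes (\").

```shell
# Read lines on stdin and remove commas from even-numbered, i.e. quoted, fields.
# strip_quoted_commas is a hypothetical name used only for this demonstration.
strip_quoted_commas() {
  awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1'
}

# Hypothetical input: the first field contains doubled quotes and commas.
printf '%s\n' '"say ""hi, there"", ok","x"' | strip_quoted_commas
# "say ""hi there"" ok","x"
```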


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM