sed: how to handle a space character that appears inside double quotes in a Linux file

Question

I have a txt file delimited by commas (,), with each column enclosed in double quotes.

What I want to do: keep the comma as the delimiter, but remove every comma that appears inside a pair of double quotes (each column is surrounded by double quotes).

Sample input and the output file I want:

input file:

"2022111812160156601777153","","","false","test1",**"here the , issue , that comma comma come inside the column"**

the output I want:

"2022111812160156601777153","","","false","test1",**"here the  issue  that comma comma come inside the column"**

what I tried:

sed -i ':a' -e 's/\("[^"]*\),\([^"]*"\)/\1~\2/;ta' test.txt

but the above sed command replaces all commas, not only the commas that appear inside a column.

Is there a way to do it?

Answer 1

Using sed

$ sed -Ei.bak ':a;s/((^|,)(\*+)?"[^"]*),/\1/;ta' input_file
"2022111812160156601777153","","","false","test1",**"here the  issue  that comma comma come inside the column"**

Answer 2

Any time you find yourself using more than s, g, and p (with -n) in sed you'd be better off using awk for some combination of clarity, robustness, efficiency, portability, etc.

Using any awk in any shell on every Unix box:

$ awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' file
"2022111812160156601777153","","","false","test1",**"here the  issue  that comma comma come inside the column"**

Just like GNU sed has -i, as in your question, to update the input file with the command's output, GNU awk has -i inplace, or just add > tmp && mv tmp file with any awk or any other Unix command.
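The "> tmp && mv tmp file" pattern mentioned above can be sketched as follows. This is only an illustration: sample.txt is a made-up file name, and using mktemp for the temporary file is my own choice rather than part of the answer. Note that mv replaces the original file, so its permissions may differ afterwards.

```shell
# Create a small sample file (sample.txt is a hypothetical name).
printf '%s\n' '"id1","","a , b","x,y,z"' > sample.txt

# Write the result to a temporary file, then replace the original
# only if every previous step succeeded.
tmp=$(mktemp) &&
awk 'BEGIN{FS=OFS="\""} {for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' sample.txt > "$tmp" &&
mv "$tmp" sample.txt

cat sample.txt
# prints: "id1","","a  b","xyz"
```

Because of the && chain, the original file is left untouched if awk fails.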

Answer 3

This might work for you (GNU sed):

sed -E ':a;s/^(("[^",]*"\**,?\**)*"[^",]*),/\1/;ta' file

This iterates through each line, removing any commas within paired double-quoted fields.

NB The solution above also caters for double-quoted fields prefixed/suffixed by zero or more *'s. If this should not be catered for, here is an ameliorated solution:

sed -E ':a;s/^(("[^",]*",?)*"[^",]*),/\1/;ta' file
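To see the iteration at work, here is a small sketch that prints the line before each substitution attempt. It uses the same regexp as the command just above, split into -e pieces; the helper name trace_passes and the sample line are made up for illustration.

```shell
# trace_passes is a hypothetical helper; it reads lines on stdin and
# prints the pattern space before every substitution attempt.
trace_passes() {
  sed -n -E \
    -e ':a' \
    -e 'p' \
    -e 's/^(("[^",]*",?)*"[^",]*),/\1/' \
    -e 'ta'
}

printf '%s\n' '"x","a , b , c","d"' | trace_passes
# prints:
# "x","a , b , c","d"
# "x","a  b , c","d"
# "x","a  b  c","d"
```

Each pass removes the first comma found inside a quoted field, and the loop stops once no such comma remains, so the delimiters between fields are never touched.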

NB Escaped double quotes and commas would need a more involved regexp.
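One way to cope with escaped characters without a more involved regexp is to walk each line character by character. The sketch below is not from the answer: it assumes that a backslash protects the character after it (so \" does not end a field and \, is kept), and strip_quoted_commas is a made-up name.

```shell
# strip_quoted_commas is a hypothetical helper; it reads the named files or stdin.
# Assumption: a backslash escapes the next character.
strip_quoted_commas() {
  awk '{
    out = ""; inq = 0; n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\\" && i < n) {          # copy the escape pair untouched
        out = out c substr($0, i + 1, 1)
        i++
        continue
      }
      if (c == "\"") inq = !inq          # toggle the inside-quotes state
      if (c == "," && inq) continue      # drop commas inside quotes
      out = out c
    }
    print out
  }' "$@"
}

printf '%s\n' '"a\"b , c","d,e",f' | strip_quoted_commas
# prints: "a\"b  c","de",f
```

Quotes escaped by doubling ("") need no special case here, since two consecutive quotes toggle the state twice and leave it unchanged.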
