
Unix - Remove internal double quote with terminator comma

Input file:

"1","2col",""3col " "
"2","2col"," "3c,ol     " "
"3","2col"," 3co,l"     
"4","2col","3co,l"
"5","2col",""3co,l      ""              "
"6","2col",""3c,ol ""3c,ol"""

Output file:

"1","2col","3col    "
"2","2col"," 3c,ol       "
"3","2col"," 3co,l"     
"4","2col","3co,l"
"5","2col","3co,l                       "
"6","2col","3c,ol 3c,ol"

Please help me get the above output using Unix commands. Note that the 3rd column is modified in the output: all internal double quotes have been removed.

The comma is the field terminator. A comma that appears between double quotes is not treated as a terminator. See the 6th line: after the 2nd comma, a comma appears as text between double quotes, which is fine.

What I have tried so far:

sed 's/""|/|/g'
sed -e "s/\"\"//g"
perl -pe 's/(?<!^)(?<!\,)"(?!\,)(?!$)/""/g'

Hypothesis: the first and 2nd columns are "clean" (for example, they do not contain a comma).

Input:

"1","2col",""3col " "
"2","2col"," "3c,ol     " "
"3","2col"," 3co,l"     
"4","2col","3co,l"
"5","2col",""3co,l      ""              "
"6","2col",""3c,ol ""3c,ol"""

Command:

tr -d '"' < input | awk -F',' -v OFS=',' '{$1="\""$1"\"";$2="\""$2"\"";printf $1 OFS $2 OFS "\"";for(u=3;u<=NF;u++){if(u!=NF)printf $u OFS;else printf $u};printf "\"" RS}'

Output:

"1","2col","3col  "
"2","2col"," 3c,ol      "
"3","2col"," 3co,l     "
"4","2col","3co,l"
"5","2col","3co,l                    "
"6","2col","3c,ol 3c,ol"

Explanations:

  • tr -d '"' < input removes every " from the input.
  • | awk pipes the result to awk.
  • -F',' -v OFS=',' sets both the input and the output field separator to a comma.
  • $1="\""$1"\"";$2="\""$2"\""; wraps the first 2 columns in ", and printf $1 OFS $2 OFS "\""; prints them, followed by the opening " of the 3rd column.
  • for(u=3;u<=NF;u++){if(u!=NF)printf $u OFS;else printf $u};printf "\"" RS} joins the remaining fields back together with commas and adds the closing " at the end of the line.

For readability:

'{
  $1="\""$1"\""
  $2="\""$2"\""
  printf $1 OFS $2 OFS "\""
  for(u=3;u<=NF;u++)
  {
    if(u!=NF)printf $u OFS
    else printf $u
  }
  printf "\"" RS
}'
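
One caveat with the script above: it passes the data itself to printf as the format string, so a % character in a column could be misinterpreted. The following is a minimal sketch of the same logic with explicit "%s" formats. The function name fix_quotes is hypothetical, and the sketch still assumes the first two columns are clean and every line has at least three columns.

```shell
# Sketch only: same approach as the answer above, but the data is never
# used as a printf format string. fix_quotes is a hypothetical helper name.
fix_quotes() {
  tr -d '"' | awk -F',' -v OFS=',' '{
    # re-quote the first two (assumed clean) columns and open the third
    printf "\"%s\"%s\"%s\"%s\"", $1, OFS, $2, OFS
    # glue the remaining comma-split pieces back into one third column
    for (u = 3; u <= NF; u++) printf "%s%s", $u, (u < NF ? OFS : "")
    # close the third column
    printf "\"\n"
  }'
}

printf '%s\n' '"6","2col",""3c,ol ""3c,ol"""' | fix_quotes
```

It reads from standard input, so it can be used as fix_quotes < input.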

Use the quotes to find the first two fields and concatenate the remaining fields.

awk -F '"' '
   BEGIN {q="\""}
   {printf "%s", q$2q$3q$4q$5q; for (i=6;i<=NF;i++) printf "%s", $i; print q}
   ' inputfile
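
To see why the script above uses $2 to $5, it can help to print how awk splits a line when " is the field separator. This is only an illustrative sketch; show_fields is a hypothetical helper name.

```shell
# Sketch only: list each field awk sees when the separator is a double quote.
# show_fields is a hypothetical helper name.
show_fields() {
  awk -F '"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

printf '%s\n' '"1","2col",""3col " "' | show_fields
```

For this first sample line, $1 is empty (the text before the first quote), $2 and $4 hold the first two values, and $3 and $5 hold the commas between them. Everything from $6 onward belongs to the third column, which is why the script concatenates those fields without quotes.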

EDIT: An alternative

paste -d, <( cut -d"," -f1,2 < inputfile) \
          <( cut -d"," -f3-  < inputfile | sed 's/"//g;s/.*/"&"/')

EDIT: Another alternative

sed 's/old/new/g': apply the replacement to all matches of the regexp. sed 's/old/new/number': replace only the number-th match of the regexp. When you mix the g and number modifiers in GNU sed, the matches before the number-th one are ignored, and that match and all following matches are replaced.
In this case:

sed -r 's/"//g6;s/$/"/' inputfile
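
The combined number and g flags are easier to see on a simpler string. The sketch below assumes GNU sed, since POSIX does not specify what mixing the two flags does. strip_from_nth is a hypothetical helper name.

```shell
# Sketch only, assuming GNU sed: delete the N-th "-" and every later one.
# strip_from_nth is a hypothetical helper; $1 is N.
strip_from_nth() {
  sed "s/-//${1}g"
}

printf '%s\n' 'a-b-c-d-e' | strip_from_nth 3
```

The first two dashes are kept and the rest are removed. In the command above, the same idea keeps the first five quotes (those around columns 1 and 2, plus the opening quote of column 3) and deletes the others, and s/$/"/ then puts the closing quote back.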
