Remove newlines in the middle of a CSV file

Question

I need to clean a CSV file that looks like this:

food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking 
price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO

Some fields have no double quotes, but the line break occurs only inside double-quoted fields, and only in the 4th field.

I am working on an awk command, and this is what I have so far:

awk '{ if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") gsub(/\n/," ");}' FS=";" input_file

This awk command checks whether the first character of the 4th field is a double quote and the last character is not. It then tries to remove the newline, but it clearly does not remove it.

I think I am missing something small and "easy", but I can't figure out what it is.

Thanks for your help.

Answer 1

You may use this awk command:

awk -F';' -v ORS= '1; {print (NF==4 ? " " : "\n")}' file

food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO

How it works:

This command initially sets ORS to the empty string.
Then, for each line, it prints the full record.
Finally, it prints a space when NF == 4; otherwise it prints a line break.
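
To see why NF == 4 singles out the broken line, it can help to print the field count next to each input line. The snippet below is only an illustrative sketch; the helper name count_fields is made up here and is not part of the answer.

```shell
# Print the number of ;-separated fields in front of each input line.
count_fields() {
    awk -F';' '{ print NF ": " $0 }' "$@"
}

printf '%s\n' 'food;1;ZZ;"lipsum";NR' 'foobar;123;NA;"asking' 'price";NR' | count_fields
```

With this sample, a complete record reports 5 fields, the line cut off inside the quoted 4th field reports 4, and the leftover piece reports 2. The test therefore relies on the break always falling in the 4th field, as stated in the question.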

Answer 2

Using GNU sed:

$ sed -Ez 's/(;"[^"]*)\n/\1/g' input_file
food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO

Answer 3

One idea for tweaking OP's current awk code:

awk -F';' '
{ if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") {    # if we have an incomplete line then ...
     printf "%s", $0                                                # print it without a "\n" (the "%s" format keeps any % in the data literal)
     next                                                           # skip to next line of input
  }
}
1                                                                   # otherwise print current line
' input_file

# or as a one-liner sans comments:

awk -F';' ' { if (substr($4,1,1) == "\"" && substr($4,length($4)) != "\"") { printf "%s", $0; next } } 1 ' input_file

This generates:

food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO
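
For context, a likely reason the command in the question has no effect: by default awk reads the input one line at a time, so the newline acts as the record separator and is never part of $0 or $4, which leaves gsub(/\n/," ") with nothing to replace. That command also contains no print statement, so it produces no output at all. The sketch below shows the first point; the helper name count_newlines is made up for illustration.

```shell
# For each record, report how many newline characters gsub() replaced.
count_newlines() {
    awk '{ n = gsub(/\n/, " "); print n }' "$@"
}

printf '%s\n' 'foobar;123;NA;"asking' 'price";NR' | count_newlines
```

Both records report 0 replacements, which is why the fix has to act on the output side (printf without a newline, or ORS) rather than on the record contents.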

Answer 4

With GNU awk for RT:

$ awk -v RS='"' '!(NR%2){gsub(/\n/,"")} {ORS=RT} 1' file
food;1;ZZ;"lipsum";NR
foobar;123;NA;"asking price";NR
foobar;5;NN;Random text;NN
moongoo;13;VV;"Any label";OO

Answer 5

This might work for you (GNU sed):

sed -E ':a;/^[^\"]*(\\.[^\"]*)*("[^\"]*(\\.[^"\]*)*"[^\"]*)*"[^\"]*(\\.[^"\]*)*$/{N;s/\n//;ta}' file

This matches any line with unbalanced double quotes (with or without escaped double quotes), appends the following line, removes the newline, and repeats until the double quotes are balanced.

A simpler solution that ignores escaped double quotes:

sed -E ':a;/^[^"]*("[^"]*"[^"]*)*"[^"]*$/{N;s/\n//;ta}' file
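
The same "join until the double quotes balance" idea can also be sketched in POSIX awk by counting quotes instead of matching them with a regular expression. This is an assumption-laden illustration rather than a translation of the sed commands: the function name join_quoted is hypothetical, and like the simpler sed version it does not handle escaped double quotes.

```shell
# Join a line with the following line(s) while it holds an odd number of double quotes.
join_quoted() {
    awk '{
        line = $0
        # gsub() returns the number of replacements; replacing each quote
        # with itself ("&") counts the quotes without changing the text.
        while (gsub(/"/, "&", line) % 2 == 1 && (getline nxt) > 0)
            line = line nxt
        print line
    }' "$@"
}

printf '%s\n' 'food;1;ZZ;"lipsum";NR' 'foobar;123;NA;"asking ' 'price";NR' | join_quoted
```

Because it only looks at quote parity, this variant does not depend on the field separator or on which field contains the line break.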

Answer 6

echo 'food;1;ZZ;"lipsum";NR
      foobar;123;NA;"asking
      price";NR
      foobar;5;NN;Random text;NN
      moongoo;13;VV;"Any label";OO' |

 mawk '(ORS = (_<ORS)==(NF % 2)? RS: _)^_' FS=';' | gcat -n

 1  food;1;ZZ;"lipsum";NR
 2  foobar;123;NA;"asking price";NR
 3  foobar;5;NN;Random text;NN
 4  moongoo;13;VV;"Any label";OO
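
The one-liner above is terse. As far as can be read from it, it switches ORS between a newline and the empty string depending on whether NF is odd or even, and always prints the record. A more readable sketch of that reading follows; the function name join_even and the variables odd and joining are made up for illustration, and the behaviour is an interpretation rather than the author's own explanation.

```shell
# Spelled-out version of the ORS toggle: an even field count starts or ends a joined record.
join_even() {
    awk -F';' '{
        odd = NF % 2
        if (joining) {
            ORS = odd ? "" : "\n"     # inside a split record: an even NF closes it
        } else {
            ORS = odd ? "\n" : ""     # complete record has an odd NF; an even NF opens a join
        }
        joining = (ORS == "")
        print
    }' "$@"
}

printf '%s\n' 'food;1;ZZ;"lipsum";NR' 'foobar;123;NA;"asking ' 'price";NR' | join_even
```

This relies on complete records having an odd number of fields (5 in the sample) and on the two pieces of a split record each having an even number (4 and 2).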

6 answers

Answer 1: score 2, accepted, 2022-12-23 15:12:16
Answer 2: score 1, 2022-12-23 15:20:18
Answer 3: score 0, 2022-12-23 15:12:06
Answer 4: score 0, 2022-12-23 16:32:47
Answer 5: score 0, 2022-12-24 16:54:39
Answer 6: score 0, 2022-12-25 07:02:33
