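Removing multi-line strings from a field in a file
I have a CSV file like the one below. It is sent by a source system that has no processing mechanism on its end, other than the ability to add columns: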
1,"Bob Smith
531 Pennsylvania Avenue
Washington, DC",3,4,"qqqqzzzz"
5,"Bob Smith
531 Pennsylvania Avenue
Washington, DC",6,7,"qqqqzzzz"
Expected output:
1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4
5,"Bob Smith 531 Pennsylvania Avenue Washington, DC",6,7
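I tried the following:
Asked the source system to add an identifier "qqqqzzzz" at the end of each record.
Tried replacing all newlines with spaces, then replacing every qqqqzzzz with a newline again.
But that last step replaces only the qqqqzzzz text itself, so the quotes around it are left behind and the lines break in the wrong place, as shown below: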
1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4,""
5,"Bob Smith
sed '/^$/d' all.csv|tr '\n' ' '|sed 's/qqqqzzzz/\n/g' >results.csv
Update after trying the suggested command:
$ sed 'N;N;s/\n//g;s/,"qqqqzzzz"$//' quotetest.csv
1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4,"qqqqzzzz"
5,"Bob Smith 531 Pennsylvania Avenue Washington, DC",6,7
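The identifier left on the first line above is consistent with trailing whitespace after it, which the `$` anchor in that command does not allow for. A minimal sketch of a more tolerant variant follows. It assumes every record spans exactly three physical lines, as in the sample. The function name `join_three_lines` is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: join each group of three input lines into one line
# and strip the trailing ,"qqqqzzzz" marker.
# Assumption: every record occupies exactly three physical lines.
join_three_lines() {
    # N;N             append the next two lines to the pattern space
    # s/\n/ /g        turn the embedded newlines into single spaces
    # [[:space:]]*$   tolerate trailing blanks (or a CR) after the marker
    sed 'N;N;s/\n/ /g;s/,"qqqqzzzz"[[:space:]]*$//'
}

# Sample input: the first record has a trailing space after the marker.
printf '%s\n' \
    '1,"Bob Smith' \
    '531 Pennsylvania Avenue' \
    'Washington, DC",3,4,"qqqqzzzz" ' \
    '5,"Bob Smith' \
    '531 Pennsylvania Avenue' \
    'Washington, DC",6,7,"qqqqzzzz"' | join_three_lines
```

This approach breaks as soon as an address has more or fewer than three lines, so it is only suitable when the record layout is fixed.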
Using GNU awk:
$ awk 'BEGIN{RS=",\"qqqqzzzz\" ?\r?\n"}{$1=$1}1' file
1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4
5,"Bob Smith 531 Pennsylvania Avenue Washington, DC",6,7
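Tested with both DOS and Unix line endings. The key is to use the identifier together with the extra characters around it (the comma, an optional space, and the line-ending characters) as the record separator (RS). The problem was that the first identifier is followed by a space while the second one is not.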
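The record-separator idea can be tried on a small sample without touching the real file. The sketch below wraps the same awk program in a shell function and feeds it input in which only the first identifier is followed by a space. The function name `join_records` is hypothetical, and the script assumes an awk that accepts a regular expression as `RS` (GNU awk does).

```shell
#!/bin/sh
# Hypothetical helper: read CSV text on stdin and print one line per record.
# Assumption: the awk in use accepts a regular expression as RS (GNU awk does).
join_records() {
    awk 'BEGIN {
             # Record separator: comma, quoted marker, optional space,
             # optional carriage return, then the newline.
             RS = ",\"qqqqzzzz\" ?\r?\n"
         }
         {
             # Assigning a field rebuilds the record with single spaces
             # in place of the whitespace runs, including embedded newlines.
             $1 = $1
             print
         }'
}

# Sample input: only the first marker is followed by a space.
printf '%s\n' \
    '1,"Bob Smith' \
    '531 Pennsylvania Avenue' \
    'Washington, DC",3,4,"qqqqzzzz" ' \
    '5,"Bob Smith' \
    '531 Pennsylvania Avenue' \
    'Washington, DC",6,7,"qqqqzzzz"' | join_records
```

Because the separator absorbs the optional space, both records are split at the same point, and the number of lines inside the quoted field no longer matters.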