Search and replace (escape) double quotes within double quotes in CSV values
CSV file: For values within double quotes, replace commas with semi colon and remove double quotes
I have a CSV file in the following format:
value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4
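I want to replace the commas inside the outermost quotes of the third field with ";" and remove the inner quotes. I have tried sed, but nothing I tried could handle the nested quotes.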
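You need a recursive regular expression to match the nested quotes. The tidiest way to change the quotes and commas is an evaluated substitution (s///e) combined with the non-destructive transliteration (tr///r) that is available from Perl v5.14, like this: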
use strict;
use warnings 'all';
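use v5.14;    # tr///r (non-destructive transliteration) needs Perl 5.14 or later
my $str = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';
# (?R) recurses into the whole pattern, so the quoted text may contain runs of
# non-quote characters or another complete quoted string. /e evaluates the
# replacement, which returns $1 with commas turned into ';' and quotes deleted;
# the outer quotes are part of the match but not of $1, so they are dropped too.
$str =~ s{ " ( (?: [^"]++ | (?R) )* ) " }{ $1 =~ tr/,"/;/dr }egx;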
print $str, "\n";
Output:
value1, value2, some text in the; quotes; with commas and nested quotes; some more text, value3, value4
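The replacement side of that substitution relies on tr/// with two flags. As a minimal sketch (the variable names are illustrative, not from the answer), this applies the same `tr/,"/;/dr` to a fragment of the sample text. The search list `,"` is longer than the replacement list `;`, so with `/d` the unmatched `"` is deleted rather than mapped, and `/r` returns a modified copy instead of changing the variable in place.

```perl
use strict;
use warnings;
use v5.14;    # tr///r needs Perl 5.14 or later

my $inner = 'with commas and "nested quotes", some more text';

# /d: ',' maps to ';' and '"' has no partner in the replacement list, so it is deleted
# /r: return the transliterated copy instead of modifying $inner
my $cleaned = $inner =~ tr/,"/;/dr;

print "$inner\n";      # original string is left unchanged
print "$cleaned\n";    # commas replaced, quotes removed
```

The first line printed is the untouched fragment; the second is `with commas and nested quotes; some more text`.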
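This can be done, provided that the quotes nested inside a quoted field come in an even number and that commas are used as the field separator.
Note that if the CSV does not meet these conditions, nothing can rescue it: it can never be parsed.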
(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))
Formatted:
(?: ^ | , )
\s*
\K
"
(                 # (1 start)
  [^"]*
  (?:             # Inner, even number of quotes
    "
    [^"]*
    "
    [^"]*
  )+
)                 # (1 end)
"
(?=
  \s*
  (?: , | $ )
)
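The even-number condition comes from the inner group, which consumes quotes only in pairs. A rough sketch to check this, assuming two made-up sample lines that are not from the question: a field with one pair of inner quotes matches, while a field with a single inner quote leaves no closing quote for the pattern, so it does not match and a substitution built on this pattern would leave that line as-is.

```perl
use strict;
use warnings;

# The answer's pattern in its /x form; \K keeps the leading comma and spaces out of the match
my $quoted_field = qr/
    (?: ^ | , ) \s* \K
    "
    ( [^"]* (?: " [^"]* " [^"]* )+ )   # capture 1: field text containing pairs of inner quotes
    "
    (?= \s* (?: , | $ ) )
/x;

# Hypothetical sample lines: two inner quotes (even) versus one (odd)
my @samples = (
    'a, "x, "y", z", b',
    'a, "x, "y, z", b',
);

for my $line (@samples) {
    if ($line =~ $quoted_field) {
        print "match:    $line   (field text: $1)\n";
    }
    else {
        print "no match: $line\n";
    }
}
```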
Perl example:
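use strict;
use warnings;
use v5.14;    # tr///r needs Perl 5.14 or later
my $data = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';
# Rebuild the quoted field: commas become ';', inner quotes are deleted,
# and the outer quotes are put back around the result.
sub innerRepl
{
    my ($in) = @_;
    return '"' . ($in =~ tr/,"/;/dr ) . '"';
}
# \K keeps the leading comma and spaces out of the replaced text
$data =~ s/(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))/ innerRepl( $1 ) /eg;
print $data, "\n";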
Output:
value1, value2, "some text in the; quotes; with commas and nested quotes; some more text", value3, value4
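The question is about a CSV file rather than a single string. One possible way to apply the second answer's substitution line by line is sketched below; the helper names `clean_line` and `clean_csv` are hypothetical, and in-memory filehandles stand in for real files so the example is self-contained. It assumes each record fits on one line, with no embedded newlines inside quoted fields.

```perl
use strict;
use warnings;
use v5.14;    # tr///r needs Perl 5.14 or later

# Hypothetical helper: rewrite one CSV line with the second answer's pattern (outer quotes kept)
sub clean_line {
    my ($line) = @_;
    $line =~ s{(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))}
              { '"' . ($1 =~ tr/,"/;/dr) . '"' }eg;
    return $line;
}

# Hypothetical filter: read lines from $in, write cleaned lines to $out
sub clean_csv {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        print {$out} clean_line($line), "\n";
    }
}

# In-memory filehandles stand in for real input and output files
my $input  = qq{value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4\n};
my $output = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";
clean_csv($in, $out);
close $out;
print $output;
```

For real files, open the handles on file paths instead (for example with open(my $in, '<', $path)); clean_csv itself does not change.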