
CSV file: For values within double quotes, replace commas with semicolons and remove double quotes

I have a CSV file in this format:

value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4

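I want to replace the commas inside the outermost quotes of the third field with ";" and remove the inner quotes. I have tried using sed, but nothing I tried handles the nested quotes.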

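You need a recursive regular expression to match the nested quotes, and the tidiest way to change the quotes and commas is an expression substitution (the /e modifier) combined with the non-destructive transliteration (tr///r) that is available from Perl v5.14.

Like this: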

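use strict;
use warnings 'all';
use v5.14;    # tr///r needs Perl 5.14 or later

my $str = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

# (?R) recurses into the whole pattern, so a nested "..." run is consumed
# as part of the outer quoted text. In the replacement, tr turns each comma
# into a semicolon, /d deletes the quotes, and /r returns the changed copy.
$str =~ s{ " ( (?: [^"]++ | (?R) )* ) " }{ $1 =~ tr/,"/;/dr }egx;

print $str, "\n";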

Output:

value1, value2, some text in the; quotes; with commas and nested quotes; some more text, value3, value4
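To run the same substitution over many lines, it might be wrapped in a small sub. This is a minimal sketch, not part of the original answer: the name `flatten_quoted` and the first sample line are made up for illustration, and it assumes at most one quoted field per line, as in the question's sample. With several quoted fields on one line, the recursive pattern may treat the text between them as nested.

```perl
use strict;
use warnings;
use v5.14;

# Hypothetical helper: applies the substitution above to a single line.
# The /r on s/// returns the modified copy and leaves $line untouched.
sub flatten_quoted {
    my ($line) = @_;
    return $line =~ s{ " ( (?: [^"]++ | (?R) )* ) " }{ $1 =~ tr/,"/;/dr }egxr;
}

# Illustrative input; each line holds a single quoted field.
my @lines = (
    'a, "b, c", d',
    'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4',
);

say flatten_quoted($_) for @lines;
```

In a real script the array would typically be replaced by a `while (my $line = <$fh>)` loop over an open file handle.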

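It can be done this way, on two conditions: the number of quotes nested inside the quoted field is even, and the comma is the field separator.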

Note that if the CSV does not adhere to the conditions above, nothing will save it; it can never be parsed.
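A cheap sanity check for the first condition is to count the double quotes on a line: a field with paired inner quotes plus its two outer quotes always contributes an even number, so an odd total means at least one quote is unpaired. The sketch below is only a necessary check, not a sufficient one, and the helper name `has_even_quotes` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pre-check: true when the line contains an even number of
# double quotes. An even count does not prove the line is well formed.
sub has_even_quotes {
    my ($line) = @_;
    # tr with an empty replacement list only counts; it changes nothing.
    my $count = ($line =~ tr/"//);
    return $count % 2 == 0;
}

for my $line ('a, "b "c" d", e', 'a, "b "c d", e') {
    printf "%-20s %s\n", $line, has_even_quotes($line) ? 'ok' : 'unbalanced';
}
```

Lines that fail the check could be logged and skipped before the substitution runs.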

(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))

Formatted:

 (?: ^ | , )
 \s* 
 \K 
 " 
 (                             # (1 start)
      [^"]* 
      (?:                           # Inner, even number of quotes

           "
           [^"]* 
           "
           [^"]* 
      )+
 )                             # (1 end)
 "    
 (?=
      \s* 
      (?: , | $ )
 )
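The expanded layout can also be used as is: with Perl's /x modifier, whitespace and comments inside the pattern are ignored. The following is a sketch under that assumption; the names `$quoted_field` and `fix_line` are invented here and do not come from the answer.

```perl
use strict;
use warnings;
use v5.14;

# The expanded pattern compiled once with /x.
my $quoted_field = qr{
    (?: ^ | , )
    \s*
    \K                            # keep the separator out of the match
    "
    (                             # (1) field body
        [^"]*
        (?: " [^"]* " [^"]* )+    # one or more pairs of inner quotes
    )
    "
    (?= \s* (?: , | $ ) )
}x;

# Hypothetical helper: commas in the field body become semicolons, inner
# quotes are deleted, and the outer quotes are put back.
sub fix_line {
    my ($line) = @_;
    return $line =~ s{$quoted_field}{ '"' . ($1 =~ tr/,"/;/dr) . '"' }egr;
}

say fix_line('value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4');
```

Because the pattern requires at least one pair of inner quotes, an ordinary quoted field such as "b, c" is left unchanged.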

Perl sample:

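use strict;
use warnings;
use v5.14;    # tr///r needs Perl 5.14 or later

my $data = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

# Replace commas with semicolons, delete the inner quotes,
# then wrap the result in the outer quotes again.
sub innerRepl
{
    my ($in) = @_;
    return '"' . ($in =~ tr/,"/;/dr ) . '"';
}

# \K keeps the leading separator and whitespace out of the replaced text.
$data =~ s/(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))/ innerRepl( $1 ) /eg;

print $data, "\n";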

Output:

value1, value2, "some text in the; quotes; with commas and nested quotes; some more text", value3, value4
