
CSV file: For values within double quotes, replace commas with semicolons and remove the inner double quotes

I have a CSV file in this format:

value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4

I want to replace the commas within the outermost quotes of the third field with ';' and remove the inner quotes. I have tried using sed, but I could not get it to handle the nested quotes.

A recursive regex can match the nested quotes. The tidiest way to alter the quotes and commas is then an expression substitution (the /e modifier) combined with a non-destructive transliteration (tr///r), which became available in Perl v5.14.

Like this

use strict;
use warnings 'all';
use v5.14;

my $str = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

$str =~ s{ " ( (?: [^"]++ | (?R) )* ) " }{ $1 =~ tr/,"/;/dr }egx;

print $str, "\n";

output

value1, value2, some text in the; quotes; with commas and nested quotes; some more text, value3, value4
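
To see what the tr/,"/;/dr part does on its own, here is a minimal sketch. The variable names $inner and $clean are illustrative and not taken from the answer above. With /d, any character in the search list that has no counterpart in the replacement list is deleted; with /r, the result is returned and the original string is left untouched.

```perl
use strict;
use warnings;
use v5.14;

my $inner = 'a, "b", c';

# ',' maps to ';'. The double quote has no counterpart in the
# replacement list, so /d deletes it. /r returns the new string
# instead of modifying $inner in place.
my $clean = $inner =~ tr/,"/;/dr;

say $clean;   # a; b; c
say $inner;   # a, "b", c
```

Because /r never writes to its operand, the same expression can be applied to the read-only capture variable $1 inside the replacement part of s///e.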

It can also be done without recursion, keeping the outer quotes.
The criterion is that a quoted field contains an even number of inner quotes
and is delimited by commas as field separators.

Note that if the CSV does not meet this criterion, nothing will save it:
it cannot be parsed reliably.

(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))

Formatted:

 (?: ^ | , )
 \s* 
 \K 
 " 
 (                             # (1 start)
      [^"]* 
      (?:                           # Inner, even number of quotes

           "
           [^"]* 
           "
           [^"]* 
      )+
 )                             # (1 end)
 "    
 (?=
      \s* 
      (?: , | $ )
 )

Perl sample:

use strict;
use warnings;

my $data = 'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4';

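# Turn commas into semicolons and delete the inner quotes,
# then wrap the result in the outer quotes again.
# tr///r (non-destructive) needs Perl 5.14 or later.
sub innerRepl
{
    my ($in) = @_;
    return '"' . ($in =~ tr/,"/;/dr ) . '"';
}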

$data =~ s/(?:^|,)\s*\K"([^"]*(?:"[^"]*"[^"]*)+)"(?=\s*(?:,|$))/ innerRepl( $1 ) /eg;

print $data;

Output:

value1, value2, "some text in the; quotes; with commas and nested quotes; some more text", value3, value4
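
As a rough sketch of how this might be applied to more than one record, the same pattern can be compiled once with qr//x and run over a list of lines. The @lines data, the second sample record, and the names $re and @fixed are assumptions made for illustration. A quoted field with no inner quotes does not meet the pattern's "even number of inner quotes" requirement, so that line is left unchanged.

```perl
use strict;
use warnings;
use v5.14;

# Same pattern as above, spread out with /x so it can carry comments.
my $re = qr{
    (?: ^ | , ) \s* \K                  # field starts at line start or after a comma
    "
    ( [^"]* (?: " [^"]* " [^"]* )+ )    # (1) content with one or more pairs of inner quotes
    "
    (?= \s* (?: , | $ ) )               # field ends at a comma or at end of line
}x;

# Hypothetical input: the record from the question plus one plain quoted field.
my @lines = (
    'value1, value2, "some text in the, quotes, with commas and "nested quotes", some more text", value3, value4',
    'a, "plain, quoted", b',
);

# s///r returns the modified copy, so @lines itself is not changed.
my @fixed = map {
    s{$re}{ '"' . ($1 =~ tr/,"/;/dr) . '"' }egr
} @lines;

say for @fixed;
# value1, value2, "some text in the; quotes; with commas and nested quotes; some more text", value3, value4
# a, "plain, quoted", b
```

If the commas in plain quoted fields should also become semicolons, the inner group would need to allow zero pairs of inner quotes, which is a change to the pattern and not something the answer above claims to handle.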
