
Replace unembedded double quotes from specific tag of XML file using Batch script

I have a CSV file with the data below. I want to replace each unembedded " character with a blank space, but only inside the Comment tag. This tag can appear multiple times in a single record/line. I do not want to affect other tags or other " characters. The file size is about 30 MB.

ABCD ,
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<customerDetailsExtension xmlns=\"http://asdfg.net\">
<Comments>
<Comment><Date>2001-12-04</Date><AssociateID>12345</AssociateID>
<AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 34,28,37 height 5'4\". ABC</Comment>
<Priority>false</Priority><IsRead>false</IsRead>
</Comment>
<Comment>
<Date>2001-12-04</Date><AssociateID>12345</AssociateID><AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 32,24.5,34 height 5'3\". ABC</Comment><Priority>false</Priority><IsRead>false</IsRead>
</Comment>
<Comment><Date>2016-12-04</Date><AssociateID>12345</AssociateID><AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 32.5,26,36.5 height 5'5\"  ABC</Comment><Priority>false</Priority><IsRead>false</IsRead>
</Comment>
</Comments>
<EventDate>2017-06-10</EventDate>
</customerDetailsExtension>"

I don't have any knowledge of batch scripting. I tried the code below, but it is not working.

@echo off

  for /f "delims=, tokens=2" %%A in (
    'findstr /r "<Comment>.*</Comment>" "D:\data.csv"'
  ) do (
    set code=%%A
    set code=!code:"=!
    echo(!code!
)

This should work for you:

@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem Write a cleaned copy to D:\data_new.csv.
rem Lines that begin with the Comment tag lose every double quote; other lines pass through.
>D:\data_new.csv (
  for /f "tokens=*" %%A in (D:\data.csv) do (
    set "code=%%A" & if /I "!code:~0,9!" EQU "<Comment>" set "code=!code:"=!"
    echo(!code!
  )
)
rem remove the rem in next line to overwrite original file
rem copy /Y D:\data_new.csv D:\data.csv
exit /B

Or, to match only the escaped \" sequences and avoid replacing other quotation marks, use this line instead:

set "code=%%A" & if /I "!code:~0,9!" EQU "<Comment>" set "code=!code:\"=\!"
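
The batch approach above only inspects the start of each line. As a rough alternative, here is a minimal sketch in Python (not batch) that works on the raw text and handles several Comment tags on one line. It assumes the innermost Comment text never contains a `<` character and that quotes are escaped as \" as in the sample; the name `blank_comment_quotes` is made up for illustration.

```python
import re

# Matches an innermost <Comment>...</Comment> pair: the body holds no '<',
# so wrapper <Comment> elements that contain child tags are skipped.
INNER_COMMENT = re.compile(r"<Comment>([^<]*)</Comment>")


def blank_comment_quotes(text):
    """Replace each backslash-escaped quote in innermost Comment text with a space."""

    def fix(match):
        body = match.group(1).replace('\\"', " ")
        return "<Comment>" + body + "</Comment>"

    return INNER_COMMENT.sub(fix, text)


if __name__ == "__main__":
    sample = (
        '"<?xml version=\\"1.0\\"?>\n'
        "<Comment><Date>2001-12-04</Date>\n"
        "<Comment>height 5'4\\\". ABC</Comment></Comment>\""
    )
    print(blank_comment_quotes(sample))
```

This is plain text substitution, not real CSV or XML parsing, so it is only as reliable as those assumptions. A file of about 30 MB can be read into memory in one go and passed through the function.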

findstr is the wrong tool for parsing XML or, for that matter, CSV.

You have complicated examples of both, and you probably need to parse the CSV with a CSV parser and the XML with an XML parser if you want a solution that won't be brittle.

However, the fact that you're trying to remove escaped quotes in comments suggests that something else you're doing is breaking because of quote parsing. I'd suggest first reviewing what you're doing there, as this may be an XY problem.

Failing that, though, I might do something like this:

#!/usr/bin/env perl
use strict;
use warnings;

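use Text::ParseWords;
use XML::Twig;

# Twig handler: strip double quotes from the text of each inner Comment element.
sub fix_comment {
   my ( $twig, $comment ) = @_;

   my $text = $comment->text;
   $text =~ s/\"//g;
   $comment->set_text($text);
}

# Split the input into quoted, comma-separated fields. With keep = 0,
# quotewords drops the enclosing quotes and the backslash escapes.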

foreach my $entry (
   quotewords(
      ",", 0,
      do { local $/; <DATA> }
   )
  )
{

   if ( $entry =~ m/^\s*<\?xml/ms ) {
      $entry =~ s/^\s+//ms;

      #eval so we can fail gracefully if this doesn't work.
      my $twig = XML::Twig->new(
         pretty_print  => 'indented',
         twig_handlers => { 'Comment/Comment' => \&fix_comment }
      );
      eval { $twig->parse($entry) };
      if ($@) { warn $@ }
      else {
         $entry = $twig->sprint;
      }
   }
   print $entry;
}


__DATA__
DATA , " test ", 
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<customerDetailsExtension xmlns=\"http://asdfg.net\">
<Comments>
<Comment><Date>2001-12-04</Date><AssociateID>12345</AssociateID>
<AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 34,28,37 height 5'4\". ABC</Comment>
<Priority>false</Priority><IsRead>false</IsRead>
</Comment>
<Comment>
<Date>2001-12-04</Date><AssociateID>12345</AssociateID><AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 32,24.5,34 height 5'3\". ABC</Comment><Priority>false</Priority><IsRead>false</IsRead>
</Comment>
<Comment><Date>2016-12-04</Date><AssociateID>12345</AssociateID><AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 32.5,26,36.5 height 5'5\"  ABC</Comment><Priority>false</Priority><IsRead>false</IsRead>
</Comment>
</Comments>
<EventDate>2017-06-10</EventDate>
</customerDetailsExtension>", 

This isn't perfect, because I'm not entirely sure I'm capturing your line feeds properly. Text::CSV might be a more appropriate solution to the problem, but it's hard to say.
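
To illustrate the CSV-aware route, here is a minimal sketch that uses Python's standard `csv` module as a stand-in for Text::CSV (a swap, not the Perl module itself). It assumes quotes inside a quoted field are written as \" rather than "", and that innermost Comment text contains no `<`. The names `clean_field` and `clean_csv` are hypothetical.

```python
import csv
import io
import re

# Innermost <Comment>...</Comment>: the body contains no '<'.
INNER_COMMENT = re.compile(r"<Comment>([^<]*)</Comment>")


def clean_field(field):
    """Replace double quotes in innermost Comment text with spaces."""
    return INNER_COMMENT.sub(
        lambda m: "<Comment>" + m.group(1).replace('"', " ") + "</Comment>",
        field,
    )


def clean_csv(src, dst):
    """Copy CSV rows from src to dst, cleaning every field on the way."""
    # Assumption: quotes inside quoted fields are escaped as \" and not
    # doubled, so doublequote is off and the escape character is a backslash.
    options = dict(escapechar="\\", doublequote=False)
    reader = csv.reader(src, **options)
    writer = csv.writer(dst, lineterminator="\n", **options)
    for row in reader:
        writer.writerow([clean_field(field) for field in row])


if __name__ == "__main__":
    sample = 'ABCD ,\n"<a x=\\"1\\"><Comment>5\'4\\". ABC</Comment></a>"\n'
    out = io.StringIO()
    clean_csv(io.StringIO(sample), out)
    print(out.getvalue())
```

The reader unescapes \" to a plain quote, so line feeds and commas inside the quoted XML field stay part of that one field, and the writer escapes any remaining quotes again on output. When working with real files, open them with `newline=""` as the `csv` documentation recommends.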
