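Replace unembedded double quotes from specific tag of XML file using Batch script
I have a CSV file with the data below. I want to replace the unembedded double quotes with blanks, but only inside the Comment tag. This tag can occur multiple times in a single record/row. I don't want to affect other tags or characters. The file is about 30 MB.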
ABCD ,
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<customerDetailsExtension xmlns=\"http://asdfg.net\">
<Comments>
<Comment><Date>2001-12-04</Date><AssociateID>12345</AssociateID>
<AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 34,28,37 height 5'4\". ABC</Comment>
<Priority>false</Priority><IsRead>false</IsRead>
</Comment>
<Comment>
<Date>2001-12-04</Date><AssociateID>12345</AssociateID><AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 32,24.5,34 height 5'3\". ABC</Comment><Priority>false</Priority><IsRead>false</IsRead>
</Comment>
<Comment><Date>2016-12-04</Date><AssociateID>12345</AssociateID><AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 32.5,26,36.5 height 5'5\" ABC</Comment><Priority>false</Priority><IsRead>false</IsRead>
</Comment>
</Comments>
<EventDate>2017-06-10</EventDate>
</customerDetailsExtension>"
I don't know batch scripting. I tried the following, but it didn't work.
@echo off
for /f "delims=, tokens=2" %%A in (
'findstr /r "<Comment>.*</Comment>" "D:\data.csv"'
) do (
set code=%%A
set code=!code:"=!
echo(!code!
)
This should work for you:
@echo off
setlocal EnableExtensions EnableDelayedExpansion
>D:\data_new.csv (
for /f "tokens=*" %%A in (D:\data.csv) do (
set "code=%%A" & if /I "!code:~0,9!" EQU "<Comment>" set "code=!code:"=!"
echo(!code!
)
)
rem remove the rem in next line to overwrite original file
rem copy /Y D:\data_new.csv D:\data.csv
exit/B
Or use
set "code=%%A" & if /I "!code:~0,9!" EQU "<Comment>" set "code=!code:\"=\!"
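to avoid replacing other quotes: this variant only touches the backslash-escaped \" sequences, turning each one into a lone backslash.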
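For comparison, here is a minimal Python sketch of the same line-by-line idea; it is not part of the answer above. It assumes that each inner comment sits on a single line as `<Comment>text</Comment>` with no child tags, that the file is UTF-8, and that the whole `\"` sequence should be dropped rather than leaving the backslash behind. The names `strip_escaped_quotes` and `convert` and the file paths are hypothetical.

```python
import re

# Matches a <Comment> element that contains only text (no child tags),
# which is the shape of the inner comment in the sample data.
LEAF_COMMENT = re.compile(r"<Comment>([^<]*)</Comment>")


def strip_escaped_quotes(line):
    """Remove backslash-escaped quotes inside text-only <Comment> elements."""
    def fix(match):
        return "<Comment>" + match.group(1).replace('\\"', "") + "</Comment>"
    return LEAF_COMMENT.sub(fix, line)


def convert(src_path, dst_path):
    """Hypothetical helper: copy src to dst, fixing one line at a time."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(strip_escaped_quotes(line))
```

Because the pattern is not tied to the start of the line, it would also catch a comment that follows other tags on the same line, but a comment whose text spans several lines would be missed.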
findstr is the wrong tool for parsing XML or CSV.
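You have two complex formats here, and, really, if you want a solution that isn't brittle, you probably need a CSV parser for the CSV and an XML parser for the XML.
However, the fact that you are trying to strip escaped quotes inside the comments suggests that you are doing something else nasty that breaks because of quote parsing. I would suggest first reviewing what you are doing here, because this may be an XY problem.
Failing that, I would probably do something like this: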
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords;
use XML::Twig;
use Data::Dumper;
sub fix_comment {
my ( $twig, $comment ) = @_;
my $text = $comment->text;
$text =~ s/\"//g;
$comment->set_text($text);
}
#extract quoted-comma separate things.
foreach my $entry (
quotewords(
",", 0,
do { local $/; <DATA> }
)
)
{
if ( $entry =~ m/^\s*<\?xml/ms ) {
$entry =~ s/^\s+//ms;
#eval so we can fail gracefully if this doesn't work.
my $twig = XML::Twig->new(
pretty_print => 'indented',
twig_handlers => { 'Comment/Comment' => \&fix_comment }
);
eval { $twig->parse($entry) };
if ($@) { warn $@ }
else {
$entry = $twig->sprint;
}
}
print $entry;
}
__DATA__
DATA , " test ",
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<customerDetailsExtension xmlns=\"http://asdfg.net\">
<Comments>
<Comment><Date>2001-12-04</Date><AssociateID>12345</AssociateID>
<AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 34,28,37 height 5'4\". ABC</Comment>
<Priority>false</Priority><IsRead>false</IsRead>
</Comment>
<Comment>
<Date>2001-12-04</Date><AssociateID>12345</AssociateID><AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 32,24.5,34 height 5'3\". ABC</Comment><Priority>false</Priority><IsRead>false</IsRead>
</Comment>
<Comment><Date>2016-12-04</Date><AssociateID>12345</AssociateID><AssociateFirstName>ABC</AssociateFirstName>
<Comment>measurements: 32.5,26,36.5 height 5'5\" ABC</Comment><Priority>false</Priority><IsRead>false</IsRead>
</Comment>
</Comments>
<EventDate>2017-06-10</EventDate>
</customerDetailsExtension>",
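This certainly isn't perfect, because I'm not entirely sure the line feeds are captured correctly. Text::CSV might be a more suitable approach to this problem. It's hard to say.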
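To illustrate the "CSV parser for the CSV, XML parser for the XML" idea with a dedicated CSV reader, here is a rough Python sketch using only the standard library; it is a separate illustration, not a port of the Perl script above. It assumes the file escapes quotes with a backslash (as in the sample), that an XML field starts with `<?xml`, and that the target is any `Comment` element directly inside another `Comment`. The function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def fix_xml_field(field):
    """Strip double quotes from <Comment> elements nested inside a <Comment>."""
    # Bytes input lets the parser honour the declared UTF-8 encoding.
    root = ET.fromstring(field.strip().encode("utf-8"))
    for parent in root.iter():
        if local_name(parent.tag) != "Comment":
            continue
        for child in parent:
            if local_name(child.tag) == "Comment" and child.text:
                child.text = child.text.replace('"', "")
    # Note: the output has no XML declaration and uses a generated
    # namespace prefix (ns0) unless ET.register_namespace is called first.
    return ET.tostring(root, encoding="unicode")


def fix_csv(text):
    """Parse CSV with backslash-escaped quotes and repair its XML fields."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        escapechar="\\", doublequote=False)
    rows = []
    for row in reader:
        rows.append([fix_xml_field(f) if f.lstrip().startswith("<?xml") else f
                     for f in row])
    return rows
```

The CSV reader takes care of the embedded line feeds and commas inside the quoted field, so the XML parser only ever sees one complete document. Writing the rows back out with `csv.writer` and the same `escapechar` setting is left out of the sketch.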