For example, from the following string
<?xml version="1.0"?><root><point><message>hello world 1</message></point><point><data><message>hello world 2</message></data></point></root>
if I want to extract message, the result should be
hello world 1
hello world 2
Is there an easy way to do this?
All I can think of is to first find the positions of <message> and </message> and then generate substrings in a loop. Is there a better way?
Your data is not XML, so I guess you'll have to use a regular expression for that:
perl -n -E'say $1 while m{<message>(.*?)</message>}g' your_file_here.xml
If your file was proper XML, then XML::Twig would work nicely. You could even use the xml_grep tool that comes with it to do just what you want.
Update: with valid XML you can then do
xml_grep --text_only message mes.xml
or
xml_grep2 --text_only '//message' mes.xml # xml_grep2 is in App::xml_grep2
or
perl -MXML::Twig -E'XML::Twig->new( twig_handlers =>
{ message => sub { say $_->text; }, })
->parsefile( "mes.xml")'
Use an XML parser. XML::Parser in Subs mode seems good enough.
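A minimal sketch of what the Subs approach might look like, assuming the goal is only to collect the text of each message element. The package name MessageHandlers and the helper extract_messages are made up for illustration. Subs style calls a sub named after the element at its start tag and the same name plus an underscore at its end tag; it does not handle character data, so a Char handler is passed separately.

```perl
use strict;
use warnings;
use feature qw( say );
use XML::Parser;

# Hypothetical package: Subs style looks up one sub per element name here.
package MessageHandlers;

our @messages;     # text of each <message> seen so far
our $inside = 0;   # nesting depth inside <message>

# Called for every <message> start tag.
sub message  { push @messages, '' if $inside++ == 0; }

# Called for every </message> end tag (element name plus an underscore).
sub message_ { $inside--; }

package main;

# Hypothetical helper: returns the text of every <message> element.
sub extract_messages {
    my ($xml) = @_;
    @MessageHandlers::messages = ();
    $MessageHandlers::inside   = 0;

    my $parser = XML::Parser->new(
        Style    => 'Subs',
        Pkg      => 'MessageHandlers',
        # Subs style installs no Char handler, so supply one explicitly.
        # Text may arrive in several chunks, hence the append.
        Handlers => {
            Char => sub {
                my ($expat, $text) = @_;
                $MessageHandlers::messages[-1] .= $text
                    if $MessageHandlers::inside;
            },
        },
    );
    $parser->parse($xml);
    return @MessageHandlers::messages;
}

my $xml = '<?xml version="1.0"?><root><point><message>hello world 1</message></point>'
        . '<point><data><message>hello world 2</message></data></point></root>';
say for extract_messages($xml);
```

Elements without a matching sub (root, point, data here) are silently skipped, which is why only the two message subs are needed.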
Use an XML parser. I like XML::LibXML.
use strict;
use warnings;
use feature qw( say );
use XML::LibXML qw( );
my $xml = <<'__EOI__';
<?xml version="1.0"?><root>
<point><message>hello world 1</message></point>
<point><data><message>hello world 2</message></data></point>
</root>
__EOI__
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);
my $root = $doc->documentElement();
say $_->textContent() for $root->findnodes('//message');