How to extract data between a particular tag from a string in Perl?

Question

For example, from the following string

<?xml version="1.0"?><root><point><message>hello world 1</message></point><point><data><message>hello world 2</message></data></point></root>

if I want to extract the contents of each <message> element, the result should be

hello world 1
hello world 2

Is there an easy way to do this?

All I can think of is to first find the positions of <message> and </message> and then generate substrings in a loop. Is there a better way?

Answer 1

Your data is not XML, so I guess you'll have to use a regular expression for that:

perl -n -E'say $1 while m{<message>(.*?)</message>}g' your_file_here.xml
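Since the question is about a string rather than a file, the same pattern can also be applied to a scalar already in memory. This is only a minimal sketch: the variable names $xml and @messages are illustrative, and it assumes <message> elements carry no attributes, are not nested inside one another, and contain no CDATA sections or entities that need decoding.

```perl
use strict;
use warnings;
use feature qw( say );

# Sample input from the question ($xml is an illustrative name).
my $xml = '<?xml version="1.0"?><root><point><message>hello world 1</message></point>'
        . '<point><data><message>hello world 2</message></data></point></root>';

# In list context, m//g returns every captured group.
# The /s modifier lets "." match newlines, in case a message spans lines.
my @messages = $xml =~ m{<message>(.*?)</message>}gs;

say for @messages;
```

The non-greedy (.*?) stops at the first closing tag, which is why each message is captured separately instead of everything between the first opening and the last closing tag.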

If your file were proper XML, then XML::Twig would work nicely. You could even use the xml_grep tool that comes with it to do just what you want.

Update: with valid XML you can then do

xml_grep --text_only message mes.xml

or

xml_grep2 --text_only '//message' mes.xml # xml_grep2 is in App::xml_grep2

or

perl -MXML::Twig -E'XML::Twig->new( twig_handlers => 
                                      { message => sub { say $_->text; }, })
                             ->parsefile( "mes.xml")'

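If the XML is held in a string instead of a file, the same handler can probably be used with parse in place of parsefile. A rough sketch, assuming XML::Twig is installed; $xml and @messages are illustrative names, not part of the original answer:

```perl
use strict;
use warnings;
use feature qw( say );

use XML::Twig;

# Sample input from the question ($xml is an illustrative name).
my $xml = '<?xml version="1.0"?><root><point><message>hello world 1</message></point>'
        . '<point><data><message>hello world 2</message></data></point></root>';

my @messages;

XML::Twig->new(
    twig_handlers => {
        # Called once for each completely parsed <message> element.
        message => sub {
            my ($twig, $elt) = @_;
            push @messages, $elt->text;
        },
    },
)->parse($xml);    # parse() takes a string; parsefile() takes a file name

say for @messages;
```
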
Answer 2

Use an XML parser. XML::Parser in Subs mode seems good enough.
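
As a rough illustration of what Subs mode could look like here: in that style the parser calls a sub named after each element on its start tag, and the same name with a trailing underscore on its end tag. Character data is not covered by the style, so this sketch adds a separate Char handler. The names $xml, $buffer and @messages are illustrative, and XML::Parser is assumed to be installed.

```perl
use strict;
use warnings;
use feature qw( say );

use XML::Parser;

# Sample input from the question ($xml is an illustrative name).
my $xml = '<?xml version="1.0"?><root><point><message>hello world 1</message></point>'
        . '<point><data><message>hello world 2</message></data></point></root>';

my @messages;    # text collected from each <message>
my $buffer;      # defined only while the parser is inside a <message>

# Subs style: called when a <message> start tag is seen.
sub message  { $buffer = '' }

# Subs style: called when the matching </message> end tag is seen.
sub message_ { push @messages, $buffer; undef $buffer }

my $parser = XML::Parser->new(
    Style    => 'Subs',
    Pkg      => 'main',    # package in which the element subs are looked up
    Handlers => {
        # Text may arrive in several chunks, so append instead of assigning.
        Char => sub {
            my ($expat, $text) = @_;
            $buffer .= $text if defined $buffer;
        },
    },
);

$parser->parse($xml);

say for @messages;
```

Elements without a matching sub (root, point, data) are simply skipped, so only the two message subs are needed.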

Answer 3

Use an XML parser. I like XML::LibXML.

use strict;
use warnings;
use feature qw( say );

use XML::LibXML qw( );

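# The XML declaration must be the very first thing in the document,
# so the heredoc body is not indented.
my $xml = <<'__EOI__';
<?xml version="1.0"?><root>
<point><message>hello world 1</message></point>
<point><data><message>hello world 2</message></data></point>
</root>
__EOI__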

my $parser = XML::LibXML->new();
my $doc    = $parser->parse_string($xml);
my $root   = $doc->documentElement();

say $_->textContent() for $root->findnodes('//message');

3 answers

solution1
3 ACCEPTED 2011-09-27 14:38:45

solution2
2 2011-09-27 14:11:12

solution3
1 2011-09-27 17:06:33
