Extract data from an XML file with XML::LibXML
I have an XML file like this, containing thousands of entries:
<mediawiki>
<page>
<title>page1</title>
<revision>
<id>2621</id>
<parentid>6</parentid>
<timestamp>2005-10-09T01:00:18Z</timestamp>
<contributor>
<username>Chaos</username>
<id>2</id>
</contributor>
<model>wikitext</model>
<format>text/x-wiki</format>
<text xml:space="preserve">text1</text>
</revision>
</page>
<page>
<title>page2</title>
<ns>8</ns>
<id>7</id>
<revision>
<id>2619</id>
<parentid>2618</parentid>
<timestamp>2005-10-09T00:56:39Z</timestamp>
<contributor>
<username>Chaos</username>
<id>2</id>
</contributor>
<model>wikitext</model>
<format>text/x-wiki</format>
<text xml:space="preserve">text2</text>
</revision>
</page>
<page>
<title>page3</title>
<ns>8</ns>
<id>6</id>
<revision>
<id>2621</id>
<parentid>6</parentid>
<timestamp>2005-10-09T01:00:18Z</timestamp>
<contributor>
<username>Chaos</username>
<id>2</id>
</contributor>
<model>wikitext</model>
<format>text/x-wiki</format>
<text xml:space="preserve">text3</text>
</revision>
</page>
</mediawiki>
With my script, each page must end up in its own text file, named after the content of the <title> tag and containing the content of <text xml:space="preserve"></text>.
My code:
my $filename = "pages.xml";
my $parser = XML::LibXML->new();
my $xmldoc = $parser->parse_file( $filename );
my $file;
foreach my $page ( $xmldoc->findnodes( '/mediawiki/page' ) ) {
foreach my $title ( $page->findnodes( '/mediawiki/page/title' ) ) {
foreach my $rev ( $page->findnodes( '/mediawiki/page/revision' ) ) {
foreach my $text ( $rev->findnodes( 'text/text()' ) ) {
$file = $title->to_literal();
my $newfile = "$file.txt";
open( my $out, '>:utf8', $newfile )
or die "Unable to open '$newfile' for write: $!";
my $texte = $text->data;
print $out "$text\n";
close $out;
}
}
}
}
The problem is that every generated file contains the content of the last <text xml:space="preserve"></text> tag.
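Your mistake is nesting all those for loops instead of using relative XPath expressions. A path that starts with / is evaluated from the document root even when it is called on $page, so for every page the inner loops visit every title and every revision in the file. Each output file is rewritten over and over, and the last text always wins.
This should do what you want: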
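To make the difference between absolute and relative paths concrete, here is a minimal sketch. It assumes a hypothetical two-page document trimmed down from the sample above, and only compares what the two kinds of expression return when called on the first <page> node.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical two-page document, trimmed down from the sample in the question.
my $xml = <<'XML';
<mediawiki>
  <page><title>page1</title><revision><text>text1</text></revision></page>
  <page><title>page2</title><revision><text>text2</text></revision></page>
</mediawiki>
XML

my $doc = XML::LibXML->load_xml( string => $xml );
my ($first_page) = $doc->findnodes('/mediawiki/page');

# Absolute path: evaluated from the document root, so the context node is ignored.
my @absolute = map { $_->textContent } $first_page->findnodes('/mediawiki/page/title');

# Relative path: evaluated from $first_page, so only its own <title> matches.
my @relative = map { $_->textContent } $first_page->findnodes('title');

print "absolute: @absolute\n";    # absolute: page1 page2
print "relative: @relative\n";    # relative: page1
```

The absolute expression returns both titles even though it was called on a single page, which is why the nested loops in the question write every file several times.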
use utf8;
use strict;
use warnings 'all';
use feature 'say';
STDOUT->autoflush;
use XML::LibXML;
my $filename = "pages.xml";
my $doc = XML::LibXML->load_xml( location => $filename );
for my $page ( $doc->findnodes('/mediawiki/page') ) {
my ($title) = $page->findnodes('title');
my $file = $title->textContent;
my ($rev_text) = $page->findnodes('revision/text');
my $text = $rev_text->textContent;
open my $fh, '>:utf8', $file
or die qq{Unable to open "$file" for output: $!};
print $fh "$text\n";
close $fh;
say qq{File "$file" written with "$text"};
}
Output:
File "page1" written with "text1"
File "page2" written with "text2"
File "page3" written with "text3"
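The question asked for files named after the title with a .txt extension, which the script above leaves out. A possible variant is sketched below, under a few assumptions: page_files is a hypothetical helper name, the inline XML stands in for pages.xml, and each page has exactly one revision as in the sample. It uses findvalue, which returns the string value of an XPath expression and would concatenate the texts if a page had several revisions.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: map "<title>.txt" to the text of the page's revision.
sub page_files {
    my ($doc) = @_;
    my %files;
    for my $page ( $doc->findnodes('/mediawiki/page') ) {
        # Both paths are relative, so they are resolved inside the current page.
        my $title = $page->findvalue('title');
        my $text  = $page->findvalue('revision/text');
        $files{"$title.txt"} = $text;
    }
    return %files;
}

# Inline stand-in for pages.xml (an assumption made for this example).
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<mediawiki>
  <page><title>page1</title><revision><text xml:space="preserve">text1</text></revision></page>
  <page><title>page2</title><revision><text xml:space="preserve">text2</text></revision></page>
</mediawiki>
XML

my %files = page_files($doc);

# Write one file per page, in a stable order.
for my $name ( sort keys %files ) {
    open my $fh, '>:encoding(UTF-8)', $name
        or die qq{Unable to open "$name" for output: $!};
    print $fh "$files{$name}\n";
    close $fh;
}
```

Collecting the name/text pairs before opening any file makes it easy to inspect the mapping first. Real MediaWiki titles can contain characters such as / that are not valid in file names, so a real run would probably need to sanitize $title as well.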