Perl decoding XML into hash
I need to decode a complex XML structure. The XML looks like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<MainNode comment="foo">
<FirstMainBranch>
<Struct>
<String name="aStringValueUnderMainBranch" comment="Child node under first main branch"/>
<String name="anotherStringValueUnderMainBranch" comment="Child node under first main branch"/>
<Integer name="anIntegerValueUnderMainBranch" comment="Child node under first main branch"/>
<List name="aList" comment="According to me this node should be an array, it could contain one or more child elements">
<Struct comment="The node name means that, the child nodes are grouped, I think that the most appropriate structure here is hash.
The node itself doesn't have name attribute, which means that it only shows the type of the element">
<String name="first" comment="
Default Value: 0
"/>
<Long name="second" comment="
Default Value: 0
"/>
<Long name="third" comment="
Default Value: 0
"/>
</Struct>
</List>
<List name="secondList" comment="According to me this node should be array, it could contain one or more child elements">
<Struct comment="The node name means that, the child nodes are grouped, I think that the most appropriate structure here is hash.
The node itself doesn't have name attribute, which means that it only shows the type of the element
">
<String name="first" comment="
Default Value: 0
"/>
<Long name="second" comment="
Default Value: 0
"/>
</Struct>
</List>
<Struct name="namedStruct" comment="Here the struct element has a name, which means that it should be decoded
">
<List name="thirdList" comment="Again list, but now it is inside struct element, and it contains struct element
">
<Struct comment="The node name means that, the child nodes are grouped, I think that the most appropriate structure here is hash.">
<Integer name="first" comment="Child element of the struct"/>
</Struct>
</List>
</Struct>
</Struct>
</FirstMainBranch>
<SecondMainBranch>
<Struct comment="">
<Struct name="namedStructAgain" comment="
">
<String name="First" comment="
"/>
<String name="Second" comment=""/>
</Struct>
</Struct>
</SecondMainBranch>
</MainNode>
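I think the most suitable container is a hash (if you disagree, please tell me). I find it hard to decode because:
The main nodes have no "name" attribute, but they should be present in the final structure.
Child nodes should be read only when they have a "name" attribute, but their data type (structure) depends on a parent element that is itself not decoded.
Some of these parent elements do have a "name" attribute; in that case they should be present in the final structure.
I don't care about data types such as Integer, Long, DateTime, etc.; they will all be read as strings. The main problem here is the List and Struct types.
Here is my clumsy attempt at the task: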
use XML::LibXML;
use Data::Dumper;
use strict;
use warnings;
my $parser=XML::LibXML->new();
my $file="c:\\joro\\Data.xml";
my $xmldoc=$parser->parse_file($file);
sub buildHash{
    my $mainParentNode=$_[0];
    my $mainHash=\%{$_[1]};
    my ($waitNextNode,$isArray,$arrayNode);
    $waitNextNode=0;
    $isArray=0;
    sub xmlStructure{
        my $parentNode=$_[0];
        my $href=\%{$_[1]};
        my ($name, %tmp);
        my $parentType=$parentNode->nodeName();
        $name=$parentNode->findnodes('@name');
        foreach my $currentNode($parentNode->findnodes('child::*')){
            my $type=$currentNode->nodeName();
            if ($type&&$type eq 'List'){
                $isArray=1;
            }
            elsif($type&&$type ne 'List'&&$parentType ne 'List'){
                $isArray=0;
                $arrayNode=undef;
            }
            if ($type&&!$currentNode->findnodes('@name')&&$type eq 'Struct'){
                $waitNextNode=1;
            }
            else{
                $waitNextNode=0;
            }
            if ($type&&$type ne 'List'&&$type ne 'Struct'&&!$currentNode->findnodes('@name')){
                #$href->{$currentNode->nodeName()}={};
                xmlStructure($currentNode,$href->{$currentNode->nodeName()});
            }
            # elsif ($type&&$type eq 'List'&&$currentNode->findnodes('@name')){
            # print "2\n";
            # $href->{$currentNode->findnodes('@name')}=[];
            # xmlStructure($currentNode,$href->{$currentNode->findnodes('@name')});
            # }
            elsif ($type&&$type ne 'List'&&$currentNode->findnodes('@name')&&$parentType eq 'List'){
                push(@{$href->{$currentNode->findnodes('@name')}},$currentNode->findnodes('@name'));
                xmlStructure($currentNode,$href->{$currentNode->findnodes('@name')});
            }
            # elsif ($type&&$type ne 'List'&&!$currentNode->findnodes('@name')&&$parentType eq 'List'){
            # print "4\n";
            # push(@{$$href->{$currentNode->findnodes('@name')}},{});
            ##print Dumper %{$arrayNode};
            # xmlStructure($currentNode,$href->{$currentNode->findnodes('@name')});
            # }
            else{
                xmlStructure($currentNode,$href->{$currentNode->findnodes('@name')});
            }
        }
    }
    xmlStructure($mainParentNode,$mainHash);
}
my %href;
buildHash($xmldoc->findnodes('*'),\%href);
print "Printing the real HASH\n";
print Dumper %href;
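But there is still a long way to go, because: 1. There is a spurious level between the keys and the values (the '' key in the output), probably caused by an undefined element. 2. I can't find a way to change the data type from a hash to an array where that is needed.
Here is the output: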
$VAR1 = 'FirstMainBranch';
$VAR2 = {
  '' => {
    'aList' => {
      '' => {
        'third' => {},
        'second' => {},
        'first' => {}
      }
    },
    'namedStruct' => {
      'thirdList' => {
        '' => {
          'first' => {}
        }
      }
    },
    'anotherStringValueUnderMainBranch' => {},
    'secondList' => {
      '' => {
        'second' => {},
        'first' => {}
      }
    },
    'aStringValueUnderMainBranch' => {},
    'anIntegerValueUnderMainBranch' => {}
  }
};
$VAR3 = 'SecondMainBranch';
$VAR4 = {
  '' => {
    'namedStructAgain' => {
      'First' => {},
      'Second' => {}
    }
  }
};
Any help would be appreciated. Thanks in advance.
EDIT: Regarding Sobrique's comment about the XY problem:
Here is a sample string that I need to parse:
(1,2,"N/A",-1,"foo","bar",NULL,3,2016-03-18 08:12:00.000,2016-03-18 08:12:00.559,2016-03-18 08:12:00.520,0,0,NULL,"foo","123456789",{NULL,NULL,NULL,NULL,NULL,NULL,2016-04-17 11:59:59.999,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,null,NULL,NULL,NULL,NULL,3,0,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,T,0,NULL,NULL,NULL,"9876543210",NULL,"foo","0","bar","foo","a1820000264d979c","0,0",NULL,"foo","192.168.1.82","SOAP",NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL},{INPUT="bar"},{aStringValueUnderMainBranch="ET", aList[{"first", "second", "third"}, {"first", "second", "third"}], secondList[{"first", "second"}, {"first", "second"}],namedStruct{thirdList[{first},{first}]}},{namedStructAgain{"first", "second"}},NULL,NULL,NULL,NULL,NULL)
Somehow I have to separate all the values and then identify this part:
{aStringValueUnderMainBranch="ET", aList[{"first", "second", "third"}, {"first", "second", "third"}], secondList[{"first", "second"}, {"first", "second"}],namedStruct{thirdList[{first},{first}]}}
as FirstMainBranch, and parse the corresponding values as described in the XML. After that, I have to identify:
{namedStructAgain{"first", "second"}}
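as SecondMainBranch, and get the corresponding values. There is an additional problem with separating the main data: commas that appear between brackets must not be treated as field separators.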
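The bracket-aware splitting described above can be sketched roughly as follows. This is only an illustration under assumptions: `split_top_level` is a hypothetical helper name, the record is shortened from the sample string, and double quotes are assumed never to contain an escaped quote.

```perl
use strict;
use warnings;

# Hypothetical helper: split a record on commas, but only on commas that
# sit outside any (), [], {} group and outside double quotes.
sub split_top_level {
    my ($text) = @_;
    my @fields;
    my $current   = '';
    my $depth     = 0;   # current bracket nesting level
    my $in_quotes = 0;   # true while inside "..."

    for my $char (split //, $text) {
        if ($char eq '"') {
            $in_quotes = !$in_quotes;
        }
        elsif (!$in_quotes) {
            if    ($char =~ /[\[\{\(]/) { $depth++ }
            elsif ($char =~ /[\]\}\)]/) { $depth-- }
            elsif ($char eq ',' && $depth == 0) {
                # Top-level separator: close the current field.
                push @fields, $current;
                $current = '';
                next;
            }
        }
        $current .= $char;
    }
    push @fields, $current;

    s/^\s+|\s+$//g for @fields;   # trim surrounding whitespace
    return @fields;
}

# Shortened, made-up record in the same shape as the sample string.
my $record = '(1,"N/A",{NULL,NULL},{a="ET", aList[{"first", "second"}]},'
           . '{namedStructAgain{"first", "second"}},NULL)';

# Drop the outer parentheses before splitting.
(my $body = $record) =~ s/^\(|\)$//g;

my @fields = split_top_level($body);
print "$_\n" for @fields;
```

Each braced group comes back as a single field, so the groups that correspond to FirstMainBranch and SecondMainBranch can then be picked out by position and parsed separately.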
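I would take a different approach. Instead of converting the XML to a hash, I would map it to objects using XML::Rabbit. I wrote a small article with a complete working example of how to use it.
XML::Rabbit has a number of advantages:
If your XML file is small enough to be handled with XPath and DOM, I find this approach very clean and easy to maintain.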
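For comparison, if a plain hash is still wanted, the rules listed in the question can also be sketched directly with XML::LibXML. This is a minimal sketch under assumptions, not a definitive implementation: `node_to_value` is a hypothetical helper name, the inline XML is a shortened version of the question's document, and each leaf is stored as its element type because the XML carries no values.

```perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

# Hypothetical helper: turn one element into a Perl value.
#   List           -> array ref of its children
#   leaf (String,  -> the element type as a placeholder string
#     Long, ...)
#   anything else  -> hash ref keyed by the children's "name" attribute
sub node_to_value {
    my ($node) = @_;
    my $type     = $node->nodeName;
    my @children = $node->findnodes('./*');

    if ($type eq 'List') {
        return [ map { node_to_value($_) } @children ];
    }
    if ($type ne 'Struct' && !@children) {
        return $type;    # assumption: no real value is available here
    }

    my %hash;
    for my $child (@children) {
        my $value = node_to_value($child);
        if ($child->hasAttribute('name')) {
            $hash{ $child->getAttribute('name') } = $value;
        }
        elsif (ref($value) eq 'HASH') {
            # Unnamed Struct: merge its members instead of adding a '' key.
            @hash{ keys %$value } = values %$value;
        }
        # Unnamed leaves are skipped, as the question asks.
    }
    return \%hash;
}

my $xml = <<'XML';
<MainNode>
  <FirstMainBranch>
    <Struct>
      <String name="aString"/>
      <List name="aList">
        <Struct>
          <String name="first"/>
          <Long name="second"/>
        </Struct>
      </List>
      <Struct name="namedStruct">
        <Integer name="inner"/>
      </Struct>
    </Struct>
  </FirstMainBranch>
</MainNode>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Main branches have no "name" attribute, so key them by element name.
my %result;
for my $branch ($doc->documentElement->findnodes('./*')) {
    $result{ $branch->nodeName } = node_to_value($branch);
}

$Data::Dumper::Sortkeys = 1;
print Dumper(\%result);
```

Note that `Dumper` is given a reference (`\%result`); passing the hash itself flattens it into a list of separate `$VAR` entries, as seen in the question's output.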