
XML Parsing in PHP with domDocument

I have an XML document that looks like this:

<theme>
<name>Test</name>
<thumb>http://ecample.com/bla.jpg</thumb>;
<template>
<name>Hello</name>
<html>
<body> 
<div id="hell">
<input type="text" name="text1" id="text1" value="Type Some thing"/>
<input type="button" name="button1" id="button1" value="Button" />

<div class="hello">
<p>here is a paragraph</p>
</div>
<div class="hello123">
    <p><a href="#">Click Me!</a>here is a paragraph again!</p>
</div>
<textarea name="hello"></textarea>
</div>
</body> 
</html>
<css> CODE STUFF </css>
<javascript> CODE STUFF </javascript>
</template>
<template>
<name>World!</name>
<html> CODE STUFF </html>
<css> CODE STUFF </css>
<javascript> CODE STUFF </javascript>
</template>
</theme>

I want to get all the HTML tags exactly as they appear inside the body tag, but when I read the html element using DOMDocument, most of the tags are missing. This is my code:

$doc = new DOMDocument();
$doc->loadXML( $xml_file_string );//xml file loading here
$themes = $doc->getElementsByTagName( "theme" );
foreach( $themes as $theme )
{
    $theme_name = $theme->getElementsByTagName( "name" );
    $theme_thumb = $theme->getElementsByTagName( "thumb" );
    $theme_name = $theme_name->item(0)->nodeValue;
    $theme_thumb = $theme_thumb->item(0)->nodeValue;
    echo $theme_name.'<br>';
    echo $theme_thumb.'<br>';
    $templates = $theme->getElementsByTagName( "template" );
    foreach( $templates as $template )
    {
        $template_name = $template->getElementsByTagName( "name" );
        $template_name = $template_name->item(0)->nodeValue;
        $template_html = $template->getElementsByTagName( "html" );
        $template_html = $template_html->item(0)->nodeValue;
        $template_css  = $template->getElementsByTagName( "css" );
        $template_css  = $template_css->item(0)->nodeValue;
        $template_javascript = $template->getElementsByTagName( "javascript" );
        $template_javascript = $template_javascript->item(0)->nodeValue;
        echo $template_name.'<br>';
        echo html_entity_decode($template_html).'<br>';
        echo $template_css.'<br>';
        echo $template_javascript.'<br>';
    }
}

and the result I am getting is:

Test http://ecample.com/bla.jpg Hello {{rating}} {{content}} here is a paragraph Click Me!here is a paragraph again! CODE STUFF CODE STUFF World! CODE STUFF CODE STUFF CODE STUFF

As you can see, most of the HTML is missing. Please help.

First, you have to understand that getElementsByTagName returns a DOMNodeList, a collection of DOMNode objects, and that the other getters likewise hand back DOMNode objects. If a node contains only text that is not wrapped in any tag, that text is returned by the nodeValue property, which is how you get the template name. But for an element with child elements, nodeValue holds only the concatenated text of its descendants, not their HTML, so the tags are lost. You have to build the markup yourself by serializing the node. Here is an example:

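// $child is the DOMNode you want as markup, e.g. $template_html->item(0)
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true)); // deep copy, children included
$html = trim($tmp_dom->saveHTML());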
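
To make the difference concrete, here is a minimal self-contained sketch that compares nodeValue with serialized markup for the same node. The sample XML string and the variable names are made up for illustration. It also shows a possible shortcut, assuming PHP 5.3.6 or later: saveHTML() accepts an optional node argument, so the temporary document is not strictly needed.

```php
<?php
// Hypothetical sample input, shortened from the XML in the question.
$xml = '<template><name>Hello</name><html><body><p>Hi <a href="#">there</a></p></body></html></template>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$htmlNode = $doc->getElementsByTagName('html')->item(0);

// nodeValue concatenates the text of all descendants; the tags are dropped.
$textOnly = $htmlNode->nodeValue;

// Approach from the answer: deep-copy the node into a temporary document
// and serialize that document.
$tmp = new DOMDocument();
$tmp->appendChild($tmp->importNode($htmlNode, true));
$viaImport = trim($tmp->saveHTML());

// Possible shortcut: serialize only this subtree of the original document.
$viaNode = trim($doc->saveHTML($htmlNode));

echo $textOnly, "\n";
echo $viaImport, "\n";
echo $viaNode, "\n";
```

The first echo prints plain text only, while the other two keep the p and a tags.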

So your code should look like this:

$doc = new DOMDocument();
$doc->loadXML( $xml_file_string );//xml file loading here
$themes = $doc->getElementsByTagName( "theme" );
foreach( $themes as $theme )
{
    $theme_name = $theme->getElementsByTagName( "name" );
    $theme_thumb = $theme->getElementsByTagName( "thumb" );
    $theme_name = $theme_name->item(0)->nodeValue;
    $theme_thumb = $theme_thumb->item(0)->nodeValue;
    echo $theme_name.'<br>';
    echo $theme_thumb.'<br>';
    $templates = $theme->getElementsByTagName( "template" );
    foreach( $templates as $template )
    {
        $template_name = $template->getElementsByTagName( "name" );
        $template_name = $template_name->item(0)->nodeValue;
        $template_html = $template->getElementsByTagName( "html" );

        //HERE IS CHANGE
        $tmpHtml = new DOMDocument();
        $tmpHtml->appendChild($tmpHtml->importNode($template_html->item(0), true)); 
        $template_html = trim($tmpHtml->saveHTML());

        //REST OF CODE
    }
}

I've only made the change for $template_html, but I think you can now do the rest.
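
Since the question asks for the tags inside body rather than the whole html element, one possible variation is to serialize only the children of body. This is a sketch and not part of the original answer: the helper name innerMarkup and the sample XML are hypothetical, and it again assumes PHP 5.3.6 or later so that saveHTML() can take a node argument.

```php
<?php
// Hypothetical helper: returns the markup of all children of $node,
// without the tag of $node itself.
function innerMarkup(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that subtree.
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

// Hypothetical sample input, shortened from the XML in the question.
$xml = '<template><html><body>'
     . '<div class="hello"><p>here is a paragraph</p></div>'
     . '<textarea name="hello"></textarea>'
     . '</body></html></template>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$body = $doc->getElementsByTagName('body')->item(0);
$bodyMarkup = innerMarkup($body);

echo $bodyMarkup, "\n";
```

saveHTML() is used here rather than saveXML() because the XML serializer would collapse the empty textarea into a self-closing tag, which browsers do not treat as a closed textarea.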
