Getting the text inside a paragraph tag in a specific node of an XML file

Question

I have this XML file:

http://www.metacafe.com/tags/cats/rss.xml

With this code:

$xml = simplexml_load_file('http://www.metacafe.com/tags/cats/rss.xml', 'SimpleXMLElement', LIBXML_NOCDATA);
echo $xml->channel->item->title . "<br>";
echo $xml->channel->item->description . "<br>";

I get this output:

Dad Challenges Kids to Climb Walls to Get Candy<br>
<a href="http://www.metacafe.com/watch/cb-M0fIp1ctKtsn/dad_challenges_kids_to_climb_walls_to_get_candy/"><img src="http://s3.mcstatic.com/thumb/11150410/28824820/4/directors_cut/0/1/dad_challenges_kids_to_climb_walls_to_get_candy.jpg?v=1" align="right" border="0" alt="Dad Challenges Kids to Climb Walls to Get Candy" vspace="4" hspace="4" width="134" height="78" /></a>
                <p>
                Nick Dietz compiles some of the week's best viral videos, 
                including an elephant trying really hard to break a stick, a cat
                sunbathing and kids climbing up the walls to get candy. Plus, 
                making  music with a Ford Fiesta.                              
                <br>Ranked <strong>4.00</strong> / 5 | 78 views | <a href="http://www.metacafe.com/watch/cb-M0fIp1ctKtsn/dad_challenges_kids_to_climb_walls_to_get_candy/">0 comments</a><br/>
                </p>
                <p>
                 <a href="http://www.metacafe.com/watch/cb-M0fIp1ctKtsn/dad_challenges_kids_to_climb_walls_to_get_candy/"><strong>Click here to watch the video</strong></a> (02:38)<br/>
                    Submitted By:                       <a href="http://www.metacafe.com/channels/CBS/">CBS</a><br/>
                    Tags:
                    <a href="http://www.metacafe.com/topics/penna/">Penna</a>&nbsp;
                    <a href="http://www.metacafe.com/topics/bjbj/">Bjbj</a>&nbsp;
                    <a href="http://www.metacafe.com/topics/ciao/">Ciao</a>&nbsp;                   <br/>
                    Categories: <a href='http://www.metacafe.com/videos/entertainment/'>Entertainment</a>
               </p>

        <br>

I need to get this output instead (so all the other elements have to be removed):

Dad Challenges Kids to Climb Walls to Get Candy
Nick Dietz compiles some of the week's best viral videos, 
including an elephant trying really hard to break a stick, a cat
sunbathing and kids climbing up the walls to get candy. Plus, 
making  music with a Ford Fiesta.

I don't know how to proceed to get this result.

Answer 1

The reason you're getting the elements inside description is the CDATA section. For the XML parser, the content of a CDATA section is always text. Elements like <p> inside it are not read into the DOM structure.
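To see this in isolation, here is a minimal sketch that uses an inline sample instead of the live feed. The $sample string is a made-up, shortened stand-in for one feed item (an assumption, not the real feed content); the functions called are standard PHP.

```php
<?php
// Hypothetical, shortened stand-in for one item of the feed.
$sample = <<<'XML'
<rss version="2.0">
  <channel>
    <item>
      <title>Sample title</title>
      <description><![CDATA[<p>Sample text<br>Ranked <strong>4.00</strong></p>]]></description>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($sample, 'SimpleXMLElement', LIBXML_NOCDATA);
$description = $xml->channel->item->description;

// The CDATA content arrives as a single string; its markup is not parsed.
$childCount = $description->count();   // number of child elements
$text = (string) $description;         // raw text, tags included

var_dump($childCount);
var_dump(strpos($text, '<p>') !== false);
```

The description element has no child elements, and the <p> tag is still present in its string value, which is why the markup shows up in the output.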

A simple strip_tags() will delete all elements. For more control, you need to load the HTML fragment into a DOM:

$html = <<<'HTML'
<a href="http://www.metacafe.com/watch/cb-M0fIp1ctKtsn/dad_challenges_kids_to_climb_walls_to_get_candy/"><img src="http://s3.mcstatic.com/thumb/11150410/28824820/4/directors_cut/0/1/dad_challenges_kids_to_climb_walls_to_get_candy.jpg?v=1" align="right" border="0" alt="Dad Challenges Kids to Climb Walls to Get Candy" vspace="4" hspace="4" width="134" height="78" /></a>
                <p>
                Nick Dietz compiles some of the week's best viral videos, 
                including an elephant trying really hard to break a stick, a cat
                sunbathing and kids climbing up the walls to get candy. Plus, 
                making  music with a Ford Fiesta.                              
                <br>Ranked <strong>4.00</strong> / 5 | 78 views | <a href="http://www.metacafe.com/watch/cb-M0fIp1ctKtsn/dad_challenges_kids_to_climb_walls_to_get_candy/">0 comments</a><br/>
                </p>
                <p>
                 <a href="http://www.metacafe.com/watch/cb-M0fIp1ctKtsn/dad_challenges_kids_to_climb_walls_to_get_candy/"><strong>Click here to watch the video</strong></a> (02:38)<br/>
                    Submitted By:                       <a href="http://www.metacafe.com/channels/CBS/">CBS</a><br/>
                    Tags:
                    <a href="http://www.metacafe.com/topics/penna/">Penna</a>&nbsp;                 <br/>
                    Categories: <a href='http://www.metacafe.com/videos/entertainment/'>Entertainment</a>
               </p>

        <br>
HTML;

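// Parse the HTML fragment; DOMDocument adds the missing <html> and <body>.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// string() returns the first text node of the first <p> as a string.
$content = $xpath->evaluate("string(//p[1]/text())");
var_dump($content);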

The XPath expression

//p[1]/text() selects the text nodes directly inside the first p. The string() function converts the first node of that set into a string. If no such node exists, the expression returns an empty string.
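Putting the pieces together, the following is a rough sketch rather than a definitive implementation. The helper name firstParagraphText() and the shortened $html sample are assumptions made for illustration, and collapsing whitespace into single spaces is an optional choice, not part of the answer above. It also runs strip_tags() on the same input for comparison.

```php
<?php
/**
 * Hypothetical helper: returns the leading text of the first <p> in an
 * HTML fragment, with runs of whitespace collapsed to single spaces.
 */
function firstParagraphText(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }

    // Feed HTML is often not valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return '';
    }

    $xpath = new DOMXPath($dom);
    $text = $xpath->evaluate('string(//p[1]/text())');

    // Assumption: whitespace is normalized for display.
    return trim((string) preg_replace('/\s+/', ' ', $text));
}

// Shortened, made-up sample in the shape of the feed's description.
$html = <<<'HTML'
<a href="#"><img src="thumb.jpg" alt="thumbnail" /></a>
<p>
    Nick Dietz compiles some of the week's best viral videos.
    <br>Ranked <strong>4.00</strong> / 5 | 78 views
</p>
<p><a href="#">Click here to watch the video</a> (02:38)</p>
HTML;

$first = firstParagraphText($html);

// For comparison: strip_tags() keeps the text of every element.
$all = trim((string) preg_replace('/\s+/', ' ', strip_tags($html)));

echo $first, "\n";
echo $all, "\n";
```

The DOM route stops at the first <br>, so the ranking line and the second paragraph are left out, while the strip_tags() result still contains them.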

Answer 1 (1, accepted), 2013-11-25 18:24:20
