PHP DOMdocument echoing problem

$content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
    <div class="popular-video-image">
        <a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
            <img src="/images/topvideo/1.jpg" alt=""/>
        </a>
        <span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
        <span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
    </div>';

    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($content);
    foreach ($dom->getElementsByTagName('a') as $node)
    {
        $node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
    }
    $dom->formatOutput = true;

    echo $dom->saveXml($dom->documentElement);

Output:

<html>
  <body>
    <div class="popular-video-image">&#13;
        <a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;">&#13;
            <img src="/images/topvideo/1.jpg" alt=""/></a>&#13;
        <span class="popular-video-artist ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;" class="ellipsis">Far East Movement</a></span>&#13;
        <span class="popular-video-title ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;" class="ellipsis">Like a G6</a></span>&#13;
    </div>

  </body>
</html>

I do not want the html and body tags to be added. I also do not want the <lang ...> tag inside the title attribute to be escaped to &lt;lang ...&gt;. The &#13; entities are unnecessary as well.

I want to get back the same content that I passed in, with only the links modified.

Sorry for my bad English!

You are seeing &#13; at the end of each line because your HTML has Windows-style line endings (CR+LF), and the carriage return is written out as the entity &#13;. To get rid of them, convert the string to Unix-style line endings (LF) before you feed it into DOMDocument:

$content = preg_replace('/\r\n/', "\n", $content);
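The same normalization can be done without a regular expression. The following is a minimal sketch, not part of the original answer: the helper name normalizeLineEndings and the sample string are made up for illustration, and it also converts lone CR characters, which the pattern above leaves alone.

```php
<?php
// Hypothetical helper: convert CRLF and lone CR line endings to LF.
function normalizeLineEndings(string $text): string
{
    // CRLF is replaced first, so the lone-CR pass cannot turn one line break into two.
    return str_replace(["\r\n", "\r"], "\n", $text);
}

// Illustrative input with Windows-style line endings.
$content = "<div>\r\n    <a href=\"video/\">Link</a>\r\n</div>";
$content = normalizeLineEndings($content);

// json_encode makes any remaining control characters visible.
echo json_encode($content), "\n";
```

Run this before loadHTML so that no carriage returns reach the parser.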

saveXml takes an optional parameter that lets you specify which node to output.

$dom->saveXml($dom->documentElement->firstChild->firstChild);

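This leaves the html and body tags out of the output, because only the first child of body is serialized. It assumes that body is the first child of the document element and that your content has a single top-level element.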
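When the content has more than one top-level node, serializing a single node drops the rest. A minimal sketch of one way around that, assuming the markup is loaded with loadHTML as in the question: loop over the children of body and serialize each one. The sample markup and the $out variable are illustrative only.

```php
<?php
// Illustrative input with two top-level elements.
$content = '<div class="a"><a href="video/x/">X</a></div><p>second</p>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($content);
libxml_clear_errors();

// loadHTML wraps the fragment in html/body; serialize only what is inside body.
$body = $dom->getElementsByTagName('body')->item(0);
$out = '';
foreach ($body->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}

echo $out, "\n";
```

Passing a node to saveHTML keeps HTML serialization rules, so the result stays closer to the input than XML output would.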

I guess that the <html> and <body> tags are added because you are using loadHTML. You could try loadXML instead, but note that it only accepts well-formed XML.
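Whether loadXML accepts a given snippet is easy to check. The sketch below is an illustration, not part of the original answer: it uses a shortened, made-up version of the link from the question, with the literal < inside the title attribute, and reports what the XML parser says about it.

```php
<?php
// Collect libxml errors so they can be inspected instead of printed as warnings.
libxml_use_internal_errors(true);

// Shortened sample: a literal "<" inside an attribute value, as in the question.
$snippet = '<a href="video/" title="<lang video_go_to=Demo>">Demo</a>';

$dom = new DOMDocument;
$loaded = $dom->loadXML($snippet);
$errors = libxml_get_errors();
libxml_clear_errors();

echo $loaded ? "parsed as XML\n" : "not well-formed XML\n";
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
```

XML does not allow an unescaped < in attribute values, so markup like this has to be escaped first or loaded with loadHTML.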

As for &lt;lang&gt;, it has to be escaped because otherwise the resulting XML would not be well-formed. If it is causing you problems, you should change your approach a little and work with it, not against it.
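One way to work with the escaping is to read values through the DOM rather than from the serialized string. A small sketch, using a shortened, made-up version of the link from the question: getAttribute returns the raw value, and the entities appear only when the node is written out.

```php
<?php
// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML('<a href="video/" title="<lang video_go_to=Demo>">Demo</a>');
libxml_clear_errors();

$a = $dom->getElementsByTagName('a')->item(0);

// Inside the DOM the attribute holds the raw characters.
$title = $a->getAttribute('title');

// Escaping is applied only on serialization.
$serialized = $dom->saveXML($a);

echo $title, "\n";
echo $serialized, "\n";
```

Any code that needs the original <lang ...> text can take it from getAttribute, and a consumer that parses the output will decode the entities back to the same value.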

<?php
    $content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
    <div class="popular-video-image">
        <a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
            <img src="/images/topvideo/1.jpg" alt=""/>
        </a>
        <span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
        <span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
    </div>';

    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($content);
    foreach ($dom->getElementsByTagName('a') as $node)
    {
        $node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
    }
    $dom->formatOutput = true;

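    // Strip the html/body wrappers that loadHTML adds and undo the &lt;/&gt; escaping,
    // then remove the leading DOCTYPE declaration.
    $html = str_replace(
        array('<html>', '</html>', '<body>', '</body>', "\n\n", '&lt;', '&gt;'),
        array('', '', '', '', '', '<', '>'),
        $dom->saveHTML()
    );
    echo preg_replace('#^<!DOCTYPE.+?>#', '', $html);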
