
Parse img and HTML code with DOM (PHP)

I have code that parses img tags and text. When I run it in a PHP file, it only shows img src, abc, img src, def. My markup is also not regular: the img tags may be wrapped in a link.

I want to parse each img together with the HTML that follows it, like this:

Array
(
    [0] => Array
        (
            [src] => http://www.whatever.com
            [text] =>  abc
    <br>
    <h3>title</h3>
    <div class="content">content <a href="link">my link</a></div>
        )

    [1] => Array
        (
            [src] => http://goingnowhere.com
            [text] =>  def
    <br>
    <h3>title 2</h3>
    <div class="content">content <a href="link">my link</a>

    bla bla bla

    </div>
        )

)

How can I do this? My current code:

<?php $sample_html = '
<img src="http://www.whatever.com" alt="" />
abc
<br>
<h3>title</h3>
<div class="content">content <a href="link">my link</a></div>
<img src="http://goingnowhere.com" alt="">
def
<br>
<h3>title 2</h3>
<div class="content">content <a href="link">my link</a>

bla bla bla

</div>
';

$dom = new DOMDocument();
$dom->loadHTML($sample_html);

$data = array();
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
$data[] = array(
'src' => $image->getAttribute('src'),
'text' => trim($image->nextSibling->textContent),
);
}

echo '<pre>';
print_r($data); ?>

Use XPath to iterate through all the nodes and retrieve the data between two img tags.

<?php $sample_html = '
<img src="http://www.whatever.com" alt="" />
abc
<br>
<h3>title</h3>
<div class="content">content <a href="link">my link</a></div>
<img src="http://goingnowhere.com" alt="">
def
<br>
<h3>title 2</h3>
<div class="content">content <a href="link">my link</a>

bla bla bla

</div>
';

$dom = new DOMDocument();
@$dom->loadHtml($sample_html);

$xpath = new DOMXPath($dom);

$arr = array();
// Loop through all img tags
foreach ($xpath->query('//img') as $img) {
    $snippet = '';

    // Collect every following sibling, text nodes included, up to the next img
    for ($node = $img->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node instanceof DOMElement && $node->tagName == 'img') {
            break;
        }
        // saveHTML() keeps HTML syntax such as <br> instead of <br/>
        $snippet .= $dom->saveHTML($node);
    }

    $arr[] = array(
        'src' => $img->getAttribute('src'),
        'content' => $snippet,
    );
}

print_r($arr);
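
The question also mentions that an img tag may be wrapped in a link, in which case the img's own siblings live inside the `<a>` and the following markup is not reached. Below is a minimal sketch of one way to handle that, not a definitive solution. The function names `holdsImage` and `splitByImages` are hypothetical, and the sketch assumes that each image (or the link wrapping it) sits directly under `<body>` after parsing. It climbs from each img to its top-level ancestor and then collects the following siblings, text nodes included, until it meets a node that is or contains another img.

```php
<?php
// Hypothetical helper: true if $node is an <img> or contains one.
function holdsImage(DOMNode $node): bool
{
    if (!$node instanceof DOMElement) {
        return false; // text and comment nodes never hold an image
    }
    return $node->tagName === 'img'
        || $node->getElementsByTagName('img')->length > 0;
}

// Hypothetical helper: returns one ['src' => ..., 'content' => ...] pair per img.
function splitByImages(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = array();
    foreach ($dom->getElementsByTagName('img') as $img) {
        // Climb to the ancestor that sits directly under <body>
        // (assumption: that is the img itself or the link wrapping it).
        $top = $img;
        while ($top->parentNode !== null && $top->parentNode->nodeName !== 'body') {
            $top = $top->parentNode;
        }

        // Gather the following siblings until the next image or image wrapper.
        // Anything after the img but still inside its wrapper is not collected.
        $content = '';
        for ($n = $top->nextSibling; $n !== null && !holdsImage($n); $n = $n->nextSibling) {
            $content .= $dom->saveHTML($n);
        }

        $result[] = array(
            'src' => $img->getAttribute('src'),
            'content' => trim($content),
        );
    }
    return $result;
}

// Sample input modelled on the question; the second img is wrapped in a link.
$html = '<img src="http://www.whatever.com" alt="">abc<br><h3>title</h3>'
      . '<a href="link"><img src="http://goingnowhere.com" alt=""></a>def<h3>title 2</h3>';

$pairs = splitByImages($html);
print_r($pairs);
```

Stopping at any node that contains an img, rather than only at img elements, is what keeps the second image's link out of the first image's content. Markup nested more deeply than this assumption allows would need a different traversal.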
