
How to extract links and text inside li tags?

I am using the code below to extract every href, the link text, and the text inside the ul tags. The href comes back empty, and the text inside the anchor tag and the text inside the ul tag are returned together, whereas I want them in separate variables. I can't see where I am going wrong. Any help would be appreciated.

<?php

    $str='<li><a href="test1.php">21.03.2017

    <ul>Test1</ul>
    </a><p>

    <a href="test1"></a>
    </p>

    </li>

    <li><a href="test2.php">21.03.2017

    <ul>Text2</ul>
    </a><p>

    <a href="test2.php"></a>
    </p>

    </li>';

    $dom = new DOMDocument;

    @$dom->loadHTML($str);


    $liList = $dom->getElementsByTagName('li');

    foreach ($liList as $li) {

              $output[] = array (
          'str' => $li->nodeValue,
          'href' => $li->getAttribute('href')
       );

    }
    var_dump($output);

?>

Output:

array(2) { [0]=> array(2) { ["str"]=> string(22) "21.03.2017 Test1 " ["href"]=> string(0) "" } [1]=> array(2) { ["str"]=> string(22) "21.03.2017 Text2 " ["href"]=> string(0) "" } }

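href is an attribute of the <a> tag, not of <li>, so $li->getAttribute('href') returns an empty string. Change your code to iterate over $dom->getElementsByTagName('a'); and read the attribute from each of those elements instead.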

See here: https://3v4l.org/4Ln5E

Something along the lines of this:

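$doc = new DOMDocument();
// $str is the HTML string from the question; @ silences warnings about the malformed markup
@$doc->loadHTML($str);
$links = $doc->getElementsByTagName('a');

foreach ($links as $link) {
    // href lives on the <a> element itself, so read it from there
    echo $link->getAttribute('href') . "<br />";
}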
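The question also asks for the link text and the ul text in separate variables, which the loop above does not cover. A minimal sketch of one way to do that, assuming the markup from the question (whitespace condensed here) and using DOMXPath relative to each li; the array keys 'date' and 'text' are names chosen for illustration only:

```php
<?php
// Sample markup copied from the question, with whitespace condensed.
$str = '<li><a href="test1.php">21.03.2017 <ul>Test1</ul></a><p><a href="test1"></a></p></li>'
     . '<li><a href="test2.php">21.03.2017 <ul>Text2</ul></a><p><a href="test2.php"></a></p></li>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($str);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$output = [];

foreach ($dom->getElementsByTagName('li') as $li) {
    // First link anywhere inside this <li>.
    $a = $xpath->query('.//a[@href]', $li)->item(0);
    // First <ul> anywhere inside this <li>, wherever the parser placed it.
    $ul = $xpath->query('.//ul', $li)->item(0);
    // Text sitting directly inside the link, excluding nested elements such as <ul>.
    $date = $a ? $xpath->evaluate('string(text()[1])', $a) : '';

    $output[] = [
        'href' => $a ? $a->getAttribute('href') : '',
        'date' => trim($date),                           // hypothetical key name
        'text' => $ul ? trim($ul->textContent) : '',     // hypothetical key name
    ];
}

print_r($output);
```

Querying by descendant (`.//`) rather than by direct child keeps the sketch tolerant of however the HTML parser rearranges the non-standard nesting of a ul inside an anchor.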
