I am trying to get the inner HTML of a <p> tag and save it as a .txt file. It is a very simple page; there is only one <p> on it. I tried using getElementsByTagName('p') as per: Using PHP to get DOM Element. Unfortunately, it didn't work for me, but maybe I'm missing something. My code is:
<?php
$dataPage = file_get_contents('http://www.somedataurl.com');
$doc = new DOMDocument;
$doc->loadHTML($dataPage);
$dataNodeList = $doc->getElementsByTagName('p');
$dataNode = $dataNodeList->item(0);
function innerHTML($node) {
return implode(array_map([$node->ownerDocument, "saveHTML"],
iterator_to_array($node->childNodes)));
}
$theData = innerHTML($dataNode);
header('Content-Type: text/plain');
$filename = date('Y-m-d') . '.txt';
file_put_contents($filename, $theData);
The error log is giving me:
PHP Notice: Undefined property:: DOMNodeList (line 10)
PHP Notice: Undefined property:: DOMNodeList (line 11)
PHP Catchable fatal error (line 11)
These errors sound rather alarming, especially the last one.
Question: Is there a better tool I can use other than getElementsByTagName(), since I am only dealing with one <p>? Or can this way work if I adjust a few things?
If there is only one <p> tag, I think you would be better off extracting its content with a regular expression.
Example:
preg_match("/<p>(.*?)<\/p>/is",$dataPage,$match);
print_r($match[1]);
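For comparison, here is a minimal sketch of both routes with the missing-match case handled explicitly. It rests on a few assumptions: the sample markup in $dataPage stands in for the fetched page, and the helper names regexInnerHtml() and domInnerHtml() are hypothetical, not from the original posts. The regex variant also tolerates attributes on the <p> tag, which the pattern above does not.

```php
<?php
// Assumption: sample markup standing in for file_get_contents(...) output.
$dataPage = '<html><body><p class="data">Hello <b>world</b></p></body></html>';

// Hypothetical helper: regex route. Returns the inner HTML of the first <p>,
// or null when no <p>...</p> pair is found.
function regexInnerHtml(string $html): ?string
{
    // \b[^>]* allows attributes such as class="..." on the opening tag.
    if (preg_match('/<p\b[^>]*>(.*?)<\/p>/is', $html, $match) === 1) {
        return $match[1];
    }
    return null;
}

// Hypothetical helper: DOM route. Returns the inner HTML of the first <p>,
// or null when the document contains no <p> element.
function domInnerHtml(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting notices.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // item(0) returns null if the node list is empty, so check before use.
    $node = $doc->getElementsByTagName('p')->item(0);
    if ($node === null) {
        return null;
    }

    // Serialize each child node and concatenate to build the inner HTML.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$theData = domInnerHtml($dataPage);
if ($theData !== null) {
    // Same file naming as in the question: one file per day.
    file_put_contents(date('Y-m-d') . '.txt', $theData);
}
```

Either helper can feed file_put_contents(). The regex route is shorter, but it only suits simple, predictable markup like the single-<p> page described in the question. The DOM route is likely the safer choice if the page may change.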