Parsing HTML, grabbing nodeValue problem with DOMDocument
I am trying to parse an HTML page, but I am having some trouble grabbing the nodeValue of the dt and dd tags.
$outline = "http://www.sumitomo-rd-mansion.jp/kansai/higashi_umeda/detail.cgi";
foreach ($outlineUrl as $results) {
    if (strpos($results, 'http://www.sumitomo-rd-mansion.jp') === 0) {
        $html = file_get_contents($results);
        $DOMParser = new \DOMDocument();
        $DOMParser->loadHTML($html);
        $changeForMyDB = [
            'region' => '関西',
            'link' => json_encode($results),
            'building_name' => '',
            'price' => '不明',
            'old_price' => '',
            'extend' => '不明',
            'address' => '',
            'total_house' => '',
            'rooms' => '',
            'cons_finish' => '',
            'entry' => '不明',
            'balcony' => '不明',
            'company_name' => '',
            'list_from' => ''
        ];
        foreach ($DOMParser->getElementsByTagName('dl') as $tr) {
            $property = trim($tr->getElementsByTagName('dt')[0]->nodeValue);
            $value = trim($tr->getElementsByTagName('dd')[0]->nodeValue);
            switch ($property) {
                case '物件名':
                    $changeForMyDB['building_name'] = $value;
                    break;
                case '販売価格':
                    $changeForMyDB['price'] = $value;
                    break;
                case '専有面積':
                    $changeForMyDB['extend'] = $value;
                    break;
                case '所在地':
                    $changeForMyDB['address'] = $value;
                    break;
                case '総戸数':
                    $changeForMyDB['total_house'] = $value;
                    break;
                case '間取り':
                    $changeForMyDB['rooms'] = $value;
                    break;
                case '竣工時期':
                    $changeForMyDB['cons_finish'] = $value;
                    break;
                case '管理会社':
                    $changeForMyDB['company_name'] = $value;
                    break;
                case '入居時期':
                    $changeForMyDB['entry'] = $value;
                    break;
                case 'バルコニー面積':
                    $changeForMyDB['balcony'] = $value;
                    break;
                default:
                    break;
            }
        }
    }
    var_dump($changeForMyDB);
}
With this, I can't get the dt and dd nodeValue for all of the dl elements. I only get two. Is my foreach loop wrong, or is it something else? Thanks for your help!
There are a number of problems with the code. I have reworked it and added comments to help...
// Variable for the list of details
$details = [];
// $outlineUrl must be an array of URLs (a single string does not work in foreach())
$outlineUrl = ["http://www.sumitomo-rd-mansion.jp/kansai/higashi_umeda/detail.cgi"];
foreach ($outlineUrl as $results) {
    $html = file_get_contents($results);
    $DOMParser = new \DOMDocument();
    // Collect libxml parse warnings internally instead of printing them on import
    libxml_use_internal_errors(true);
    $DOMParser->loadHTML($html); // There was a missing ';'
    foreach ($DOMParser->getElementsByTagName('dl') as $tr) {
        // Build up a list of details (you were overwriting them all the time)
        $dd = $tr->getElementsByTagName('dd');
        // Pair each <dt> with the <dd> at the same index inside this <dl>
        foreach ($tr->getElementsByTagName('dt') as $key => $ent) {
            $details[] = [
                'property' => trim($ent->nodeValue),
                'value' => trim($dd[$key]->nodeValue)
            ];
        }
    }
}
// Output the list of details
var_dump($details);
This loops over every <dt> and <dd> value pair in each <dl...> tag, rather than reading only the first pair of each <dl>.
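As a way to try the pairing logic without fetching the live page, here is a minimal sketch that runs against an inline HTML string. The sample markup and the function name extractPairs are assumptions made for illustration; the real page's structure is not verified here. It also skips a <dt> that has no <dd> at the same index instead of failing on a null node.

```php
<?php
// Hypothetical sample markup; the structure of the real page is assumed.
$html = <<<HTML
<html><body>
<dl><dt>Name</dt><dd>Tower A</dd><dt>Price</dt><dd>100</dd></dl>
<dl><dt>Rooms</dt><dd>3LDK</dd></dl>
</body></html>
HTML;

// extractPairs is an illustrative helper, not part of the original answer.
function extractPairs(string $html): array
{
    $doc = new DOMDocument();
    // Collect parse warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $pairs = [];
    foreach ($doc->getElementsByTagName('dl') as $dl) {
        $dds = $dl->getElementsByTagName('dd');
        foreach ($dl->getElementsByTagName('dt') as $i => $dt) {
            // item() returns null when there is no <dd> at this index.
            $dd = $dds->item($i);
            if ($dd === null) {
                continue;
            }
            $pairs[] = [
                'property' => trim($dt->nodeValue),
                'value' => trim($dd->nodeValue),
            ];
        }
    }
    return $pairs;
}

$pairs = extractPairs($html);
print_r($pairs);
```

Pairing by index assumes each <dt> is followed by exactly one <dd>; if a page uses several <dd> elements per <dt>, the indexes drift and a sibling-based walk would be a better fit.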
Updated code...
$details = [];
$outlineUrl = ["http://www.sumitomo-rd-mansion.jp/kansai/higashi_umeda/detail.cgi"];
foreach ($outlineUrl as $results) {
    $html = file_get_contents($results);
    $DOMParser = new \DOMDocument();
    // Keep a local copy of the fetched page for debugging
    file_put_contents("test.html", $html);
    libxml_use_internal_errors(true);
    $DOMParser->loadHTML($html);
    foreach ($DOMParser->getElementsByTagName('dl') as $tr) {
        $dd = $tr->getElementsByTagName('dd');
        // One set of mapped details per <dl>
        $newDetails = [];
        foreach ($tr->getElementsByTagName('dt') as $key => $ent) {
            $value = trim($dd[$key]->nodeValue);
            // Trim the label so surrounding whitespace does not break the match
            switch (trim($ent->nodeValue)) {
                case '物件名':
                    $newDetails['building_name'] = $value;
                    break;
                case '販売価格':
                    $newDetails['price'] = $value;
                    break;
                case '専有面積':
                    $newDetails['extend'] = $value;
                    break;
                case '所在地':
                    $newDetails['address'] = $value;
                    break;
                case '総戸数':
                    $newDetails['total_house'] = $value;
                    break;
                case '間取り':
                    // Store in $newDetails like the other fields, not in $details
                    $newDetails['rooms'] = $value;
                    break;
                case '竣工時期':
                    $newDetails['cons_finish'] = $value;
                    break;
                case '管理会社':
                    $newDetails['company_name'] = $value;
                    break;
                case '入居時期':
                    $newDetails['entry'] = $value;
                    break;
                case 'バルコニー面積':
                    $newDetails['balcony'] = $value;
                    break;
                default:
                    break;
            }
        }
        $details[] = $newDetails;
    }
}
var_dump($details);
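The long switch can also be written as a lookup table, which may be easier to extend. The following is a sketch only: the function name mapDetails and the sample input are hypothetical, the label-to-key map is copied from the switch statement above, and the defaults are taken from the question's $changeForMyDB array. It expects pairs in the property/value shape built earlier in this answer.

```php
<?php
// Label-to-key map copied from the switch statement in the answer.
$labelToKey = [
    '物件名' => 'building_name',
    '販売価格' => 'price',
    '専有面積' => 'extend',
    '所在地' => 'address',
    '総戸数' => 'total_house',
    '間取り' => 'rooms',
    '竣工時期' => 'cons_finish',
    '管理会社' => 'company_name',
    '入居時期' => 'entry',
    'バルコニー面積' => 'balcony',
];

// mapDetails is an illustrative helper, not part of the original answer.
function mapDetails(array $pairs, array $labelToKey, array $defaults = []): array
{
    // Start from the defaults so fields missing on the page keep their fallback.
    $row = $defaults;
    foreach ($pairs as $pair) {
        $label = trim($pair['property']);
        // Labels that are not in the map are ignored, like the switch's default case.
        if (isset($labelToKey[$label])) {
            $row[$labelToKey[$label]] = $pair['value'];
        }
    }
    return $row;
}

// Hypothetical input in the property/value shape used earlier.
$row = mapDetails(
    [
        ['property' => ' 物件名 ', 'value' => 'Sample Tower'],
        ['property' => '間取り', 'value' => '3LDK'],
        ['property' => 'unknown label', 'value' => 'ignored'],
    ],
    $labelToKey,
    ['price' => '不明', 'rooms' => '']
);
var_dump($row);
```

Starting from the defaults keeps placeholder values such as '不明' for fields the page does not list, which is what the question's original array was set up to do.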