
Parsing HTML: problem grabbing nodeValue with DOMDocument

I'm trying to parse an HTML page, but I'm having some trouble grabbing the nodeValue of the dt and dd tags.

$outline ="http://www.sumitomo-rd-mansion.jp/kansai/higashi_umeda/detail.cgi";
foreach ($outlineUrl as $results) {
    if (strpos($results, 'http://www.sumitomo-rd-mansion.jp') === 0) {
        $html = file_get_contents($results);
        $DOMParser = new \DOMDocument();
        $DOMParser->loadHTML($html);

        $changeForMyDB = [
            'region' => '関西',
            'link' => json_encode($results),
            'building_name' => '',
            'price' => '不明',
            'old_price' => '',
            'extend' => '不明',
            'address' => '',
            'total_house' => '',
            'rooms' => '',
            'cons_finish' => '',
            'entry' => '不明',
            'balcony' => '不明',
            'company_name' => '',
            'list_from' => ''
        ];

        foreach ($DOMParser->getElementsByTagName('dl') as $tr) {
            $property = trim($tr->getElementsByTagName('dt')[0]->nodeValue);
            $value = trim($tr->getElementsByTagName('dd')[0]->nodeValue);

            switch ($property) {
                case '物件名':
                    $changeForMyDB['building_name'] = $value;
                    break;
                case '販売価格':
                    $changeForMyDB['price'] = $value;
                    break;
                case '専有面積':
                    $changeForMyDB['extend'] = $value;
                    break;
                case '所在地':
                    $changeForMyDB['address'] = $value;
                    break;
                case '総戸数':
                    $changeForMyDB['total_house'] = $value;
                    break;
                case '間取り':
                    $changeForMyDB['rooms'] = $value;
                    break;
                case '竣工時期':
                    $changeForMyDB['cons_finish'] = $value;
                    break;
                case '管理会社':
                    $changeForMyDB['company_name'] = $value;
                    break;
                case '入居時期':
                    $changeForMyDB['entry'] = $value;
                    break;
                case 'バルコニー面積':
                    $changeForMyDB['balcony'] = $value;
                    break;
                default:
                    break;
            }
        }
    }

    var_dump($changeForMyDB);
}

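With this, I can't get the nodeValue of the dt and dd elements in every dl; I only get two of them. Is my foreach loop wrong, or is it something else? Thanks for your help!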
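One likely reason only two values come back, assuming the page groups several dt/dd pairs inside each dl (the page markup itself isn't shown here): indexing with [0] reads only the first dt and the first dd of each dl, so every later pair in the same dl is skipped. The sketch below reproduces this on a small, made-up HTML sample; the sample markup and variable names are illustrative only.

```php
<?php
// Hypothetical sample: the first <dl> holds three dt/dd pairs, the second
// holds one. Plain ASCII labels keep the example free of encoding issues.
$html = '<html><body>'
    . '<dl><dt>A</dt><dd>1</dd><dt>B</dt><dd>2</dd><dt>C</dt><dd>3</dd></dl>'
    . '<dl><dt>D</dt><dd>4</dd></dl>'
    . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep libxml parse warnings quiet
$doc->loadHTML($html);

// The question's pattern: [0] (same as item(0)) reads only the first <dt> of each <dl>.
$firstOnly = [];
foreach ($doc->getElementsByTagName('dl') as $dl) {
    $firstOnly[] = trim($dl->getElementsByTagName('dt')->item(0)->nodeValue);
}

// Iterating over every <dt> in each <dl> reaches all of them.
$allLabels = [];
foreach ($doc->getElementsByTagName('dl') as $dl) {
    foreach ($dl->getElementsByTagName('dt') as $dt) {
        $allLabels[] = trim($dt->nodeValue);
    }
}

echo implode(', ', $firstOnly), "\n"; // A, D
echo implode(', ', $allLabels), "\n"; // A, B, C, D
```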

There are quite a few problems with the code. I've reworked it and added comments to help...

// Variable for the list of details
$details = [];
// $outlineUrl must be an array of URLs (a single string doesn't work in foreach())
$outlineUrl = ["http://www.sumitomo-rd-mansion.jp/kansai/higashi_umeda/detail.cgi"];
foreach ($outlineUrl as $results) {
    $html = file_get_contents($results);
    if ($html === false) {
        continue;   // download failed; skip this URL
    }
    $DOMParser = new \DOMDocument();
    // Collect libxml parse errors internally instead of printing warnings for malformed HTML
    libxml_use_internal_errors(true);
    $DOMParser->loadHTML($html);      // There was a missing ';'

    foreach ($DOMParser->getElementsByTagName('dl') as $dl) {
        // Build up a list of details (you were overwriting them all the time)
        $dd = $dl->getElementsByTagName('dd');
        // Pair the n-th <dt> with the n-th <dd> of the same <dl>
        foreach ($dl->getElementsByTagName('dt') as $key => $ent) {
            $ddNode = $dd->item($key);   // null if this <dl> has fewer <dd> than <dt>
            $details[] = [
                'property' => trim($ent->nodeValue),
                'value'    => $ddNode === null ? '' : trim($ddNode->nodeValue),
            ];
        }
    }
}
// Output list of details
var_dump($details);

This will loop through all the <dt><dd> value pairs in each <dl...> tag.
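The pairing logic can be wrapped in a small reusable helper. This is a sketch, not the answerer's code: the function name extractDtDdPairs() and the HTML sample are hypothetical. It keeps the answer's assumption that the n-th dt in a dl belongs with the n-th dd, uses DOMNodeList::item() with a null check so a dl with fewer dd than dt elements yields an empty value instead of an error, and restores the caller's libxml error setting afterwards.

```php
<?php
/**
 * Hypothetical helper: collect every dt/dd pair from every <dl> in $html.
 * The n-th <dt> of a <dl> is paired with the n-th <dd> of the same <dl>;
 * a missing <dd> gives an empty string.
 *
 * @return array<int, array{property: string, value: string}>
 */
function extractDtDdPairs(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $pairs = [];
    foreach ($doc->getElementsByTagName('dl') as $dl) {
        $dds = $dl->getElementsByTagName('dd');
        foreach ($dl->getElementsByTagName('dt') as $i => $dt) {
            $dd = $dds->item($i); // null when this <dl> has fewer <dd> than <dt>
            $pairs[] = [
                'property' => trim($dt->nodeValue),
                'value'    => $dd === null ? '' : trim($dd->nodeValue),
            ];
        }
    }
    return $pairs;
}

// Made-up sample: the second <dl> has a <dt> without a <dd>.
$sample = '<html><body>'
    . '<dl><dt>Name</dt><dd> Tower </dd><dt>Price</dt><dd>TBD</dd></dl>'
    . '<dl><dt>Rooms</dt></dl>'
    . '</body></html>';

foreach (extractDtDdPairs($sample) as $pair) {
    echo $pair['property'], ' => ', $pair['value'], "\n";
}
```

For this sample the loop prints three lines: Name with the trimmed value Tower, Price with TBD, and Rooms with an empty value.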

Updated code...

$details = [];
$outlineUrl = ["http://www.sumitomo-rd-mansion.jp/kansai/higashi_umeda/detail.cgi"];
foreach ($outlineUrl as $results) {
    $html = file_get_contents($results);
    if ($html === false) {
        continue;   // download failed; skip this URL
    }
    $DOMParser = new \DOMDocument();
    // file_put_contents("test.html", $html);   // optional: save the page to inspect its markup
    libxml_use_internal_errors(true);
    $DOMParser->loadHTML($html);

    foreach ($DOMParser->getElementsByTagName('dl') as $dl) {
        $dd = $dl->getElementsByTagName('dd');
        $newDetails = [];   // one record per <dl>
        foreach ($dl->getElementsByTagName('dt') as $key => $ent) {
            $ddNode = $dd->item($key);
            if ($ddNode === null) {
                continue;   // this <dt> has no matching <dd>
            }
            $value = trim($ddNode->nodeValue);
            // Trim the label as well, so surrounding whitespace doesn't break the match
            switch (trim($ent->nodeValue)) {
                case '物件名':
                    $newDetails['building_name'] = $value;
                    break;
                case '販売価格':
                    $newDetails['price'] = $value;
                    break;
                case '専有面積':
                    $newDetails['extend'] = $value;
                    break;
                case '所在地':
                    $newDetails['address'] = $value;
                    break;
                case '総戸数':
                    $newDetails['total_house'] = $value;
                    break;
                case '間取り':
                    $newDetails['rooms'] = $value;
                    break;
                case '竣工時期':
                    $newDetails['cons_finish'] = $value;
                    break;
                case '管理会社':
                    $newDetails['company_name'] = $value;
                    break;
                case '入居時期':
                    $newDetails['entry'] = $value;
                    break;
                case 'バルコニー面積':
                    $newDetails['balcony'] = $value;
                    break;
                default:
                    break;
            }
        }
        $details[] = $newDetails;
    }
}

var_dump($details);
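One alternative to the long switch, sketched under a few stated assumptions: the labels are mapped through a lookup array (the hypothetical constant LABEL_TO_KEY, built from the question's switch cases), all pairs on a page are merged into a single record that starts from defaults like the question's $changeForMyDB, and the fetched HTML is UTF-8. DOMDocument::loadHTML() generally assumes ISO-8859-1 when the markup declares no charset, so the sketch first turns non-ASCII characters into numeric entities with mb_encode_numericentity() (mbstring extension); that conversion is an assumption of this sketch, not part of the answer. The function name buildRecord() and the sample HTML are also hypothetical.

```php
<?php
// Hypothetical label-to-column map, taken from the question's switch cases.
const LABEL_TO_KEY = [
    '物件名' => 'building_name',
    '販売価格' => 'price',
    '専有面積' => 'extend',
    '所在地' => 'address',
    '総戸数' => 'total_house',
    '間取り' => 'rooms',
    '竣工時期' => 'cons_finish',
    '管理会社' => 'company_name',
    '入居時期' => 'entry',
    'バルコニー面積' => 'balcony',
];

/** Build one record per page, starting from $defaults (hypothetical helper). */
function buildRecord(string $utf8Html, array $defaults): array
{
    // Assumption: input is UTF-8. Non-ASCII becomes &#NNNN; so loadHTML()
    // decodes it correctly whatever charset it assumes for the raw bytes.
    $ascii = mb_encode_numericentity($utf8Html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $record = $defaults;
    foreach ($doc->getElementsByTagName('dl') as $dl) {
        $dds = $dl->getElementsByTagName('dd');
        foreach ($dl->getElementsByTagName('dt') as $i => $dt) {
            $label = trim($dt->nodeValue);
            $dd = $dds->item($i);
            // Only labels present in the map are copied; others are ignored.
            if ($dd !== null && array_key_exists($label, LABEL_TO_KEY)) {
                $record[LABEL_TO_KEY[$label]] = trim($dd->nodeValue);
            }
        }
    }
    return $record;
}

// Made-up sample page; '備考' is deliberately not in the map.
$sample = '<html><body><dl>'
    . '<dt>物件名</dt><dd>Sample Tower</dd>'
    . '<dt>間取り</dt><dd>3LDK</dd>'
    . '<dt>備考</dt><dd>ignored</dd>'
    . '</dl></body></html>';

$defaults = ['building_name' => '', 'price' => '不明', 'rooms' => '', 'balcony' => '不明'];
$record = buildRecord($sample, $defaults);
var_dump($record);
```

Merging everything into one record keeps the question's one-row-per-URL shape, whereas the answer's code produces one array per dl; which fits better depends on how the rows are stored.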


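Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.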

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM