
Creating DOMDocument: match one certain element in a PHP-parser

Good evening, dear community,

First of all: Feliz Navidad, and a Merry Christmas to you all! During the holiday break I am working on a little parser script.

Today I'm trying to debug a small DOMDocument object in PHP. Ideally I'd like DOMDocument to give me the data in an array-like format, so that I can store it in a database.

My example: see the target page.

I want to extract the data in this block:

Schulart: BBS
Schulnummer: 60119
Anschrift: Berufsbildende Schule Boppard Antoniusstr. 21; 56154 Boppard
Telefon: (0 67 42) 80 61-0
Telefax: (0 67 42) 80 61-29
E-Mail: sekretary@bbs-boppard.de
Internet: website
Träger: Kreisverwaltung Rhein-Hunsrück-Kreis
letzte Änderung: 08 Feb 2010 14:33:12 von 60119

I have looked at the page source and found that the element of interest should be <div class="content">, which is followed by the comment <!-- TYPO3SEARCH_begin -->, or even better the element with the id wfqbeResults.

So if I go the DOMDocument way, I can use it like so:

$dom->getElementById('wfqbeResults');
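
As a minimal sketch of that lookup, assuming an inline HTML snippet in place of the real page (the markup in $html is made up for illustration and the real page may differ): getElementById returns null when no element carries the id, so it seems worth checking the result before touching childNodes.

```php
<?php
// Hypothetical stand-in for the downloaded page; the real markup may differ.
$html = '<html><body><div id="wfqbeResults"><table>'
      . '<tr><td>Schulart:</td><td>BBS</td></tr>'
      . '</table></div></body></html>';

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$found   = $dom->getElementById('wfqbeResults'); // element exists
$missing = $dom->getElementById('noSuchId');     // no such element: null

if ($found === null) {
    echo "Block not found\n";
} else {
    // textContent is the element's text with all markup stripped.
    echo trim($found->textContent), "\n";
}
```

Reading textContent instead of serializing the child nodes is one way to avoid getting raw HTML back.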

Here is the code, my attempt so far:

<?php

$dom = new DOMDocument();
@$dom->loadHTMLFile(' -> here the website goes in<- ');
$divElement = $dom->getElementById('wfqbeResults');

$innerHTML= '';
$children = $divElement->childNodes;
foreach ($children as $child) {
   $innerHTML .= $child->ownerDocument->saveXML( $child );
} 
echo $innerHTML;

?>

Duh: this outputs a lot of garbage, because the code dumps the whole HTML markup of the block. I have to rework it a bit to get just the nine lines I want out of the parser.

What I am aiming for:

a. Nine lines with nine labels and nine values.
b. Output prepared so that it can be stored in a MySQL database.

I look forward to some hints. Greetings, zero

Here is the solution: it returns the labels and values in an array, ready for input into MySQL.

<?php

$dom = new DOMDocument();
// @ suppresses the warnings libxml raises on malformed real-world HTML
@$dom->loadHTMLFile('http://schulen.bildung-rp.de/gehezu/startseite/einzelanzeige.html?tx_wfqbe_pi1%5buid%5d=60119');
$divElement = $dom->getElementById('wfqbeResults');

/*** the array to return ***/
$out = array();

// getElementById returns null if the page has no such element
if ($divElement !== null) {
    // look for <td> cells inside the result block only, not the whole page
    foreach ($divElement->getElementsByTagName('td') as $item) {
        /*** add node value to the out array ***/
        $out[] = trim($item->nodeValue);
    }
}

echo '<pre>';
print_r($out);
echo '</pre>';

?>
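
If the cells really alternate between label and value, a flat list like $out can be folded into a label => value map, which is closer to what an INSERT needs. The following is only a sketch under assumptions: the sample HTML is invented, the helper names pairCells and saveSchool are hypothetical, and the table schools with the columns field_label and field_value is an assumed schema that does not come from the original page.

```php
<?php
/**
 * Fold a flat list of cell texts (label, value, label, value, ...)
 * into an associative array keyed by the label without its colon.
 */
function pairCells(array $cells): array
{
    $pairs = [];
    // array_chunk groups the list two at a time: [label, value].
    foreach (array_chunk($cells, 2) as $chunk) {
        if (count($chunk) < 2) {
            continue; // skip a dangling label without a value
        }
        $label = rtrim(trim($chunk[0]), ':');
        $pairs[$label] = trim($chunk[1]);
    }
    return $pairs;
}

/**
 * Store each pair with a prepared statement.
 * Assumed schema: schools(field_label, field_value).
 */
function saveSchool(PDO $pdo, array $pairs): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO schools (field_label, field_value) VALUES (?, ?)'
    );
    foreach ($pairs as $label => $value) {
        $stmt->execute([$label, $value]);
    }
}

// Invented sample standing in for the real result block.
$html = '<div id="wfqbeResults"><table>'
      . '<tr><td>Schulart:</td><td>BBS</td></tr>'
      . '<tr><td>Schulnummer:</td><td>60119</td></tr>'
      . '</table></div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$cells = [];
foreach ($dom->getElementsByTagName('td') as $td) {
    $cells[] = $td->nodeValue;
}

$pairs = pairCells($cells);
print_r($pairs);
```

saveSchool is defined but not called here, since it needs a live connection; with MySQL that would be a PDO object created from a mysql: DSN plus your own credentials. The prepared statement lets the driver handle quoting of the values.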

