PHP simple_html_dom: extract data from a URL

What is the problem with my code? Why doesn't it work?

This is the code I tried:

function extract_data($url){

    // Create DOM from URL
    $html = file_get_html($url);

    // initialize empty array to store the data array from each row
    $theData = array();

    // loop over rows
    foreach($html->find('p.ch_title') as $row) {

        // initialize array to store the cell data from each row
        $rowData = array();
        foreach($row->find('p.ch_spec') as $cell) {

            // push the cell's text to the array
            $rowData[] = $cell->innertext;
        }

        // push the row's data array to the 'big' array
        $theData[] = $rowData;
    }

    return $theData;

}

And this is the HTML returned by the URL:

<div class="holder-specificatii">
       <div class="box-specificatie">
          <div class="ch_group">Dimensiuni</div>
          <p class="ch_title">Latime (mm):</p>
          <p class="ch_spec">195</p>
          <p class="ch_title">Inaltime:</p>
          <p class="ch_spec">65</p>
          <p class="ch_title">Diametru (inch):</p>
          <p class="ch_spec">15</p>
          <div class="clear"></div>
       </div>
       <div class="box-specificatie">
          <div class="ch_group">Caracteristici tehnice</div>
          <p class="ch_title">Anotimp:</p>
          <p class="ch_spec">Iarna</p>
          <p class="ch_title">Indice sarcina:</p>
          <p class="ch_spec">91</p>
          <p class="ch_title">Indice viteza:</p>
          <p class="ch_spec">T</p>
          <p class="ch_title">Economie de carburant:</p>
          <p class="ch_spec">C</p>
          <p class="ch_title">Franare pe suprafete umede:</p>
          <p class="ch_spec">C</p>
          <p class="ch_title">Tip vehicul:</p>
          <p class="ch_spec">Turism</p>
          <p class="ch_title">DOT:</p>
          <p class="ch_spec">2014</p>
          <p class="ch_title">Nivel de zgomot (dB):</p>
          <p class="ch_spec">72dB</p>
          <div class="clear"></div>
       </div>
    </div>

The problem is that the function returns an empty array.

You are searching in the wrong place. Loop over each div.box-specificatie container instead, and pair every p.ch_title with the element that follows it:

function extract_data($url){

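    $html = file_get_html($url);
    if (!$html) {
        return array(); // file_get_html() returns false if the page is empty or could not be loaded
    }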
    $theData = array();
    // loop over rows
    foreach($html->find('div.box-specificatie') as $k => $row) { // loop each container
        $temp = array();
        // $main_title = $row->find('div.ch_group', 0)->innertext;
        foreach($row->find('p.ch_title') as $title) { // each title
            $next = $title->next_sibling(); // the spec is the title's next sibling, not its child
            $spec = $next ? $next->innertext : ''; // pair up with spec
            $temp[] = array('title' => $title->innertext, 'spec' => $spec);
        }
        $theData[$k] = $temp; // push inside
        // $theData[$main_title] = $temp; // optionally you can use a main title

    }

    return $theData;
}

echo '<pre>';
print_r(extract_data($url));
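If you would rather not depend on simple_html_dom, the same title/spec pairing can be done with PHP's built-in DOM extension. This is a minimal sketch, assuming the markup shown in the question (each class attribute holds a single class name) and that the dom extension is enabled; extract_specs_dom() is a hypothetical name, and keying the result by the ch_group heading mirrors the commented-out option in the answer above.

```php
<?php
// Sketch: pair each p.ch_title with the p.ch_spec right after it, using
// PHP's built-in DOMDocument/DOMXPath instead of simple_html_dom.
// extract_specs_dom() is a hypothetical helper; it assumes single-valued
// class attributes, exactly as in the sample markup.
function extract_specs_dom(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];

    // One iteration per specification box.
    foreach ($xpath->query('//div[@class="box-specificatie"]') as $box) {
        // Use the box heading (div.ch_group) as the group key.
        $group = trim($xpath->evaluate('string(div[@class="ch_group"])', $box));
        $specs = [];
        foreach ($xpath->query('p[@class="ch_title"]', $box) as $title) {
            // Take the immediately following element, but only if it is a p.ch_spec.
            $spec = $xpath->query('following-sibling::*[1][self::p and @class="ch_spec"]', $title)->item(0);
            $specs[trim($title->textContent)] = $spec ? trim($spec->textContent) : '';
        }
        $result[$group] = $specs;
    }
    return $result;
}

// Usage on a trimmed copy of the HTML from the question.
$sample = <<<'HTML'
<div class="holder-specificatii">
   <div class="box-specificatie">
      <div class="ch_group">Dimensiuni</div>
      <p class="ch_title">Latime (mm):</p>
      <p class="ch_spec">195</p>
      <p class="ch_title">Inaltime:</p>
      <p class="ch_spec">65</p>
      <div class="clear"></div>
   </div>
</div>
HTML;

print_r(extract_specs_dom($sample));
```

Unlike the simple_html_dom version, this takes an HTML string rather than a URL, so fetch the page first (for example with file_get_contents($url)) and pass the result in.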

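In the first foreach you did it right: you called find() on the $html object returned by file_get_html(). In the nested foreach, however, you call find() on $row, and it finds no p.ch_spec because p.ch_spec is not a child of p.ch_title; it is the element that comes right after it (its next sibling).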
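To see the child/sibling distinction for yourself, you can count matches with PHP's built-in DOMXPath, used here as a stand-in for simple_html_dom's find(). A minimal sketch on a trimmed copy of the markup above; count_matches() is a hypothetical helper name.

```php
<?php
// Compare a descendant search (what the nested find() in the question does)
// with a next-sibling search (what next_sibling() in the answer does).
// Uses PHP's dom extension as a stand-in for simple_html_dom.
function count_matches(string $html, string $expr): int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($doc))->query($expr)->length;
}

$sample = '<div class="box-specificatie">'
    . '<p class="ch_title">Latime (mm):</p><p class="ch_spec">195</p>'
    . '<p class="ch_title">Inaltime:</p><p class="ch_spec">65</p>'
    . '</div>';

// p.ch_spec elements *inside* a p.ch_title: none, so the inner loop never runs.
$inside = count_matches($sample, '//p[@class="ch_title"]//p[@class="ch_spec"]');

// p.ch_spec elements *immediately after* a p.ch_title: one per title.
$after = count_matches($sample, '//p[@class="ch_title"]/following-sibling::*[1][self::p and @class="ch_spec"]');

echo "inside: $inside, after: $after\n"; // prints "inside: 0, after: 2"
```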
