
PHP: scraping data from a website

I am new to programming, so I chose to build a webpage using WordPress. I am trying to gather weather data from other sites, but I could not find a suitable plugin for scraping it, so I decided to try putting something together myself. My limited understanding of programming is giving me trouble, though. With a little inspiration from the web, I have put this together:

$html = file_get_contents('http://www.frederikshavnhavn.dk/scripts/weatherwindow.php?langid=2'); //get the html returned from the following url

$poke_doc = new DOMDocument();

libxml_use_internal_errors(true); //suppress libxml warnings about malformed HTML

if(!empty($html)){ //if any html is actually returned

  $poke_doc->loadHTML($html);
  libxml_clear_errors(); //remove errors for yucky html

  $poke_xpath = new DOMXPath($poke_doc);

  //get all the spans with the given class
  $poke_type = $poke_xpath->query("//span[@class='weathstattype']");
  $poke_text = $poke_xpath->query("//span[@class='weathstattext']");

  foreach($poke_text as $text){
    foreach($poke_type as $type){
      echo $type->nodeValue;
      echo $text->nodeValue . "<br>";
      continue 2;
    }
    break;
  }
} 

Since this is all new to me and I really want to get it working, I am also hoping to gain a better understanding of how the code works.

What I am trying to achieve is a formatted list of the data, one numbered line per reading:

1. $type $text
2. $type $text
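Since the page is built in WordPress, the end result will probably be HTML. A minimal sketch, assuming the labels and readings have already been collected into two plain PHP arrays of equal length (the helper name `render_readings` and the sample values are illustrative, taken from the output shown further down): an `<ol>` supplies the numbering, and `htmlspecialchars()` escapes the scraped text before it is echoed into the page.

```php
<?php
// Hypothetical helper (not from the original post): renders two parallel
// arrays as an HTML ordered list, one "<li>type text</li>" per reading.
function render_readings(array $types, array $texts): string
{
    $items = '';
    foreach ($types as $i => $type) {
        // $texts[$i] is assumed to be the reading that belongs to $types[$i]
        $text = $texts[$i] ?? '';
        $items .= '<li>' . htmlspecialchars($type) . ' '
            . htmlspecialchars($text) . '</li>';
    }
    // <ol> numbers the items, so no manual counter is needed
    return '<ol>' . $items . '</ol>';
}

echo render_readings(
    ['Vindretning', 'Lufttemperatur', 'Lufttryk'],
    ['197° (Syd)', '5.7 °C', '1004 hPa']
), "\n";
```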

Right now it is giving me a lot of trouble. When I use `continue 2`, the value of $type does not change, but when I use a plain `continue`, $type changes but $text does not. How can I make both values change each time?
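Both symptoms come from the nested loops. With `continue 2`, each pass of the outer loop starts the inner loop over, so `$type` is always the first match. With a plain `continue`, the inner loop runs through every `$type` for the first `$text`, and the `break` after it then leaves the outer loop, so `$text` never advances. If the n-th type span belongs to the n-th text span, one alternative is a single loop over an index that reads `item($i)` from both node lists. A minimal sketch, assuming that ordering holds on the real page; the `$sample` markup and the `pair_readings` name are hypothetical stand-ins, not the site's actual HTML:

```php
<?php
// $sample stands in for the downloaded page; its markup is an assumption.
$sample = '<div>'
    . '<span class="weathstattype">Vandstand</span><span class="weathstattext">-6 cm</span>'
    . '<span class="weathstattype">Lufttryk</span><span class="weathstattext">1004 hPa</span>'
    . '</div>';

// Hypothetical helper: returns [[type, text], ...] paired by position.
function pair_readings(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $types = $xpath->query("//span[@class='weathstattype']");
    $texts = $xpath->query("//span[@class='weathstattext']");

    // One loop, one index: item($i) from each list belongs to the same reading.
    $pairs = [];
    $count = min($types->length, $texts->length);
    for ($i = 0; $i < $count; $i++) {
        $pairs[] = [trim($types->item($i)->nodeValue), trim($texts->item($i)->nodeValue)];
    }
    return $pairs;
}

foreach (pair_readings($sample) as $n => [$type, $text]) {
    echo ($n + 1) . ". $type $text<br>\n";
}
```

Using `min()` of the two lengths keeps the loop from calling `item()` past the end of the shorter list if the page ever has an unmatched span.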

Thanks for your help.

Try adding this function:

// Returns the markup of all of $node's child nodes (text and tags)
function get_inner_html( $node ) {
    $innerHTML= '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }

    return $innerHTML;
} 
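The difference between `get_inner_html()` and `nodeValue` is that the former serializes each child node with `saveXML()`, keeping any nested tags, while `nodeValue` on an element returns only the concatenated text. A small comparison on a hypothetical span (the markup is made up for illustration):

```php
<?php
// get_inner_html() as given above: serializes every child node of $node.
function get_inner_html(DOMNode $node): string
{
    $innerHTML = '';
    foreach ($node->childNodes as $child) {
        $innerHTML .= $child->ownerDocument->saveXML($child);
    }
    return $innerHTML;
}

// Hypothetical span with nested markup, to compare the two approaches.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<span class="weathstattext">5.7 <b>C</b></span>');
libxml_clear_errors();
$span = $doc->getElementsByTagName('span')->item(0);

echo $span->nodeValue, "\n";       // text only, tags dropped
echo get_inner_html($span), "\n";  // child markup kept
```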

Then replace the foreach loops with this:

  foreach($poke_text as $text){
    echo get_inner_html($text) . '<br>';
  }
  foreach($poke_type as $type){
    echo get_inner_html($type) . '<br>';
  }

This produces:

  1. 197° (Syd) 5.7 °C Stigende 4.8 m/s Stigende 5.4 m/s Stigende -6 cm Faldende 1004 hPa Vindretning Lufttemperatur Middel vindhastighed Max vindhastighed Vandstand Lufttryk

Buddy, in the foreach loops at the end of your code, you use $type where you mean $text and $text where you mean $type. I ran the code with the variables changed to what they should be, and it works fine.

$html = file_get_contents('http://www.frederikshavnhavn.dk/scripts/weatherwindow.php?langid=2'); //get the html returned from the following url

$poke_doc = new DOMDocument();

libxml_use_internal_errors(true); //suppress libxml warnings about malformed HTML

if(!empty($html)){ //if any html is actually returned

  $poke_doc->loadHTML($html);
  libxml_clear_errors(); //remove errors for yucky html

  $poke_xpath = new DOMXPath($poke_doc);

  //get all the spans with the given class
  $poke_type = $poke_xpath->query("//span[@class='weathstattype']");

  $poke_text = $poke_xpath->query("//span[@class='weathstattext']");

  foreach($poke_text as $text){
    echo $text->nodeValue;
  }
  foreach($poke_type as $type){
    echo $type->nodeValue;
  }
}

And this is the output I got from your code (after changing the variables in the loops):

196° (Syd) 5.6 °C 4.1 m/s 5 m/s -6 cm 1004 hPa Vindretning Lufttemperatur Middel vindhastighed Max vindhastighed Vandstand Lufttryk

Now that you have your data, I think you can work out how to arrange it.
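One way to arrange the data is to turn each node list into a plain array of strings and key the readings by their labels with `array_combine()`. A minimal sketch, assuming the n-th `weathstattype` span labels the n-th `weathstattext` span; the `node_texts` helper and the inline markup are hypothetical:

```php
<?php
// Hypothetical helper: collects the trimmed text of every node in a list.
function node_texts(DOMNodeList $nodes): array
{
    $out = [];
    foreach ($nodes as $node) {
        $out[] = trim($node->nodeValue);
    }
    return $out;
}

// Hypothetical markup standing in for the fetched page.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<div>'
    . '<span class="weathstattext">-6 cm</span><span class="weathstattext">1004 hPa</span>'
    . '<span class="weathstattype">Vandstand</span><span class="weathstattype">Lufttryk</span>'
    . '</div>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$types = node_texts($xpath->query("//span[@class='weathstattype']"));
$texts = node_texts($xpath->query("//span[@class='weathstattext']"));

// array_combine() needs equal lengths (PHP 8 throws a ValueError otherwise).
$readings = count($types) === count($texts) ? array_combine($types, $texts) : [];

foreach ($readings as $label => $value) {
    echo "$label: $value\n";
}
```

Keying by label lets you look up a single reading, for example `$readings['Lufttryk']`, instead of relying on its position in the page.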
