
Scrape data from website using Simple_HTML_Dom.php

Using simple_html_dom.php, I'm trying to scrape the available sizes from a friend's website. Unfortunately, I haven't managed to extract even a single size, because I don't understand what the correct selection criterion would be.

In the example below I would like to extract "110", as it's the only available size. I tried extracting the labels, but then I guess I'd have to add a further criterion, which should be the value of the label's "for" attribute, which starts with "attribute". Any help would be greatly appreciated.

<div id="sizeSelector" class="cf">
  <div class="titleHeader cf">
    <p class="text">
      <h2>Choose a Size</h2>
      <h2 id="print-size" style="display:none;">Sizes available</h2>
      <label for="attribute76">
         <input id="attribute76" class="jshide" type="radio" value="76" name="super_attribute[144]">
110
      </label>
   </div>
</div>

You could iterate through the labels and check whether the input is disabled or not:

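// Assumes simple_html_dom.php is included and $html holds the parsed page,
// e.g. obtained from file_get_html() or str_get_html().
foreach ($html->find('#sizeSelector label') as $e) {
    // $e is the <label> element; $e->plaintext is its text content (the size).
    $size = trim($e->plaintext);
    // find() returns an array, which is empty (falsy) when nothing matches.
    if ($e->find('input[disabled]')) {
        echo "Input " . $size . " is disabled.\n";
    } else {
        echo "Input " . $size . " is enabled.\n";
    }
}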

Unfortunately, trying to select labels that don't have the class disabled with find('label[class!=disabled]') appears to also drop every label that has no class attribute at all. You could instead query for labels without a class attribute, find('label[!class]'), but then you risk excluding available labels that carry some other class.
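If the selector limitation gets in the way, one possible workaround is to do the filtering with PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom. This swaps in a different parser, so treat it as a rough sketch rather than a drop-in fix. The second label (attribute77, size 116) and its disabled class are invented for illustration; the real page may mark unavailable sizes differently.

```php
<?php
// Sample markup modelled on the question. The second label and its
// "disabled" class are hypothetical, added only to have something to filter.
$markup = <<<'HTML'
<div id="sizeSelector" class="cf">
  <label for="attribute76">
    <input id="attribute76" class="jshide" type="radio" value="76" name="super_attribute[144]">
    110
  </label>
  <label for="attribute77" class="disabled">
    <input id="attribute77" class="jshide" type="radio" value="77" name="super_attribute[144]" disabled>
    116
  </label>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Keep labels that do not carry the "disabled" class token (labels with no
// class attribute pass as well) and whose <input> has no disabled attribute.
$query = '//*[@id="sizeSelector"]//label'
       . '[not(contains(concat(" ", normalize-space(@class), " "), " disabled "))]'
       . '[not(.//input[@disabled])]';

$availableSizes = [];
foreach ($xpath->query($query) as $label) {
    // textContent holds the size plus surrounding whitespace.
    $availableSizes[] = trim($label->textContent);
}

print_r($availableSizes);
```

With the sample markup above, only 110 should remain in $availableSizes. The concat/normalize-space idiom tests for a whole class token, so a label with an unrelated class such as "cf" is still kept, and a label with no class attribute is not dropped.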
