Scraping data from a table using XPath and PHP?

I want to extract the data from the table whose markup is given below. I'm using XPath to extract the data, but other suggestions are also welcome.

      <div style="clear:both;" id="showPrice">
      <br>
      <table cellspacing="1">
         <tbody>
             <tr>
                 <td width="50px" style="text-align: left" class="tdhead">SN</td>
                 <td width="650px" style="text-align: left" class="tdhead">Companies</td>
                 <td width="20px" class="tdhead">Trans</td>
                 <td width="50px" class="tdhead"> Max Price</td>
                 <td width="50px" class="tdhead">Min Price</td>
                 <td width="50px" class="tdhead">Closing Price</td>
                 <td width="50px" class="tdhead">Total Shares</td>
                 <td width="50px" class="tdhead">Amount Rs.</td>
                 <td width="50px" class="tdhead">Prev. Closing</td>
                 <td width="20px" class="tdhead">Diff.</td>
                 <td width="50px" class="tdhead">Diff. %</td>
                 <td colspan="3" class="closing-price">
                     <table>
                         <tbody>
                            <tr>
                               <td colspan="3">365&nbsp;days</td>
                             </tr>
                             <tr>
                               <td width="50px" class="closing-price-lighter">Max Price</td>
                               <td width="50px" class="closing-price-lighter">Min Price</td>
                               <td width="50px" class="closing-price-lighter">Avg</td>    
                             </tr>
                         </tbody>
                     </table>
                     </td>
                    </tr>
                    <tr style="background-color: #A61A00">
                       <td style="text-align: center;color:white;">1</td>
                       <td style="text-align: left;padding:3px;">
                          <a href="viewcompany.php?symbol=ACEDBL&amp;id=177" style="text-decoration:none;color:white;">Ace Development Bank Limited</a>
                       </td>
                       <td class="numeric-data">3</td>
                       <td class="numeric-data">269.00</td>
                       <td class="numeric-data">264.00</td><td class="numeric-data" style="background-color:#99CCFF;color:black;">264.00</td>
                       <td class="numeric-data">495</td>
                       <td class="numeric-data">131,405</td>
                       <td class="numeric-data">265.00</td>
                       <td class="numeric-data">-1.00</td>
                       <td class="numeric-data" style="background-color:#99CCFF;color:black;">-0.38</td>
                       <td class="numeric-data" style="background-color:#99FFFF;color:black;">281</td>
                       <td class="numeric-data" style="background-color:#99FFFF;color:black;">102</td>
                       <td class="numeric-data" style="background-color:#99FFFF;color:black;">150.15</td>       
                   </tr>
               </tbody>
            </table>
         </div>

I want only the data that comes after the cell with the closing-price class. The data I need is the text and the numeric values from the following td elements of the tr:

                       <td style="text-align: left;padding:3px;">
                          <a href="viewcompany.php?symbol=ACEDBL&amp;id=177" style="text-decoration:none;color:white;">Ace Development Bank Limited</a>
                       </td>
                       <td class="numeric-data">3</td>
                       <td class="numeric-data">269.00</td>
                       <td class="numeric-data">264.00</td><td class="numeric-data" style="background-color:#99CCFF;color:black;">264.00</td>
                       <td class="numeric-data">495</td>
                       <td class="numeric-data">131,405</td>
                       <td class="numeric-data">265.00</td>
                       <td class="numeric-data">-1.00</td>
                       <td class="numeric-data" style="background-color:#99CCFF;color:black;">-0.38</td>
                       <td class="numeric-data" style="background-color:#99FFFF;color:black;">281</td>
                       <td class="numeric-data" style="background-color:#99FFFF;color:black;">102</td>
                       <td class="numeric-data" style="background-color:#99FFFF;color:black;">150.15</td>       
                   </tr>   

I tried the following expression but could not get a result:

  //div[@id='showPrice']/td[preceding-sibling::td[@class='closing-price']]/text()
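A likely reason the expression above matches nothing: no td is a direct child of the div, and the closing-price cell sits in the header row, so it is never a sibling of the data cells. The sketch below is one possible correction, not a confirmed fix for the live page. It runs against a trimmed-down copy of the markup from the question (the inline sample and the variable names are assumptions) and selects the rows that follow the header row, skipping the SN cell of each row.

```php
<?php
// Trimmed-down sample of the markup from the question (assumption: the real
// page has the same shape, including the <tbody> element).
$html = <<<HTML
<div id="showPrice">
  <table>
    <tbody>
      <tr>
        <td class="tdhead">SN</td>
        <td class="tdhead">Companies</td>
        <td class="closing-price">365 days</td>
      </tr>
      <tr>
        <td>1</td>
        <td><a href="viewcompany.php?symbol=ACEDBL&amp;id=177">Ace Development Bank Limited</a></td>
        <td class="numeric-data">3</td>
        <td class="numeric-data">269.00</td>
      </tr>
    </tbody>
  </table>
</div>
HTML;

libxml_use_internal_errors(true);   // silence warnings about imperfect HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// The expression from the question: it looks for <td> directly under the <div>.
$original = $xpath->query(
    "//div[@id='showPrice']/td[preceding-sibling::td[@class='closing-price']]/text()"
);

// Find the header row (the one holding the closing-price cell), move to the
// rows after it, and take every cell except the first one (SN).
$cells = $xpath->query(
    "//div[@id='showPrice']//tr[td[@class='closing-price']]"
    . "/following-sibling::tr/td[position() > 1]"
);

$data = array();
foreach ($cells as $cell) {
    $data[] = trim($cell->textContent);
}

echo $original->length, "\n";
print_r($data);
```

On this sample the original expression returns zero nodes, while the second query yields the company name followed by the numeric cells.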

You can also do something like this:

Point the query at that particular <tr> tag:

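// Fetch the page.
$html_string = file_get_contents('http://www.sharesansar.com/today.php');

// Silence libxml warnings caused by malformed real-world HTML.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html_string);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$values = array();

// Select every cell of the second row (the first data row) of the first table.
// libxml does not add <tbody> on its own, so cover both the case where the
// served HTML omits it and the case where it is present, as in the markup above.
$row = $xpath->query(
    '//div[@id="showPrice"]/table[1]/tr[2]/td'
    . ' | //div[@id="showPrice"]/table[1]/tbody/tr[2]/td'
);
foreach ($row as $value) {
    $values[] = trim($value->textContent);
}

echo '<pre>';
print_r($values);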

The results:

Array
(
    [0] => 1
    [1] => Ace Development Bank Limited
    [2] => 3
    [3] => 269.00
    [4] => 264.00
    [5] => 264.00
    [6] => 495
    [7] => 131,405
    [8] => 265.00
    [9] => -1.00
    [10] => -0.38
    [11] => 281
    [12] => 102
    [13] => 150.15
)
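If labelled fields are more convenient than a numerically indexed array, the values can be paired with the column names from the header row. This is only a sketch: the `$labels` list is copied by hand from the markup in the question, and the `365d` prefix on the last three keys is an illustrative choice to keep the keys unique.

```php
<?php
// Values as printed in the results above.
$values = array(
    '1', 'Ace Development Bank Limited', '3', '269.00', '264.00', '264.00',
    '495', '131,405', '265.00', '-1.00', '-0.38', '281', '102', '150.15',
);

// Column labels taken from the header row of the table (hand-copied; the
// "365d" prefix is hypothetical and only disambiguates the nested columns).
$labels = array(
    'SN', 'Companies', 'Trans', 'Max Price', 'Min Price', 'Closing Price',
    'Total Shares', 'Amount Rs.', 'Prev. Closing', 'Diff.', 'Diff. %',
    '365d Max Price', '365d Min Price', '365d Avg',
);

// array_combine() pairs the n-th label with the n-th value; both arrays
// must have the same number of elements.
$record = array_combine($labels, $values);

// Strip the thousands separator before casting a numeric string.
$record['Amount Rs.'] = (float) str_replace(',', '', $record['Amount Rs.']);

print_r($record);
```

Keeping the other fields as strings until they are needed avoids surprises with values such as `-0.38`; cast them the same way when arithmetic is required.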
