
BeautifulSoup Python - HTML Table Data Issues

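I need to extract a value from an HTML table that a web server provides as a .txt file. Specifically, I need to get the most recent temperature reading (by time) into a variable.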

I don't think the table is perfectly formatted.

Here is a sample of the table's HTML:

<table border="1" rules="all">
  <col />
  <col />
  <col align="char" char="." />
  <col align="char" char="." />
  <col />
  <col />
  <col align="char" char="m" />
  <col align="char" char="m" />
  <col align="char" char="." />
  <col align="char" char="," />
  <tr>
    <th colspan="2" rowspan="2">Date &amp; time</th>
    <th rowspan="2">Temp</th>
    <th rowspan="2">Feels like</th>
    <th rowspan="2">Humidity</th>
    <th colspan="3">Wind</th>
    <th rowspan="2">Rain</th>
    <th rowspan="2">Pressure</th>
  </tr>
  <tr>
    <th>dir</th>
    <th>ave</th>
    <th>gust</th>
  </tr>
  <tr>
    <td>2014/01/08</td>
    <td>1056 GMT</td>
    <td>11.0 &deg;C</td>
    <td>9.8 &deg;C</td>
    <td>74%</td>
    <td>NNW</td>
    <td>1 mph</td>
    <td>6 mph</td>
    <td>0.3 mm</td>
    <td>1032.4 hPa, rising</td>
  </tr>
  <tr>
    <td></td>
    <td>1159 GMT</td>
    <td>10.8 &deg;C</td>
    <td>9.7 &deg;C</td>
    <td>74%</td>
    <td>SSE</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1032.0 hPa, rising slowly</td>
  </tr>
  <tr>
    <td></td>
    <td>1258 GMT</td>
    <td>11.0 &deg;C</td>
    <td>9.9 &deg;C</td>
    <td>73%</td>
    <td>SSE</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1031.5 hPa, falling slowly</td>
  </tr>
  <tr>
    <td></td>
    <td>1357 GMT</td>
    <td>10.8 &deg;C</td>
    <td>9.7 &deg;C</td>
    <td>75%</td>
    <td>SSW</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1030.7 hPa, falling</td>
  </tr>
  <tr>
    <td></td>
    <td>1456 GMT</td>
    <td>10.3 &deg;C</td>
    <td>9.3 &deg;C</td>
    <td>77%</td>
    <td>ENE</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1030.0 hPa, falling</td>
  </tr>
  <tr>
    <td></td>
    <td>1600 GMT</td>
    <td>9.7 &deg;C</td>
    <td>8.7 &deg;C</td>
    <td>81%</td>
    <td>WNW</td>
    <td>1 mph</td>
    <td>3 mph</td>
    <td>0.0 mm</td>
    <td>1028.7 hPa, falling</td>
  </tr>
  <tr>
    <td></td>
    <td>1658 GMT</td>
    <td>8.9 &deg;C</td>
    <td>7.9 &deg;C</td>
    <td>86%</td>
    <td>NNE</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1026.9 hPa, falling quickly</td>
  </tr>
</table>
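The goal is the most recent temperature by time. In the sample, the date cell is filled only on the first data row, so the later rows need the last date seen carried forward to get a full timestamp. This is a minimal sketch, assuming the cell text has already been pulled out of the <td> elements (the ROWS list below is typed in by hand from the sample table) and that times always look like "HHMM GMT"; latest_reading is a hypothetical helper, not part of BeautifulSoup.

```python
from datetime import datetime

# (date, time, temp) cell text for each data row of the sample table, typed in
# by hand; BeautifulSoup turns the &deg; entity into the "°" character.
ROWS = [
    ("2014/01/08", "1056 GMT", "11.0 °C"),
    ("", "1159 GMT", "10.8 °C"),
    ("", "1258 GMT", "11.0 °C"),
    ("", "1357 GMT", "10.8 °C"),
    ("", "1456 GMT", "10.3 °C"),
    ("", "1600 GMT", "9.7 °C"),
    ("", "1658 GMT", "8.9 °C"),
]


def latest_reading(rows):
    """Return (timestamp, temperature in °C) for the most recent row."""
    current_date = None
    readings = []
    for date_text, time_text, temp_text in rows:
        if date_text:  # the date cell is only filled on the first row of a day
            current_date = date_text
        if current_date is None:
            raise ValueError("data row appears before any row that carries a date")
        # Assumed format: "2014/01/08" + "1658 GMT"
        stamp = datetime.strptime(f"{current_date} {time_text}", "%Y/%m/%d %H%M GMT")
        temp_c = float(temp_text.split()[0])  # "8.9 °C" -> 8.9
        readings.append((stamp, temp_c))
    return max(readings)  # tuples compare by timestamp first


stamp, temp_c = latest_reading(ROWS)
print(stamp, temp_c)  # 2014-01-08 16:58:00 8.9
```

Using max() instead of simply taking the last row means the result does not depend on the rows being in chronological order; note that max() raises ValueError if there are no rows at all.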

I have the following Python code to get all of the data into rows:

#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
import urllib2
data = "http://****************/weather_station/data/6hrs.txt"
req = urllib2.Request(data)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

table = soup.find('table')
for row in table.findAll('tr'):
        col = row.findAll('td')
#       time = col[0].string
#       temp = col[1].string

print col

This is where I'm stuck. time = col[0].string fails with "list index out of range", which suggests the list is empty, yet if I print col it shows the data I want to extract.

Any suggestions?
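The error is consistent with the first two <tr> elements holding only <th> cells: for those rows row.findAll('td') returns an empty list, so col[0] raises IndexError on the very first row, while the print col placed after the loop only shows the last row's cells. To check this without extra installs, the sketch below swaps in the standard library's html.parser.HTMLParser for BeautifulSoup and counts <th> and <td> cells per row in a trimmed copy of the table; RowCellCounter is a hypothetical name.

```python
from html.parser import HTMLParser


class RowCellCounter(HTMLParser):
    """Record, for each <tr>, how many <th> and <td> cells it contains."""

    def __init__(self):
        super().__init__()
        self.rows = []  # one [th_count, td_count] pair per <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([0, 0])
        elif tag == "th" and self.rows:
            self.rows[-1][0] += 1
        elif tag == "td" and self.rows:
            self.rows[-1][1] += 1


# Trimmed copy of the question's table: both header rows, two data rows.
SAMPLE = """
<table>
  <tr><th colspan="2" rowspan="2">Date &amp; time</th><th rowspan="2">Temp</th></tr>
  <tr><th>dir</th><th>ave</th><th>gust</th></tr>
  <tr><td>2014/01/08</td><td>1056 GMT</td><td>11.0 &deg;C</td></tr>
  <tr><td></td><td>1658 GMT</td><td>8.9 &deg;C</td></tr>
</table>
"""

counter = RowCellCounter()
counter.feed(SAMPLE)
print(counter.rows)  # [[2, 0], [3, 0], [0, 3], [0, 3]]
```

The two leading zeros in the td column are the rows on which indexing col fails.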

The answer below works well for that table. I'm also trying to get the same data from a table like this...


Using the same code, as follows:

table = soup.find('table')
for row in table.findAll('tr')[1:]:
        col = row.findAll('td')
        if len(col) >= 2:
                time = col[0].string
                temp = col[1].string
print time
print temp

time and temp both return None.

If I print col, all the values are there. Why doesn't len(col) >= 2 work for this data?
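The second table is not reproduced here, so this is only a guess based on the first one. In BeautifulSoup, .string is None when a tag has no children or more than one child (for example, text plus a nested tag). In the table shown earlier, col[0] is the date and col[1] the time, so the names in the code are shifted by one column, and col[0].string is already None on every data row after the first because the date cell is empty. A minimal sketch, assuming Python 3 with the bs4 package and its built-in "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Last data row of the question's sample table, trimmed to three cells.
ROW_HTML = "<table><tr><td></td><td>1658 GMT</td><td>8.9 &deg;C</td></tr></table>"
cells = BeautifulSoup(ROW_HTML, "html.parser").find_all("td")

print(cells[0].string is None)  # True: the empty date cell has no text child
print(cells[1].string)          # 1658 GMT (the time, not the temperature)
print(cells[2].string)          # 8.9 °C (the temperature)

# A cell with more than one child (text plus a tag) also has .string == None,
# while get_text() still joins all of its text.
mixed = BeautifulSoup("<td>8.9 <b>&deg;C</b></td>", "html.parser").td
print(mixed.string is None)     # True
print(mixed.get_text())         # 8.9 °C
```

If the second table wraps its values in extra tags, get_text(strip=True) is likely a safer choice than .string.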

You're crashing because you're trying to get td cells from this tr, which contains only th cells:

<tr>
 <th colspan="2" rowspan="2">Date &amp; time</th>
 <th rowspan="2">Temp</th>
 <th rowspan="2">Feels like</th>
 <th rowspan="2">Humidity</th>
 <th colspan="3">Wind</th>
 <th rowspan="2">Rain</th>
 <th rowspan="2">Pressure</th>
</tr>

Just add a check like this:

col = row.findAll('td')
# Header rows contain only <th> cells, so col is empty there; skip them.
if len(col) >= 2:
    time = col[0].string
    temp = col[1].string
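Combining the check above with the column layout of the sample table, a minimal Python 3 sketch might look like this. It assumes the bs4 package (where findAll is spelled find_all; the older name is kept as an alias) and parses an inline, trimmed copy of the table instead of fetching the question's elided URL. It requires three cells because it reads index 2, and the length check also skips the second header row (dir/ave/gust), which a [1:] slice on its own would not.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table: both header rows and the last two data rows.
HTML = """
<table border="1" rules="all">
  <tr><th colspan="2" rowspan="2">Date &amp; time</th><th rowspan="2">Temp</th></tr>
  <tr><th>dir</th><th>ave</th><th>gust</th></tr>
  <tr><td></td><td>1600 GMT</td><td>9.7 &deg;C</td></tr>
  <tr><td></td><td>1658 GMT</td><td>8.9 &deg;C</td></tr>
</table>
"""


def last_time_and_temp(html):
    """Return (time, temperature) text from the last data row of the first <table>."""
    table = BeautifulSoup(html, "html.parser").find("table")
    reading_time = temp = None
    for row in table.find_all("tr"):
        cols = row.find_all("td")
        if len(cols) < 3:  # header rows hold only <th> cells, so cols is empty
            continue
        # Column 0 is the date (blank after the first row), 1 the time, 2 the temperature.
        reading_time = cols[1].get_text(strip=True)
        temp = cols[2].get_text(strip=True)
    return reading_time, temp


reading_time, temp = last_time_and_temp(HTML)
print(reading_time, temp)  # 1658 GMT 8.9 °C
```

Because the loop overwrites reading_time and temp on every data row, the values left after it come from the last row, which in this table is the latest reading.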


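Disclaimer: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.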

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM