
Get height data from web scraping into a list using Beautiful Soup

I am trying to use Beautiful Soup and requests to get data from http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights. This is my code:

import requests
from bs4 import BeautifulSoup

response = requests.get("http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights")
soup = BeautifulSoup(response.text, "html.parser")
list_table_data = soup.find(class_="wikitable").contents
list_tr_data = list_table_data[1::2]
print(list_tr_data)

When you print list_tr_data, the output is:

[<tr>
<th>Index</th><th>Height(Inches)</th><th>Weight(Pounds)
</th></tr>, <tr>
<td>1</td><td>65.78</td><td>112.99
</td></tr>, <tr>
<td>2</td><td>71.52</td><td>136.49
</td></tr>, <tr>
<td>3</td><td>69.40</td><td>153.03
</td></tr>,....,  <tr>
<td>200</td><td>71.39</td><td>127.88
</td></tr>]

I want to put the Height(Inches) data into a list named list_height_data, but when I try to access it with this code:

list_height_data = []
for row in list_tr_data:
    list_height_data.append(row.find_all("tr"))
print(list_height_data)

the result is a list of empty lists:

[[], [], [], [], [], [], [], [], [], [], ... []]

How should I get the Height(Inches) data? Printing list_height_data and len(list_height_data) should give:

[65.78, 71.52, 69.40, ..., 71.39]
200

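Each element of list_tr_data is already a tr tag, so row.find_all("tr") looks for tr tags nested inside it and finds none. You need to iterate over the td tags instead: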

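import requests
from bs4 import BeautifulSoup

url = 'http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights'
d = BeautifulSoup(requests.get(url).text, 'html.parser')
# Convert every td in every row to a float; the header row has only th cells,
# so it yields an empty list, which the leading `_` discards.
_, *results = [[float(c.text.replace('\n', '')) for c in i.find_all('td')] for i in d.find('table', {'class': 'wikitable'}).find_all('tr')]
height = [i[1] for i in results]  # column 1 is Height(Inches)
print(height)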

Output:

[65.78, 71.52, 69.4, 68.22, 67.79, 68.7, 69.8, 70.01, 67.9, 66.78, 66.49, 67.62, 68.3, 67.12, 68.28, 71.09, 66.46, 68.65, 71.23, 67.13, 67.83, 68.88, 63.48, 68.42, 67.63, 67.21, 70.84, 67.49, 66.53, 65.44, 69.52, 65.81, 67.82, 70.6, 71.8, 69.21, 66.8, 67.66, 67.81, 64.05, 68.57, 65.18, 69.66, 67.97, 65.98, 68.67, 66.88, 67.7, 69.82, 69.09, 69.91, 67.33, 70.27, 69.1, 65.38, 70.18, 70.41, 66.54, 66.36, 67.54, 66.5, 69.0, 68.3, 67.01, 70.81, 68.22, 69.06, 67.73, 67.22, 67.37, 65.27, 70.84, 69.92, 64.29, 68.25, 66.36, 68.36, 65.48, 69.72, 67.73, 68.64, 66.78, 70.05, 66.28, 69.2, 69.13, 67.36, 70.09, 70.18, 68.23, 68.13, 70.24, 71.49, 69.2, 70.06, 70.56, 66.29, 63.43, 66.77, 68.89, 64.87, 67.09, 68.35, 65.61, 67.76, 68.02, 67.66, 66.31, 69.44, 63.84, 67.72, 70.05, 70.19, 65.95, 70.01, 68.61, 68.81, 69.76, 65.46, 68.83, 65.8, 67.21, 69.42, 68.94, 67.94, 65.63, 66.5, 67.93, 68.89, 70.24, 68.27, 71.23, 69.1, 64.4, 71.1, 68.22, 65.92, 67.44, 73.9, 69.98, 69.52, 65.18, 68.01, 68.34, 65.18, 68.26, 68.57, 64.5, 68.71, 68.89, 69.54, 67.4, 66.48, 66.01, 72.44, 64.13, 70.98, 67.5, 72.02, 65.31, 67.08, 64.39, 69.37, 68.38, 65.31, 67.14, 68.39, 66.29, 67.19, 65.99, 69.43, 67.97, 67.76, 65.28, 73.83, 66.81, 66.89, 65.74, 65.98, 66.58, 67.11, 65.87, 66.78, 68.74, 66.23, 65.96, 68.58, 66.59, 66.97, 68.08, 70.19, 65.52, 67.46, 67.41, 69.66, 65.8, 66.11, 68.24, 68.02, 71.39]
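For readers who prefer an explicit loop, here is a minimal sketch of the same idea that runs without a network request. SAMPLE_HTML is a hypothetical three-row excerpt that mirrors the table structure shown in the question; it is not the real page. The sketch first reproduces the empty results from row.find_all("tr"), then collects the second td of each data row into list_height_data.

```python
from bs4 import BeautifulSoup

# Hypothetical three-row sample that mirrors the structure of the wikitable
# in the question; the real page has 200 data rows.
SAMPLE_HTML = """
<table class="wikitable">
<tr>
<th>Index</th><th>Height(Inches)</th><th>Weight(Pounds)
</th></tr>
<tr>
<td>1</td><td>65.78</td><td>112.99
</td></tr>
<tr>
<td>2</td><td>71.52</td><td>136.49
</td></tr>
<tr>
<td>3</td><td>69.40</td><td>153.03
</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all("tr") returns only the row tags, so there is no need to slice
# .contents with [1::2] to skip the whitespace strings between rows.
rows = soup.find("table", class_="wikitable").find_all("tr")

# find_all() searches a tag's descendants, not the tag itself, so asking a
# tr for nested tr tags finds nothing. This reproduces the empty lists.
nested_rows = [row.find_all("tr") for row in rows]

list_height_data = []
for row in rows:
    cells = row.find_all("td")
    if len(cells) < 2:
        continue  # the header row contains th cells only
    # cells[1] is the Height(Inches) column; strip the surrounding whitespace.
    list_height_data.append(float(cells[1].get_text(strip=True)))

print(nested_rows)
print(list_height_data)
print(len(list_height_data))
```

On this sample every entry of nested_rows is empty, while list_height_data holds the three heights as floats. Pointing the same loop at the parsed response from the real page should give the 200 values listed above, assuming the page still has the structure shown in the question.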

