
Get height data from web scraping into a list using Beautiful Soup

I want to get the data from this website, http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights, using Beautiful Soup and requests. Here is my code:

import requests
from bs4 import BeautifulSoup

response = requests.get("http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights")
soup = BeautifulSoup(response.text, "html.parser")
list_table_data = soup.find(class_="wikitable").contents
list_tr_data = list_table_data[1::2]
print(list_tr_data)

When I print list_tr_data, the output is:

[<tr>
<th>Index</th><th>Height(Inches)</th><th>Weight(Pounds)
</th></tr>, <tr>
<td>1</td><td>65.78</td><td>112.99
</td></tr>, <tr>
<td>2</td><td>71.52</td><td>136.49
</td></tr>, <tr>
<td>3</td><td>69.40</td><td>153.03
</td></tr>,....,  <tr>
<td>200</td><td>71.39</td><td>127.88
</td></tr>]

I want the Height(Inches) data in a list called list_height_data, but when I try to access it using this code:

list_height_data = []
for row in list_tr_data:
    list_height_data.append(row.find_all("tr"))
print(list_height_data)

this produces a list of empty lists:

[[], [], [], [], [], [], [], [], [], [], ... []]

What should I do to get the Height(Inches) data? Printing list_height_data and len(list_height_data) should give:

[65.78, 71.52, 69.40, ..., 71.39]
200

You need to iterate over the td tags:

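import requests
from bs4 import BeautifulSoup as soup

d = soup(requests.get('http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights').text, 'html.parser')
# Parse the td cells of every row as floats. The header row has only th cells,
# so it yields an empty list, which the leading `_` discards.
_, *results = [[float(c.text.replace('\n', '')) for c in i.find_all('td')] for i in d.find('table', {'class':'wikitable'}).find_all('tr')]
# Index 1 of each parsed row is the Height(Inches) column.
height = [i[1] for i in results]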

Output:

[65.78, 71.52, 69.4, 68.22, 67.79, 68.7, 69.8, 70.01, 67.9, 66.78, 66.49, 67.62, 68.3, 67.12, 68.28, 71.09, 66.46, 68.65, 71.23, 67.13, 67.83, 68.88, 63.48, 68.42, 67.63, 67.21, 70.84, 67.49, 66.53, 65.44, 69.52, 65.81, 67.82, 70.6, 71.8, 69.21, 66.8, 67.66, 67.81, 64.05, 68.57, 65.18, 69.66, 67.97, 65.98, 68.67, 66.88, 67.7, 69.82, 69.09, 69.91, 67.33, 70.27, 69.1, 65.38, 70.18, 70.41, 66.54, 66.36, 67.54, 66.5, 69.0, 68.3, 67.01, 70.81, 68.22, 69.06, 67.73, 67.22, 67.37, 65.27, 70.84, 69.92, 64.29, 68.25, 66.36, 68.36, 65.48, 69.72, 67.73, 68.64, 66.78, 70.05, 66.28, 69.2, 69.13, 67.36, 70.09, 70.18, 68.23, 68.13, 70.24, 71.49, 69.2, 70.06, 70.56, 66.29, 63.43, 66.77, 68.89, 64.87, 67.09, 68.35, 65.61, 67.76, 68.02, 67.66, 66.31, 69.44, 63.84, 67.72, 70.05, 70.19, 65.95, 70.01, 68.61, 68.81, 69.76, 65.46, 68.83, 65.8, 67.21, 69.42, 68.94, 67.94, 65.63, 66.5, 67.93, 68.89, 70.24, 68.27, 71.23, 69.1, 64.4, 71.1, 68.22, 65.92, 67.44, 73.9, 69.98, 69.52, 65.18, 68.01, 68.34, 65.18, 68.26, 68.57, 64.5, 68.71, 68.89, 69.54, 67.4, 66.48, 66.01, 72.44, 64.13, 70.98, 67.5, 72.02, 65.31, 67.08, 64.39, 69.37, 68.38, 65.31, 67.14, 68.39, 66.29, 67.19, 65.99, 69.43, 67.97, 67.76, 65.28, 73.83, 66.81, 66.89, 65.74, 65.98, 66.58, 67.11, 65.87, 66.78, 68.74, 66.23, 65.96, 68.58, 66.59, 66.97, 68.08, 70.19, 65.52, 67.46, 67.41, 69.66, 65.8, 66.11, 68.24, 68.02, 71.39]
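
As a rough illustration of why the attempt in the question returned empty lists, here is a minimal offline sketch. It assumes a hypothetical three-row excerpt (SAMPLE_HTML) shaped like the table printed in the question, and extract_heights is an illustrative helper name, not part of Beautiful Soup. find_all only searches a tag's descendants, so calling find_all("tr") on a row that is itself a tr finds nothing, while find_all("td") returns that row's cells.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt mirroring the table structure shown in the question.
SAMPLE_HTML = """
<table class="wikitable">
<tr>
<th>Index</th><th>Height(Inches)</th><th>Weight(Pounds)
</th></tr>
<tr>
<td>1</td><td>65.78</td><td>112.99
</td></tr>
<tr>
<td>2</td><td>71.52</td><td>136.49
</td></tr>
<tr>
<td>3</td><td>69.40</td><td>153.03
</td></tr>
</table>
"""


def extract_heights(html):
    """Return the second column of every data row as a list of floats."""
    table = BeautifulSoup(html, "html.parser").find("table", class_="wikitable")
    heights = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")  # the header row has only <th>, so this is empty
        if len(cells) >= 2:
            heights.append(float(cells[1].get_text(strip=True)))
    return heights


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_row = soup.find("tr")
# find_all searches descendants only, so a <tr> never matches itself.
nested_rows = first_row.find_all("tr")

list_height_data = extract_heights(SAMPLE_HTML)
print(nested_rows)       # []
print(list_height_data)  # [65.78, 71.52, 69.4]
```

Checking the number of cells before indexing skips the header row without relying on its position, and calling find_all("tr") on the table avoids slicing .contents with [1::2], which depends on whitespace text nodes alternating with the rows.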
