
Parsing a table on a webpage using BeautifulSoup

I am trying to retrieve a table from the SGX website.

The page is saved to a local drive, and I am using BeautifulSoup to parse it:

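from bs4 import BeautifulSoup

# `pages` is the path to the locally saved HTML file
with open(pages) as f:
    soup = BeautifulSoup(f, "lxml")

# Take the first <table> element in the document
list_0 = soup.find_all('table')[0]
print(list_0)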

What it returns is not the first row on the page:

[<tr><td>Zhongmin Baihui</td><td>5SR</td><td class="nowrap">09:44 AM</td><td class="nowrap">09:49 AM</td><td>0.615</td><td>0.675</td><td>0.555</td></tr>]

What is the correct way to retrieve this table?

Thank you.


The data is fetched by an AJAX request after the page loads, so it is not part of the saved HTML. By inspecting the page you can find the API URL (the URL below) and request it directly, with something like this:

import pandas as pd
import requests

# API endpoint found by inspecting the page's network requests
url = 'https://api.sgx.com/securities/v1.1?excludetypes=bonds&params=nc%2Cadjusted-vwap%2Cb%2Cbv%2Cp%2Cc%2Cchange_vs_pc%2Cchange_vs_pc_percentage%2Ccx%2Ccn%2Cdp%2Cdpc%2Cdu%2Ced%2Cfn%2Ch%2Ciiv%2Ciopv%2Clt%2Cl%2Co%2Cp_%2Cpv%2Cptd%2Cs%2Csv%2Ctrading_time%2Cv_%2Cv%2Cvl%2Cvwap%2Cvwap-currency'
response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error

# The price records sit under data -> prices in the JSON payload
data = response.json()["data"]["prices"]
df = pd.DataFrame(data)
print(df)
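
If only a few columns are needed, the records can be trimmed before building a table. The following is a minimal sketch using only the standard library. It assumes the payload has the same data -> prices nesting as in the code above. The sample payload and its values are made up for illustration, the helper name extract_prices is hypothetical, and the keys nc and lt are simply two names taken from the params query string, with no claim about what they mean.

```python
import json

# Made-up sample payload that mimics the data -> prices nesting used above.
# The real response may contain different keys and values.
SAMPLE = """
{"data": {"prices": [
  {"nc": "AAA", "lt": 1.25, "v": 100},
  {"nc": "BBB", "lt": null, "v": 0}
]}}
"""


def extract_prices(raw, fields):
    """Return one dict per record, keeping only the requested fields.

    Missing keys (or a missing data/prices level) yield None values
    or an empty list instead of raising KeyError.
    """
    payload = json.loads(raw)
    prices = payload.get("data", {}).get("prices", [])
    return [{field: item.get(field) for field in fields} for item in prices]


rows = extract_prices(SAMPLE, ["nc", "lt"])
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame in the same way as the full data list above.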

If your requirements are complex and the crawl runs on a regular schedule, I recommend using Scrapy.
