python: How can I download data from a webpage where the link is hidden behind the download button?
Suppose I want to download data from here: http://www.dce.com.cn/publicweb/quotesdata/memberDealPosiQuotes.html
When I click the button shown below, I get a .csv file:
I want to do this automatically using Python, so that I can specify the date etc.
I found here that one can use pandas' pd.read_csv to read data from a webpage, but first one needs to get the right URL. However, in my case I don't know what the URL is.
Besides, I also want to specify the date, the contract etc. myself.
Before asking, I tried the browser's dev tools, but I still can't see the URL, and I don't know how to do this programmatically.
The JavaScript call exportData('excel') results in a form being submitted. Using Chrome DevTools and its Network panel, you can figure out the headers and the POST data that are sent, and then write a Python script that submits an identical HTTP request.
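import requests

# URL the export form is submitted to (visible in the Network panel).
url = 'http://www.dce.com.cn/publicweb/quotesdata/exportMemberDealPosiQuotesData.html'
formdata = {
    'memberDealPosiQuotes.variety': 'a',
    'memberDealPosiQuotes.trade_type': 0,
    'contract.contract_id': 'all',
    'contract.variety_id': 'a',
    'exportFlag': 'excel',
}
response = requests.post(url, data=formdata)
response.raise_for_status()  # fail on an HTTP error instead of saving an error page

# Use the file name suggested by the server; 'download.csv' is an arbitrary
# fallback for the case where the header is missing.
disposition = response.headers.get('Content-Disposition', 'filename=download.csv')
filename = disposition.split('=')[-1].strip('"')
with open(filename, 'wb') as fp:
    fp.write(response.content)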
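Splitting the Content-Disposition header on '=' is fine for simple headers, but it can break when the header is missing or carries extra parameters. Below is a more defensive sketch that relies only on the standard library's email.message.Message parser. The helper name filename_from_header and the fallback name download.csv are made up for illustration and are not part of the site or of requests.

```python
import os
from email.message import Message


def filename_from_header(content_disposition, default='download.csv'):
    """Return a safe local file name taken from a Content-Disposition value.

    `default` is a hypothetical fallback used when no name can be found.
    """
    if not content_disposition:
        return default
    msg = Message()
    msg['Content-Disposition'] = content_disposition
    name = msg.get_filename()  # handles quoted and unquoted filename parameters
    if not name:
        return default
    # Keep only the last path component so the server cannot choose a directory.
    return os.path.basename(name.replace('\\', '/')) or default
```

With the script above, this could be called as filename_from_header(response.headers.get('Content-Disposition')).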
It is probably possible to modify the POST data to fetch different data, whether by reverse engineering, by trial and error, or by finding some documentation.
For example, you can include fields for the year, month and day:
'year':2017,
'month':3,
'day':20
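To make the date and contract selectable, the form fields can be assembled by a small helper. This is only a sketch: the function name build_formdata and its parameters are hypothetical, the field names are the ones used in the answer, and passing the calendar month unchanged mirrors the example above. Whether the server counts months from 0 or from 1 is not stated here, so compare against the request shown in the Network panel.

```python
from datetime import date


def build_formdata(variety, trade_date, contract_id='all', trade_type=0):
    """Build the POST fields for the export request (hypothetical helper).

    Field names are copied from the answer; their exact meaning on the
    server side is an assumption.
    """
    return {
        'memberDealPosiQuotes.variety': variety,
        'memberDealPosiQuotes.trade_type': trade_type,
        'contract.contract_id': contract_id,
        'contract.variety_id': variety,
        'exportFlag': 'excel',
        'year': trade_date.year,
        'month': trade_date.month,  # assumption: sent as in the example above
        'day': trade_date.day,
    }


# Example: the same request as in the answer, for 20 March 2017.
formdata = build_formdata('a', date(2017, 3, 20))
```

The resulting dictionary can be passed to requests.post(url, data=formdata) exactly as in the script above.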