
Web scraping from a JS website

I want to scrape the form data from https://www.investing.com/commodities/gold-historical-data, but the form is generated by JavaScript. I recorded the action with iMacros and got this:

TAG POS=1 TYPE=DIV ATTR=ID:widgetFieldDateRange
TAG POS=1 TYPE=A ATTR=TXT:20
TAG POS=2 TYPE=A ATTR=TXT:13
TAG POS=1 TYPE=A ATTR=ID:applyBtn

Can anyone tell me how to convert this to Python code that I can use with Selenium?

It looks like the table is loaded through an Ajax POST request.

How did I find that?

I inspected the XHR requests in the Network tab of the browser's developer tools.


The POST data you need is (replace the dates with the ones you want):

curr_id=8830
smlID=300004
st_date=08/09/2017
end_date=08/21/2017
interval_sec=Daily
sort_col=date
sort_ord=DESC
action=historical_data

The IDs in the POST data above are probably specific to this market (gold-historical-data), so for other markets inspect the network traffic again and check the POST data each time.
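As a rough sketch, the fields listed above can be collected into a Python dict, with the two dates formatted as MM/DD/YYYY to match the captured request. The helper name `build_payload` and its parameter names are made up for illustration; the default IDs are the ones captured for the gold page and would need to be replaced for other markets.

```python
from datetime import date


def build_payload(start, end, curr_id="8830", sml_id="300004"):
    """Return the POST form fields for one historical-data request.

    The default IDs are the ones seen for gold-historical-data; other
    markets are assumed to need different values from the Network tab.
    """
    return {
        "curr_id": curr_id,
        "smlID": sml_id,
        # The captured request uses MM/DD/YYYY, e.g. 08/21/2017.
        "st_date": start.strftime("%m/%d/%Y"),
        "end_date": end.strftime("%m/%d/%Y"),
        "interval_sec": "Daily",
        "sort_col": "date",
        "sort_ord": "DESC",
        "action": "historical_data",
    }


payload = build_payload(date(2017, 8, 9), date(2017, 8, 21))
print(payload["st_date"], payload["end_date"])  # 08/09/2017 08/21/2017
```

Passing `date` objects instead of hand-typed strings avoids mixing up day and month when changing the range.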

How do you implement this in Python?

You need a module called requests.

Specifically, read this
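A minimal sketch of sending the POST with requests might look like the following. The function name `fetch_historical_data` is hypothetical, `payload` is assumed to be a dict of the POST fields listed above, and the endpoint URL is deliberately left as an argument: copy the real request URL from the XHR entry in the Network tab. The two headers are an assumption about what the site expects from an Ajax call, not something confirmed here.

```python
def fetch_historical_data(session, url, payload, timeout=30):
    """POST the form fields and return the response body as text.

    `session` is expected to be a requests.Session (or anything with a
    compatible .post method); `url` is the XHR request URL copied from
    the browser's Network tab.
    """
    headers = {
        # Assumed headers: a browser-like User-Agent and the marker
        # that browsers commonly send with Ajax requests.
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest",
    }
    response = session.post(url, data=payload, headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


# Usage (needs `pip install requests` and the real endpoint URL):
#   import requests
#   body = fetch_historical_data(requests.Session(), url, payload)
```

Taking the session as an argument keeps the function easy to try offline with a stand-in object. The returned text is whatever the endpoint sends back and still has to be parsed to extract the table rows.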
