Web scraping using Selenium and Python: Capture JSON body
I'm trying to scrape a webpage that has some AJAX running in the background. Using Python and Selenium, I've gotten as far as loading the webpage, entering data into a form, clicking submit and waiting. At this point I'm trying to capture the JSON-formatted data that's returned; however, this article suggests that getting the JSON body out isn't possible. I've tried to look into the Selenium code myself to get it to return everything, but I haven't had much luck. Has anyone out there encountered a similar problem and have a suggestion on how to solve it? I don't HAVE to use Selenium (or Python, for that matter). Thanks!
I do this by looking at the AJAX call the website makes (in the page source). This is usually a POST (sometimes a GET). Then I use cURL (PHP) or urllib2 (Python) to request that same URL, sending the needed data. This returns the response body, including the JSON, for me.
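A minimal sketch of replaying the AJAX call directly, assuming the site accepts a form-encoded POST and answers with JSON. The endpoint URL, the form field names, and the helper names build_request and fetch_json are all hypothetical; substitute whatever the page's own AJAX call actually sends. It uses urllib.request, the Python 3 successor to the urllib2 module mentioned above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields: replace them with the URL and
# parameters that the page's own AJAX call uses.
AJAX_URL = "https://example.com/ajax/search"
FORM_DATA = {"query": "selenium", "page": "1"}


def build_request(url, form_data):
    """Build a POST request that mimics a form-encoded AJAX call."""
    body = urllib.parse.urlencode(form_data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Some sites check this header to recognise AJAX requests;
            # whether it is needed here is an assumption.
            "X-Requested-With": "XMLHttpRequest",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_json(url, form_data, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_request(url, form_data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))


# Usage (performs a real network request, so it is left commented out):
# data = fetch_json(AJAX_URL, FORM_DATA)
# print(data)
```

If the site relies on cookies or a session token set while loading the page, those would have to be sent along as well; this sketch does not handle that case.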