Webscraping using Selenium and Python: Capture JSON body

I'm trying to scrape a webpage that has some AJAX running in the background. Using Python and Selenium, I've gotten as far as loading the webpage, entering data into a form, clicking submit and waiting. At this point I'm trying to catch the JSON-formatted data that's returned; however, this article suggests that getting the JSON body out isn't possible. I've tried digging into the Selenium code myself to get it to return everything, but I haven't had much luck. Has anyone out there encountered a similar problem and have a suggestion on how to solve it? I don't HAVE to use Selenium (or Python, for that matter). Thanks!

I do this by looking at the AJAX call the website makes (in the page source). This is usually a POST (sometimes a GET). Then I request that URL myself with cURL (PHP) or urllib2 (Python), sending the data the call needs. This returns the response body, including the JSON.
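As a rough illustration of that approach, here is a minimal sketch using Python 3's `urllib.request` (the successor to `urllib2`). The endpoint URL, the form field names, and the `X-Requested-With` header are all assumptions made up for the example; the real values have to be read from the AJAX call the page actually makes.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields. Replace them with the URL and
# parameters of the AJAX call observed on the real page.
AJAX_URL = "https://example.com/ajax/search"
FORM_FIELDS = {"query": "selenium", "page": "1"}


def build_ajax_request(url, fields, method="POST"):
    """Build a request that imitates the page's AJAX call."""
    encoded = urllib.parse.urlencode(fields)
    headers = {
        "Accept": "application/json",
        # Assumption: some sites only answer requests flagged as XHR.
        "X-Requested-With": "XMLHttpRequest",
    }
    if method == "GET":
        # GET carries the parameters in the query string.
        return urllib.request.Request(
            url + "?" + encoded, headers=headers, method="GET"
        )
    # POST carries the parameters as a form-encoded body.
    return urllib.request.Request(
        url, data=encoded.encode("utf-8"), headers=headers, method="POST"
    )


def fetch_json(request, timeout=10):
    """Send the request and parse the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(build_ajax_request(AJAX_URL, FORM_FIELDS))` would then return the parsed body, provided the server accepts the request without cookies or tokens set by the original page. If it does not, those would need to be copied over as extra headers.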

In this case you should be able to get the JSON directly. The JSON is located here. You can use Firefox with Firebug to inspect the XHR requests and find it.

 