
Webscraping using Selenium and Python: Capture JSON body

Original post: 2013-03-24 09:06:49. Tags: python, ajax, json, selenium, web-scraping

I'm trying to scrape a webpage that has some AJAX running in the background. Using Python and Selenium, I've gotten as far as loading the webpage, entering data into a form, clicking submit, and waiting. At this point I'm trying to catch the JSON-formatted data that's returned; however, this article suggests that getting the JSON body out isn't possible. I've tried looking into the Selenium code myself to get it to return everything, but I haven't had much luck. Has anyone out there encountered a similar problem and found a way to solve it? I don't HAVE to use Selenium (or Python, for that matter). Thanks!

2 answers

I do this by looking at the AJAX call the website makes (in the page source). This is usually a POST (sometimes a GET). Then I use cURL (PHP) or urllib2 (Python) to request that URL, sending the needed data. This returns the response body, including the JSON.
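As a rough sketch of that approach in Python 3, where urllib.request replaces urllib2: the endpoint URL and the form field below are hypothetical placeholders, not taken from the site in question, and the extra headers are an assumption about what an AJAX endpoint might expect. You would substitute whatever the browser actually sends.

```python
import json
import urllib.parse
import urllib.request


def build_ajax_request(url, form_data):
    """Build a POST request that mimics the page's own AJAX form submission."""
    body = urllib.parse.urlencode(form_data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # Assumption: some endpoints check these headers; many do not.
            "X-Requested-With": "XMLHttpRequest",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_json(request, timeout=10):
    """Send the request and parse the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and form field, for illustration only.
request = build_ajax_request("https://example.com/ajax/search", {"query": "test"})
```

Calling fetch_json(request) would then perform the POST and return the decoded JSON, with no browser or Selenium involved. If the site relies on cookies or a session token set by an earlier page load, those would need to be sent as well.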

In this case you should be able to get the JSON directly. The JSON is located here. You can use Firefox with Firebug to inspect the XHR requests to find it.