Need to scrape the contents of a website that requires an "I Agree" cookie to be set

From everything I've read, it seems that this is impossible. But here is my scenario:

I need to scrape the contents of a table listing houses for sale. The page is not password protected, but you first have to click an "I Agree" link on the previous page, which sets a cookie saying you accept that the content may not be 100% accurate. Only then are you shown the data. Is there any way at all to accomplish this using PHP/jQuery/JavaScript? I know I can't use an iframe because the page is cross-domain, and I don't have access to the other website.

Thanks for any answers, as I'm not really expecting anything positive. :) And many thanks if you can tell me how to do this. :D

Use a server-side script (PHP with cURL) to fetch the page and return the information you need. Make sure your request sends the appropriate HTTP header representing the "I agree" cookie.

Sample:

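<?php

$ch = curl_init();

// The URL and the cookie name/value are placeholders: use the page you need
// and the cookie the site actually sets when you click "I Agree".
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/');
curl_setopt($ch, CURLOPT_COOKIE, 'I_Agree=1');
// Return the response as a string instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$responseBody = curl_exec($ch);

// curl_exec() returns false on failure.
if ($responseBody === false) {
    $error = curl_error($ch);
    curl_close($ch);
    exit('Request failed: ' . $error);
}

curl_close($ch);

// Read the information you need from $responseBody and return it as the response body

?>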

Now you can access the information from your website by calling your server-side script above. For details about how to use cURL, take a look at the documentation.
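For the "read the information you need" step, one possible approach is PHP's DOM extension. The sketch below assumes the listings sit in the first <table> of the page; the helper name extractTableRows and the sample HTML are made up for illustration, since the real page's markup isn't known.

```php
<?php

/**
 * Return the cell text of every row of the first <table> in $html.
 * Hypothetical helper: assumes the data of interest is in the first table.
 */
function extractTableRows(string $html): array
{
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $table = $doc->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep only <th>/<td> element children, skipping whitespace text nodes.
            if ($cell instanceof DOMElement && in_array($cell->tagName, ['th', 'td'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample standing in for $responseBody.
$sample = '<table><tr><th>Address</th><th>Price</th></tr>'
        . '<tr><td>1 Main St</td><td>100000</td></tr></table>';

// Return the rows as JSON so the calling page can consume them.
echo json_encode(extractTableRows($sample));
// prints [["Address","Price"],["1 Main St","100000"]]
```

Returning JSON is just one option; the script could equally echo an HTML fragment for the calling page to insert.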

cURL can store cookies in a file or read them back from one, depending on the options you set. Here is the "cookiejar" example:

http://curl.haxx.se/libcurl/php/examples/cookiejar.html

Check out the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options.
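A minimal sketch of how those two options might be used here: request the "I Agree" link first so cURL saves whatever cookie the site sets, then request the data page with the same cookie file. The helper names (cookieJarOptions, fetchWithCookieJar, fetchAfterAgreeing) and the temp-file location are assumptions for illustration. If the real site sets the cookie with JavaScript rather than a Set-Cookie response header, this would not capture it.

```php
<?php

/**
 * cURL options that both send cookies from and save cookies to $cookieFile.
 * Hypothetical helper for illustration.
 */
function cookieJarOptions(string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,        // the "I Agree" link may redirect
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies stored in this file
        CURLOPT_COOKIEJAR      => $cookieFile, // save received cookies to this file
    ];
}

/**
 * Fetch $url using the shared cookie file; returns the body, or false on failure.
 */
function fetchWithCookieJar(string $url, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, cookieJarOptions($cookieFile));
    $body = curl_exec($ch);
    // The cookie file is written once the handle is closed or freed.
    curl_close($ch);
    return $body;
}

/**
 * Visit the "I Agree" link, then fetch the data page with the saved cookies.
 */
function fetchAfterAgreeing(string $agreeUrl, string $dataUrl)
{
    $cookieFile = tempnam(sys_get_temp_dir(), 'cookies');
    try {
        fetchWithCookieJar($agreeUrl, $cookieFile);
        return fetchWithCookieJar($dataUrl, $cookieFile);
    } finally {
        unlink($cookieFile); // don't leave the temporary cookie file behind
    }
}
```

Calling fetchAfterAgreeing('http://www.example.com/agree', 'http://www.example.com/listings') with the real URLs in place of these placeholders would return the data page's HTML, or false if the second request failed.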
