Curl scraping of Google doesn't work in production
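I'm working on a small project that uses cURL and PHP to scrape Google Scholar results. It works fine in my development environment, but when I try it in production something breaks and I get no results...
Here is my code: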
// SCRAPING GOOGLE SCHOLAR
if (isset($_POST['google'])){
$googleURL = 'http://scholar.google.com/scholar?hl=fr&q=' . $url_subject;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $googleURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, $random->random_user_agent());
$result = curl_exec ($ch);
curl_close($ch);
$html = $this->container->get('simple_html_dom');
$html->load($result);
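Thanks for your help.
Google Scholar does not want its content scraped; doing so violates their terms of service. Command-line curl helps diagnose this kind of problem: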
$ curl -vv https://scholar.google.com/scholar?hl=en&q=neurotransmitters
> GET /scholar?hl=en HTTP/1.1
> User-Agent: curl/7.35.0
> Host: scholar.google.com
> Accept: */*
>
< HTTP/1.1 403 Forbidden
...
<html>...<title>Sorry...</title></head><body>
<h1>We're sorry...</h1>
<p>... but your computer or network may be sending automated queries.
To protect our users, we can't process your request right now.</p>
<div style="margin-left: 4em;">See
<a href="https://support.google.com/websearch/answer/86640">Google Help</a>
for more information.</div>
</body></html>
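The same check can be made from PHP, so that a refusal like the 403 above shows up as a message instead of an empty result. What follows is only a minimal sketch: `diagnose_response` and `fetch_with_diagnosis` are hypothetical helper names, not part of the original code, and treating 429 as a block is an assumption based on its standard HTTP meaning (the transcript above only shows 403).

```php
<?php
// Hypothetical helper (not from the original post): turns the raw outcome of
// a cURL request into a short diagnostic message.
// $body is whatever curl_exec() returned: a string, or false on failure.
function diagnose_response($body, int $httpCode, string $curlError): string
{
    if ($body === false) {
        // Transport-level failure: DNS, TLS, timeout, and so on.
        return 'cURL error: ' . $curlError;
    }
    if ($httpCode === 403 || $httpCode === 429) {
        // 403 is what the transcript above shows; 429 (Too Many Requests)
        // is included as an assumption about rate limiting.
        return "Blocked by the server (HTTP $httpCode)";
    }
    if ($httpCode < 200 || $httpCode >= 300) {
        return "Unexpected HTTP status $httpCode";
    }
    return 'OK';
}

// Hypothetical wrapper: fetches a URL and returns [body, diagnosis].
function fetch_with_diagnosis(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    // Read the status code and error text before closing the handle.
    $httpCode = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $curlError = curl_error($ch);
    curl_close($ch);
    return [$body, diagnose_response($body, $httpCode, $curlError)];
}
```

Logging the diagnosis in production before handing the body to the HTML parser should make it clear whether the request failed outright or was answered with a block page like the one above.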