Python: the correct URL for downloading images from Google Image Search

Question

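I am trying to fetch images from Google Image search for a specific query. But the page I download contains no images, and it redirects me to Google's original page. Here is my code: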

AGENT_ID   = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

GOOGLE_URL = "https://www.google.com/images?source=hp&q={0}"

_myGooglePage = ""

def scrape(self, theQuery) :
    self._myGooglePage = subprocess.check_output(["curl", "-L", "-A", self.AGENT_ID, self.GOOGLE_URL.format(urllib.quote(theQuery))], stderr=subprocess.STDOUT)
    print self.GOOGLE_URL.format(urllib.quote(theQuery))
    print self._myGooglePage
    f = open('./../../googleimages.html', 'w')
    f.write(self._myGooglePage)

What am I doing wrong?

Thanks

Answer 1

Here is the Python code I use to search for and download images from Google. I hope it helps:

import os
import sys
import time
from urllib import FancyURLopener
import urllib2
import simplejson

# Define search term
searchTerm = "hello world"

# Replace spaces ' ' in search term for '%20' in order to comply with request
searchTerm = searchTerm.replace(' ','%20')


# Start FancyURLopener with defined version 
class MyOpener(FancyURLopener): 
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
myopener = MyOpener()

# Set count to 0
count= 0

for i in range(0,10):
    # Notice that the start changes for each iteration in order to request a new set of images for each loop
    url = ('https://ajax.googleapis.com/ajax/services/search/images?' + 'v=1.0&q='+searchTerm+'&start='+str(i*4)+'&userip=MyIP')
    print url
    request = urllib2.Request(url, None, {'Referer': 'testing'})
    response = urllib2.urlopen(request)

    # Get results using JSON
    results = simplejson.load(response)
    data = results['responseData']
    dataInfo = data['results']

    # Iterate for each result and get unescaped url
    for myUrl in dataInfo:
        count = count + 1
        print myUrl['unescapedUrl']

        myopener.retrieve(myUrl['unescapedUrl'],str(count)+'.jpg')

    # Sleep for one second to prevent IP blocking from Google
    time.sleep(1)

You can also find very useful information here.
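For readers on Python 3, the request URLs that the loop above assembles by string concatenation can be sketched with `urllib.parse.urlencode`. The helper name `build_page_urls` and its parameters are illustrative and not part of the answer. The endpoint and the `v`, `q`, and `start` parameters are the ones used above, with `userip` left out. The sketch only builds the URLs and sends no request.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the answer above.
BASE_URL = "https://ajax.googleapis.com/ajax/services/search/images"

def build_page_urls(search_term, pages=10, page_size=4):
    """Return one request URL per page of results (hypothetical helper)."""
    urls = []
    for page in range(pages):
        # 'start' advances by page_size so each URL asks for a new set of results.
        params = {"v": "1.0", "q": search_term, "start": page * page_size}
        # quote_via=quote encodes spaces as %20, matching the answer's URLs.
        urls.append(BASE_URL + "?" + urlencode(params, quote_via=quote))
    return urls

for url in build_page_urls("hello world", pages=3):
    print(url)
```

Letting `urlencode` do the escaping also covers characters other than spaces, which the manual `replace(' ', '%20')` step does not.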

Answer 2

Here is a short script I wrote that does the whole thing.

Answer 3

I'll give you a hint... start here:

https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=JULIE%20NEWMAR

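JULIE and NEWMAR are your search terms.

This returns the JSON data you need... you need to parse it with json.load or simplejson.load to get a dict... then dig into it to find responseData, then the results list, which contains the individual items whose URLs you want to download.

That said, I don't recommend automated scraping of Google in any way, since their (deprecated) API for this specifically says not to.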
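The parsing step described above can be tried without touching the network. In this minimal sketch the sample payload is invented for illustration. It only mirrors the responseData / results nesting named above, plus the unescapedUrl field that the other answers on this page read. The function name `extract_image_urls` is likewise hypothetical.

```python
import json

# Made-up sample shaped like the response described above (not real API output).
SAMPLE_RESPONSE = """
{
  "responseData": {
    "results": [
      {"unescapedUrl": "http://example.com/a.jpg"},
      {"unescapedUrl": "http://example.com/b.png"}
    ]
  }
}
"""

def extract_image_urls(raw_json):
    """Parse the JSON text and return the image URLs from the results list."""
    payload = json.loads(raw_json)
    # Treat a missing or null responseData as "no results".
    data = payload.get("responseData") or {}
    return [item["unescapedUrl"] for item in data.get("results", [])]

print(extract_image_urls(SAMPLE_RESPONSE))
```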

Answer 4

I'm just going to answer this even though it is old. There is a much simpler way to go about doing this.

import json
import urllib.parse
import urllib.request

def google_image(x):
    # Percent-encode the search terms (spaces become %20)
    search = urllib.parse.quote(x)
    url = 'http://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=%s&safe=off' % search
    search_results = urllib.request.urlopen(url)
    js = json.loads(search_results.read().decode())
    results = js['responseData']['results']
    # Return the unescaped image URL of every result, not just the last one
    return [i['unescapedUrl'] for i in results]

That's it.
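The search phrase has to be percent-encoded before it goes into the q parameter. Replacing spaces with %20 is enough for plain words, but characters such as & or # would otherwise be read as URL syntax. A minimal sketch of that step on its own, using the standard library's `urllib.parse.quote`; `encode_query` is a name chosen here for illustration:

```python
from urllib.parse import quote

def encode_query(text):
    """Percent-encode a search phrase for use in the q parameter."""
    # safe="" also encodes "/" so nothing in the phrase can alter the URL structure.
    return quote(text, safe="")

print(encode_query("JULIE NEWMAR"))   # JULIE%20NEWMAR
print(encode_query("cats & dogs"))    # cats%20%26%20dogs
```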

Answer 5

One of the best ways is to use icrawler. Check the answer below. It worked for me.

https://stackoverflow.com/a/51204611/4198099
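A minimal sketch of what the icrawler approach can look like. It assumes the third-party icrawler package is installed (`pip install icrawler`) and that its built-in `GoogleImageCrawler` still works against Google's current pages, which may change over time. The wrapper `download_images` and its default values are illustrative only.

```python
def download_images(keyword, max_num=10, out_dir="images"):
    """Download up to max_num images for keyword into out_dir (illustrative wrapper)."""
    # Imported here so this module still loads where icrawler is not installed.
    from icrawler.builtin import GoogleImageCrawler

    # root_dir is the folder the crawler writes the downloaded files to.
    crawler = GoogleImageCrawler(storage={"root_dir": out_dir})
    crawler.crawl(keyword=keyword, max_num=max_num)

# Example call (performs network requests):
# download_images("hello world", max_num=5)
```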

5 answers

Answer 1: score 6, posted 2012-11-24 07:33:12
Answer 2: score 3, posted 2012-05-27 23:29:36
Answer 3: score 3, accepted, posted 2012-02-17 00:06:24
Answer 4: score 0, posted 2013-09-11 19:26:54
Answer 5: score 0, posted 2018-07-06 07:10:43
