
Python - Download Images from Google Image Search?

I want to download all the images from a Google image search using Python. The code I am using sometimes seems to have a problem. My code is:

import os
import sys
import time
from urllib import FancyURLopener
import urllib2
import simplejson

# Define search term
searchTerm = "parrot"

# Replace spaces ' ' in search term for '%20' in order to comply with request
searchTerm = searchTerm.replace(' ','%20')


# Start FancyURLopener with defined version 
class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
myopener = MyOpener()

# Set count to 0
count = 0

for i in range(0,10):
    # Notice that the start changes for each iteration in order to request a new set of images for each loop
    url = ('https://ajax.googleapis.com/ajax/services/search/images?' + 'v=1.0&q='+searchTerm+'&start='+str(i*10)+'&userip=MyIP')
    print url
    request = urllib2.Request(url, None, {'Referer': 'testing'})
    response = urllib2.urlopen(request)

    # Get results using JSON
    results = simplejson.load(response)
    data = results['responseData']
    dataInfo = data['results']

    # Iterate for each result and get unescaped url
    for myUrl in dataInfo:
        count = count + 1
        my_url = myUrl['unescapedUrl']
        myopener.retrieve(myUrl['unescapedUrl'], str(count)+'.jpg')

After downloading a few pages, I get an error like the following:

Traceback (most recent call last):

  File "C:\Python27\img_google3.py", line 37, in <module>
    dataInfo = data['results']
TypeError: 'NoneType' object has no attribute '__getitem__'

What should I do?
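For context on the traceback: the deprecated AJAX API returns `"responseData": null` when it throttles or rejects a request, which is exactly what makes `data['results']` blow up with a `TypeError`. A minimal defensive check (a sketch, not part of the original code) would be:

```python
import json

def extract_image_results(raw_json):
    """Return the result list, or raise with the API's status code
    when responseData is null (the throttled/deprecated-API case)."""
    parsed = json.loads(raw_json)
    data = parsed.get('responseData')
    if data is None:
        raise RuntimeError('no responseData (status %s)'
                           % parsed.get('responseStatus'))
    return data.get('results', [])
```

Sleeping and retrying, or stopping the loop, on that error is usually enough to get past intermittent throttling.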

I have modified my code. The code can now download 100 images for a given query, and the images are full high-resolution, i.e. the original images are downloaded.

I am downloading the images using urllib2 and Beautiful Soup.

from bs4 import BeautifulSoup
import requests
import re
import urllib2
import os
import cookielib
import json

def get_soup(url,header):
    return BeautifulSoup(urllib2.urlopen(urllib2.Request(url,headers=header)),'html.parser')


query = raw_input("query image")# you can change the query for the image  here
image_type="ActiOn"
query= query.split()
query='+'.join(query)
url="https://www.google.co.in/search?q="+query+"&source=lnms&tbm=isch"
print url
#add the directory for your image here
DIR="Pictures"
header={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
}
soup = get_soup(url,header)


ActualImages=[]# contains the link for Large original images, type of  image
for a in soup.find_all("div",{"class":"rg_meta"}):
    link , Type =json.loads(a.text)["ou"]  ,json.loads(a.text)["ity"]
    ActualImages.append((link,Type))

print "there are a total of", len(ActualImages), "images"

if not os.path.exists(DIR):
    os.mkdir(DIR)
DIR = os.path.join(DIR, query.split()[0])

if not os.path.exists(DIR):
    os.mkdir(DIR)
###print images
for i , (img , Type) in enumerate( ActualImages):
    try:
        # header is already a dict of headers; don't nest it under 'User-Agent'
        req = urllib2.Request(img, headers=header)
        raw_img = urllib2.urlopen(req).read()

        cntr = len([i for i in os.listdir(DIR) if image_type in i]) + 1
        print cntr
        if len(Type)==0:
            f = open(os.path.join(DIR , image_type + "_"+ str(cntr)+".jpg"), 'wb')
        else :
            f = open(os.path.join(DIR , image_type + "_"+ str(cntr)+"."+Type), 'wb')


        f.write(raw_img)
        f.close()
    except Exception as e:
        print "could not load : "+img
        print e

I hope this helps you.

The Google Image Search API is deprecated; to achieve what you want you need to use Google Custom Search. To fetch the images you need to do something like this:

import urllib2
import simplejson
import cStringIO

fetcher = urllib2.build_opener()
searchTerm = 'parrot'
startIndex = 0
searchUrl = "http://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=" + searchTerm + "&start=" + str(startIndex)
f = fetcher.open(searchUrl)
deserialized_output = simplejson.load(f)

This gives you 4 results as JSON; you fetch more results iteratively by incrementing startIndex in the API request.
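The paging just described can be sketched as a small helper (the 4-results-per-call page size and the URL shape mirror the snippet above; the helper name is mine):

```python
def page_urls(search_term, pages, page_size=4):
    # the deprecated API returned `page_size` results per call
    # and paged through them with the `start` offset parameter
    base = ('http://ajax.googleapis.com/ajax/services/search/images'
            '?v=1.0&q={}&start={}')
    return [base.format(search_term, i * page_size) for i in range(pages)]
```

Each URL in the returned list is then fetched and parsed exactly like the single request above.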

To read the image data you need to use a library like cStringIO.

For example, to access the first image you would do the following:

import urllib
from PIL import Image

imageUrl = deserialized_output['responseData']['results'][0]['unescapedUrl']
file = cStringIO.StringIO(urllib.urlopen(imageUrl).read())
img = Image.open(file)

Here is my latest Google image snarfer, written in Python using Selenium and headless Chrome.

It requires python-selenium, chromium-driver, and the retry module from pip.

Link: http://sam.aiki.info/b/google-images.py

Example usage:

google-images.py tiger 10 --opts isz:lt,islt:svga,itp:photo > urls.txt
parallel=5
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36"
(i=0; while read url; do wget -e robots=off -T10 --tries 10 -U"$user_agent" "$url" -O`printf %04d $i`.jpg & i=$(($i+1)) ; [ $(($i % $parallel)) = 0 ] && wait; done < urls.txt; wait)

Help text:

$ google-images.py --help
usage: google-images.py [-h] [--safe SAFE] [--opts OPTS] query n

Fetch image URLs from Google Image Search.

positional arguments:
  query        image search query
  n            number of images (approx)

optional arguments:
  -h, --help   show this help message and exit
  --safe SAFE  safe search [off|active|images]
  --opts OPTS  search options, e.g.
               isz:lt,islt:svga,itp:photo,ic:color,ift:jpg

Code:

#!/usr/bin/env python3

# requires: selenium, chromium-driver, retry

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import selenium.common.exceptions as sel_ex
import sys
import time
import urllib.parse
from retry import retry
import argparse
import logging

logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger()
retry_logger = None

css_thumbnail = "img.Q4LuWd"
css_large = "img.n3VNCb"
css_load_more = ".mye4qd"
selenium_exceptions = (sel_ex.ElementClickInterceptedException, sel_ex.ElementNotInteractableException, sel_ex.StaleElementReferenceException)

def scroll_to_end(wd):
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")

@retry(exceptions=KeyError, tries=6, delay=0.1, backoff=2, logger=retry_logger)
def get_thumbnails(wd, want_more_than=0):
    wd.execute_script("document.querySelector('{}').click();".format(css_load_more))
    thumbnails = wd.find_elements_by_css_selector(css_thumbnail)
    n_results = len(thumbnails)
    if n_results <= want_more_than:
        raise KeyError("no new thumbnails")
    return thumbnails

@retry(exceptions=KeyError, tries=6, delay=0.1, backoff=2, logger=retry_logger)
def get_image_src(wd):
    actual_images = wd.find_elements_by_css_selector(css_large)
    sources = []
    for img in actual_images:
        src = img.get_attribute("src")
        if src.startswith("http") and not src.startswith("https://encrypted-tbn0.gstatic.com/"):
            sources.append(src)
    if not len(sources):
        raise KeyError("no large image")
    return sources

@retry(exceptions=selenium_exceptions, tries=6, delay=0.1, backoff=2, logger=retry_logger)
def retry_click(el):
    el.click()

def get_images(wd, start=0, n=20, out=None):
    thumbnails = []
    count = len(thumbnails)
    while count < n:
        scroll_to_end(wd)
        try:
            thumbnails = get_thumbnails(wd, want_more_than=count)
        except KeyError as e:
            logger.warning("cannot load enough thumbnails")
            break
        count = len(thumbnails)
    sources = []
    for tn in thumbnails:
        try:
            retry_click(tn)
        except selenium_exceptions as e:
            logger.warning("main image click failed")
            continue
        sources1 = []
        try:
            sources1 = get_image_src(wd)
        except KeyError as e:
            pass
            # logger.warning("main image not found")
        if not sources1:
            tn_src = tn.get_attribute("src")
            if not tn_src.startswith("data"):
                logger.warning("no src found for main image, using thumbnail")          
                sources1 = [tn_src]
            else:
                logger.warning("no src found for main image, thumbnail is a data URL")
        for src in sources1:
            if not src in sources:
                sources.append(src)
                if out:
                    print(src, file=out)
                    out.flush()
        if len(sources) >= n:
            break
    return sources

def google_image_search(wd, query, safe="off", n=20, opts='', out=None):
    search_url_t = "https://www.google.com/search?safe={safe}&site=&tbm=isch&source=hp&q={q}&oq={q}&gs_l=img&tbs={opts}"
    search_url = search_url_t.format(q=urllib.parse.quote(query), opts=urllib.parse.quote(opts), safe=safe)
    wd.get(search_url)
    sources = get_images(wd, n=n, out=out)
    return sources

def main():
    parser = argparse.ArgumentParser(description='Fetch image URLs from Google Image Search.')
    parser.add_argument('--safe', type=str, default="off", help='safe search [off|active|images]')
    parser.add_argument('--opts', type=str, default="", help='search options, e.g. isz:lt,islt:svga,itp:photo,ic:color,ift:jpg')
    parser.add_argument('query', type=str, help='image search query')
    parser.add_argument('n', type=int, default=20, help='number of images (approx)')
    args = parser.parse_args()

    opts = Options()
    opts.add_argument("--headless")
    # opts.add_argument("--blink-settings=imagesEnabled=false")
    with webdriver.Chrome(options=opts) as wd:
        sources = google_image_search(wd, args.query, safe=args.safe, n=args.n, opts=args.opts, out=sys.stdout)

main()

Google deprecated their API, and scraping Google is complicated, so I would suggest using the Bing API instead:

https://datamarket.azure.com/dataset/5BA839F1-12CE-4CCE-BF57-A49D98D29A44

Google is not that good, and Microsoft is not that evil.

I haven't looked at your code yet, but here is an example solution made with Selenium that tries to get 400 pictures from the search term.

# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import json
import os
import urllib2

searchterm = 'vannmelon' # will also be the name of the folder
url = "https://www.google.co.in/search?q="+searchterm+"&source=lnms&tbm=isch"
browser = webdriver.Firefox()
browser.get(url)
header={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"}
counter = 0
succounter = 0

if not os.path.exists(searchterm):
    os.mkdir(searchterm)

for _ in range(500):
    browser.execute_script("window.scrollBy(0,10000)")

for x in browser.find_elements_by_xpath("//div[@class='rg_meta']"):
    counter = counter + 1
    print "Total Count:", counter
    print "Succsessful Count:", succounter
    print "URL:",json.loads(x.get_attribute('innerHTML'))["ou"]

    img = json.loads(x.get_attribute('innerHTML'))["ou"]
    imgtype = json.loads(x.get_attribute('innerHTML'))["ity"]
    try:
        # header is already a dict of headers; don't nest it under 'User-Agent'
        req = urllib2.Request(img, headers=header)
        raw_img = urllib2.urlopen(req).read()
        File = open(os.path.join(searchterm , searchterm + "_" + str(counter) + "." + imgtype), "wb")
        File.write(raw_img)
        File.close()
        succounter = succounter + 1
    except Exception as e:
        print "can't get img:", e

print succounter, "pictures succesfully downloaded"
browser.close()

Adding to Piees's answer: to download any number of images from the search results, we need to simulate a click on the 'Show more results' button after the first 400 results have loaded.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import os
import json
import urllib2
import sys
import time

# adding path to geckodriver to the OS environment variable
# assuming that it is stored at the same path as this script
os.environ["PATH"] += os.pathsep + os.getcwd()
download_path = "dataset/"

def main():
    searchtext = sys.argv[1] # the search query
    num_requested = int(sys.argv[2]) # number of images to download
    number_of_scrolls = num_requested / 400 + 1 
    # number_of_scrolls * 400 images will be opened in the browser

    if not os.path.exists(download_path + searchtext.replace(" ", "_")):
        os.makedirs(download_path + searchtext.replace(" ", "_"))

    url = "https://www.google.co.in/search?q="+searchtext+"&source=lnms&tbm=isch"
    driver = webdriver.Firefox()
    driver.get(url)

    headers = {}
    headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    extensions = {"jpg", "jpeg", "png", "gif"}
    img_count = 0
    downloaded_img_count = 0

    for _ in xrange(number_of_scrolls):
        for __ in xrange(10):
            # multiple scrolls needed to show all 400 images
            driver.execute_script("window.scrollBy(0, 1000000)")
            time.sleep(0.2)
        # to load next 400 images
        time.sleep(0.5)
        try:
            driver.find_element_by_xpath("//input[@value='Show more results']").click()
        except Exception as e:
            print "Less images found:", e
            break

    # imges = driver.find_elements_by_xpath('//div[@class="rg_meta"]') # not working anymore
    imges = driver.find_elements_by_xpath('//div[contains(@class,"rg_meta")]')
    print "Total images:", len(imges), "\n"
    for img in imges:
        img_count += 1
        img_url = json.loads(img.get_attribute('innerHTML'))["ou"]
        img_type = json.loads(img.get_attribute('innerHTML'))["ity"]
        print "Downloading image", img_count, ": ", img_url
        try:
            if img_type not in extensions:
                img_type = "jpg"
            req = urllib2.Request(img_url, headers=headers)
            raw_img = urllib2.urlopen(req).read()
            f = open(download_path+searchtext.replace(" ", "_")+"/"+str(downloaded_img_count)+"."+img_type, "wb")
            f.write(raw_img)
            f.close()
            downloaded_img_count += 1
        except Exception as e:
            print "Download failed:", e
        finally:
            print
        if downloaded_img_count >= num_requested:
            break

    print "Total downloaded: ", downloaded_img_count, "/", img_count
    driver.quit()

if __name__ == "__main__":
    main()

The full code is here.

You can also use Selenium with Python. Here is how:

from selenium import webdriver
import urllib
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome('C:/Python27/Scripts/chromedriver.exe')
word="apple"
url="http://images.google.com/search?q="+word+"&tbm=isch&sout=1"
driver.get(url)
imageXpathSelector='//*[@id="ires"]/table/tbody/tr[1]/td[1]/a/img'
img=driver.find_element_by_xpath(imageXpathSelector)
src=(img.get_attribute('src'))
urllib.urlretrieve(src, word+".jpg")
driver.close()

(This code works on Python 2.7.) Note that you should install the Selenium package with 'pip install selenium', and you should download chromedriver.exe from here.

Unlike other web-scraping techniques, Selenium opens a browser and downloads the items, because Selenium's purpose is testing rather than scraping.

This one is here because the other code snippets are outdated and no longer work for me. It downloads 100 images for each keyword, inspired by one of the solutions above.

from bs4 import BeautifulSoup
import urllib2
import os


class GoogleeImageDownloader(object):
    _URL = "https://www.google.co.in/search?q={}&source=lnms&tbm=isch"
    _BASE_DIR = 'GoogleImages'
    _HEADERS = {
        'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
    }

    def __init__(self):
        query = raw_input("Enter keyword to search images\n")
        self.dir_name = os.path.join(self._BASE_DIR, query.split()[0])
        self.url = self._URL.format(urllib2.quote(query)) 
        self.make_dir_for_downloads()
        self.initiate_downloads()

    def make_dir_for_downloads(self):
        print "Creating necessary directories"
        if not os.path.exists(self._BASE_DIR):
            os.mkdir(self._BASE_DIR)

        if not os.path.exists(self.dir_name):
            os.mkdir(self.dir_name)

    def initiate_downloads(self):
        src_list = []
        soup = BeautifulSoup(urllib2.urlopen(urllib2.Request(self.url,headers=self._HEADERS)),'html.parser')
        for img in soup.find_all('img'):
            if img.has_attr("data-src"):
                src_list.append(img['data-src'])
        print "{} of images collected for downloads".format(len(src_list))
        self.save_images(src_list)

    def save_images(self, src_list):
        print "Saving Images..."
        for i , src in enumerate(src_list):
            try:
                req = urllib2.Request(src, headers=self._HEADERS)
                raw_img = urllib2.urlopen(req).read()
                with open(os.path.join(self.dir_name , str(i)+".jpg"), 'wb') as f:
                    f.write(raw_img)
            except Exception as e:
                print ("could not save image")
                raise e


if __name__ == "__main__":
    GoogleeImageDownloader()

I know this question is old, but I ran across it recently and none of the previous answers work anymore. So I wrote this script to gather images from Google. As of now, it can download as many images as are available.

There is also a GitHub link: https://github.com/CumminUp07/imengine/blob/master/get_google_images.py

DISCLAIMER: due to copyright issues, the images gathered should only be used for research and educational purposes.

from bs4 import BeautifulSoup as Soup
import urllib2
import json
import urllib

#programtically go through google image ajax json return and save links to list#
#num_images is more of a suggestion                                            #  
#it will get the ceiling of the nearest 100 if available                       #
def get_links(query_string, num_images):
    #initialize place for links
    links = []
    #step by 100 because each return gives up to 100 links
    for i in range(0,num_images,100):
        # implicit string concatenation, so the indentation does not end up inside the URL
        url = ('https://www.google.com/search?ei=1m7NWePfFYaGmQG51q7IBg&hl=en&q='+query_string+
               '&tbm=isch&ved=0ahUKEwjjovnD7sjWAhUGQyYKHTmrC2kQuT0I7gEoAQ&start='+str(i)+
               '&yv=2&vet=10ahUKEwjjovnD7sjWAhUGQyYKHTmrC2kQuT0I7gEoAQ.1m7NWePfFYaGmQG51q7IBg.i&ijn=1&asearch=ichunk&async=_id:rg_s,_pms:s')

        #set user agent to avoid 403 error
        request = urllib2.Request(url, None, {'User-Agent': 'Mozilla/5.0'}) 

        #returns json formatted string of the html
        json_string = urllib2.urlopen(request).read() 

        #parse as json
        page = json.loads(json_string) 

        #html found here
        html = page[1][1] 

        #use BeautifulSoup to parse as html
        new_soup = Soup(html,'lxml')

        #all img tags, only returns results of search
        imgs = new_soup.find_all('img')

        #loop through images and put src in links list
        for j in range(len(imgs)):
            links.append(imgs[j]["src"])

    return links

#download images                              #
#takes list of links, directory to save to    # 
#and prefix for file names                    #
#saves images in directory as a one up number #
#with prefix added                            #
#all images will be .jpg                      #
def get_images(links,directory,pre):
    for i in range(len(links)):
        urllib.urlretrieve(links[i], "./"+directory+"/"+str(pre)+str(i)+".jpg")

#main function to search images                 #
#takes two lists, base term and secondary terms #
#also takes number of images to download per    #
#combination                                    #
#it runs every combination of search terms      #
#with base term first then secondary            #
def search_images(base,terms,num_images):
    for y in range(len(base)):
        for x in range(len(terms)):
            all_links = get_links(base[y]+'+'+terms[x],num_images)
            get_images(all_links,"images",x)

if __name__ == '__main__':
    terms = ["cars","numbers","scenery","people","dogs","cats","animals"]
    base = ["animated"]
    search_images(base,terms,1000)

Instead of Google image search, try other image searches like ecosia or bing.

Here is sample code for retrieving images from the ecosia search engine.

from bs4 import BeautifulSoup
import requests
import urllib.request

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
headers = {'User-Agent':user_agent} 
urls = ["https://www.ecosia.org/images?q=india%20pan%20card%20example"]
#The url's from which the image is to be extracted.
index = 0

for url in urls:
    request = urllib.request.Request(url,None,headers) #The assembled request
    response = urllib.request.urlopen(request)
    data = response.read() # Read the html result page

    soup = BeautifulSoup(data, 'html.parser')
    
    for link in soup.find_all('img'):   
        #The images are enclosed in 'img' tag and the 'src' contains the url of the image.
        img_url = link.get('src')
        dest = str(index) + ".jpg"  #Destination to store the image.
        try:
            urllib.request.urlretrieve(img_url, dest)  # save under the numbered name
            index += 1
        except:
            continue

This code also works for Google image search, but it fails to retrieve the images there, because Google stores its images in an encoded form that is difficult to retrieve from the image URL.
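One likely reason those retrievals fail: many of Google's thumbnail `src` attributes are inline base64 `data:` URIs rather than fetchable URLs, so filtering before calling `urlretrieve` helps (a sketch of mine, not part of the answer above):

```python
def is_fetchable(src):
    # data: URIs embed the thumbnail bytes inline; only http(s)
    # URLs can be handed to urlretrieve
    return bool(src) and src.startswith(('http://', 'https://'))
```

Skipping non-fetchable sources avoids the silent failures the `except: continue` above would otherwise swallow.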

This solution works as of February 1, 2021.

  • You can achieve this using selenium, as others mentioned above.
  • Alternatively, you can try the Google Images API from SerpApi. Check out the playground.

Code and example; the function that downloads the images is taken from this answer:

import os, time, shutil, httpx, asyncio
from urllib.parse import urlparse
from serpapi import GoogleSearch

# https://stackoverflow.com/a/39217788/1291371
async def download_file(url):
    print(f'Downloading {url}')

    # https://stackoverflow.com/a/18727481/1291371
    parsed_url = urlparse(url)
    local_filename = os.path.basename(parsed_url.path)

    os.makedirs('images', exist_ok=True)

    async with httpx.AsyncClient() as client:
        async with client.stream('GET', url) as response:
            # open() is not an async context manager, and streamed httpx
            # responses expose aiter_bytes() rather than a .raw file object
            with open(f'images/{local_filename}', 'wb') as f:
                async for chunk in response.aiter_bytes():
                    f.write(chunk)

    return local_filename

async def main():
    start = time.perf_counter()

    params = {
        "engine": "google",
        "ijn": "0",
        "q": "lasagna",
        "tbm": "isch",
        "api_key": os.getenv("API_KEY"),
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    download_files_tasks = [
        download_file(image['original']) for image in results['images_results']
    ]

    await asyncio.gather(*download_files_tasks, return_exceptions=True)

    print(
        f"Downloaded {len(download_files_tasks)} images in {time.perf_counter() - start:0.4f} seconds")

asyncio.run(main())

Disclaimer: I work for SerpApi.

OK, so rather than coding it for you, I will tell you what you are doing wrong, and it may lead you in the right direction. Most modern websites render their HTML dynamically via JavaScript, so if you simply send a GET request (with urllib/cURL/fetch/axios) you will not get the same content you see in the browser at the same URL. What you need is something that renders the JavaScript to produce the same HTML/webpage you see in your browser; you can use something like Selenium with the gecko driver for Firefox to do this, and there are Python modules that let you do it.

I hope this helps; if you still feel lost, here is a simple script I wrote a while ago to extract something similar from Google Photos.

from selenium import webdriver
import re
url="https://photos.app.goo.gl/xxxxxxx"
driver = webdriver.Firefox()
driver.get(url)
regPrms="^background-image\:url\(.*\)$"
regPrms="^The.*Spain$"
html = driver.page_source

urls=re.findall("(?P<url>https?://[^\s\"$]+)", html)

fin=[]
for url in urls:
        if "video-downloads" in url:
            fin.append(url)
print("The Following ZIP contains all your pictures")
for url in fin:
        print("-------------------")
        print(url)

This worked for me on Windows 10, Python 3.9.7:

pip install bing-image-downloader

The code below downloads 10 images of India from the Bing search engine to the desired output folder:

from bing_image_downloader import downloader
downloader.download('India', limit=10,  output_dir='dataset', adult_filter_off=True, force_replace=False, timeout=60, verbose=True)

Documentation: https://pypi.org/project/bing-image-downloader/

I am using:

https://github.com/hick/icrawler

This package is a mini framework of web crawlers. With a modular design, it is easy to use and extend. It supports media data such as images and videos very well, and can also be applied to text and other types of files. Scrapy is heavy and powerful, while icrawler is tiny and flexible.

from argparse import ArgumentParser

def main():
    parser = ArgumentParser(description='Test built-in crawlers')
    parser.add_argument(
        '--crawler',
        nargs='+',
        default=['google', 'bing', 'baidu', 'flickr', 'greedy', 'urllist'],
        help='which crawlers to test')
    args = parser.parse_args()
    for crawler in args.crawler:
        eval('test_{}()'.format(crawler))
        print('\n')
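As a side note, the `eval('test_{}()')` dispatch in that snippet can be replaced by a plain dictionary lookup, which avoids evaluating arbitrary strings from the command line; a sketch with hypothetical test callables:

```python
def run_crawler_tests(names, registry):
    # look each crawler name up in an explicit registry instead of
    # building and eval()-ing a function-call string
    results = []
    for name in names:
        results.append(registry[name]())
    return results
```

An unknown name then fails with a clear `KeyError` instead of executing whatever expression was passed in.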


Note: the posts on this site follow the CC BY-SA 4.0 license; if you reprint them, please credit the original address.

 