
How to save images to a folder from web scraping? (Python)

How do I make it so that each image I get from web scraping is then stored in a folder? I currently use Google Colab, since I am just practicing. I want to store the images in my Google Drive folder.
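Since the target is a Google Drive folder from Colab: Colab can mount Drive so that anything written under `/content/drive` persists. A minimal sketch, assuming the Colab environment (the `pics` folder name and the `MyDrive` path are illustrative; `drive.mount` only works inside Colab, so it is shown commented out, and the sketch falls back to a local folder elsewhere):

```python
import os

# Inside Colab you would first mount Google Drive:
# from google.colab import drive
# drive.mount('/content/drive')

# Target folder on Drive when mounted; otherwise fall back to a local "pics" folder.
drive_root = "/content/drive/MyDrive"
save_dir = os.path.join(drive_root, "pics") if os.path.isdir(drive_root) else "pics"

# Create the folder if it does not exist yet, so open(..., "wb") won't fail later.
os.makedirs(save_dir, exist_ok=True)
print(save_dir)
```

Any file path built on top of `save_dir` (e.g. with `os.path.join(save_dir, "picture0.jpg")`) will then land in the mounted Drive folder when run in Colab.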

This is my code for web scraping:

import requests 
from bs4 import BeautifulSoup 

def getdata(url):
  r = requests.get(url)
  return r.text

htmldata = getdata('https://www.yahoo.com/')
soup = BeautifulSoup(htmldata, 'html.parser')

imgdata = []
for i in soup.find_all('img'):
  imgdata = i['src']
  print(imgdata)

I created a pics folder manually in the folder where the script is running, to store the pictures in it. Then I changed your code in the for loop so it appends the URLs to the imgdata list. The try/except block is there because not every URL in the list is valid.

import requests 
from bs4 import BeautifulSoup 

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata('https://www.yahoo.com/')
soup = BeautifulSoup(htmldata, 'html.parser')

imgdata = []
for i in soup.find_all('img'):
    imgdata.append(i['src']) # append to the list instead of overwriting it

filename = "pics/picture{}.jpg"
for i in range(len(imgdata)):
    print(f"img {i+1} / {len(imgdata)}")
    # try block because not everything in the imgdata list is a valid url
    try:
        r = requests.get(imgdata[i], stream=True)
        with open(filename.format(i), "wb") as f:
            f.write(r.content)
    except requests.exceptions.RequestException:
        print(f"{imgdata[i]} is not a valid url")
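Many of the failures come from `src` values that are relative paths, protocol-relative URLs (`//...`), or inline `data:` URIs rather than full http(s) links. Rather than discarding them all in the except branch, they can be normalized against the page URL with `urljoin` first. A sketch under that assumption (the `raw` list below is illustrative, not taken from the actual Yahoo page):

```python
from urllib.parse import urljoin

base = 'https://www.yahoo.com/'  # the page that was scraped above
raw = [
    '/images/logo.png',               # relative path
    '//s.yimg.com/pic.jpg',           # protocol-relative URL
    'https://example.com/a.jpg',      # already absolute
    'data:image/gif;base64,R0lGOD',   # inline image, not downloadable
]

# Resolve relative and protocol-relative src values against the page URL,
# skipping data: URIs, which embed the image bytes directly.
urls = [urljoin(base, u) for u in raw if not u.startswith('data:')]
print(urls)
```

Feeding `urls` into the download loop above should leave far fewer entries for the try/except to reject.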
