
Download web images by URL from excel and save to folders in Python

I have an Excel file as follows:

import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.options.display.max_colwidth

df = pd.read_excel("./test.xlsx")
print(df)

Output:

  city buildingName  buildingID  imgType                 imgUrl
0   bj     LG tower      123456   inside  http://pic3.nipic.com/20090629/827780_144001014_2.jpg
1   bj     LG tower      123456  outside  http://pic.baike.soso.com/p/20140321/20140321160157-391052318.jpg
2   sh          LXD      123457   inside  http://pic10.nipic.com/20101008/2634566_104534032717_2.jpg
3   gz           GM      123458   inside  http://pic1.to8to.com/case/day_120720/20120720_fb680a57416b8d16bad2kO1kOUIzkNxO.jpg

I need to download the images by iterating over the imgUrl column, and save each image to a path built from the city, buildingName, buildingID, and imgType columns.

The final folder and subfolder structure will look like this; everything is saved under a folder named output:

├── bj
│   └── LG tower_123456
│       ├── inside
│       │   └── 827780_144001014_2.jpg
│       └── outside
│           └── 20140321160157-391052318.jpg
├── gz
│   └── GM_123458
│       └── inside
│           └── 2634566_104534032717_2.jpg
├── sh
│   └── LXD_123457
│       └── inside
│           └── 20120720_fb680a57416b8d16bad2kO1kOUIzkNxO.jpg
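The target directory for each row can be built directly from the four columns. A minimal sketch, assuming the column names shown in the dataframe above (pathlib joins path segments portably):

```python
from pathlib import Path

def build_path(row, root="output"):
    # Build output/<city>/<buildingName>_<buildingID>/<imgType> for one row.
    return Path(root) / row["city"] / f"{row['buildingName']}_{row['buildingID']}" / row["imgType"]

row = {"city": "bj", "buildingName": "LG tower", "buildingID": 123456, "imgType": "inside"}
print(build_path(row))  # output/bj/LG tower_123456/inside (on POSIX systems)
```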

How can I do this in Python? Thanks in advance for your help.

I have tried downloading a single image:

import requests

r = requests.get("http://pic1.to8to.com/case/day_120720/20120720_fb680a57416b8d16bad2kO1kOUIzkNxO.jpg")
if r.status_code == 200:
    with open("test.jpg", "wb") as f:
        f.write(r.content)
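For a batch of downloads it can help to stream the response and handle network errors explicitly, so one bad URL doesn't abort the whole run. A sketch using standard requests features (stream=True, raise_for_status, iter_content); the helper name is my own:

```python
import requests

def download_image(url, dest_path, timeout=10):
    """Download url to dest_path; return True on success, False otherwise."""
    try:
        r = requests.get(url, stream=True, timeout=timeout)
        r.raise_for_status()  # raise on 4xx/5xx responses
    except requests.RequestException as e:
        print(f"failed: {url}: {e}")
        return False
    with open(dest_path, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):  # write in chunks, not all at once
            f.write(chunk)
    return True
```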

You can do something like this, assuming you have the dataframe loaded.

    import requests
    from os.path import join

    for index, row in df.iterrows():
        url = row['imgUrl']  # column names as shown in the question's dataframe
        file_name = url.split('/')[-1]
        r = requests.get(url)
        abs_file_name = join(row['city'], row['buildingName'] + '_' + str(row['buildingID']), row['imgType'], file_name)
        if r.status_code == 200:
            with open(abs_file_name, "wb") as f:
                f.write(r.content)

Edited code:

    import os
    from os.path import join, expanduser

    import pandas as pd
    import requests

    home = expanduser("~")
    df = pd.read_excel("./test.xlsx")
    for index, row in df.iterrows():
        url = row['imgUrl']
        file_name = url.split('/')[-1]
        r = requests.get(url)
        filepath = join(home, row['city'], row['buildingName'] + '_' + str(row['buildingID']), row['imgType'])
        if not os.path.exists(filepath):
            os.makedirs(filepath)
        filepath = join(filepath, file_name)
        if r.status_code == 200:
            with open(filepath, "wb") as f:
                f.write(r.content)
import os
import pandas as pd
import requests


def download_urls(csv_path):
    df = pd.read_csv(csv_path, encoding='utf-8', on_bad_lines='skip')  # pandas >= 1.3; older versions use error_bad_lines=False
    for index, row in df.iterrows():
        folder = row.iloc[0]      # city
        sub_folder = row.iloc[1]  # buildingName
        url = row.iloc[4]         # imgUrl (last column)
        r = requests.get(url)
        if r.status_code == 200:
            # note: assumes folder/sub_folder already exist; see the version below
            with open(os.path.join(folder, sub_folder, url.split("/")[-1]), "wb") as f:
                f.write(r.content)

path = r"C:\path\your_csv_path"
download_urls(path)

Try this, assuming you have a CSV file as input. There is no especially elegant way to iterate over rows with pandas, so you could use the csv library instead.

import pandas as pd
import requests
import os

def download_urls(csv_path):
    df = pd.read_csv(csv_path, encoding='utf-8', on_bad_lines='skip')  # pandas >= 1.3; older versions use error_bad_lines=False
    for index, row in df.iterrows():
        folder = row.iloc[0]      # city
        sub_folder = row.iloc[1]  # buildingName
        url = row.iloc[4]         # imgUrl (last column)
        r = requests.get(url)
        if r.status_code == 200:
            target_dir = os.path.join(folder, sub_folder)
            os.makedirs(target_dir, exist_ok=True)  # create folder and sub_folder together
            with open(os.path.join(target_dir, url.split("/")[-1]), "wb") as f:
                f.write(r.content)

path = r"C:\path\your_csv_path"
download_urls(path)

Try this version, which creates the folders if they don't exist (the directories are created on the first run).
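Putting the pieces together, here is a sketch of the whole workflow the question asks for, saving under an output root as in the question's folder tree. It assumes the column names from the question's dataframe; the function name is my own, and unreachable URLs are skipped rather than aborting the run:

```python
import os

import pandas as pd
import requests

def download_all(df, root="output"):
    """Download every imgUrl in df into root/<city>/<buildingName>_<buildingID>/<imgType>/."""
    for _, row in df.iterrows():
        url = row["imgUrl"]
        folder = os.path.join(root, row["city"],
                              f"{row['buildingName']}_{row['buildingID']}",
                              row["imgType"])
        os.makedirs(folder, exist_ok=True)  # creates all intermediate dirs; no error if present
        try:
            r = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # skip unreachable URLs
        if r.status_code == 200:
            with open(os.path.join(folder, url.split("/")[-1]), "wb") as f:
                f.write(r.content)

# df = pd.read_excel("./test.xlsx")
# download_all(df)
```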
