
Read multiple CSV files from a shared Google Drive folder using Python

I would like to create a function that reads files from a shared Google Drive folder and concatenates them into one DataFrame. If possible, I would prefer to do it without using any authenticators.

I used this code that I found here:

url = 'https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
df = pd.read_csv(path)

I want to read all files in the folder using glob and concatenate them into one DataFrame, but I get the error HTTPError: HTTP Error 404: Not Found. Any help would be appreciated.

IIUC, use -1 instead of -2 as the index, but for me this URL still raises an error:

path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-1]
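To see why the index matters, here is a minimal sketch that splits the folder URL from the question and prints both candidates. The variable names second_to_last and folder_id are only illustrative.

```python
url = "https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd"

# Splitting on "/" gives:
# ['https:', '', 'drive.google.com', 'drive', 'folders', '<folder id>']
parts = url.split("/")

second_to_last = parts[-2]  # the literal path segment, not an ID
folder_id = parts[-1]       # the folder ID itself

print(second_to_last)
print(folder_id)
```

With -2 the request is sent with id=folders, which cannot match any file. With -1 the ID is correct, but it identifies a folder rather than a CSV file, so pd.read_csv still has nothing it can parse.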

You cannot download a folder directly. In the Drive API, folders are treated as files that differ only in their MIME type, application/vnd.google-apps.folder.

As the Drive API documentation says:

A container you can use to organize other types of files on Drive. Folders are files that only contain metadata, and have the MIME type application/vnd.google-apps.folder.

Note: A single file stored on My Drive can be contained in multiple folders. A single file stored on a shared drive can only have one parent folder.
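As a rough illustration of that distinction, the sketch below separates folders from CSV files by looking only at the mimeType field. The helper names and the sample items are hypothetical; real items would come from a Drive API listing.

```python
FOLDER_MIME_TYPE = "application/vnd.google-apps.folder"
CSV_MIME_TYPE = "text/csv"


def is_folder(item):
    """Return True if a Drive file resource describes a folder."""
    return item.get("mimeType") == FOLDER_MIME_TYPE


def csv_items(items):
    """Keep only the entries whose MIME type marks them as CSV files."""
    return [item for item in items if item.get("mimeType") == CSV_MIME_TYPE]


# Hypothetical listing result, for illustration only
sample_items = [
    {"id": "a1", "name": "reports", "mimeType": FOLDER_MIME_TYPE},
    {"id": "b2", "name": "jan.csv", "mimeType": CSV_MIME_TYPE},
    {"id": "c3", "name": "notes.txt", "mimeType": "text/plain"},
]

print([item["name"] for item in sample_items if is_folder(item)])
print([item["name"] for item in csv_items(sample_items)])
```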

As a workaround, you can list all the files contained in a folder and download them one by one. The following example is based on this:

do.py
import io

from googleapiclient.http import MediaIoBaseDownload


def list_and_download():
    # drive_service() and FOLDER_ID come from the example this answer is based on
    service = drive_service()
    folder_id = FOLDER_ID
    # List all files within the folder (shared drives included)
    results = service.files().list(
        q="'{}' in parents".format(folder_id),
        includeItemsFromAllDrives=True,
        supportsAllDrives=True,
    ).execute()
    items = results.get("files", [])
    print(items)
    for item in items:
        # Skip anything that is not a CSV instead of aborting the whole loop
        if item["mimeType"] != "text/csv":
            continue
        # Download the files one by one using MediaIoBaseDownload
        request = service.files().get_media(fileId=item["id"])
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print("Download {}%.".format(int(status.progress() * 100)))
        print("Download Complete!")
        # getvalue() returns the whole buffer regardless of the stream position
        with open(item["name"], "wb") as f:
            f.write(fh.getvalue())

    # Do whatever you want with the csv
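Since the goal in the question is a single DataFrame, one possible next step is to parse the downloaded bytes in memory rather than writing each file to disk. This is only a sketch: concat_csv_buffers is a hypothetical helper, and the sample bytes stand in for what the download loop would collect.

```python
import io

import pandas as pd


def concat_csv_buffers(buffers):
    """Parse each in-memory CSV and stack the rows into one DataFrame."""
    frames = [pd.read_csv(io.BytesIO(data)) for data in buffers]
    # ignore_index=True renumbers the rows instead of repeating 0..n per file
    return pd.concat(frames, ignore_index=True)


# Hypothetical stand-ins for the bytes of two downloaded CSV files
sample = [b"a,b\n1,2\n", b"a,b\n3,4\n"]
df = concat_csv_buffers(sample)
print(df)
```

Note that pd.concat raises a ValueError when given an empty list, so a folder without any CSV files would need to be handled separately.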

You should use the Google API to list the files in the shared folder: https://developers.google.com/drive/api/v2/reference/children/list

Example usage of the API to list files: https://i.ibb.co/pyx8mKG/drive-list.png

After that, once you have the children list from the JSON response, you can read the files and concatenate them into one DataFrame:



import pandas as pd

response = {
 "kind": "drive#childList",
 "etag": "\"9NuiSicPg_3yRScMQO3pipPxwvs\"",
 "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children",
 "items": [
  {
   "kind": "drive#childReference",
   "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
   "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
   "childLink": "https://www.googleapis.com/drive/v2/files/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"
  },
  {
   "kind": "drive#childReference",
   "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
   "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
   "childLink": "https://www.googleapis.com/drive/v2/files/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"
  }
 ]
}

item_arr = []
for item in response["items"]:
    print(item["id"])
    download_url = 'https://drive.google.com/uc?id=' + item["id"]
    item_arr.append(pd.read_csv(download_url))
df = pd.concat(item_arr, axis=0)
print(df.head())
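If it helps to check the URL construction without touching the network, the same step can be pulled into a small function. This is a sketch under the assumption that the response has the children-list shape shown above; child_download_urls and sample_response are illustrative names.

```python
def child_download_urls(response):
    """Build one download URL per child reference in a children-list response."""
    base = "https://drive.google.com/uc?id="
    return [base + item["id"] for item in response.get("items", [])]


# Trimmed copy of the response above, keeping only the fields that are used
sample_response = {
    "kind": "drive#childList",
    "items": [
        {"kind": "drive#childReference", "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"},
        {"kind": "drive#childReference", "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"},
    ],
}

urls = child_download_urls(sample_response)
for download_url in urls:
    print(download_url)
```

Each URL can then be passed to pd.read_csv, as in the loop above.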
