Reading multiple CSV files from a shared Google Drive folder using Python

Question

I would like to create a function that reads files from a shared Google Drive folder and concatenates them into one df. I would prefer to do it without any authentication, if possible.

I used this code, which I found here:

url = 'https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
df = pd.read_csv(path)

I want to read all the files in the folder using glob and concatenate them into one df, but I get the error HTTPError: HTTP Error 404: Not Found. Any help would be appreciated.

Answer 1

IIUC, use index -1 instead of -2, although this URL still raises an error for me:

path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-1]
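
As a rough illustration of why the index matters: for the folder URL in the question, url.split('/')[-2] yields the literal segment "folders", not the ID. The sketch below pulls the ID out by looking for a marker segment instead of counting positions. The helper name extract_drive_id and the set of URL shapes it handles are assumptions made for this example, not part of any Google library, and a folder ID obtained this way still cannot be passed to the uc?export=download endpoint, which expects a file.

```python
from urllib.parse import urlparse, parse_qs


def extract_drive_id(url):
    """Return the ID from a Drive folder or file URL, or None if none is found."""
    parsed = urlparse(url)
    # Links such as .../uc?id=<ID> carry the ID in the query string.
    query_id = parse_qs(parsed.query).get("id")
    if query_id:
        return query_id[0]
    parts = [p for p in parsed.path.split("/") if p]
    # .../drive/folders/<ID> and .../file/d/<ID>/view put the ID right after a marker segment.
    for marker in ("folders", "d"):
        if marker in parts:
            idx = parts.index(marker)
            if idx + 1 < len(parts):
                return parts[idx + 1]
    return None


folder_url = "https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd"
print(extract_drive_id(folder_url))
```

Matching on the marker segment keeps the helper working when the link has a trailing slash or extra query parameters, where a fixed index would silently return the wrong piece.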

Answer 2

You cannot download a folder directly. In the Drive API, folders are treated as files; the only difference is their MIME type, application/vnd.google-apps.folder.

As the Drive API documentation says:

A container you can use to organize other types of files on Drive. Folders are files that only contain metadata, and have the MIME type application/vnd.google-apps.folder.

Note: A single file stored on My Drive can be contained in multiple folders. A single file stored on a shared drive can only have one parent folder.
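
Since a folder is only a file with a special MIME type, getting at its contents comes down to a search on files.list with a q string. A minimal sketch of building that string is shown here. The function name build_folder_query and its parameters are hypothetical, and it assumes the folder ID contains no quote characters. Filtering on text/csv is also an assumption: Drive does not guarantee that every uploaded CSV is recorded with that MIME type.

```python
def build_folder_query(folder_id, mime_type=None, include_trashed=False):
    """Build a Drive API `q` search string for the direct children of a folder."""
    # "'<id>' in parents" matches files whose parent folder is <id>.
    clauses = ["'{}' in parents".format(folder_id)]
    if mime_type is not None:
        # Optionally narrow the listing to one MIME type, e.g. text/csv.
        clauses.append("mimeType = '{}'".format(mime_type))
    if not include_trashed:
        # Leave out files that were moved to the trash.
        clauses.append("trashed = false")
    return " and ".join(clauses)


print(build_folder_query("FOLDER_ID", mime_type="text/csv"))
```

The resulting string would be passed as the q argument of service.files().list(...), so the server does the filtering instead of the download loop.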

As a workaround, you can list all the files contained within a folder and download them one by one. The following example is based on this:

`do.py`

import io

from googleapiclient.http import MediaIoBaseDownload


def list_and_download():
    # drive_service() and FOLDER_ID are defined elsewhere (not shown in this answer)
    service = drive_service()
    folder_id = FOLDER_ID
    # List all files within the folder, including items on shared drives
    results = service.files().list(
        q="'{}' in parents".format(folder_id),
        includeItemsFromAllDrives=True,
        supportsAllDrives=True,
    ).execute()
    items = results.get("files", [])
    print(items)
    for item in items:
        # Skip anything that is not a CSV instead of aborting the whole loop
        if item["mimeType"] != "text/csv":
            continue
        # Download the files one by one using MediaIoBaseDownload
        request = service.files().get_media(fileId=item["id"])
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print("Download {}%.".format(int(status.progress() * 100)))
        print("Download Complete!")
        # getvalue() returns the whole buffer regardless of the stream position
        with open(item["name"], "wb") as f:
            f.write(fh.getvalue())

    # Do whatever you want with the csv
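
The example stops once each CSV has been downloaded, while the question asks for a single df. One possible way to finish the job, sketched under the assumption that the downloaded bytes are kept in io.BytesIO buffers like fh rather than only written to disk, is to parse each buffer with pandas and concatenate the results. The helper name concat_csv_buffers is made up for this illustration, and the two buffers below are stand-ins for real downloads.

```python
import io

import pandas as pd


def concat_csv_buffers(buffers):
    """Parse each in-memory CSV and stack the rows into a single DataFrame."""
    frames = []
    for buf in buffers:
        buf.seek(0)  # rewind: after a download the position sits at the end
        frames.append(pd.read_csv(buf))
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    # ignore_index=True renumbers the rows instead of repeating per-file indices
    return pd.concat(frames, ignore_index=True)


# Stand-in buffers; in practice these would be filled by MediaIoBaseDownload.
first = io.BytesIO(b"a,b\n1,2\n")
second = io.BytesIO(b"a,b\n3,4\n")
df = concat_csv_buffers([first, second])
print(df)
```

Keeping the data in memory avoids a round trip through the file system, at the cost of holding every file in RAM at once, which may not suit very large folders.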

Documentation

MediaIoBaseDownload
Implement shared drive support

Answer 3

You should use the Google Drive API to list the files in the shared folder: https://developers.google.com/drive/api/v2/reference/children/list

Example of using the API to list files: https://i.ibb.co/pyx8mKG/drive-list.png

After that, once you have the list of children from the JSON response, you can read each file and concatenate the results into one DataFrame:

import pandas as pd

response = {
 "kind": "drive#childList",
 "etag": "\"9NuiSicPg_3yRScMQO3pipPxwvs\"",
 "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children",
 "items": [
  {
   "kind": "drive#childReference",
   "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
   "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
   "childLink": "https://www.googleapis.com/drive/v2/files/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"
  },
  {
   "kind": "drive#childReference",
   "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
   "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
   "childLink": "https://www.googleapis.com/drive/v2/files/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"
  }
 ]
}

item_arr = []
for item in response["items"]:
    print(item["id"])
    download_url = 'https://drive.google.com/uc?id=' + item["id"]
    item_arr.append(pd.read_csv(download_url))
df = pd.concat(item_arr, axis=0)
print(df.head())
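
In the loop above, a single file that cannot be fetched or parsed aborts the whole run. A hedged variant is sketched below: it collects the failures per file ID and carries on with the rest. The helper read_each and its reader parameter are hypothetical names introduced for this example, and fake_reader only stands in for pd.read_csv so the sketch runs without network access.

```python
def read_each(file_ids, reader, base_url="https://drive.google.com/uc?id="):
    """Call reader(url) for every file ID; return (results, failures)."""
    results, failures = [], {}
    for file_id in file_ids:
        url = base_url + file_id
        try:
            results.append(reader(url))
        except Exception as exc:  # broad on purpose: HTTP and parser errors differ
            failures[file_id] = exc
    return results, failures


def fake_reader(url):
    # Stand-in for pd.read_csv: fails for one URL, echoes the others back.
    if url.endswith("bad"):
        raise ValueError("cannot parse " + url)
    return url


results, failures = read_each(["good1", "bad", "good2"], fake_reader)
print(results)
print(sorted(failures))
```

With real data, the call would look like read_each([item["id"] for item in response["items"]], pd.read_csv), followed by pd.concat(results, axis=0) on whatever was read successfully.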
