Read multiple CSV files from a shared Google Drive folder using Python
I would like to create a function that reads files from a shared Google Drive folder and concatenates them into one DataFrame. I would prefer to do it without any authentication, if that is possible.
I used this code I found here:
url = 'https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd'
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
df = pd.read_csv(path)
I want to read all files in the folder using glob and concatenate them into one DataFrame, but I get HTTPError: HTTP Error 404: Not Found. Any help would be appreciated.
IIUC, use -1 as the index, although this URL still raises an error for me:
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-1]
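To see why the index matters, here is a minimal sketch of what each index returns for the URL in the question. The helper name folder_id_from_url is hypothetical, not part of any library. Note that even with the correct ID, the download endpoint is meant for files, so a folder ID is still expected to fail.

```python
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Return the last path segment of a Drive folder URL (the folder ID).

    Hypothetical helper: urlparse drops any query string such as
    '?usp=sharing' before the path is split.
    """
    path = urlparse(url).path.rstrip("/")
    return path.split("/")[-1]


url = "https://drive.google.com/drive/folders/1_Hwl0tm6rfechhTF-YGVsOA0cbzu4jUd"

# Index -2 is the literal path segment that precedes the ID
second_to_last = url.split("/")[-2]
# Index -1 (or the helper) is the folder ID itself
folder_id = folder_id_from_url(url)

print(second_to_last)
print(folder_id)
```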
You cannot download a folder directly. In the Drive API, folders are treated as files; the only difference is their MIME type, application/vnd.google-apps.folder.
As the Drive API documentation says:
A container you can use to organize other types of files on Drive. Folders are files that only contain metadata, and have the MIME type application/vnd.google-apps.folder.
Note: A single file stored on My Drive can be contained in multiple folders. A single file stored on a shared drive can only have one parent folder.
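Because folders and files differ only in MIME type, a listing can be sorted into the two groups by inspecting that field. The sketch below works on plain metadata dicts shaped like Drive API list results. The function name and the sample items are made up for illustration, and the fallback on the ".csv" extension is my own assumption, for uploads whose MIME type was not recorded as text/csv.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"


def split_folders_and_csvs(items):
    """Partition file metadata dicts into (folders, csv_files)."""
    folders, csv_files = [], []
    for item in items:
        mime = item.get("mimeType", "")
        name = item.get("name", "")
        if mime == FOLDER_MIME:
            folders.append(item)
        # Assumption: also accept files named *.csv with another MIME type
        elif mime == "text/csv" or name.lower().endswith(".csv"):
            csv_files.append(item)
    return folders, csv_files


# Made-up sample metadata, not real Drive content
sample = [
    {"id": "1", "name": "archive", "mimeType": FOLDER_MIME},
    {"id": "2", "name": "a.csv", "mimeType": "text/csv"},
    {"id": "3", "name": "notes.txt", "mimeType": "text/plain"},
]
folders, csv_files = split_folders_and_csvs(sample)
print([f["name"] for f in folders], [c["name"] for c in csv_files])
```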
As a workaround, you can list all the files contained in the folder and download them one by one. I based the following example on this:
do.py
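import io

from googleapiclient.http import MediaIoBaseDownload


def list_and_download():
    # drive_service() and FOLDER_ID are assumed to be defined elsewhere:
    # an authenticated Drive API client and the ID of the shared folder.
    service = drive_service()
    folder_id = FOLDER_ID
    # List all files within the folder (shared drives included)
    results = service.files().list(
        q="'{}' in parents".format(folder_id),
        includeItemsFromAllDrives=True,
        supportsAllDrives=True,
    ).execute()
    items = results.get("files", [])
    print(items)
    for item in items:
        # Skip anything that is not a CSV instead of aborting the whole loop
        if item["mimeType"] != "text/csv":
            continue
        # Download the files one by one using MediaIoBaseDownload
        request = service.files().get_media(fileId=item["id"])
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print("Download {}%.".format(int(status.progress() * 100)))
        print("Download Complete!")
        # getvalue() returns the whole buffer; read() would return nothing
        # here because the stream position is at the end after the download
        with open(item["name"], "wb") as f:
            f.write(fh.getvalue())
        # Do whatever you want with the csv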
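Since the question asks for a single DataFrame, one option is to keep each downloaded buffer in memory instead of writing it to disk and pass the buffers to pandas. This is only a sketch: concat_csv_buffers is a hypothetical helper, and the two in-memory buffers below stand in for the fh objects filled by the download loop. It assumes all CSVs share the same columns.

```python
import io

import pandas as pd


def concat_csv_buffers(buffers):
    """Read each in-memory CSV buffer and stack the rows into one DataFrame."""
    frames = []
    for buf in buffers:
        buf.seek(0)  # rewind, the download leaves the position at the end
        frames.append(pd.read_csv(buf))
    # ignore_index gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Stand-ins for two downloaded files
first = io.BytesIO(b"x,y\n1,2\n")
second = io.BytesIO(b"x,y\n3,4\n")
df = concat_csv_buffers([first, second])
print(df)
```

pd.concat raises a ValueError on an empty list, so a folder without CSV files needs to be handled before calling it.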
You should use the Google Drive API to list the files in the shared folder: https://developers.google.com/drive/api/v2/reference/children/list
Example of using the API to list files: https://i.ibb.co/pyx8mKG/drive-list.png
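Listing results are returned in pages, so a folder with many files may need several requests. The linked reference is for Drive API v2; the sketch below instead assumes a Drive API v3 client built with google-api-python-client and passed in as service. The function name list_folder_files is hypothetical.

```python
def list_folder_files(service, folder_id):
    """Collect metadata for every non-trashed file in a folder, page by page.

    `service` is assumed to be an authenticated Drive API v3 client.
    """
    query = "'{}' in parents and trashed = false".format(folder_id)
    files = []
    page_token = None
    while True:
        response = service.files().list(
            q=query,
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=page_token,
            includeItemsFromAllDrives=True,
            supportsAllDrives=True,
        ).execute()
        files.extend(response.get("files", []))
        # A missing nextPageToken means this was the last page
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return files
```

Each returned dict carries the id that a download step needs, plus the name and mimeType for filtering.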
After that, once you have the children list from the JSON response, you can read each file and concatenate the DataFrames:
import pandas as pd

response = {
    "kind": "drive#childList",
    "etag": "\"9NuiSicPg_3yRScMQO3pipPxwvs\"",
    "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children",
    "items": [
        {
            "kind": "drive#childReference",
            "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
            "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1",
            "childLink": "https://www.googleapis.com/drive/v2/files/1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"
        },
        {
            "kind": "drive#childReference",
            "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
            "selfLink": "https://www.googleapis.com/drive/v2/files/1IkO_nB83mUfKLopEtYsNT7RbMOIcWDAK/children/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO",
            "childLink": "https://www.googleapis.com/drive/v2/files/14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"
        }
    ]
}

item_arr = []
for item in response["items"]:
    print(item["id"])
    download_url = 'https://drive.google.com/uc?id=' + item["id"]
    item_arr.append(pd.read_csv(download_url))
df = pd.concat(item_arr, axis=0)
print(df.head())
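The URL-building step can be separated from the reading step, which makes it easy to check the links before handing them to pandas. A minimal sketch follows. The helper name child_download_urls is hypothetical, and the shortened response dict reuses the two IDs from the example above. Reading these URLs without credentials is assumed to work only for files shared publicly with anyone who has the link.

```python
def child_download_urls(response):
    """Build a direct-download URL for each child in a children.list response."""
    base = "https://drive.google.com/uc?export=download&id="
    return [base + item["id"] for item in response.get("items", [])]


# Shortened copy of the response shown above (IDs only)
response = {
    "kind": "drive#childList",
    "items": [
        {"kind": "drive#childReference", "id": "1YtG84A9ZJNM7A3OgD3nOQk8V9bz_mYQ1"},
        {"kind": "drive#childReference", "id": "14P3NAdGid-iJl1JSFDHLb8U3BRb0tOQO"},
    ],
}

urls = child_download_urls(response)
for u in urls:
    print(u)
```

Each URL can then be passed to pd.read_csv as in the loop above.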