
Read csv files recursively in all sub folders from a GCP bucket using python

Original post: 2022-09-27 16:09:25 8 2 python/ pandas/ csv/ google-cloud-platform

I am trying to load all csv files recursively from all sub folders available in a GCP bucket using python pandas.

Currently I am using dask to load the data, but it is very slow.

import dask.dataframe as dd

# Match csv files one folder level below parent_path
path = "gs://mybucket/parent_path/" + "*/*.csv"
getAllDaysData = dd.read_csv(path).compute()

Can someone help me with a better approach?
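For reference, the recursive load described in the question can be sketched without dask. This is only a sketch under assumptions: the helper names `select_csv_objects` and `read_all_csvs` are hypothetical, it assumes `google-cloud-storage`, `pandas`, and `gcsfs` are installed, and whether it is faster than dask depends on the number and size of the files.

```python
from typing import Iterable, List


def select_csv_objects(object_names: Iterable[str], prefix: str) -> List[str]:
    """Keep object names under `prefix` that end in .csv, at any folder depth."""
    return sorted(
        name for name in object_names
        if name.startswith(prefix) and name.lower().endswith(".csv")
    )


def read_all_csvs(bucket_name: str, prefix: str):
    """Read every csv under gs://bucket_name/prefix into one pandas DataFrame."""
    # Imported here so the pure helper above works without these packages.
    import pandas as pd
    from google.cloud import storage

    client = storage.Client()
    # Without a delimiter, list_blobs returns objects at every depth below prefix.
    names = [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
    frames = [
        pd.read_csv(f"gs://{bucket_name}/{name}")  # gs:// URLs rely on gcsfs
        for name in select_csv_objects(names, prefix)
    ]
    # pd.concat raises ValueError if no csv objects were found.
    return pd.concat(frames, ignore_index=True)
```

A call such as `read_all_csvs("mybucket", "parent_path/")` would then return a single DataFrame, assuming all files share the same columns.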

2 answers

I would suggest reading parquet files instead, and using pd.read_parquet(file, engine = 'pyarrow') to convert them to a pandas dataframe.
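A minimal sketch of that idea, assuming the csv files are converted to Parquet once and read from the Parquet copies afterwards. The function names here are hypothetical, and the sketch assumes `pandas` and `pyarrow` are installed (plus `gcsfs` for `gs://` paths).

```python
import posixpath


def parquet_path_for(csv_uri: str) -> str:
    """Map '.../name.csv' to '.../name.parquet' in the same folder."""
    root, ext = posixpath.splitext(csv_uri)
    if ext.lower() != ".csv":
        raise ValueError(f"not a csv path: {csv_uri}")
    return root + ".parquet"


def convert_csv_to_parquet(csv_uri: str) -> str:
    """One-off conversion: read a csv and write it next to itself as Parquet."""
    import pandas as pd  # imported lazily; writing Parquet also needs pyarrow

    target = parquet_path_for(csv_uri)
    pd.read_csv(csv_uri).to_parquet(target, engine="pyarrow", index=False)
    return target


def load_parquet(parquet_uri: str):
    """Later reads use the Parquet copy, as the answer suggests."""
    import pandas as pd

    return pd.read_parquet(parquet_uri, engine="pyarrow")
```

The conversion still has to read each csv once, so any benefit would only show up on repeated loads.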

Alternatively, you may want to consider loading the data into BigQuery first. You can do this as long as all the csv files share the same structure.

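from google.cloud import bigquery

client = bigquery.Client()
GCP_LOCATION = "US"  # placeholder: use the location of the destination dataset

uri = "gs://mybucket/parent_path/*.csv"
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV
)

# 'destination_table' is a placeholder for a table ID such as "dataset.table"
load_job = client.load_table_from_uri(
    uri,
    'destination_table',
    job_config=job_config,
    location=GCP_LOCATION
)
load_job_result = load_job.result()  # wait for the load job to finish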
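Once the load job has finished, the table can be pulled back into pandas. The sketch below is an assumption about how that might look, not part of the original answer: `check_load_uri` and `table_to_dataframe` are hypothetical helpers, and `to_dataframe()` needs the pandas extras of `google-cloud-bigquery`. The check reflects that, as far as I know, BigQuery accepts only one `*` wildcard per load URI, so a pattern like the `*/*.csv` used with dask in the question would need to be collapsed to a single wildcard.

```python
def check_load_uri(uri: str) -> str:
    """Validate a Cloud Storage URI before passing it to a BigQuery load job."""
    if not uri.startswith("gs://"):
        raise ValueError("load URIs must start with gs://")
    if uri.count("*") > 1:
        raise ValueError("BigQuery accepts at most one '*' wildcard per URI")
    return uri


def table_to_dataframe(table_id: str):
    """Fetch a loaded table into a pandas DataFrame."""
    # Lazy import keeps check_load_uri usable without the BigQuery client.
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.list_rows(table_id).to_dataframe()
```

For example, `check_load_uri("gs://mybucket/parent_path/*.csv")` passes, and `table_to_dataframe("dataset.table")` would return the loaded rows for a table ID of that form.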

Related questions

Delete files under S3 bucket recursively without deleting folders using python

How to upload folder from local to GCP bucket using python

Delete files older than 30 days under S3 bucket recursively without deleting folders using PowerShell

Read images from a bucket in GCP for ML

How to recursively list files in AWS S3 bucket using AWS SDK for Python?

Recursively go through folders and load the csv files in each folder into BigQuery

Read files from s3 bucket that match a pattern in python

How to list all GCP folders by using gcloud python libraries or by using request library?

How to delete files recursively from an S3 bucket

List only top level folders in GCP GCS from Cloud Function bucket API?
