
Use Apache Airflow to edit CSV stored in AWS S3 without download

Original post: 2019-11-20 03:22:14. Tags: amazon-web-services, ubuntu, amazon-s3, amazon-ec2, airflow

I have a project that requires large amounts of CSV data to be transformed regularly. This data will be stored in S3, and I am using an EC2 instance running Ubuntu Server 16.04 to edit the data and Apache Airflow to route it. Downloading and reuploading this data to S3 is quite expensive. Is there a way to edit this CSV data in memory without downloading the file to local storage on the Ubuntu instance?

Thank you in advance.

1 Answer

In general, you could write a program that fetches the CSV file from S3 (using the S3 SDK), holds and transforms it in memory, and then saves it back to S3. That still involves "downloading and reuploading", though. The only difference is that the file is never written to local disk; it is kept in the program's memory.
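To make the in-memory approach concrete, here is a minimal sketch, assuming Python with boto3 as the S3 SDK and UTF-8 CSV files with a header row. The function names, the bucket and key, and the upper-casing transformation are all hypothetical placeholders. Note that the object's bytes still cross the network twice; only the local disk is skipped.

```python
import csv
import io


def transform_csv_text(text, transform_row):
    """Apply transform_row to every data row of a CSV held in memory."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader, None)
    if header is not None:
        writer.writerow(header)  # header row is copied unchanged
    for row in reader:
        writer.writerow(transform_row(row))
    return out.getvalue()


def edit_csv_in_s3(s3_client, bucket, key, transform_row):
    """Fetch the object, transform it in memory, and write it back."""
    # get_object transfers the bytes over the network; nothing touches disk.
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    new_text = transform_csv_text(body.decode("utf-8"), transform_row)
    # put_object replaces the whole object; S3 has no in-place edit.
    s3_client.put_object(Bucket=bucket, Key=key, Body=new_text.encode("utf-8"))


def upper_case_row(row):
    # Hypothetical transformation, used only for illustration.
    return [cell.upper() for cell in row]


def run_against_s3():
    # Assumes boto3 is installed and AWS credentials are configured.
    # Bucket and key below are placeholders, not real names.
    import boto3

    client = boto3.client("s3")
    edit_csv_in_s3(client, "my-example-bucket", "data/input.csv", upper_case_row)
```

Inside Airflow, a function like `run_against_s3` could be the callable of a Python task. This sketch loads the whole file into memory, so it only suits files that fit in the instance's RAM.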

You could also use s3fs to mount the S3 bucket to a directory on the server and perform the required operations directly on the files. They still have to be downloaded from S3 and reuploaded (although this happens on the fly and is invisible to you).
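With the bucket mounted, the edit becomes ordinary file I/O. The following is a rough sketch, assuming the bucket is already mounted with s3fs at a hypothetical path such as `/mnt/s3-data` and that the CSV has a header row. The function name and the transformation are made up for illustration. The code itself knows nothing about S3; the transfer in both directions is handled by the mount.

```python
import csv


def edit_csv_on_mount(path, transform_row):
    """Rewrite a CSV file in place, applying transform_row to each data row.

    Returns the number of data rows that were transformed.
    """
    # Reading pulls the object's contents through the mount.
    with open(path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src))
    if not rows:
        return 0
    header, data = rows[0], rows[1:]
    # Writing sends the whole file back through the mount.
    with open(path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(transform_row(row) for row in data)
    return len(data)


# Example call (path is a placeholder for a file under the mount point):
# edit_csv_on_mount("/mnt/s3-data/input.csv", lambda row: [c.strip() for c in row])
```

The rows are held in memory between the read and the write, so the same size limit applies as with the SDK approach.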

Hope that helps.


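Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.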
