How to write a file to S3 using Pandas

Question

I want to write a DataFrame column to S3 as a .ann file.

Currently I am using the following code:

df['user_input'].to_csv(ann_file_path, header=None, index=None, sep=' ')

Here ann_file_path is the full path (an https:// URL) of the .ann file on the server.

I am getting the following error message:

[Errno 22] Invalid argument: 'https://s3-eu-west-1.amazonaws.com/bucket/sub_folder/somefile.ann'

Why am I getting this error?

Also, do I need to use Boto3 to write the file, or can I write it directly to S3 using the full path?

I suspect some authorization might be required, but the error message does not look like an authorization error.
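On the second question: the Errno 22 most likely comes from pandas treating the https:// string as a local file name (pandas versions of that era did this for plain string paths), and an https:// object URL is not a writable S3 location anyway. With pandas 1.2 or later and s3fs installed, to_csv can write straight to S3 when given an s3://bucket/key URL, with credentials passed through storage_options. The sketch below assumes those versions; write_column_to_s3 is a hypothetical helper name, not part of pandas.

```python
def write_column_to_s3(series, s3_path: str, key: str, secret: str) -> None:
    """Write a pandas Series to s3://<s3_path>, one value per line.

    Assumes pandas >= 1.2 (for storage_options) and s3fs installed.
    s3_path is "bucket/sub_folder/somefile.ann", without any https:// prefix.
    """
    series.to_csv(
        f"s3://{s3_path}",
        header=False,
        index=False,
        # Passed through fsspec to s3fs.S3FileSystem(key=..., secret=...)
        storage_options={"key": key, "secret": secret},
    )

# Usage (hypothetical credential variables):
# write_column_to_s3(df["user_input"], "bucket/sub_folder/somefile.ann",
#                    key=AWS_ACCESS_KEY_ID, secret=AWS_SECRET_ACCESS_KEY)
```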

Answer 1

I've resolved it. We need to authenticate with AWS using the access_key_id and secret_key.

Take the path starting from the bucket name (not https://...), i.e. strip everything before it.

My URL: https://s3-eu-west-1.amazonaws.com/bucket/sub_folder/somefile.ann

Transformed to: bucket/sub_folder/somefile.ann

Code to do that:

ann_file_path = ann_file_path.split('.com/', 1)[1]
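Splitting on '.com/' works for this URL, but parsing the URL is a little more robust: it ignores any query string and decodes percent-escapes in the key. A minimal sketch, assuming a path-style URL where the bucket is the first path segment (virtual-hosted URLs such as https://bucket.s3.amazonaws.com/key are not handled); s3_path_from_url is a hypothetical helper name.

```python
from urllib.parse import unquote, urlparse


def s3_path_from_url(url: str) -> str:
    """Convert a path-style S3 object URL to the "bucket/key" form s3fs expects.

    Assumes a path-style URL such as
    https://s3-eu-west-1.amazonaws.com/bucket/sub_folder/somefile.ann
    """
    parsed = urlparse(url)
    if not parsed.netloc.endswith("amazonaws.com"):
        raise ValueError(f"not an S3 URL: {url}")
    # Drop the leading '/' and decode escapes such as %20
    path = unquote(parsed.path.lstrip("/"))
    if "/" not in path:
        raise ValueError(f"URL has no object key: {url}")
    return path


print(s3_path_from_url("https://s3-eu-west-1.amazonaws.com/bucket/sub_folder/somefile.ann"))
# -> bucket/sub_folder/somefile.ann
```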

Once I had ann_file_path, I used the s3fs Python library to upload the .ann file to S3:

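import s3fs
# Serialise the column to CSV in memory and encode it as bytes
bytes_to_write = df['user_input'].to_csv(header=False, index=False).encode()
# Authenticate with the AWS key pair stored in the project settings
fs = s3fs.S3FileSystem(key=settings.AWS_ACCESS_KEY_ID, secret=settings.AWS_SECRET_ACCESS_KEY)
# ann_file_path must be "bucket/sub_folder/somefile.ann" (no https:// prefix)
with fs.open(ann_file_path, 'wb') as f:
    f.write(bytes_to_write)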
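s3fs is one option; Boto3 is not required, but the same upload can be done with its S3 client's put_object call, which takes the bucket and the object key as separate arguments. A minimal sketch, assuming the "bucket/key" path produced above; upload_bytes is a hypothetical helper, and the client is passed in so any object with a put_object method (such as boto3.client("s3")) can be used.

```python
def upload_bytes(client, s3_path: str, body: bytes) -> None:
    """Upload body to "bucket/key" via an S3 client's put_object method."""
    # put_object needs the bucket and the object key separately
    bucket, _, key = s3_path.partition("/")
    if not bucket or not key:
        raise ValueError(f"expected 'bucket/key', got {s3_path!r}")
    client.put_object(Bucket=bucket, Key=key, Body=body)


# Usage with a real client (needs boto3 and valid credentials):
# import boto3
# client = boto3.client("s3", aws_access_key_id=AWS_ACCESS_KEY_ID,
#                       aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
# upload_bytes(client, "bucket/sub_folder/somefile.ann", bytes_to_write)
```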

Answer 1: score 3, accepted, 2018-10-19 12:44:50

如何使用Pandas将文件写入S3

问题描述

1 个解决方案

解决方案1 3 已采纳 2018-10-19 12:44:50

解决方案1
3 已采纳 2018-10-19 12:44:50