download file using s3fs
I am trying to download a csv file from an s3 bucket using the s3fs library. I have noticed that writing a new csv using pandas has altered the data in some way. So I want to download the file directly, in its raw state.
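For context, a common way a pandas round trip alters a CSV is dtype inference: values are parsed into numeric types and re-serialized, so formatting such as leading or trailing zeros is lost. A small illustration with made-up data:

```python
import io
import pandas as pd

raw = "id,value\n007,3.10\n"

# read_csv infers the id column as an integer and value as a float,
# so the original text formatting does not survive a round trip:
# "007" becomes 7 and "3.10" becomes 3.1
df = pd.read_csv(io.StringIO(raw))
round_tripped = df.to_csv(index=False)
print(round_tripped)
```

Downloading the raw bytes, as the question asks, sidesteps this entirely.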
The documentation has a download function, but I do not understand how to use it:
download(self, rpath, lpath[, recursive])
: Alias of FilesystemSpec.get.
Here's what I tried:
import pandas as pd
import datetime
import os
import s3fs
import numpy as np

# Creds for s3
fs = s3fs.S3FileSystem(key=mykey, secret=mysecretkey)
bucket = "s3://mys3bucket/mys3bucket"
files = fs.ls(bucket)[-3:]

# download files:
for file in files:
    with fs.open(file) as f:
        fs.download(f, "test.csv")
This raises:

AttributeError: 'S3File' object has no attribute 'rstrip'
I also tried passing the path string directly:

for file in files:
    fs.download(file, 'test.csv')
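The AttributeError above happens because download (an alias of FilesystemSpec.get) expects rpath to be a path string; fsspec calls string methods such as rstrip on it, which an open S3File object does not have. A pure-Python stand-in (no s3fs required) shows why a non-string argument fails this way:

```python
class NotAPath:
    """Stand-in for any non-string object, such as an open S3File."""

# Path handling code expects string methods like rstrip; a file
# object has no such attribute, so the call fails immediately
try:
    NotAPath().rstrip("/")
except AttributeError as exc:
    message = str(exc)

print(message)  # 'NotAPath' object has no attribute 'rstrip'
```

The fix is to pass the key string returned by fs.ls (the file variable itself) rather than the object returned by fs.open.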
Modified to download all files in the directory:
import pandas as pd
import datetime
import os
import s3fs
import numpy as np

# Creds for s3
fs = s3fs.S3FileSystem(key=mykey, secret=mysecretkey)
bucket = "s3://mys3bucket/mys3bucket"

# files references the entire bucket.
files = fs.ls(bucket)

for file in files:
    fs.download(file, 'test.csv')
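Note that this loop writes every object to the same local name, so each download overwrites the previous one and only the last file survives. A sketch of deriving a distinct local name per key with os.path.basename (the keys below are invented for illustration):

```python
import os

# Hypothetical keys, shaped like what fs.ls would return
files = [
    "mys3bucket/mys3bucket/data_2021_01.csv",
    "mys3bucket/mys3bucket/data_2021_02.csv",
]

local_names = []
for file in files:
    # Keep only the final path component so each file gets its own name
    local_names.append(os.path.basename(file))

print(local_names)  # ['data_2021_01.csv', 'data_2021_02.csv']
# fs.download(file, local_name) would then not clobber earlier downloads
```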
I'm going to copy my answer here as well, since I used this in a more general case:
# Access Pando
import os
import s3fs

# Blocked out url as "enter url here" for security reasons
fs = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': "enter url here"})

# List objects in a path and import to array
# -3 limits output for testing purposes to prevent memory overload
files = fs.ls('hrrr/sfc/20190101')[-3:]

# Make a staging directory that can hold data as a medium
os.mkdir("Staging")

# Copy files into that directory (specific directory structure requires splitting strings)
for file in files:
    item = str(file)
    lst = item.split("/")
    name = lst[3]
    path = "Staging\\" + name
    print(path)
    fs.download(file, path)
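To make the string-splitting step concrete, here is what it does to a single key (the filename is invented for illustration). Index 3 is the fourth path component, i.e. the object name under hrrr/sfc/20190101:

```python
item = "hrrr/sfc/20190101/20190101_00z_anl.grib2"  # hypothetical key
lst = item.split("/")   # ['hrrr', 'sfc', '20190101', '20190101_00z_anl.grib2']
name = lst[3]           # the bare object name
path = "Staging\\" + name  # Windows-style separator, as in the answer

print(name)  # 20190101_00z_anl.grib2
```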
Note that the documentation is fairly barren for this particular Python package. I was able to find some documentation regarding what arguments s3fs takes here (https://readthedocs.org/projects/s3fs/downloads/pdf/latest/). The full argument list is toward the end, though it doesn't specify what the parameters mean. Here's a general guide for s3fs.download:
-arg1 (rpath) is the source path you are getting the files from. As in both answers above, the best way to obtain this is to run fs.ls on your s3 bucket and save the result to a variable.
-arg2 (lpath) is the destination directory and file name. Note that without a valid output file, this will raise the AttributeError the OP got. I have this defined as a path variable.
-arg3 (recursive) is an optional parameter that performs the download recursively.
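Putting the three arguments together, a minimal sketch (the bucket name, key, and credentials are placeholders, and the actual transfer calls are commented out because they need live access):

```python
import os

# rpath: a key string as returned by fs.ls (placeholder value)
rpath = "mybucket/data/2019/file.csv"

# lpath: local destination, including a valid output file name
lpath = os.path.join("Staging", os.path.basename(rpath))

# fs = s3fs.S3FileSystem(key=mykey, secret=mysecretkey)
# fs.download(rpath, lpath)                                     # single file
# fs.download("mybucket/data/2019", "Staging", recursive=True)  # whole prefix

print(lpath)
```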