Pandas read_pickle from s3 bucket
I am using a Jupyter notebook on AWS EMR.
I am able to do this: pd.read_csv("s3://mypath/xyz.csv").
However, if I try to open a pickle file like this, pd.read_pickle("s3://mypath/xyz.pkl"),
I get this error:
[Errno 2] No such file or directory: 's3://pvarma1/users/users/candidate_users.pkl'
Traceback (most recent call last):
File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 179, in read_pickle
return try_read(path)
File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 177, in try_read
lambda f: pc.load(f, encoding=encoding, compat=True))
File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 146, in read_wrapper
is_text=False)
File "/usr/local/lib64/python2.7/site-packages/pandas/io/common.py", line 421, in _get_handle
f = open(path_or_buf, mode)
IOError: [Errno 2] No such file or d
However, I can see both xyz.csv and xyz.pkl in the same path! Can anyone help?
Pandas read_pickle supports only local paths, unlike read_csv. So you should first copy the pickle file down to your machine and then read it with pandas.
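A minimal sketch of the copy-then-read approach. The boto3 download call is commented out because it needs real AWS credentials, and the bucket/key names in it are placeholders; the local round-trip below just demonstrates that read_pickle works once the file is on the local filesystem:

```python
import os
import tempfile

import pandas as pd

# On EMR you would first download the object, e.g. with boto3
# (bucket and key below are hypothetical placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.download_file("mybucket", "users/xyz.pkl", local_path)

# Local round-trip: once the pickle is a local file, read_pickle works.
df = pd.DataFrame({"a": [1, 2, 3]})
with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, "xyz.pkl")
    df.to_pickle(local_path)          # stand-in for the downloaded file
    df2 = pd.read_pickle(local_path)  # reads fine from a local path

print(df2.equals(df))  # True
```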
Since read_pickle does not support this, you can use smart_open:
import pandas as pd
from smart_open import open

s3_file_name = "s3://bucket/key"
with open(s3_file_name, 'rb') as f:
    df = pd.read_pickle(f)