
Pandas read_pickle from s3 bucket

I am working in a Jupyter notebook on AWS EMR.

I am able to do this: pd.read_csv("s3://mypath/xyz.csv")

However, if I try to open a pickle file the same way, pd.read_pickle("s3://mypath/xyz.pkl")

I am getting this error:

[Errno 2] No such file or directory: 's3://pvarma1/users/users/candidate_users.pkl'
Traceback (most recent call last):
  File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 179, in read_pickle
    return try_read(path)
  File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 177, in try_read
    lambda f: pc.load(f, encoding=encoding, compat=True))
  File "/usr/local/lib64/python2.7/site-packages/pandas/io/pickle.py", line 146, in read_wrapper
    is_text=False)
  File "/usr/local/lib64/python2.7/site-packages/pandas/io/common.py", line 421, in _get_handle
    f = open(path_or_buf, mode)
IOError: [Errno 2] No such file or d

However, I can see both xyz.csv and xyz.pkl in the same path! Can anyone help?

Pandas read_pickle supports only local paths, unlike read_csv (this is true of older pandas versions, like the one in your traceback; recent versions can read s3:// URLs directly when s3fs is installed). So you should copy the pickle file to your machine before reading it in pandas.
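The copy-then-read approach can be sketched as follows. The boto3 download line is shown commented out with hypothetical bucket and key names; the runnable part below it uses a local file as a stand-in for the downloaded copy:

```python
import os
import tempfile

import pandas as pd

# With boto3 (the AWS SDK), the download step would look like this
# (bucket, key, and local path are hypothetical):
# import boto3
# boto3.client("s3").download_file("mypath", "xyz.pkl", "/tmp/xyz.pkl")

# Local stand-in for the downloaded copy: write a pickle to disk,
# then read it back exactly as you would read the downloaded file.
df = pd.DataFrame({"user": ["a", "b"], "score": [1, 2]})
with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, "xyz.pkl")
    df.to_pickle(local_path)
    restored = pd.read_pickle(local_path)

print(restored.equals(df))  # True
```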

Since read_pickle does not support this, you can use smart_open:

import pandas as pd
from smart_open import open

s3_file_name = "s3://bucket/key"
with open(s3_file_name, 'rb') as f:
    df = pd.read_pickle(f)
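This works because read_pickle accepts a binary file-like object, not just a path; smart_open simply supplies an open handle to the S3 object. A minimal local sketch of the same mechanism, with an in-memory buffer standing in for the S3 stream:

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Serialize to an in-memory buffer (stand-in for the S3 object).
buf = io.BytesIO()
df.to_pickle(buf)
buf.seek(0)

# read_pickle accepts any binary file-like object, which is why
# passing a smart_open handle works as well.
restored = pd.read_pickle(buf)
print(restored.equals(df))  # True
```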

