
pandas read_csv from BytesIO

I have a BytesIO file-like object containing a CSV. I want to read it into a Pandas DataFrame without writing to disk in between.

MWE

In my use case I downloaded the file straight into BytesIO. For this MWE I'll have a file on disk, read it into BytesIO, then read that into Pandas. The disk step is only there to make the MWE self-contained.

file.csv

a,b
1,2
3,4

Script:

import pandas as pd
from io import BytesIO
bio = BytesIO()
with open('file.csv', 'rb') as f:
    bio.write(f.read())

# now we have a BytesIO with a CSV
df = pd.read_csv(bio)

Result:

Traceback (most recent call last):
  File "pandas-io.py", line 8, in <module>
    df = pd.read_csv(bio)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 457, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1135, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/ec2-user/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1917, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file

Note that this sounds like a similar problem to the title of this post, but the error messages are different, and that post has the XY problem.

The error says the file is empty.

That's because after writing to a BytesIO object, the file pointer is at the end of the file, ready to write more. So when Pandas tries to read it, it starts reading after the last byte that was written and finds nothing.
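The pointer position can be checked directly with tell(). The following is a minimal sketch, assuming the CSV content is supplied as an inline bytes literal instead of being read from file.csv; the variable names are illustrative only.

```python
from io import BytesIO

# Assumption: inline bytes stand in for the contents of file.csv.
bio = BytesIO()
bio.write(b"a,b\n1,2\n3,4\n")

# After the write, the pointer sits just past the last byte written.
position_after_write = bio.tell()

# Reading from that position yields nothing, which is what Pandas sees.
remaining = bio.read()

print(position_after_write, remaining)
```

The data is still in the buffer (bio.getvalue() returns all of it); only the read position is at the end.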

So you need to move the pointer back to the start before Pandas reads it.

bio.seek(0)
df = pd.read_csv(bio)
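For reference, here is a self-contained sketch of the fix that needs no file on disk. It assumes the CSV bytes are already in memory (the name csv_bytes is hypothetical, standing in for the downloaded content). It also shows an alternative: passing the bytes to the BytesIO constructor, which leaves the pointer at position 0 so no seek is needed.

```python
from io import BytesIO

import pandas as pd

# Assumption: csv_bytes stands in for the downloaded file contents.
csv_bytes = b"a,b\n1,2\n3,4\n"

# Option 1: write into an empty buffer, then rewind before reading.
bio = BytesIO()
bio.write(csv_bytes)
bio.seek(0)
df_rewound = pd.read_csv(bio)

# Option 2: construct the buffer from the bytes; it starts at position 0.
df_direct = pd.read_csv(BytesIO(csv_bytes))

print(df_rewound)
```

Option 2 may be simpler when the whole payload is available as one bytes object; when a library writes into the buffer incrementally, seek(0) is the way to go.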
