Pandas: read excel nrows not working, and dtype does not preserve 0 padding

I am having difficulty getting the dtype and nrows parameters to work when reading an Excel file.

Take as an example this small table saved in Excel .xlsx format. The 'col1' numbers are padded with a leading 0.

col1    col2
01  a
02  b
03  c
04  d

First question: I want to read the entire table but preserve the padding. I tried using dtype to define the column as object or str, and I tried converters too (below). The dtype is converted to object, but the padding is not preserved. Is there any way to do this?

pd.read_excel(path, sheetname=0, dtype={'col1': object}, nrows=5)
pd.read_excel(path, sheetname=0, converters={'col1':lambda x: str(x)}, nrows=5)

Second question: I tried to pull a subset of the dataframe using nrows (below). However, this does not work at all and still pulls out the entire table.

pd.read_excel(path, sheetname=0, nrows=2)

In both cases, it works perfectly fine with pd.read_csv.

I am using pandas v0.20.3.

The formatting is lost because Excel's formatting only changes the way data is displayed, not how it is stored.

To change the way the data is stored, you need to either change the native format of the file or format the data the way you want.

In your case, you are converting it to a string; what you should do is convert it to a zero-padded string, for which there is a special function called str.zfill().
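As a minimal sketch of the zfill approach: the DataFrame below is a stand-in for what read_excel is assumed to return when the cells hold the numbers 1 to 4 and are only displayed as 01 to 04. The padding width of 2 is an assumption taken from the sample table.

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_excel(path): the stored
# values are plain numbers, so the leading zero is already gone.
df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})

# Convert to string first, then left-pad with zeros to a width of 2
# (the width is an assumption based on the sample table).
df["col1"] = df["col1"].astype(str).str.zfill(2)

print(df["col1"].tolist())  # ['01', '02', '03', '04']
```

The same padding could probably be applied while reading, by calling zfill inside the converters function, but that depends on how the cell values arrive from the Excel engine, so padding after the read is the safer route.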

The second part of your question is much simpler: the nrows argument for read_excel was added in pandas version 0.23.0.
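On an older pandas where nrows has no effect in read_excel, one possible workaround is to read the whole sheet and trim it afterwards. This is only a sketch: the DataFrame below stands in for the result of pd.read_excel(path), and the whole sheet is still loaded into memory before it is cut down.

```python
import pandas as pd

# Stand-in for full = pd.read_excel(path) on pandas < 0.23.0,
# where the nrows argument is not applied.
full = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})

# Keep only the first two data rows after the read.
subset = full.head(2)

# On pandas >= 0.23.0 the read itself can be limited instead:
#     pd.read_excel(path, nrows=2)

print(subset)
```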

If you format something in Excel, that doesn't mean that the value stored in the Excel file is actually '01'. Save it as a CSV and open it in Notepad. My guess is that you won't see '01' but '1'.

nrows is for pandas v0.23 and you're on v0.20.
