Why is pandas read_excel not reading my xls file correctly?

I am just trying to open an xls file with pandas using the following code:

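import pandas as pd

# read_excel already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed
df = pd.read_excel('15_6z_12N_11.xlsx', skiprows=3)
# skip_blank_lines is a read_csv option; read_excel does not accept it
#pd.read_excel('your_excel.xlsx', skip_blank_lines=False)

print(df)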

and the output is:

     Unnamed: 0  185  ...  Unnamed: 254  Unnamed: 255
0           NaN  NaN  ...           NaN           NaN
1           NaN  NaN  ...           NaN           NaN
2           NaN  NaN  ...           NaN           NaN
3           NaN  NaN  ...           NaN           NaN
4           NaN  NaN  ...           NaN           NaN
..          ...  ...  ...           ...           ...
993         NaN  NaN  ...           NaN           NaN
994         NaN  NaN  ...           NaN           NaN
995         NaN  NaN  ...           NaN           NaN
996         NaN  NaN  ...           NaN           NaN
997         NaN  NaN  ...           NaN           NaN

when my file actually contains the following data: (screenshot: data from xls)

Any idea why the output is incorrect? Thanks.

Here is the xls file (sorry, it is in Russian).

Try this:

df = pd.read_excel('15_6z_12N_11.xlsx', header=[0,1,2])  # Read the file, using the first 3 rows as the column header
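With header=[0,1,2] each column label becomes a 3-tuple (one entry per header row), so columns are selected with full tuples, a top-level label, or DataFrame.xs. A minimal sketch on a hand-built frame; the labels below are hypothetical stand-ins for the real Russian headers, not read from the asker's file:

```python
import pandas as pd

# Hypothetical 3-level column labels, shaped like the result of header=[0, 1, 2]
columns = pd.MultiIndex.from_tuples([
    ("Region", "Unnamed: 0_level_1", "Unnamed: 0_level_2"),
    ("Sales", "2020", "Q1"),
    ("Sales", "2020", "Q2"),
])
df = pd.DataFrame([["North", 10, 20], ["South", 30, 40]], columns=columns)

# A single column is addressed by its full tuple
q1 = df[("Sales", "2020", "Q1")]

# xs picks every column whose third header row (level 2) is "Q2" and drops that level
q2_block = df.xs("Q2", axis=1, level=2)

# A top-level label returns the sub-frame under it, keeping the lower two levels
sales = df["Sales"]

print(q1.tolist())             # [10, 30]
print(sales.columns.tolist())  # [('2020', 'Q1'), ('2020', 'Q2')]
```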

First create the DataFrame with the sheet name specified, skip the first 3 rows, and convert the next 3 rows into a MultiIndex header:

df = pd.read_excel('15_6z_12N_11.xls', sheet_name='PRINT', skiprows=3, header=[0,1,2])
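read_excel reads only the first sheet by default (sheet_name=0). If the data lives on another sheet, such as the PRINT sheet named in the call above, the default call can return a mostly empty frame; whether that is what happened here is an assumption about this workbook. A quick way to check is to load every sheet with sheet_name=None and count non-empty rows. The helper names below are hypothetical, not part of pandas:

```python
import pandas as pd
from typing import Dict


def non_empty_rows_per_sheet(sheets: Dict[str, pd.DataFrame]) -> Dict[str, int]:
    """Count the rows holding at least one value in each sheet."""
    return {name: int(frame.dropna(how="all").shape[0]) for name, frame in sheets.items()}


def inspect_workbook(path: str) -> Dict[str, int]:
    # sheet_name=None reads every sheet into a {sheet name: DataFrame} dict;
    # header=None keeps the raw rows so nothing is consumed as a header
    sheets = pd.read_excel(path, sheet_name=None, header=None)
    return non_empty_rows_per_sheet(sheets)
```

For example, print(inspect_workbook('15_6z_12N_11.xls')) would show which sheet actually holds the rows. Reading a legacy .xls file requires the xlrd package; if only the names are needed, pd.ExcelFile(path).sheet_names lists them without parsing the cells.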

And then the MultiIndex can be flattened by removing the Unnamed placeholder strings:

# str() avoids a TypeError when a header cell is a number or date rather than text
df.columns = ['_'.join(str(y) for y in x if 'Unnamed' not in str(y)) for x in df.columns.tolist()]
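The one-liner above can be unpacked into a small helper, sketched here on a hand-built MultiIndex (the labels and the helper name flatten_columns are hypothetical, not taken from the asker's workbook). It applies the same rule as the answer, dropping every level that contains "Unnamed", and converts each level with str() first so numeric or date header cells do not break the substring test:

```python
import pandas as pd
from typing import List


def flatten_columns(columns: pd.MultiIndex, sep: str = "_") -> List[str]:
    """Join each column tuple into one label, skipping 'Unnamed' placeholder levels."""
    flat = []
    for levels in columns.tolist():
        # str() guards against numeric or date header cells, for which
        # the test "Unnamed" in level would raise a TypeError
        kept = [str(level) for level in levels if "Unnamed" not in str(level)]
        flat.append(sep.join(kept))
    return flat


# Hypothetical labels imitating a 3-row Excel header; "Unnamed: N_level_M"
# entries stand in for blank header cells
columns = pd.MultiIndex.from_tuples([
    ("Region", "Unnamed: 0_level_1", "Unnamed: 0_level_2"),
    ("Sales", "2020", "Q1"),
    ("Sales", "2020", "Q2"),
])
df = pd.DataFrame([["North", 10, 20]], columns=columns)
df.columns = flatten_columns(df.columns)
print(df.columns.tolist())  # ['Region', 'Sales_2020_Q1', 'Sales_2020_Q2']
```

A column whose every level is Unnamed ends up as an empty string, and two columns can collapse to the same flattened name, so it is worth checking df.columns.duplicated() after flattening.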
