Why is pandas read_excel not reading my xls file correctly?
I am just trying to open an xls file with pandas, using the following code:
import pandas as pd
frame = pd.read_excel('15_6z_12N_11.xlsx', skiprows=3)
df = pd.DataFrame(frame)
#pd.read_excel('your_excel.xlsx', , skip_blank_lines=False)
print(df)
and the output is:
Unnamed: 0 185 ... Unnamed: 254 Unnamed: 255
0 NaN NaN ... NaN NaN
1 NaN NaN ... NaN NaN
2 NaN NaN ... NaN NaN
3 NaN NaN ... NaN NaN
4 NaN NaN ... NaN NaN
.. ... ... ... ... ...
993 NaN NaN ... NaN NaN
994 NaN NaN ... NaN NaN
995 NaN NaN ... NaN NaN
996 NaN NaN ... NaN NaN
997 NaN NaN ... NaN NaN
while my file contains the following data: Data from xls
Any idea why the output is incorrect? Thanks.
Here is the xls file. Sorry, it is in Russian.
Try this:
df = pd.read_excel('15_6z_12N_11.xlsx', header=[0,1,2]) #Read file, use 3 rows as header
First create the DataFrame, specifying the sheet name, skipping the first 3 rows, and converting the next 3 rows to a MultiIndex:
df = pd.read_excel('15_6z_12N_11.xls', sheet_name='PRINT', skiprows=3, header=[0,1,2])
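To see how skiprows and a multi-row header combine without needing the original workbook, here is a minimal sketch. It uses pd.read_csv on an in-memory string as a stand-in, on the assumption that these two parameters behave analogously in read_excel. The sample text and its column names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file layout: 3 junk rows, then a 3-row header, then data.
raw = "\n".join([
    "report title,,",
    "generated by instrument,,",
    ",,",
    "id,speed,speed",
    ",left,right",
    ",m/s,m/s",
    "1,2.0,3.0",
    "4,5.0,6.0",
])

# skiprows=3 drops the junk rows first; header=[0, 1, 2] then counts
# from the first remaining row and builds a three-level MultiIndex.
demo = pd.read_csv(io.StringIO(raw), skiprows=3, header=[0, 1, 2])

print(demo.columns.nlevels)
print(demo.shape)
```

Empty header cells come back as placeholder labels containing "Unnamed", which is why the column names need cleaning afterwards.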
Then it is possible to flatten the MultiIndex by removing the Unnamed strings:
df.columns = ['_'.join(str(y) for y in x if 'Unnamed' not in str(y)) for x in df.columns.tolist()]
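The flattening step can be tried on its own with a small hand-built frame. This is only a sketch: the column tuples below are hypothetical and merely mimic the kind of header read_excel returns when some header cells are empty, and flatten_columns is a helper name introduced here for illustration.

```python
import pandas as pd

# Hypothetical three-level header with "Unnamed" placeholders
# standing in for empty header cells.
columns = pd.MultiIndex.from_tuples([
    ("id", "Unnamed: 0_level_1", "Unnamed: 0_level_2"),
    ("speed", "left", "m/s"),
    ("speed", "right", "m/s"),
])
df = pd.DataFrame([[1, 2.0, 3.0], [4, 5.0, 6.0]], columns=columns)


def flatten_columns(frame):
    """Join the header levels with '_' and skip 'Unnamed' placeholders."""
    return [
        "_".join(str(level) for level in col if "Unnamed" not in str(level))
        for col in frame.columns.tolist()
    ]


flat = flatten_columns(df)
df.columns = flat
print(df.columns.tolist())
```

The str() calls are there in case a header cell holds a number rather than text, since the `in` check would otherwise fail on a non-string label.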