Harmonise pandas MultiIndex to string when reading Excel files
I am reading data from an Excel file into a pandas DataFrame using read_excel(). Unfortunately, it seems difficult to enforce consistent cell formatting in Excel, so it happens that a table like this:
2018 2019
a b a b
0 1.295666 -0.544973 0.845973 -0.874668
1 0.590123 0.284364 -1.482706 -0.859350
2 0.832228 0.469992 0.994865 0.480301
3 0.098671 0.198643 0.878323 -0.119761
...actually has surprising indices or columns:
df.columns
MultiIndex(levels=[[2018, 2019, '2019'], ['a', 'b']],
labels=[[0, 0, 1, 2], [0, 1, 0, 1]])
As you can see, the first level of the last column actually holds the string '2019' and not an integer like the others.
To be on the safe side, I would like to convert all indices to strings, but pandas won't let me:
df.columns.set_levels(df.columns.levels[0].astype(str), level=0)
ValueError: Level values must be unique: ['2018', '2019', '2019'] on level 0
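For readers without the Excel file, the situation can probably be reproduced along the following lines. This is only a sketch: the column tuples and the name `cols` are made up to mimic the output above, not taken from the real workbook.

```python
import pandas as pd

# Hypothetical column index: three integer years and one stray string '2019'.
cols = pd.MultiIndex.from_tuples(
    [(2018, "a"), (2018, "b"), (2019, "a"), ("2019", "b")]
)

# Level 0 holds three distinct values, because 2019 != '2019'.
print(cols.levels[0].tolist())

# Casting the level to str yields '2019' twice, which set_levels() rejects.
try:
    cols.set_levels(cols.levels[0].astype(str), level=0)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

The error arises because level values must stay unique, and after the cast the integer 2019 and the string '2019' collapse into the same label.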
I see two approaches to solve this:
1. Get read_excel() to convert the column headers to strings, or
2. get set_levels() to work as in my example above.
But I can't get either to work. Any hints?
You can re-create the MultiIndex for the columns:
idx=pd.MultiIndex.from_product([df.columns.levels[0].astype(int).unique(),df.columns.levels[1]])
df.columns=idx
df.columns
MultiIndex(levels=[[2018, 2019], ['a', 'b']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
From the OP, in a more compact form:
df.columns = pd.MultiIndex.from_product([c.astype(str).unique() for c in df.columns.levels])
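To try the one-liner without an Excel file, a small hand-built frame should do. The sketch below assumes made-up data and column tuples; only the `from_product()` line comes from the answer.

```python
import pandas as pd

# Hypothetical frame whose last column carries the string '2019'.
cols = pd.MultiIndex.from_tuples(
    [(2018, "a"), (2018, "b"), (2019, "a"), ("2019", "b")]
)
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=cols)

# Cast each level to str, drop the resulting duplicates,
# and rebuild the columns as the Cartesian product of the levels.
df.columns = pd.MultiIndex.from_product(
    [level.astype(str).unique() for level in df.columns.levels]
)

print(df.columns.tolist())
```

Note that this only relabels the columns positionally: it relies on the existing columns already being the full product of the levels, in sorted order.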
Update / Caveat
This solution can lead to some headaches. df.columns.codes (formerly known as df.columns.labels) does not necessarily come in increasing order from read_excel(); e.g. FrozenList([[3, 3, 2, 2, 1, 1, 0, 0], [1, 0, 1, 0, 1, 0, 1, 0]]) can occur. When using the .from_product() approach here, this will cause trouble and change the order of the column names. A workaround is to save the codes and write them back afterwards:
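# Remember the original codes (the position of each column within the levels).
old_col_codes = df.columns.codes
df.columns = pd.MultiIndex.from_product([c.astype(str).unique() for c in df.columns.levels])
# set_codes() returns a new index; the inplace argument was removed in pandas 2.0.
df.columns = df.columns.set_codes(old_col_codes)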
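A different route, not part of the original answer, is to convert the column tuples one by one with `MultiIndex.from_tuples()`, which leaves the column positions untouched and so should sidestep the ordering issue altogether. A minimal sketch, with a made-up frame whose columns come in descending order:

```python
import pandas as pd

# Hypothetical columns in descending order, one of them with a string year.
cols = pd.MultiIndex.from_tuples(
    [("2019", "b"), (2019, "a"), (2018, "b"), (2018, "a")]
)
df = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

# Convert every label of every column tuple to str; the order is preserved.
df.columns = pd.MultiIndex.from_tuples(
    [tuple(str(label) for label in col) for col in df.columns],
    names=list(df.columns.names),
)

print(df.columns.tolist())
print(df.columns.levels[0].tolist())
```

Because each tuple is rewritten in place, duplicate labels such as 2019 and '2019' merge into one level value without any codes having to be saved and restored.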