Python multi-level indexing using the pandas read_csv method

Question

I want to read the following table as a pandas DataFrame.

Say the DataFrame is df; the goal is that the query df['acct_id']['A']['0-3_mon'] should give me 10.

I have done this for panel data, where everything is a column and you then create a multi-level index for both the cross-section and the time series.

But here, the source data itself has more than two levels of columns. How do I read this CSV with a multi-level index? I am stuck here; any ideas?

Some similar work, if you want to take a look: https://lectures.quantecon.org/py/pandas_panel.html

Thanks a lot.

Answer 1

Create a DataFrame with a MultiIndex in the columns, because Panel is deprecated:

df = pd.read_csv(file, header=[0,1], index_col=[0])

And then select with slicers:

idx = pd.IndexSlice
print (df.loc[1, idx['A', '0-3_mon']])
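As a minimal sketch of what the slicer selection does, the snippet below builds a small frame from inline text and reads the same cell in three ways. The inline data mirrors the named sample in this answer; the variable names (csv_text, via_slicer, via_tuple, via_chain) are only illustrative. Note that pd.IndexSlice here simply produces the tuple ('A', '0-3_mon'), so passing the tuple directly should be equivalent.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file; the first column holds the row labels.
csv_text = """acct_id;A;A;B;B
level;0-3_mon;3-6_mon;0-3_mon;3-6_mon
1;10;12;14;18
2;11;15;17;19
3;13;16;21;20"""

# Two header rows become a two-level column MultiIndex.
df = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=[0], header=[0, 1])

idx = pd.IndexSlice

# Row label 1, column ('A', '0-3_mon'), selected three ways.
via_slicer = df.loc[1, idx["A", "0-3_mon"]]
via_tuple = df.loc[1, ("A", "0-3_mon")]
via_chain = df["A"]["0-3_mon"].loc[1]  # chained lookup, fine for reading only

print(via_slicer, via_tuple, via_chain)
```

The chained form is the closest to the df[...][...][...] style in the question, but acct_id ends up as the name of the top column level rather than as a key, so the lookup starts at 'A'.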

Sample with no MultiIndex names:

import io
import pandas as pd

temp = """A;A;B;B
0-3_mon;3-6_mon;0-3_mon;3-6_mon
1;10;12;14;18
2;11;15;17;19
3;13;16;21;20"""
# pd.compat.StringIO was removed from newer pandas versions; io.StringIO is the replacement
# after testing, replace 'io.StringIO(temp)' with 'filename.csv'
df = pd.read_csv(io.StringIO(temp), sep=";", header=[0,1])
print (df)
        A               B        
  0-3_mon 3-6_mon 0-3_mon 3-6_mon
1      10      12      14      18
2      11      15      17      19
3      13      16      21      20

print (df.columns)
MultiIndex(levels=[['A', 'B'], ['0-3_mon', '3-6_mon']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]])

idx = pd.IndexSlice
print (df.loc[1, idx['A', '0-3_mon']])
10

Sample with specified MultiIndex names:

import io
import pandas as pd

temp = """acct_id;A;A;B;B
level;0-3_mon;3-6_mon;0-3_mon;3-6_mon
1;10;12;14;18
2;11;15;17;19
3;13;16;21;20"""
# pd.compat.StringIO was removed from newer pandas versions; io.StringIO is the replacement
# after testing, replace 'io.StringIO(temp)' with 'filename.csv'
df = pd.read_csv(io.StringIO(temp), sep=";", index_col=[0], header=[0,1])
print (df)
acct_id       A               B        
level   0-3_mon 3-6_mon 0-3_mon 3-6_mon
1            10      12      14      18
2            11      15      17      19
3            13      16      21      20

print (df.columns)

MultiIndex(levels=[['A', 'B'], ['0-3_mon', '3-6_mon']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=['acct_id', 'level'])

idx = pd.IndexSlice
print (df.loc[1, idx['A', '0-3_mon']])
10
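Slicers are more useful when selecting along one level across all top-level groups. The sketch below, which assumes the same sample data as above and uses illustrative variable names (first_window, first_window_xs), picks every '0-3_mon' column. It relies on the column MultiIndex being sorted, which holds for this sample.

```python
import io

import pandas as pd

# Same sample data as in the named-MultiIndex example.
csv_text = """acct_id;A;A;B;B
level;0-3_mon;3-6_mon;0-3_mon;3-6_mon
1;10;12;14;18
2;11;15;17;19
3;13;16;21;20"""

df = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=[0], header=[0, 1])

idx = pd.IndexSlice

# All rows, every top-level group, only the '0-3_mon' sub-column.
# The result keeps both column levels.
first_window = df.loc[:, idx[:, "0-3_mon"]]

# Cross-section on the named level; this drops the 'level' level,
# leaving plain 'A' and 'B' columns.
first_window_xs = df.xs("0-3_mon", axis=1, level="level")

print(first_window)
print(first_window_xs)
```

Which form fits better depends on whether the second column level is still needed afterwards: the slicer keeps it, while xs removes it by default.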

Answer 1: score 2, accepted, 2018-08-29 12:32:06