Converting a single pandas index into a three-level MultiIndex in Python

I have some data in a pandas DataFrame which looks like this:

gene                                  VIM  
time:2|treatment:TGFb|dose:0.1  -0.158406  
time:2|treatment:TGFb|dose:1     0.039158  
time:2|treatment:TGFb|dose:10   -0.052608  
time:24|treatment:TGFb|dose:0.1  0.157153  
time:24|treatment:TGFb|dose:1    0.206030  
time:24|treatment:TGFb|dose:10   0.132580  
time:48|treatment:TGFb|dose:0.1 -0.144209  
time:48|treatment:TGFb|dose:1   -0.093910  
time:48|treatment:TGFb|dose:10  -0.166819  
time:6|treatment:TGFb|dose:0.1   0.097548  
time:6|treatment:TGFb|dose:1     0.026664  
time:6|treatment:TGFb|dose:10   -0.008032  

where the left-hand column is the index. This is just a subset of the data, which is actually much larger. The index is composed of three components: time, treatment and dose. I want to reorganize this data so that I can access it easily by slicing. The way to do this is to use a pandas MultiIndex, but I don't know how to convert my DataFrame with a single index into one with three levels. Does anybody know how to do this?

To clarify, the desired output here is the same data with a three-level index, the outer level being treatment, the middle dose and the inner time. This would be useful because I could then access the data with something like df['time']['dose'] or df[0] (or something to that effect at least).
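For anyone who wants to reproduce the problem, the sample shown above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the printed frame, and treating gene as the name of the column axis is an assumption based on how the header is printed.

```python
import pandas as pd

# Index labels and values copied from the sample printed above.
labels = [
    "time:2|treatment:TGFb|dose:0.1",
    "time:2|treatment:TGFb|dose:1",
    "time:2|treatment:TGFb|dose:10",
    "time:24|treatment:TGFb|dose:0.1",
    "time:24|treatment:TGFb|dose:1",
    "time:24|treatment:TGFb|dose:10",
    "time:48|treatment:TGFb|dose:0.1",
    "time:48|treatment:TGFb|dose:1",
    "time:48|treatment:TGFb|dose:10",
    "time:6|treatment:TGFb|dose:0.1",
    "time:6|treatment:TGFb|dose:1",
    "time:6|treatment:TGFb|dose:10",
]
values = [
    -0.158406, 0.039158, -0.052608,
    0.157153, 0.206030, 0.132580,
    -0.144209, -0.093910, -0.166819,
    0.097548, 0.026664, -0.008032,
]

df = pd.DataFrame({"VIM": values}, index=labels)
# Assumption: "gene" in the printed header is the name of the column axis.
df.columns.name = "gene"

print(df)
```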

You can first remove the unnecessary substrings with replace (the index has to be converted to a Series by to_series first, because replace doesn't work with an Index yet) and then use str.split with expand=True. Finally, set the index level names with rename_axis (new in pandas 0.18.0):

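# Strip the "time:", "treatment:" and "dose:" prefixes from every label
df.index = df.index.to_series().replace({'time:': '', 'treatment:': '', 'dose:': ''}, regex=True)
# Split each label on "|" and expand the three parts into a MultiIndex
df.index = df.index.str.split('|', expand=True)
# Name the three levels
df = df.rename_axis(('time', 'treatment', 'dose'))

print(df)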
                          VIM
time treatment dose          
2    TGFb      0.1  -0.158406
               1     0.039158
               10   -0.052608
24   TGFb      0.1   0.157153
               1     0.206030
               10    0.132580
48   TGFb      0.1  -0.144209
               1    -0.093910
               10   -0.166819
6    TGFb      0.1   0.097548
               1     0.026664
               10   -0.008032
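Note that the result above keeps the levels in the order time, treatment, dose, and every level still holds strings (which is why 6 sorts after 48). The question asked for treatment as the outer level, dose in the middle and time innermost. The following is a minimal sketch of one way to get there and then slice, not the only way: it uses a four-row subset of the sample, swaps in Index.str.replace for the to_series step, and the numeric conversion and the names by_dose, at_24h and value are illustrative assumptions.

```python
import pandas as pd

# Four rows of the sample data, enough to show the idea.
labels = [
    "time:2|treatment:TGFb|dose:0.1",
    "time:2|treatment:TGFb|dose:1",
    "time:24|treatment:TGFb|dose:0.1",
    "time:24|treatment:TGFb|dose:1",
]
df = pd.DataFrame({"VIM": [-0.158406, 0.039158, 0.157153, 0.206030]}, index=labels)

# Strip the "name:" prefixes, then split on "|" into a three-level MultiIndex.
cleaned = df.index.str.replace(r"(time|treatment|dose):", "", regex=True)
df.index = cleaned.str.split("|", expand=True)
df.index.names = ["time", "treatment", "dose"]

# Optional extra step: make time and dose numeric so they sort and compare
# as numbers, then rebuild the index in the order the question asked for.
df = df.reset_index()
df["time"] = df["time"].astype(int)
df["dose"] = df["dose"].astype(float)
df = df.set_index(["treatment", "dose", "time"]).sort_index()

# All time points for one treatment/dose pair (result is indexed by time).
by_dose = df.xs(("TGFb", 0.1), level=["treatment", "dose"])

# One time point across every treatment and dose.
at_24h = df.xs(24, level="time")

# A single value addressed by the full (treatment, dose, time) key.
value = df.loc[("TGFb", 1.0, 24), "VIM"]

print(by_dose)
print(at_24h)
print(value)
```

Sorting the index after set_index is worth keeping, since label-based slicing on a MultiIndex generally expects a sorted index.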
