Converting a single pandas index into a three-level MultiIndex in Python
I have some data in a pandas DataFrame which looks like this:
gene VIM
time:2|treatment:TGFb|dose:0.1 -0.158406
time:2|treatment:TGFb|dose:1 0.039158
time:2|treatment:TGFb|dose:10 -0.052608
time:24|treatment:TGFb|dose:0.1 0.157153
time:24|treatment:TGFb|dose:1 0.206030
time:24|treatment:TGFb|dose:10 0.132580
time:48|treatment:TGFb|dose:0.1 -0.144209
time:48|treatment:TGFb|dose:1 -0.093910
time:48|treatment:TGFb|dose:10 -0.166819
time:6|treatment:TGFb|dose:0.1 0.097548
time:6|treatment:TGFb|dose:1 0.026664
time:6|treatment:TGFb|dose:10 -0.008032
where the left column is the index. This is just a subsection of the data, which is actually much larger. The index is composed of three components: time, treatment and dose. I want to reorganize this data so that I can access it easily by slicing. The way to do this is to use a pandas MultiIndex, but I don't know how to convert my DataFrame with one index level into one with three. Does anybody know how to do this?
To clarify, the desired output is the same data with a three-level index: treatment as the outer level, dose as the middle level and time as the inner level. This would be useful because I could then access the data with something like
df['time']['dose']
or `df[0]` (or something to that effect at least).
You can first replace the unnecessary strings (the index has to be converted to a Series by to_series, because replace doesn't work with an index yet) and then use split. Finally, set the index level names with rename_axis (new in pandas 0.18.0):
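# Strip the "time:", "treatment:" and "dose:" prefixes from each index label
df.index = df.index.to_series().replace({'time:': '', 'treatment:': '', 'dose:': ''}, regex=True)
# Split each label on "|" into three MultiIndex levels
df.index = df.index.str.split('|', expand=True)
# Name the levels
df = df.rename_axis(['time', 'treatment', 'dose'])
print(df)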
                          VIM
time treatment dose
2    TGFb      0.1  -0.158406
               1     0.039158
               10   -0.052608
24   TGFb      0.1   0.157153
               1     0.206030
               10    0.132580
48   TGFb      0.1  -0.144209
               1    -0.093910
               10   -0.166819
6    TGFb      0.1   0.097548
               1     0.026664
               10   -0.008032
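Note that split leaves every level as a string, which is why time 6 sorts after 48 in the output. The question also asked for treatment as the outer level and time as the inner one. The following is a minimal sketch of one way to get there, not part of the original answer: it rebuilds a four-row subset of the data, converts time and dose to numbers, reorders the levels and then selects by level. The variable names (cleaned, levels, by_dose, at_24h, single) are made up for illustration, and it assumes a reasonably recent pandas where Index.str.replace accepts regex=True and MultiIndex.from_frame exists.

```python
import pandas as pd

# A few rows from the question, with the original single-level string index.
df = pd.DataFrame(
    {"VIM": [-0.158406, 0.039158, 0.157153, 0.206030]},
    index=[
        "time:2|treatment:TGFb|dose:0.1",
        "time:2|treatment:TGFb|dose:1",
        "time:24|treatment:TGFb|dose:0.1",
        "time:24|treatment:TGFb|dose:1",
    ],
)

# Strip the "name:" prefixes, then split on "|" into three levels.
cleaned = df.index.str.replace(r"(time|treatment|dose):", "", regex=True)
df.index = cleaned.str.split("|", expand=True)
df.index.names = ["time", "treatment", "dose"]

# The split leaves strings; convert time and dose to numbers so they
# sort and compare numerically rather than lexicographically.
levels = df.index.to_frame(index=False).astype({"time": int, "dose": float})
df.index = pd.MultiIndex.from_frame(levels)

# Put treatment outermost and time innermost, as requested in the question,
# and sort so that partial and slice-based lookups work.
df = df.reorder_levels(["treatment", "dose", "time"]).sort_index()

# All time points for one treatment/dose combination.
by_dose = df.loc[("TGFb", 0.1), :]

# All rows at time 24, whatever the treatment and dose.
at_24h = df.xs(24, level="time")

# A single value, addressed by all three levels.
single = df.loc[("TGFb", 0.1, 24), "VIM"]

print(df)
print(by_dose)
print(at_24h)
print(single)
```

Selecting with .loc and a tuple (or with xs for an inner level) takes the place of the chained df['time']['dose'] lookup sketched in the question; pd.IndexSlice is another option when slicing several levels at once.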