[英]How to select nested columns in a multi-indexed pandas dataframe
I created a 3D Pandas dataframe like this: 我创建了一个像这样的3D Pandas数据帧:
A= ['ECFP', 'ECFP', 'ECFP', 'FCFP', 'FCFP', 'FCFP', 'RDK5', 'RDK5', 'RDK5']
B = ['R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc']
C = array([[ 0.1 , 0.3 , 0.5 , nan, 0.6 , 0.4 ],
[ 0.4 , 0.3 , 0.3 , nan, 0.4 , 0.3 ],
[ 1.2 , 1.3 , 1.1 , nan, 1.5 , 1. ],
[ 0.4 , 0.3 , 0.4 , 0.8 , 0.1 , 0.2 ],
[ 0.2 , 0.3 , 0.3 , 0.3 , 0.5 , 0.6 ],
[ 1. , 1.2 , 1. , 0.9 , 1.2 , 1. ],
[ 0.4 , 0.7 , 0.5 , 0.4 , 0.6 , 0.6 ],
[ 0.6 , 0.5 , 0.3 , 0.3 , 0.3 , 0.5 ],
[ 1.2 , 1.5 , 1.3 , 0.97, 1.5 , 1. ]])
df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(zip(A,B)))
df = df.dropna(axis=0, how='any')
The final Dataframe looks like this: 最终的Dataframe如下所示:
ECFP FCFP RDK5
R tau RMSEc R tau RMSEc R tau RMSEc
0 0.1 0.4 1.2 0.4 0.2 1.0 0.4 0.6 1.2
1 0.3 0.3 1.3 0.3 0.3 1.2 0.7 0.5 1.5
2 0.5 0.3 1.1 0.4 0.3 1.0 0.5 0.3 1.3
4 0.6 0.4 1.5 0.1 0.5 1.2 0.6 0.3 1.5
5 0.4 0.3 1.0 0.2 0.6 1.0 0.6 0.5 1.0
How can I get the correlation matrix only between 'R' values for all types of data ('ECFP', 'FCFP', 'RDK5')? 如何才能在所有类型的数据('ECFP','FCFP','RDK5')的'R'值之间获得相关矩阵?
use IndexSlice : 使用IndexSlice :
In [53]: df.loc[:, pd.IndexSlice[:, 'R']]
Out[53]:
ECFP FCFP RDK5
R R R
0 0.1 0.4 0.4
1 0.3 0.3 0.7
2 0.5 0.4 0.5
4 0.6 0.1 0.6
5 0.4 0.2 0.6
By using slice
通过使用
slice
df.loc[:,(slice(None),'R')]
Out[375]:
ECFP FCFP RDK5
R R R
0 0.1 0.4 0.4
1 0.3 0.3 0.7
2 0.5 0.4 0.5
4 0.6 0.1 0.6
5 0.4 0.2 0.6
Both answers work, but first I must lexstort, otherwise I get this error: 这两个答案都有效,但首先我必须lexstort,否则我会收到此错误:
KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (1)'
The solution is: 解决方案是:
df.sortlevel(axis=1, inplace=True)
print "Correlation matrix of Pearson's R values among all feature vector types:"
df.loc[:, pd.IndexSlice[:, 'R']].corr()
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.