pandas: how to compute the correlation between one column and multiple other columns?
import pandas as pd
import numpy as np
df = pd.DataFrame({'group': ['a'] * 5 + ['b'] * 5, 'x1': np.random.normal(0, 1, 10), 'x2': np.random.normal(0, 1, 10), 'y': np.random.normal(0, 1, 10)})
df
Out[4]:
   group         x1         x2          y
0      a  -0.468746   1.254817  -1.629483
1      a  -1.849347  -2.776032   1.413563
2      a   1.186306   0.766866   0.163395
3      a  -0.314397  -0.531984   0.473665
4      a   0.278961   0.510429   1.484343
5      b   2.240489   0.856263   0.369464
6      b   2.029284   1.020894  -0.042139
7      b   1.571930  -0.415627   0.865577
8      b   0.609133   1.370543   0.450230
9      b  -1.820421  -0.211467   0.704480
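I want to compute, within each group, the correlation between y and certain specific (not all) columns of the same DataFrame, producing an output DataFrame like this: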
Out[5]:
          x1         x2
a  -0.168390  -0.622155
b  -0.467561  -0.771757
I tried a one-liner like:
df.groupby('group')[['x1', 'x2']].apply(...some function here that takes y as argument...)
However, I am having trouble writing the function so that it iterates over the specified columns (x1 and x2) while treating y as the fixed column.
Does anyone know an elegant one-liner that achieves this?
Use groupby + corrwith:
df.groupby('group').apply(lambda d: d.filter(like='x').corrwith(d.y))
              x1         x2
group
a       0.127141   0.434080
b      -0.892755   0.524215
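The same groupby + corrwith idea can be written with an explicit column list instead of `filter(like='x')`, which may be easier to adapt when the columns of interest do not share a naming pattern. The following is only a sketch: the helper name `corr_with_y`, the `cols` list, and the fixed random seed are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'group': ['a'] * 5 + ['b'] * 5,
    'x1': rng.normal(0, 1, 10),
    'x2': rng.normal(0, 1, 10),
    'y': rng.normal(0, 1, 10),
})

# Columns to correlate against y (hypothetical list; adjust as needed).
cols = ['x1', 'x2']

def corr_with_y(d, cols=cols, target='y'):
    """Pearson correlation of each column in `cols` with `target` within one group."""
    return d[cols].corrwith(d[target])

# Select only the needed columns before applying, so the grouping
# column itself is not passed into the helper.
result = df.groupby('group')[cols + ['y']].apply(corr_with_y)
print(result)
```

Each group yields a Series indexed by `x1` and `x2`, and `apply` stacks these into a DataFrame with one row per group, matching the shape requested in the question.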