Groupby + correlation between DataFrame and Series
I have a DataFrame `a` and a Series `b`. I want to find the correlation of each column of `a` with `b`, conditional on the value of `b`. Specifically, I'm using `pd.cut` to break `b` into 5 groups. But instead of standard quantiles, I'm binning by standard deviations of `b` above or below the mean.
import numpy as np
import pandas as pd

np.random.seed(123)
a = (pd.DataFrame(np.random.randn(1000, 3))
     .add_prefix('col'))
b = pd.Series(np.random.randn(1000))
mu, sigma = b.mean(), b.std()
breakpoints = mu + np.array([-2., -1., 1., 2.]) * sigma
breakpoints = np.append(np.insert(breakpoints, 0, -np.inf), np.inf)
# There are now 6 breakpoints to create 5 groupings:
# array([-inf, -1.91260048, -0.9230609, 1.05601827, 2.04555785, inf])
labels = ['[-inf,-2]', '(-2,-1]', '(-1,1]', '(1,2]', '(2,inf]']
groups = pd.cut(b, bins=breakpoints, labels=labels)
All is good up to here. I'm stuck on the final line, which combines `.corrwith` with `.groupby` and raises a `ValueError`:
a.groupby(groups).corrwith(b.groupby(groups))
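One possible workaround, sketched here assuming `a` and `b` share the same index as in the example above. `corrwith` appears to expect a plain Series or DataFrame for its `other` argument, not a `SeriesGroupBy` like `b.groupby(groups)`, so this version instead applies `DataFrame.corrwith` to each group of `a` and slices `b` to that group's rows by index label. The helper name `corr_by_group` is hypothetical.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
np.random.seed(123)
a = pd.DataFrame(np.random.randn(1000, 3)).add_prefix('col')
b = pd.Series(np.random.randn(1000))
mu, sigma = b.mean(), b.std()
breakpoints = np.concatenate([[-np.inf], mu + np.array([-2., -1., 1., 2.]) * sigma, [np.inf]])
labels = ['[-inf,-2]', '(-2,-1]', '(-1,1]', '(1,2]', '(2,inf]']
groups = pd.cut(b, bins=breakpoints, labels=labels)


def corr_by_group(frame, series, keys):
    """Hypothetical helper: correlate every column of frame with series within each group."""
    # For each group of rows, take the matching slice of series (by index label)
    # and correlate each column of the group with it. Each call returns a Series
    # indexed by frame's columns, so apply stacks them into groups x columns.
    out = frame.groupby(keys, observed=True).apply(
        lambda g: g.corrwith(series.loc[g.index])
    )
    # Transpose so the buckets become columns, matching the layout of a.corrwith(b).
    return out.T


corrs = corr_by_group(a, b, groups)
print(corrs)
```

Passing `observed=True` keeps only buckets that actually contain rows; with this data all five are populated, so the columns come out in the label order.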
Any ideas? The result of `a.corrwith(b)` is a Series, so I'd expect the result here to be a DataFrame with the groups/buckets as columns. For example, one column would be:
print(a[b < breakpoints[1]].corrwith(b[b < breakpoints[1]]))
# Correlation conditional on `b` being in [-inf, -2 stdev]
col0 0.43708
col1 -0.08440
col2 -0.02923
dtype: float64
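The single-column example generalizes to all buckets. A sketch reusing the question's setup: for each label, mask both `a` and `b` to the rows in that bucket, call `corrwith`, and collect the resulting Series into a DataFrame keyed by label. The mask `groups == label` follows `pd.cut`'s right-closed bins, so the first bucket is `b <= breakpoints[1]` rather than `b < breakpoints[1]`; with continuous data the two should essentially never differ.

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
np.random.seed(123)
a = pd.DataFrame(np.random.randn(1000, 3)).add_prefix('col')
b = pd.Series(np.random.randn(1000))
mu, sigma = b.mean(), b.std()
breakpoints = np.concatenate([[-np.inf], mu + np.array([-2., -1., 1., 2.]) * sigma, [np.inf]])
labels = ['[-inf,-2]', '(-2,-1]', '(-1,1]', '(1,2]', '(2,inf]']
groups = pd.cut(b, bins=breakpoints, labels=labels)

# One column per bucket: restrict both a and b to the rows in that bucket,
# then correlate each column of a with b, exactly like the single-column example.
corrs = pd.DataFrame(
    {label: a[groups == label].corrwith(b[groups == label]) for label in labels}
)
print(corrs)
```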
One solution that's functional but not pretty:
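# Attach b as an extra column so DataFrame.corr sees it alongside a's columns
full = a.join(b.to_frame(name='_drop'))
# Per-group correlation matrix; keep only each column of a vs. '_drop' (i.e. b)
corrs = (full.groupby(groups)
         .corr()
         .loc[(slice(None), a.columns), '_drop']
         .unstack()
         .T)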
print(corrs)
      [-inf,-2]   (-2,-1]   (-1,1]     (1,2]   (2,inf]
col0    0.43708   0.06716  0.02437   0.01695   0.05384
col1   -0.08440   0.04208  0.05529  -0.07146   0.14766
col2   -0.02923  -0.19672  0.01519  -0.02290  -0.17101
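If avoiding both the helper column and per-group Python calls matters, one alternative (a different technique from the one above, sketched here as an illustration only) is to compute Pearson's r from group-wise moments: within each bucket, r = (E[ab] - E[a]E[b]) / (sd(a) * sd(b)). Population moments are used; the n versus n - 1 factors cancel in the ratio. The helper `grouped_mean` is hypothetical, and the E[x^2] - E[x]^2 form can lose precision when the data have large means relative to their spread.

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
np.random.seed(123)
a = pd.DataFrame(np.random.randn(1000, 3)).add_prefix('col')
b = pd.Series(np.random.randn(1000))
mu, sigma = b.mean(), b.std()
breakpoints = np.concatenate([[-np.inf], mu + np.array([-2., -1., 1., 2.]) * sigma, [np.inf]])
labels = ['[-inf,-2]', '(-2,-1]', '(-1,1]', '(1,2]', '(2,inf]']
groups = pd.cut(b, bins=breakpoints, labels=labels)


def grouped_mean(obj):
    """Hypothetical helper: mean within each bucket of `groups`; buckets become the index."""
    return obj.groupby(groups, observed=True).mean()


# Group-wise first and second moments (population form).
mean_a = grouped_mean(a)                                   # rows: buckets, cols: col0..col2
mean_b = grouped_mean(b)                                   # index: buckets
cov_ab = grouped_mean(a.mul(b, axis=0)) - mean_a.mul(mean_b, axis=0)
var_a = grouped_mean(a ** 2) - mean_a ** 2
var_b = grouped_mean(b ** 2) - mean_b ** 2

# Pearson r = cov(a, b) / (sd(a) * sd(b)) for every bucket at once,
# transposed so the buckets are columns.
corrs = (cov_ab / np.sqrt(var_a).mul(np.sqrt(var_b), axis=0)).T
print(corrs)
```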