How to create a new column in a pandas DataFrame from a calculation over all rows except the one the result goes into
For example, let's say I have a DataFrame df with columns A.1 and A.2 like so:
A.1 A.2
2 8
3 2
5 1
And I want to calculate, say, the difference between the column means of all the other rows, like so:
A.1 A.2 B
2 8 (3+5)/2 - (2+1)/2
3 2 (2+5)/2-(8+1)/2
5 1 (2+3)/2-(8+2)/2
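The rule in the table can be written compactly as a difference of two leave-one-out means. The notation here is introduced only for illustration: a_i and b_i are the A.1 and A.2 values in row i, S_a and S_b are the column totals, and n is the number of rows.

```latex
B_i = \frac{S_a - a_i}{n-1} - \frac{S_b - b_i}{n-1}
    = \frac{(S_a - S_b) - (a_i - b_i)}{n-1},
\qquad S_a = \sum_{j=1}^{n} a_j,\quad S_b = \sum_{j=1}^{n} b_j
```

For the sample data, S_a = 10, S_b = 11 and n = 3, so the first row gives ((10 - 11) - (2 - 8))/2 = 2.5, matching (3+5)/2 - (2+1)/2.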
My code looks like this and doesn't work. How should I write it correctly?
df['B'] = mean(df['A.1'].drop(df['B'].index)))-mean(df['A.2'].drop(df['B'].index)))
I must avoid loops entirely and do this in an idiomatic pandas way, as I'm working with huge datasets.
Try:
df.apply(lambda r : df.loc[df.index!=r.name,'A.1'].mean() - df.loc[df.index!=r.name,'A.2'].mean(), axis = 1)
The result is:
0 2.5
1 -1.0
2 -2.5
dtype: float64
Note that r.name inside the lambda function is just the index of the current row.
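A self-contained sketch of the apply approach above, rebuilding the sample data from the question and storing the result as column B. One caveat worth keeping in mind given the "no loops" requirement: apply(axis=1) still calls the lambda once per row in Python, and each call scans the whole column, so the cost grows roughly quadratically with the number of rows.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A.1": [2, 3, 5], "A.2": [8, 2, 1]})

# For each row r, average every *other* row in each column and subtract.
# r.name is the index label of the row currently being processed.
df["B"] = df.apply(
    lambda r: df.loc[df.index != r.name, "A.1"].mean()
    - df.loc[df.index != r.name, "A.2"].mean(),
    axis=1,
)
print(df)
```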
Another approach, with no lambda at all:
(df['A.1'].sum()-df['A.1'])/(len(df)-1) - (df['A.2'].sum()-df['A.2'])/(len(df)-1)
The result is the same as above.
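A runnable sketch of the vectorized approach, assuming the same sample data and at least two rows (with a single row, n - 1 is zero). Each column's leave-one-out mean is its total minus the current row's value, divided by n - 1. Because both terms share the denominator, the same column can also be computed from the row-wise difference A.1 - A.2; the column name B_alt is hypothetical and only used to compare the two forms.

```python
import pandas as pd

df = pd.DataFrame({"A.1": [2, 3, 5], "A.2": [8, 2, 1]})
n = len(df)  # assumes n >= 2

# Leave-one-out mean of each column: (column total - this row's value) / (n - 1)
loo_a1 = (df["A.1"].sum() - df["A.1"]) / (n - 1)
loo_a2 = (df["A.2"].sum() - df["A.2"]) / (n - 1)
df["B"] = loo_a1 - loo_a2

# Equivalent form: take the row-wise difference first, then apply the same trick once
diff = df["A.1"] - df["A.2"]
df["B_alt"] = (diff.sum() - diff) / (n - 1)
print(df)
```

Both columns are computed with whole-column arithmetic, so the work is linear in the number of rows, which is what matters for the large datasets mentioned in the question.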