Pandas (Python): add a new column to a data frame that depends on its row values and an aggregated value from another data frame
I am new to Python and pandas, so my question may be a silly one.
Problem:
I have two data frames, say df1 and df2, where df1 looks like this:
treatment1 treatment2 value comparision test adjustment statsig p_value
0 Treatment Control 0.795953 Treatment:Control t-test Benjamini-Hochberg False 0.795953
1 Treatment2 Control 0.795953 Treatment2:Control t-test Benjamini-Hochberg False 0.795953
2 Treatment2 Treatment 0.795953 Treatment2:Treatment t-test Benjamini-Hochberg False 0.795953
and df2 looks like this:
group_type metric
0 Treatment 31.0
1 Treatment2 83.0
2 Treatment 51.0
3 Treatment 20.0
4 Control 41.0
.. ... ...
336 Treatment3 35.0
337 Treatment3 9.0
338 Treatment3 35.0
339 Treatment3 9.0
340 Treatment3 35.0
I want to add a column mean_percentage_lift to df1, where
lift_mean_percentage = (mean(treatment1)/mean(treatment2) -1) * 100
where `treatment1` and `treatment2` can be anything in `[Treatment, Control, Treatment2]`
My approach:
I am using the data frame's assign function:
df1.assign(mean_percentage_lift = lambda dataframe: lift_mean_percentage(df2, dataframe['treatment1'], dataframe['treatment2']))
where
def lift_mean_percentage(df, treatment1, treatment2):
    treatment1_data = df[df[group_type_col] == treatment1]
    treatment2_data = df[df[group_type_col] == treatment2]
    mean1 = treatment1_data['metric'].mean()
    mean2 = treatment2_data['metric'].mean()
    return (mean1 / mean2 - 1) * 100
But I am getting the error `Can only compare identically-labeled Series objects` for the line `treatment1_data = df[df[group_type_col] == treatment1]`. Am I doing something wrong, or is there an alternative way to do this?
For a data frame df2 like this:
group_type metric
0 Treatment 31.0
1 Treatment2 83.0
2 Treatment 51.0
3 Treatment 20.0
4 Control 41.0
5 Treatment3 35.0
6 Treatment3 9.0
7 Treatment 35.0
8 Treatment3 9.0
9 Control 5.0
You can try:
def lift_mean_percentage(df, T1, T2):
    # Mean of 'metric' over the rows belonging to each group
    treatment1 = df['metric'][df['group_type'] == T1].mean()
    treatment2 = df['metric'][df['group_type'] == T2].mean()
    return (treatment1 / treatment2 - 1) * 100
Running:
lift_mean_percentage(df2,'Treatment2','Control')
gives the result:
260.8695652173913
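The call above passes plain strings, which is why it works. In the question's assign call the lambda receives whole columns, so df2's group column is compared against a Series from df1 with a different index, and that comparison is what raises the error. One possible way to fill the column for every row of df1 is sketched below. It rebuilds the sample df2 from this answer and only the two treatment columns of df1 (the other columns are left out as a simplification), then shows a row-wise apply alongside a groupby/map variant that computes each group mean only once.

```python
import pandas as pd

# Sample data: df2 as in this answer, df1 reduced to the two columns needed here.
df2 = pd.DataFrame({
    "group_type": ["Treatment", "Treatment2", "Treatment", "Treatment", "Control",
                   "Treatment3", "Treatment3", "Treatment", "Treatment3", "Control"],
    "metric": [31.0, 83.0, 51.0, 20.0, 41.0, 35.0, 9.0, 35.0, 9.0, 5.0],
})
df1 = pd.DataFrame({
    "treatment1": ["Treatment", "Treatment2", "Treatment2"],
    "treatment2": ["Control", "Control", "Treatment"],
})


def lift_mean_percentage(df, T1, T2):
    # Same function as in the answer: T1 and T2 are single group names (strings).
    treatment1 = df["metric"][df["group_type"] == T1].mean()
    treatment2 = df["metric"][df["group_type"] == T2].mean()
    return (treatment1 / treatment2 - 1) * 100


# Option 1: call the function once per row, so it always receives scalars.
row_wise = df1.apply(
    lambda row: lift_mean_percentage(df2, row["treatment1"], row["treatment2"]),
    axis=1,
)

# Option 2: compute every group mean once, then look the means up per row.
group_means = df2.groupby("group_type")["metric"].mean()
mean1 = df1["treatment1"].map(group_means)
mean2 = df1["treatment2"].map(group_means)
df1 = df1.assign(mean_percentage_lift=(mean1 / mean2 - 1) * 100)

print(df1)
```

Both options should give the same values. The second avoids re-filtering df2 for every row, which may matter when df1 has many comparisons.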