Pandas (Python): add a new column to a data frame which depends on its row values and an aggregated value from another data frame

I am new to Python and pandas, so my question may be a silly one.

Problem:

I have two data frames, say df1 and df2, where

df1 is like

   treatment1 treatment2     value           comparision    test          adjustment  statsig   p_value
0   Treatment    Control  0.795953     Treatment:Control  t-test  Benjamini-Hochberg    False  0.795953
1  Treatment2    Control  0.795953    Treatment2:Control  t-test  Benjamini-Hochberg    False  0.795953
2  Treatment2  Treatment  0.795953  Treatment2:Treatment  t-test  Benjamini-Hochberg    False  0.795953

and df2 is like

     group_type  metric
0     Treatment    31.0
1    Treatment2    83.0
2     Treatment    51.0
3     Treatment    20.0
4       Control    41.0
..          ...     ...
336  Treatment3    35.0
337  Treatment3     9.0
338  Treatment3    35.0
339  Treatment3     9.0
340  Treatment3    35.0

I want to add a column mean_percentage_lift to df1, where

lift_mean_percentage = (mean(treatment1)/mean(treatment2) -1) * 100

where `treatment1` and `treatment2` can be anything in `[Treatment, Control, Treatment2]`

My approach:

I am using the assign function of the data frame.

df1.assign(mean_percentage_lift = lambda dataframe: lift_mean_percentage(df2, dataframe['treatment1'], dataframe['treatment2']))

where

def lift_mean_percentage(df, treatment1, treatment2):
    treatment1_data = df[df[group_type_col] == treatment1]
    treatment2_data = df[df[group_type_col] == treatment2]
    mean1 = treatment1_data['metric'].mean()
    mean2 = treatment2_data['metric'].mean()
    return (mean1/mean2 -1) * 100

But I am getting the error `Can only compare identically-labeled Series objects` for the line `treatment1_data = df[df[group_type_col] == treatment1]`. Is there something I am doing wrong, or is there an alternative to this?

For the data frame df2:

   group_type   metric
0   Treatment   31.0
1   Treatment2  83.0
2   Treatment   51.0
3   Treatment   20.0
4   Control     41.0
5   Treatment3  35.0
6   Treatment3  9.0
7   Treatment   35.0
8   Treatment3  9.0
9   Control     5.0

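You can try:

def lift_mean_percentage(df, T1, T2):
    # Mean of 'metric' over the rows belonging to each group
    treatment1 = df['metric'][df['group_type'] == T1].mean()
    treatment2 = df['metric'][df['group_type'] == T2].mean()
    # Percentage lift of T1's mean over T2's mean
    return (treatment1 / treatment2 - 1) * 100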

Running:

lift_mean_percentage(df2,'Treatment2','Control')

The result:

260.8695652173913
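
The function above works when it is called with two plain strings. The error in the question most likely arises because the lambda given to assign receives the whole of df1, so dataframe['treatment1'] is a three-row Series, and comparing it with the much longer group_type column of df2 fails. One way to fill the column for every row of df1 is to aggregate df2 once and look the means up per row. The following is a minimal sketch, assuming the ten-row df2 sample shown in this answer and only the treatment1 and treatment2 columns of df1; the names group_means, mean1 and mean2 are illustrative and not from the original post.

```python
import pandas as pd

# Stand-in frames built from the sample rows shown above (assumption:
# only the columns needed for the calculation are reproduced here).
df1 = pd.DataFrame({
    "treatment1": ["Treatment", "Treatment2", "Treatment2"],
    "treatment2": ["Control", "Control", "Treatment"],
})
df2 = pd.DataFrame({
    "group_type": ["Treatment", "Treatment2", "Treatment", "Treatment",
                   "Control", "Treatment3", "Treatment3", "Treatment",
                   "Treatment3", "Control"],
    "metric": [31.0, 83.0, 51.0, 20.0, 41.0, 35.0, 9.0, 35.0, 9.0, 5.0],
})

# Aggregate df2 once: one mean per group, indexed by group name.
group_means = df2.groupby("group_type")["metric"].mean()

# Look up the group mean for each row of df1; map keeps df1's index,
# so the two results line up row by row.
mean1 = df1["treatment1"].map(group_means)
mean2 = df1["treatment2"].map(group_means)

# assign returns a new frame, so the result has to be kept.
df1 = df1.assign(mean_percentage_lift=(mean1 / mean2 - 1) * 100)
print(df1)
```

If you prefer to keep the function from this answer, a row-wise call such as `df1.apply(lambda row: lift_mean_percentage(df2, row['treatment1'], row['treatment2']), axis=1)` should give the same values, at the cost of filtering df2 again for every row.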

