How to filter one value in a Pandas groupby object and use it to calculate a new column?

Question

I'm currently working with a DataFrame (df) like this:

import pandas as pd

df = pd.DataFrame({'fc_group': ['A', 'A', 'A','B', 'B', 'B', 'B', 'A', 'A', 'B','B'], 
                    'dt': ['2015-05-08', '2015-05-08', '2015-05-08', '2015-05-08', 
                           '2015-05-08', '2015-05-08', '2015-05-08', '2015-05-09', 
                           '2015-05-09', '2015-05-09', '2015-05-09'], 
                    'day': [0,1,2,0,1,2,3,1,2,0,1],
                    'value' : [50,150,200,60,170,220,378,140,240,700,1700]})

   fc_group          dt  day  value
0         A  2015-05-08    0     50
1         A  2015-05-08    1    150
2         A  2015-05-08    2    200
3         B  2015-05-08    0     60
4         B  2015-05-08    1    170
5         B  2015-05-08    2    220
6         B  2015-05-08    3    378
7         A  2015-05-09    1    140
8         A  2015-05-09    2    240
9         B  2015-05-09    0    700
10        B  2015-05-09    1   1700

I want to group this by "fc_group" and "dt" and create a new column named "new_column" that is calculated as

df['value'] / df[df['day'] == 0]['value']

or

np.nan if there is no day 0 row in a group.

The result should look like this (I've separated the resulting groups with blank lines):

   fc_group          dt  day  value  new_column
0         A  2015-05-08    0     50        1.00
1         A  2015-05-08    1    150        3.00
2         A  2015-05-08    2    200        4.00

3         B  2015-05-08    0     60        1.00
4         B  2015-05-08    1    170        2.83
5         B  2015-05-08    2    220        3.67
6         B  2015-05-08    3    378        6.30

7         A  2015-05-09    1    140        NaN
8         A  2015-05-09    2    240        NaN

9         B  2015-05-09    0    700        1.00
10        B  2015-05-09    1   1700        2.43

Is there a sleek, Pythonic way to achieve this? Either a custom function called by .apply or even a lambda function? I have tried several approaches, but none seem to work (e.g. with lambda functions I fail to get the one specific value of day 0; with custom functions and apply I get "incompatible index" errors).

The only working solution I found is to create a groupby object, then manually iterate over each group using a for-loop, perform the column creation, then recombine all subgroups. This is quite slow and seems highly inefficient. Thank you for your help :)

Answer 1

First select only the rows where day is 0, using eq with boolean indexing, then merge them back onto the original frame with a left join and divide with div:

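# keep only the day-0 rows; their 'value' becomes the per-group baseline 'new'
new = df[df['day'].eq(0)].rename(columns={'value':'new'})
# if a ('fc_group', 'dt') group can contain several day-0 rows, keep only the first one
#new = df[df['day'].eq(0)].drop_duplicates(subset=['fc_group','dt']).rename(columns={'value':'new'})
# note: merge returns a fresh 0..n-1 index, so this alignment assumes df has the default RangeIndex
df['new'] = df['value'].div(df.merge(new, how='left', on=['fc_group','dt'])['new'])
print(df)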
   fc_group          dt  day  value       new
0         A  2015-05-08    0     50  1.000000
1         A  2015-05-08    1    150  3.000000
2         A  2015-05-08    2    200  4.000000
3         B  2015-05-08    0     60  1.000000
4         B  2015-05-08    1    170  2.833333
5         B  2015-05-08    2    220  3.666667
6         B  2015-05-08    3    378  6.300000
7         A  2015-05-09    1    140       NaN
8         A  2015-05-09    2    240       NaN
9         B  2015-05-09    0    700  1.000000
10        B  2015-05-09    1   1700  2.428571
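
Since the question asks for a groupby-based route, here is a minimal sketch of an alternative that avoids the merge. It is not part of the answer above: it masks every non-day-0 value to NaN with where, then uses groupby with transform('first') to broadcast each group's day-0 value back to all of its rows. The names day0_only and baseline are made up for illustration, and the sketch assumes at most one meaningful day-0 row per group (if there are several, the first one is used).

```python
import pandas as pd

df = pd.DataFrame({
    'fc_group': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'B', 'B'],
    'dt': ['2015-05-08'] * 7 + ['2015-05-09'] * 4,
    'day': [0, 1, 2, 0, 1, 2, 3, 1, 2, 0, 1],
    'value': [50, 150, 200, 60, 170, 220, 378, 140, 240, 700, 1700],
})

# Keep 'value' only on day-0 rows; every other row becomes NaN.
day0_only = df['value'].where(df['day'].eq(0))

# Broadcast each group's first non-null day-0 value to all rows of that group.
# A group without a day-0 row contains only NaN, so its baseline stays NaN.
baseline = day0_only.groupby([df['fc_group'], df['dt']]).transform('first')

# Division by a NaN baseline yields NaN, which matches the requested behaviour.
df['new_column'] = df['value'].div(baseline)
print(df)
```

Because transform returns a Series aligned to the original index, this variant should also work when df does not have the default integer index.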

Answer 1: accepted, 2018-09-03 14:46:07
