How to filter in a Pandas groupby object for 1 value and use it to calculate a new column?
I'm currently working with a DataFrame (df) like this:
df = pd.DataFrame({'fc_group': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'B', 'B'],
                   'dt': ['2015-05-08', '2015-05-08', '2015-05-08', '2015-05-08',
                          '2015-05-08', '2015-05-08', '2015-05-08', '2015-05-09',
                          '2015-05-09', '2015-05-09', '2015-05-09'],
                   'day': [0, 1, 2, 0, 1, 2, 3, 1, 2, 0, 1],
                   'value': [50, 150, 200, 60, 170, 220, 378, 140, 240, 700, 1700]})
fc_group dt day value
0 A 2015-05-08 0 50
1 A 2015-05-08 1 150
2 A 2015-05-08 2 200
3 B 2015-05-08 0 60
4 B 2015-05-08 1 170
5 B 2015-05-08 2 220
6 B 2015-05-08 3 378
7 A 2015-05-09 1 140
8 A 2015-05-09 2 240
9 B 2015-05-09 0 700
10 B 2015-05-09 1 1700
I want to group this by "fc_group" and "dt" and create a new column named "new_column" that is calculated by
df['value'] / df[df['day'] == 0]['value']
or
np.nan if there is no day-0 row in the group.
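Concretely, for the first group (fc_group 'A' on '2015-05-08') the intended per-group computation looks like this (a minimal sketch using just that group's rows from the example data):

```python
import pandas as pd

# One group from the example data: fc_group 'A' on '2015-05-08'
g = pd.DataFrame({'day': [0, 1, 2], 'value': [50, 150, 200]})

# The divisor is the single 'value' on the day-0 row
base = g.loc[g['day'].eq(0), 'value'].iloc[0]

ratios = (g['value'] / base).tolist()
print(ratios)  # [1.0, 3.0, 4.0]
```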
The result should look like this (I've highlighted the resulting groups):
fc_group dt day value new_column
0 A 2015-05-08 0 50 1.00
1 A 2015-05-08 1 150 3.00
2 A 2015-05-08 2 200 4.00
3 B 2015-05-08 0 60 1.00
4 B 2015-05-08 1 170 2.83
5 B 2015-05-08 2 220 3.67
6 B 2015-05-08 3 378 6.30
7 A 2015-05-09 1 140 NaN
8 A 2015-05-09 2 240 NaN
9 B 2015-05-09 0 700 1.00
10 B 2015-05-09 1 1700 2.43
Is there a sleek, pythonic way to achieve this? Either a custom function called by .apply, or even a lambda function? I have tried several approaches, but none seem to work (e.g., with lambda functions I fail to get the one specific day-0 value, and with custom functions and .apply I get "incompatible index" errors).
The only working solution I found is to create a groupby object, manually iterate over each group with a for-loop, create the column there, and then recombine all the subgroups. This is quite slow and seems highly inefficient.
Thank you for your help :)
First filter only the day-0 rows by eq with boolean indexing, then merge with a left join, and divide by div:
new = df[df['day'].eq(0)].rename(columns={'value': 'new'})
# if there can be multiple day-0 rows per ('fc_group', 'dt') group, keep only the first:
# new = df[df['day'].eq(0)].drop_duplicates(subset=['fc_group','dt']).rename(columns={'value': 'new'})
df['new'] = df['value'].div(df.merge(new, how='left', on=['fc_group', 'dt'])['new'])
print(df)
fc_group dt day value new
0 A 2015-05-08 0 50 1.000000
1 A 2015-05-08 1 150 3.000000
2 A 2015-05-08 2 200 4.000000
3 B 2015-05-08 0 60 1.000000
4 B 2015-05-08 1 170 2.833333
5 B 2015-05-08 2 220 3.666667
6 B 2015-05-08 3 378 6.300000
7 A 2015-05-09 1 140 NaN
8 A 2015-05-09 2 240 NaN
9 B 2015-05-09 0 700 1.000000
10 B 2015-05-09 1 1700 2.428571
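As an alternative sketch (not part of the original answer), the merge can be avoided entirely: mask 'value' so only day-0 rows keep their value, then broadcast each group's first non-null value back to every row with transform('first'). Groups without a day-0 row are all-NaN, so they stay NaN, as required:

```python
import pandas as pd

df = pd.DataFrame({'fc_group': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A', 'B', 'B'],
                   'dt': ['2015-05-08', '2015-05-08', '2015-05-08', '2015-05-08',
                          '2015-05-08', '2015-05-08', '2015-05-08', '2015-05-09',
                          '2015-05-09', '2015-05-09', '2015-05-09'],
                   'day': [0, 1, 2, 0, 1, 2, 3, 1, 2, 0, 1],
                   'value': [50, 150, 200, 60, 170, 220, 378, 140, 240, 700, 1700]})

# Keep 'value' only on day-0 rows, NaN everywhere else
day0 = df['value'].where(df['day'].eq(0))

# Broadcast each group's first non-null day-0 value back to all of its rows;
# a group with no day-0 row is all-NaN, so the ratio stays NaN
df['new'] = df['value'] / day0.groupby([df['fc_group'], df['dt']]).transform('first')
```

This keeps everything index-aligned, so there is no reliance on merge preserving row order.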