
Applying a separate function to my DataFrame while also summing specific columns

I have to summarize a lot of data, grouping it by three columns. The complication is that one specific column also needs a specific formula applied to it.

My data looks like this:

Account Format  Network  Impressions Clicks Cost    Avg. position
Health1 Text     Search        2       0      0.5       1
Health1 Picture  Search        5       2      1        1.5
Health1 Picture  Search        1       2      3        2.4
Health1 Text     Search        1       0      0        2.3
Health1 Text     Display       2       0      0.7      1.7
Health2 Text     Display       0       0      0        3.3
Health2 Text     Display       2       2      4        3.3
Health2 Picture  Search        2       0      0        3.4
.....

So I need to group by Account, Format and Network and sum Impressions, Clicks and Cost for each group, like so:

Account Format  Network  Impressions Clicks Cost    Avg. position
Health1 Text     Search        3       0      0.5       x
Health1 Picture  Search        6       4      4         x
Health2 Text     Display       2       2      4         x
Health2 Picture  Search        2       0      0         x
.....

However, to calculate Avg. position I need to apply a formula. My brain is kind of fried from working on stuff like this all day, so any help would be a lifesaver. The Avg. position column needs to have this formula applied to it:

sum(Impressions * Avg. position) / sum(Impressions)
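
As a quick sanity check of the formula, here is a minimal sketch that works it out by hand for the two Health1 / Text / Search rows from the sample data above. The variable names are illustrative only and are not part of my actual script.

```python
# Rows for the group (Health1, Text, Search) from the sample data.
impressions = [2, 1]
avg_position = [1.0, 2.3]

# sum(Impressions * Avg. position) / sum(Impressions)
weighted_sum = sum(i * p for i, p in zip(impressions, avg_position))
total_impressions = sum(impressions)
weighted_avg = weighted_sum / total_impressions

print(round(weighted_avg, 4))  # 1.4333
```

The result lies between the two input positions (1 and 2.3), which is what an impressions-weighted average should do.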

My attempt was to multiply the Avg. position column by Impressions row by row, sum that column along with the others, and then divide by the summed Impressions. This is apparently not correct, as it returns values < 1, which is not a possible output in the context of the data I am using.

frame['Avg. position'] = frame.apply(lambda x: (x['Impressions']*x['Avg. position']), axis=1)
frame = frame.groupby(['Account', 'Format', 'Network'])[['Impressions', 'Clicks', 'Cost', 'Avg. position']].sum().reset_index()

frame['Avg. position'] = frame.apply(lambda x: (x['Avg. position']/x['Impressions']) if x['Impressions'] > 0 else '', axis=1)

frame.to_csv(yesterday_date+'.csv', index=False)

The correct way to use apply to update your "Avg. position" column is this:

denominator = frame['Impressions'].sum()
frame['Avg. position'] = frame[['Impressions', 'Avg. position']].apply(lambda x: x['Impressions'] * x['Avg. position'] / denominator, axis=1)

BUT ... with Series objects you can use element-by-element operations:

frame['Avg. position'] = frame['Impressions'] * frame['Avg. position'] / frame['Impressions'].sum()
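
To illustrate that the row-by-row and element-by-element forms agree, here is a small sketch on a throwaway frame. The three rows are borrowed from the question's sample data, and the names `via_apply` and `vectorized` are made up for this example.

```python
import pandas as pd

# Small illustrative frame; values taken from the sample data in the question.
frame = pd.DataFrame({
    "Impressions": [2, 5, 1],
    "Avg. position": [1.0, 1.5, 2.4],
})

# Total impressions over the whole frame (not per group).
denominator = frame["Impressions"].sum()

# Row-by-row version using apply.
via_apply = frame.apply(
    lambda row: row["Impressions"] * row["Avg. position"] / denominator, axis=1
)

# Element-by-element version using whole columns.
vectorized = frame["Impressions"] * frame["Avg. position"] / denominator

# Largest absolute difference between the two results.
print((via_apply - vectorized).abs().max())
```

Note that each row here holds only its share of the weighted average over the whole frame. Summing the resulting column gives the overall impressions-weighted position.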

which means you could also do something like this:

frame['Cost'] = frame['Cost'] / 1000000

I'm not exactly sure what you are trying to do with the groupby, but it seems like you should have all the tools to figure it out now.
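
If the goal is the per-group weighted average described in the question, one possible way to combine these pieces is sketched below. This is my reading of the question rather than a confirmed requirement. The helper column `weighted_pos` is a hypothetical name, and returning NaN for groups with zero impressions is an assumption.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
frame = pd.DataFrame({
    "Account": ["Health1", "Health1", "Health1", "Health1",
                "Health1", "Health2", "Health2", "Health2"],
    "Format": ["Text", "Picture", "Picture", "Text",
               "Text", "Text", "Text", "Picture"],
    "Network": ["Search", "Search", "Search", "Search",
                "Display", "Display", "Display", "Search"],
    "Impressions": [2, 5, 1, 1, 2, 0, 2, 2],
    "Clicks": [0, 2, 2, 0, 0, 0, 2, 0],
    "Cost": [0.5, 1, 3, 0, 0.7, 0, 4, 0],
    "Avg. position": [1, 1.5, 2.4, 2.3, 1.7, 3.3, 3.3, 3.4],
})

keys = ["Account", "Format", "Network"]

# Helper column (hypothetical name): impressions-weighted position per row.
frame["weighted_pos"] = frame["Impressions"] * frame["Avg. position"]

# Sum the plain metrics and the helper column within each group.
summary = (
    frame.groupby(keys, sort=False)[["Impressions", "Clicks", "Cost", "weighted_pos"]]
    .sum()
    .reset_index()
)

# Divide by the group's impressions; zero-impression groups become NaN
# (an assumption) instead of an empty string or a division error.
summary["Avg. position"] = summary["weighted_pos"] / summary["Impressions"].replace(0, np.nan)
summary = summary.drop(columns="weighted_pos")

print(summary)
```

Keeping the weighted values in a separate helper column leaves the original "Avg. position" values untouched until the final division, which makes the intermediate steps easier to inspect.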
