How to split field value based on percentage of total
I have the sum of transactions grouped by date_month, device and channel, like so:
date_month device channel transactions
2017-01-01 desktop AFFILIATES 413
2017-01-01 mobile AFFILIATES 501
2017-01-01 other AFFILIATES 22
2017-01-01 tablet AFFILIATES 250
2017-01-01 desktop DIRECT 13979
etc... etc... etc... etc...
The date_month range is from 2017-01-01 to the current date.
What I'm trying to do is split the device field's other value across mobile, desktop and tablet.
Example process:
1. Pivot device 'other' with its value transactions as an extra column (other_transactions)
2. Get the total transactions partitioned/grouped by date_month and channel (total_transactions)
3. Divide transactions by total_transactions to get the percent of total (percent_total)
4. Multiply other_transactions by percent_total to get other_split
5. Add other_split to transactions to get an updated transactions field
Getting the totals and applying simple math operations shouldn't be a problem. I would do something along the lines of df['total_transactions']=df.groupby(['date_month', 'channel'])['transactions'].transform('sum') to get total_transactions, but the issue I'm having is getting the other transactions into a separate column, like so:
date_month device channel transactions other_trans
2017-01-01 desktop AFFILIATES 413 22
2017-01-01 mobile AFFILIATES 501 22
2017-01-01 tablet AFFILIATES 250 22
2017-01-01 desktop DIRECT 13979 etc
etc... etc... etc... etc...
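One way to build that column with the same transform call is to blank out every value that is not an other row before summing per group. This is a minimal sketch, assuming the column names shown in the question and using only the sample rows listed there; other_trans is the name from the table above:

```python
import pandas as pd

# Sample rows from the question (2017-01-01 only)
df = pd.DataFrame({
    "date_month": ["2017-01-01"] * 5,
    "device": ["desktop", "mobile", "other", "tablet", "desktop"],
    "channel": ["AFFILIATES"] * 4 + ["DIRECT"],
    "transactions": [413, 501, 22, 250, 13979],
})

# Keep only the 'other' values (all other rows become NaN), sum them per
# (date_month, channel), and let transform broadcast that sum onto every row
is_other = df["device"].eq("other")
df["other_trans"] = (
    df["transactions"]
    .where(is_other)
    .groupby([df["date_month"], df["channel"]])
    .transform("sum")
)

# The 'other' rows can go once their value sits in its own column
df = df[~is_other].reset_index(drop=True)
print(df)
```

Groups without an other row get 0 here, because a groupby sum skips NaN by default.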
In the end, I would like a data frame that removes other from the device column and uses its transactions to increase the remaining devices' transactions in proportion to their share of transactions for that date_month and channel.
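Worked through by hand for the AFFILIATES rows of 2017-01-01 in the sample above, the five steps give the following. The total excludes the other row, since after step 1 other lives in its own column; values are rounded to three decimals:

```latex
\begin{aligned}
\text{transactions}' &= \text{transactions} + \text{other\_transactions}\cdot\frac{\text{transactions}}{\text{total\_transactions}}\\
\text{total\_transactions} &= 413 + 501 + 250 = 1164\\
\text{desktop: } 413 + 22\cdot\tfrac{413}{1164} &\approx 413 + 7.806 = 420.806\\
\text{mobile: } 501 + 22\cdot\tfrac{501}{1164} &\approx 501 + 9.469 = 510.469\\
\text{tablet: } 250 + 22\cdot\tfrac{250}{1164} &\approx 250 + 4.725 = 254.725
\end{aligned}
```

The three updated values sum to 1186, which is the original AFFILIATES total including other, so none of the other transactions are lost.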
IIUC, you can first create another dataframe of the other rows using groupby, drop those rows from the original, and then perform a merge:
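import pandas as pd
# Sample data: two (date_month, channel) groups, each with an 'other' row
df = pd.DataFrame({'date_month': {0: '2017-01-01', 1: '2017-01-01', 2: '2017-01-01', 3: '2017-01-01', 4: '2017-01-01', 5:"2017-01-01"},
'device': {0: 'desktop', 1: 'mobile', 2: 'other', 3: 'tablet', 4: 'desktop', 5:"other"},
'channel': {0: 'AFFILIATES', 1: 'AFFILIATES', 2: 'AFFILIATES', 3: 'AFFILIATES', 4: 'DIRECT', 5: 'DIRECT'},
'transactions': {0: 413, 1: 501, 2: 22, 3: 250, 4: 13979, 5: 234}})
# The 'other' rows, keeping only the merge keys and their transactions
other = df.groupby("device").get_group("other")[["date_month","channel","transactions"]]
# Drop the 'other' rows (exact match, so a device name that merely
# contains "other" is not dropped as well)
df = df[df["device"] != "other"]
# Attach each group's 'other' transactions as a new column; groups without
# an 'other' row get NaN because of the left join
df = df.merge(other, on=["date_month","channel"], how="left", suffixes=("","_other"))
print(df)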
Result:
date_month device channel transactions transactions_other
0 2017-01-01 desktop AFFILIATES 413 22
1 2017-01-01 mobile AFFILIATES 501 22
2 2017-01-01 tablet AFFILIATES 250 22
3 2017-01-01 desktop DIRECT 13979 234
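The merge covers step 1 of the question. A sketch of steps 2 to 5 on top of it follows, assuming the merged frame has the transactions_other column shown in the result above (the rows are copied from that result so the block runs on its own). Treating a missing other value as 0 is an assumption about the desired behavior for groups that had no other row:

```python
import pandas as pd

# Merged frame in the shape printed by the answer above
df = pd.DataFrame({
    "date_month": ["2017-01-01"] * 4,
    "device": ["desktop", "mobile", "tablet", "desktop"],
    "channel": ["AFFILIATES", "AFFILIATES", "AFFILIATES", "DIRECT"],
    "transactions": [413, 501, 250, 13979],
    "transactions_other": [22, 22, 22, 234],
})

keys = ["date_month", "channel"]
# Step 2: total of the remaining devices per (date_month, channel)
df["total_transactions"] = df.groupby(keys)["transactions"].transform("sum")
# Step 3: each device's share of that total
df["percent_total"] = df["transactions"] / df["total_transactions"]
# Step 4: the portion of 'other' allocated to this device
# (fillna(0) handles groups whose left merge produced NaN)
df["other_split"] = df["transactions_other"].fillna(0) * df["percent_total"]
# Step 5: updated transactions
df["transactions"] = df["transactions"] + df["other_split"]

result = df[["date_month", "device", "channel", "transactions"]]
print(result)
```

Because each share is computed only over the remaining devices, every group's updated transactions add up to its original total including other (1186 for AFFILIATES, 14213 for DIRECT here).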
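Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.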