I have the sum of transactions grouped by date_month, device and channel, like so:
date_month device channel transactions
2017-01-01 desktop AFFILIATES 413
2017-01-01 mobile AFFILIATES 501
2017-01-01 other AFFILIATES 22
2017-01-01 tablet AFFILIATES 250
2017-01-01 desktop DIRECT 13979
etc... etc... etc... etc...
The date_month range runs from 2017-01-01 to the current date.
What I'm trying to do is split the transactions of the device value 'other' across mobile, desktop and tablet.
Example process:
1. Take the 'other' rows and attach their transactions value as an extra column (other_transactions).
2. Sum transactions partitioned/grouped by date_month and channel (total_transactions).
3. Divide transactions by total_transactions to get the percent of total (percent_total).
4. Multiply other_transactions by percent_total to get other_split.
5. Add other_split to transactions to get an updated transactions field.
Getting the totals and applying simple math operations shouldn't be a problem. To get total_transactions, I would do something along the lines of
df['total_transactions'] = df.groupby(['date_month', 'channel'])['transactions'].transform('sum')
but the issue I'm having is getting the 'other' transactions into a separate column, like so:
date_month device channel transactions other_trans
2017-01-01 desktop AFFILIATES 413 22
2017-01-01 mobile AFFILIATES 501 22
2017-01-01 tablet AFFILIATES 250 22
2017-01-01 desktop DIRECT 13979 etc
etc... etc... etc... etc...
In the end, I would like a data frame that removes the 'other' rows from the device column and uses their transactions to increase the remaining devices' transactions, based on each device's share of transactions for that date_month and channel.
IIUC, you can first create another dataframe using groupby, drop the rows where device is 'other', and then perform a merge:
import pandas as pd
df = pd.DataFrame({'date_month': {0: '2017-01-01', 1: '2017-01-01', 2: '2017-01-01', 3: '2017-01-01', 4: '2017-01-01', 5:"2017-01-01"},
'device': {0: 'desktop', 1: 'mobile', 2: 'other', 3: 'tablet', 4: 'desktop', 5:"other"},
'channel': {0: 'AFFILIATES', 1: 'AFFILIATES', 2: 'AFFILIATES', 3: 'AFFILIATES', 4: 'DIRECT', 5: 'DIRECT'},
'transactions': {0: 413, 1: 501, 2: 22, 3: 250, 4: 13979, 5: 234}})
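# Collect the 'other' rows, keeping only the merge keys and the value
other = df.groupby("device").get_group("other")[["date_month", "channel", "transactions"]]
# Drop the 'other' rows; an exact comparison avoids the substring matching of str.contains
df = df[df["device"] != "other"]
# Left-merge so each remaining row gets its group's 'other' value as transactions_other
df = df.merge(other, on=["date_month", "channel"], how="left", suffixes=("", "_other"))
print(df)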
Result:
date_month device channel transactions transactions_other
0 2017-01-01 desktop AFFILIATES 413 22
1 2017-01-01 mobile AFFILIATES 501 22
2 2017-01-01 tablet AFFILIATES 250 22
3 2017-01-01 desktop DIRECT 13979 234
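From here, the remaining steps of the process described in the question could be sketched roughly as follows. This is only one possible reading of those steps, not a definitive implementation. The names out, keys, is_other and transactions_updated are introduced here for illustration. The sketch assumes there is at most one 'other' row per date_month and channel, that groups without an 'other' row should be left unchanged, and that total_transactions is meant to be computed after the 'other' rows are removed, so that the shares within each group sum to 1.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "date_month": ["2017-01-01"] * 6,
    "device": ["desktop", "mobile", "other", "tablet", "desktop", "other"],
    "channel": ["AFFILIATES"] * 4 + ["DIRECT"] * 2,
    "transactions": [413, 501, 22, 250, 13979, 234],
})

keys = ["date_month", "channel"]

# Separate the 'other' rows and attach their value as transactions_other.
is_other = df["device"] == "other"
other = df.loc[is_other, keys + ["transactions"]]
out = df[~is_other].merge(other, on=keys, how="left", suffixes=("", "_other"))

# Assumption: a group with no 'other' row has nothing to redistribute.
out["transactions_other"] = out["transactions_other"].fillna(0)

# Totals are taken after 'other' is removed, so shares sum to 1 per group.
out["total_transactions"] = out.groupby(keys)["transactions"].transform("sum")
out["percent_total"] = out["transactions"] / out["total_transactions"]

# Each device receives its proportional share of the group's 'other' value.
out["other_split"] = out["transactions_other"] * out["percent_total"]
out["transactions_updated"] = out["transactions"] + out["other_split"]

print(out[["date_month", "device", "channel", "transactions_updated"]])
```

With this reading, the updated column sums to the same overall total as the original transactions, because every 'other' transaction is handed to one of the remaining devices. For the AFFILIATES group, for example, desktop receives 22 × 413 / 1164 ≈ 7.81 extra transactions. A group whose remaining devices all have zero transactions would produce NaN shares and would need separate handling.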