I have Google Analytics data which I am trying to disaggregate.
Below is a simplified version of the dataframe I am dealing with:
date | users | goal_completions
20150101| 2 | 1
20150102| 3 | 2
I would like to disaggregate the data such that each "user" has its own row. In addition, the third column, "goal_completions" will also be disaggregated with the assumption that each user can only have 1 "goal_completion".
The output I am seeking will be something like this:
date | users | goal_completions
20150101| 1 | 1
20150101| 1 | 0
20150102| 1 | 1
20150102| 1 | 1
20150102| 1 | 0
I was able to duplicate each row based on the number of users on a given date, but I can't find a way to disaggregate the "goal_completions" column. Here is what I currently have after duplicating the rows by "users":
date | users | goal_completions
20150101| 1 | 1
20150101| 1 | 1
20150102| 1 | 2
20150102| 1 | 2
20150102| 1 | 2
Any help will be appreciated - thanks!
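IIUC, you can use repeat.
First repeat each row of your df once per user, then set users to 1 and rebuild goal_completions with cumcount and np.where: within each original row, the first goal_completions copies get 1 and the rest get 0.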
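import numpy as np

# repeat each row once per user (the index label is repeated as well)
df = df.reindex(df.index.repeat(df.users))
df = df.assign(users=1)
# the first goal_completions copies of each original row get 1, the rest 0
df.goal_completions = np.where(df.groupby(level=0).cumcount() < df.goal_completions, 1, 0)
df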
Out[609]:
date users goal_completions
0 20150101 1 1
0 20150101 1 0
1 20150102 1 1
1 20150102 1 1
1 20150102 1 0
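For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same repeat + cumcount idea. The function name disaggregate and the variable names are only illustrative (they are not from the question), the sample frame is rebuilt by hand with date stored as strings, and it assumes goal_completions never exceeds users on any row.

```python
import pandas as pd

# Sample data rebuilt from the question (dates kept as strings here).
df = pd.DataFrame({
    "date": ["20150101", "20150102"],
    "users": [2, 3],
    "goal_completions": [1, 2],
})


def disaggregate(frame):
    """Hypothetical helper: one row per user, at most one goal completion each."""
    # Repeat each row once per user; the original index label is repeated too.
    out = frame.loc[frame.index.repeat(frame["users"])].copy()
    # Running position of each copy within its original row: 0, 1, 2, ...
    pos = out.groupby(level=0).cumcount()
    # The first `goal_completions` copies get 1, the remaining copies get 0.
    # Compare as plain arrays so the duplicated index plays no role.
    flags = pos.to_numpy() < out["goal_completions"].to_numpy()
    out["goal_completions"] = flags.astype(int)
    out["users"] = 1
    return out.reset_index(drop=True)


out = disaggregate(df)
print(out)
```

Summing users and goal_completions per date on the result should give back the original aggregated rows, which is a quick way to check that nothing was lost.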