
How to efficiently disaggregate data?

I have Google Analytics data which I am trying to disaggregate.

Below is a simplified version of the dataframe I am dealing with:

date    | users | goal_completions
20150101|  2    | 1
20150102|  3    | 2

I would like to disaggregate the data so that each user has its own row. The third column, "goal_completions", should also be disaggregated, on the assumption that each user can have at most 1 goal completion.

The output I am seeking will be something like this:

date    | users | goal_completions
20150101|  1    | 1
20150101|  1    | 0
20150102|  1    | 1
20150102|  1    | 1
20150102|  1    | 0

I was able to duplicate each row based on the number of users on a given date, but I can't find a way to disaggregate the "goal_completions" column. Here is what I currently have after duplicating the rows and setting "users" to 1:

date    | users | goal_completions
20150101|  1    | 1
20150101|  1    | 1
20150102|  1    | 2
20150102|  1    | 2
20150102|  1    | 2

Any help will be appreciated - thanks!

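If I understand correctly: use repeat on the index to duplicate each row once per user, set users to 1, then use cumcount within each original row together with np.where, so that the first goal_completions copies of each row get 1 and the rest get 0.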

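import numpy as np

# Repeat each row once per user (the original index label is repeated too)
df = df.reindex(df.index.repeat(df.users))
df = df.assign(users=1)
# Within each original row, flag the first goal_completions copies with 1, the rest with 0
df.goal_completions = np.where(df.groupby(level=0).cumcount() < df.goal_completions, 1, 0)
df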
Out[609]: 
       date  users  goal_completions
0  20150101      1                 1
0  20150101      1                 0
1  20150102      1                 1
1  20150102      1                 1
1  20150102      1                 0
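
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same repeat-then-cumcount idea. The function name disaggregate and the round-trip check at the end are my own additions for illustration, not part of the answer above. It assumes the input frame has a unique index (such as the default RangeIndex) and that goal_completions never exceeds users.

```python
import pandas as pd


def disaggregate(df: pd.DataFrame) -> pd.DataFrame:
    """Expand each row into one row per user (hypothetical helper name)."""
    # Repeat each row `users` times; assumes the index labels are unique.
    out = df.loc[df.index.repeat(df["users"])].copy()
    # Position of each copy within its original row: 0, 1, 2, ...
    position = out.groupby(level=0).cumcount()
    # The first `goal_completions` copies get 1, the remaining copies get 0.
    out["goal_completions"] = (position < out["goal_completions"]).astype("int64")
    out["users"] = 1
    return out.reset_index(drop=True)


df = pd.DataFrame(
    {"date": [20150101, 20150102], "users": [2, 3], "goal_completions": [1, 2]}
)
result = disaggregate(df)
print(result)

# Round-trip check: summing per date should give back the original totals.
totals = result.groupby("date", as_index=False)[["users", "goal_completions"]].sum()
print(totals)
```

Note that if a row has more goal completions than users, the extra completions are silently dropped, so it may be worth checking (df.goal_completions <= df.users).all() first.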
