
How to Calculate Dropoff by Unique Field in Pandas DataFrame with Duplicates

import numpy as np
import pandas as pd
df = pd.DataFrame({
  'user' : ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
  'step_1' : [True, True, True, True, True, True, True],
  'step_2' : [True, False, False, True, False, True, True],
  'step_3' : [False, False, False, False, False, True, True]
})
print(df)
  user  step_1  step_2  step_3
0    A    True    True   False
1    A    True   False   False
2    B    True   False   False
3    B    True    True   False
4    B    True   False   False
5    C    True    True    True
6    C    True    True    True

I would like to calculate what fraction of users reach each step. Some users have multiple observations, and the row order is not reliable, so I cannot simply use df.drop_duplicates(subset=['user']).

In this case, the answer should be:

  • Step 1 = 1.00 (because A, B, and C all have a True in Step 1)
  • Step 2 = 1.00 (A, B, C)
  • Step 3 = 0.33 (C)

(I do not need to worry about any edge case in which a user goes from False in one step to True in a subsequent step within the same row.)

In your case, you can group by user, check whether any of the user's rows is True for each step, and take the mean:

df.groupby('user').any().mean()
Out[11]: 
step_1    1.000000
step_2    1.000000
step_3    0.333333
dtype: float64
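
To see why this works, it may help to split the one-liner into its intermediate steps. The sketch below rebuilds the question's DataFrame and uses illustrative variable names (reached, fraction, first_rows, naive) that are not part of the original answer. It also shows, for comparison, what happens if only each user's first row is kept.

```python
import pandas as pd

df = pd.DataFrame({
    'user':   ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'step_1': [True, True, True, True, True, True, True],
    'step_2': [True, False, False, True, False, True, True],
    'step_3': [False, False, False, False, False, True, True],
})

# One row per user: True if any of that user's rows is True for the step.
reached = df.groupby('user').any()

# The mean of a boolean column is the fraction of True values,
# i.e. the fraction of users who reached the step.
fraction = reached.mean()
print(fraction)

# For comparison: keeping only each user's first row drops B's
# later True in step_2, so the step_2 fraction comes out too low.
first_rows = df.drop_duplicates(subset=['user']).set_index('user')
naive = first_rows.mean()
print(naive)
```

Because any() collapses all of a user's rows into a single True/False per step, the result does not depend on the order of the rows.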

