
How to find the number of non-unique rows after groupby()

I have a DataFrame df with two columns, ID_owner and ID_phone. I need to find:

  1. How many people have more than n phones.
  2. Which phones are shared among several owners, i.e. each ID_phone that has more than one ID_owner.

In order to answer the first question, I have tried:

df.groupby('ID_owner')['ID_phone'].nunique().to_frame()

This doesn't seem to work, because I need to count the number of duplicate rows per ID_owner after the grouping. I ran into the same issue with the second question.

I would like to know whether pandas has a specific method or function for this kind of problem.

The output for the first question should be a DataFrame with two columns: one showing the ID_owner and the other showing the number of smartphones that ID_owner owns.

It looks like you were slicing your table prematurely, whereas you actually want to keep the aggregated table. For your first question, the following code works.

n = 2

df.groupby('ID_owner').agg({'ID_phone': pd.Series.nunique}).query('ID_phone > @n').shape[0]
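As a minimal sketch of the same idea on made-up data (the sample rows and the names phones_per_owner, owners_over_n and num_owners_over_n are assumptions for illustration, not from the question), the per-owner count can be kept as a Series and filtered afterwards:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "ID_owner": ["a", "a", "a", "b", "b", "c", "a"],
    "ID_phone": [1, 2, 3, 4, 4, 5, 1],
})

n = 2

# Distinct phones per owner; repeated owner/phone rows are counted once.
phones_per_owner = df.groupby("ID_owner")["ID_phone"].nunique()

# Keep only the owners with more than n distinct phones, then count them.
owners_over_n = phones_per_owner[phones_per_owner > n]
num_owners_over_n = len(owners_over_n)
```

Keeping owners_over_n around gives both the list of owners and, via its length, the number asked for in the first question.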

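To answer your second question, swap the two ID columns in the query above (group by ID_phone and count the unique ID_owner values), change n to the owner threshold you need (n = 1 for phones with more than one owner), and select the ID_phone values of the rows that remain; after grouping they are the index.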
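The reversed grouping could look roughly like this. It is only a sketch on invented data; n_owners, owners_per_phone and shared_phones are hypothetical names, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "ID_owner": ["a", "b", "b", "c", "a"],
    "ID_phone": [1, 1, 2, 3, 1],
})

n_owners = 1  # a phone is "shared" when it has more than one distinct owner

# Distinct owners per phone: the roles of the two ID columns are swapped.
owners_per_phone = df.groupby("ID_phone")["ID_owner"].nunique()

# After grouping, ID_phone is the index, so the shared phones are read from it.
shared_phones = owners_per_phone[owners_per_phone > n_owners].index.tolist()
```

Here phone 1 appears with owners a and b, so it is the only entry in shared_phones.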

df1.groupby('User_owner').agg({'zipcode': 'unique'}).reset_index()

Or you can do it this way:

df1.groupby('User_owner').apply(lambda x:x.zipcode.unique()).reset_index()

This gives the following output (the sample columns User_owner and zipcode play the roles of ID_owner and ID_phone):

    User_owner  zipcode
0   Dave        [34567]
1   Donald      [34353]
2   Jae         [12345]
3   Shankar     [23456, 22222]

To get the count instead, use the nunique function:

df1.groupby('User_owner').agg({'zipcode': 'nunique'}).reset_index().rename(columns={'zipcode': 'count'})

or

df1.groupby('User_owner').apply(lambda x:x.zipcode.nunique()).reset_index(name ='count')

which will result in

    User_owner  count
0   Dave        1
1   Donald      1
2   Jae         1
3   Shankar     2
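The input frame behind this output is not shown, so the rows below are an assumption reconstructed to match it. With that caveat, a sketch using pandas named aggregation (available since pandas 0.25) produces the count column directly, without a separate rename step:

```python
import pandas as pd

# Hypothetical input, reconstructed to be consistent with the output above.
df1 = pd.DataFrame({
    "User_owner": ["Dave", "Donald", "Jae", "Shankar", "Shankar", "Shankar"],
    "zipcode": [34567, 34353, 12345, 23456, 22222, 23456],
})

# Named aggregation: the output column "count" holds the number of
# distinct zipcode values per User_owner.
counts = (
    df1.groupby("User_owner")
       .agg(count=("zipcode", "nunique"))
       .reset_index()
)
```

The result is a two-column DataFrame (owner and count), which is the shape the question asks for.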


 