
Pandas groupby on a list of lists with at least one element in common

I am analyzing a CSV file that maps names to their lists of mobile numbers. [image: dataframe]

Now I want to group this dataset by 'phone_number', so that rows end up in the same group when at least one number in their lists matches.

For example, if Dr. ABC has phone_number=['1234','3456','7890'] in one sample and Dr. ABC has phone_number=['7676','1234','8765'] in another sample, these rows should be aggregated together because '1234' is common to both. Rows without any match should also be retained.

The required output is the list of rx_id values for each group after grouping over phone_number. Can this be done using pandas groupby(), or with some other trick? Thanks for the help!

IIUC, you can use explode and duplicated:

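import numpy as np
import pandas as pd

df = pd.DataFrame({"doctor_name": ["Dr. ABC", "Dr. ABC", "Dr. Who", "Dr. Strange"],
                   "phone_number": [['1234', '3456', '7890'], ['7676', '1234', '8765'], np.nan, ["8697059406"]]})

# one row per phone number; the original row index is repeated
df = df.explode("phone_number")

# number of exploded rows per doctor
s = df["doctor_name"].value_counts()

# keep numbers already seen in an earlier row, plus doctors that have a single row
# add .groupby("doctor_name").agg(list) if you want them back into a list
print(df[df.duplicated("phone_number") | df["doctor_name"].isin(s[s.eq(1)].index)])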

   doctor_name phone_number
1      Dr. ABC         1234
2      Dr. Who          NaN
3  Dr. Strange   8697059406
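
If the goal is the list of rx_id per group, where two rows belong together whenever their phone lists overlap (including indirectly, through a third row), one possible alternative is to treat this as a connected-components problem. The following is only a minimal sketch using a small union-find, not the method from the answer above. The helper name group_by_shared_numbers and the rx_id values are made up for illustration, and the sketch assumes the phone_number column already holds Python lists (a column read straight from CSV would likely need parsing first).

```python
import pandas as pd


def group_by_shared_numbers(df, list_col="phone_number"):
    """Return one group label per row; rows sharing any list element share a label.

    Hypothetical helper for illustration, not part of pandas.
    """
    parent = {}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_row_for = {}  # phone number -> first row position where it appeared
    for pos, numbers in enumerate(df[list_col]):
        parent[pos] = pos
        if not isinstance(numbers, list):
            continue  # missing value (NaN/None): the row keeps its own group
        for num in numbers:
            if num in first_row_for:
                union(first_row_for[num], pos)
            else:
                first_row_for[num] = pos

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(pos), len(labels)) for pos in range(len(df))]


# Sample data: rx_id values are assumed, the rest mirrors the answer above.
df = pd.DataFrame({
    "rx_id": [1, 2, 3, 4],
    "doctor_name": ["Dr. ABC", "Dr. ABC", "Dr. Who", "Dr. Strange"],
    "phone_number": [["1234", "3456", "7890"], ["7676", "1234", "8765"],
                     None, ["8697059406"]],
})

df["group"] = group_by_shared_numbers(df)
result = df.groupby("group")["rx_id"].agg(list)
print(result)
```

Here the two Dr. ABC rows end up in one group because both contain '1234', while Dr. Who (no numbers) and Dr. Strange (no match) each keep a group of their own, so no rows are dropped.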


 