I have a DataFrame of patient information that is keyed by patient/visit. I want to select all patient/visit rows for patients that have more than one visit. More generally, I'd like to be able to select rows based on any grouped and aggregated value of the data.
My current approach is to merge, but that is rather cumbersome:
dfg = dfmn.groupby(['pt_studyid']).size().to_frame("count").reset_index()
dfgu = dfg[dfg['count']>1]
dfmn_filt = dfgu.merge(dfmn, on=['pt_studyid']).drop(columns='count')
Is there a cleaner way?
Use the filter method of the DataFrameGroupBy object:
dfmn.groupby('pt_studyid').filter(lambda x: len(x) > 1)
Example
import pandas as pd
dfmn = pd.DataFrame(dict(pt_studyid=list('AAAABBBCDEFFF'), val=range(13)))
dfmn
    pt_studyid  val
0            A    0
1            A    1
2            A    2
3            A    3
4            B    4
5            B    5
6            B    6
7            C    7
8            D    8
9            E    9
10           F   10
11           F   11
12           F   12
Filter
print(dfmn.groupby('pt_studyid').filter(lambda x: len(x) > 1))
    pt_studyid  val
0            A    0
1            A    1
2            A    2
3            A    3
4            B    4
5            B    5
6            B    6
10           F   10
11           F   11
12           F   12
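For the more general case in the question (selecting on any grouped and aggregated value), one possible alternative is to broadcast the aggregate back onto the rows with transform and use it as a boolean mask. This is only a sketch on the same toy data; the variable names and the mean threshold of 5 are arbitrary choices for illustration, not part of the original answer. On large frames this often runs faster than filter with a Python lambda, since the aggregation is not called once per group from Python.

```python
import pandas as pd

# Same toy data as in the example above.
dfmn = pd.DataFrame(dict(pt_studyid=list('AAAABBBCDEFFF'), val=range(13)))

# Broadcast each group's row count onto every row of that group.
visits = dfmn.groupby('pt_studyid')['val'].transform('size')

# Use the broadcast value as a boolean mask on the original frame.
multi_visit = dfmn[visits > 1]    # same rows as filter(lambda x: len(x) > 1)
single_visit = dfmn[visits == 1]  # patients with exactly one visit

# Any other aggregate works the same way, e.g. the per-patient mean of val.
# The threshold 5 is arbitrary and only for illustration.
mean_val = dfmn.groupby('pt_studyid')['val'].transform('mean')
high_mean = dfmn[mean_val > 5]
```

Changing the comparison (for example visits == 1) or the aggregate name passed to transform is enough to express other selections, and the original index and row order are preserved.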