
Select rows from a DataFrame based on an aggregated value

I have a DataFrame of patient information that is keyed by patient/visit. I want to select all patient/visit data for patients that have only one visit. In general I'd like to be able to select data based on any grouped and aggregated value of that data.

My current approach is to build the per-patient counts and merge them back, which is rather cumbersome:

dfg = dfmn.groupby(['pt_studyid']).size().to_frame("count").reset_index()
dfgu = dfg[dfg['count']>1]
# drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts
dfmn_filt = dfgu.merge(dfmn, on=['pt_studyid']).drop(columns='count')

Is there a cleaner way?

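Use the filter method of the DataFrameGroupBy object. It calls the function once per group and keeps all rows of every group for which the function returns True: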

dfmn.groupby('pt_studyid').filter(lambda x: len(x) > 1)
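Because the function receives each group as a DataFrame and only has to return a single boolean, the same pattern covers other aggregated conditions. The sketch below is illustrative only: the `visits` frame and its `age` column are hypothetical and not part of the question. It also shows the `len(g) == 1` variant, since the question text asks for patients with only one visit while its code keeps counts greater than 1.

```python
import pandas as pd

# Hypothetical visit data; `age` is an illustrative column, not from the question.
visits = pd.DataFrame({
    "pt_studyid": ["A", "A", "B", "C", "C", "C"],
    "age": [30, 31, 52, 70, 71, 72],
})

# Keep every row of patients whose mean age across visits exceeds 40.
older = visits.groupby("pt_studyid").filter(lambda g: g["age"].mean() > 40)

# Keep only patients with exactly one visit.
single_visit = visits.groupby("pt_studyid").filter(lambda g: len(g) == 1)

print(older)
print(single_visit)
```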

Example

import pandas as pd

dfmn = pd.DataFrame(dict(pt_studyid=list('AAAABBBCDEFFF'), val=range(13)))
dfmn

   pt_studyid  val
0           A    0
1           A    1
2           A    2
3           A    3
4           B    4
5           B    5
6           B    6
7           C    7
8           D    8
9           E    9
10          F   10
11          F   11
12          F   12

Filter

print(dfmn.groupby('pt_studyid').filter(lambda x: len(x) > 1))

   pt_studyid  val
0           A    0
1           A    1
2           A    2
3           A    3
4           B    4
5           B    5
6           B    6
10          F   10
11          F   11
12          F   12
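As an alternative, here is a minimal sketch that avoids the Python-level lambda by building a boolean mask with `transform`. This technique is not in the answer above; it is offered as one possible variant that may be faster when there are many groups. `transform("size")` returns the group size aligned to the original index, so it can be compared directly and used for row selection.

```python
import pandas as pd

dfmn = pd.DataFrame(dict(pt_studyid=list("AAAABBBCDEFFF"), val=range(13)))

# Number of rows in each row's group, aligned to dfmn's index.
visit_count = dfmn.groupby("pt_studyid")["pt_studyid"].transform("size")

# Patients with more than one visit (same rows as the filter call above).
multi_visit = dfmn[visit_count > 1]

# Patients with exactly one visit.
single_visit = dfmn[visit_count == 1]

print(multi_visit)
print(single_visit)
```

Swapping `"size"` for another aggregation on a value column, for example `dfmn.groupby("pt_studyid")["val"].transform("mean")`, gives a mask based on that aggregate instead.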
