Select rows from a DataFrame based on an aggregated value

I have a DataFrame of patient information that is keyed by patient/visit. I want to select all patient/visit data for patients that have more than one visit. More generally, I'd like to be able to select rows based on any grouped and aggregated value of that data.

My current approach is to aggregate and then merge back, but that is rather cumbersome.

# Count rows per patient, keep patients with more than one visit, then merge back.
dfg = dfmn.groupby(['pt_studyid']).size().to_frame("count").reset_index()
dfgu = dfg[dfg['count'] > 1]
dfmn_filt = dfgu.merge(dfmn, on=['pt_studyid']).drop(columns='count')

Is there a cleaner way?

Use the filter method of the DataFrameGroupBy object. It calls the function once per group and keeps every row of each group for which the function returns True.

dfmn.groupby('pt_studyid').filter(lambda x: len(x) > 1)

Example

import pandas as pd
dfmn = pd.DataFrame(dict(pt_studyid=list('AAAABBBCDEFFF'), val=range(13)))
dfmn

   pt_studyid  val
0           A    0
1           A    1
2           A    2
3           A    3
4           B    4
5           B    5
6           B    6
7           C    7
8           D    8
9           E    9
10          F   10
11          F   11
12          F   12

Filter

print(dfmn.groupby('pt_studyid').filter(lambda x: len(x) > 1))

   pt_studyid  val
0           A    0
1           A    1
2           A    2
3           A    3
4           B    4
5           B    5
6           B    6
10          F   10
11          F   11
12          F   12
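
For the more general case of selecting on any grouped and aggregated value, one alternative (not part of the answer above, just a sketch) is to broadcast the aggregate back onto the rows with transform and use the result as a boolean mask. The variable names group_sizes, multi_visit, group_means and high_mean, and the threshold of 5, are made up for illustration; the sample frame is the same one used in the example.

```python
import pandas as pd

dfmn = pd.DataFrame(dict(pt_studyid=list('AAAABBBCDEFFF'), val=range(13)))

# Broadcast each group's row count back onto its rows, then mask on it.
group_sizes = dfmn.groupby('pt_studyid')['val'].transform('size')
multi_visit = dfmn[group_sizes > 1]

# The same pattern with another aggregate: keep groups whose mean val exceeds 5
# (the threshold is arbitrary and only for illustration).
group_means = dfmn.groupby('pt_studyid')['val'].transform('mean')
high_mean = dfmn[group_means > 5]

print(multi_visit)
print(high_mean)
```

On this sample, multi_visit should contain the same rows as the filter call. Because the mask is computed with built-in aggregations rather than a Python lambda per group, this form may also be faster on frames with many groups, though that depends on the data.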
