Select rows from a DataFrame based on an aggregated value
I have a DataFrame of patient information that is keyed by patient/visit. I want to select all patient/visit data for patients that have only one visit.
In general, I'd like to be able to select data based on any grouped and aggregated value of that data.
My current approach is to merge, but that is rather cumbersome.
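# Count the visits per patient
dfg = dfmn.groupby(['pt_studyid']).size().to_frame("count").reset_index()
# Keep the patients with more than one visit
dfgu = dfg[dfg['count'] > 1]
# Merge back to recover the full rows, then drop the helper column
dfmn_filt = dfgu.merge(dfmn, on=['pt_studyid']).drop(columns='count')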
Is there a cleaner way?
Use the filter method of the DataFrameGroupBy object:
dfmn.groupby('pt_studyid').filter(lambda x: len(x) > 1)
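The callable passed to filter receives each group's sub-DataFrame and must return a single boolean, so the same pattern should extend to other aggregated values. The sketch below is a minimal illustration on hypothetical sample data (the frame `df` and its values are assumptions, not the asker's data). It also shows the `len(g) == 1` variant, since the question text asks for patients with only one visit while the code keeps those with more than one.

```python
import pandas as pd

# Hypothetical sample data; names and values are assumptions for illustration.
df = pd.DataFrame({
    "pt_studyid": list("AABBBC"),
    "val": [1, 2, 10, 20, 30, 5],
})

# Keep every row of patients that have exactly one visit.
single_visit = df.groupby("pt_studyid").filter(lambda g: len(g) == 1)

# Keep every row of patients whose mean `val` is strictly greater than 5.
high_mean = df.groupby("pt_studyid").filter(lambda g: g["val"].mean() > 5)

print(single_visit)
print(high_mean)
```

Any expression that reduces a group to one True/False value can take the place of the lambda body.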
Example
import pandas as pd
dfmn = pd.DataFrame(dict(pt_studyid=list('AAAABBBCDEFFF'), val=range(13)))
dfmn
   pt_studyid  val
0           A    0
1           A    1
2           A    2
3           A    3
4           B    4
5           B    5
6           B    6
7           C    7
8           D    8
9           E    9
10          F   10
11          F   11
12          F   12
Filter
print(dfmn.groupby('pt_studyid').filter(lambda x: len(x) > 1))
   pt_studyid  val
0           A    0
1           A    1
2           A    2
3           A    3
4           B    4
5           B    5
6           B    6
10          F   10
11          F   11
12          F   12
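As a possible alternative to filter, the group size can be broadcast back onto every row with transform and then used as an ordinary boolean mask. This is a swapped-in technique rather than the method given in the answer; the names `visits`, `multi`, and `single` are introduced here for illustration only.

```python
import pandas as pd

dfmn = pd.DataFrame(dict(pt_studyid=list("AAAABBBCDEFFF"), val=range(13)))

# Broadcast each patient's visit count onto that patient's rows.
visits = dfmn.groupby("pt_studyid")["pt_studyid"].transform("size")

# Rows for patients with more than one visit (same rows as the filter call above).
multi = dfmn[visits > 1]

# Rows for patients with exactly one visit.
single = dfmn[visits == 1]

print(multi)
print(single)
```

Because the mask is a plain Series aligned with the original index, it can be combined with other conditions using `&` and `|`, which may be more convenient than nesting logic inside a lambda.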