繁体   English   中英

以更有效的方式过滤数据帧

[英]Filtering dataframe in more efficient way

我如何以更多的熊猫方式编写以下代码:

majority_df = df[(df.voting_majority_status_fk == 4) & (df.other == True)]
minority_df = df[(df.voting_majority_status_fk == 3)]

我需要只vp_fk是在majority_df不是在 minority_df然后发现唯一采取从majority_df唯一行vp_fk

我该如何用更多的Pandas方式写作。

majority_vp_fk = set(majority_df.vp_fk)
minority_vp_fk = set(minority_df.vp_fk)

clean_majority_vp_fk = majority_vp_fk - minority_vp_fk

clean_majority_df = majority_df[majority_df.vp_fk.isin(clean_majority_vp_fk)]
clean_majority_df = clean_majority_df.drop_duplicates(subset=['probe_fk', 'vp_fk', 'masking_box_fk', 'product_fk'])

这是我的“非常理论上的”(没有示例数据集很难测试)解决方案:

minority_df = df[(df.voting_majority_status_fk == 3)]
qry = "voting_majority_status_fk == 4 and other == True and vp_fk not in @minority_df.vp_fk"
result = (df.query(qry)
            .drop_duplicates(subset=['probe_fk', 'vp_fk', 'masking_box_fk', 'product_fk']))

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM