
How to select rows in a DataFrame based on a condition

I have an emails DataFrame, on which I ran this query:

williams = emails[emails["employee"] == "kean-s"]

This selects all the rows whose employee is kean-s. Then I count the frequency of each X-Folder value and print the ten most frequent ones. This is how it's done:

williams["X-Folder"].value_counts()[:10]

This gives output like this:

attachments                   2026
california                     682
heat wave                      244
ferc                           188
pr-crisis management            92
federal legislation             88
rto                             78
india                           75
california - working group      72
environmental issues            71
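The listing above is a pandas Series: the folder names are its index and the counts are its values. A minimal sketch on hypothetical toy data (not the real emails DataFrame) makes that split visible:

```python
import pandas as pd

# Hypothetical toy data; the real emails DataFrame is much larger
emails = pd.DataFrame({
    "employee": ["kean-s"] * 6 + ["lay-k"] * 2,
    "X-Folder": ["attachments", "attachments", "attachments",
                 "california", "california", "ferc",
                 "california", "inbox"],
})

williams = emails[emails["employee"] == "kean-s"]
top = williams["X-Folder"].value_counts()[:10]

print(top.index.tolist())  # ['attachments', 'california', 'ferc'] (the terms)
print(top.tolist())        # [3, 2, 1] (the counts)
print(top.values[0])       # 3, a count
print(top.index[0])        # attachments, the term for that count
```

This is why values[0] returns a number: .values holds the counts, while the terms live in .index.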

Now I need to print all the rows from emails whose X-Folder column is equal to attachments, california, heat wave, etc. How do I go about it? When I print values[0], it only returns the frequency count and not the term corresponding to it. (I tried printing it because, if I could loop through the terms, I would just put a condition inside the DataFrame.)
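The loop described in the question does work once it iterates over the index (the terms) instead of the values (the counts). A sketch on hypothetical toy data, OR-ing together one equality condition per folder; Series.isin, used in the answer below, does the same thing in a single vectorized call:

```python
import pandas as pd

# Hypothetical toy data standing in for the real emails DataFrame
emails = pd.DataFrame({
    "employee": ["kean-s"] * 6 + ["lay-k"] * 2,
    "X-Folder": ["attachments", "attachments", "attachments",
                 "california", "california", "ferc",
                 "california", "inbox"],
})
williams = emails[emails["employee"] == "kean-s"]
top = williams["X-Folder"].value_counts()[:10]

# Start with an all-False mask over every row of emails
mask = pd.Series(False, index=emails.index)

# Loop over the terms (the index), OR-ing in one equality test per folder
for folder in top.index:
    mask = mask | (emails["X-Folder"] == folder)

selected = emails[mask]
print(selected.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]
```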

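value_counts returns a Series whose index holds the terms and whose values hold the counts, which is why values[0] gives you a number. Use Series.isin with boolean indexing on the values of the index: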

df = williams[williams["X-Folder"].isin(williams["X-Folder"].value_counts()[:10].index)]
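The one-liner above does two steps: isin builds a boolean Series (True where the row's folder is among the top terms), and indexing the DataFrame with that Series keeps only the True rows. A sketch on hypothetical toy data; top_n is set to 2 instead of 10 only so that the small example actually drops a folder:

```python
import pandas as pd

# Hypothetical toy data standing in for the real emails DataFrame
emails = pd.DataFrame({
    "employee": ["kean-s"] * 6 + ["lay-k"] * 2,
    "X-Folder": ["attachments", "attachments", "attachments",
                 "california", "california", "ferc",
                 "california", "inbox"],
})
williams = emails[emails["employee"] == "kean-s"]

top_n = 2  # 10 in the question; 2 here so the toy data loses "ferc"
top_folders = williams["X-Folder"].value_counts()[:top_n].index

# Step 1: boolean Series, True where the folder is in top_folders
mask = williams["X-Folder"].isin(top_folders)

# Step 2: boolean indexing keeps only the rows where mask is True
df = williams[mask]
print(df["X-Folder"].tolist())
# ['attachments', 'attachments', 'attachments', 'california', 'california']
```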

Or:

df = williams[williams["X-Folder"].isin(williams["X-Folder"].value_counts().index[:10])]
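The two variants differ only in where the slice is applied: on the Series before taking its index, or on the index itself. Both yield the same terms in the same order; Series.head(n) is a third equivalent spelling. A quick check on a hypothetical toy column:

```python
import pandas as pd

# Hypothetical toy column standing in for williams["X-Folder"]
folders = pd.Series(["attachments"] * 3 + ["california"] * 2 + ["ferc"],
                    name="X-Folder")

counts = folders.value_counts()

a = counts[:2].index       # slice the Series, then take its index
b = counts.index[:2]       # take the index, then slice it
c = counts.head(2).index   # same result via head()

print(a.equals(b) and b.equals(c))  # True
```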

If you need to filter all rows in the original DataFrame (including rows whose employee is not kean-s), use:

df1 = emails[emails["X-Folder"].isin(williams["X-Folder"].value_counts().index[:10])]
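The difference between df and df1 is only which DataFrame the mask is applied to: the top folders are still computed from kean-s's emails, but df1 also keeps other employees' rows filed under those folders. A sketch on hypothetical toy data, where lay-k has one email in california (one of kean-s's top folders) and one in inbox (not one of them):

```python
import pandas as pd

# Hypothetical toy data standing in for the real emails DataFrame
emails = pd.DataFrame({
    "employee": ["kean-s"] * 6 + ["lay-k"] * 2,
    "X-Folder": ["attachments", "attachments", "attachments",
                 "california", "california", "ferc",
                 "california", "inbox"],
})
williams = emails[emails["employee"] == "kean-s"]
top_folders = williams["X-Folder"].value_counts().index[:10]

# Mask applied to williams: only kean-s rows can survive
df = williams[williams["X-Folder"].isin(top_folders)]

# Mask applied to emails: any employee's row in one of those folders survives
df1 = emails[emails["X-Folder"].isin(top_folders)]

print(df["employee"].unique().tolist())   # ['kean-s']
print(df1["employee"].unique().tolist())  # ['kean-s', 'lay-k']
print(len(df), len(df1))                  # 6 7
```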
