How to select rows in a dataframe based on a condition
I have an emails dataframe on which I run this query:
williams = emails[emails["employee"] == "kean-s"]
This selects all the rows where employee is kean-s. Then I count the frequencies and print the ten most common values. This is how it's done:
williams["X-Folder"].value_counts()[:10]
This gives output like this:
attachments 2026
california 682
heat wave 244
ferc 188
pr-crisis management 92
federal legislation 88
rto 78
india 75
california - working group 72
environmental issues 71
Now I need to print all the rows from emails whose X-Folder column equals attachments, california, heat wave, etc. How do I go about it? When I print values[0], it simply returns the frequency and not the term corresponding to it (I tried printing it because, if I could loop through the terms, I would just put a condition on the dataframe).
Use Series.isin with boolean indexing on the values of the index:
df = williams[williams["X-Folder"].isin(williams["X-Folder"].value_counts()[:10].index)]
Or:
df = williams[williams["X-Folder"].isin(williams["X-Folder"].value_counts().index[:10])]
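Printing the values returns only counts because value_counts() returns a Series whose index holds the folder names and whose values hold the frequencies. Here is a minimal sketch on made-up data (the toy rows are hypothetical, not taken from the real emails dataframe), keeping the top 2 folders instead of the top 10:

```python
import pandas as pd

# Hypothetical toy data standing in for the real emails dataframe.
emails = pd.DataFrame({
    "employee": ["kean-s"] * 6 + ["other-x"],
    "X-Folder": ["attachments", "attachments", "attachments",
                 "california", "california", "ferc", "attachments"],
})

williams = emails[emails["employee"] == "kean-s"]
counts = williams["X-Folder"].value_counts()

# The folder names are the index labels; the frequencies are the values.
top_folders = counts.index[:2]   # the two most frequent folder names
top_counts = counts.iloc[:2]     # their frequencies

# Boolean mask: True where the row's folder is one of the top folders.
df = williams[williams["X-Folder"].isin(top_folders)]

print(list(top_folders))  # ['attachments', 'california']
print(len(df))            # 5
```

Both forms in the answer give the same labels: slicing the counts first and then taking .index, or taking .index first and then slicing it.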
If you need to filter all rows in the original DataFrame (including rows whose employee is not kean-s), then use:
df1 = emails[emails["X-Folder"].isin(williams["X-Folder"].value_counts().index[:10])]
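The difference between the two filters can be seen in a small sketch on hypothetical data: the top folders are still computed from the kean-s rows only, but the mask is applied to the whole emails frame, so matching rows from other employees are kept as well.

```python
import pandas as pd

# Hypothetical toy data; "other-x" is a made-up second employee.
emails = pd.DataFrame({
    "employee": ["kean-s", "kean-s", "kean-s", "other-x", "other-x"],
    "X-Folder": ["attachments", "attachments", "ferc", "attachments", "india"],
})
williams = emails[emails["employee"] == "kean-s"]

# Top folder computed from kean-s rows only (top 1 here, top 10 in the question).
top = williams["X-Folder"].value_counts().index[:1]

df = williams[williams["X-Folder"].isin(top)]  # only kean-s rows
df1 = emails[emails["X-Folder"].isin(top)]     # rows from every employee

print(len(df), len(df1))  # 2 3
```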