Count the number of values that satisfy a condition in every column of a Pandas dataframe
I have a dataframe with several columns of data. In the data, a -1 is equivalent to missing data. I want to count the number of -1 values in each column.
I believe I could register -1 as a NaN/missing value when I load the data, and I saw an approach that used isna() and counted the resulting boolean values. However, what I want to do (apply a condition to each column) seems like a fundamental thing I should know how to do, so I would like to figure out how to do it this way.
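For comparison, the NaN-based route mentioned above might look like the sketch below. It assumes a small hand-built frame (the variable names are illustrative) and converts -1 to NaN after loading; at load time, the na_values argument of pandas.read_csv is another option.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; -1 stands for missing data.
df = pd.DataFrame({"A": [3, -1, -1], "B": [5, 3, -1]})

# Replace -1 with NaN, flag missing cells, then add up the True values per column.
missing_counts = df.replace(-1, np.nan).isna().sum()

print(missing_counts.tolist())  # [2, 1]
```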
Here is an example. Imagine I have the following data frame:
row A B C D E
1 3 5 6 9 -1
2 -1 3 -1 2 0
3 -1 -1 -1 1 -1
The output I would like to get would be:
A B C D E
2 1 2 0 2
I have tried the following:
df.apply(lambda x: x == -1).count() # value returned was the count of all the rows
(df == -1).count() # also returned a count of all the rows.
I looked through several questions related to "countif", but they all seemed to apply a condition to one column in order to select rows. The two things I tried above came from questions about applying functions to each column and counting values that match a condition in each column.
The suggested duplicate in the comments is looking for a single value for the entire dataframe, with different criteria on each column. I am looking to apply the same condition to every column and get a result per column, as shown in the selected answer below.
I would appreciate any thoughts or ideas on how to proceed.
Use DataFrame.eq + DataFrame.sum:
#You can omit to_frame and T if you don't want a DataFrame.
df.eq(-1).sum().to_frame().T
#(df==-1).sum() #similar
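The attempts in the question returned the row count because count() counts non-missing cells, and a boolean mask has no missing cells. sum() treats True as 1, so it counts the matches instead. A minimal sketch, assuming the example frame from the question is rebuilt by hand with the row labels as the index (the variable names are illustrative):

```python
import pandas as pd

# Example data from the question, rebuilt by hand (row labels 1..3 as the index).
df = pd.DataFrame(
    {
        "A": [3, -1, -1],
        "B": [5, 3, -1],
        "C": [6, -1, -1],
        "D": [9, 2, 1],
        "E": [-1, 0, -1],
    },
    index=[1, 2, 3],
)

mask = df.eq(-1)            # boolean DataFrame, True where the value is -1

non_missing = mask.count()  # counts non-NA cells, so every column gives 3
matches = mask.sum()        # True counts as 1, giving the number of -1 values

print(non_missing.tolist())  # [3, 3, 3, 3, 3]
print(matches.tolist())      # [2, 1, 2, 0, 2]
```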
Or, if the values are stored as strings (str):
df.eq('-1').sum().to_frame().T
If row is a column rather than the index:
df[df.columns[1:]].eq(-1).sum().to_frame().T
A B C D E
0 2 1 2 0 2
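The case where row is an ordinary column can be sketched end to end as follows. The frame is rebuilt by hand from the question's example. Dropping the column by name is an alternative shown here for illustration and is not part of the answer above.

```python
import pandas as pd

# Same example data, but with `row` stored as an ordinary first column.
df = pd.DataFrame(
    {
        "row": [1, 2, 3],
        "A": [3, -1, -1],
        "B": [5, 3, -1],
        "C": [6, -1, -1],
        "D": [9, 2, 1],
        "E": [-1, 0, -1],
    }
)

# Positional selection, as in the answer: skip the first column.
by_position = df[df.columns[1:]].eq(-1).sum()

# Alternative (an assumption, not from the answer): drop the column by name.
by_name = df.drop(columns="row").eq(-1).sum()

# Optional: turn the resulting Series into a one-row DataFrame.
one_row = by_position.to_frame().T

print(by_position.tolist())  # [2, 1, 2, 0, 2]
```

Selecting by name avoids depending on row being the first column, at the cost of hard-coding the column label.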