Calculating number of entries in pandas dataframe below 0

I have a pandas DataFrame with many columns, some of which are numerical and others categorical.

I want to calculate the number of negative entries in the DataFrame. One way is to find which columns are numeric, subset these columns, and then use simple syntax to count the entries with negative values, e.g. (df < 0).sum()
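The subset-then-count approach described above can be sketched roughly as follows. The frame and its column names (a, b, c) are made up for illustration, and pd.api.types.is_numeric_dtype is only one possible way to detect numeric columns. Note that (df < 0).sum() gives one count per column, so a second .sum() is needed to get a single total.

```python
import pandas as pd

# Hypothetical example frame: two numeric columns and one text column.
df = pd.DataFrame({
    "a": [1, -2, 3],
    "b": [-0.5, 0.0, -1.5],
    "c": ["x", "y", "z"],
})

# Keep only the columns whose dtype pandas considers numeric.
numeric_cols = [col for col in df.columns if pd.api.types.is_numeric_dtype(df[col])]

# Per-column counts of negative entries, then the grand total.
per_column = (df[numeric_cols] < 0).sum()
total = int(per_column.sum())

print(per_column)
print(total)  # 3
```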

Instead, I tried a syntax with apply and a lambda function that includes a conditional, but I get a message that my syntax is invalid. Could you please explain why, and how this idea could be implemented?

data.apply(lambda x: (if (x.dtype == 'int16' or x.dtype == 'float16'): (x<0).sum())).sum()
  File "<ipython-input-75-f329bf4e8cdd>", line 1
    data.apply(lambda x: (if (x.dtype == 'int16' or x.dtype == 'float16'): (x<0).sum())).sum()
                           ^
SyntaxError: invalid syntax

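A lambda body must be a single expression, and if ...: is a statement, which is why Python raises a SyntaxError. You can use a conditional expression (Python's ternary operator) here instead: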

data.apply(lambda x: (x < 0).sum() if (x.dtype in ('int16', 'float16')) else 0).sum()

We thus return 0 (the neutral element of the (ℕ, +, 0) monoid) for non-numerical columns, so they contribute nothing to the total.
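As a quick check of the conditional expression, here is a minimal sketch on a made-up frame (the column names and values are assumptions, not taken from the question). The numeric columns deliberately use int16 and float16 so that they match the dtype test.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose numeric columns use exactly the two dtypes tested below.
data = pd.DataFrame({
    "a": np.array([1, -2, 3], dtype="int16"),
    "b": np.array([-0.5, 0.0, -1.5], dtype="float16"),
    "c": pd.Series(["x", "y", "z"], dtype=object),
})

# The conditional expression yields a count for matching columns and 0 otherwise,
# so x < 0 is never evaluated on the text column.
per_column = data.apply(
    lambda x: (x < 0).sum() if x.dtype in ("int16", "float16") else 0
)
total = int(per_column.sum())

print(total)  # 3
```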

Note that there are more numerical types than just int16 and float16; you might want to use np.issubdtype(..., np.number) here:

import numpy as np

data.apply(lambda x: (x < 0).sum() if np.issubdtype(x.dtype, np.number) else 0).sum()
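To illustrate the difference, the sketch below compares the hard-coded dtype test with the np.issubdtype test on a made-up frame that contains an int64 column (names and values are assumptions). The hard-coded version silently skips that column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one int16 column, one int64 column, one object column.
data = pd.DataFrame({
    "a": np.array([1, -2, 3], dtype="int16"),
    "d": np.array([-1, -1, 4], dtype="int64"),
    "c": pd.Series(["x", "y", "z"], dtype=object),
})

# Only int16/float16 columns are counted, so the int64 column "d" is ignored.
hard_coded = data.apply(
    lambda x: (x < 0).sum() if x.dtype in ("int16", "float16") else 0
).sum()

# Any NumPy numeric dtype is counted.
generic = data.apply(
    lambda x: (x < 0).sum() if np.issubdtype(x.dtype, np.number) else 0
).sum()

print(int(hard_coded))  # 1
print(int(generic))     # 3
```

np.issubdtype is designed for NumPy dtypes. If the frame holds pandas extension dtypes (for example category), pd.api.types.is_numeric_dtype may be the safer check.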

I think a simpler solution is:

  • use select_dtypes to get the subset of columns of numeric type,
  • then use the NumPy count_nonzero function.

As this function counts non-zero values, we have to convert our DataFrame into an array of boolean values, where True values are counted as non-zero.

So to sum up, the whole code can be:

np.count_nonzero(df.select_dtypes(include=np.number) < 0)
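A small sanity check of this one-liner, on a made-up frame (column names and values are assumptions). The explicit .to_numpy() call is an addition here, used only to make the conversion to a boolean array visible. NaN compares as False against 0, so missing values are not counted.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value and a text column.
df = pd.DataFrame({
    "a": [1, -2, 3],
    "b": [-0.5, np.nan, -1.5],
    "c": ["x", "y", "z"],
})

# Boolean mask over the numeric columns only; True marks a negative entry.
mask = df.select_dtypes(include=np.number) < 0

# count_nonzero treats True as non-zero, so this counts the negatives.
total = int(np.count_nonzero(mask.to_numpy()))

print(total)  # 3
```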
