Count non-null values in each row with pandas
I have a dataframe:
site1 time1 site2 time2 site3 time3 site4 time4 site5 time5 ... time6 site7 time7 site8 time8 site9 time9 site10 time10 target
session_id
21669 56 2013-01-12 08:05:57 55.0 2013-01-12 08:05:57 NaN NaT NaN NaT NaN NaT ... NaT NaN NaT NaN NaT NaN NaT NaN NaT 0
54843 56 2013-01-12 08:37:23 55.0 2013-01-12 08:37:23 56.0 2013-01-12 09:07:07 55.0 2013-01-12 09:07:09 NaN NaT ... NaT NaN NaT NaN NaT NaN NaT NaN NaT 0
77292 946 2013-01-12 08:50:13 946.0 2013-01-12 08:50:14 951.0 2013-01-12 08:50:15 946.0 2013-01-12 08:50:15 946.0 2013-01-12 08:50:16 ... 2013-01-12 08:50:16 948.0 2013-01-12 08:50:16 784.0 2013-01-12 08:50:16 949.0 2013-01-12 08:50:17 946.0 2013-01-12 08:50:17 0
114021 945 2013-01-12 08:50:17 948.0 2013-01-12 08:50:17 949.0 2013-01-12 08:50:18 948.0 2013-01-12 08:50:18 945.0 2013-01-12 08:50:18 ... 2013-01-12 08:50:18 947.0 2013-01-12 08:50:19 945.0 2013-01-12 08:50:19 946.0 2013-01-12 08:50:19 946.0 2013-01-12 08:50:20 0
I need to count, for each row, the number of site columns whose value is not NaN. I tried:
df[['site%s' % i for i in range(1, 11)]].count(axis=1)
but it returns 10 for every id.
I have also tried
train_df[sites].notnull().count(axis=1)
and that didn't help either.
Desired output:
21669 2
54843 4
77292 10
114021 10
I'd do this with just count:
train_df[sites].count(axis=1)
count specifically counts non-null values. The issue with your current implementation is that notnull yields boolean values, and bools are certainly not null, meaning they are always counted.
df
one two three four five
a -0.166778 0.501113 -0.355322 bar False
b NaN NaN NaN NaN NaN
c -0.337890 0.580967 0.983801 bar False
d NaN NaN NaN NaN NaN
e 0.057802 0.761948 -0.712964 bar True
f -0.443160 -0.974602 1.047704 bar False
g NaN NaN NaN NaN NaN
h -0.717852 -1.053898 -0.019369 bar False
df.count(axis=1)
a 5
b 0
c 5
d 0
e 5
f 5
g 0
h 5
dtype: int64
And...
df.notnull().count(axis=1)
a 5
b 5
c 5
d 5
e 5
f 5
g 5
h 5
dtype: int64
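The same comparison can be sketched on a frame shaped like the one in the question. The data below is a hypothetical three-column miniature (the real frame has site1 through site10), and the names train_df and sites are taken from the question's snippets; this is only an illustration, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame: only three site columns.
sites = ["site1", "site2", "site3"]
train_df = pd.DataFrame(
    {
        "site1": [56, 56, 946],
        "site2": [55.0, 55.0, 946.0],
        "site3": [np.nan, 56.0, 951.0],
    },
    index=pd.Index([21669, 54843, 77292], name="session_id"),
)

# count(axis=1) skips NaN, so it gives the non-null count per row.
non_null_per_row = train_df[sites].count(axis=1)

# notnull() returns True/False for every cell; none of those are null,
# so counting them always yields the number of columns.
always_full = train_df[sites].notnull().count(axis=1)

print(non_null_per_row.tolist())  # [2, 3, 3]
print(always_full.tolist())       # [3, 3, 3]
```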
Alternatively, swapping count(axis=1) for sum(axis=1) should do the trick:
train_df[sites].notnull().sum(axis=1)
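One detail worth checking with the sum approach is the axis: summing the boolean mask with axis=1 counts non-null values per row, while the default sums down each column. A minimal sketch on made-up data (the frame and column values here are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data: each row has one more non-null value than the previous one.
df = pd.DataFrame(
    {
        "site1": [1.0, 2.0, 3.0],
        "site2": [np.nan, 5.0, 6.0],
        "site3": [np.nan, np.nan, 9.0],
    }
)

mask = df.notnull()           # True where a value is present

per_row = mask.sum(axis=1)    # True counts as 1, so this is non-null per row
per_column = mask.sum()       # default axis=0: non-null per column instead

print(per_row.tolist())       # [1, 2, 3]
print(per_column.tolist())    # [3, 2, 1]
```

On this data, per_row matches what df.count(axis=1) returns.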
A simple way to find the number of missing values per row is:
df.isnull().sum(axis=1)
To select the rows that have 3 or more null values:
df[df.isnull().sum(axis=1) >=3]
If you need to drop rows that have 3 or more null values, you can use:
df = df[df.isnull().sum(axis=1) < 3]
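The three snippets above can be tried together on a small frame. The data and the variable names (nulls_per_row, sparse_rows, kept) are invented for this sketch; the threshold of 3 follows the code above.

```python
import numpy as np
import pandas as pd

# Invented data: rows 1 and 2 each have three missing values.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [1.0, np.nan, 2.0, 4.0],
        "c": [1.0, np.nan, np.nan, np.nan],
        "d": [1.0, 2.0, np.nan, 4.0],
    }
)

# Missing values per row.
nulls_per_row = df.isnull().sum(axis=1)

# Rows with 3 or more missing values, and how many such rows there are.
sparse_rows = df[nulls_per_row >= 3]
n_sparse = int((nulls_per_row >= 3).sum())

# Keep only the rows with fewer than 3 missing values.
kept = df[nulls_per_row < 3]

print(nulls_per_row.tolist())     # [0, 3, 3, 1]
print(sparse_rows.index.tolist()) # [1, 2]
print(n_sparse)                   # 2
print(kept.index.tolist())        # [0, 3]
```

Computing nulls_per_row once and reusing it avoids rescanning the frame for each filter.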