How to get the count of rows from two dataframes based on some conditions
count_0 = (X_train['school_state']== 'nc' & y_train['project_is_approved'] == 0).apply(len)
X_train is one numpy array and y_train is another numpy array.
X_train has a column school_state, which contains 51 state names in total, one of which is 'nc', and y_train has a single column, project_is_approved, whose value is either 0 or 1.
I want to count the rows where the state name is 'nc' and project_is_approved is 0.
With the code above I get this error:
IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
Sample y_train: array([0, 1, 1, ..., 1, 1, 0], dtype=int64)
Sample X_train['school_state']:
47418 nc
49054 ca
35919 wi
34248 ca
15492 sd
31525 ks
36090 fl
43569 ny
9290 pa
12848 me
46189 la
33364 dc
You need to add parentheses around each comparison, and then you can use the sum method:
((X_train['school_state'] == 'nc') & (y_train == 0)).sum()
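As a rough illustration of why this works, here is a minimal sketch on made-up data. The five sample rows and the variable name mask are assumptions for the example, not the asker's real data. It assumes X_train is a pandas DataFrame and y_train is a plain numpy array of the same length, which is why y_train is compared directly instead of being indexed with 'project_is_approved' (indexing a numpy array with a string is what raises the IndexError).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's X_train and y_train.
X_train = pd.DataFrame(
    {"school_state": ["nc", "ca", "wi", "nc", "nc"]},
    index=[47418, 49054, 35919, 34248, 15492],
)
y_train = np.array([0, 1, 1, 0, 1])

# Each comparison gets its own parentheses because & binds tighter than ==.
# The Series and the array are combined element by element, by position.
mask = (X_train["school_state"] == "nc") & (y_train == 0)

# True counts as 1, so summing the boolean mask counts the matching rows.
count_0 = int(mask.sum())
print(count_0)  # 2
```

Note that the array is matched against the Series by position, not by index label, so this only gives a meaningful count if the rows of X_train and y_train are in the same order.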