How to get the count of rows from two dataframes based on some conditions

count_0 = (X_train['school_state']== 'nc' & y_train['project_is_approved'] == 0).apply(len)

X_train is one numpy array and y_train is another numpy array.

X_train has a column school_state containing 51 state names in total, one of which is 'nc'. y_train has a single column, i.e. project_is_approved, whose value is either 0 or 1.

I want to find the number of rows where the state name is 'nc' and project_is_approved is 0.

With the above code I am getting this error:

IndexError: only integers, slices ( : ), ellipsis ( ... ), numpy.newaxis ( None ) and integer or boolean arrays are valid indices

Sample y_train: array([0, 1, 1, ..., 1, 1, 0], dtype=int64)

Sample X_train['school_state']:

47418 nc
49054 ca
35919 wi
34248 ca
15492 sd
31525 ks
36090 fl
43569 ny
9290 pa
12848 me
46189 la
33364 dc

You need to add parentheses around each condition, and then you can use the sum method:

((X_train['school_state'] == 'nc') & (y_train == 0)).sum()
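To make the fix concrete, here is a minimal, self-contained sketch. The sample values are hypothetical stand-ins shaped like the data in the question, with X_train assumed to be a pandas DataFrame and y_train a plain NumPy array in the same row order. It also shows the likely source of the IndexError: a NumPy array has no named columns, so y_train['project_is_approved'] fails before any comparison is made.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question): a DataFrame with a
# non-default index and a NumPy array of labels in the same row order.
X_train = pd.DataFrame(
    {"school_state": ["nc", "ca", "wi", "nc", "nc", "ny"]},
    index=[47418, 49054, 35919, 34248, 15492, 31525],
)
y_train = np.array([0, 1, 1, 0, 1, 0], dtype=np.int64)

# y_train is a plain array, so it is compared directly; indexing it with a
# column name such as y_train['project_is_approved'] raises IndexError.
is_nc = X_train["school_state"] == "nc"   # boolean Series
is_rejected = y_train == 0                # boolean ndarray

# Each condition is evaluated on its own (the same job the parentheses do
# in the one-liner), because & binds more tightly than ==.
# Summing a boolean mask counts its True values.
count_0 = int((is_nc & is_rejected).sum())
print(count_0)
```

Combining a Series with an array this way is positional, so it assumes both hold the rows in the same order and have the same length.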
