How to sum the total of all categorical variables across several columns in Pandas
I have a dataset that looks something like this:
id AttA AttB AttC
1 Y Y
2 Y
I would like to create another column which has the total number of attributes for each case, as follows:
id AttA AttB AttC TotalAtts
1 Y Y 2
2 Y 1
It's not obvious to me how I should approach this problem, since I'm fairly new to Pandas.
Thanks in advance.
You could check which cells in the dataframe are not empty with ne(''), and take the sum setting axis to 1:
df['TotalAtts'] = df.ne('').sum(1)
AttA AttB AttC TotalAtts
0 Y Y 2
1 Y 1
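For a version you can paste and run, here is a minimal sketch. It assumes the blank cells are empty strings, and the placement of the Y values across AttA/AttB/AttC is a guess, since the table above does not show which columns they belong to. The att_cols list is a name introduced here for illustration; it keeps the id column out of the count in case id is a regular column rather than the index.

```python
import pandas as pd

# Hypothetical data: which attribute columns hold "Y" is assumed,
# and blank cells are assumed to be empty strings.
df = pd.DataFrame({
    "id": [1, 2],
    "AttA": ["Y", ""],
    "AttB": ["", "Y"],
    "AttC": ["Y", ""],
})

# Restrict to the attribute columns so the id column is not counted.
att_cols = ["AttA", "AttB", "AttC"]

# ne("") marks non-empty cells as True; summing across axis=1 counts them per row.
df["TotalAtts"] = df[att_cols].ne("").sum(axis=1)
print(df)
```

If id is already the index, df.ne('').sum(axis=1) on the whole frame gives the same counts.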
If you want the count of Y, you can do (df == 'Y').sum(1). If you want to count non-null values, then you can do df.count(1), but empty strings will be counted by this.
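To see how the three counts differ, here is a small sketch on made-up data that mixes "Y", another value, an empty string, and a missing value. The frame atts and its contents are assumptions for illustration only, not the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical attribute columns mixing "Y", "N", an empty string and NaN.
atts = pd.DataFrame({
    "AttA": ["Y", ""],
    "AttB": [np.nan, "Y"],
    "AttC": ["Y", "N"],
})

# Cells that are not the empty string (NaN is not equal to "", so it is counted).
not_empty = atts.ne("").sum(axis=1)

# Cells that are exactly "Y".
y_count = (atts == "Y").sum(axis=1)

# Cells that are not null (empty strings are counted, NaN is not).
non_null = atts.count(axis=1)

print(pd.DataFrame({"not_empty": not_empty, "y_count": y_count, "non_null": non_null}))
```

So if the blanks in your data might be a mix of NaN and empty strings, counting the exact value with == 'Y' is likely the safest of the three.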