
How to get a row-wise count of the columns where the value is not zero in a pandas DataFrame

I have the following data:

device_id   class   Channel A   Channel B   Channel C   Channel D   Channel E   Channel F   Channel G   Channel H   Channel I   Channel J
28          S           2           4           23          45          6           6           8           9           0           0
54          P           34          56          21          0           76          45          0           0           0           0
97          S           24          45          76          0           0           35          76          87          6           20
22          V           0           0           32          76          89          0           0           0           0           0

The channels fall into groups according to a mapping that I have defined in a dictionary, as below:

The dictionary:

di = {
'S' : ['Channel A','Channel B'],
'P' : ['Channel C','Channel D','Channel E'],
'V' : ['Channel F','Channel G','Channel H','Channel I','Channel J']
}

For each device (that is, row-wise), I need to count the number of channels being watched in each group, where a channel counts as watched if its value is not zero.

Expected output:

device_id   class   Channels_S  Channels_P  Channels_V
28           S          2           3           3
54           P          2           2           1
97           S          2           1           5
22           V          0           3           0

Can someone please guide me with this?

Here's a trick you can use:

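# True where a channel value is non-zero; the id columns move into the index
mask = df.set_index(['device_id','class']) != 0

# group the columns (axis=1) by their dictionary key and count the True values
d1 = mask.groupby({i:k for k,v in di.items() for i in v},axis=1).sum()

# prefix the group columns and turn the index back into regular columns
ndf = d1.add_prefix('Channel_').reset_index()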

   device_id class  Channel_P  Channel_S  Channel_V
0         28     S        3.0        2.0        3.0
1         54     P        2.0        2.0        1.0
2         97     S        1.0        2.0        5.0
3         22     V        3.0        0.0        0.0
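
For readers who want to run this end to end, here is a minimal self-contained sketch of the same idea. The sample data is typed in from the question. The names `col_to_group` and `counts` are illustrative, and the prefix is spelled `Channels_` to match the expected output. It also assumes a swap: the columns are grouped by transposing first rather than passing `axis=1` to `groupby`, because that argument is deprecated in recent pandas 2.x releases. The result is cast to `int` so the counts do not come out as floats.

```python
import pandas as pd

# Sample data copied from the question.
channels = [f"Channel {c}" for c in "ABCDEFGHIJ"]
rows = [
    [28, "S", 2, 4, 23, 45, 6, 6, 8, 9, 0, 0],
    [54, "P", 34, 56, 21, 0, 76, 45, 0, 0, 0, 0],
    [97, "S", 24, 45, 76, 0, 0, 35, 76, 87, 6, 20],
    [22, "V", 0, 0, 32, 76, 89, 0, 0, 0, 0, 0],
]
df = pd.DataFrame(rows, columns=["device_id", "class"] + channels)

di = {
    "S": ["Channel A", "Channel B"],
    "P": ["Channel C", "Channel D", "Channel E"],
    "V": ["Channel F", "Channel G", "Channel H", "Channel I", "Channel J"],
}

# Invert the dict: column name -> group key (illustrative name).
col_to_group = {col: key for key, cols in di.items() for col in cols}

# True where a channel value is non-zero; the id columns become the index.
mask = df.set_index(["device_id", "class"]) != 0

# Transpose so channels are rows, group them by key, sum, transpose back.
counts = mask.T.groupby(col_to_group).sum().T.astype(int)

ndf = counts.add_prefix("Channels_").reset_index()
print(ndf)
```

The group columns come back in sorted key order (P, S, V), as in the output above, because `groupby` sorts its keys by default.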

Explanation:

  1. mask is a boolean DataFrame: True where a value is not equal to zero, False everywhere else. device_id and class are set as the index so that they are not included in the comparison.

  2. Expand the lists inside the dict so that each column name maps to its key, which lets us group the columns by key: {i:k for k,v in di.items() for i in v}


    { 'Channel A': 'S', 'Channel B': 'S', 'Channel C': 'P',
      'Channel D': 'P', 'Channel E': 'P', 'Channel F': 'V',
      'Channel G': 'V', 'Channel H': 'V', 'Channel I': 'V',
      'Channel J': 'V' }
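  3. Group by that mapping along axis 1 (the columns) and then sum. Since True counts as 1, the sum is the number of non-zero channels in each group.

  4. Add a prefix to the resulting columns and reset the index.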
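
To make the grouping in the steps above more concrete, the same counts can also be computed one group at a time with a plain loop over the dictionary. This is only an equivalent sketch, not the method used in the answer. The name `out` and the `Channels_` prefix are illustrative choices, and the data is typed in from the question.

```python
import pandas as pd

# Sample data copied from the question.
channels = [f"Channel {c}" for c in "ABCDEFGHIJ"]
rows = [
    [28, "S", 2, 4, 23, 45, 6, 6, 8, 9, 0, 0],
    [54, "P", 34, 56, 21, 0, 76, 45, 0, 0, 0, 0],
    [97, "S", 24, 45, 76, 0, 0, 35, 76, 87, 6, 20],
    [22, "V", 0, 0, 32, 76, 89, 0, 0, 0, 0, 0],
]
df = pd.DataFrame(rows, columns=["device_id", "class"] + channels)

di = {
    "S": ["Channel A", "Channel B"],
    "P": ["Channel C", "Channel D", "Channel E"],
    "V": ["Channel F", "Channel G", "Channel H", "Channel I", "Channel J"],
}

# Start from the identifier columns, then add one count column per group.
out = df[["device_id", "class"]].copy()
for key, cols in di.items():
    # ne(0) marks the non-zero cells; summing along axis=1 counts them per row.
    out[f"Channels_{key}"] = df[cols].ne(0).sum(axis=1)

print(out)
```

Because the loop follows the dictionary's insertion order, the columns come out as S, P, V, the same order as in the expected output of the question.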

