How to count column values in a multi-index data frame based on a condition

Question

I use Python 3.6.1. I have a data frame like this:

        a  k     b     c   
          X1 X2 X1 X2 X1 X2
    0  AB  1  2  .  o      
    1  CD  2  1  .  o      
    2  EF  3  .        o  .
    3  GH  .  3  .  o  .  o

I would like to count the values that are neither blank ('') nor a dot ('.') for each column on the second level. When I use count() I get this:

But I need to have this:

Ideally I would get a data frame with a new row (just above or just below the data) containing the counts, like this:

        a  k     b     c   
          X1 X2 X1 X2 X1 X2
        4  3  3  0  3  1  1
    0  AB  1  2  .  o      
    1  CD  2  1  .  o      
    2  EF  3  .        o  .
    3  GH  .  3  .  o  .  o

Here is the code to create the initial data frame:

    import numpy
    import pandas
    X1 = pandas.DataFrame(data=[['AB',1,'.','o'],['CD',2,'.','o'],['EF',3,'.','o']],
                          columns=['a','k','b','c'])
    X2 = pandas.DataFrame(data=[['CD',1,'o','o'],['AB',2,'o','o'],['GH',3,'o','o']],
                          columns=['a','k','b','c'])
    myDF = pandas.concat([X1.set_index('a'), X2.set_index('a')],
                         axis='columns', keys=['X1','X2'])
    myDF = myDF.swaplevel(axis='columns')[X1.columns[1:]]
    myDF = myDF.reset_index(col_level=1, col_fill='a')
    myDF = myDF.fillna('.')
    kDF = myDF[['k']]
    operDF = myDF.drop('k', axis=1, level=0).set_index('a').stack(0)\
            .pipe(lambda d: d.mask(d.X1 == d.X2, '')).unstack()\
            .swaplevel(0,1,axis=1).sort_index(axis=1,level=0)\
            .reset_index()
    finDF = pandas.concat([kDF, operDF], axis=1)
    cols = list(finDF)
    cols[0], cols[1], cols[2] = cols[2], cols[0], cols[1]
    finDF = finDF.loc[:, cols]  # .ix has been removed from pandas; .loc selects by label
    finDF['a'] = finDF['a'].map(lambda x: x[0])

I would appreciate any hint ;)
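For context on the count() result mentioned above: count() only skips missing values (NaN/None), so the '' and '.' placeholders are counted like any other string. The sketch below illustrates this on a hypothetical three-row frame (`demo` and its values are made up for illustration, not the frame from the question), and shows one possible workaround: blank out the placeholders first so that count() ignores them.

```python
import pandas as pd

# Hypothetical miniature frame with two column levels (illustrative only).
columns = pd.MultiIndex.from_tuples(
    [("a", ""), ("k", "X1"), ("k", "X2"), ("b", "X1"), ("b", "X2")]
)
demo = pd.DataFrame(
    [["AB", 1, 2, ".", "o"],
     ["CD", 2, 1, ".", "o"],
     ["EF", 3, ".", "", ""]],
    columns=columns,
)

# count() treats '' and '.' as ordinary values: every column reports 3.
raw_counts = demo.count()

# Keep only cells that are neither '' nor '.'; where() turns the rest into NaN,
# which count() then skips.
keep = (demo != "") & (demo != ".")
clean_counts = demo.where(keep).count()

print(raw_counts)
print(clean_counts)
```

Whether to convert the placeholders to NaN or to count a boolean mask directly is a matter of taste; both give the same numbers here.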

Answer 1

A simple sum over a boolean mask is enough, i.e.

    count = ((finDF != '') & (finDF != '.')).sum()

Output:

    a        4
    k  X1    3
       X2    3
    b  X1    0
       X2    3
    c  X1    1
       X2    1
    dtype: int64
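The question also asks for the counts as an extra row above the data. One possible way to get there, sketched on a hypothetical small frame (`demo`, and the row label "count", are illustrative assumptions and not from the original post), is to turn the counts Series into a one-row frame and concatenate it on top:

```python
import pandas as pd

# Hypothetical miniature frame with two column levels (illustrative only).
columns = pd.MultiIndex.from_tuples(
    [("a", ""), ("k", "X1"), ("k", "X2"), ("b", "X1"), ("b", "X2")]
)
demo = pd.DataFrame(
    [["AB", 1, 2, ".", "o"],
     ["CD", 2, 1, ".", "o"],
     ["EF", 3, ".", "", ""]],
    columns=columns,
)

# Same masking idea as in the answer: True where the cell is a "real" value.
counts = ((demo != "") & (demo != ".")).sum()

# The Series is indexed by the frame's columns, so to_frame().T yields a
# single row with matching MultiIndex columns.
counts_row = counts.to_frame().T
counts_row.index = ["count"]  # assumed label for the new row

# Put the counts row above the data; swap the list order to put it below.
with_counts = pd.concat([counts_row, demo])
print(with_counts)
```

Note that mixing counts and data in one frame makes the affected columns object-typed, so this layout is better suited to display than to further computation.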

Solution 1 (3, accepted), 2017-11-20 13:54:43