
How to count column values in a MultiIndex data frame based on a condition

I use Python 3.6.1. I have a data frame like this:

        a  k     b     c   
          X1 X2 X1 X2 X1 X2
    0  AB  1  2  .  o      
    1  CD  2  1  .  o      
    2  EF  3  .        o  .
    3  GH  .  3  .  o  .  o

I would like to count, for each column on the second level, the values that are neither blank ('') nor a dot ('.'). When I use count(), I get this:

    a        4
    k  X1    4
       X2    4
    b  X1    4
       X2    4
    c  X1    4
       X2    4
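For context, count() only skips missing values (NaN/None), so placeholder strings such as '' and '.' are still counted, which is why every column reports 4. A minimal sketch of that behaviour, using a hypothetical toy frame (`toy` and its columns `x` and `y` are made up for illustration and are not part of the original data):

```python
import pandas as pd

# Hypothetical toy frame: '' and '.' are placeholder strings, not missing values.
toy = pd.DataFrame({'x': ['o', '.', ''], 'y': ['.', '.', 'o']})

# count() tallies every non-NaN cell, so the placeholders are included.
plain_counts = toy.count()

# Replacing the placeholders with NaN first makes count() skip them.
as_missing = toy.mask(toy.isin(['', '.']))
real_counts = as_missing.count()

print(plain_counts)
print(real_counts)
```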

But I need to have this:

    a        4
    k  X1    3
       X2    3
    b  X1    0
       X2    3
    c  X1    1
       X2    1

Ideally, I would get a data frame with a new row (just above or just below the data) containing the counts, like this:

        a  k     b     c   
          X1 X2 X1 X2 X1 X2
        4  3  3  0  3  1  1
    0  AB  1  2  .  o      
    1  CD  2  1  .  o      
    2  EF  3  .        o  .
    3  GH  .  3  .  o  .  o

Here is the code to create the initial data frame:

    import numpy
    import pandas
    X1 = pandas.DataFrame(data=[['AB',1,'.','o'],['CD',2,'.','o'],['EF',3,'.','o']],
                          columns=['a','k','b','c'])
    X2 = pandas.DataFrame(data=[['CD',1,'o','o'],['AB',2,'o','o'],['GH',3,'o','o']],
                          columns=['a','k','b','c'])
    myDF = pandas.concat([X1.set_index('a'), X2.set_index('a')],
                         axis='columns', keys=['X1','X2'])
    myDF = myDF.swaplevel(axis='columns')[X1.columns[1:]]
    myDF = myDF.reset_index(col_level=1, col_fill='a')
    myDF = myDF.fillna('.')
    kDF = myDF[['k']]
    operDF = myDF.drop('k', axis=1, level=0).set_index('a').stack(0)\
            .pipe(lambda d: d.mask(d.X1 == d.X2, '')).unstack()\
            .swaplevel(0,1,axis=1).sort_index(axis=1,level=0)\
            .reset_index()
    finDF = pandas.concat([kDF, operDF], axis=1)
    cols = list(finDF)
    cols[0], cols[1], cols[2] = cols[2], cols[0], cols[1]
    finDF = finDF.loc[:, cols]
    finDF['a'] = finDF['a'].map(lambda x: x[0])

I would appreciate any hint ;)

A simple sum over a boolean mask is enough, i.e.:

    count = ((finDF != '') & (finDF != '.')).sum()

Output:

    a        4
    k  X1    3
       X2    3
    b  X1    0
       X2    3
    c  X1    1
       X2    1
    dtype: int64
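The question also asks for the counts as an extra row above the data. One possible way to do that is to turn the counts Series into a one-row frame and concatenate it with the data. The sketch below assumes the frame looks like the table in the question; `df` is a hand-built stand-in for `finDF`, and the row label `'count'` is an arbitrary choice, not something from the original post.

```python
import pandas as pd

# Hand-built stand-in for finDF, mirroring the table shown in the question
# (assumption: the 'a' column has an empty second-level label).
columns = pd.MultiIndex.from_tuples([
    ('a', ''),
    ('k', 'X1'), ('k', 'X2'),
    ('b', 'X1'), ('b', 'X2'),
    ('c', 'X1'), ('c', 'X2'),
])
data = [
    ['AB', 1,   2,   '.', 'o', '',  ''],
    ['CD', 2,   1,   '.', 'o', '',  ''],
    ['EF', 3,   '.', '',  '',  'o', '.'],
    ['GH', '.', 3,   '.', 'o', '.', 'o'],
]
df = pd.DataFrame(data, columns=columns)

# Boolean mask: True where a cell is neither blank nor a dot.
mask = (df != '') & (df != '.')

# Summing booleans column-wise gives the number of True cells per column.
counts = mask.sum()

# Turn the Series into a one-row frame labelled 'count' and put it on top.
count_row = counts.to_frame('count').T
with_counts = pd.concat([count_row, df])

print(with_counts)
```

Swapping the order of the list passed to `pandas.concat` would place the counts row below the data instead.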
