How to count column values in a MultiIndex data frame based on a condition
I use Python 3.6.1. I have a data frame like this:
    a   k       b       c
       X1  X2  X1  X2  X1  X2
0  AB   1   2   .   o
1  CD   2   1   .   o
2  EF   3   .           o   .
3  GH   .   3   .   o   .   o
I would like to count the values that are neither blank ('') nor a dot ('.') in each second-level column. When I use count(), I get this:
a        4
k  X1    4
   X2    4
b  X1    4
   X2    4
c  X1    4
   X2    4
But I need to have this:
a        4
k  X1    3
   X2    3
b  X1    0
   X2    3
c  X1    1
   X2    1
Ideally, I would get a data frame with a new row (just above or just below the data) containing the counts, like this:
    a   k       b       c
       X1  X2  X1  X2  X1  X2
    4   3   3   0   3   1   1
0  AB   1   2   .   o
1  CD   2   1   .   o
2  EF   3   .           o   .
3  GH   .   3   .   o   .   o
Here is the code to create the initial data frame:
import numpy
import pandas
X1 = pandas.DataFrame(data=[['AB',1,'.','o'],['CD',2,'.','o'],['EF',3,'.','o']],
                      columns=['a','k','b','c'])
X2 = pandas.DataFrame(data=[['CD',1,'o','o'],['AB',2,'o','o'],['GH',3,'o','o']],
                      columns=['a','k','b','c'])
myDF = pandas.concat([X1.set_index('a'), X2.set_index('a')],
                     axis='columns', keys=['X1','X2'])
myDF = myDF.swaplevel(axis='columns')[X1.columns[1:]]
myDF = myDF.reset_index(col_level=1, col_fill='a')
myDF = myDF.fillna('.')
kDF = myDF[['k']]
operDF = myDF.drop('k', axis=1, level=0).set_index('a').stack(0)\
             .pipe(lambda d: d.mask(d.X1 == d.X2, '')).unstack()\
             .swaplevel(0,1,axis=1).sort_index(axis=1,level=0)\
             .reset_index()
finDF = pandas.concat([kDF, operDF], axis=1)
cols = list(finDF)
cols[0], cols[1], cols[2] = cols[2], cols[0], cols[1]
finDF = finDF.loc[:, cols]  # label-based selection (.ix was removed in pandas 1.0)
finDF['a'] = finDF['a'].map(lambda x: x[0])
I would appreciate any hint ;)
A simple sum over a boolean mask is enough, i.e.:
count = ((finDF != '') & (finDF != '.')).sum()
Output:
a        4
k  X1    3
   X2    3
b  X1    0
   X2    3
c  X1    1
   X2    1
dtype: int64
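The question also asks for the counts as an extra row above the data, which the one-liner does not cover. Here is a minimal sketch of one way that might be done. It assumes the final frame looks like the table in the question and builds it directly instead of running the original pipeline; the names df, mask, count_row and with_counts and the 'count' row label are my own, not from the question.

```python
import pandas as pd

# Assumed reconstruction of the final frame shown in the question,
# built by hand rather than through the concat/stack pipeline.
columns = pd.MultiIndex.from_tuples([
    ("a", ""), ("k", "X1"), ("k", "X2"),
    ("b", "X1"), ("b", "X2"), ("c", "X1"), ("c", "X2"),
])
rows = [
    ["AB", 1, 2, ".", "o", "", ""],
    ["CD", 2, 1, ".", "o", "", ""],
    ["EF", 3, ".", "", "", "o", "."],
    ["GH", ".", 3, ".", "o", ".", "o"],
]
df = pd.DataFrame(rows, columns=columns)

# True where a cell is neither blank nor a dot; summing counts the True cells.
mask = (df != "") & (df != ".")
counts = mask.sum()

# Turn the counts into a one-row frame and place it above the data.
count_row = counts.to_frame().T
count_row.index = ["count"]  # hypothetical label for the new row
with_counts = pd.concat([count_row, df])
print(with_counts)
```

Swapping the order in the concat list, pd.concat([df, count_row]), would put the counts below the data instead. Note that the count row lives in the same columns as the data, so the columns stay object dtype.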