Count number of values of multiple columns per column and an additional category column
I have a dataframe containing multiple columns with 0s and 1s (A, B) as well as one column (C) indicating the category of the row. Now, I would like to count the 0 and 1 values per column and category.
import pandas as pd
test_data = {'A': [0,0,1,1,1,0],
'B': [0,1,0,1,0,1],
'C': ['a','a','b','b', 'c', 'c']}
df = pd.DataFrame(test_data)
I tried to figure out how I could rearrange the dataframe using pd.pivot_table, but I wasn't able to get the right transformation. I tried the following:
table = pd.pivot_table(df, columns = ['C'], index=['A'], aggfunc='count')
print('0', table)
which results in the following output:
0 B
C a b c
A
0 2.0 NaN 1.0
1 NaN 2.0 1.0
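A short check, not part of the original question, of why this attempt only shows B. With index=['A'] and columns=['C'], B is the only column left to aggregate, so aggfunc='count' counts non-null B entries per (A, C) pair. That is just the number of rows per pair, i.e. the 0/1 counts of column A, labelled under B. The names `a_counts` and `same` are hypothetical, and pd.crosstab is used here only as a reference for comparison.

```python
import pandas as pd

test_data = {'A': [0, 0, 1, 1, 1, 0],
             'B': [0, 1, 0, 1, 0, 1],
             'C': ['a', 'a', 'b', 'b', 'c', 'c']}
df = pd.DataFrame(test_data)

# Same call as in the question: A becomes the row index, C the columns,
# and B is the only leftover column, so it is the one being "counted".
table = pd.pivot_table(df, columns=['C'], index=['A'], aggfunc='count')

# Hypothetical reference: plain counts of A's values per category C.
a_counts = pd.crosstab(df['A'], df['C'])

# Counting non-null B entries per (A, C) equals the row count per (A, C),
# which is exactly the A counts (NaN means the pair never occurs).
same = bool((table['B'].fillna(0).to_numpy() == a_counts.to_numpy()).all())
print(same)
```

So the attempt never groups by B's own values; to count both A and B, the two columns first have to be brought into a single "value" column.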
My goal is to get the following output:
0 B | A # columns A and B
C a a b b c c | a a b b c c # row category based on C
0 1 0 1 0 1 | 0 1 0 1 0 1 # 0 and 1 values of the columns A and B
1 1 1 1 1 1 | 2 0 0 2 1 1 # counts
[Edit] Or the following output:
0 B | A # columns A and B
C a b c | a b c # row category based on C
0| 1 1 1 | 2 0 1
1| 1 1 1 | 0 2 1
Could anyone help me with this? Thank you!
I think you need to use DataFrame.melt first.
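To illustrate what melt contributes, here is a minimal sketch (the names `long_df` and `counts` are hypothetical): with C as the identifier column, A and B are stacked underneath each other, so every original cell becomes one (C, variable, value) row and the counting reduces to a single groupby.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 1, 1, 0],
                   'B': [0, 1, 0, 1, 0, 1],
                   'C': ['a', 'a', 'b', 'b', 'c', 'c']})

# Keep C as the identifier; A and B are stacked into 'variable'/'value'.
long_df = df.melt('C')
print(long_df)

# Counting per column and category is now one grouping over all three keys.
counts = long_df.groupby(['variable', 'C', 'value']).size()
print(counts)
```

Note that combinations that never occur (for example, A is never 1 in category a) are simply absent from `counts`, which is why the code below fills missing counts with 0.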
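First case (the first desired layout, with the 0/1 values as a third column level). unstack(fill_value=0) followed by stack() fills in the missing combinations with 0: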
new_df = (df.melt('C')
.groupby(['variable','C'])['value']
.value_counts().unstack(fill_value=0)
.stack()
.to_frame().T
.rename_axis(index=None,columns=[0,'C',None])
.sort_index(axis=1, ascending=[False,True,True]))
print(new_df)
0 B A
C a b c a b c
0 1 0 1 0 1 0 1 0 1 0 1
0 1 1 1 1 1 1 2 0 0 2 1 1
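To see what the unstack(fill_value=0)/stack() round trip in the first case does on its own, here is a small sketch of just that step (the names `counts` and `filled` are hypothetical). value_counts lists only combinations that occur; unstacking the 0/1 level with fill_value=0 writes 0 for the missing pairs before stack() moves the level back.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 1, 1, 0],
                   'B': [0, 1, 0, 1, 0, 1],
                   'C': ['a', 'a', 'b', 'b', 'c', 'c']})

counts = df.melt('C').groupby(['variable', 'C'])['value'].value_counts()

# (A, a, 1) and (A, b, 0) never occur, so they are missing here.
print(len(counts))  # 10

# The 0/1 level becomes columns with 0 for missing pairs, then goes back
# into the index, so every (variable, C, value) combination is present.
filled = counts.unstack(fill_value=0).stack()
print(len(filled))  # 12
print(filled.loc[('A', 'a', 1)])  # 0
```

Without this step, the to_frame().T result would lack the (A, a, 1) and (A, b, 0) columns instead of showing them as 0.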
Second case (the second desired layout, with the 0/1 values as the row index):
new_df = (df.melt('C').groupby(['C','variable'])['value']
.value_counts().unstack(['variable','C'],fill_value=0)
.sort_index(axis=1, ascending=[False, True])
.rename_axis(columns=[0,'C'],index=None))
print(new_df)
or
new_df = (df.melt('C')
.pivot_table(columns=['variable','C'],
index='value',
aggfunc='size',
fill_value=0)
.rename_axis(index=None, columns=[0,'C'])
.sort_index(axis=1, ascending=[False, True]))
Output (the same for both variants):
0 B A
C a b c a b c
0 1 1 1 2 0 1
1 1 1 1 0 2 1
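As a further option, not from the answer above, pd.crosstab on the melted frame should produce the same second layout in one call: it counts each (value, variable, C) combination and writes 0 where a combination never occurs. This is a sketch under that assumption; the name `long_df` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1, 1, 1, 0],
                   'B': [0, 1, 0, 1, 0, 1],
                   'C': ['a', 'a', 'b', 'b', 'c', 'c']})

long_df = df.melt('C')

# Rows: the 0/1 values; columns: (original column, category).
table = pd.crosstab(long_df['value'], [long_df['variable'], long_df['C']])

# Put B before A to match the layout in the question.
table = table.sort_index(axis=1, ascending=[False, True])
print(table)
```

crosstab fills missing combinations with 0 by default, so no explicit fill_value is needed here; the axis names differ from the output above unless rename_axis is applied as well.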