I have a data frame with three columns, A, B and C. I grouped it by columns A, B and C and counted the number of rows in each group.
Now I want to turn the values 0 and 1 in column C into columns of their own. I also want to add the two counts and show their sum in a separate column next to the 0 and 1 columns. Desired output:
A B Count0 Count1 Sum of Counts Count1/Sum of Counts
1000 1000 38 538 567 538/567
1000 1001 9 90 99 90/99
1000 1002 8 16 24 16/24
1000 1003 2 10 12 10/12
(I am not a regular Python user, and although I have searched a lot, I can't find the right terms to search for.) Once I know how to sum the 0 and 1 counts and show the result alongside the other columns in the data frame, I can do the division myself.
Thanks in advance.
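For reference, here is a minimal sketch of the grouped counts the question describes. The raw data is made up for illustration, and the column name counts is an assumption (it matches the name used in the pivot_table answer further down). Grouping by A, B and C gives one row per combination in "long" format, which is what has to be reshaped so that 0 and 1 become columns.

```python
import pandas as pd

# Hypothetical raw data: one row per observation, C holds 0 or 1
raw = pd.DataFrame({
    'A': [1000] * 6,
    'B': [1000, 1000, 1000, 1001, 1001, 1001],
    'C': [0, 1, 1, 0, 1, 1],
})

# Group by all three columns and count the rows in each group;
# the result has one row per (A, B, C) combination
counts = raw.groupby(['A', 'B', 'C']).size().reset_index(name='counts')
print(counts)
```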
Use SeriesGroupBy.value_counts or size with unstack:
import pandas as pd

df = pd.DataFrame({
    'A': [1000] * 10,
    'B': [1000] * 2 + [1001] * 3 + [1002] * 5,
    'C': [0, 1] * 5
})
print(df)
A B C
0 1000 1000 0
1 1000 1000 1
2 1000 1001 0
3 1000 1001 1
4 1000 1001 0
5 1000 1002 1
6 1000 1002 0
7 1000 1002 1
8 1000 1002 0
9 1000 1002 1
df = df.groupby(['A','B'])['C'].value_counts().unstack(fill_value=0).reset_index()
# alternative 1: crosstab
#df = pd.crosstab([df['A'], df['B']], df['C']).reset_index()
# alternative 2: size
#df = df.groupby(['A','B','C']).size().unstack(fill_value=0).reset_index()
print(df)
C A B 0 1
0 1000 1000 1 1
1 1000 1001 2 1
2 1000 1002 2 3
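The three variants in the answer should produce the same counts. A quick check on the same sample frame; it compares plain values only, since details such as dtypes or the name attribute of the columns are not what matters here.

```python
import pandas as pd

# Same sample data as in the answer
df = pd.DataFrame({
    'A': [1000] * 10,
    'B': [1000] * 2 + [1001] * 3 + [1002] * 5,
    'C': [0, 1] * 5,
})

# The three variants from the answer
by_value_counts = df.groupby(['A', 'B'])['C'].value_counts().unstack(fill_value=0).reset_index()
by_crosstab = pd.crosstab([df['A'], df['B']], df['C']).reset_index()
by_size = df.groupby(['A', 'B', 'C']).size().unstack(fill_value=0).reset_index()

# Compare the raw values of each result against the value_counts version
for other in (by_crosstab, by_size):
    assert (by_value_counts.to_numpy() == other.to_numpy()).all()
print(by_value_counts.to_numpy())
```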
Then rename the columns, sum them and divide:
df = df.rename(columns={0: 'Count0', 1: 'Count1'})
df['Sum of Counts'] = df['Count0'] + df['Count1']
df['Count1/Sum of Counts'] = df['Count1'] / df['Sum of Counts']
print(df)
C A B Count0 Count1 Sum of Counts Count1/Sum of Counts
0 1000 1000 1 1 2 0.500000
1 1000 1001 2 1 3 0.333333
2 1000 1002 2 3 5 0.600000
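A variant of the last step, sketched on the same sample data: add_prefix builds the Count0/Count1 names from the 0/1 column labels, and sum(axis=1) totals across the count columns instead of naming them one by one, so the sum would keep working if C ever held more than two values.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1000] * 10,
    'B': [1000] * 2 + [1001] * 3 + [1002] * 5,
    'C': [0, 1] * 5,
})

# Count per (A, B) and value of C, then name the columns Count0, Count1
out = pd.crosstab([df['A'], df['B']], df['C']).add_prefix('Count')
# Row-wise total over all count columns
out['Sum of Counts'] = out.sum(axis=1)
out['Count1/Sum of Counts'] = out['Count1'] / out['Sum of Counts']
out = out.reset_index()
out.columns.name = None  # drop the leftover 'C' label on the columns
print(out)
```

If only the ratio is needed, pd.crosstab also accepts normalize='index', which divides each count by its row total.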
Assuming the grouped counts are stored in a column named counts, try:
df1 = df.pivot_table(values='counts', index=['A', 'B'], columns=['C'], aggfunc='sum', fill_value=None, margins=True, dropna=True, margins_name='Sum of Counts').reset_index()
df1 = df1.rename(columns={0:'Count0',1:'Count1'})
df1['Count1/Sum of Counts'] = df1['Count1'] / df1['Sum of Counts']
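A self-contained sketch of the pivot_table approach, assuming the long-format counts are first built with groupby and size (the counts column name is taken from the answer). One thing to watch: margins=True adds a grand-total row as well as the Sum of Counts column, and after reset_index that row has 'Sum of Counts' in column A, so this sketch drops it at the end. fill_value=None and dropna=True are pandas defaults, so they are omitted here.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1000] * 10,
    'B': [1000] * 2 + [1001] * 3 + [1002] * 5,
    'C': [0, 1] * 5,
})
# Long-format counts with the 'counts' column the answer expects
counts = df.groupby(['A', 'B', 'C']).size().reset_index(name='counts')

df1 = counts.pivot_table(values='counts', index=['A', 'B'], columns='C',
                         aggfunc='sum', margins=True,
                         margins_name='Sum of Counts').reset_index()
df1 = df1.rename(columns={0: 'Count0', 1: 'Count1'})
df1['Count1/Sum of Counts'] = df1['Count1'] / df1['Sum of Counts']
# Drop the grand-total row that margins=True appends
df1 = df1[df1['A'] != 'Sum of Counts']
print(df1)
```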
You can call reset_index() to turn the grouping keys back into regular columns. Also, Count1/Sum of Counts is just df['Count1'] / df['Sum of Counts'].