Sumarize the occurrence number of multiple columns values as a pandas dataframe

Question

Considering that I have a data frame with the following structure:

    Symbol   CS
0     A1    MSI
1     A2    MSI
2     A3    GS
3     A2    EBV
4     A2    CIN
5     A1    GS    
6     A2    MSI
...

I would like to have a dataframe that categorizes the column CS as the header and the column Symbol as the index. The values would be the occurrence of each symbol by the CS type. As an example, the output expected from the above dataframe would be:

    MSI   EBV   CIN   GS
A1   1     0     0    1
A2   2     1     1    0
A3   0     0     0    1

What would be the best efficient approach to resolve this particular case?

Answer 1

You can using crosstab

pd.crosstab(df.Symbol,df.CS)
Out[862]: 
CS      CIN  EBV  GS  MSI
Symbol                   
A1        0    0   1    1
A2        1    1   0    2
A3        0    0   1    0

Answer 2

One way is to use pd.value_counts followed by pd.pivot_table .

Edit: But really you should use @Wen's solution .

counts = df.groupby('Symbol')['CS'].apply(pd.value_counts).reset_index()

res = counts.pivot_table(index='Symbol', columns='level_1', values='CS',
                         fill_value=0, aggfunc=lambda x: x)

print(res)

# level_1  CIN  EBV  GS  MSI
# Symbol                    
# A1         0    0   1    1
# A2         1    1   0    2
# A3         0    0   1    0

Sumarize the occurrence number of multiple columns values as a pandas dataframe

Question

2 answers

solution1
5 ACCPTED 2018-04-22 02:17:21

solution2
1 2018-04-22 02:16:57

Sumarize the occurrence number of multiple columns values as a pandas dataframe

Question

2 answers

solution1 5 ACCPTED 2018-04-22 02:17:21

solution2 1 2018-04-22 02:16:57

solution1
5 ACCPTED 2018-04-22 02:17:21

solution2
1 2018-04-22 02:16:57