
Count occurrences of a value in multiple columns of a pandas DataFrame

I want to count the occurrences of each string value across multiple columns of an entire DataFrame, for example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'type1':['A11','B11','A11','A'],
                   'type2':['A12','C','A12','D11'],
                   'type3':['C','D11','B','D12'],
                   'type4':[np.nan,'E12','C','E']
                  })
df

       type1    type2   type3   type4 
    0   A11      A12      C      NaN     
    1   B11       C      D11     E12     
    2   A11      A12      B       C      
    3    A       D11     D12      E     

I want to be able to count each distinct value in those 4 columns, like:

value count
 A11    2
 A12    2
 C      3
 B11    1
 D11    2
 D12    1
 E12    1
 B      1
 A      1
 E      1

Use stack to collapse the columns into one long Series, then value_counts:

df.stack().value_counts()

C      3
A11    2
A12    2
D11    2
B11    1
E12    1
B      1
A      1
D12    1
E      1
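Putting that answer together as a self-contained sketch (same df as in the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type1': ['A11', 'B11', 'A11', 'A'],
                   'type2': ['A12', 'C', 'A12', 'D11'],
                   'type3': ['C', 'D11', 'B', 'D12'],
                   'type4': [np.nan, 'E12', 'C', 'E']})

# stack() reshapes the 4 columns into one long Series, so value_counts()
# tallies every cell value across all columns at once; the NaN cell in
# type4 is excluded from the counts
counts = df.stack().value_counts()
print(counts)
```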

If you need the column names:

df.stack().value_counts().reset_index(name='count').rename({'index':'value'}, axis = 1)
 
  value  count
0     C      3
1   A11      2
2   A12      2
3   D11      2
4   B11      1
5   E12      1
6     B      1
7     A      1
8   D12      1
9     E      1
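An equivalent spelling that avoids the rename step is to name the index with rename_axis before resetting it (a sketch, assuming the same df):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type1': ['A11', 'B11', 'A11', 'A'],
                   'type2': ['A12', 'C', 'A12', 'D11'],
                   'type3': ['C', 'D11', 'B', 'D12'],
                   'type4': [np.nan, 'E12', 'C', 'E']})

# rename_axis names the index before reset_index turns it into a column,
# so no rename is needed afterwards
out = (df.stack()
         .value_counts()
         .rename_axis('value')
         .reset_index(name='count'))
print(out)
```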

You could do something like this:

df1 = df.apply(pd.Series.value_counts).dropna(how='all').sum(axis=1).reset_index()
df1.rename(columns={'index':'value', 0:'count'}, inplace=True)
df1

Output:

   value  count
0      A    1.0
1    A11    2.0
2    A12    2.0
3      B    1.0
4    B11    1.0
5      C    3.0
6    D11    2.0
7    D12    1.0
8      E    1.0
9    E12    1.0
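The counts come back as floats because each column's value_counts leaves NaN where a value is absent, and the row-wise sum stays float. An astype(int) after the sum restores integer counts (a sketch, same df):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type1': ['A11', 'B11', 'A11', 'A'],
                   'type2': ['A12', 'C', 'A12', 'D11'],
                   'type3': ['C', 'D11', 'B', 'D12'],
                   'type4': [np.nan, 'E12', 'C', 'E']})

# Each column's value_counts becomes one column of a DataFrame; values absent
# from a column are NaN, so the row-wise sum is float until cast back to int
df1 = (df.apply(pd.Series.value_counts)
         .sum(axis=1)
         .astype(int)
         .reset_index())
df1.columns = ['value', 'count']
print(df1)
```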

Another way: the advantage of this method is that if you want to keep interpreting the data by type1, type2, etc., you only need to call reset_index() in step 2.

Original Data

df = pd.DataFrame({'type1':['A11','B11','A11','A'],
                     'type2':['A12','C','A12','D11'],
                     'type3':['C','D11','B','D12'],
                     'type4':[np.nan,'E12','C','E']
                    })

Stack the data:

df1 = pd.DataFrame(df.stack(), columns = ['colnam'])

Then group by the required column:

print(df1.groupby('colnam')['colnam'].count())
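To keep the per-column breakdown this answer alludes to, reset the second level of the stacked index so the original column names become a grouping key (a sketch; the column names "column" and "value" are mine):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type1': ['A11', 'B11', 'A11', 'A'],
                   'type2': ['A12', 'C', 'A12', 'D11'],
                   'type3': ['C', 'D11', 'B', 'D12'],
                   'type4': [np.nan, 'E12', 'C', 'E']})

# After stack(), level 1 of the index holds the original column name;
# resetting it lets us count each value per source column
df1 = df.stack().reset_index(level=1)
df1.columns = ['column', 'value']
per_column = df1.groupby(['value', 'column']).size()
print(per_column)
```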

You can flatten your values using numpy.reshape, then build a new DataFrame and use pd.DataFrame.value_counts to get your output:

pd.DataFrame(df.values.reshape(1,-1)[0], columns = ['type']).value_counts()  

Output:

type
C       3
A11     2
A12     2
D11     2
A       1
B       1
B11     1
D12     1
E       1
E12     1
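numpy's ravel() does the same flattening in one step, and wrapping the result in a Series also lets you count the NaN cell if you want it included (a sketch, same df):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type1': ['A11', 'B11', 'A11', 'A'],
                   'type2': ['A12', 'C', 'A12', 'D11'],
                   'type3': ['C', 'D11', 'B', 'D12'],
                   'type4': [np.nan, 'E12', 'C', 'E']})

# ravel() flattens the 4x4 block of cell values into one 1-D array
flat = pd.Series(df.values.ravel(), name='type')
print(flat.value_counts())              # NaN excluded by default
print(flat.value_counts(dropna=False))  # include the NaN cell
```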
