
Count occurrences of unique values in a pandas dataframe across multiple columns

I have the following dataframe in pandas

import pandas as pd

df = pd.DataFrame({'a': ['hello', 'world', 'great', 'hello'],
                   'b': ['world', None, 'hello', 'world'],
                   'c': [None, 'hello', 'great', None]})

I would like to count the occurrences of each unique value in column 'a' across all the other columns (and column 'a' itself), and save the counts in new dataframe columns named after the values in column 'a', such as 'hello_count', 'world_count', and so on. Hence the end result would be something like

df = pd.DataFrame({'a': ['hello', 'world', 'great', 'hello'],
                   'b': ['world', None, 'hello', 'world'],
                   'c': [None, 'hello', 'great', None],
                   'hello_count': [1, 1, 1, 1],
                   'world_count': [1, 1, 0, 1],
                   'great_count': [0, 0, 2, 0]})

I tried

df[['a', 'b', 'c']].groupby('a').agg(['count'])

but that did not work. Any help is really appreciated.

Let's use pd.get_dummies and groupby:

(df.assign(**pd.get_dummies(df)
               .pipe(lambda x: x.groupby(x.columns.str[2:], axis=1)
                                .sum())))

Output:

       a      b      c  great  hello  world
0  hello  world   None      0      1      1
1  world   None  hello      0      1      1
2  great  hello  great      2      1      0
3  hello  world   None      0      1      1
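Note that newer pandas versions deprecate `DataFrame.groupby(axis=1)`. A sketch of the same idea that avoids it by transposing, grouping on the (duplicated) index, and transposing back, assuming the question's `df`:

```python
import pandas as pd

df = pd.DataFrame({'a': ['hello', 'world', 'great', 'hello'],
                   'b': ['world', None, 'hello', 'world'],
                   'c': [None, 'hello', 'great', None]})

# One-hot encode, then strip the 'a_'/'b_'/'c_' prefixes so columns
# that encode the same value share a name.
dummies = pd.get_dummies(df)
dummies.columns = dummies.columns.str[2:]

# Transpose, group the duplicate names on the index, sum, transpose back.
counts = dummies.T.groupby(level=0).sum().T

out = df.join(counts)
print(out)
```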

Here is the above solution in steps.

Step 1: pd.get_dummies

df_gd = pd.get_dummies(df)
print(df_gd)

   a_great  a_hello  a_world  b_hello  b_world  c_great  c_hello
0        0        1        0        0        1        0        0
1        0        0        1        0        0        0        1
2        1        0        0        1        0        1        0
3        0        1        0        0        1        0        0

Step 2: groupby the column names, ignoring the first two characters (the 'a_'/'b_'/'c_' prefixes)

df_gb = df_gd.groupby(df_gd.columns.str[2:], axis=1).sum()
print(df_gb)

   great  hello  world
0      0      1      1
1      0      1      1
2      2      1      0
3      0      1      1

Step 3: Join back to original dataframe

df_out = df.join(df_gb)
print(df_out)

Output:

       a      b      c  great  hello  world
0  hello  world   None      0      1      1
1  world   None  hello      0      1      1
2  great  hello  great      2      1      0
3  hello  world   None      0      1      1
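To get the exact column names asked for in the question (`hello_count`, `world_count`, ...), the counts can be suffixed with `add_suffix` before joining. A self-contained sketch (the counts are rebuilt here with a transpose instead of the deprecated `groupby(axis=1)`):

```python
import pandas as pd

df = pd.DataFrame({'a': ['hello', 'world', 'great', 'hello'],
                   'b': ['world', None, 'hello', 'world'],
                   'c': [None, 'hello', 'great', None]})

dummies = pd.get_dummies(df)
dummies.columns = dummies.columns.str[2:]
counts = dummies.T.groupby(level=0).sum().T

# add_suffix renames 'hello' -> 'hello_count' and so on.
df_out = df.join(counts.add_suffix('_count'))
print(df_out.columns.tolist())
# ['a', 'b', 'c', 'great_count', 'hello_count', 'world_count']
```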

Using df.apply in a loop keeps the job simple. Each row is tested for how many of its elements equal the target string:

for ss in df.a.unique():
    df[ss + "_count"] = df.apply(lambda row: sum(map(lambda x: x == ss, row)), axis=1)

print(df)

Output:

       a      b      c  hello_count  world_count  great_count
0  hello  world   None            1            1            0
1  world   None  hello            1            1            0
2  great  hello  great            1            0            2
3  hello  world   None            1            1            0
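The loop above can also be vectorized: comparing the whole frame to the string with `eq` and summing across columns gives the same per-row counts without `apply`. A sketch, assuming the question's `df`:

```python
import pandas as pd

df = pd.DataFrame({'a': ['hello', 'world', 'great', 'hello'],
                   'b': ['world', None, 'hello', 'world'],
                   'c': [None, 'hello', 'great', None]})

for ss in df['a'].unique():
    # df.eq(ss) builds a boolean frame; summing along axis=1
    # counts how many cells in each row equal ss. Restricting to
    # the original columns keeps the new *_count columns out of
    # later comparisons.
    df[ss + '_count'] = df[['a', 'b', 'c']].eq(ss).sum(axis=1)

print(df)
```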

You can create a dictionary d_unique = {} and store each column's unique-value count in it as a key-value pair; here the dataframe is named data_rnr:

d_unique = {}
for col in data_rnr.columns:
    print(data_rnr[col].name)
    print(len(data_rnr[col].unique()))
    d_unique[data_rnr[col].name] = len(data_rnr[col].unique())
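For reference, the same per-column unique counts can be read off directly from `DataFrame.nunique`; note that `nunique` excludes missing values by default, so `dropna=False` is needed to match `len(unique())`, which counts `None` as a value. A sketch using the question's `df`:

```python
import pandas as pd

df = pd.DataFrame({'a': ['hello', 'world', 'great', 'hello'],
                   'b': ['world', None, 'hello', 'world'],
                   'c': [None, 'hello', 'great', None]})

# Equivalent of the d_unique loop: distinct values per column,
# with dropna=False so None counts as its own value.
d_unique = df.nunique(dropna=False).to_dict()
print(d_unique)
# {'a': 3, 'b': 3, 'c': 3}
```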
