How to find the most frequent word in each row of a pandas DataFrame column

import pandas as pd

df = pd.DataFrame({'City': ['Pune', 'Mumbai', 'Pune', 'Mumbai', 'Pune'],
                   'Name': ['John', 'Boby', 'John', 'Boby', 'Nicky'],
                   'Competition': ['Chess,Drawing,Chess',
                                   'Table Tennis,Table Tennis,Chess,Carrom',
                                   'Chess,Carrom',
                                   'Table Tennis,Chess,Chess,Chess',
                                   'Carrom']})
     City   Name    Competition
0   Pune    John    Chess,Drawing,Chess
1   Mumbai  Boby    Table Tennis,Table Tennis,Chess,Carrom
2   Pune    John    Chess,Carrom
3   Mumbai  Boby    Table Tennis,Chess,Chess,Chess
4   Pune    Nicky   Carrom
Required output
    City    Name    Competition                                  Most Frequent
0   Pune    John    Chess,Drawing,Chess                            Chess
1   Mumbai  Boby    Table Tennis,Table Tennis,Chess,Carrom         Table Tennis
2   Pune    John    Chess,Carrom                                   Carrom,Chess
3   Mumbai  Boby    Table Tennis,Chess,Chess,Chess                 Chess
4   Pune    Nicky   Carrom                                         Carrom

If several words are tied for the highest count, list all of them; otherwise return the single most frequent word.

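First split the values on commas with Series.str.split and give each word its own row with DataFrame.explode. Then group by the original index and join the result of Series.mode, which returns every value tied for the top count: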

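# Join every mode of a group into one comma-separated string.
f = lambda x: ','.join(x.mode())
# Split each cell into a list, put each word on its own row (the original
# index label is repeated), then aggregate per original row.
df['Most Frequent'] = (df.assign(Competition=df['Competition'].str.split(','))
                         .explode('Competition')
                         .groupby(level=0)['Competition']
                         .agg(f))
print(df)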
     City   Name                             Competition Most Frequent
0    Pune   John                     Chess,Drawing,Chess         Chess
1  Mumbai   Boby  Table Tennis,Table Tennis,Chess,Carrom  Table Tennis
2    Pune   John                            Chess,Carrom  Carrom,Chess
3  Mumbai   Boby          Table Tennis,Chess,Chess,Chess         Chess
4    Pune  Nicky                                  Carrom        Carrom
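
To see why groupby(level=0) works here, it can help to look at the intermediate result. The following is a minimal sketch on a two-row frame (the names exploded and most_frequent are only illustrative): after explode, the original row label is repeated in the index, so grouping on index level 0 collects each row's words again.

```python
import pandas as pd

df = pd.DataFrame({'Competition': ['Chess,Drawing,Chess', 'Chess,Carrom']})

# One row per word; the index repeats the label of the row it came from.
exploded = df['Competition'].str.split(',').explode()
print(exploded.index.tolist())   # [0, 0, 0, 1, 1]

# Group on the repeated index label and join all modes of each group.
most_frequent = exploded.groupby(level=0).agg(lambda s: ','.join(s.mode()))
print(most_frequent.tolist())    # ['Chess', 'Carrom,Chess']
```

Series.mode returns its values sorted, which is why the tie is reported as Carrom,Chess rather than in the order the words appear in the cell.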

Use statistics.multimode (available since Python 3.8) with apply:

import pandas as pd
from statistics import multimode

df = pd.DataFrame({'City': ['Pune', 'Mumbai', 'Pune', 'Mumbai', 'Pune'],
                   'Name': ['John', 'Boby', 'John', 'Boby', 'Nicky'],
                   'Competition': ['Chess,Drawing,Chess', 'Table Tennis,Table Tennis,Chess,Carrom', 'Chess,Carrom',
                                   'Table Tennis,Chess,Chess,Chess', 'Carrom']})

# multimode returns every word tied for the top count; [:2] keeps at most the first two.
df["Most Frequent"] = df["Competition"].apply(lambda x: ",".join(multimode(x.split(","))[:2]))
print(df)

Output

     City   Name                             Competition Most Frequent
0    Pune   John                     Chess,Drawing,Chess         Chess
1  Mumbai   Boby  Table Tennis,Table Tennis,Chess,Carrom  Table Tennis
2    Pune   John                            Chess,Carrom  Chess,Carrom
3  Mumbai   Boby          Table Tennis,Chess,Chess,Chess         Chess
4    Pune  Nicky                                  Carrom        Carrom
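
Unlike Series.mode, multimode returns tied values in the order they first appear in the data, which is why row 2 reads Chess,Carrom here. A small sketch of that behaviour, and of what the [:2] slice does with a three-way tie (the sample strings and variable names are made up for illustration):

```python
from statistics import multimode

# Two-way tie: both words are returned, in order of first appearance.
two_way = multimode('Chess,Carrom'.split(','))
print(two_way)          # ['Chess', 'Carrom']

# Three-way tie: multimode returns all three words...
three_way = multimode('Chess,Carrom,Drawing'.split(','))
print(three_way)        # ['Chess', 'Carrom', 'Drawing']

# ...and slicing with [:2] silently drops the third one.
print(three_way[:2])    # ['Chess', 'Carrom']
```

If every tied word should be kept, the slice can simply be left out.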

Here is a simple solution using collections.Counter:

from collections import Counter

def keywithmaxval(d):
    itemMaxValue = max(d.values())
    return ','.join([k for k, v in d.items() if v == itemMaxValue])

df["Most Frequent"] = df['Competition'].str.split(',').apply(Counter).apply(keywithmaxval)

Output:

df
     City   Name                             Competition Most Frequent
0    Pune   John                     Chess,Drawing,Chess         Chess
1  Mumbai   Boby  Table Tennis,Table Tennis,Chess,Carrom  Table Tennis
2    Pune   John                            Chess,Carrom  Chess,Carrom
3  Mumbai   Boby          Table Tennis,Chess,Chess,Chess         Chess
4    Pune  Nicky                                  Carrom        Carrom
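
For a single cell, the Counter approach above boils down to the following sketch (the variable names counts, top and winners are illustrative, not part of the answer). Counter keeps keys in insertion order, so tied words come out in the order they first appear in the cell.

```python
from collections import Counter

# Count each word in one cell.
counts = Counter('Table Tennis,Table Tennis,Chess,Carrom'.split(','))
print(counts['Table Tennis'], counts['Chess'], counts['Carrom'])   # 2 1 1

# Keep every word whose count equals the highest count.
top = max(counts.values())
winners = [word for word, n in counts.items() if n == top]
print(','.join(winners))   # Table Tennis

# With a tie, all tied words are kept in first-seen order.
tie_counts = Counter('Chess,Carrom'.split(','))
tie_top = max(tie_counts.values())
tie_winners = [word for word, n in tie_counts.items() if n == tie_top]
print(','.join(tie_winners))   # Chess,Carrom
```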


 