Find and select the most frequent value in a pandas DataFrame column
How to find the most frequent word in each cell of a pandas DataFrame column
import pandas as pd
import numpy as np
df = pd.DataFrame({'City': ['Pune', 'Mumbai', 'Pune', 'Mumbai', 'Pune'],
'Name': ['John', 'Boby', 'John', 'Boby', 'Nicky'],
'Competition': ['Chess,Drawing,Chess', 'Table Tennis,Table Tennis,Chess,Carrom', 'Chess,Carrom', 'Table Tennis,Chess,Chess,Chess', 'Carrom'] })
City Name Competition
0 Pune John Chess,Drawing,Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom
2 Pune John Chess,Carrom
3 Mumbai Boby Table Tennis,Chess,Chess,Chess
4 Pune Nicky Carrom
Required output
City Name Competition Most Frequent
0 Pune John Chess,Drawing,Chess Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom Table Tennis
2 Pune John Chess,Carrom Carrom,Chess
3 Mumbai Boby Table Tennis,Chess,Chess,Chess Chess
4 Pune Nicky Carrom Carrom
If two words appear the same number of times, include both words; otherwise, use the single most frequent word.
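To make the tie rule concrete, here is a small sketch (not part of the question) that prints the per-cell word counts for the sample data using `collections.Counter`. The names `counts` and `ties` are illustrative only. Row 2 is the only row where two words share the highest count, so it is the only one where both words should appear.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'Competition': ['Chess,Drawing,Chess',
                                   'Table Tennis,Table Tennis,Chess,Carrom',
                                   'Chess,Carrom',
                                   'Table Tennis,Chess,Chess,Chess',
                                   'Carrom']})

# Count the comma-separated words in each cell
counts = [Counter(cell.split(',')) for cell in df['Competition']]
for row, c in enumerate(counts):
    print(row, dict(c))

# Rows where more than one word shares the highest count are ties
ties = [row for row, c in enumerate(counts)
        if sum(n == max(c.values()) for n in c.values()) > 1]
print(ties)  # [2]
```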
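First split the values in the column and use DataFrame.explode to put each word on its own row, so that Series.mode can be computed for each original row; then join all of the top values: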
f = lambda x: ','.join(x.mode())
df['Most Frequent'] = (df.assign(Competition = df['Competition'].str.split(','))
.explode('Competition')
.groupby(level=0)['Competition']
.agg(f))
print(df)
City Name Competition Most Frequent
0 Pune John Chess,Drawing,Chess Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom Table Tennis
2 Pune John Chess,Carrom Carrom,Chess
3 Mumbai Boby Table Tennis,Chess,Chess,Chess Chess
4 Pune Nicky Carrom Carrom
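To see what `groupby(level=0)` operates on, the sketch below (a minimal illustration on the same sample column; the name `exploded` is arbitrary) prints the exploded words for row 2. `explode` repeats the original index label for every element, which is why grouping on index level 0 gathers the words back by source row. It also shows why row 2 comes out as `Carrom,Chess` in this answer: `Series.mode` returns all tied values in sorted order.

```python
import pandas as pd

df = pd.DataFrame({'Competition': ['Chess,Drawing,Chess',
                                   'Table Tennis,Table Tennis,Chess,Carrom',
                                   'Chess,Carrom',
                                   'Table Tennis,Chess,Chess,Chess',
                                   'Carrom']})

# One row per word; each word keeps the index label of the row it came from
exploded = df['Competition'].str.split(',').explode()
print(exploded.index.tolist())

# Row 2 keeps its original word order after explode
print(exploded.loc[2].tolist())         # ['Chess', 'Carrom']

# Series.mode returns every tied value, sorted
print(exploded.loc[2].mode().tolist())  # ['Carrom', 'Chess']
```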
Use statistics.multimode inside apply:
import pandas as pd
from statistics import multimode
df = pd.DataFrame({'City': ['Pune', 'Mumbai', 'Pune', 'Mumbai', 'Pune'],
'Name': ['John', 'Boby', 'John', 'Boby', 'Nicky'],
'Competition': ['Chess,Drawing,Chess', 'Table Tennis,Table Tennis,Chess,Carrom', 'Chess,Carrom',
'Table Tennis,Chess,Chess,Chess', 'Carrom']})
df["Most Frequent"] = df["Competition"].apply(lambda x: ",".join(multimode(x.split(","))[:2]))
print(df)
Output:
City Name Competition Most Frequent
0 Pune John Chess,Drawing,Chess Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom Table Tennis
2 Pune John Chess,Carrom Chess,Carrom
3 Mumbai Boby Table Tennis,Chess,Chess,Chess Chess
4 Pune Nicky Carrom Carrom
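Two details of this answer are worth checking (note that `statistics.multimode` requires Python 3.8 or later). First, `multimode` returns tied values in the order they first appear, which is why row 2 is `Chess,Carrom` here. Second, the `[:2]` slice silently drops any third tied word. The cell `'Chess,Carrom,Drawing'` below is a made-up example, not from the question; if every tied word should be listed, dropping the slice is one option.

```python
from statistics import multimode

# Tied values come back in the order they first appear
row2 = multimode('Chess,Carrom'.split(','))
print(row2)       # ['Chess', 'Carrom']

# Hypothetical cell with three words tied at one occurrence each
words = 'Chess,Carrom,Drawing'.split(',')
first_two = ','.join(multimode(words)[:2])  # what the answer's slice keeps
all_tied = ','.join(multimode(words))       # every tied word
print(first_two)  # Chess,Carrom
print(all_tied)   # Chess,Carrom,Drawing
```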
Here is a simple, complete solution using Counter:
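from collections import Counter

def keywithmaxval(d):
    # Find the highest count, then join every word that reaches it
    itemMaxValue = max(d.values())
    return ','.join([k for k, v in d.items() if v == itemMaxValue])

# Split each cell into words, count them, and keep the top word(s)
df["Most Frequent"] = df['Competition'].str.split(',').apply(Counter).apply(keywithmaxval)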
Output:
df
City Name Competition Most Frequent
0 Pune John Chess,Drawing,Chess Chess
1 Mumbai Boby Table Tennis,Table Tennis,Chess,Carrom Table Tennis
2 Pune John Chess,Carrom Chess,Carrom
3 Mumbai Boby Table Tennis,Chess,Chess,Chess Chess
4 Pune Nicky Carrom Carrom
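As a cross-check, the sketch below runs the three approaches side by side on the sample column and compares their results as sets of words. The variable names are illustrative, and the `multimode` variant here deliberately omits the `[:2]` slice. On this data all three agree on which words are most frequent; they differ only in how tied words are ordered (sorted for `Series.mode`, first-seen for `multimode` and `Counter`).

```python
from collections import Counter
from statistics import multimode

import pandas as pd

df = pd.DataFrame({'Competition': ['Chess,Drawing,Chess',
                                   'Table Tennis,Table Tennis,Chess,Carrom',
                                   'Chess,Carrom',
                                   'Table Tennis,Chess,Chess,Chess',
                                   'Carrom']})

def keywithmaxval(d):
    # Join every key whose count equals the maximum count
    item_max_value = max(d.values())
    return ','.join(k for k, v in d.items() if v == item_max_value)

split = df['Competition'].str.split(',')

via_mode = split.explode().groupby(level=0).agg(lambda x: ','.join(x.mode()))
via_multimode = split.apply(lambda words: ','.join(multimode(words)))
via_counter = split.apply(lambda words: keywithmaxval(Counter(words)))

# Compare as sets, since the approaches may order tied words differently
same = [set(a.split(',')) == set(b.split(',')) == set(c.split(','))
        for a, b, c in zip(via_mode, via_multimode, via_counter)]
print(all(same))  # True
```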