Finding the most frequent combination in DataFrame
I have a DataFrame with two columns, From and To, and I need to find the most frequent combination of From and To locations.
Example:
From         To
------------------
Home         Office
Home         Office
Home         Office
Airport      Home
Restaurant   Office
If the order matters:
df['FROM_TO'] = df['From'] + ' -> ' + df['To']  # separator keeps distinct pairs from merging
df['COUNT'] = 1
df.groupby(['FROM_TO'])['COUNT'].sum()
This gives you the count of every combination in one go. Take the maximum to find the most frequent one.
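A self-contained sketch of this concatenate-and-count idea on the example data. The " -> " separator and the variable names are illustrative assumptions, and value_counts is used here in place of the helper COUNT column:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "From": ["Home", "Home", "Home", "Airport", "Restaurant"],
    "To": ["Office", "Office", "Office", "Home", "Office"],
})

# Join the two columns with a separator so that pairs such as
# ("ab", "c") and ("a", "bc") do not collapse into the same key.
from_to = df["From"] + " -> " + df["To"]

counts = from_to.value_counts()    # occurrences of each ordered pair
most_common = counts.idxmax()      # label of the most frequent pair
most_common_count = counts.max()   # how often that pair occurs
print(most_common, most_common_count)
```

Because the key is built as From followed by To, (Home, Office) and (Office, Home) are counted separately.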
If the order does not matter, sort the values within each row first:
import numpy as np
df.loc[:, :] = np.sort(df.values, axis=1)  # assumes df consists only of the From and To columns
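For the case where direction should be ignored, the row-wise sort can be sketched as below. The data is a made-up variation of the example (one trip reversed), and the column names "A" and "B" are placeholders, since after sorting the columns no longer mean From and To:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row is the reverse of the first
df = pd.DataFrame({
    "From": ["Home", "Office", "Home", "Airport"],
    "To": ["Office", "Home", "Office", "Home"],
})

# Sort the two values inside each row so that (Office, Home)
# becomes (Home, Office) and both directions share one key.
sorted_pairs = pd.DataFrame(
    np.sort(df[["From", "To"]].to_numpy(), axis=1),
    columns=["A", "B"],
    index=df.index,
)

unordered_counts = sorted_pairs.groupby(["A", "B"]).size()
top_pair = unordered_counts.idxmax()      # tuple of the two locations
top_count = unordered_counts.max()
print(top_pair, top_count)
```

Building a separate frame instead of assigning back into df keeps the original From and To columns intact, which may be preferable if they are needed later.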
You can group by the two columns together and count the number of occurrences of each pair, then sort the pairs by this count.
The following code does the job:
df.groupby(["From", "To"]).size().sort_values(ascending=False)
For the example in the question, it returns:
From         To
-----------------------
Home         Office    3
Restaurant   Office    1
Airport      Home      1
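To pull just the top pair out of that sorted result, one option is to turn the counts into a regular DataFrame and read the first row. This is a sketch on the example data; the column name "count" and the variable names are arbitrary choices:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "From": ["Home", "Home", "Home", "Airport", "Restaurant"],
    "To": ["Office", "Office", "Office", "Home", "Office"],
})

pair_counts = (
    df.groupby(["From", "To"])
      .size()                          # one count per (From, To) pair
      .sort_values(ascending=False)    # most frequent pair first
      .reset_index(name="count")       # MultiIndex -> ordinary columns
)

top = pair_counts.iloc[0]              # row with the highest count
print(top["From"], top["To"], top["count"])
```

If several pairs are tied for the highest count, iloc[0] returns only one of them; filtering pair_counts on the maximum count would return all of them.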
IIUC, you can use SeriesGroupBy.value_counts and Series.idxmax:
df.groupby('From')['To'].value_counts().idxmax()
Output:
('Home', 'Office')
In general, groupby.value_counts is faster than groupby.size.
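A runnable version of the value_counts / idxmax approach on the example data. The second variant uses DataFrame.value_counts, which assumes pandas 1.1 or later and is not part of the answer above:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "From": ["Home", "Home", "Home", "Airport", "Restaurant"],
    "To": ["Office", "Office", "Office", "Home", "Office"],
})

# Count each To value within each From group; the result is a
# Series indexed by (From, To).
per_pair = df.groupby("From")["To"].value_counts()
best = per_pair.idxmax()         # index label of the largest count
best_count = per_pair.max()

# Alternative, assuming pandas >= 1.1: count rows over both columns directly.
alt_best = df.value_counts(["From", "To"]).idxmax()
print(best, best_count, alt_best)
```

Whether one variant is faster than the other depends on the data, so it is worth timing both on a realistic sample.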
Another way:
df.apply(tuple, axis=1).value_counts().idxmax()
Or:
df.apply(tuple, axis=1).mode()
Output:
0 (Home, Office)
dtype: object