Find the most frequent combination in a DataFrame

Question

I have a DataFrame with two columns, From and To, and I need to find the most frequent combination of From and To locations.

Example:

From        To
------------------
Home        Office
Home        Office
Home        Office
Airport     Home
Restaurant  Office

Answer 1

If the order matters:

df['FROM_TO'] = df['From'] + ' -> ' + df['To']  # the separator keeps distinct pairs from colliding after concatenation

df['COUNT'] = 1

df.groupby(['FROM_TO'])['COUNT'].sum()

This gives you the count of every combination in one go. Take the maximum (or use idxmax to get its label) to find the most frequent one.

If the order does not matter, sort the values within each row first:

import numpy as np

df.loc[:, :] = np.sort(df.values, axis=1)  # only valid if df consists solely of the From and To columns
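A minimal sketch of the order-insensitive variant, assuming the question's sample data plus one reversed row (Office, Home) that is hypothetical and added only to show the effect. The column names a and b are placeholders, not part of the original answer, and the counting here uses groupby on the two sorted columns rather than string concatenation.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical reversed row.
df = pd.DataFrame({
    "From": ["Home", "Home", "Home", "Airport", "Restaurant", "Office"],
    "To": ["Office", "Office", "Office", "Home", "Office", "Home"],
})

# Sort the two values within each row so (A, B) and (B, A) become the same pair.
sorted_values = np.sort(df[["From", "To"]].to_numpy(), axis=1)
sorted_pairs = pd.DataFrame(sorted_values, columns=["a", "b"])  # placeholder names

# Count each unordered pair and pick the most frequent one.
unordered_counts = sorted_pairs.groupby(["a", "b"]).size()
most_common_pair = unordered_counts.idxmax()
most_common_count = unordered_counts.max()

print(most_common_pair, most_common_count)
```

Working on a copy of the two columns avoids overwriting the original From and To values, which the in-place assignment above does.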

Answer 2

You can group by the two columns together and count the number of occurrences of each pair, then sort the pairs by this count.

The following code does the job:

df.groupby(["From", "To"]).size().sort_values(ascending=False)

For the example in the question, it returns:

From        To
-----------------------
Home        Office    3
Restaurant  Office    1
Airport     Home      1
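The sorted result still contains every pair. A short sketch, assuming the question's sample data, of how the top pair and any ties for first place might be pulled out of it:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "From": ["Home", "Home", "Home", "Airport", "Restaurant"],
    "To": ["Office", "Office", "Office", "Home", "Office"],
})

counts = df.groupby(["From", "To"]).size().sort_values(ascending=False)

# The first entry of the sorted Series is the most frequent pair.
top_pair = counts.index[0]
top_count = counts.iloc[0]

# If several pairs share the highest count, this keeps all of them.
tied_for_first = counts[counts == counts.max()]

print(top_pair, top_count)
print(tied_for_first)
```

Filtering on counts.max() is useful when ties are possible, since index[0] returns only one of the tied pairs.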

Answer 3

IIUC, you can use SeriesGroupBy.value_counts and Series.idxmax:

df.groupby('From')['To'].value_counts().idxmax()

Output

('Home', 'Office')

In general, groupby.value_counts is faster than groupby.size.

Another way:

df.apply(tuple, axis=1).value_counts().idxmax()

Or

df.apply(tuple, axis=1).mode()

Output

0    (Home, Office)
dtype: object
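As a further option not mentioned in the answers above, pandas 1.1 and later provide DataFrame.value_counts, which counts unique rows directly. A minimal sketch, assuming that pandas version and the question's sample data:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "From": ["Home", "Home", "Home", "Airport", "Restaurant"],
    "To": ["Office", "Office", "Office", "Home", "Office"],
})

# Count each (From, To) row; the result is indexed by a MultiIndex of pairs.
pair_counts = df.value_counts(subset=["From", "To"])

# idxmax returns the index label of the largest count, here a tuple.
most_frequent = pair_counts.idxmax()

print(most_frequent)
```

Unlike mode(), idxmax returns a single pair even when several pairs are tied for the highest count.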


3 answers

Answer 1
2 2020-08-03 12:19:45

Answer 2
1 accepted 2020-08-03 12:24:22

Answer 3
1 2020-08-03 12:29:34
