Finding the most frequent combination in DataFrame

I have a DataFrame with two columns, From and To, and I need to know the most frequent combination of From and To locations.

Example:

From        To
------------------
Home        Office
Home        Office
Home        Office
Airport     Home
Restaurant  Office

If the order matters:

import pandas as pd

# Join both columns with a separator so distinct pairs cannot collide
df['FROM_TO'] = df['From'] + ' -> ' + df['To']
df['COUNT'] = 1
df.groupby('FROM_TO')['COUNT'].sum()

This gives you the count of every combination in one go. Take the maximum (or idxmax, to get the combination itself) to find the most frequent one.
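A self-contained version of the concatenation approach, built on the sample data from the question, might look like the following sketch. The ' -> ' separator and the variable names are illustrative choices; the separator keeps pairs such as ('AB', 'C') and ('A', 'BC') from collapsing into the same string.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "From": ["Home", "Home", "Home", "Airport", "Restaurant"],
    "To": ["Office", "Office", "Office", "Home", "Office"],
})

# Concatenate with a separator (illustrative choice) so distinct pairs stay distinct
df["FROM_TO"] = df["From"] + " -> " + df["To"]
df["COUNT"] = 1

# Count of every ordered combination
counts = df.groupby("FROM_TO")["COUNT"].sum()

# idxmax returns the label (the combination) with the largest count
most_frequent = counts.idxmax()
print(most_frequent, counts.max())  # Home -> Office 3
```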

If the order does not matter, first sort the values within each row, then count as above:

import numpy as np

# Sort each row so that (A, B) and (B, A) become the same pair;
# only the From and To columns are touched
df[['From', 'To']] = np.sort(df[['From', 'To']].to_numpy(), axis=1)
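To see the effect of the row-wise sort, here is a minimal sketch with made-up data (the rows below are hypothetical, not from the question) in which Home -> Office and Office -> Home should count as the same trip:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the same pair appears in both directions
df = pd.DataFrame({
    "From": ["Home", "Office", "Home", "Airport"],
    "To": ["Office", "Home", "Office", "Home"],
})

# Sort each row alphabetically so (A, B) and (B, A) become identical
df[["From", "To"]] = np.sort(df[["From", "To"]].to_numpy(), axis=1)

# Now count unordered pairs
counts = df.groupby(["From", "To"]).size()
print(counts.idxmax(), counts.max())  # ('Home', 'Office') 3
```

Note that after the sort the From/To labels no longer mean direction; they only identify the unordered pair.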

You can group by the two columns together and count the number of occurrences of each pair, then sort the pairs by this count.

The following code does the job:

df.groupby(["From", "To"]).size().sort_values(ascending=False)

For the example in the question, it returns:

From        To
-----------------------
Home        Office    3
Restaurant  Office    1
Airport     Home      1
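If only the single most frequent pair is needed rather than the full ranking, the same grouped counts can be reduced with idxmax and max. A sketch, reusing the question's sample data (the variable names and the "count" column name are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "From": ["Home", "Home", "Home", "Airport", "Restaurant"],
    "To": ["Office", "Office", "Office", "Home", "Office"],
})

# Number of rows for each (From, To) pair; the index is a MultiIndex
pair_counts = df.groupby(["From", "To"]).size()

# The label of the maximum is a (From, To) tuple
top_pair = pair_counts.idxmax()
top_count = pair_counts.max()
print(top_pair, top_count)  # ('Home', 'Office') 3

# Alternatively, keep the top row as a DataFrame with a named count column
top = pair_counts.sort_values(ascending=False).head(1).reset_index(name="count")
print(top)
```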

IIUC, use SeriesGroupBy.value_counts and Series.idxmax:

df.groupby('From')['To'].value_counts().idxmax()

Output

('Home', 'Office')
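As a swapped-in alternative not mentioned in the answers: pandas 1.1 and later provide DataFrame.value_counts, which counts unique rows directly, so the groupby can be skipped. A sketch, assuming a pandas version that has this method:

```python
import pandas as pd

df = pd.DataFrame({
    "From": ["Home", "Home", "Home", "Airport", "Restaurant"],
    "To": ["Office", "Office", "Office", "Home", "Office"],
})

# Counts each distinct (From, To) row; the result is sorted in descending order
row_counts = df[["From", "To"]].value_counts()

print(row_counts.idxmax())  # ('Home', 'Office')
print(row_counts.iloc[0])   # 3
```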

In general, groupby.value_counts is faster than groupby.size.

Another way:

df.apply(tuple, axis=1).value_counts().idxmax()

Or:

df.apply(tuple, axis=1).mode()

Output

0    (Home, Office)
dtype: object
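One difference between the last two variants: idxmax returns a single label (one of the maxima), whereas Series.mode returns every value tied for the highest count. A sketch with hypothetical tied data (not from the question):

```python
import pandas as pd

# Hypothetical data in which two pairs are tied at two occurrences each
df = pd.DataFrame({
    "From": ["Home", "Home", "Airport", "Airport", "Restaurant"],
    "To": ["Office", "Office", "Home", "Home", "Office"],
})

# One tuple per row
pairs = df.apply(tuple, axis=1)

# idxmax picks only one of the tied pairs
top = pairs.value_counts().idxmax()
print(top)

# mode returns all tied pairs as a Series
ties = pairs.mode().tolist()
print(ties)
```

If ties matter for your use case, mode (or filtering value_counts for counts equal to its max) is the safer choice.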
