Count the most common combinations of values in a DataFrame column

Question

I have a DataFrame in the following form:

ID Product
1   A
1   B
2   A 
3   A
3   C 
3   D 
4   A
4   B

I would like to count the most common combinations of two values from the Product column, grouped by ID. For this example, the expected result would be:

Combination Count
A-B          2
A-C          1
A-D          1
C-D          1

Is it possible to get this output with pandas?
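To try the answers below, the sample data from the question can be built as a DataFrame named df, which is the name the answers assume. This is only a convenience sketch; the column names ID and Product come straight from the question.

```python
import pandas as pd

# The sample data from the question: one row per (ID, Product) pair.
df = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3, 3, 4, 4],
    'Product': ['A', 'B', 'A', 'A', 'C', 'D', 'A', 'B'],
})

print(df)
```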

Answer 1

We can merge the DataFrame with itself on ID and filter out the duplicate pairs (I assume you have a default RangeIndex). Then we sort each pair so that the grouping does not depend on the order of the products:

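import pandas as pd
import numpy as np

# df is the DataFrame from the question (columns ID and Product).
# Expose the row index as a column so each row can be identified after the merge.
df1 = df.reset_index()
# Pair every row with every other row of the same ID; keep each pair only once.
df1 = df1.merge(df1, on='ID').query('index_x > index_y')

# Sort each pair so (B, A) and (A, B) count as the same combination.
df1 = pd.DataFrame(np.sort(df1[['Product_x', 'Product_y']].to_numpy(), axis=1))
df1.groupby([*df1]).size()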

0  1
A  B    2
   C    1
   D    1
C  D    1
dtype: int64
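A sketch extending the merge approach above so the result uses the Combination/Count layout from the question, with each pair joined into an A-B string. It uses the question's sample data; the rename_axis / reset_index(name=...) step is a choice made here so the column names come out the same whether value_counts names its result after the input (older pandas) or "count" (pandas 2.x).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3, 3, 4, 4],
    'Product': ['A', 'B', 'A', 'A', 'C', 'D', 'A', 'B'],
})

# Self-merge on ID and keep each unordered pair of rows once.
indexed = df.reset_index()
pairs = indexed.merge(indexed, on='ID').query('index_x > index_y')

# Sort each pair so (B, A) and (A, B) become the same combination.
sorted_pairs = np.sort(pairs[['Product_x', 'Product_y']].to_numpy(), axis=1)

result = (
    pd.Series(['-'.join(p) for p in sorted_pairs])
    .value_counts()
    .rename_axis('Combination')
    .reset_index(name='Count')
)
print(result)
```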

Answer 2

You can use combinations from itertools along with groupby and apply:

from itertools import combinations

import pandas as pd

def get_combs(x):
    # All 2-element combinations of the products for one ID.
    return pd.DataFrame({'Combination': list(combinations(x.Product.values, 2))})

(df.groupby('ID').apply(get_combs)
 .reset_index(level=0)
 .groupby('Combination')
 .count()
)

             ID
Combination    
(A, B)        2
(A, C)        1
(A, D)        1
(C, D)        1
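Answers 2, 3 and 5 build pairs in the order the rows appear within each ID, so if one ID lists B before A, that pair comes out as (B, A) and is counted separately from (A, B); Answers 1 and 4 sort first. A minimal sketch of sorting each group's products before calling combinations, on hypothetical data (not from the question) chosen to show the difference:

```python
from itertools import combinations

import pandas as pd

# Hypothetical data: ID 1 lists B before A, ID 4 lists A before B.
df = pd.DataFrame({
    'ID': [1, 1, 4, 4],
    'Product': ['B', 'A', 'A', 'B'],
})

# Without sorting, the two IDs produce 'B-A' and 'A-B' as different pairs.
naive = pd.Series([
    '-'.join(pair)
    for _, products in df.groupby('ID')['Product']
    for pair in combinations(products, 2)
]).value_counts()

# Sorting each group's products first makes the pair order-independent.
counts = pd.Series([
    '-'.join(pair)
    for _, products in df.groupby('ID')['Product']
    for pair in combinations(sorted(products), 2)
]).value_counts()

print(naive)
print(counts)
```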

Answer 3

Use itertools.combinations, explode and value_counts:

import itertools

(df.groupby('ID').Product.agg(lambda x: list(itertools.combinations(x,2)))
                 .explode().str.join('-').value_counts())

Out[611]:
A-B    2
C-D    1
A-D    1
A-C    1
Name: Product, dtype: int64

Or:

import itertools

(df.groupby('ID').Product.agg(lambda x: list(map('-'.join, itertools.combinations(x,2))))
                 .explode().value_counts())

Out[597]:
A-B    2
C-D    1
A-D    1
A-C    1
Name: Product, dtype: int64
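Since the question asks for the most common combination, the counts can be filtered down to the top entries. A sketch on the question's data that reuses the second pipeline from Answer 3; comparing against counts.max() is a choice made here (not part of the answer) so that every combination tied for first place is kept, where head(1) would keep only one.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3, 3, 4, 4],
    'Product': ['A', 'B', 'A', 'A', 'C', 'D', 'A', 'B'],
})

# Answer 3's pipeline: one '-'-joined string per pair, exploded and counted.
# IDs with a single product yield an empty list, which explode turns into NaN
# and value_counts drops.
counts = (
    df.groupby('ID')['Product']
    .agg(lambda x: ['-'.join(p) for p in combinations(x, 2)])
    .explode()
    .value_counts()
)

# Keep every combination that ties for the highest count.
most_common = counts[counts == counts.max()]
print(most_common)
```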

Answer 4

Using itertools and Counter:

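import itertools
from collections import Counter

# All 2-element combinations of an already-sorted sequence of products.
agg_ = lambda x: tuple(itertools.combinations(x, 2))
# Sort each ID's products so pairs are order-independent, then flatten the pairs of all IDs.
product = list(itertools.chain(*df.groupby('ID').agg({'Product': lambda x: agg_(sorted(x))}).Product))
# You don't actually need to wrap product in list(); Counter accepts the iterator directly.
counts = Counter(product)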

Output:

Counter({('A', 'B'): 2, ('A', 'C'): 1, ('A', 'D'): 1, ('C', 'D'): 1})

You could also do the following to get a DataFrame:

pd.DataFrame(list(counts.items()), columns=['combination', 'count'])

  combination  count
0      (A, B)      2
1      (A, C)      1
2      (A, D)      1
3      (C, D)      1
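Counter also has a most_common(n) method that returns the n highest-count (item, count) pairs, which answers the "most common" part of the question directly. A sketch on the question's data that iterates over the groups directly and uses itertools.chain.from_iterable instead of going through agg and star-unpacking; note that most_common breaks ties by first-seen order, so it returns a single winner even when several combinations tie.

```python
from collections import Counter
from itertools import chain, combinations

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3, 3, 4, 4],
    'Product': ['A', 'B', 'A', 'A', 'C', 'D', 'A', 'B'],
})

# One iterator of sorted pairs per ID, chained into a single stream.
pairs = chain.from_iterable(
    combinations(sorted(products), 2)
    for _, products in df.groupby('ID')['Product']
)
counts = Counter(pairs)

# most_common(1) returns a list holding the single highest-count (pair, count).
print(counts.most_common(1))
```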

Answer 5

Another trick with the itertools.combinations function:

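from itertools import combinations
import pandas as pd

test_df = ... # your df
counts_df = (
    test_df.groupby('ID')['Product'].agg(lambda x: list(combinations(x, 2)))
    .apply(pd.Series)   # one column per pair; IDs with fewer pairs are padded with NaN
    .stack()            # back to a single Series of pairs
    .value_counts()
    # Name the columns explicitly so the header does not depend on the pandas version.
    .rename_axis('Combination')
    .reset_index(name='Count')
)
print(counts_df)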

The output:

  Combination  Count
0      (A, B)      2
1      (A, C)      1
2      (A, D)      1
3      (C, D)      1
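A different route, not taken by any of the answers: build an ID-by-Product indicator table with pd.crosstab and multiply its transpose by it, so entry (i, j) counts the IDs that contain both product i and product j. A sketch on the question's data; it assumes each pair should count at most once per ID (the clip(upper=1) call), and since the co-occurrence matrix is dense it only makes sense when the number of distinct products is modest.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3, 3, 4, 4],
    'Product': ['A', 'B', 'A', 'A', 'C', 'D', 'A', 'B'],
})

# ID x Product indicator table: 1 if the ID has that product, else 0.
onehot = pd.crosstab(df['ID'], df['Product']).clip(upper=1)

# Product x Product co-occurrence: entry (i, j) = number of IDs containing both.
co = onehot.T.dot(onehot)

# Read the strict upper triangle so each unordered pair appears once.
products = co.index
pairs = {
    f'{products[i]}-{products[j]}': int(co.iat[i, j])
    for i in range(len(products))
    for j in range(i + 1, len(products))
    if co.iat[i, j] > 0
}
print(pd.Series(pairs, name='Count'))
```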

5 answers

Answer 1: score 5, posted 2019-09-19 19:57:26
Answer 2: score 2, posted 2019-09-19 20:12:25
Answer 3: score 2, accepted, posted 2019-09-19 20:13:48
Answer 4: score 2, posted 2019-09-19 20:14:57
Answer 5: score 1, posted 2019-09-19 20:18:07
