Counting most common combination of values in dataframe column
I have a DataFrame in the following form:
ID Product
1 A
1 B
2 A
3 A
3 C
3 D
4 A
4 B
I would like to count the most common combinations of two values from the Product column, grouped by ID. So for this example the expected result would be:
Combination Count
A-B 2
A-C 1
A-D 1
C-D 1
Is this output possible with pandas?
We can merge within ID and filter out duplicate merges (I assume you have a default RangeIndex). Then we sort each pair so that the grouping is independent of order:
import pandas as pd
import numpy as np
df1 = df.reset_index()
df1 = df1.merge(df1, on='ID').query('index_x > index_y')
df1 = pd.DataFrame(np.sort(df1[['Product_x', 'Product_y']].to_numpy(), axis=1))
df1.groupby([*df1]).size()
0  1
A  B    2
   C    1
   D    1
C  D    1
dtype: int64
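To get from the grouped result above to the Combination / Count layout asked for in the question, the two sorted columns can be joined with a hyphen before counting. This is only a sketch under the assumption that the frame matches the sample data in the question; the names `pairs`, `labels` and `result` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3, 4, 4],
                   'Product': list('ABAACDAB')})

# Self-merge within ID, keeping each unordered pair of rows once.
df1 = df.reset_index()
pairs = df1.merge(df1, on='ID').query('index_x > index_y')

# Sort each pair so that (B, A) and (A, B) are counted together.
sorted_pairs = np.sort(pairs[['Product_x', 'Product_y']].to_numpy(), axis=1)

# Join each pair into an 'A-B' style label and count occurrences.
labels = pd.Series(['-'.join(p) for p in sorted_pairs], name='Combination')
result = (labels.value_counts()
                .rename_axis('Combination')
                .reset_index(name='Count'))
print(result)
```

Naming the index with `rename_axis` and the counts with `reset_index(name=...)` avoids relying on the default column labels, which differ between pandas versions.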
You can use combinations from itertools along with groupby and apply:
import pandas as pd
from itertools import combinations

def get_combs(x):
    # All 2-element combinations of the products within one ID.
    return pd.DataFrame({'Combination': list(combinations(x.Product.values, 2))})

(df.groupby('ID').apply(get_combs)
   .reset_index(level=0)
   .groupby('Combination')
   .count()
)
             ID
Combination
(A, B)        2
(A, C)        1
(A, D)        1
(C, D)        1
Use itertools.combinations, explode and value_counts:
import itertools
(df.groupby('ID').Product.agg(lambda x: list(itertools.combinations(x,2)))
.explode().str.join('-').value_counts())
Out[611]:
A-B 2
C-D 1
A-D 1
A-C 1
Name: Product, dtype: int64
Or:
import itertools
(df.groupby('ID').Product.agg(lambda x: list(map('-'.join, itertools.combinations(x,2))))
.explode().value_counts())
Out[597]:
A-B 2
C-D 1
A-D 1
A-C 1
Name: Product, dtype: int64
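Note that itertools.combinations emits pairs in the order the products appear within each ID, so a group that lists B before A would yield B-A rather than A-B. Here is a hedged sketch of one way to guard against that, assuming a product repeated within an ID should count only once. The `pairs_per_id` helper and the reordered rows for ID 4 are hypothetical and not taken from the question.

```python
import itertools
import pandas as pd

# Same IDs as the question, but ID 4 lists B before A (an assumed variation).
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3, 4, 4],
                   'Product': list('ABAACDBA')})

def pairs_per_id(products):
    # Deduplicate and sort so every pair has a single canonical spelling.
    unique_sorted = sorted(set(products))
    return ['-'.join(pair) for pair in itertools.combinations(unique_sorted, 2)]

counts = (df.groupby('ID')['Product'].agg(pairs_per_id)
            .explode()        # one row per pair; IDs without a pair become NaN
            .value_counts())  # NaN rows are dropped by default
print(counts)
```

If duplicates within an ID should be counted as separate pairs instead, dropping the `set(...)` call and keeping only `sorted(products)` would be the natural change.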
Using itertools and Counter:
import itertools
from collections import Counter
agg_ = lambda x: tuple(itertools.combinations(x, 2))
product = list(itertools.chain(*df.groupby('ID').agg({'Product': lambda x: agg_(sorted(x))}).Product))
# You actually do not need to wrap product with list. The generator is ok
counts = Counter(product)
Output:
Counter({('A', 'B'): 2, ('A', 'C'): 1, ('A', 'D'): 1, ('C', 'D'): 1})
You could also do the following to get a DataFrame:
pd.DataFrame(list(counts.items()), columns=['combination', 'count'])
combination count
0 (A, B) 2
1 (A, C) 1
2 (A, D) 1
3 (C, D) 1
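Since the question asks for the most common combination, Counter.most_common can rank the pairs directly. A minimal self-contained sketch, assuming the sample data from the question; the variable names `top_pair` and `top_count` are illustrative.

```python
import itertools
from collections import Counter

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3, 4, 4],
                   'Product': list('ABAACDAB')})

counts = Counter()
for _, products in df.groupby('ID')['Product']:
    # Sort within each ID so the pairs are order-independent.
    counts.update(itertools.combinations(sorted(products), 2))

# The single most frequent pair and how often it occurs.
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # ('A', 'B') 2
```

Passing a larger number to `most_common` returns that many pairs ranked by count; calling it with no argument returns all of them.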
Another trick with the itertools.combinations function:
from itertools import combinations
import pandas as pd
test_df = ... # your df
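counts_df = (
    test_df.groupby('ID')['Product']
    .agg(lambda x: list(combinations(x, 2)))
    .apply(pd.Series).stack().value_counts()
    # Name the index and the counts explicitly, so the column labels do not
    # depend on the pandas version (renaming 'index' and 0 only works before 2.0).
    .rename_axis('Combination').reset_index(name='Count')
)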
print(counts_df)
The output:
Combination Count
0 (A, B) 2
1 (A, C) 1
2 (A, D) 1
3 (C, D) 1