How to create a co-occurrence matrix of values that share the same groupby key in pandas?
So I basically have a DataFrame of products and orders:
product order
apple 111
orange 111
apple 121
beans 121
rice 131
orange 131
apple 141
orange 141
What I need to do is group the products by order ID and build a matrix holding the number of times each pair of products appeared together in the same order. I don't know an efficient way of doing this; could someone help?
apple orange beans rice
apple x 2 1 0
orange 2 x 0 1
beans 1 0 x 0
rice 0 1 0 x
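The counts in the expected matrix can be checked with a plain groupby-and-pairs pass. This is a minimal sketch, not one of the answers below. It assumes each product is counted at most once per order, rebuilds the sample data as `df`, and tallies unordered pairs with `itertools.combinations`. The name `pair_counts` is illustrative only.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "product": ["apple", "orange", "apple", "beans", "rice", "orange", "apple", "orange"],
    "order": [111, 111, 121, 121, 131, 131, 141, 141],
})

# For each order, count every unordered pair of distinct products in it
pair_counts = Counter()
for _, products in df.groupby("order")["product"]:
    for a, b in combinations(sorted(set(products)), 2):
        pair_counts[(a, b)] += 1

print(pair_counts)
```

Pairs that never share an order (such as beans and rice) are simply absent, which corresponds to the 0 entries in the matrix.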
One option is to merge the DataFrame with itself on order and then count the co-occurrences with crosstab on the two resulting product columns (product_x and product_y):
import pandas as pd
# Self-join on order, then count each (product_x, product_y) pair
df.merge(df, on='order').pipe(lambda d: pd.crosstab(d.product_x, d.product_y))
product_y apple beans orange rice
product_x
apple 3 1 2 0
beans 1 1 0 0
orange 2 0 3 1
rice 0 0 1 1
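Here the diagonal holds the number of orders containing each product (apple is in three orders), whereas the question shows an x there. A minimal sketch, assuming NaN is an acceptable stand-in for the x: it blanks the diagonal with `DataFrame.mask` and reorders rows and columns to the question's layout. `layout` is a hypothetical name for that ordering.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product": ["apple", "orange", "apple", "beans", "rice", "orange", "apple", "orange"],
    "order": [111, 111, 121, 121, 131, 131, 141, 141],
})

# Self-join on order and count each product pair (rows and columns both sorted)
co = df.merge(df, on="order").pipe(lambda d: pd.crosstab(d.product_x, d.product_y))

# Replace the diagonal (orders per product) with NaN
co = co.mask(np.eye(len(co), dtype=bool))

# Reorder rows and columns to match the layout in the question
layout = ["apple", "orange", "beans", "rice"]
co = co.loc[layout, layout]
print(co)
```

The mask relies on crosstab sorting both axes identically, so position (i, i) is always the same product on both axes.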
Another way is to compute a crosstab between product and order, then matrix-multiply it by its transpose with @:
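import pandas as pd
# Rows: products, columns: orders, values: times the product occurs in the order
a_ = pd.crosstab(df['product'], df['order'])
res = a_ @ a_.T  # (products x orders) @ (orders x products)
print(res)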
product apple beans orange rice
product
apple 3 1 2 0
beans 1 1 0 0
orange 2 0 3 1
rice 0 0 1 1
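Why the matrix product gives co-occurrences: a_ is a product-by-order count matrix A, where A[i, k] is the number of times product i appears in order k. Each entry of A times its transpose sums over the orders:

```latex
(AA^\top)_{ij} = \sum_{k} A_{ik}\,A_{jk}
```

When each product appears at most once per order, the term for order k is 1 exactly when products i and j both appear in it. The off-diagonal entries therefore count shared orders, and the diagonal entry for product i counts the orders containing it, which is why this result matches the merge/crosstab one.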
Or, with pipe, the same crosstab-and-multiply as a one-liner:
res = pd.crosstab(df['product'], df['order']).pipe(lambda x: x@x.T)
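Both answers assume a product is listed at most once per order. If the real data could repeat a product within one order (an assumption; the sample has no repeats), the counts would be inflated. A minimal sketch that caps each product/order count at 1 with `DataFrame.clip` before multiplying. The extra duplicate "apple" row is hypothetical, added only to show the effect.

```python
import pandas as pd

df = pd.DataFrame({
    # Sample data plus a hypothetical duplicate "apple" row in order 111
    "product": ["apple", "orange", "apple", "beans", "rice", "orange", "apple", "orange", "apple"],
    "order": [111, 111, 121, 121, 131, 131, 141, 141, 111],
})

# Cap each product/order count at 1 so repeats inside one order are ignored
incidence = pd.crosstab(df["product"], df["order"]).clip(upper=1)
res = incidence @ incidence.T
print(res)
```

Without the clip, apple would count 2 in order 111, so apple/orange would become 3 instead of 2.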
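Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.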