
Build Contingency Table in Python

I am trying to build a contingency table in Python using pandas. Here is what my data looks like in a pandas DataFrame:

InvoiceNo Item Quantity
123        a     1
123        b     2
123        c     1
124        a     1
124        d     3
125        c     1
125        b     2

I need to build a table that makes it easy to see which items were bought together, like the one below:

Items bought together:

   a  b  c  d
a  2  1  1  1
b  1  2  2  0
c  1  2  2  0
d  1  0  0  1

Here, each diagonal element is the frequency of that item across all invoices, and each off-diagonal element is the number of invoices that contain both items.

How can I build this structure efficiently?

Use DataFrame.merge to join the frame with itself on InvoiceNo, which pairs every item with every item on the same invoice (a cross join within each invoice), then count the pairs with pandas.crosstab. DataFrame.rename_axis cleans up the leftover index and column names:

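import pandas as pd
# Self-join on InvoiceNo: one row per (Item_x, Item_y) pair sharing an invoice
pairs = df.merge(df, on='InvoiceNo')
# Count each pair, then drop the 'Item_x' / 'Item_y' axis names
out = pd.crosstab(pairs['Item_x'], pairs['Item_y']).rename_axis(None).rename_axis(None, axis=1)
print(out)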
   a  b  c  d
a  2  1  1  1
b  1  2  2  0
c  1  2  2  0
d  1  0  0  1
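
For larger data the self-merge can produce a lot of intermediate rows, so a possible alternative is to build an invoice-by-item indicator matrix and multiply it by its own transpose. This is only a sketch of that idea, not part of the answer above: the names `basket` and `co` are made up for illustration, and the sample frame is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "InvoiceNo": [123, 123, 123, 124, 124, 125, 125],
    "Item": ["a", "b", "c", "a", "d", "c", "b"],
    "Quantity": [1, 2, 1, 1, 3, 1, 2],
})

# Invoice x item indicator matrix: 1 if the item appears on the invoice.
# clip(upper=1) is an assumption about the desired result: an item listed
# twice on one invoice is counted once for that invoice.
basket = pd.crosstab(df["InvoiceNo"], df["Item"]).clip(upper=1)

# (item x invoice) . (invoice x item) -> item x item co-occurrence counts
co = basket.T.dot(basket).rename_axis(None).rename_axis(None, axis=1)
print(co)
```

On the sample data this should give the same table as the merge approach. The two can differ when an item is repeated within one invoice: the self-merge counts every pair of rows, while the indicator matrix counts invoices.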
