Build Contingency Table in Python
I am trying to build a contingency table in Python using pandas. Here is what my data looks like in a pandas DataFrame:
InvoiceNo Item Quantity
123 a 1
123 b 2
123 c 1
124 a 1
124 d 3
125 c 1
125 b 2
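To make the example reproducible, the sample data above can be built as a DataFrame like this. The values are copied from the table; storing InvoiceNo and Quantity as integers and Item as strings is an assumption.

```python
import pandas as pd

# Sample data copied from the table above (one row per invoice line)
df = pd.DataFrame({
    'InvoiceNo': [123, 123, 123, 124, 124, 125, 125],
    'Item': ['a', 'b', 'c', 'a', 'd', 'c', 'b'],
    'Quantity': [1, 2, 1, 1, 3, 1, 2],
})
print(df)
```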
So I need to build a table from which I can easily see which items were bought together, like this:
Items bought together:
a b c d
a 2 1 1 1
b 1 2 2 0
c 1 2 2 0
d 1 0 0 1
Here, the diagonal elements give each item's frequency across all invoices, i.e. the number of invoices it appears on.
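As a quick sanity check on what the diagonal should contain, the number of distinct invoices per item can be computed directly with a groupby. This is only a sketch for verifying the expected diagonal on the sample data, not a way to build the whole table.

```python
import pandas as pd

df = pd.DataFrame({
    'InvoiceNo': [123, 123, 123, 124, 124, 125, 125],
    'Item': ['a', 'b', 'c', 'a', 'd', 'c', 'b'],
    'Quantity': [1, 2, 1, 1, 3, 1, 2],
})

# Distinct invoices each item appears on; should equal the table's diagonal
invoices_per_item = df.groupby('Item')['InvoiceNo'].nunique()
print(invoices_per_item)
```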
How can I build this structure efficiently?
Use DataFrame.merge to join the frame with itself on InvoiceNo (within each invoice this pairs every item with every item, i.e. a cross join), count the resulting pairs with crosstab, and clean up the index and column names with DataFrame.rename_axis:
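import pandas as pd
# Self-join on InvoiceNo: one row per (Item_x, Item_y) pair on the same invoice
df = df.merge(df, on='InvoiceNo')
# Count co-occurrences, then drop the 'Item_x'/'Item_y' axis names
df = pd.crosstab(df['Item_x'], df['Item_y']).rename_axis(None).rename_axis(None, axis=1)
print(df)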
a b c d
a 2 1 1 1
b 1 2 2 0
c 1 2 2 0
d 1 0 0 1
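An alternative sketch (not part of the answer above) builds an invoice-by-item indicator matrix with crosstab and multiplies it by its own transpose; entry (i, j) of the product is the number of invoices containing both item i and item j, so the diagonal is the number of invoices containing each item.

```python
import pandas as pd

df = pd.DataFrame({
    'InvoiceNo': [123, 123, 123, 124, 124, 125, 125],
    'Item': ['a', 'b', 'c', 'a', 'd', 'c', 'b'],
    'Quantity': [1, 2, 1, 1, 3, 1, 2],
})

# Invoice x item indicator matrix: 1 if the item appears on the invoice, else 0.
# clip(upper=1) keeps an item listed on several lines of one invoice at 1.
basket = pd.crosstab(df['InvoiceNo'], df['Item']).clip(upper=1)

# Item x item matrix: (i, j) = number of invoices containing both i and j
co = basket.T.dot(basket)
co = co.rename_axis(None).rename_axis(None, axis=1)
print(co)
```

One difference worth noting: the self-join counts pairs of invoice lines, so an item that appears on two lines of the same invoice would be counted more than once, whereas the clipped indicator matrix counts invoices. Which version is faster depends on the data; neither has been benchmarked here.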