

Compute overlap between columns of two DataFrames as a square co-occurrence matrix

I am looking for the overlap between two DataFrames, computed column by column.

import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})

Logic:

  • V1,X1 = 1 (b occurs once in X1; a and c do not occur)
  • V1,X2 = 1 (a occurs once in X2; b and c do not occur)
  • V1,X3 = 1 (c occurs once in X3; a and b do not occur)
  • V2,X1 = 2 (both d and e occur in X1)
  • V2,X2 = 0
  • V2,X3 = 1 (f only)
  • V3,X1 = 0
  • V3,X2 = 2 (both h and i occur in X2)
  • V3,X3 = 1 (g only)

The result should have one row per V column and one column per X column.
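As a reference point, the rule in the list can be written as a plain double loop. This is a minimal sketch, not the answer's method, and the helper name overlap_counts is hypothetical. For each pair of columns it takes every value of the V column, counts how often that value occurs in the X column, and sums those counts.

```python
import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})


def overlap_counts(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each (left column, right column) pair, sum how
    often each value of the left column occurs in the right column."""
    result = pd.DataFrame(0, index=left.columns, columns=right.columns)
    for v in left.columns:
        for x in right.columns:
            # Count occurrences of every left value in the right column and add them up
            result.loc[v, x] = sum(int((right[x] == value).sum()) for value in left[v])
    return result


print(overlap_counts(df1, df2))
```

This is easy to read, but it performs one pandas comparison per (value, column pair), so it gets slow as the frames grow.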

Expected result:

    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1

I have tried a couple of variations on set intersection while iterating over the columns, but this seems like the wrong path.

You can do this with an outer equality comparison in NumPy:

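import numpy as np
import pandas as pd
# Convert to NumPy arrays first; newer pandas versions do not support ufunc.outer on DataFrames
pd.DataFrame(np.equal.outer(df1.to_numpy(), df2.to_numpy()).sum(axis=(0, 2)),
             index=df1.columns,
             columns=df2.columns)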

    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1
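To see why summing over axes 0 and 2 produces this table, it helps to look at the shape of the intermediate array. The sketch below (the names a, b, eq and counts are only for illustration) builds the same 4-D comparison and, as a cross-check, rebuilds it with explicit broadcasting instead of np.equal.outer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})

a = df1.to_numpy()  # shape (rows of df1, columns of df1)
b = df2.to_numpy()  # shape (rows of df2, columns of df2)

# eq[i, j, k, l] is True when a[i, j] == b[k, l]
eq = np.equal.outer(a, b)
print(eq.shape)  # (3, 3, 3, 3): (df1 row, df1 column, df2 row, df2 column)

# Summing out both row axes (0 and 2) leaves one count per (df1 column, df2 column) pair
counts = eq.sum(axis=(0, 2))

# The same 4-D comparison written with explicit broadcasting instead of ufunc.outer
eq_broadcast = a[:, :, None, None] == b[None, None, :, :]
assert (eq == eq_broadcast).all()

print(pd.DataFrame(counts, index=df1.columns, columns=df2.columns))
```

Two consequences follow from this layout. Because every pair of cells is compared, a value that appears twice in a V column and once in an X column contributes 2, which matches the "occurs N times" counting in the question rather than a distinct-value set intersection. The intermediate array also holds rows1 × cols1 × rows2 × cols2 booleans, so memory use grows quickly for large frames.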

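Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.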

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM