
Compute overlap between columns of two DataFrames as a square co-occurrence matrix

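I am looking for the overlap between two DataFrames, column by column: for every pair of columns (one from each DataFrame), I want to count how many values they have in common.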

import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})

Logic:

  • V1,X1 = 1 (because b occurs once in X1, while a and c occur 0 times)
  • V1,X2 = 1 (a occurs once in X2, etc.)
  • V1,X3 = 1 (c occurs once in X3, etc.)
  • V2,X1 = 2 (d and e co-occur)
  • V2,X2 = 0
  • V2,X3 = 1 (f only)
  • V3,X1 = 0
  • V3,X2 = 2 (h and i co-occur)
  • V3,X3 = 1 (g)

The result should have one row per V column and one column per X column.

Expected result:

    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1

I have tried a couple of variations of intersection while iterating over the columns, but that seems like the wrong path.

You can do this with an outer equality comparison with NumPy:

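import numpy as np

# .to_numpy() avoids depending on how pandas dispatches ufunc.outer for DataFrames
pd.DataFrame(np.equal.outer(df1.to_numpy(), df2.to_numpy()).sum(axis=(0, 2)),
             index=df1.columns,
             columns=df2.columns)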

    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1
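
To see why summing over axes 0 and 2 gives the right counts, here is a minimal sketch that builds the same four-dimensional comparison with explicit broadcasting and then cross-checks it against a plain column-by-column loop. The names a, b, eq, overlap and check are illustrative only. The loop uses Series.isin, which counts each value of the V column at most once, so the two results are only guaranteed to agree when the values within each X column are unique, as they are in the example data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})

a = df1.to_numpy()  # shape (rows of df1, columns of df1)
b = df2.to_numpy()  # shape (rows of df2, columns of df2)

# eq[i, j, k, l] is True when df1.iloc[i, j] == df2.iloc[k, l].
# This has the same axis layout as np.equal.outer(a, b).
eq = a[:, :, None, None] == b[None, None, :, :]

# Axes 0 and 2 are the row axes of df1 and df2. Summing over them leaves
# one count per (df1 column, df2 column) pair.
overlap = pd.DataFrame(eq.sum(axis=(0, 2)), index=df1.columns, columns=df2.columns)

# Cross-check with an explicit loop (assumes values are unique within each X column).
check = pd.DataFrame(
    {x: [df1[v].isin(df2[x]).sum() for v in df1.columns] for x in df2.columns},
    index=df1.columns,
)

print(overlap)
print(check)
```

The broadcast version allocates rows1 × cols1 × rows2 × cols2 booleans, so for large frames the loop over column pairs may be the more memory-friendly choice.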

