Compute overlap between columns of two DataFrames as a square co-occurrence matrix

Question

I am looking for the overlap between two dataframes, column by column.

import pandas as pd
df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})

Logic:

V1,X1 = 1 (b occurs once in X1; a and c do not occur)
V1,X2 = 1 (a occurs once in X2; b and c do not occur)
V1,X3 = 1 (c occurs once in X3; a and b do not occur)
V2,X1 = 2 (d and e co-occur)
V2,X2 = 0
V2,X3 = 1 (f only)
V3,X1 = 0
V3,X2 = 2 (h and i co-occur)
V3,X3 = 1 (g)

with one row per V and Xs as columns.

Expected result:

    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1

I have tried a couple of variations of intersection while iterating over the columns, but that seems like the wrong path.

Answer 1

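You can do this with an outer equality comparison in NumPy. np.equal.outer compares every cell of df1 with every cell of df2, and summing over the two row axes leaves one count per pair of columns:

import numpy as np

# to_numpy() hands plain arrays to the ufunc, so the result does not depend on
# how the installed pandas version handles ufuncs on DataFrames
pd.DataFrame(np.equal.outer(df1.to_numpy(), df2.to_numpy()).sum(axis=(0, 2)),
             index=df1.columns,
             columns=df2.columns)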

    X1  X2  X3
V1   1   1   1
V2   2   0   1
V3   0   2   1
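
To see what the outer comparison is doing, here is a minimal sketch that spells out the same computation with explicit broadcasting and cross-checks it against a plain set intersection per column pair. The names a, b, eq, pair_counts and set_counts are illustrative only and are not part of the original answer. Note the assumption: the two results agree here because no value repeats within a column. With duplicates, the outer comparison counts every matching pair of cells, while the set version counts distinct shared values.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'V1': ['a', 'b', 'c'], 'V2': ['d', 'e', 'f'], 'V3': ['g', 'h', 'i']})
df2 = pd.DataFrame({'X1': ['e', 'b', 'd'], 'X2': ['a', 'h', 'i'], 'X3': ['c', 'f', 'g']})

a = df1.to_numpy()  # shape (rows of df1, columns of df1)
b = df2.to_numpy()  # shape (rows of df2, columns of df2)

# eq[i, j, k, l] is True when df1 cell (row i, column j) equals df2 cell (row k, column l).
# This has the same layout as np.equal.outer(a, b).
eq = a[:, :, None, None] == b[None, None, :, :]

# Summing over both row axes (0 and 2) leaves one count per (df1 column, df2 column) pair.
pair_counts = pd.DataFrame(eq.sum(axis=(0, 2)), index=df1.columns, columns=df2.columns)

# Set-based cross-check: number of distinct values shared by each pair of columns.
set_counts = pd.DataFrame(
    [[len(set(df1[v]) & set(df2[x])) for x in df2.columns] for v in df1.columns],
    index=df1.columns,
    columns=df2.columns,
)

print(pair_counts)
print(set_counts)
```

The broadcasted array has one entry per pair of cells, so memory grows with the product of the two frame sizes. For large frames the set-based loop may be the more practical choice.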

1 answer

solution1
2 ACCEPTED 2020-04-26 20:43:23