Create a matrix from two columns

Question

I'm trying to create a matrix from two columns in an Excel sheet. The first column is a key that repeats across rows, and the second column holds the values tied to each key. I'd like to build a matrix over all the values in the second column that counts how many times each pair of values appears together under the same key.

   a                b
   1               red
   1               blue
   1               green
   2               yellow
   2               red
   3               blue
   3               green
   3               yellow

and I'd like to turn this sample dataframe into

color      red   blue   yellow   green
red         0      1       1       1
blue        1      0       1       2
yellow      1      1       0       1
green       1      2       1       0

Essentially, I want to use column a as a groupby() key to segment the rows, then keep a running tally of the pairings encountered within each group. I can't quite figure out how to implement this with a pivot table or a crosstab (if that's even the best route).

Answer 1

import numpy as np
import pandas as pd


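# df is assumed to be the question's sample dataframe with columns 'a' and 'b'
s = pd.crosstab(df.a, df.b)  # rows: keys in a, columns: colors in b, cells: counts
s = s.T @ s  # for each color pair, sum over keys the product of their counts
# Zero the self-pairs on the diagonal in place; this assumes s.values is writable
# (it is read-only when pandas copy-on-write is enabled)
np.fill_diagonal(s.values, 0)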

print(s)

b       blue  green  red  yellow
b                               
blue       0      2    1       1
green      2      0    1       1
red        1      1    0       1
yellow     1      1    1       0
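In case it helps to see why the dot product gives pair counts, here is a minimal self-contained sketch using the sample data from the question. The names indicator, co and result are my own, not from the answer. Each cell of indicator.T @ indicator sums, over all keys, the product of the two colors' counts, which is the number of keys containing both colors as long as no (a, b) pair repeats. The diagonal is zeroed on an explicit copy so the sketch does not depend on the frame's underlying array being writable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3, 3, 3],
    "b": ["red", "blue", "green", "yellow", "red", "blue", "green", "yellow"],
})

# One row per key, one column per color; each cell counts occurrences.
indicator = pd.crosstab(df["a"], df["b"])

# co.loc[x, y] = sum over keys of count(x in key) * count(y in key)
co = indicator.T @ indicator

# Zero the diagonal on a writable copy, then rebuild the labelled frame.
values = co.to_numpy().copy()
np.fill_diagonal(values, 0)
result = pd.DataFrame(
    values,
    index=co.index.rename(None),
    columns=co.columns.rename(None),
)
print(result)
```

Before the diagonal is zeroed, co.loc[x, x] is the number of keys containing color x, which can be handy if you also want per-color totals.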

Answer 2

This looks like a self-join on the key column, so I went with that:

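import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 3, 3, 3],
                   'b': ['red', 'blue', 'green', 'yellow', 'red', 'blue', 'green', 'yellow']})

# Self-merge on the key, count each (b_x, b_y) pair, then pivot into a square matrix
df_count = (
    df.merge(df, on='a')
      .groupby(['b_x', 'b_y']).count()
      .reset_index()
      .pivot(index='b_x', columns='b_y', values='a')
)
np.fill_diagonal(df_count.values, 0)  # zero the self-pairs on the diagonal

df_count.index.name = 'color'
df_count.columns.name = None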

        blue  green  red  yellow
color
blue       0      2    1       1
green      2      0    1       1
red        1      1    0       1
yellow     1      1    1       0
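As a variation on the same self-merge idea (a sketch of my own, not part of the answer above), the self-pairs can be filtered out before counting, and pd.crosstab can do the counting and pivoting in one step. The names merged, pairs and counts are just illustrative. crosstab fills pairs that never occur with 0, so the diagonal comes out as zeros without writing into the underlying array.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3, 3, 3],
    "b": ["red", "blue", "green", "yellow", "red", "blue", "green", "yellow"],
})

# Pair every color with every color that shares the same key.
merged = df.merge(df, on="a")

# Drop the rows where a color is paired with itself.
pairs = merged[merged["b_x"] != merged["b_y"]]

# Count each remaining (b_x, b_y) combination; absent combinations become 0.
counts = pd.crosstab(pairs["b_x"], pairs["b_y"])
counts.index.name = "color"
counts.columns.name = None
print(counts)
```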

Answer 3

Use how='cross' as the how parameter of pd.merge (available since pandas 1.2). I assume there are no duplicate ('a', 'b') pairs, such as two (1, red) rows.

out = (
    pd.merge(df, df, how='cross')  # pair every row with every row
      .query('a_x == a_y and b_x != b_y')[['b_x', 'b_y']]  # same key, different colors
      .assign(dummy=1)
      .pivot_table(values='dummy', index='b_x', columns='b_y', aggfunc='count', fill_value=0)
      .rename_axis(index=None, columns=None)
)
print(out)

# Output:
        blue  green  red  yellow
blue       0      2    1       1
green      2      0    1       1
red        1      1    0       1
yellow     1      1    1       0
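To illustrate why the no-duplicates assumption matters, here is a small sketch with a made-up frame (df_dup is hypothetical data, and pair_counts is a helper name of my own) in which (1, red) appears twice. Without deduplication the repeated row inflates the red/blue count; dropping duplicate (a, b) rows first restores the one-per-key count.

```python
import pandas as pd

# Hypothetical data: the pair (1, "red") is duplicated on purpose.
df_dup = pd.DataFrame({
    "a": [1, 1, 1, 1, 2, 2],
    "b": ["red", "red", "blue", "green", "yellow", "red"],
})


def pair_counts(frame):
    """Count color pairs sharing a key via a cross merge (pandas >= 1.2)."""
    pairs = pd.merge(frame, frame, how="cross").query("a_x == a_y and b_x != b_y")
    return pd.crosstab(pairs["b_x"], pairs["b_y"])


raw = pair_counts(df_dup)
deduped = pair_counts(df_dup.drop_duplicates(subset=["a", "b"]))

print(raw.loc["red", "blue"])      # inflated by the duplicate row
print(deduped.loc["red", "blue"])  # one count per shared key
```

Note that a cross merge builds len(df) ** 2 rows before filtering, so on large inputs merging on 'a' directly is likely to be cheaper.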


3 answers

solution1
1 2021-12-15 22:45:56

solution2
0 2021-12-15 22:25:51

solution3
0 2021-12-15 22:28:34
