Compare the values of index with column names; Python Pandas

Question

Hi, I have two DataFrames, created as shown below.

df1 = pd.DataFrame.from_dict(({"Column":{"0":"A","1":"B","2":"C","3":"A"},"Column2":{"0":"T1","1":"T2","2":"T1","3":"T1"}}))

Then I created another DataFrame using the statement below:

df2 = pd.DataFrame(np.zeros(shape=(df1.shape[0],df1.shape[0])), columns=df1['Column'].values, index=df1['Column'].values)

Now I need to update df2 as follows: if the index label equals the column label, assign the value 1. If they are not equal, look up both labels in df1: if their Column2 values match, assign 2, otherwise assign 3.

Expected result:

Can we achieve it without using for loops?

Note: The shape and values of df1 can be different every time.

Answer 1

Use:

import numpy as np
import pandas as pd

# STEP 1
df1 = df1.set_index(df1['Column'] + '_' + df1.groupby('Column').cumcount().astype(str))
df2 = pd.DataFrame(np.zeros(shape=(df1.shape[0],df1.shape[0])), columns=df1.index, index=df1.index)

# STEP 2
df2 = df2.reset_index().melt('index', var_name='column')

# STEP 3:
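# regex=True is required since pandas 2.0, where str.replace defaults to literal matching
m1 = df2['index'].str.replace(r'(_\d+)$', '', regex=True).eq(df2['column'].str.replace(r'(_\d+)$', '', regex=True))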

# STEP 4
# DataFrame.lookup was removed in pandas 2.0; Series.map does the same label lookup here
m2 = df2['index'].map(df1['Column2']).eq(df2['column'].map(df1['Column2']))

# STEP 5
df2['value'] = np.select([m1, m2], [1, 2], 3)

# STEP 6:
df2 = df2.pivot(index='index', columns='column', values='value').rename_axis(index=None, columns=None)

# STEP 7: RESULT
df2 = df2.reindex(index=df1.index, columns=df1.index)
df2.index = df2.index.str.replace(r'(_\d+)$', '', regex=True)
df2.columns = df2.columns.str.replace(r'(_\d+)$', '', regex=True)

STEPS:

STEP 1: Because the original dataframe contains duplicate values in Column, group by Column, use cumcount to number the repeats, and concatenate that counter with df1['Column'] to create a unique index for df1. Then initialise the new dataframe df2 from this unique index.

# STEP 1
# print(df2)
     A_0  B_0  C_0  A_1
A_0  0.0  0.0  0.0  0.0
B_0  0.0  0.0  0.0  0.0
C_0  0.0  0.0  0.0  0.0
A_1  0.0  0.0  0.0  0.0
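
As a minimal sketch of how the unique labels in STEP 1 come about (assuming the same data as in the question, but built with a default integer index for brevity), cumcount numbers each repeat of a label within its group:

```python
import pandas as pd

# Same values as the question's df1, with a default integer index (an assumption for brevity)
df1 = pd.DataFrame({"Column": ["A", "B", "C", "A"],
                    "Column2": ["T1", "T2", "T1", "T1"]})

# cumcount numbers the occurrences of each label within its group: A->0, B->0, C->0, A->1
counter = df1.groupby("Column").cumcount()

# Appending the counter turns the duplicated label "A" into "A_0" and "A_1"
unique_labels = df1["Column"] + "_" + counter.astype(str)
print(unique_labels.tolist())  # ['A_0', 'B_0', 'C_0', 'A_1']
```

The unique labels matter because the pivot in STEP 6 cannot reshape rows that share the same index/column pair.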

STEP 2: Use DataFrame.melt to unpivot the dataframe.

# STEP 2
# print(df2)
   index column  value
0    A_0    A_0    0.0
1    B_0    A_0    0.0
2    C_0    A_0    0.0
3    A_1    A_0    0.0
4    A_0    B_0    0.0
5    B_0    B_0    0.0
6    C_0    B_0    0.0
7    A_1    B_0    0.0
8    A_0    C_0    0.0
9    B_0    C_0    0.0
10   C_0    C_0    0.0
11   A_1    C_0    0.0
12   A_0    A_1    0.0
13   B_0    A_1    0.0
14   C_0    A_1    0.0
15   A_1    A_1    0.0
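
To see the unpivot in isolation, here is a small sketch on a hypothetical 2x2 frame (the frame named wide is made up for illustration). reset_index moves the unnamed index into a column called 'index', and melt then produces one row per (index, column) pair:

```python
import pandas as pd

# Hypothetical 2x2 frame standing in for df2
wide = pd.DataFrame([[0, 0], [0, 0]],
                    index=["A_0", "B_0"], columns=["A_0", "B_0"])

# An unnamed index becomes a column called "index" after reset_index;
# melt keeps it as the identifier and stacks the remaining columns
long_df = wide.reset_index().melt("index", var_name="column")
print(long_df)
```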

STEP 3: Use Series.eq to create a boolean mask m1, which is True where the index label in df2 equals the column label in df2 once the counter suffix has been stripped.

# STEP 3
# print(m1)
[True, False, False, True, False, True, False, False, False, False, True, False, True, False, False, True]
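
A short sketch of the comparison in STEP 3, assuming pandas 2.0 or later, where Series.str.replace treats the pattern literally unless regex=True is passed. The two small Series below are illustrative stand-ins for the first four rows of df2:

```python
import pandas as pd

# Stand-ins for the first four rows of df2['index'] and df2['column']
index_labels = pd.Series(["A_0", "B_0", "C_0", "A_1"])
column_labels = pd.Series(["A_0", "A_0", "A_0", "A_0"])

# Strip the trailing "_<number>" counter; regex=True is needed for the pattern to act as a regex
index_base = index_labels.str.replace(r"_\d+$", "", regex=True)
column_base = column_labels.str.replace(r"_\d+$", "", regex=True)

# Series.eq compares element-wise, so A_1 matches A_0 once the counter is gone
m1 = index_base.eq(column_base)
print(m1.tolist())  # [True, False, False, True]
```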

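STEP 4: Use DataFrame.lookup (deprecated in pandas 1.2 and removed in 2.0; mapping the labels through df1['Column2'] with Series.map gives the same values) to create a boolean mask m2, which is True where the df1['Column2'] values for the index label and the column label of df2 match.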

# STEP 4
# print(m2)
[True, False, True, True, False, True, False, False, True, False, True, True, True, False, True, True]
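
On pandas versions without DataFrame.lookup, the same mask can be sketched with Series.map, which looks each label up in the index of the Series passed to it. The names column2 and pairs are illustrative and not part of the original answer:

```python
import pandas as pd

# Column2 values keyed by the unique labels from STEP 1
column2 = pd.Series(["T1", "T2", "T1", "T1"],
                    index=["A_0", "B_0", "C_0", "A_1"])

# A few (index, column) pairs as they appear after the melt
pairs = pd.DataFrame({"index": ["A_0", "B_0", "C_0"],
                      "column": ["A_0", "A_0", "A_0"]})

# Map each label to its Column2 value, then compare the two sides element-wise
left = pairs["index"].map(column2)
right = pairs["column"].map(column2)
m2 = left.eq(right)
print(m2.tolist())  # [True, False, True]
```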

STEP 5: Use np.select to choose elements from [1, 2] based on conditions from [m1, m2] otherwise choose the default value 3 .

# STEP 5
# print(df2)
   index column  value
0    A_0    A_0      1
1    B_0    A_0      3
2    C_0    A_0      2
3    A_1    A_0      1
4    A_0    B_0      3
5    B_0    B_0      1
6    C_0    B_0      3
7    A_1    B_0      3
8    A_0    C_0      2
9    B_0    C_0      3
10   C_0    C_0      1
11   A_1    C_0      2
12   A_0    A_1      1
13   B_0    A_1      3
14   C_0    A_1      2
15   A_1    A_1      1
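
The order of the conditions matters in np.select: where several conditions are True, the first one in the list wins. A minimal sketch using the first four entries of each mask:

```python
import numpy as np

m1 = np.array([True, False, False, True])   # labels equal
m2 = np.array([True, False, True, True])    # Column2 values equal

# Where both masks are True, m1 takes precedence and yields 1;
# positions matching neither condition receive the default 3
values = np.select([m1, m2], [1, 2], default=3)
print(values.tolist())  # [1, 3, 2, 1]
```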

STEP 6: Use DataFrame.pivot to reshape dataframe based on index and column values.

# STEP 6:
# print(df2)
     A_0  A_1  B_0  C_0
A_0    1    1    3    2
A_1    1    1    3    2
B_0    3    3    1    3
C_0    2    2    3    1
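
Since pandas 2.0 the arguments of DataFrame.pivot are keyword-only, so the call has to name index, columns and values. A small sketch on a made-up long-format frame:

```python
import pandas as pd

# Hypothetical long-format data: one row per (index, column) pair
long_df = pd.DataFrame({"index": ["x", "y", "x", "y"],
                        "column": ["p", "p", "q", "q"],
                        "value": [1, 2, 3, 4]})

# Keyword arguments work on both older and newer pandas versions
wide = long_df.pivot(index="index", columns="column", values="value")

# Drop the axis names so the frame prints without "index"/"column" headers
wide = wide.rename_axis(index=None, columns=None)
print(wide)
```

Note that pivot sorts the labels, which is why STEP 7 reindexes the result back to the original order.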

STEP 7: Use DataFrame.reindex to rearrange the index and columns of df2 according to the index of df1. Then use Index.str.replace to remove the counter suffix that was added to the index and columns in STEP 1.

# STEP 7: RESULT
# print(df2)
   A  B  C  A
A  1  3  2  1
B  3  1  3  3
C  2  3  1  2
A  1  3  2  1
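
For comparison, the same matrix can also be built without the melt/pivot round trip. This is a different technique from the answer above, namely NumPy broadcasting, and only a sketch: it assumes df1 has the two columns from the question and uses a default integer index.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Column": ["A", "B", "C", "A"],
                    "Column2": ["T1", "T2", "T1", "T1"]})

labels = df1["Column"].to_numpy()
groups = df1["Column2"].to_numpy()

# Comparing a column vector with a row vector yields an n x n boolean matrix
same_label = labels[:, None] == labels[None, :]
same_group = groups[:, None] == groups[None, :]

# Same precedence as in STEP 5: equal labels -> 1, equal Column2 -> 2, otherwise 3
result = pd.DataFrame(np.select([same_label, same_group], [1, 2], default=3),
                      index=labels, columns=labels)
print(result)
```

Because the comparison works on positions rather than labels, duplicate labels need no temporary counter suffix here.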