Hi, I have two DataFrames, as given below.
df1 = pd.DataFrame.from_dict(({"Column":{"0":"A","1":"B","2":"C","3":"A"},"Column2":{"0":"T1","1":"T2","2":"T1","3":"T1"}}))
Then I created another DataFrame using the statement below:
df2 = pd.DataFrame(np.zeros(shape=(df1.shape[0],df1.shape[0])), columns=df1['Column'].values, index=df1['Column'].values)
Now I need to update df2 as follows: if the index label equals the column label, assign the value 1. If they are not equal, look up both labels in df1: if their Column2 values match, assign 2, otherwise assign 3.
Expected result:
Can we achieve this without using for loops?
Note: The shape and values of df1 can be different every time.
Use:
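import numpy as np
import pandas as pd

# STEP 1: make the labels unique (A_0, B_0, C_0, A_1), then build the zero matrix
df1 = df1.set_index(df1['Column'] + '_' + df1.groupby('Column').cumcount().astype(str))
df2 = pd.DataFrame(np.zeros(shape=(df1.shape[0], df1.shape[0])), columns=df1.index, index=df1.index)
# STEP 2: unpivot to one row per (index, column) pair
df2 = df2.reset_index().melt('index', var_name='column')
# STEP 3: True where the labels match once the counter suffix is stripped
# (regex=True is needed in pandas >= 2.0, where str.replace defaults to literal matching)
m1 = df2['index'].str.replace(r'(_\d+)$', '', regex=True).eq(df2['column'].str.replace(r'(_\d+)$', '', regex=True))
# STEP 4: True where both labels share the same Column2 value
# (DataFrame.lookup was removed in pandas 2.0; Series.map does the same label lookup here)
m2 = df2['index'].map(df1['Column2']).eq(df2['column'].map(df1['Column2']))
# STEP 5: m1 takes priority over m2; everything else gets 3
df2['value'] = np.select([m1, m2], [1, 2], 3)
# STEP 6: pivot takes keyword-only arguments in pandas >= 2.0
df2 = df2.pivot(index='index', columns='column', values='value').rename_axis(index=None, columns=None)
# STEP 7: RESULT
df2 = df2.reindex(index=df1.index, columns=df1.index)
df2.index = df2.index.str.replace(r'(_\d+)$', '', regex=True)
df2.columns = df2.columns.str.replace(r'(_\d+)$', '', regex=True)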
STEPS:
STEP 1: As the original dataframe contains duplicate values in Column, we can use df.groupby on Column with cumcount and concatenate the result with df1['Column'] to create a unique index in df1. Then we can initialise the new dataframe df2 from the index of df1.
# STEP 1
# print(df2)
     A_0  B_0  C_0  A_1
A_0  0.0  0.0  0.0  0.0
B_0  0.0  0.0  0.0  0.0
C_0  0.0  0.0  0.0  0.0
A_1  0.0  0.0  0.0  0.0
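To see what the counter in STEP 1 does in isolation, here is a minimal sketch. It rebuilds df1 with a default integer index for brevity (an assumption; the question uses string keys), and the names counter and unique_labels are only for illustration.

```python
import pandas as pd

# Simplified copy of the question's df1 (default integer index assumed).
df1 = pd.DataFrame({"Column": ["A", "B", "C", "A"],
                    "Column2": ["T1", "T2", "T1", "T1"]})

# cumcount numbers each row within its group: the first "A" gets 0, the second gets 1.
counter = df1.groupby("Column").cumcount()

# Appending the counter turns repeated labels into distinct ones.
unique_labels = df1["Column"] + "_" + counter.astype(str)

print(counter.tolist())        # [0, 0, 0, 1]
print(unique_labels.tolist())  # ['A_0', 'B_0', 'C_0', 'A_1']
```

Unique labels matter because the later pivot step cannot reshape a frame whose index/column pairs contain duplicates.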
STEP 2: Use DataFrame.melt to unpivot the dataframe from wide to long format, giving one row per (index, column) pair.
# STEP 2
# print(df2)
   index column  value
0    A_0    A_0    0.0
1    B_0    A_0    0.0
2    C_0    A_0    0.0
3    A_1    A_0    0.0
4    A_0    B_0    0.0
5    B_0    B_0    0.0
6    C_0    B_0    0.0
7    A_1    B_0    0.0
8    A_0    C_0    0.0
9    B_0    C_0    0.0
10   C_0    C_0    0.0
11   A_1    C_0    0.0
12   A_0    A_1    0.0
13   B_0    A_1    0.0
14   C_0    A_1    0.0
15   A_1    A_1    0.0
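The unpivot in STEP 2 is easier to follow on a smaller frame. This is a sketch on a hypothetical 2x2 matrix (the names wide and long are made up for the example), using the same reset_index().melt(...) call as the answer.

```python
import pandas as pd

# Hypothetical 2x2 zero matrix with the same kind of labels as df2.
wide = pd.DataFrame([[0.0, 0.0], [0.0, 0.0]],
                    index=["A_0", "B_0"], columns=["A_0", "B_0"])

# reset_index() moves the unnamed index into a column called "index";
# melt then stacks the remaining columns into (column, value) pairs.
long = wide.reset_index().melt("index", var_name="column")

print(long)
```

Each of the 2 x 2 cells becomes one row, so an n x n matrix melts into n * n rows, walked column by column.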
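STEP 3: Using Series.str.replace to strip the counter suffix and Series.eq to compare element-wise, create a boolean mask m1, which corresponds to the condition where index in df2 equals column in df2. (Series.equals would not work here, as it returns a single boolean for the whole series.)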
# STEP 3
# print(m1)
[True, False, False, True, False, True, False, False, False, False, True, False, True, False, False, True]
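A small sketch of the comparison in STEP 3, on made-up labels. The helper strip_counter is hypothetical and only wraps the same str.replace call; regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

idx = pd.Series(["A_0", "B_0", "A_1"])
col = pd.Series(["A_1", "A_0", "A_0"])

def strip_counter(s):
    # Remove the trailing "_<digits>" counter added in STEP 1.
    return s.str.replace(r"_\d+$", "", regex=True)

# eq compares element by element and returns a boolean Series.
m = strip_counter(idx).eq(strip_counter(col))
print(m.tolist())  # [True, False, True]

# equals checks whether the two Series are identical as a whole.
whole = strip_counter(idx).equals(strip_counter(col))
print(whole)  # False
```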
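STEP 4: Use DataFrame.lookup (deprecated in pandas 1.2 and removed in 2.0; mapping each label through df1['Column2'] with Series.map gives the same values) to create a boolean mask m2, which corresponds to the condition where the df1['Column2'] values for the index and column of df2 match.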
# STEP 4
# print(m2)
[True, False, True, True, False, True, False, False, True, False, True, True, True, False, True, True]
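Since DataFrame.lookup is no longer available in pandas 2.0 and later, here is a sketch of the same label lookup done with Series.map. The frame pairs and the series column2 are illustrative stand-ins for the melted df2 and for df1['Column2'] after STEP 1.

```python
import pandas as pd

# Stand-in for df1['Column2'] indexed by the unique labels from STEP 1.
column2 = pd.Series(["T1", "T2", "T1", "T1"],
                    index=["A_0", "B_0", "C_0", "A_1"])

# Stand-in for a few rows of the melted df2.
pairs = pd.DataFrame({"index": ["A_0", "B_0", "C_0"],
                      "column": ["C_0", "A_0", "A_1"]})

# map looks each label up in column2's index and returns the matching value.
left = pairs["index"].map(column2)
right = pairs["column"].map(column2)

m2 = left.eq(right)
print(m2.tolist())  # [True, False, True]
```

This relies on the labels being unique, which is what STEP 1 guarantees.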
STEP 5: Use np.select to choose elements from [1, 2] based on the conditions in [m1, m2], otherwise choose the default value 3. Where both masks are True, the first one wins, so matching labels always get 1.
# STEP 5
# print(df2)
   index column  value
0    A_0    A_0      1
1    B_0    A_0      3
2    C_0    A_0      2
3    A_1    A_0      1
4    A_0    B_0      3
5    B_0    B_0      1
6    C_0    B_0      3
7    A_1    B_0      3
8    A_0    C_0      2
9    B_0    C_0      3
10   C_0    C_0      1
11   A_1    C_0      2
12   A_0    A_1      1
13   B_0    A_1      3
14   C_0    A_1      2
15   A_1    A_1      1
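The order of the conditions in STEP 5 matters. A minimal sketch with hand-written masks (not the ones from the answer) shows how np.select resolves overlaps:

```python
import numpy as np

m1 = np.array([True, False, False, True])
m2 = np.array([True, True, False, False])

# Conditions are checked in order: where m1 is True the result is 1,
# else where m2 is True it is 2, else the default 3.
values = np.select([m1, m2], [1, 2], default=3)
print(values.tolist())  # [1, 2, 3, 1]
```

The first element has both masks set and still gets 1, which is why a label compared with itself never ends up as 2.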
STEP 6: Use DataFrame.pivot to reshape the dataframe based on the index and column values.
# STEP 6:
# print(df2)
     A_0  A_1  B_0  C_0
A_0    1    1    3    2
A_1    1    1    3    2
B_0    3    3    1    3
C_0    2    2    3    1
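Note that the pivoted labels come out sorted (A_0, A_1, B_0, C_0) rather than in the original order. A sketch on a hypothetical four-row long frame, using keyword arguments as pandas 2.0 and later require:

```python
import pandas as pd

# Hypothetical long-format data, deliberately listed with B_0 first.
long = pd.DataFrame({"index":  ["B_0", "A_0", "B_0", "A_0"],
                     "column": ["B_0", "B_0", "A_0", "A_0"],
                     "value":  [1, 3, 3, 1]})

wide = long.pivot(index="index", columns="column", values="value")

# Drop the axis names ("index" / "column") that pivot leaves behind.
wide = wide.rename_axis(index=None, columns=None)

print(wide.index.tolist())    # ['A_0', 'B_0']
print(wide.columns.tolist())  # ['A_0', 'B_0']
```

Because the input order is lost here, the original row and column order has to be restored separately afterwards.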
STEP 7: Use DataFrame.reindex to rearrange the index and columns of df2 according to the index of df1, since the pivot in STEP 6 sorted them. Then, using str.replace on the index and columns, remove the counter portion that was added in STEP 1.
# STEP 7: RESULT
# print(df2)
   A  B  C  A
A  1  3  2  1
B  3  1  3  3
C  2  3  1  2
A  1  3  2  1
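As a cross-check of the result above, the same matrix can be computed with NumPy broadcasting. This is not the approach taken in the steps above; it is a sketch of an alternative technique, assuming df1 has the original Column and Column2 columns (rebuilt here with a default index) and with the names labels, groups and result chosen for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Column": ["A", "B", "C", "A"],
                    "Column2": ["T1", "T2", "T1", "T1"]})

labels = df1["Column"].to_numpy(dtype=object)
groups = df1["Column2"].to_numpy(dtype=object)

# Comparing a column vector with a row vector yields an n x n boolean matrix.
same_label = labels[:, None] == labels[None, :]
same_group = groups[:, None] == groups[None, :]

# Same priority as STEP 5: equal labels -> 1, equal Column2 -> 2, else 3.
result = pd.DataFrame(np.select([same_label, same_group], [1, 2], default=3),
                      index=labels, columns=labels)
print(result)
```

This avoids the unique-label workaround because nothing is pivoted, at the cost of building two n x n boolean matrices in memory.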