The Data
I've got a dataframe that has rank scores for a given ID:
>>> ranks
  ID  rank
0  A     6
1  B     9
2  C     6
3  D     1
4  E     1
5  F     2
I would like to turn this into a square matrix with each ID as both an index and a column, based on several conditions: if the ID on the index ranks higher than the ID in the column (that is, its rank value is numerically smaller), set the cell to 1; if it ranks lower, set it to 0; if the ranks are equal, set it to 0.5; and if the index is the same as the column, set it to np.nan. This is better described by looking at my desired matrix:
Desired Result
>>> mtrx
     A    B    C    D    E    F
A  NaN  1.0  0.5  0.0  0.0  0.0
B  0.0  NaN  0.0  0.0  0.0  0.0
C  0.5  1.0  NaN  0.0  0.0  0.0
D  1.0  1.0  1.0  NaN  0.5  1.0
E  1.0  1.0  1.0  0.5  NaN  1.0
F  1.0  1.0  1.0  0.0  0.0  NaN
What I've Done (works, but is slow)
The following loop works, but it is slow on larger dataframes. If someone can point me toward a nicer, more pythonic/pandorable way to achieve this, I'd love some help:
# Make an empty matrix as a dataframe
mtrx = pd.DataFrame(np.zeros((len(IDs), len(IDs))), index=IDs, columns=IDs)
# Populate it via for loop
for i in IDs:
    for j in IDs:
        i_rank = ranks.loc[ranks['ID'] == i].iloc[0]['rank']
        j_rank = ranks.loc[ranks['ID'] == j].iloc[0]['rank']
        if i == j:
            mtrx.loc[i, j] = np.nan
        elif i_rank < j_rank:
            mtrx.loc[i, j] = 1.
        elif i_rank == j_rank:
            mtrx.loc[i, j] = 0.5
Code to reproduce this toy example
import pandas as pd
import numpy as np
np.random.seed(1)
IDs = list('ABCDEF')
ranks = pd.DataFrame({'ID':IDs, 'rank':np.random.randint(1,10,len(IDs))})
numpy approach
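s=ranks['rank'].values
# s1[i, j] is 1.0 where the row ID's rank value is smaller than the column ID's
s1=(s>s[:,None]).astype(float)
s1[s==s[:,None]]=0.5
# blank out the diagonal (same ID on index and column)
np.fill_diagonal(s1, np.nan)
pd.DataFrame(s1,index=ranks.ID,columns=ranks.ID)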
Out[843]:
ID    A    B    C    D    E    F
ID
A   NaN  1.0  0.5  0.0  0.0  0.0
B   0.0  NaN  0.0  0.0  0.0  0.0
C   0.5  1.0  NaN  0.0  0.0  0.0
D   1.0  1.0  1.0  NaN  0.5  1.0
E   1.0  1.0  1.0  0.5  NaN  1.0
F   1.0  1.0  1.0  0.0  0.0  NaN
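For readers who want the broadcasting step spelled out, here is a minimal sketch of the same idea wrapped in a function. The name pairwise_matrix is a hypothetical helper, not part of NumPy or pandas, and the rank values are hard-coded to match the toy example instead of being drawn from the random seed.

```python
import numpy as np
import pandas as pd


def pairwise_matrix(ranks):
    """Hypothetical helper: build the pairwise comparison matrix from ID/rank columns."""
    s = ranks['rank'].to_numpy()
    row = s[:, None]  # shape (n, 1): rank of the ID on the index
    col = s[None, :]  # shape (1, n): rank of the ID in the column
    # Broadcasting compares every row rank with every column rank at once
    out = np.where(row < col, 1.0, np.where(row == col, 0.5, 0.0))
    # An ID compared with itself is undefined
    np.fill_diagonal(out, np.nan)
    ids = ranks['ID'].to_numpy()
    return pd.DataFrame(out, index=ids, columns=ids)


# Toy data, hard-coded to match the example above
ranks = pd.DataFrame({'ID': list('ABCDEF'), 'rank': [6, 9, 6, 1, 1, 2]})
mtrx = pairwise_matrix(ranks)
print(mtrx)
```

The nested np.where keeps all three cases in one expression, and the whole comparison runs without a Python-level loop, which is where the speedup over the double for loop would be expected to come from.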
pandas approach
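# cross join: pair every ID with every ID via a constant key
s=ranks.assign(key=1).merge(ranks.assign(key=1),on='key')
# use float so that 0.5 and NaN can be assigned without a dtype conflict
s['New']=(s['rank_x']<s['rank_y']).astype(float)
s.loc[s['rank_x']==s['rank_y'],'New']=0.5
s.loc[s['ID_x']==s['ID_y'],'New']=np.nan
s.set_index(['ID_x','ID_y']).New.unstack()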
Out[854]:
ID_y    A    B    C    D    E    F
ID_x
A     NaN  1.0  0.5  0.0  0.0  0.0
B     0.0  NaN  0.0  0.0  0.0  0.0
C     0.5  1.0  NaN  0.0  0.0  0.0
D     1.0  1.0  1.0  NaN  0.5  1.0
E     1.0  1.0  1.0  0.5  NaN  1.0
F     1.0  1.0  1.0  0.0  0.0  NaN
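On pandas 1.2 or later, the constant key can probably be replaced by merge(how='cross'). The sketch below is one way to write that variant. It assumes the same ID and rank columns as the toy example, hard-codes the rank values, and uses np.select to encode the four cases, which is a swapped-in technique and not the code shown above.

```python
import numpy as np
import pandas as pd

# Toy data, hard-coded to match the example above
ranks = pd.DataFrame({'ID': list('ABCDEF'), 'rank': [6, 9, 6, 1, 1, 2]})

# Cross join: every ID paired with every ID (requires pandas >= 1.2)
pairs = ranks.merge(ranks, how='cross', suffixes=('_x', '_y'))

# Conditions are checked in order, so the same-ID case wins over the tie case
pairs['New'] = np.select(
    [
        pairs['ID_x'] == pairs['ID_y'],
        pairs['rank_x'] < pairs['rank_y'],
        pairs['rank_x'] == pairs['rank_y'],
    ],
    [np.nan, 1.0, 0.5],
    default=0.0,
)

# Reshape the long pair table into the square matrix
result = pairs.pivot(index='ID_x', columns='ID_y', values='New')
print(result)
```

Note that the cross join materialises n * n rows before reshaping, so for large n the numpy approach is likely to use less memory.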