Sort the rows of a dataframe and get the column values in a pandas dataframe

My dataframe looks like this:

df
      5     1     2     4    3     0    pred_val true_value rank 
  0  0.3   0.2   0.1   0.5  0.25  0.4      4        2        6
  1  0.36  0.24  0.12  0.5  0.45  0.4      4        3        2  

I want to compute the rank column from the true value. If the predicted value (pred_val) is the same as true_value, then rank = 1, which can be done with np.where. If they do not match, true_value is looked up among the columns named 0-5, and it is ranked by the value in that column relative to the other values in the same row.
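A minimal sketch of the whole rule, assuming the six score columns have integer labels 0-5 and that a larger score means a better (smaller) rank, as the examples in this question imply. It ranks every score within its row in descending order, picks the rank of the column named by true_value, and lets np.where override it with 1 when pred_val equals true_value. The helper name `rank_true_value` is hypothetical.

```python
import numpy as np
import pandas as pd


def rank_true_value(df: pd.DataFrame, score_cols: list) -> pd.Series:
    """Hypothetical helper: rank of each row's true_value column among score_cols."""
    # Rank across each row; the largest score gets rank 1.
    # method='min' gives tied scores the same integer rank.
    ranks = df[score_cols].rank(axis=1, ascending=False, method='min')
    # Position of each row's true_value label inside score_cols.
    col_pos = [score_cols.index(v) for v in df['true_value']]
    # Pick one rank per row: (row position, column position).
    picked = ranks.to_numpy()[np.arange(len(df)), col_pos].astype(int)
    # Rank 1 when the prediction matches, otherwise the looked-up rank.
    return pd.Series(np.where(df['pred_val'] == df['true_value'], 1, picked), index=df.index)


df = pd.DataFrame(
    [[0.3, 0.2, 0.1, 0.5, 0.25, 0.4, 4, 2],
     [0.36, 0.24, 0.12, 0.5, 0.45, 0.4, 4, 3]],
    columns=[5, 1, 2, 4, 3, 0, 'pred_val', 'true_value'],
)
df['rank'] = rank_true_value(df, [0, 1, 2, 3, 4, 5])
print(df['rank'].tolist())  # [6, 2]
```

Ranking the whole block once with DataFrame.rank avoids sorting each row in Python; the tie method is a choice you may want to change.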

For example, in row 0 the true value is 2 and pred_val is 4, so they do not match. We then look in column 2, which holds 0.1. That is the lowest of all the values in columns 0-5 for row 0, so it gets rank 6.
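The row-0 lookup can be checked with a few lines of pandas, assuming integer column labels as in the table above: sort the six scores in descending order and take the 1-based position of column 2.

```python
import pandas as pd

# Row 0 of the question's frame: score per column label.
row0 = pd.Series({5: 0.3, 1: 0.2, 2: 0.1, 4: 0.5, 3: 0.25, 0: 0.4})
true_value = 2

# Largest score first; the position of the true column gives its rank.
order = row0.sort_values(ascending=False)
rank = list(order.index).index(true_value) + 1
print(order.index.tolist(), rank)  # [4, 0, 5, 3, 1, 2] 6
```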

How can I do this?

I think this is what you are looking for:

df

       5     1     2      4     3    0  pred_val    true_value
0    0.3   0.2   0.1    0.5  0.25  0.4         4             2
1   0.36  0.24  0.12    0.5  0.45  0.4         4             3 


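cols = [0, 1, 2, 3, 4, 5]  # score columns, assumed to have integer labels
# sort each row's scores (largest first) and take the 1-based position of the true column
df['rank'] = df.apply(lambda row: row[cols].sort_values(ascending=False).index.get_loc(int(row['true_value'])) + 1, axis=1)
df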

       5       1       2      4     3     0  pred_val   true_value  rank
0    0.3     0.2     0.1    0.5  0.25   0.4         4            2     6
1   0.36    0.24    0.12    0.5  0.45   0.4         4            3     2

If you want to use a list comprehension:

# the column labels are assumed to be strings ('5', '1', ...), so convert
# the true values to strings to match them
df['true_value'] = df['true_value'].astype(str)

# rank the six score columns within each row, largest value first
ranks = df[df.columns[:6]].rank(axis=1, ascending=False)

# look up each row's rank at (row label, true column label)
df['rank'] = [int(ranks.loc[i, col]) for i, col in zip(df.index, df['true_value'])]

      5     1     2    4     3    0  pred_val  true_value  rank
0  0.30  0.20  0.10  0.5  0.25  0.4         4           2     6
1  0.36  0.24  0.12  0.5  0.45  0.4         4           3     2
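One caveat with DataFrame.rank: its default method='average' gives tied scores a fractional rank, which int() then truncates. A small sketch with a made-up row containing a tie (the values are illustrative, not from the question) shows how method='min' keeps ranks integral:

```python
import pandas as pd

# Hypothetical row where columns '1' and '2' tie at 0.4.
scores = pd.DataFrame([[0.5, 0.4, 0.4, 0.1]], columns=['0', '1', '2', '3'])

avg = scores.rank(axis=1, ascending=False)                # default method='average'
low = scores.rank(axis=1, ascending=False, method='min')  # ties share the smaller rank

print(avg.loc[0].tolist())  # [1.0, 2.5, 2.5, 4.0]
print(low.loc[0].tolist())  # [1.0, 2.0, 2.0, 4.0]
```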

Input:

    5        1       2       4   3       0  pred_val    true_value  rank
0   0.30    0.20    0.10    0.5 0.25    0.4   4           2          0
1   0.36    0.24    0.12    0.5 0.45    0.4   4           3          0

Do this:

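score_cols = ['5', '1', '2', '4', '3', '0']  # column labels assumed to be strings
for i in df.index:
    t_val = df.at[i, 'true_value']
    # this row's six scores, largest first
    cols_vals = sorted(df.loc[i, score_cols].values, reverse=True)
    # 1-based position of the true column's score in that ordering
    df.loc[i, 'rank'] = cols_vals.index(df.at[i, str(t_val)]) + 1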

Output:

     5       1       2       4   3       0  pred_val true_value rank
0   0.30    0.20    0.10    0.5 0.25    0.4   4       2          6
1   0.36    0.24    0.12    0.5 0.45    0.4   4       3          2
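The loop above looks columns up by string labels ('5', '1', ... and str(t_val)), whereas the first answer indexes them with integers. If the frame was built with integer labels, a minimal sketch of normalizing them first, assuming every column label should simply become its str() form:

```python
import pandas as pd

# One row of the question's data, built with integer score labels.
df = pd.DataFrame({5: [0.3], 1: [0.2], 2: [0.1], 4: [0.5], 3: [0.25], 0: [0.4],
                   'pred_val': [4], 'true_value': [2]})

# Turn every label into a string so 2 and '2' no longer mismatch.
df.columns = df.columns.map(str)
print(list(df.columns))  # ['5', '1', '2', '4', '3', '0', 'pred_val', 'true_value']
print(df.loc[0, str(df.loc[0, 'true_value'])])  # 0.1
```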
