
Pandas rank function: ranking two columns for high and low values

Here's some code to build a pandas DataFrame with two columns, one called Data and the other called hours. The Data column holds random integers from -150 up to (but not including) 250, and the hours column holds random floats from 0.5 to 15.5.

import random
import numpy as np
import pandas as pd

data = np.random.randint(-150, 250, size=200)
df = pd.DataFrame(data, columns=['Data'])

# generate random floats for df2
randomFloatList = []
# make the list the same length as df
for i in range(len(df)):
    # any random float between 0.50 and 15.50
    x = round(random.uniform(0.50, 15.50), 2)
    randomFloatList.append(x)

df2 = pd.DataFrame(randomFloatList, columns=['hours'])

combined = df.join(df2)


     Data  hours
0      93   9.66
1      85  14.76
2     -82  12.55
3     -44   2.40
4      -1  13.86

Can the pandas rank function reorder a DataFrame by the highest values in one column (Data) and the lowest values in another column (hours), while keeping the values in each row together? Hopefully this makes sense...

If I use print(combined.rank(axis='columns'))

This returns something unwanted. I can't quite figure out whether this is possible with pandas rank or not.

     Data  hours
0     2.0    1.0
1     2.0    1.0
2     1.0    2.0
3     1.0    2.0
4     1.0    2.0
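
The output above looks like what rank produces when it works across columns: each row's Data value is compared only with that same row's hours value, so every row ends up holding just 1.0 and 2.0. A minimal sketch of the difference, using the five sample rows printed earlier (the variable names across and down are made up for illustration):

```python
import pandas as pd

# The first five rows from the sample output in the question.
combined = pd.DataFrame({
    "Data": [93, 85, -82, -44, -1],
    "hours": [9.66, 14.76, 12.55, 2.40, 13.86],
})

# axis='columns' compares Data with hours inside each row,
# so each row only ever contains the ranks 1.0 and 2.0.
across = combined.rank(axis="columns")

# axis='index' (the default) ranks each column over all rows,
# which is the per-column ranking the question is after.
down = combined.rank(axis="index")

print(across)
print(down)
```

Ranking down the index leaves the rows where they are and only adds information about each value's position within its own column.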

Any tips greatly appreciated.

combined['hours_rank'] = combined['hours'].rank(ascending=True)
combined['Data_rank'] = combined['Data'].rank(ascending=True)

Input data:
   Data  hours
0   174   0.89
1   226   7.41
2   -90  13.79
3   148   3.02

Output data:
   Data  hours  hours_rank  Data_rank
0   174   0.89         1.0        3.0
1   226   7.41         3.0        4.0
2   -90  13.79         4.0        1.0
3   148   3.02         2.0        2.0
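
Both columns are ranked ascending here, so the lowest Data value gets rank 1. If the goal is for the highest Data and the lowest hours to come first, one possible variation is to rank Data with ascending=False and then sort whole rows. This is only a sketch: the score column and the choice to add the two ranks together are assumptions for illustration, not something the question specifies.

```python
import pandas as pd

# The four input rows shown in the answer above.
combined = pd.DataFrame({
    "Data": [174, 226, -90, 148],
    "hours": [0.89, 7.41, 13.79, 3.02],
})

# Highest Data gets rank 1; lowest hours gets rank 1.
combined["Data_rank"] = combined["Data"].rank(ascending=False)
combined["hours_rank"] = combined["hours"].rank(ascending=True)

# Hypothetical combined score: the sum of both ranks (smaller is better).
combined["score"] = combined["Data_rank"] + combined["hours_rank"]

# sort_values moves whole rows, so each Data value stays with its hours value.
# Data_rank is used as a tie-breaker when two rows have the same score.
ordered = combined.sort_values(["score", "Data_rank"])
print(ordered)
```

If the two columns should not carry equal weight, the sum could be replaced with a weighted sum or with a plain two-key sort such as sort_values(['Data', 'hours'], ascending=[False, True]).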

Because pandas aligns data internally on the index, your problem can be tricky to solve. However, with a plain Python list you can do the sorting and then assign the corresponding ranks back to your DataFrame, if I have understood your issue correctly. Here is code that does the job:

combined['Data_rank'] = combined['Data'].rank()
combined['hours_rank'] = combined['hours'].rank()


     Data   hours    Data_rank     hours_rank
0     242   0.61      199.5         1.0
1     242   0.71      199.5         2.0
2     241   0.82      198.0         3.0
3     238   0.88      197.0         4.0
4     236   1.01      196.0         5.0
..    ...    ...        ...         ...
195  -144  15.21        5.0       196.0
196  -145  15.22        4.0       197.0
197  -150  15.24        2.0       198.0
198  -150  15.34        2.0       199.0
199  -150  15.35        2.0       200.0

[200 rows x 4 columns]
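
The output above has Data running from highest to lowest and hours from lowest to highest, which suggests each column was sorted on its own before ranking, although the list step the answer mentions is not in the code shown. A guess at what it might look like follows; the names rng, data_sorted, hours_sorted and independent are assumptions, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
combined = pd.DataFrame({
    "Data": rng.integers(-150, 250, size=200),
    "hours": rng.uniform(0.5, 15.5, size=200).round(2),
})

# Sort each column on its own as a plain Python list.
data_sorted = sorted(combined["Data"].tolist(), reverse=True)  # highest first
hours_sorted = sorted(combined["hours"].tolist())              # lowest first

# Build a new frame from the two lists. Position in each list, not the
# original index, now decides which Data value sits beside which hours value.
independent = pd.DataFrame({"Data": data_sorted, "hours": hours_sorted})
independent["Data_rank"] = independent["Data"].rank()
independent["hours_rank"] = independent["hours"].rank()

print(independent)
```

Note that sorting the columns independently breaks the original pairing between Data and hours, so this only fits if the rows do not need to stay together.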

