Here's some code to build a pandas dataframe with two columns, one called Data and the other called hours. The Data column holds random ints from -150 up to (but not including) 250, and the hours column holds random floats from 0.5 to 15.5.
import random
import numpy as np
import pandas as pd
data = np.random.randint(-150,250,size=200)
df = pd.DataFrame(data, columns=['Data'])
# Generate random floats for df2
randomFloatList = []
# Loop once per row so the list is as long as df
for i in range(0, len(df)):
    # Any random float between 0.50 and 15.50, rounded to 2 decimals
    x = round(random.uniform(0.50, 15.50), 2)
    randomFloatList.append(x)
df2 = pd.DataFrame(randomFloatList,columns=['hours'])
combined = df.join(df2)
print(combined)
Returns:
Data hours
0 93 9.66
1 85 14.76
2 -82 12.55
3 -44 2.40
4 -1 13.86
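For reference, an equivalent frame can be built without the explicit loop. This is only a sketch: the generator `rng` and the fixed seed are assumptions added here for reproducibility, not part of the original code.

```python
import numpy as np
import pandas as pd

# Seeded generator (assumption: the seed is only here to make runs repeatable)
rng = np.random.default_rng(0)

combined = pd.DataFrame({
    # Integers from -150 up to, but not including, 250
    "Data": rng.integers(-150, 250, size=200),
    # Floats drawn from [0.5, 15.5), rounded to 2 decimals
    "hours": rng.uniform(0.5, 15.5, size=200).round(2),
})
print(combined.head())
```

Both columns are created in one call, so no `join` is needed.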
Can the pandas rank function reorganize a dataframe based on the highest values in one column (Data) and the lowest values in a different column (hours), with the rows of the dataset being preserved? Hopefully this makes sense...
If I use print(combined.rank(axis='columns')), it returns something unwanted. I can't quite figure out whether this is possible with pandas rank or not.
Data hours
0 2.0 1.0
1 2.0 1.0
2 1.0 2.0
3 1.0 2.0
4 1.0 2.0
Any tips greatly appreciated.
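The 1.0/2.0 pattern comes from axis='columns' ranking the two values inside each row against each other, rather than ranking each column from top to bottom. A minimal sketch on the first three rows of the sample output above (the names `row_wise` and `col_wise` are illustrative only):

```python
import pandas as pd

combined = pd.DataFrame({
    "Data": [93, 85, -82],
    "hours": [9.66, 14.76, 12.55],
})

# axis='columns': compare Data against hours within each row
row_wise = combined.rank(axis="columns")

# default axis: rank every column independently, top to bottom
col_wise = combined.rank()

print(row_wise)
print(col_wise)
```

With the default axis, each column gets its own ranking, which is usually what is wanted before sorting.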
# Rank each column on its own; the row order is left unchanged
combined['hours_rank'] = combined['hours'].rank(ascending=True)
combined['Data_rank'] = combined['Data'].rank(ascending=True)
Input data:
Data hours
0 174 0.89
1 226 7.41
2 -90 13.79
3 148 3.02
Output data:
Data hours hours_rank Data_rank
0 174 0.89 1.0 3.0
1 226 7.41 3.0 4.0
2 -90 13.79 4.0 1.0
3 148 3.02 2.0 2.0
Because pandas uses an internal alignment mechanism based on the index, your problem can be tricky to solve. But by using a plain Python list you can do the sorting and then assign the corresponding ranks in your dataframe, if I have understood your issue correctly. Here is some code that does the job:
combined['Data']=combined['Data'].sort_values(ascending=False).tolist()
combined['hours']=combined['hours'].sort_values().tolist()
combined['Data_rank'] = combined['Data'].rank()
combined['hours_rank'] = combined['hours'].rank()
Output:
Data hours Data_rank hours_rank
0 242 0.61 199.5 1.0
1 242 0.71 199.5 2.0
2 241 0.82 198.0 3.0
3 238 0.88 197.0 4.0
4 236 1.01 196.0 5.0
.. ... ... ... ...
195 -144 15.21 5.0 196.0
196 -145 15.22 4.0 197.0
197 -150 15.24 2.0 198.0
198 -150 15.34 2.0 199.0
199 -150 15.35 2.0 200.0
[200 rows x 4 columns]
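One thing to be aware of with this approach: each column is sorted independently, so a Data value ends up next to whatever hours value lands in the same position, and the original rows are not kept together. A small sketch to check this on a four-row sample (the names `original`, `shuffled`, and `rows_preserved` are illustrative only):

```python
import pandas as pd

original = pd.DataFrame({
    "Data": [174, 226, -90, 148],
    "hours": [0.89, 7.41, 13.79, 3.02],
})

# Apply the list-based sorting to a copy
shuffled = original.copy()
shuffled["Data"] = shuffled["Data"].sort_values(ascending=False).tolist()
shuffled["hours"] = shuffled["hours"].sort_values().tolist()

# Compare the (Data, hours) pairs before and after
original_pairs = set(zip(original["Data"].tolist(), original["hours"].tolist()))
new_pairs = set(zip(shuffled["Data"].tolist(), shuffled["hours"].tolist()))
rows_preserved = new_pairs == original_pairs

print(shuffled)
print(rows_preserved)
```

If the rows must stay intact, sorting the whole frame with sort_values is the safer route.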