Pandas scoring system: sort_values

So my task is pretty simple. We have a CSV file with the results of a decathlon competition. The results need to be converted into points, ranked, and assigned places. Everything works fine apart from one line:

modified_data.sort_values(by=["Total points"])

Why doesn't it sort the result for me?

My work below:

import pandas as pd
import numpy as np

# Modification of CSV file data by adding header names and splitting data
data = pd.read_csv("static/data/Decathlon.csv", delimiter=';', header=None)
data = data.assign(Total_points=0)
data = data.assign(Ranking=0)
header_list = ['Player', '100 metres', 'Long jump', 'Short put', 'High jump', '400 metres', '110 metres hurdles',
               'Discus throw', 'Pole vault', 'Javelin throw', '1500 metres', 'Total points', 'Ranking']
data.to_csv("static/data/Decathlon_modified.csv", header=header_list, index=False)
modified_data = pd.read_csv("static/data/Decathlon_modified.csv", delimiter=',')
print(modified_data)

# Conversion of CSV data into the necessary units of measurement,
# so that it can be applied to the calculation of the resulting formulas:
temporary_list = []
changed_list = []
for time in modified_data["1500 metres"]:
    temporary_list.append(time.split('.'))
for new_value in temporary_list:
    value = (int(new_value[0]) * 60) + int(new_value[1]) + int(new_value[2]) * 0.01
    changed_list.append(value)
for index, new_value in enumerate(changed_list):
    modified_data.loc[index, "1500 metres"] = new_value

# Results are calculated according to formulas:
# Points = INT(A * (B - P) ** C) for track events (faster time produces a higher score)
modified_data["100 metres"] = round((25.4347 * (18 - modified_data["100 metres"]) ** 1.81))
modified_data["400 metres"] = round(1.53775 * (82 - modified_data["400 metres"]) ** 1.81)
modified_data["110 metres hurdles"] = round(5.74352 * (28.5 - modified_data["110 metres hurdles"]) ** 1.92)
modified_data["1500 metres"] = round(0.03768 * (480 - modified_data["1500 metres"].astype(float)) ** 1.85)

# Points = INT(A * (P - B) ** C) for field events (greater distance or height produces a higher score)
modified_data["Long jump"] = round(0.14354 * ((modified_data["Long jump"] * 100) - 220) ** 1.4)
modified_data["Short put"] = round(51.39 * (modified_data["Short put"] - 1.5) ** 1.05)
modified_data["High jump"] = round(0.8465 * ((modified_data["High jump"] * 100) - 75) ** 1.42)
modified_data["Discus throw"] = round(12.91 * (modified_data["Discus throw"] - 4) ** 1.1)
modified_data["Pole vault"] = round(0.2797 * (modified_data["Pole vault"] * 100 - 100) ** 1.35)
modified_data["Javelin throw"] = round(10.14 * (modified_data["Javelin throw"] - 7) ** 1.08)

# Total calculation and rewriting of each player's result in a common table
total_points = modified_data["100 metres"] + modified_data["Long jump"] + modified_data["Short put"] + \
               modified_data["High jump"] + modified_data["400 metres"] + modified_data["110 metres hurdles"] + \
               modified_data["Discus throw"] + modified_data["Pole vault"] + modified_data["Javelin throw"] \
               + modified_data["1500 metres"]
for index, new_value in enumerate(total_points):
    modified_data.loc[index, "Total points"] = new_value


# Ranking according to collected points
modified_data.reset_index(drop=False)
modified_data.index = np.arange(1, len(modified_data) + 1)

# TODO
modified_data.sort_values(by=["Total points"])
print(modified_data)

modified_data["Ranking"] = modified_data["Total points"]. \
    apply(lambda score:
          modified_data.index[modified_data["Total points"] == score].astype(str)).str.join("-")
print(modified_data)

modified_data.to_json(r'static/data/Decathlon.json')

I tried:

modified_data["Total points"] = modified_data["Total points"].astype(int)
modified_data.sort_values(by=["Total points"])

and

modified_data["Total points"] = modified_data["Total points"].astype(int)
modified_data.sort_values('Total points')

I also checked the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html

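By default, sort_values returns a new, sorted DataFrame and leaves the original unchanged, so the result is discarded unless you keep it. Either pass inplace=True or assign the result back to the same variable: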

modified_data.sort_values(by=["Total points"], inplace=True)
# Or alternatively
modified_data = modified_data.sort_values(by=["Total points"])
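The difference can be seen on a small frame. This is only a sketch: the three players and their scores are made up for illustration, and ascending=False is an assumption about what a ranking needs (highest score first), which the question does not state. The same "returns a new object" behaviour applies to the reset_index call in the question, whose result is also discarded.

```python
import pandas as pd

# Hypothetical sample data; the real scores come from Decathlon.csv.
df = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Total points": [7000, 8200, 7600],
})

# Without assignment the sorted copy is thrown away and df keeps its order.
df.sort_values(by=["Total points"])
unchanged_order = df["Player"].tolist()

# Assigning the result keeps the sorted frame. ascending=False puts the
# highest score first (an assumption about the desired ranking order).
ranked = df.sort_values(by=["Total points"], ascending=False)

# Renumber the index from 1 after sorting so it can serve as the place.
ranked = ranked.reset_index(drop=True)
ranked.index = ranked.index + 1
places = dict(zip(ranked["Player"], ranked.index.tolist()))

print(ranked)
```

Renumbering the index after the sort, rather than before it as in the question, is what makes the index line up with the final places.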
