I am working with the decathlon dataset in a pandas DataFrame. I calculated the outlier thresholds for each year with the code below, but I am having trouble using those calculated values to filter the DataFrame.
Screenshot of the dataset file (transposed): Dataset
Screenshot of the boxplot of outliers: Boxplot
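good = []  # upper fence per year: Q3 + 1.5*IQR
bad = []   # lower fence per year: Q1 - 1.5*IQR
for item in df['yearEvent'].unique():
    value = df[df['yearEvent'] == item].Totalpoints
    a = value.quantile(0.25)  # Q1
    b = value.quantile(0.75)  # Q3
    c = b - a                 # interquartile range (IQR)
    good.append(b + 1.5 * c)
    bad.append(a - 1.5 * c)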
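The loop above produces two plain lists whose positions line up with `df['yearEvent'].unique()`, not with the rows of `df`, which is why they cannot be compared with `Totalpoints` directly. A minimal sketch, assuming the same column names, keeps each year's fences in a dictionary keyed by year and uses `Series.map` to attach them to every row. The toy numbers and the `upper_fence`/`lower_fence` column names are hypothetical.

```python
import pandas as pd

# Toy stand-in for the decathlon data (values are made up)
df = pd.DataFrame({
    "yearEvent": [2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001],
    "Totalpoints": [8000, 8100, 8150, 8200, 6000, 7900, 8000, 8050, 8100, 9500],
})

upper_by_year = {}
lower_by_year = {}
for year in df["yearEvent"].unique():
    points = df.loc[df["yearEvent"] == year, "Totalpoints"]
    q1 = points.quantile(0.25)
    q3 = points.quantile(0.75)
    iqr = q3 - q1
    upper_by_year[year] = q3 + 1.5 * iqr
    lower_by_year[year] = q1 - 1.5 * iqr

# Look up each row's own year to get that year's fences
df["upper_fence"] = df["yearEvent"].map(upper_by_year)
df["lower_fence"] = df["yearEvent"].map(lower_by_year)

# Now the comparison is row by row, so boolean filtering works
below = df[df["Totalpoints"] < df["lower_fence"]]
above = df[df["Totalpoints"] > df["upper_fence"]]
print(below)
print(above)
```

Keying by year rather than by list position means the lookup stays correct no matter how the rows are ordered.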
Basically, I want to create a new column whose value is good or bad depending on each row's Totalpoints. If Totalpoints is less than the bad value, that row should be labeled bad in the new column. The tricky part is that the good and bad thresholds change from year to year.
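A loop-free sketch of the labeling, assuming the fences are Q1 - 1.5*IQR and Q3 + 1.5*IQR per `yearEvent` as in the loop above: `groupby(...).transform` returns each row's own year-specific quartile, so the comparison happens row by row with no list bookkeeping. The toy numbers are made up, treating scores above the upper fence as good is an inference from the question, and the `label` column and the `"normal"` label for rows between the fences are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Toy data; only the column names yearEvent and Totalpoints come from the question
df = pd.DataFrame({
    "yearEvent": [2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001],
    "Totalpoints": [8000, 8100, 8150, 8200, 6000, 7900, 8000, 8050, 8100, 9500],
})

# Per-year quartiles, broadcast back to every row of that year
grouped = df.groupby("yearEvent")["Totalpoints"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Same fences as the question's loop
upper = q3 + 1.5 * iqr
lower = q1 - 1.5 * iqr

# Above the upper fence -> "good", below the lower fence -> "bad",
# everything in between gets a placeholder label (an assumption)
df["label"] = np.select(
    [df["Totalpoints"] > upper, df["Totalpoints"] < lower],
    ["good", "bad"],
    default="normal",
)
print(df)
```

Filtering then becomes an ordinary boolean mask, for example `df[df["label"] == "bad"]`.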
Your question is a bit vague, and providing a screenshot of the dataset isn't the best idea. It would be better to include the data as text or link to the actual data.
However, if I understand your question correctly, you want to categorize athletes as good if their points are above the 0.25 quantile (the 25th percentile) for that year. You can do that with:
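import pandas as pd

df = pd.DataFrame(dict(
    year=[1990, 1990, 1990, 1991, 1991, 1991],
    points=[1234, 1243, 1423, 4123, 4132, 4312],
))

df['good'] = False
for year in df.year.unique():
    year_df = df[df.year == year]
    cutoff = year_df.points.quantile(0.25)  # 25th percentile of this year only
    # Assign by index so the result does not depend on rows being sorted by year
    df.loc[year_df.index, 'good'] = year_df.points > cutoff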
This results in the following DataFrame:
   year  points   good
0  1990    1234  False
1  1990    1243   True
2  1990    1423   True
3  1991    4123  False
4  1991    4132   True
5  1991    4312   True
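The same per-year cutoff can also be computed without an explicit loop. `groupby(...).transform` returns one value per row (that row's year-specific 25th percentile), aligned to the original index, so the comparison does not rely on the rows being grouped or sorted by year. A sketch on the same toy data:

```python
import pandas as pd

df = pd.DataFrame(dict(
    year=[1990, 1990, 1990, 1991, 1991, 1991],
    points=[1234, 1243, 1423, 4123, 4132, 4312],
))

# One cutoff per row: the 25th percentile of that row's year
cutoff = df.groupby("year")["points"].transform(lambda s: s.quantile(0.25))
df["good"] = df["points"] > cutoff
print(df)
```

With pandas' default linear interpolation the cutoffs here are 1238.5 for 1990 and 4127.5 for 1991, which is why the lowest score in each year ends up as `False`.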