Pandas Dataframe filtering based on a list

Question

I am working on the decathlon dataset in a pandas DataFrame. I calculated the outlier thresholds for each year with the following code. However, I am having trouble using these calculated values to label the rows of the DataFrame.

Screenshot of the dataset file (transposed): Dataset

Screenshot of the boxplot of outliers: Boxplot

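import pandas as pd

good = []  # upper fence (Q3 + 1.5 * IQR) for each year
bad = []   # lower fence (Q1 - 1.5 * IQR) for each year

for item in df['yearEvent'].unique():
    value = df.loc[df['yearEvent'] == item, 'Totalpoints']
    a = value.quantile(0.25)  # first quartile (Q1)
    b = value.quantile(0.75)  # third quartile (Q3)
    c = b - a                 # interquartile range (IQR)
    good.append(b + 1.5 * c)
    bad.append(a - 1.5 * c)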
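The `good` and `bad` lists above are positional: entry i belongs to the i-th year returned by `df['yearEvent'].unique()`. A minimal sketch of one way to line them up with the rows is to zip them into year-keyed dictionaries and use `Series.map`. The data and the `upper`/`lower` column names here are hypothetical; only `yearEvent` and `Totalpoints` come from the question.

```python
import pandas as pd

# Hypothetical toy data; column names follow the question
df = pd.DataFrame({
    'yearEvent': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    'Totalpoints': [8000, 8100, 8200, 6000, 7900, 8000, 8100, 9500],
})

years = df['yearEvent'].unique()
good = []  # upper fence per year
bad = []   # lower fence per year
for item in years:
    value = df.loc[df['yearEvent'] == item, 'Totalpoints']
    q1 = value.quantile(0.25)
    q3 = value.quantile(0.75)
    iqr = q3 - q1
    good.append(q3 + 1.5 * iqr)
    bad.append(q1 - 1.5 * iqr)

# The lists follow the order of `years`, so zip them into year -> fence lookups
upper = dict(zip(years, good))
lower = dict(zip(years, bad))

# map() looks up each row's year, giving per-row fences aligned with df
df['upper'] = df['yearEvent'].map(upper)
df['lower'] = df['yearEvent'].map(lower)
print(df)
```

With the per-row fences in place, each row's `Totalpoints` can be compared directly against `upper` and `lower`.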

Basically, I want to create a new column whose value is good or bad depending on each row's Totalpoints. If Totalpoints is less than the bad value, that row of the new column should be bad. The trick is that the good and bad values change from year to year.
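A minimal vectorized sketch of that labeling, assuming "good" means above the year's upper fence and "bad" means below its lower fence. `groupby(...).transform` returns one value per row, so each row is compared against its own year's thresholds without building lists. The `rating` column name and the `'normal'` label for rows between the fences are hypothetical, since the question does not name them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names follow the question
df = pd.DataFrame({
    'yearEvent': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    'Totalpoints': [8000, 8100, 8200, 6000, 7900, 8000, 8100, 9500],
})

grouped = df.groupby('yearEvent')['Totalpoints']
# transform() broadcasts each year's quartile back onto that year's rows
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

upper = q3 + 1.5 * iqr  # the question's 'good' threshold
lower = q1 - 1.5 * iqr  # the question's 'bad' threshold

# Conditions are checked in order; rows matching neither get the default
df['rating'] = np.select(
    [df['Totalpoints'] > upper, df['Totalpoints'] < lower],
    ['good', 'bad'],
    default='normal',
)
print(df)
```

Because the thresholds are index-aligned Series, this does not depend on the rows being sorted by year.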

Answer 1

Your question is a bit vague, and providing a screenshot of the dataset isn't the best idea. It would be better to include the data as text, or link to the actual data.

However, if I understand your question correctly, you want to categorize athletes as good if their points are above the 0.25 quantile for that year. You can simply do that with:

import pandas as pd

df = pd.DataFrame(dict(
  year=[1990, 1990, 1990, 1991, 1991, 1991],
  points=[1234, 1243, 1423, 4123, 4132, 4312],
))
df['good'] = False
for year in df.year.unique():
  year_df = df[df.year == year]
  cutoff = year_df.points.quantile(0.25)
  # assign by index so the results line up even if a year's rows are not contiguous
  df.loc[year_df.index, 'good'] = year_df.points > cutoff

This will result in this data frame:

   year  points   good
0  1990    1234  False
1  1990    1243   True
2  1990    1423   True
3  1991    4123  False
4  1991    4132   True
5  1991    4312   True
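An alternative sketch (not part of the original answer) does the same 0.25-quantile cutoff with `groupby(...).transform`. The result is aligned on the DataFrame index, so unlike appending results to a list in loop order, it still matches rows correctly when the years are interleaved, as in this deliberately shuffled version of the example data.

```python
import pandas as pd

# The answer's toy data, with the rows deliberately not sorted by year
df = pd.DataFrame(dict(
    year=[1991, 1990, 1991, 1990, 1991, 1990],
    points=[4123, 1234, 4132, 1243, 4312, 1423],
))

# Each row receives its own year's 0.25 quantile, aligned on the index
cutoff = df.groupby('year')['points'].transform(lambda s: s.quantile(0.25))
df['good'] = df['points'] > cutoff
print(df)
```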

solution1
0 2019-12-12 22:51:15