Pandas Dataframe filtering based on a list

I am working with the decathlon dataset in a pandas DataFrame. I calculated the outlier thresholds for each year with the code below, but I am having trouble using those thresholds to filter the DataFrame.

Screenshot of the dataset file(transposed): Dataset

Screenshot of the boxplot of outliers: Boxplot

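good = []  # upper outlier threshold for each year
bad = []   # lower outlier threshold for each year

for item in df['yearEvent'].unique():
    value = df[df['yearEvent'] == item].Totalpoints
    q1 = value.quantile(0.25)
    q3 = value.quantile(0.75)
    iqr = q3 - q1  # interquartile range
    good.append(q3 + 1.5 * iqr)
    bad.append(q1 - 1.5 * iqr)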

Basically, I want to create a new column whose value is good or bad depending on Totalpoints. If Totalpoints is less than the bad value for that year, the row in the new column should be bad. The trick is that the good and bad values change from year to year.

Your question is a bit vague, and providing a screenshot of the dataset isn't the best idea. It would be better to include the data as text, or link to the actual data.

However, if I understand your question correctly, you want to categorize athletes as good if their points are above the 0.25 quantile for that year. You can do that with:

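import pandas as pd

df = pd.DataFrame(dict(
  year=[1990, 1990, 1990, 1991, 1991, 1991],
  points=[1234, 1243, 1423, 4123, 4132, 4312],
))
good = []
for year in df.year.unique():
  year_df = df[df.year == year]
  # 0.25 quantile of the points scored in this year
  cutoff = year_df.points.quantile(0.25)
  # Flags are appended year by year, so the rows of df must already be grouped by year
  good.extend(year_df.points > cutoff)
df['good'] = good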

This will result in this data frame:

   year  points   good
0  1990    1234  False
1  1990    1243   True
2  1990    1423   True
3  1991    4123  False
4  1991    4132   True
5  1991    4312   True
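
If what you actually need is the good/bad label from your own 1.5 * IQR thresholds, a rough sketch along the same lines could look like the following. It uses groupby with transform so each row gets the thresholds of its own year, and it does not depend on the order of the rows. The sample numbers and the label column name are made up for illustration, and treating everything between the two thresholds as "normal" is my assumption, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "yearEvent": [1990] * 5 + [1991] * 5,
    "Totalpoints": [7000, 7100, 7200, 7300, 9000,
                    5000, 8000, 8100, 8200, 8300],
})

# transform broadcasts each year's quantile back onto the rows of that year,
# so q1 and q3 are aligned with df row by row.
points_by_year = df.groupby("yearEvent")["Totalpoints"]
q1 = points_by_year.transform(lambda s: s.quantile(0.25))
q3 = points_by_year.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

upper = q3 + 1.5 * iqr  # the "good" threshold from the question
lower = q1 - 1.5 * iqr  # the "bad" threshold from the question

# Assumption: rows between the two thresholds are labelled "normal".
df["label"] = np.select(
    [df["Totalpoints"] > upper, df["Totalpoints"] < lower],
    ["good", "bad"],
    default="normal",
)
print(df)
```

With this sample data, the 9000 row in 1990 is labelled good, the 5000 row in 1991 is labelled bad, and every other row is labelled normal.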
