
Check if element is in list, then write to new column in Pandas dataframe if conditions met

I'm looking at a pandas dataframe containing information on all Olympic athletes for the past 120 years (Name, Weight, Country, Sport, etc.). It is available at https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results#athlete_events.csv .

[Image: preview of the dataframe]

I'm attempting to write a for loop that iterates through the df rows, checks the value stored in the 'Sport' column against several lists, and then adds a column to the df holding the parent category for that same row. Code so far:

aquatic_sports = ['Swimming','Diving','Synchronized Swimming','Water Polo']
track_sports = ['Athletics','Modern Pentathlon','Triathlon','Biathlon','Cycling']
team_sports = ['Softball','Basketball','Volleyball','Beach Volleyball','Handball','Rugby','Lacrosse']
gymnastic_sports = ['Gymnastics','Rhythmic Gymnastics','Trampolining']
fitness_sports = ['Weightlifting']
combat_sports = ['Boxing','Judo','Wrestling','Taekwondo']
winter_sports = ['Short Track Speed Skating','Ski Jumping','Cross Country Skiing','Luge','Bobsleigh','Alpine Skiing','Curling','Snowboarding','Ice Hockey','Hockey','Speed Skating']

for index, row in df.iterrows():
    if df.iloc[0,11] in aquatic_sports:
        df['Sport Category'] = 'Aquatic Sport'
    elif df.iloc[0,11] in track_sports:
        df['Sport Category'] = 'Track Sport'
    elif df.iloc[0,11] in gymnastic_sports:
        df['Sport Category'] = 'Gymnastic Sport'
    elif df.iloc[0,11] in fitness_sports:
        df['Sport Category'] = 'Fitness Sport'
    elif df.iloc[0,11] in combat_sports:
        df['Sport Category'] = 'Combat Sport'
    elif df.iloc[0,11] in winter_sports:
        df['Sport Category'] = 'Winter Sport'

No errors are thrown, but unfortunately all values in the new column are the same. I'm unsure how to pass the current index so that each iteration writes the correct value for its own row.

This is a job for map, though we first need to build the appropriate dictionary. Since you've already created the lists in separate variables, we can instead store them in a dictionary, with the label you want as the key:

d = {
    'Aquatic Sport': ['Swimming', 'Diving', 'Synchronized Swimming', 'Water Polo'],
    'Track Sport': ['Athletics', 'Modern Pentathlon', 'Triathlon', 'Biathlon', 'Cycling'],
    'Team Sport': ['Softball', 'Basketball', 'Volleyball', 'Beach Volleyball',
                   'Handball', 'Rugby', 'Lacrosse'],
    'Gymnastic Sport': ['Gymnastics', 'Rhythmic Gymnastics', 'Trampolining'],
    'Fitness Sport': ['Weightlifting'],
    'Combat Sport': ['Boxing', 'Judo', 'Wrestling', 'Taekwondo'],
    'Winter Sport': ['Short Track Speed Skating', 'Ski Jumping', 'Cross Country Skiing',
                     'Luge', 'Bobsleigh', 'Alpine Skiing', 'Curling', 'Snowboarding',
                     'Ice Hockey', 'Hockey', 'Speed Skating']
    }

# invert the lists so the lookup is {sport: category_label}
d = {sport: cat for cat, sports in d.items() for sport in sports}
# sports that are not keys of d end up as NaN in the new column
df['Sport Category'] = df['Sport'].map(d)
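To try the mapping without downloading the dataset, here is a minimal sketch on a hand-made frame. The four sample rows, the shortened category lists, the names `categories` and `sport_to_category`, and the 'Other' fallback label are all assumptions made for illustration; none of them come from the dataset or the original answer.

```python
import pandas as pd

# Hypothetical sample rows standing in for athlete_events.csv
df = pd.DataFrame({'Sport': ['Swimming', 'Judo', 'Curling', 'Tennis']})

# Shortened category lists, for illustration only
categories = {
    'Aquatic Sport': ['Swimming', 'Diving'],
    'Combat Sport': ['Boxing', 'Judo'],
    'Winter Sport': ['Curling', 'Luge'],
}

# Invert {category: [sports]} into {sport: category}
sport_to_category = {
    sport: cat for cat, sports in categories.items() for sport in sports
}

# Sports missing from the dictionary (here 'Tennis') map to NaN
df['Sport Category'] = df['Sport'].map(sport_to_category)

# Optional: replace NaN with an explicit fallback label (an assumption)
df['Sport Category Filled'] = df['Sport Category'].fillna('Other')
```

Checking `df['Sport Category'].isna()` afterwards is a quick way to find sports that were left out of every list or misspelled in one.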

Every value in the column is the same because df['Sport Category'] = <something> assigns that value to the entire column, not to the current row. In your code the column is overwritten on every iteration and keeps whatever value was set last. Since df.iloc[0,11] always reads the first row, every iteration also tests the same value and takes the same branch.
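The whole-column assignment can be seen in a minimal sketch, assuming a three-row toy frame invented for illustration:

```python
import pandas as pd

# Hypothetical three-row frame, for illustration only
df = pd.DataFrame({'Sport': ['Swimming', 'Judo', 'Curling']})

# Assigning a scalar to a column broadcasts it to every row
df['Sport Category'] = 'Aquatic Sport'
first_pass = df['Sport Category'].tolist()

# A later scalar assignment overwrites the whole column again
df['Sport Category'] = 'Combat Sport'
second_pass = df['Sport Category'].tolist()
```

After the second assignment no row keeps 'Aquatic Sport', which is the effect seen in the question.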

To set a single cell rather than the whole column, you can try df.loc[0, 'Sport Category'] = <something> to see if the setting works. (Older pandas code used df.ix for this, but .ix was removed in pandas 1.0.)
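If you prefer to keep the loop, a rough sketch of per-row assignment might look like the following. It uses .loc with the loop's own index (.ix no longer exists in pandas 1.0 and later), reads the sport from the current row instead of df.iloc[0, ...], and works on a toy frame with shortened lists that are assumptions for illustration. The map approach will generally be much faster on the full dataset.

```python
import pandas as pd

# Hypothetical sample rows and shortened lists, for illustration only
df = pd.DataFrame({'Sport': ['Swimming', 'Judo', 'Tennis']})
aquatic_sports = ['Swimming', 'Diving']
combat_sports = ['Boxing', 'Judo']

# Create the column up front so unmatched rows have a defined (missing) value
df['Sport Category'] = None

for index, row in df.iterrows():
    # Read the current row's sport, not the first row's
    if row['Sport'] in aquatic_sports:
        # Write only the cell at (index, 'Sport Category')
        df.loc[index, 'Sport Category'] = 'Aquatic Sport'
    elif row['Sport'] in combat_sports:
        df.loc[index, 'Sport Category'] = 'Combat Sport'
```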
