Trying to apply a function on a Pandas DataFrame in Python

I'm trying to apply this function to fill missing values in the Age column based on the Pclass and Sex columns, but I can't get it to work. How can I fix it?

def fill_age():
    Age = train['Age']
    Pclass = train['Pclass']
    Sex = train['Sex']

    if pd.isnull(Age):
        if Pclass == 1:
            return 34.61
        elif (Pclass == 1) and (Sex == 'male'):
            return 41.2813 
        elif (Pclass == 2) and (Sex == 'female'):
            return 28.72
        elif (Pclass == 2) and (Sex == 'male'):
            return 30.74
        elif (Pclass == 3) and (Sex == 'female'):
            return 21.75 
        elif (Pclass == 3) and (Sex == 'male'):
            return 26.51 
        else:
            pass
    else:
        return Age 


train['Age'] = train['Age'].apply(fill_age(),axis=1)

I'm getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

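The error occurs because fill_age reads whole columns from train, so pd.isnull(Age) and comparisons such as Pclass == 1 each produce a Series, and an if statement cannot reduce a Series to a single True or False. To work row by row, give the function a parameter x that receives one row, and call it through apply with axis=1. You already wrap each comparison in parentheses, which is what you need when combining comparisons with the bitwise operator & in place of the boolean operator and. The & operator is required for element-wise comparisons on a Series, and it also works on the scalar values of a single row: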

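def fill_age(x):
    # x is a single row of the DataFrame (passed in by apply with axis=1)
    Age = x['Age']
    Pclass = x['Pclass']
    Sex = x['Sex']

    if pd.isnull(Age):
        # The first branch must also test Sex; a bare `Pclass == 1` would
        # make the (Pclass == 1, 'male') branch unreachable
        if (Pclass == 1) & (Sex == 'female'):
            return 34.61
        elif (Pclass == 1) & (Sex == 'male'):
            return 41.2813
        elif (Pclass == 2) & (Sex == 'female'):
            return 28.72
        elif (Pclass == 2) & (Sex == 'male'):
            return 30.74
        elif (Pclass == 3) & (Sex == 'female'):
            return 21.75
        elif (Pclass == 3) & (Sex == 'male'):
            return 26.51
        else:
            # No matching Pclass/Sex group: leave the age missing
            return Age
    else:
        return Age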

Now, using apply with the lambda:

train['Age'] = train.apply(lambda x: fill_age(x), axis=1)
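
To see why the version that reads whole columns fails while a row-wise version works, here is a minimal sketch. It assumes a small hypothetical Series named ages (not taken from the question's data): testing a whole Series in an if raises the ValueError from the question, whereas testing a single element gives an ordinary True or False.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train['Age']
ages = pd.Series([22.0, np.nan, 35.0])

# pd.isnull on a whole Series returns a boolean Series, one value per row
mask = pd.isnull(ages)

try:
    if mask:  # a Series has no single truth value, so this raises
        pass
    raised = False
except ValueError:
    raised = True

# pd.isnull on one element returns a single boolean, which `if` accepts
single = pd.isnull(ages.iloc[1])
```

This is the same situation as inside a function called through apply with axis=1: each x['Age'] is one value, so the if statements behave as expected.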

In a sample dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Age':[1,np.nan,3,np.nan,5,6],
                   'Pclass':[1,2,3,3,2,1],
                   'Sex':['male','female','male','female','male','female']})

Using the answer provided above:

df['Age'] = df.apply(lambda x: fill_age(x),axis=1)

Output:

    Age  Pclass     Sex
0   1.00       1    male
1  28.72       2  female
2   3.00       3    male
3  21.75       3  female
4   5.00       2    male
5   6.00       1  female
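
As a possible alternative to a row-wise apply, the same fill values could be kept in a lookup table and applied with fillna. This is only a sketch: GROUP_AGE and fill_age_by_group are hypothetical names, the numbers are copied from fill_age, and the (1, 'female') entry assumes that 34.61 was meant for first-class women.

```python
import numpy as np
import pandas as pd

# Fill values copied from fill_age, keyed by (Pclass, Sex).
# Assumption: 34.61 belongs to (1, 'female'), following the branch order.
GROUP_AGE = {
    (1, 'female'): 34.61, (1, 'male'): 41.2813,
    (2, 'female'): 28.72, (2, 'male'): 30.74,
    (3, 'female'): 21.75, (3, 'male'): 26.51,
}

def fill_age_by_group(frame):
    # Look up a fallback age for every row; unknown groups stay NaN
    fallback = pd.Series(
        [GROUP_AGE.get(key, np.nan) for key in zip(frame['Pclass'], frame['Sex'])],
        index=frame.index,
    )
    # fillna only touches rows where Age is missing
    return frame['Age'].fillna(fallback)

df = pd.DataFrame({'Age': [1, np.nan, 3, np.nan, 5, 6],
                   'Pclass': [1, 2, 3, 3, 2, 1],
                   'Sex': ['male', 'female', 'male', 'female', 'male', 'female']})
df['Age'] = fill_age_by_group(df)
```

Keeping the values in a dictionary makes it harder to write an unreachable branch, and existing ages are never overwritten because fillna leaves non-missing values alone.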
