How to check for a specific string within a custom function applied to a pandas DataFrame column?

Suppose I have the following pandas DataFrame column:

import pandas as pd
import string

d = {'Name': ['Braund, Mr. Owen Harris','Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'Heikkinen, Miss.Laina']}

raw_df = pd.DataFrame(data=d)

I am trying to encode this column: if Mrs is found in a row's string, return married, otherwise return not_married:

    def is_married_female(raw_df):
        if raw_df['Name'].str.contains('Mrs').any():
            return 'married'
        else:
            return 'not_married'

    raw_df['is_married_female'] = raw_df.apply(lambda x: is_married_female(x["Name"]), axis=1)

However, I keep getting the following error:

TypeError: string indices must be integers

The expected output would look like this:

raw_df['is_married_female']

# not_married
# married
# not_married

What am I missing in the function?

Issue:

x['Name'] is a Python str, not a Series or a DataFrame.

Inside the function is_married_female, the variable raw_df is therefore a string like:

'Braund, Mr. Owen Harris'

Running raw_df['Name'] is then equivalent to:

print('Braund, Mr. Owen Harris'['Name']) # TypeError: string indices must be integers

This tries to index the string with a string key, whereas a string only accepts integer positions (or slices), like:

print('Braund, Mr. Owen Harris'[0]) # B
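
To see this directly, here is a minimal sketch (reusing the sample data from the question; the variable names row_types, cell_types, and error_message are only illustrative) that records what the lambda actually receives with axis=1 and reproduces the error:

```python
import pandas as pd

raw_df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                                'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                                'Heikkinen, Miss.Laina']})

# With axis=1, apply passes each row as a Series; x["Name"] is a single cell.
row_types = raw_df.apply(lambda x: type(x).__name__, axis=1).tolist()
cell_types = raw_df.apply(lambda x: type(x["Name"]).__name__, axis=1).tolist()

print(row_types)   # ['Series', 'Series', 'Series']
print(cell_types)  # ['str', 'str', 'str']

# Indexing a str with a str key raises the TypeError from the question.
name = raw_df.loc[0, 'Name']
error_message = None
try:
    name['Name']
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message can differ slightly between Python versions, but it starts with "string indices must be integers".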

Fix:

  1. Treat the function parameter as its correct type (str) and use the in operator.
  2. Rename raw_df to name to avoid future confusion.

import pandas as pd

d = {'Name': ['Braund, Mr. Owen Harris',
              'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
              'Heikkinen, Miss.Laina']}

raw_df = pd.DataFrame(data=d)


def is_married_female(name):
    if 'Mrs' in name:
        return 'married'
    else:
        return 'not_married'


raw_df['is_married_female'] = raw_df.apply(
    lambda x: is_married_female(x["Name"]),
    axis=1
)

print(raw_df.to_string())
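
Since the function only needs the Name value, one possible simplification is to call apply on the column itself rather than on the whole DataFrame with axis=1. This is a sketch of that variant, not a change to the fix above; it assumes the same sample data:

```python
import pandas as pd

raw_df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                                'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                                'Heikkinen, Miss.Laina']})


def is_married_female(name):
    # name is a plain Python str here, so the `in` operator works directly.
    return 'married' if 'Mrs' in name else 'not_married'


# Series.apply passes each cell value (a str) straight to the function,
# so no lambda and no axis argument are needed.
raw_df['is_married_female'] = raw_df['Name'].apply(is_married_female)

print(raw_df['is_married_female'].tolist())
```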

A more performant solution, however, is to use np.where, which works on the whole column at once instead of calling a Python function per row:

import numpy as np
import pandas as pd

d = {'Name': ['Braund, Mr. Owen Harris',
              'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
              'Heikkinen, Miss.Laina']}

raw_df = pd.DataFrame(data=d)

raw_df['is_married_female'] = np.where(raw_df['Name'].str.contains('Mrs'),
                                       'married', 'not_married')

print(raw_df.to_string())

The output for both is:

                                                  Name is_married_female
0                              Braund, Mr. Owen Harris       not_married
1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)           married
2                                Heikkinen, Miss.Laina       not_married
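
If the real column can contain missing values, the np.where approach may need a small adjustment. The sketch below assumes a hypothetical fourth row with a missing name (not part of the original data) and uses the na and regex parameters of str.contains: na=False treats missing names as "no match", and regex=False matches the literal text 'Mrs.' rather than interpreting the dot as a regular-expression wildcard.

```python
import numpy as np
import pandas as pd

# The None entry is a hypothetical addition to illustrate missing data.
raw_df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                                'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                                'Heikkinen, Miss.Laina',
                                None]})

# na=False makes missing names evaluate to False instead of NaN,
# so np.where receives a clean boolean mask.
mask = raw_df['Name'].str.contains('Mrs.', regex=False, na=False)

raw_df['is_married_female'] = np.where(mask, 'married', 'not_married')

print(raw_df['is_married_female'].tolist())
```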
