
How to check for a specific string within a custom function for a pandas DataFrame column?

Suppose I have the following pandas DataFrame column:

import pandas as pd
import string

d = {'Name': ['Braund, Mr. Owen Harris','Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'Heikkinen, Miss.Laina']}

raw_df = pd.DataFrame(data=d)

I am trying to decode this column by returning married if Mrs is found in the row's string, and not_married otherwise:

    def is_married_female(raw_df):
        if raw_df['Name'].str.contains('Mrs').any():
            return 'married'
        else:
            return 'not_married'

    raw_df['is_married_female'] = raw_df.apply(lambda x: is_married_female(x["Name"]), axis=1)

But I keep getting the following error:

TypeError: string indices must be integers

The expected output would look like this:

raw_df['is_married_female']

# not_married
# married
# not_married

What am I missing in the function?

Problem:

x['Name'] is a Python str, not a Series or a DataFrame.

Inside the function is_married_female, the variable raw_df is a string such as:

'Braund, Mr. Owen Harris'

So when raw_df['Name'] runs, it is equivalent to:

print('Braund, Mr. Owen Harris'['Name']) # TypeError: string indices must be integers

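That is an attempt to index a string with a string key, but strings can only be indexed with integers (or slices), for example: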

print('Braund, Mr. Owen Harris'[0]) # B
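
To confirm this diagnosis, here is a minimal sketch that reuses the question's data and records what apply actually hands to the lambda. The variable names row_types, value_types and error_message are introduced here for illustration only.

```python
import pandas as pd

raw_df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                                'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                                'Heikkinen, Miss.Laina']})

# With axis=1, apply passes each row to the lambda as a Series ...
row_types = raw_df.apply(lambda x: type(x).__name__, axis=1).tolist()

# ... so x["Name"] is a single cell value, i.e. a plain Python str.
value_types = raw_df.apply(lambda x: type(x["Name"]).__name__, axis=1).tolist()

# Indexing that str with a str key is what raises the TypeError.
name = 'Braund, Mr. Owen Harris'
try:
    name['Name']
except TypeError as exc:
    error_message = str(exc)

print(row_types, value_types, error_message)
```

The exact wording of the message can vary slightly between Python versions, but it always starts with "string indices must be integers".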

Fix:

  1. Treat the function argument as the type it really is (str).
  2. Rename raw_df to name to avoid future confusion.

import pandas as pd

d = {'Name': ['Braund, Mr. Owen Harris',
              'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
              'Heikkinen, Miss.Laina']}

raw_df = pd.DataFrame(data=d)


def is_married_female(name):
    if 'Mrs' in name:
        return 'married'
    else:
        return 'not_married'


raw_df['is_married_female'] = raw_df.apply(
    lambda x: is_married_female(x["Name"]),
    axis=1
)

print(raw_df.to_string())
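
Since the function only needs the name, a slightly leaner variant is to apply it to the column itself rather than to whole rows. This is a sketch and not part of the original answer: Series.apply passes each cell value (a str) straight to the function, so no lambda or axis=1 is needed.

```python
import pandas as pd

raw_df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                                'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                                'Heikkinen, Miss.Laina']})


def is_married_female(name):
    # name is a single cell value (str), not a DataFrame or Series
    return 'married' if 'Mrs' in name else 'not_married'


# Series.apply calls the function once per element of the 'Name' column
raw_df['is_married_female'] = raw_df['Name'].apply(is_married_female)

print(raw_df.to_string())
```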

However, a more efficient solution is to use np.where, which operates on the whole column at once:

import numpy as np
import pandas as pd

d = {'Name': ['Braund, Mr. Owen Harris',
              'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
              'Heikkinen, Miss.Laina']}

raw_df = pd.DataFrame(data=d)

raw_df['is_married_female'] = np.where(raw_df['Name'].str.contains('Mrs'),
                                       'married', 'not_married')

print(raw_df.to_string())

The output of both is:

                                                  Name is_married_female
0                              Braund, Mr. Owen Harris       not_married
1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)           married
2                                Heikkinen, Miss.Laina       not_married
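
If the real data may contain missing names, the vectorized version may need one more guard. The sketch below rests on assumptions that are not in the original question: a fourth row with a missing name is added purely for illustration, and the pattern is tightened to the title "Mrs." at a word boundary. Depending on the pandas version, str.contains can return a missing value for a missing name, which np.where may then treat as true. Passing na=False makes such rows count as no match.

```python
import numpy as np
import pandas as pd

# The last row (a missing name) is hypothetical, added for illustration only.
raw_df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                                'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                                'Heikkinen, Miss.Laina',
                                None]})

# \bMrs\. matches the title "Mrs." only at a word boundary;
# na=False makes missing names evaluate to False instead of a missing value.
is_mrs = raw_df['Name'].str.contains(r'\bMrs\.', regex=True, na=False)

raw_df['is_married_female'] = np.where(is_mrs, 'married', 'not_married')

print(raw_df.to_string())
```

Whether a missing name should map to not_married or to a separate label is a design choice that depends on the data set.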

