str.contains to create new column in pandas dataframe

I am exploring the Titanic data set and want to create a column that groups similar names. For example, any name that contains "Charles" should show as "Ch", because I want to group by those values later on. I wrote a function using the following code:

def cont(Name):
    for a in Name:
        if a.str.contains('Charles'):
            return('Ch')

and then applied using this:

titanic['namest']=titanic['Name'].apply(cont,axis=1)

Error: 'str' object has no attribute 'str'


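Instead of using a loop or apply, you can use the vectorized str.contains to get a boolean mask and set every row that satisfies the condition to the desired value: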

titanic.loc[titanic['Name'].str.contains('Charles'), 'namest'] = 'Ch'
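To see the mask approach end to end, here is a minimal sketch on a few made-up rows. The sample names and the na=False argument are assumptions added for illustration; na=False simply treats missing names as non-matches, so the mask stays purely boolean.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Titanic 'Name' column.
titanic = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Williams, Mr. Charles Eugene",
        None,  # a missing name, to show the effect of na=False
    ]
})

# Vectorized substring test: one boolean per row.
mask = titanic["Name"].str.contains("Charles", na=False)

# Assign 'Ch' only where the mask is True; the new column is
# created on the fly and the other rows are left as missing.
titanic.loc[mask, "namest"] = "Ch"

print(titanic)
```

Rows that do not match are left as missing values in namest, which differs from the apply-based versions further down that keep the original name.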

apply calls the cont function once per value in the Name column, passing the values one at a time. That means the Name variable inside cont is already a plain string, which is why it has no .str attribute.
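A quick way to check this is to record the type of each value that apply hands to the function. This is only a sketch; the sample names and the record_type helper are hypothetical and not part of the original question.

```python
import pandas as pd

# Hypothetical sample of the 'Name' column.
names = pd.Series(
    ["Braund, Mr. Owen Harris", "Williams, Mr. Charles Eugene"],
    name="Name",
)

seen_types = []

def record_type(value):
    # Remember what kind of object apply passed in.
    seen_types.append(type(value).__name__)
    return value

names.apply(record_type)

print(seen_types)             # each entry is 'str'
print(hasattr("abc", "str"))  # plain strings have no .str accessor
```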

Also note that a function passed to apply should return a value for every input; a Python function that falls through without a return statement yields None. So when the name doesn't contain 'Charles', the name itself is returned.

Finally, the Series apply method doesn't have an axis keyword argument (that belongs to DataFrame.apply), so drop axis=1.

def cont(Name):
    if 'Charles' in Name:
        return 'Ch'
    return Name

You don't even need to define a named function:

titanic['namest'] = titanic['Name'].apply(lambda x: 'Ch' if 'Charles' in x else x)
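For comparison, the same column can be built without apply by combining str.contains with Series.mask, and then used for the group-by mentioned in the question. This is a sketch on made-up rows with no missing names; via_apply, via_mask and counts are illustrative variable names.

```python
import pandas as pd

# Hypothetical sample rows; no missing names, so the lambda is safe.
titanic = pd.DataFrame({
    "Name": [
        "Williams, Mr. Charles Eugene",
        "Braund, Mr. Owen Harris",
        "Hays, Mr. Charles Melville",
    ]
})

# Element-wise version from the answer above.
via_apply = titanic["Name"].apply(lambda x: "Ch" if "Charles" in x else x)

# Vectorized version: replace values where the condition is True.
via_mask = titanic["Name"].mask(titanic["Name"].str.contains("Charles"), "Ch")

titanic["namest"] = via_mask

# Group on the new label, e.g. to count passengers per label.
counts = titanic.groupby("namest").size()
print(counts)
```

Both versions keep the original name for non-matching rows, so every row ends up in some group.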
