Python Pandas slice column string up to a character based on condition

I tried different ways to slice the strings in a pandas column up to a specific character, based on a condition.

For example, take Kaggle's Titanic data set. For every name in the "Name" column that contains a '(' character, I would like to keep only the characters before the '(' so that no brackets remain in the names. In other words, I want to drop the bracketed part and keep what came before it.

Sample of my data set

I used this way:

df.loc[df['Name'].str.rfind('(') > -1, 'Name'] = df['Name'].str.slice(0, df['Name'].str.rfind('('))

When a name contains '(', this is meant to slice it; otherwise the name (which has no opening bracket) should be returned unchanged. The slice is supposed to take the characters before the opening bracket.

My solution does not work: it produces NaN. How can I fix it?

You can just use pd.Series.str.split to get everything before ' ('.

import pandas as pd

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                           'Cummings, Mrs. John Bradley (Florence Briggs)',
                           'Heikkinen, Miss. Laina',
                           'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
                           'Allen, Mr. William Henry']})

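# Split on ' (' and keep the first piece; the raw string keeps \( as a literal bracket in the regex
df['Name'] = df.Name.str.split(r' \(', expand=True)[0]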

Output:

print(df)
                           Name
0       Braund, Mr. Owen Harris
1   Cummings, Mrs. John Bradley
2        Heikkinen, Miss. Laina
3  Futrelle, Mrs. Jacques Heath
4      Allen, Mr. William Henry
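
As for the original attempt: str.slice takes scalar start and stop values, so passing a whole Series as stop does not slice each row at its own position, which is a likely cause of the NaN values reported in the question. If you do want to slice each name at its own rfind position, a minimal sketch could look like the following. The helper name cut_at and the shortened sample frame are only illustrative and not part of the question or the answer above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris',
                            'Cummings, Mrs. John Bradley (Florence Briggs)',
                            'Heikkinen, Miss. Laina']})

# Position of the last '(' in each name, or -1 when the name has none
pos = df['Name'].str.rfind('(')

def cut_at(name, stop):
    # Hypothetical helper: keep the text before index `stop` (dropping the
    # trailing space), or return the name unchanged when no '(' was found
    return name[:stop].rstrip() if stop > -1 else name

# Pair each name with its own stop position and slice row by row
df['Name'] = [cut_at(name, stop) for name, stop in zip(df['Name'], pos)]
```

This keeps the conditional logic of the question explicit, at the cost of a Python-level loop; the str.split version above is shorter for this particular case.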

