Vectorizing splitting string values in a column based on strings in another column in a pandas DataFrame

Hey, so I have this problem of splitting the strings in one column based on another column's values. I managed to figure out a solution with df.apply, but I wonder whether there is a str.split-based way to vectorize this implementation?

            name field
0            b_b     b
1            b_c     b
2            b_d     b
3        a_paris     a
4  a_tokyo_ghoul     a
5           a_xx     a

I would like to convert the 'name' column into

0              b
1              c
2              d
3          paris
4    tokyo_ghoul
5             xx

and my current implementation is

df.apply(lambda row: row['name'].split(f"{row['field']}_")[-1], axis=1)
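For comparison, here is a self-contained reproduction of the apply above next to a list comprehension over the two zipped columns. This is a sketch, not a truly vectorized solution: it keeps exactly the same split-on-"<field>_" logic but skips building a Series for every row, which is what typically makes apply(axis=1) slow.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["b_b", "b_c", "b_d", "a_paris", "a_tokyo_ghoul", "a_xx"],
    "field": ["b", "b", "b", "a", "a", "a"],
})

# Row-wise apply, as in the question
via_apply = df.apply(lambda row: row["name"].split(f"{row['field']}_")[-1], axis=1)

# Same logic, iterating over plain Python strings paired by zip
via_zip = pd.Series(
    [name.split(f"{field}_")[-1] for name, field in zip(df["name"], df["field"])],
    index=df.index,
)

print(via_zip.tolist())
# ['b', 'c', 'd', 'paris', 'tokyo_ghoul', 'xx']
```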

Assuming you want to extract the part after the first _ and validate that the initial string is the same as df['field']:

df2 = df['name'].str.split('_', n=1, expand=True)

df['name2'] = df2[1].where(df2[0].eq(df['field']))

output:

            name field        name2
0            b_b     b            b
1            b_c     b            c
2            b_d     b            d
3        a_paris     a        paris
4  a_tokyo_ghoul     a  tokyo_ghoul
5           a_xx     a           xx
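A self-contained version of the answer's approach, building the sample DataFrame from the question so it can be run as-is:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["b_b", "b_c", "b_d", "a_paris", "a_tokyo_ghoul", "a_xx"],
    "field": ["b", "b", "b", "a", "a", "a"],
})

# Split only at the first "_": column 0 is the prefix, column 1 is the rest
parts = df["name"].str.split("_", n=1, expand=True)

# Keep the rest only where the prefix equals that row's field; otherwise NaN
df["name2"] = parts[1].where(parts[0].eq(df["field"]))

print(df)
```

Because the split happens at the first "_", this assumes the field values themselves contain no "_". A field such as "a_b" would never equal the single-token prefix, so its rows would come out as NaN.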

If every value in your name column contains '_', you can split the column on '_' (at the first occurrence only) to get a prefix and a postfix for each name. The postfix, i.e. the part that comes after the first '_', is the desired output.
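A minimal sketch of that idea, assuming every name has at least one "_". It does not check the prefix against field at all. The new column name "suffix" is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["b_b", "b_c", "b_d", "a_paris", "a_tokyo_ghoul", "a_xx"],
    "field": ["b", "b", "b", "a", "a", "a"],
})

# n=1 keeps "tokyo_ghoul" intact; .str[1] picks the part after the first "_"
# (it yields NaN for a name that has no "_")
df["suffix"] = df["name"].str.split("_", n=1).str[1]

print(df["suffix"].tolist())
# ['b', 'c', 'd', 'paris', 'tokyo_ghoul', 'xx']
```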

Edit: You can also use another column that holds the prefix of each name (here, field) as the base: filter for rows where the field string matches the start of the name, then split off the rest.
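One way to sketch the filter-then-strip idea, assuming each valid name starts with "<field>_". Series.str.startswith accepts a single fixed pattern (or tuple of patterns), not a per-row Series, so the per-row check here uses a comprehension. The column name "name3" is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["b_b", "b_c", "b_d", "a_paris", "a_tokyo_ghoul", "a_xx"],
    "field": ["b", "b", "b", "a", "a", "a"],
})

# Expected prefix for each row, e.g. "b_" or "a_"
prefixes = df["field"] + "_"

# Per-row check: does the name start with its own field prefix?
matches = pd.Series(
    [n.startswith(p) for n, p in zip(df["name"], prefixes)], index=df.index
)

# Remove the leading prefix; rows that fail the check become NaN
rest = pd.Series([n[len(p):] for n, p in zip(df["name"], prefixes)], index=df.index)
df["name3"] = rest.where(matches)

print(df["name3"].tolist())
# ['b', 'c', 'd', 'paris', 'tokyo_ghoul', 'xx']
```

This removes only the leading prefix, so it can differ from the split-based apply when the prefix repeats later in the name: for "b_b_b" with field "b" it gives "b_b", while the apply gives "b".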
