Vectorizing splitting string values in a column based on strings in another column in a pandas DataFrame

Question

Hey, I need to split the strings in one column based on the values in another column. I managed to figure out a solution with df.apply, but I wonder whether there is a str.split-related way to vectorize this implementation?

            name field
0            b_b     b
1            b_c     b
2            b_d     b
3        a_paris     a
4  a_tokyo_ghoul     a
5           a_xx     a

I would like to convert the 'name' column into

0              b
1              c
2              d
3          paris
4    tokyo_ghoul
5             xx

and my current implementation is

df.apply(lambda row: row['name'].split(f"{row['field']}_")[-1], axis=1)

Answer 1

Assuming you want to extract the part after the first _ and validate that the part before it is the same as df['field']:

df2 = df['name'].str.split('_', n=1, expand=True)

df['name2'] = df2[1].where(df2[0].eq(df['field']))

output:

            name field        name2
0            b_b     b            b
1            b_c     b            c
2            b_d     b            d
3        a_paris     a        paris
4  a_tokyo_ghoul     a  tokyo_ghoul
5           a_xx     a           xx
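
To try this end to end, here is a minimal sketch that rebuilds the example frame from the question and compares the result with the df.apply version. The extra one-row frame named bad (x_y with field a) is hypothetical and not from the question; it is only there to show that where leaves a missing value when the prefix does not match the field.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["b_b", "b_c", "b_d", "a_paris", "a_tokyo_ghoul", "a_xx"],
    "field": ["b", "b", "b", "a", "a", "a"],
})

# Split once on the first underscore: column 0 is the prefix, column 1 the rest.
parts = df["name"].str.split("_", n=1, expand=True)

# Keep the remainder only where the prefix equals the value in 'field'.
vectorized = parts[1].where(parts[0].eq(df["field"]))

# Row-wise version from the question, for comparison.
row_wise = df.apply(lambda row: row["name"].split(f"{row['field']}_")[-1], axis=1)

print(vectorized.tolist())
print(vectorized.tolist() == row_wise.tolist())

# Hypothetical mismatched row: the prefix 'x' differs from the field 'a',
# so the result is a missing value instead of 'y'.
bad = pd.DataFrame({"name": ["x_y"], "field": ["a"]})
bad_parts = bad["name"].str.split("_", n=1, expand=True)
bad_result = bad_parts[1].where(bad_parts[0].eq(bad["field"]))
print(bad_result.isna().all())
```

Note that the two versions only agree when every name really starts with its field followed by an underscore, which holds for the sample data.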

Answer 2

If every value in your name column contains '_', you can split the column on the first '_', which gives you the prefix and the postfix of each name. Then take the postfix, the part that comes after the '_', and you get the desired output.

Edit: You can also use another column that contains the prefix of the name (here, field) as the base: filter for rows where the field string matches the start of the name, then split off the rest.
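
This answer gives no code, so the following is only one possible reading of it, sketched under the assumption that the separator is always '_' and that the frame looks like the one in the question. The names suffix and matches, and the output column name2, are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["b_b", "b_c", "b_d", "a_paris", "a_tokyo_ghoul", "a_xx"],
    "field": ["b", "b", "b", "a", "a", "a"],
})

# Split once on the first underscore and keep the part after it (the postfix).
suffix = df["name"].str.split("_", n=1).str[1]

# Variant from the edit: check row by row that 'name' starts with '<field>_'.
# This is a plain Python comprehension, not a vectorized string method.
matches = pd.Series(
    [n.startswith(f"{f}_") for n, f in zip(df["name"], df["field"])],
    index=df.index,
)

# Keep the postfix only for rows whose name starts with the field prefix.
df["name2"] = suffix.where(matches)
print(df)
```

Limiting the split with n=1 matters for names such as a_tokyo_ghoul, where a plain split on '_' would break the remainder into two pieces.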

solution1
0 2022-08-08 12:18:25

solution2
0 2022-08-08 12:40:38
