With Pandas, removing a substring from a string in one column when another column contains that substring

Question

From this Pandas data frame:

import pandas as pd
df = pd.DataFrame({'a': ['foo_abc', 'bar_def', 'ghi'], 'b': ['foo', 'bar', 'yah']})

    a               b
0   foo_abc         foo
1   bar_def         bar
2   ghi             yah

I want to remove the string in column b from the string in column a (probably with a regex) and put the result in a new column c:

         a    b    c
0  foo_abc  foo  abc
1  bar_def  bar  def
2      ghi  yah  ghi

How could I do this with Pandas?

Answer 1

Use str.replace with str.strip in a list comprehension:

df['c'] = [a.replace(b, '').strip('_') for a, b in zip(df['a'], df['b'])]
print(df)
         a    b    c
0  foo_abc  foo  abc
1  bar_def  bar  def
2      ghi  yah  ghi
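Note that str.replace removes every occurrence of b anywhere in a, and strip('_') removes all leading and trailing underscores. If b should only be removed when it is a prefix of a followed by '_', a minimal sketch using str.removeprefix (assuming Python 3.9 or later; the helper name strip_prefix and the extra row 'x_foo_y' are hypothetical, added only for illustration) is:

```python
import pandas as pd


def strip_prefix(a: str, b: str) -> str:
    """Hypothetical helper: remove 'b_' from the start of a, if present."""
    # str.removeprefix returns a unchanged when it does not start with b + '_'
    return a.removeprefix(b + '_')


df = pd.DataFrame({'a': ['foo_abc', 'bar_def', 'ghi', 'x_foo_y'],
                   'b': ['foo', 'bar', 'yah', 'foo']})

# Pair each value of column a with the value of column b in the same row
df['c'] = [strip_prefix(a, b) for a, b in zip(df['a'], df['b'])]
print(df)
```

For the last row, the replace/strip approach would give 'x__y', while the prefix-only version leaves 'x_foo_y' untouched.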

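Solution with re.sub, which removes b only when it appears at the start of a followed by an underscore:

import re
df['c'] = [re.sub('^{}_'.format(re.escape(b)), '', a) for a, b in zip(df['a'], df['b'])]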
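Because b is interpolated into the regex pattern, any metacharacters it contains (such as '.' or '+') are interpreted as regex syntax rather than literal text. A minimal sketch comparing an unescaped pattern with one built using re.escape, assuming '_' is always the separator (the values 'v1.0', 'v1x0_ghi' and 'v1.0_jkl' are made up for illustration):

```python
import re

import pandas as pd

df = pd.DataFrame({'a': ['foo_abc', 'v1x0_ghi', 'v1.0_jkl'],
                   'b': ['foo', 'v1.0', 'v1.0']})

# Unescaped: '.' matches any character, so the prefix 'v1x0_' is also removed
df['unescaped'] = [re.sub('^{}_'.format(b), '', a)
                   for a, b in zip(df['a'], df['b'])]

# Escaped: re.escape turns 'v1.0' into a literal pattern, so only 'v1.0_' is removed
df['c'] = [re.sub('^{}_'.format(re.escape(b)), '', a)
           for a, b in zip(df['a'], df['b'])]
print(df)
```

The '^' anchor means at most one match per row, so no count argument is needed.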

Answer 1: score 2, accepted, 2018-10-18 11:05:12
