
Pandas: removing a substring from the strings in one column when another column contains that substring

From this Pandas data frame:

import pandas as pd
df = pd.DataFrame({'a': ['foo_abc', 'bar_def', 'ghi'], 'b': ['foo', 'bar', 'yah']})

    a               b
0   foo_abc         foo
1   bar_def         bar
2   ghi             yah

I want to remove the string in column b from the string in column a (probably with a regex) to produce:

     a             b     c
0   foo_abc      foo    abc
1   bar_def      bar    def
2   ghi          yah    ghi

How could I do this with Pandas?

Use str.replace with strip in a list comprehension:

df['c'] = [a.replace(b, '').strip('_') for a, b in zip(df['a'], df['b'])]
print(df)
         a    b    c
0  foo_abc  foo  abc
1  bar_def  bar  def
2      ghi  yah  ghi
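Note that a.replace(b, '') removes every occurrence of b anywhere in a, and strip('_') then trims underscores from both ends, so a row such as 'foo_foo' with b = 'foo' ends up empty. A minimal sketch of a stricter variant, assuming Python 3.9+ (for str.removeprefix) and assuming the goal is to drop only a leading "<b>_" prefix; the 'foo_foo' value is a hypothetical example, not part of the original data:

```python
import pandas as pd

df = pd.DataFrame({'a': ['foo_abc', 'bar_def', 'ghi'], 'b': ['foo', 'bar', 'yah']})

# Drop only a leading "<b>_" prefix; rows where a does not start with it are unchanged.
# Assumes Python 3.9+ for str.removeprefix.
df['c'] = [a.removeprefix(b + '_') for a, b in zip(df['a'], df['b'])]
print(df)

# Hypothetical row where b occurs twice in a, to contrast the two approaches.
a, b = 'foo_foo', 'foo'
print(repr(a.replace(b, '').strip('_')))  # replace + strip removes both copies: ''
print(repr(a.removeprefix(b + '_')))      # removeprefix keeps the second copy: 'foo'
```

For the three sample rows both approaches give the same column c; they only differ when b appears more than once in a or is not followed by a single underscore.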

Solution with re.sub, which removes only a leading "<b>_" prefix:

import re
df['c'] = [re.sub('^{}_'.format(re.escape(b)), '', a) for a, b in zip(df['a'], df['b'])]
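Because b is interpolated into a regular expression, values containing metacharacters such as '.' or '+' are treated as pattern syntax unless they are escaped. A minimal sketch comparing the unescaped and re.escape versions; the last row ('xzy_end', 'x.y') is a hypothetical value added only to show the difference:

```python
import re
import pandas as pd

# The last row is hypothetical: its b value contains the regex metacharacter '.'.
df = pd.DataFrame({'a': ['foo_abc', 'bar_def', 'ghi', 'xzy_end'],
                   'b': ['foo', 'bar', 'yah', 'x.y']})

# Unescaped: '.' matches any character, so the pattern '^x.y_' also matches 'xzy_'.
unescaped = [re.sub('^{}_'.format(b), '', a) for a, b in zip(df['a'], df['b'])]

# Escaped: re.escape makes b match literally, so 'xzy_end' is left untouched.
df['c'] = [re.sub('^{}_'.format(re.escape(b)), '', a) for a, b in zip(df['a'], df['b'])]

print(unescaped)          # ['abc', 'def', 'ghi', 'end']
print(df['c'].tolist())   # ['abc', 'def', 'ghi', 'xzy_end']
```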

