Python: Replace string in one column from list in other column

Question

I need some help please.

I have a DataFrame with multiple columns, two of which are:

Content_Clean: a column of content strings

Removals: a column in which each row holds a list of strings to be removed from that row's Content_Clean value

Problem: I am trying to replace the words in Content_Clean with spaces when they appear in that row's Removals list.

Example:

Content_Clean: 'Johnny and Mary went to the store'

Removals: ['Johnny','Mary']

Output: 'and went to the store'
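For a single row, the expected output can be reproduced with a minimal sketch that splits the sentence on whitespace and drops the listed words. It assumes words are separated by whitespace and must match a list entry exactly, so a token like 'Mary,' with trailing punctuation would not be removed.

```python
content_clean = 'Johnny and Mary went to the store'
removals = ['Johnny', 'Mary']

# A set gives fast membership checks when the removal list is long
to_remove = set(removals)

# Keep the words that are not in the removal list; joining with single
# spaces means no double spaces are left where words were dropped
output = ' '.join(word for word in content_clean.split() if word not in to_remove)

print(output)  # and went to the store
```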

Example Code:

for i in data_eng['Removals']:
    for u in i:
        data_eng['Content_Clean_II'] = data_eng['Content_Clean'].str.replace(u,' ')

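This does not work: each pass of the inner loop applies one word to every row (not only the row whose Removals list it came from), and Content_Clean_II is recomputed from the original Content_Clean each time, so only the last replacement survives.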
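A minimal sketch of a corrected version of this loop, assuming data_eng is a pandas DataFrame and each Removals entry is a Python list of strings (the two-row sample data and the helper name remove_words are hypothetical). It walks the two columns in parallel with zip so each row uses only its own list, and it builds the whole Content_Clean_II column once instead of reassigning it inside the loop.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame
data_eng = pd.DataFrame({
    'Content_Clean': ['Johnny and Mary went to the store',
                      'Alice met Bob at the park'],
    'Removals': [['Johnny', 'Mary'], ['Bob']],
})


def remove_words(text, words):
    """Replace each listed word with a space, then collapse repeated spaces."""
    for word in words:
        text = text.replace(word, ' ')
    return ' '.join(text.split())


# zip pairs every row's text with that same row's list of removals
data_eng['Content_Clean_II'] = [
    remove_words(text, words)
    for text, words in zip(data_eng['Content_Clean'], data_eng['Removals'])
]

print(data_eng['Content_Clean_II'].tolist())
# ['and went to the store', 'Alice met at the park']
```

Because str.replace matches substrings, a word such as 'Mary' would also be cut out of 'Maryland'; a regular expression with word boundaries avoids that.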

Another Example:

data_eng['Content_Clean_II'] = data_eng['Content_Clean'].apply(lambda x: re.sub(data_eng.loc[data_eng['Content_Clean'] == x, 'Removals'].values[0], '', x))

This does not work either: re.sub expects a single pattern, but each Removals entry is a list of strings.

The core problem is that each row's Removals entry is a list, and I want to use that list to remove (replace with spaces) the matching words in the same row's Content_Clean value.
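A minimal sketch of the re.sub idea made to work with a list, assuming the same column layout: for each row, the words are escaped with re.escape and joined with | into one alternation, wrapped in \b word boundaries so only whole words match. The helper name strip_words and the sample data are hypothetical. Passing each row's own list through zip also avoids looking the row up again by its Content_Clean value, which would pick the wrong list if two rows had identical text.

```python
import re

import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame
data_eng = pd.DataFrame({
    'Content_Clean': ['Johnny and Mary went to the store',
                      'Mary drove to Maryland'],
    'Removals': [['Johnny', 'Mary'], ['Mary']],
})


def strip_words(text, words):
    """Replace whole-word matches of any listed word with a space."""
    if not words:
        # An empty alternation would match the empty string at every boundary
        return text
    # re.escape keeps characters such as '.' or '+' from acting as regex syntax
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    replaced = re.sub(pattern, ' ', text)
    # Collapse the runs of spaces left behind by the replacements
    return re.sub(r'\s+', ' ', replaced).strip()


data_eng['Content_Clean_II'] = [
    strip_words(text, words)
    for text, words in zip(data_eng['Content_Clean'], data_eng['Removals'])
]

print(data_eng['Content_Clean_II'].tolist())
# ['and went to the store', 'drove to Maryland']
```

Note that \b behaves as expected only when each removal word starts and ends with a letter, digit, or underscore.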


Answer 1

Here you go. This worked on my test data; let me know if it works for you.

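def repl(row):
    # Remove every word in this row's Removals list from its Content_Clean text
    # (each word is replaced with an empty string)
    for word in row['Removals']:
        row['Content_Clean'] = row['Content_Clean'].replace(word, '')
    return row

# axis=1 passes each row to repl as a Series; the returned rows form the new DataFrame
data_eng = data_eng.apply(repl, axis=1)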

Answer 2

You can call the str.replace(old, new) method to remove unwanted words from a string. Here is a small example:

a_string = "I do not like to eat apples and watermelons"

stripped_string = a_string.replace(" do not", "")

print(stripped_string)

This removes " do not" (including its leading space) from the sentence, so it prints "I like to eat apples and watermelons".


2 answers

solution1 (score 0, accepted): 2022-06-06 11:19:35

solution2 (score -1): 2022-06-06 10:02:03
