
Python: Replace string in one column from list in other column

I need some help, please.

I have a dataframe with multiple columns, two of which are:

Content_Clean: a column of strings (the content)

Removals: a column where each row holds a list of strings to be removed from Content_Clean

Problem: for each row, I am trying to replace the words in Content_Clean with spaces if they appear in that row's Removals list.

Example:

Content Clean: 'Johnny and Mary went to the store'

Removals: ['Johnny','Mary']

Output: 'and went to the store'

Example Code:

for i in data_eng['Removals']:
    for u in i:
        data_eng['Content_Clean_II'] = data_eng['Content_Clean'].str.replace(u,' ')

This does not work because the Removals column contains a separate list in each row.

Another Example:

data_eng['Content_Clean_II'] = data_eng['Content_Clean'].apply(lambda x: re.sub(data_eng.loc[data_eng['Content_Clean'] == x, 'Removals'].values[0], '', x)) 

This does not work either, because re.sub looks for a single pattern string while each Removals value is a list.

The problem is that the Removals column holds a list in each row, and I want to use that list to remove (or replace with spaces) the matching words in the Content_Clean column on a per-row basis.


Here you go. This worked on my test data. Let me know if it works for you.

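def repl(row):
    # Remove each word in this row's Removals list from its Content_Clean string
    for word in row['Removals']:
        row['Content_Clean'] = row['Content_Clean'].replace(word, '')

    return row

# axis=1 passes each row to repl in turn
data_eng = data_eng.apply(repl, axis=1)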
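One caveat with plain replace is that it also removes the word when it appears inside a longer word, and it leaves extra spaces behind. If that matters, a regex with word boundaries is one possible alternative. The following is only a sketch: remove_words is a hypothetical helper name (not part of pandas), and the sample lists stand in for the two columns.

```python
import re

def remove_words(text, words):
    """Remove every whole-word occurrence of the given words from text."""
    if not words:
        return text
    # Escape each word so regex metacharacters are treated literally,
    # and wrap the alternation in \b so "Mary" does not match inside "Maryland".
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    cleaned = re.sub(pattern, " ", text)
    # Collapse the runs of whitespace left behind by the removals.
    return " ".join(cleaned.split())

# Stand-ins for the Content_Clean and Removals columns (made-up sample data)
contents = ["Johnny and Mary went to the store", "Mary moved to Maryland"]
removals = [["Johnny", "Mary"], ["Mary"]]

cleaned = [remove_words(t, w) for t, w in zip(contents, removals)]
print(cleaned)  # ['and went to the store', 'moved to Maryland']
```

Assuming the column names from the question, the same helper could be applied to the dataframe with data_eng['Content_Clean_II'] = [remove_words(t, w) for t, w in zip(data_eng['Content_Clean'], data_eng['Removals'])]. Note that \b only works as intended for entries that start and end with a letter, digit, or underscore.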

You can call the str.replace(old, new) method to remove unwanted words from a string. Here is a small example:

a_string = "I do not like to eat apples and watermelons"

stripped_string = a_string.replace(" do not", "")

print(stripped_string)

This removes " do not" from the sentence, so it prints "I like to eat apples and watermelons".
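To remove several strings rather than one, the same call can be repeated in a loop. This is a minimal sketch; remove_substrings is a hypothetical helper name chosen for illustration.

```python
def remove_substrings(text, unwanted):
    """Remove each string in unwanted from text, one after another."""
    for item in unwanted:
        # str.replace returns a new string; the original is not modified
        text = text.replace(item, "")
    return text

a_string = "I do not like to eat apples and watermelons"
result = remove_substrings(a_string, [" do not", " and watermelons"])
print(result)  # I like to eat apples

# str.replace matches substrings, not whole words
partial = "watermelons".replace("melon", "")
print(partial)  # waters
```

Since the match is on substrings, short entries in the list can also cut pieces out of longer words, as the last line shows.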
