
How to replace exact matches from a list of strings containing special characters in Python?

I am removing all exact matches of the strings in my list_of_strings from a pandas dataframe column. I don't really understand the re.escape call that is being used, however. I want to make sure this code will remove ALL matches, no matter what type of character is present in my list_of_strings variable and in my dataframe column. What does re.escape really do? I've read the documentation, but I am new to regex and would appreciate an explanation in layman's terms.

import pandas as pd
import re

df = pd.DataFrame(
    {
        "ID": [1, 2],
        "name": [
            # Raw string so the backslash in \p is kept literally.
            r"I have a %$$#form with @#$%$#%@/}\p special characters!!!!",
            "can we: remove the EXACT matches !#$#%$^%$&^(*&*)(*&)_&#",
        ],
    }
)


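# Raw string (r'...') so the backslash in \p is kept literally.
list_of_strings = ['can we: remove', r'with @#$%$#%@/}\p special characters!!!!', 'EXACT']

# Escape each string so it is matched literally, then join them into one alternation.
p = re.compile('|'.join(map(re.escape, list_of_strings)))
df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]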


In regex, some symbols have a special meaning and trigger some functionality. When you want to match such a symbol literally, without triggering its function, you escape it by putting a backslash in front of it.

re.escape is simply a function that does this for you, so you don't have to escape each special character by hand.

Instead of escaping (adding a \ before each special character) manually, like this:

"\$\[\]\^"

you can let re.escape do it, as in the code you wrote:

pattern = "|".join(map(re.escape, "$[]^"))  # gives \$|\[|\]|\^
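To make the difference concrete, here is a minimal sketch comparing an unescaped pattern with an escaped one. The sample text and variable names are made up for illustration and are not from the question.

```python
import re

text = "price: $5.00 (approx.)"

# Unescaped: "$" is an end-of-string anchor and "." matches any character,
# so this pattern never matches the literal "$5.00" and nothing is replaced.
unescaped = re.sub("$5.00", "X", text)

# Escaped: re.escape turns "$5.00" into \$5\.00, which matches the text literally.
escaped = re.sub(re.escape("$5.00"), "X", text)

print(unescaped)  # price: $5.00 (approx.)
print(escaped)    # price: X (approx.)
```

Without re.escape the call silently does nothing, which is why escaping matters whenever the search strings come from data rather than from a hand-written pattern.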

To see what your code does, simply print p.

list_of_strings = ['can we: remove', r'with @#$%$#%@/}\p special characters!!!!', 'EXACT']


p = '|'.join(map(re.escape, list_of_strings))
print(p)

As you will see, every character that has a special meaning in regex is now preceded by a \, so the whole pattern matches your strings literally.
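One caveat if you want to be sure ALL matches are removed: in an alternation the regex engine tries the alternatives from left to right, so when one string in the list is a prefix of another, the shorter one can win. The sketch below sorts the strings longest-first before joining. The helper name build_pattern and the sample strings are hypothetical and chosen only to show the effect.

```python
import re

def build_pattern(strings):
    # Hypothetical helper: escape each string and put longer strings first,
    # so a longer match is tried before any of its own prefixes.
    ordered = sorted(strings, key=len, reverse=True)
    return re.compile("|".join(map(re.escape, ordered)))

strings = ["EXACT", "EXACT matches"]
text = "the EXACT matches here"

# Original order: "EXACT" is tried first and wins, leaving " matches" behind.
naive = re.compile("|".join(map(re.escape, strings)))
naive_result = naive.sub("", text)

# Longest-first order: the full "EXACT matches" is removed.
sorted_result = build_pattern(strings).sub("", text)

print(repr(naive_result))   # 'the  matches here'
print(repr(sorted_result))  # 'the  here'
```

The list in the question has no such overlapping entries, so this only matters if the list grows.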

Use a for loop with regex=False, which treats each string literally so that no escaping is needed:

for i in list_of_strings:
    df['name'] = df['name'].str.replace(i, '', regex=False)

print(df)

   ID                                   name
0   1                     I have a %$$#form 
1   2   the  matches !#$#%$^%$&^(*&*)(*&)_&#
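Note that replacing the strings one after another is not always equivalent to a single regex pass: an earlier removal can join the surrounding text into a new match for a later string. The sketch below uses plain Python strings as a stand-in for the per-cell behaviour of .str.replace(..., regex=False). The sample strings are contrived for illustration.

```python
import re

strings = ["bc", "ad"]
text = "abcd"

# Sequential literal replacement, one string at a time (as in the loop).
sequential = text
for s in strings:
    sequential = sequential.replace(s, "")

# Single pass with one escaped alternation over the original text.
single_pass = re.compile("|".join(map(re.escape, strings))).sub("", text)

print(repr(sequential))   # ''   ("bc" is removed, then the leftover "ad" matches)
print(repr(single_pass))  # 'ad' (only "bc" occurs in the original text)
```

Which behaviour is wanted depends on the data. For the sample dataframe in the question both approaches remove the same text.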

Alternatively, chain the str.replace calls (this only works when the list has a fixed, known length):

df.name.str.replace(list_of_strings[0],'', regex=False)\
       .str.replace(list_of_strings[1],'', regex=False)\
       .str.replace(list_of_strings[2],'', regex=False)

Output:

0                       I have a %$$#form 
1     the  matches !#$#%$^%$&^(*&*)(*&)_&#
Name: name, dtype: object
