I am removing all exact matches in my list_of_strings from a pandas dataframe column. I don't really understand the re.escape that is being used, however. I want to make sure this code will remove ALL matches no matter what type of character is present in my list_of_strings variable and in my dataframe column. What does re.escape really do? I've read the documentation but am newer to regex and would appreciate a more layman's terms explanation.
import pandas as pd
import re
df = pd.DataFrame(
{
"ID": [1, 2],
"name": [
r"I have a %$$#form with @#$%$#%@/}\p special characters!!!!",
"can we: remove the EXACT matches !#$#%$^%$&^(*&*)(*&)_&#",
],
}
)
list_of_strings = ['can we: remove', r'with @#$%$#%@/}\p special characters!!!!', 'EXACT']
p = re.compile('|'.join(map(re.escape, list_of_strings)))
df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]
In regex, some symbols have a special meaning and trigger some functionality. When you want to match such a symbol literally, without triggering its function, you escape it.
re.escape is simply a way to avoid escaping each of those characters manually.
Instead of escaping (adding \) manually like this:
"\$\[\]\^"
you can let re.escape do it, as in the code you wrote:
pattern = "|".join(map(re.escape, "$[]^"))  # gives \$|\[|\]|\^
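To make the difference concrete, here is a small sketch comparing an unescaped pattern with one passed through re.escape. The sample text and the search string are made up for illustration; they are not from the question.

```python
import re

text = "price: $5.00 or 5x00"
needle = "5.00"  # hypothetical search string; "." is a regex metacharacter

# Unescaped: "." matches any character, so "5x00" is matched as well.
unescaped = re.findall(needle, text)

# Escaped: re.escape turns "." into "\.", so only the literal text matches.
escaped = re.findall(re.escape(needle), text)

print(unescaped)
print(escaped)
```

The escaped version finds only the literal "5.00", which is the behavior you want when removing exact strings.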
To see what your code does, simply build the pattern string and print it.
list_of_strings = ['can we: remove', r'with @#$%$#%@/}\p special characters!!!!', 'EXACT']
p = '|'.join(map(re.escape, list_of_strings))
print(p)
As you will see, each character that could have a special meaning in a regex has been escaped with a backslash (\).
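One thing re.escape does not handle is the order of the alternatives. The regex engine tries them left to right, so if one string in the list is a prefix of another, the shorter one can win and leave part of the longer one behind. A minimal sketch, assuming a hypothetical list with that kind of overlap (build_pattern is an illustrative helper name, not part of the question's code):

```python
import re

# Hypothetical list where one entry is a prefix of another.
terms = ["EXACT", "EXACT matches"]
text = "remove the EXACT matches here"

def build_pattern(strings):
    # Sort longest first so the longer alternative is tried before its prefix.
    ordered = sorted(strings, key=len, reverse=True)
    return re.compile("|".join(map(re.escape, ordered)))

# Alternatives in list order: "EXACT" matches first, " matches" is left behind.
naive = re.compile("|".join(map(re.escape, terms)))
naive_result = naive.sub("", text)

# Alternatives sorted by length: the whole "EXACT matches" is removed.
sorted_result = build_pattern(terms).sub("", text)

print(repr(naive_result))
print(repr(sorted_result))
```

With the strings in the question this does not matter, because none of them is a prefix of another, but sorting by length is a cheap safeguard if the list may change.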
Use a for loop:
for i in list_of_strings:
    df['name'] = df['name'].str.replace(i, '', regex=False)
print(df)
ID name
0 1 I have a %$$#form
1 2 the matches !#$#%$^%$&^(*&*)(*&)_&#
Maybe there is an easier way:
df.name.str.replace(list_of_strings[0],'', regex=False)\
.str.replace(list_of_strings[1],'', regex=False)\
.str.replace(list_of_strings[2],'', regex=False)
Output:
0 I have a %$$#form
1 the matches !#$#%$^%$&^(*&*)(*&)_&#
Name: name, dtype: object
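Note that replacing one string at a time is not always equivalent to a single regex pass: removing one string can bring two fragments together and create a new match for a later string in the list. The sketch below uses plain Python strings and made-up values to show the difference; the same reasoning should apply to repeated .str.replace(..., regex=False) calls on a column.

```python
import re

strings = ["ab", "cd"]  # hypothetical strings to remove
text = "cabd"           # removing "ab" leaves "cd" behind

# Sequential literal replacement, one string at a time.
looped = text
for s in strings:
    looped = looped.replace(s, "")

# Single regex pass over the original text; results are not rescanned.
pattern = re.compile("|".join(map(re.escape, strings)))
single_pass = pattern.sub("", text)

print(repr(looped))
print(repr(single_pass))
```

Which behavior is "right" depends on whether strings that only appear after an earlier removal should also be removed.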