I want to separate groups of punctuation from the text with spaces.
my_text = "!where??and!!or$$then:)"
I want to have
! where ?? and !! or $$ then :)
as a result.
I wanted something like in JavaScript, where you can use $1
to refer to the matched string. What I have tried so far:
my_matches = re.findall('[!"\$%&\'()*+,\-.\/:;=#@?\[\\\]^_`{|}~]*', my_text)
Here my_matches is empty so I had to delete \\\\\\
from the expression:
my_matches = re.findall('[!"\$%&\'()*+,\-.\/:;=#@?\^_`{|}~]*', my_text)
I have this result:
['!', '', '', '', '', '', '??', '', '', '', '!!', '', '', '$$', '', '', '', '',
':)', '']
So I delete all the redundant entries like this:
my_matches_distinct = list(set(my_matches))
And I have a better result:
['', '??', ':)', '$$', '!', '!!']
Then I replace every match with itself surrounded by spaces:
for match in my_matches:
    if match != '':
        my_text = re.sub(match, ' ' + match + ' ', my_text)
And of course it's not working! I tried to cast the match to a string, but that does not work either. When I put the string to replace directly in the call, it works though.
But I think I'm not doing it right, because I will have problems with '!' and '!!', right?
Thanks :)
It is recommended to use raw string literals when defining a regex pattern. Besides, do not escape arbitrary symbols inside a character class: only \ must always be escaped, and the others can be placed so that they do not need escaping. Also, your regex can match an empty string because of the * quantifier, which is where the empty entries in your list come from. Replace it with the + quantifier. Finally, if you want to pad these symbols with spaces in your string, use re.sub directly.
import re
my_text = "!where??and!!or$$then:)"
print(re.sub(r'[]!"$%&\'()*+,./:;=#@?[\\^_`{|}~-]+', r' \g<0> ', my_text).strip())
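To see the effect of the quantifier on its own, here is a minimal sketch that runs the same character class through re.findall with * and then with +. The variable names punct, with_star and with_plus are only illustrative.

```python
import re

my_text = "!where??and!!or$$then:)"
punct = r'[]!"$%&\'()*+,./:;=#@?[\\^_`{|}~-]'

# With *, the pattern also succeeds at every position where there is no
# punctuation at all, so findall returns many empty strings.
with_star = re.findall(punct + '*', my_text)

# With +, each match must contain at least one punctuation character.
with_plus = re.findall(punct + '+', my_text)

print(with_star)
print(with_plus)  # ['!', '??', '!!', '$$', ':)']
```

Since + already groups consecutive symbols into one match, no deduplication step with set() is needed.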
Details: The pattern []!"$%&\'()*+,./:;=#@?[\\^_`{|}~-]+ matches one or more symbols from the set. Note that only \ is escaped inside the class, since - is placed at the end and ] at the start; the \' is there only because the pattern sits in a single-quoted string literal. The replacement inserts a space, the whole match (\g<0> is the backreference to the whole match), and another space. Finally, .strip() removes the leading/trailing whitespace left over after the substitution.
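As for why the loop in the question fails, a likely explanation is that each match is passed to re.sub as a pattern without escaping, so symbols like ? are read as regex syntax. The sketch below illustrates this under that assumption, and also shows the '!' versus '!!' concern. The names raw_pattern_failed, escaped_result and two_pass are only illustrative.

```python
import re

my_text = "!where??and!!or$$then:)"

# A raw match such as '??' is interpreted as a pattern, and '??' is not a
# valid regex because the first '?' has nothing to repeat.
try:
    re.sub('??', ' ?? ', my_text)
    raw_pattern_failed = False
except re.error:
    raw_pattern_failed = True

# re.escape builds a pattern that matches the text literally.
escaped_result = re.sub(re.escape('??'), ' ?? ', my_text)

# Replacing '!' in its own pass also hits each '!' inside '!!',
# so the pair gets split apart.
two_pass = re.sub(re.escape('!'), ' ! ', my_text)

print(raw_pattern_failed)
print(escaped_result)
print(two_pass)
```

A single re.sub call with the + quantifier avoids both problems, because each run of symbols is matched and replaced exactly once.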
Use the sub() function from the re library. You can do this as follows:
import re
text = '!where??and!!or$$then:)'
# ? and $ are included in the class so that ?? and $$ are padded too
print(re.sub(r'([!@#$%^&*():;"\',./\\?]+)', r' \1 ', text).strip())
This should solve your problem. If you are familiar with regex, the pattern itself is not a big deal; it is mostly a matter of using the right function.
Hope this helps! Please comment if you have any queries. :)
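For readers coming from JavaScript's $1, the following sketch compares the ways re.sub can refer to the matched text. The small character class is chosen only for this illustration and is not a complete punctuation set; the variable names are hypothetical.

```python
import re

text = "!where??and!!or$$then:)"
pattern = r'[!?$:)]+'  # deliberately small class, for illustration only

# \g<0> refers to the whole match, so no capturing group is needed.
whole_match = re.sub(pattern, r' \g<0> ', text).strip()

# \1 refers to the first capturing group, so the pattern is wrapped in (...).
first_group = re.sub('(' + pattern + ')', r' \1 ', text).strip()

# A function receives the match object and returns the replacement text.
via_function = re.sub(pattern, lambda m: ' ' + m.group(0) + ' ', text).strip()

print(whole_match)  # ! where ?? and !! or $$ then :)
```

All three variants produce the same string here; the function form is mainly useful when the replacement needs logic beyond inserting the match.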