I have the following strings:
1. !abc.com
2. abc.com!
3. Hey there this is .abc.com!. This is amazing
I am trying to find a way to identify special characters immediately before or after a URL in the string, and to add a space only when the special character sits directly at the beginning or end of the URL, e.g.:
!abc.com -> ! abc.com
abc.com! -> abc.com !
Hey there this is .abc.com!. This is amazing -> Hey there this is . abc.com !.This is amazing
What would be a good way to handle this scenario?
I tried the following regex: re.match('^.*$', w), but this seems very generic. Any advice or suggestions would be greatly appreciated.
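The trick is to find every URL with a regex, then look at the character just before and just after each match and add a space when that character is punctuation. This should work: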
import re
import string

# Your input texts + one extreme case with multiple URLs
texts = [
    "!abc.com",
    "abc.com!",
    "Hey there this is .abc.com!. This is amazing",
    "Hey there this is .abc.com!. This is amazing... Hey there this is .abc.com!. This is amazing",
]

# From (match any URL): https://www.regextester.com/93652
pattern = r"(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"

# Loop the texts
for text in texts:
    # Start building the new text
    new_text = ""
    position = 0
    # Loop over the matches
    for match in re.finditer(pattern, text):
        # Extract the start and end positions of the match (URL)
        start, end = match.span()
        # Add until the start of this match
        new_text += text[position:start]
        # Check the character just before the match
        if start > 0:
            if text[start - 1] in string.punctuation:
                # Add a space
                new_text += " "
        # Add the actual match
        new_text += text[start:end]
        # Check the character after the match
        if end < len(text):
            if text[end] in string.punctuation:
                # Add a space
                new_text += " "
        # Move to the end of the match
        position = end
    # Add the end of the original string
    new_text += text[position:]
    # Show the new string
    print(new_text)
Output:
! abc.com
abc.com !
Hey there this is . abc.com !. This is amazing
Hey there this is . abc.com !. This is amazing... Hey there this is . abc.com !. This is amazing
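If you prefer something more compact, the same idea can probably be expressed with re.sub and a replacement function. This is only a sketch: the name pad_urls and the simplified URL_RE pattern are my own assumptions for illustration (the pattern is deliberately narrower than the one above and ignores ports and paths), so adjust it to the URLs you actually expect.

```python
import re
import string

# Simplified URL pattern (an assumption for this sketch, not a full URL matcher):
# optional http(s) scheme, dot/hyphen-separated labels, then a 2-5 letter TLD.
URL_RE = re.compile(r"(?:https?://)?[a-z0-9]+(?:[-.][a-z0-9]+)*\.[a-z]{2,5}")


def pad_urls(text):
    """Put a space between a URL and any punctuation directly touching it."""

    def repl(match):
        start, end = match.span()
        # Space before the URL only if the previous character is punctuation
        before = " " if start > 0 and text[start - 1] in string.punctuation else ""
        # Space after the URL only if the next character is punctuation
        after = " " if end < len(text) and text[end] in string.punctuation else ""
        return before + match.group(0) + after

    return URL_RE.sub(repl, text)


for sample in ["!abc.com", "abc.com!", "Hey there this is .abc.com!. This is amazing"]:
    print(pad_urls(sample))
```

The replacement function looks at the neighbouring characters in the original text through the match positions, so it does the same before/after checks as the loop above without building the string by hand.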