
Remove white space between specific characters using regex in Python

I am trying to use a regex to remove the whitespace inside sequences of consecutive '?' and/or '!' characters in a string. For example, "what is that ?? ? ? ?? ??? ? ! ! ! ? !" should become "what is that ??????????!!!?!". That is, I want to concatenate all the '?' and '!' characters with no spaces in between. My current code doesn't work:

import re
s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
s = re.sub(r"\? +\?", "??", s)
s = re.sub(r"\? +\!", "?!", s)
s = re.sub(r"\! +\!", "!!", s)
s = re.sub(r"\! +\?", "!?", s)

This produces 'what is that ??? ???????!! !?!', where some of the spaces are clearly not removed. What is wrong with my code, and how should I fix it?

You're simply trying to condense whitespace around the punctuation, yeah? How about something like this:

>>> import re
>>> s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
>>> 
>>> re.sub(r'\s*([!?])\s*', r'\1', s)
'what is that??????????!!!?!'
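
Note that the pattern above also swallows the space before the first '?', so the result reads 'that??' rather than the 'that ??' asked for in the question. If that space needs to stay, one possible sketch is to match each whole run of '?' and '!' and strip whitespace only inside the run. The helper name squeeze_punctuation is made up for illustration and is not part of the original answer:

```python
import re

def squeeze_punctuation(text):
    """Remove whitespace inside runs of '?' and '!' (hypothetical helper)."""
    # A run starts with one mark, followed by one or more groups of
    # (optional whitespace + another mark). Whitespace before the first
    # mark and after the last mark is never part of the match.
    run = re.compile(r"[?!](?:\s*[?!])+")
    # For each run, split on whitespace and glue the pieces back together.
    return run.sub(lambda m: "".join(m.group(0).split()), text)

s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
print(squeeze_punctuation(s))
```

Because the match begins and ends on a punctuation mark, marks that are separated by words (as in "a ! b ? c") are left untouched.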

If you're really interested in why your approach isn't working, it has to do with how regular expressions move through a string. When you write re.sub("\\? +\\?", "??", s) and run it on your string, the engine works through like this:

s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
# first match -----^^^
# internally, we have:
s = "what is that ??? ? ?? ??? ? ! ! ! ? !"
# restart scan here -^
# next match here ----^^^
# internally:
s = "what is that ??? ??? ??? ? ! ! ! ? !"
# restart scan here ----^
# next match here ------^^^

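And so on. Each match consumes its closing '?', so that character can never serve as the opening '?' of the next match, and the space right after it is left behind. There are ways to keep the cursor from advancing while it checks for a match (check out positive look-ahead).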
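
As a rough sketch of the look-ahead idea (this is not the original answerer's code, and the variable names are only illustrative): if the second punctuation mark is checked with (?=...) instead of being matched directly, it is inspected but not consumed, so it is still available to start the next match.

```python
import re

s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"

# ([?!])   : capture one punctuation mark (consumed, then restored via \1)
# \s+      : the whitespace to drop
# (?=[?!]) : look-ahead; another mark must follow, but it is NOT consumed
lookahead = re.sub(r"([?!])\s+(?=[?!])", r"\1", s)

# Variant that also uses a look-behind, so only the whitespace is matched
# and the replacement can simply be the empty string.
lookaround = re.sub(r"(?<=[?!])\s+(?=[?!])", "", s)

print(lookahead)
print(lookaround)
```

Both calls print 'what is that ??????????!!!?!', keeping the space before the first '?' because that space is not preceded by a punctuation mark.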

If you want what @gddc described and the sentence pattern is always the same, you can try this:

string_ = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
# Position of the first '?'; everything before it is kept unchanged.
first = string_.index('?')
head = string_[:first]
# From the first '?' onward, drop all whitespace.
tail = "".join(string_[first:].split())
print(head + tail)

output:

what is that ??????????!!!?!

My approach involves splitting the string into two and then handling the problem area with regex (removing spaces) and then joining the pieces back together.

import re

s = "what is that ?? ? ? ?? ??? ? ! ! ! ? !"
splitted = s.split('that ')  # don't forget to add back in 'that' later
splitfirst = splitted[0]
s = re.sub("\\s+", "", splitted[1])
finalstring = splitfirst + 'that ' + s
print(finalstring)

Output:

$ python3 string-replace-question-marks.py
what is that ??????????!!!?!
