
Python - How do I separate punctuation from words by white space leaving only one space between the punctuation and the word?

I have the following string:

input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

All of the punctuation should be separated from the words EXCEPT for "/", " ' ", "-", "+" and "$".

So the output should be:

"I love programming with Python-3 . 3 ! Do you ? It's great . . . I give it a 10/10 . It's free-to-use , no $$$ involved !"

I used the following code:

import string

for x in string.punctuation:
    # "/", "'", "-", "+" and "$" stay attached to words
    if x in "/'-+$":
        continue
    # wrap every other mark in a space on each side
    input = input.replace(x, " %s " % x)

I get the following output:

I love programming with Python-3 . 3 !  Do you ?  It's great .  .  .  I give it a 10/10 .  It's free-to-use ,  no $$$ involved ! 

It works, but the problem is that it sometimes leaves TWO spaces between the punctuation and the word, such as between the first exclamation mark in the sentence and the word "Do". This is because there is already a space between them.
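A minimal reproduction, using a fragment of the input above: the loop wraps each mark in a space on both sides, so a space that was already next to the mark is doubled.

```python
# The question's loop turns each mark into " <mark> ".
fragment = "3! Do"
padded = fragment.replace("!", " %s " % "!")
print(repr(padded))  # '3 !  Do'
```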

This problem would also occur with input = "Hello. (hi)". The output would be:

"Hello .   ( hi ) "

Note the extra spaces before the opening bracket.

I need the output to have only ONE space between any punctuation mark and the adjacent word, except for the five marks listed above, which are not separated from words at all. How can I fix this? Or is there a better way to do this using regex?

Thanks in advance.

Looks like re can do it for you...

>>> import re
>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", input)
"I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! "

and

>>> re.sub(r"([\w/'+$\s-]+|[^\w/'+$\s-]+)\s*", r"\1 ", "Hello. (hi)")
'Hello . ( hi ) '

If the trailing space is a problem, calling .rstrip() on the result should fix it for you :-)
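For readability, here is a sketch of the same pattern written with re.VERBOSE so each alternative can carry a comment, with str.rstrip() dropping the trailing space. The wrapper name space_punctuation is hypothetical; the pattern itself is unchanged from the answer.

```python
import re

# The answer's pattern, rewritten with re.VERBOSE so each part can carry a comment.
# In verbose mode, whitespace outside character classes is ignored.
TOKEN = re.compile(r"""
    (                   # group 1 captures one token, either
        [\w/'+$\s-]+    #   a run of word chars, whitespace and the kept marks
      | [^\w/'+$\s-]+   #   or a run of any other punctuation
    )
    \s*                 # plus any whitespace that follows the token
""", re.VERBOSE)


def space_punctuation(text: str) -> str:
    # Re-emit every token followed by one space, then drop the final space.
    return TOKEN.sub(r"\1 ", text).rstrip()


sample = ("I love programming with Python-3.3! Do you? It's great... "
          "I give it a 10/10. It's free-to-use, no $$$ involved!")
print(space_punctuation(sample))
print(space_punctuation("Hello. (hi)"))  # Hello . ( hi )
```

Note that this pattern keeps a run of marks together ("great ..."), unlike the "great . . ." in the question's desired output.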

You can try it this way:

>>> import string
>>> input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"
>>> ls = []
>>> for x in input:
...     if x in string.punctuation:
...         ls.append(' %s' % x)
...     else:
...         ls.append(x)
...
>>> ''.join(ls)
"I love programming with Python -3 .3 ! Do you ? It 's great . . . I give it a 10 /10 . It 's free -to -use , no  $ $ $ involved !"
>>>
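The loop above also splits off the five marks that should stay attached (It 's, 10 /10, $ $ $) and leaves a double space before the $ run. A sketch of the same loop under two assumptions: the five marks are skipped via a set lookup, and it is acceptable to collapse every run of whitespace in the text (including any already in the input, such as newlines) into one space. The name space_punctuation is hypothetical.

```python
import string

KEEP = set("/'-+$")  # marks that stay attached to words


def space_punctuation(text: str) -> str:
    ls = []
    for x in text:
        if x in string.punctuation and x not in KEEP:
            ls.append(" %s " % x)  # pad the mark on both sides
        else:
            ls.append(x)
    # str.split() with no argument splits on runs of whitespace, so joining
    # the pieces with " " leaves exactly one space and trims both ends.
    return " ".join("".join(ls).split())


sample = ("I love programming with Python-3.3! Do you? It's great... "
          "I give it a 10/10. It's free-to-use, no $$$ involved!")
print(space_punctuation(sample))
```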

I can't comment yet due to lack of reputation, but regarding this case:

between the first exclamation mark in the sentence and the word "Do"

It looks like there are two spaces because there is already a space between ! and Do:

! Do

So, if there is already a space after the punctuation, don't put another space.
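A sketch of that idea, assuming "already a space" means a literal space character (tabs and newlines are not checked): build the output piece by piece, adding a space before a mark only if the output does not already end with one, and after it only if the next input character is not a space. The names SPLIT and space_punctuation are hypothetical.

```python
import string

# Marks to split off: all of string.punctuation except / ' - + $
SPLIT = set(string.punctuation) - set("/'-+$")


def space_punctuation(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        if ch not in SPLIT:
            out.append(ch)
            continue
        # Space before the mark, unless the output already ends with one
        if out and out[-1] != " ":
            out.append(" ")
        out.append(ch)
        # Space after the mark, unless the next input character is a space
        if i + 1 < len(text) and text[i + 1] != " ":
            out.append(" ")
    return "".join(out)


sample = ("I love programming with Python-3.3! Do you? It's great... "
          "I give it a 10/10. It's free-to-use, no $$$ involved!")
print(space_punctuation(sample))
print(space_punctuation("Hello. (hi)"))  # Hello . ( hi )
```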

Also, there is a similar question here: python regex inserting a space between punctuation and letters

So maybe consider using re?

It seems to me a negated character class is simpler:

import re

input_string = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

print(re.sub(r"\s?([^\w\s'/\-\+$]+)\s?", r" \1 ", input_string))

Output:

I love programming with Python-3 . 3 ! Do you ? It's great ... I give it a 10/10 . It's free-to-use , no $$$ involved ! 
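One caveat, checked against the question's second example: the trailing \s? consumes the space after a mark, so the next mark gets a fresh leading space and "Hello. (hi)" still comes out with two spaces before the bracket. Below is a sketch of a different technique (zero-width lookarounds, not this answer's method) that inserts a space only at positions where a mark touches a non-space character, so existing spaces are never duplicated. MARK, GAP and space_punctuation are hypothetical names.

```python
import re

# The answer's regex reproduces the doubled space on the question's second example.
old = re.sub(r"\s?([^\w\s'/\-\+$]+)\s?", r" \1 ", "Hello. (hi)")
print(repr(old))  # 'Hello .  ( hi ) '

# A mark to split off: not a word char, not whitespace, not one of ' / + $ -
MARK = r"[^\w\s'/+$-]"

# Zero-width positions where a space is missing:
#   (?<=\S)(?=MARK)  a non-space character directly before a mark
#   (?<=MARK)(?=\S)  a mark directly before a non-space character
GAP = re.compile(rf"(?<=\S)(?={MARK})|(?<={MARK})(?=\S)")


def space_punctuation(text: str) -> str:
    # Each matched position gets exactly one space; existing spaces are untouched.
    return GAP.sub(" ", text)


print(space_punctuation("Hello. (hi)"))  # Hello . ( hi )
```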

# Approach 1

import re

sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

sample_input = re.sub(r"([^\s])([^\w\/'+$\s-])", r'\1 \2', sample_input)
print(re.sub(r"([^\w\/'+$\s-])([^\s])", r'\1 \2', sample_input))
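A sketch packaging Approach 1 as a reusable function with both patterns compiled once; MARK, SPACE_BEFORE, SPACE_AFTER and space_punctuation are hypothetical names, and [^\s] is written as the equivalent \S. As in the original, the first pass separates a mark from the character before it and the second pass separates it from the character after it.

```python
import re

# Characters that are split off: not a word char, not whitespace, not / ' + $ -
MARK = r"[^\w/'+$\s-]"
SPACE_BEFORE = re.compile(rf"(\S)({MARK})")  # pass 1: "x!" -> "x !"
SPACE_AFTER = re.compile(rf"({MARK})(\S)")   # pass 2: "!x" -> "! x"


def space_punctuation(text: str) -> str:
    text = SPACE_BEFORE.sub(r"\1 \2", text)
    return SPACE_AFTER.sub(r"\1 \2", text)


print(space_punctuation("Hello. (hi)"))  # Hello . ( hi )
```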

# Approach 2

import string

sample_input = "I love programming with Python-3.3! Do you? It's great... I give it a 10/10. It's free-to-use, no $$$ involved!"

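# Punctuation to split off: everything except / ' - + $
punctuation = string.punctuation.replace('/', '').replace("'", '') \
        .replace('-', '').replace('+', '').replace('$', '')

i = 0

while i < len(sample_input):
    if sample_input[i] not in punctuation:
        i += 1
        continue

    # Add a space before the mark unless one is already there
    if i > 0 and sample_input[i-1] != ' ':
        sample_input = sample_input[:i] + ' ' + sample_input[i:]
        i += 1  # the mark shifted one position to the right

    # Add a space after the mark unless one is already there
    if i + 1 < len(sample_input) and sample_input[i+1] != ' ':
        sample_input = sample_input[:i+1] + ' ' + sample_input[i+1:]
        i += 1  # step onto the inserted space

    i += 1

print(sample_input)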

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM