
re.sub put space between punctuation and word where word starts or ends with punctuation

I am trying to put a space between the punctuation and word where the word starts or ends with the punctuation, but not where punctuation is in between. From what I've found, the closest I've been able to come up with is this:

import re
print(re.sub(r'([.,!?()\-])([^\s]+)', r'\g<1> \g<2>', '!hello 77e!  -  -world-view- world-view'))
print(re.sub(r'([^\s]+)([.,!?()\-])', r'\g<1> \g<2>', '!hello 77e!  -  -world-view- world-view'))

The output I get is:

! hello 77e!  -  - world-view- world- view
!hello 77e !  -  -world-view - world -view

Which is close, but I want:

! hello 77e!  -  - world-view- world-view
!hello 77e !  -  -world-view - world-view

In the desired output, "world-view" stays as "world-view".

I plan on using both lines of code on the string so by the end I get something like:

! hello 77e !  -  - world-view - world-view

If there is a way to do this in one line, that would be great, but if not, then can somebody show me what to adjust for these two lines?

You could change it to

import re
print(re.sub(r'(\w) - (\w)', r'\g<1>-\g<2>', 
             re.sub(r'([!?.-])', r' \g<1> ', '!hello 77e!  -  -world-view- world-view')) )

Output:

! hello 77e !    -    - world-view -  world-view

It essentially puts spaces around every !, ?, . and - and then removes them again wherever the result looks like \w - \w.

You get some extra spaces around the existing ' - ' (and a leading space before the initial '!'); I'm not sure whether that is a deal breaker.
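To see where the extra spaces come from, the two passes can be run separately. This is only a sketch using the standard `re` module; the variable names `padded` and `rejoined` are made up for illustration.

```python
import re

text = '!hello 77e!  -  -world-view- world-view'

# Pass 1: surround every !, ?, . or - with spaces.
# This also splits "world-view" into "world - view".
padded = re.sub(r'([!?.-])', r' \g<1> ', text)

# Pass 2: glue "x - y" back into "x-y" when a word character
# sits directly on both sides of " - ".
rejoined = re.sub(r'(\w) - (\w)', r'\g<1>-\g<2>', padded)

# repr() makes the leading and doubled spaces visible.
print(repr(padded))
print(repr(rejoined))
```

If collapsing whitespace is acceptable, `' '.join(rejoined.split())` would tidy the result, but it also collapses the double spaces that were already in the input.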


@WiktorStribizew's solution is superior, so I suggest using his instead. As far as I could see from his regex101 link, it does exactly what you wanted.

You may use

s = re.sub(r'(?<=(?<![^\W\d_])[.,!?()-])(?=[^\W\d_])|(?<=[^\W\d_])(?=[.,!?()-](?![^\W\d_]))', ' ', s)

See the regex demo.

Details

  • (?<=(?<![^\W\d_])[.,!?()-])(?=[^\W\d_]) - a location that is immediately preceded by one of the punctuation symbols in the [.,!?()-] set, where that symbol is itself not immediately preceded by a letter ( [^\W\d_] ), and that is immediately followed by a letter
  • | - or
  • (?<=[^\W\d_])(?=[.,!?()-](?![^\W\d_])) - a location that is immediately preceded by a letter and immediately followed by one of the punctuation symbols in [.,!?()-], where that symbol is itself not followed by a letter.

The matches (empty strings) are replaced with a space (so, a space is just inserted into the matched locations).
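As a quick check, the pattern can be applied to the sample string from the question. This is a minimal sketch: the helper name `space_edge_punctuation` and the constants `LETTER` and `PUNCT` are hypothetical, while the assembled pattern is the one shown above.

```python
import re

# A "letter": a word character that is neither a digit nor an underscore.
LETTER = r'[^\W\d_]'
# The punctuation set used in the answer.
PUNCT = r'[.,!?()-]'

PATTERN = re.compile(
    # Just after punctuation that does not follow a letter, just before a letter.
    rf'(?<=(?<!{LETTER}){PUNCT})(?={LETTER})'
    # Or: just after a letter, just before punctuation not followed by a letter.
    rf'|(?<={LETTER})(?={PUNCT}(?!{LETTER}))'
)

def space_edge_punctuation(s: str) -> str:
    """Insert a space at every zero-width position the pattern matches."""
    return PATTERN.sub(' ', s)

print(space_edge_punctuation('!hello 77e!  -  -world-view- world-view'))
# → ! hello 77e !  -  - world-view - world-view
```

Because `[^\W\d_]` matches letters only, punctuation next to a digit (for example `77!`) is left untouched by this pattern.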

Note that it is OK to nest lookarounds inside a lookbehind, provided the lookbehind pattern remains fixed-width.
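The fixed-width restriction can be seen directly. The sketch below assumes Python's standard `re` module; the names `ok` and `rejected` are illustrative only.

```python
import re

# A lookbehind nested inside a lookbehind contributes no width of its own,
# so the outer lookbehind is still exactly one character wide and compiles.
ok = re.compile(r'(?<=(?<![^\W\d_])[.,!?()-])(?=[^\W\d_])')

# A lookbehind containing a "+" quantifier is variable-width,
# which the standard `re` module refuses to compile.
rejected = False
try:
    re.compile(r'(?<=[.,!?()-]+)(?=[^\W\d_])')
except re.error as exc:
    rejected = True
    print('rejected:', exc)
```

Lookaheads carry no such restriction, which is why `(?=[.,!?()-](?![^\W\d_]))` could contain patterns of any width.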
