
How to add space before and after specific character using regex in Python?

I have this sentence: transportumum min kalo dari kota|tua | mau ke galeri nasional naik transjakarta jurusan apa ya?

As you can see, there are two pipe characters in that sentence. I would like to add a space before and after a pipe if it sits in the middle of a word with no surrounding spaces, e.g. kota|tua should become kota | tua.

This is my code so far:

import re

def puncNorm(text):
    pat = re.compile(r"\D([|:])\D")
    text = pat.sub(" \\1 ", text)
    return text

text = "transportumum min kalo dari kota|tua | mau ke galeri nasional naik transjakarta jurusan apa ya?"

text = puncNorm(text)

The result adds a space around every pipe character, so there are double spaces in tua  |  mau:

transportumum min kalo dari kota | tua  |  mau ke galeri nasional naik transjakarta jurusan apa ya?

My expected result is:

transportumum min kalo dari kota | tua | mau ke galeri nasional naik transjakarta jurusan apa ya?

What is the best way to solve this?

The \D pattern matches (and consumes) any character other than a digit, spaces included, so it cannot tell kota|tua apart from tua | mau. You may use a word boundary here so that the symbols match only when they are inside a word:

r'\b([|:])\b'

See the regex demo
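
A quick way to check the difference is to list what each pattern actually matches. The snippet below is only an illustrative sketch: the variable names are assumptions, and the input is a shortened piece of the sentence from the question.

```python
import re

# Shortened piece of the question's sentence (assumed sample for illustration).
sample = "kota|tua | mau"

# \D consumes one non-digit character on each side, and a space qualifies,
# so both pipes are matched together with their neighbours.
consumed = re.findall(r"\D[|:]\D", sample)

# \b is zero-width: it only asserts a word/non-word transition, so just the
# pipe squeezed between two word characters is matched.
bounded = re.findall(r"\b[|:]\b", sample)

print(consumed)  # ['a|t', ' | ']
print(bounded)   # ['|']
```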

Note that you may also drop the capturing group (...), since the whole match is now just the symbol itself. A backreference to the whole match is \g<0> in Python.

See a Python demo:

import re
rx = r'\b[|:]\b'
s = "transportumum min kalo dari kota|tua | mau ke galeri nasional naik transjakarta jurusan apa ya?"
print(re.sub(rx, r' \g<0> ', s))  # raw string, so \g is not treated as a string escape
# => transportumum min kalo dari kota | tua | mau ke galeri nasional naik transjakarta jurusan apa ya?
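
The same idea can be wrapped in a function shaped like the one in the question. This is a minimal sketch, not the answerer's exact code: the names punc_norm, IN_WORD and BETWEEN_LETTERS are hypothetical. One thing worth knowing is that \b treats digits as word characters, so a time such as 10:30 is also split. The second pattern is an assumed variant for the case where only letters should trigger the spacing.

```python
import re

# Hypothetical names; the patterns are sketches, not the answer's exact code.
IN_WORD = re.compile(r"\b[|:]\b")
# Assumed variant: require a letter (not a digit or underscore) on both sides.
BETWEEN_LETTERS = re.compile(r"(?<=[^\W\d_])[|:](?=[^\W\d_])")

def punc_norm(text, pattern=IN_WORD):
    """Pad each matched separator with a single space on both sides."""
    # \g<0> stands for the whole match, so no capturing group is needed.
    return pattern.sub(r" \g<0> ", text)

print(punc_norm("kota|tua | mau"))              # kota | tua | mau
print(punc_norm("jam 10:30"))                   # jam 10 : 30
print(punc_norm("jam 10:30", BETWEEN_LETTERS))  # jam 10:30
print(punc_norm("kota|tua", BETWEEN_LETTERS))   # kota | tua
```

Whether digits should be excluded depends on the data; the \D in the question hints at it, but that is a guess.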

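You can simply use a quantifier here, such as \s*.

The * means zero or more of the preceding expression, so \s*\|\s* matches a pipe together with any whitespace around it, and replacing that whole match with ' | ' leaves exactly one space on each side.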

>>> import re
>>> text = "transportumum min kalo dari kota|tua | mau ke galeri nasional naik transjakarta jurusan apa ya?"
>>> re.sub(r'(\s*\|\s*)',' | ',text)
'transportumum min kalo dari kota | tua | mau ke galeri nasional naik transjakarta jurusan apa ya?'
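
The quantifier approach behaves a little differently from the word-boundary one, which the sketch below tries to show. The helper name normalize_spacing is hypothetical, and extending the pattern to ':' is an assumption based on the [|:] class in the question.

```python
import re

# Hypothetical helper name; the pattern extends the answer's \s*\|\s* to ':'
# as well, which is an assumption based on the question's [|:] class.
def normalize_spacing(text):
    # Swallow any whitespace around the separator, then emit exactly one space
    # on each side of the captured character.
    return re.sub(r"\s*([|:])\s*", r" \1 ", text)

print(repr(normalize_spacing("kota|tua  |   mau")))  # 'kota | tua | mau'
print(repr(normalize_spacing("kota |tua")))          # 'kota | tua'
print(repr(normalize_spacing("|kota")))              # ' | kota'
```

Because the surrounding whitespace is part of the match, uneven spacing is collapsed as well, but a separator at the very start or end of the string also gets padded, as the last call shows.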
