Remove space-delimited single characters

I have texts that look like this:

the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト jumps over the lazy dog 跳過懶狗 best wishes : John Doe

What's a good regex (for Python) that removes the single characters so that the output looks like this:

the quick brown fox 狐狸 jumps over the lazy dog 跳過懶狗 best wishes John Doe

I've tried some combinations of \s{1}\S{1}\s{1}\S{1}, but they inevitably end up removing more letters than I need.

A non-regex version might look like:

source_string = "this is a string I created"
# Keep only the whitespace-separated tokens that are longer than one character.
modified_string = ' '.join(x for x in source_string.split() if len(x) > 1)
print(modified_string)

You can replace every match of the following pattern with an empty string:

(?<!\S)\S(?!\S).?

This matches a non-space character that has no non-space character on either side of it (i.e. it is surrounded by whitespace or sits at the start or end of the string), plus the character after it (if any).

I used negative lookarounds because they neatly handle the start-of-string and end-of-string cases. The trailing .? consumes the character that follows the \S, so the space after it is removed as well.
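As a rough sketch, this is how that pattern might be applied in Python with re.sub. The names SINGLE_CHAR, text and cleaned are only illustrative and are not part of the original answer.

```python
import re

# Pattern from the answer above: a lone non-space character,
# plus the single character that follows it (if any).
SINGLE_CHAR = re.compile(r"(?<!\S)\S(?!\S).?")

# Sample text from the question.
text = ("the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト "
        "jumps over the lazy dog 跳過懶狗 best wishes : John Doe")

# Replace every match with the empty string.
cleaned = SINGLE_CHAR.sub("", text)
print(cleaned)
# the quick brown fox 狐狸 jumps over the lazy dog 跳過懶狗 best wishes John Doe
```

Note that a lone character at the very end of the string has nothing after it for .? to consume, so the space before it is left behind ("hello x" becomes "hello "). A final .strip() takes care of that if it matters.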


Try the code below. It uses a regex that looks for runs of at least two word characters, which drops the single characters.

import re

s = 'the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト jumps over the lazy dog 跳過懶狗 best wishes : John Doe'
# Collect every run of two or more word characters, then rejoin them with spaces.
output = re.findall(r'\w{2,}', s)
output = ' '.join(output)
print(output)
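One thing to keep in mind with \w{2,}: it works on runs of word characters rather than on whitespace-separated tokens, so punctuation is dropped and tokens containing punctuation get split. A small sketch comparing it with the split-based version (the sample string and variable names are made up for illustration):

```python
import re

sample = "don't stop, a b c"

# Runs of two or more word characters: the apostrophe and comma break the runs.
word_runs = ' '.join(re.findall(r'\w{2,}', sample))

# Whitespace-separated tokens longer than one character: punctuation is kept.
whitespace_tokens = ' '.join(x for x in sample.split() if len(x) > 1)

print(word_runs)          # don stop
print(whitespace_tokens)  # don't stop,
```

For the text in the question both approaches give the same result, since the only punctuation there is the standalone colon.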
