I have texts that look like this:
the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト jumps over the lazy dog 跳過懶狗 best wishes : John Doe
What's a good regex (for Python) that removes the single characters, so that the output looks like this:
the quick brown fox 狐狸 jumps over the lazy dog 跳過懶狗 best wishes John Doe
I've tried some combinations of \s{1}\S{1}\s{1}\S{1}, but they inevitably end up removing more letters than I need.
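To see why, take \s{1}\S{1}\s{1}\S{1} as a representative of those attempts ({1} is redundant, so it is the same as \s\S\s\S). Each match consumes a space, a character, a space, and whatever non-space character comes next. Listing the matches on the sample text shows that the last one, " : J", swallows the J of John. This is a small illustration, not the asker's exact code.

```python
import re

# Sample text from the question (the two literals are joined implicitly)
s = ('the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト '
     'jumps over the lazy dog 跳過懶狗 best wishes : John Doe')

# \s{1}\S{1}\s{1}\S{1} == \s\S\s\S: space, non-space, space, non-space
matches = re.findall(r'\s\S\s\S', s)
print(matches)  # the final match is ' : J', which includes the "J" of "John"

# Deleting those matches therefore mangles "John"
print(re.sub(r'\s\S\s\S', '', s))
```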
A non-regex version might look like:
source_string = "this is a string I created"
# keep only whitespace-separated tokens longer than one character
modified_string = ' '.join(x for x in source_string.split() if len(x) > 1)
print(modified_string)
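Applied to the sample text from the question, the same split-and-filter idea already gives the desired output, including dropping the lone ":". A minimal sketch; drop_single_chars is a hypothetical helper name used only for illustration. Because str.split() with no argument splits on any run of whitespace, it also collapses multiple spaces into one.

```python
def drop_single_chars(text: str) -> str:
    """Keep only whitespace-separated tokens longer than one character (hypothetical helper)."""
    return ' '.join(token for token in text.split() if len(token) > 1)


sample = ('the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト '
          'jumps over the lazy dog 跳過懶狗 best wishes : John Doe')
print(drop_single_chars(sample))
# the quick brown fox 狐狸 jumps over the lazy dog 跳過懶狗 best wishes John Doe
```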
You can replace matches of the following pattern with an empty string:
(?<!\S)\S(?!\S).?
It matches a non-space character that has no non-space character on either side of it (i.e. it is surrounded by whitespace or by the start/end of the string), plus the character after it (if any).
I used negative lookarounds because they neatly handle the start-of-string and end-of-string cases. The trailing .? matches the character that follows the \S (normally the separating space), so that space is removed as well.
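A minimal sketch of this answer applied with re.sub to the sample text; the name SINGLE_CHAR is only illustrative. The second call shows the boundary cases: "a" at the start is removed together with its following space, but "z" at the very end has no following character for .? to consume, so the space before it survives. Adding .strip() to the result would handle that if it matters.

```python
import re

# A lone non-space character (no non-space neighbour on either side),
# plus the one character after it (normally the separating space).
SINGLE_CHAR = re.compile(r'(?<!\S)\S(?!\S).?')

sample = ('the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト '
          'jumps over the lazy dog 跳過懶狗 best wishes : John Doe')
print(SINGLE_CHAR.sub('', sample))
# the quick brown fox 狐狸 jumps over the lazy dog 跳過懶狗 best wishes John Doe

# Start and end of string: "a" has no left neighbour, "z" has no right neighbour
print(repr(SINGLE_CHAR.sub('', 'a bc z')))
# 'bc ' (the space before "z" is kept)
```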
Alternatively, you can use re.findall to keep only runs of at least two word characters (\w{2,}), which drops the single characters.
import re

s = 'the quick brown fox 狐狸 m i c r o s o f t マ イ ク ロ ソ フ ト jumps over the lazy dog 跳過懶狗 best wishes : John Doe'
# keep runs of two or more word characters; single characters and ':' are dropped
output = re.findall(r'\w{2,}', s)
output = ' '.join(output)
print(output)
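One difference worth checking before choosing between the answers: \w does not match punctuation, so re.findall(r'\w{2,}', text) also splits tokens at apostrophes and hyphens and can silently drop fragments, whereas the whitespace-based approaches keep such tokens whole. The input below is a made-up example, not from the question.

```python
import re

text = "don't re-run it"  # hypothetical input containing punctuation

# findall keeps runs of 2+ word characters; ' and - break the runs,
# and the lone "t" after the apostrophe is dropped
print(' '.join(re.findall(r'\w{2,}', text)))
# don re run it

# splitting on whitespace keeps punctuated tokens intact
print(' '.join(t for t in text.split() if len(t) > 1))
# don't re-run it
```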