
Python single character clean

I want to remove all one-character words from a text.

For example, I want to remove the single characters in the text below (a, ?, d, *, etc.) and return the cleaned text.

Lorem Ipsum is simply a dummy ? text | of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it d to make * type specimen book. It has survived not only five centuries, but also the leap into [ electronic typesetting, remaining essentially unchanged.

Using a regular expression:

import re

cleaned = re.sub(r'((?:^|(?<=\s))\S\s|\s\S(?:$|(?=\s)))', '', inputtext)

This matches a lone non-whitespace character in one of two forms: either it sits at the start of the text or after whitespace and is followed by one whitespace character (which is removed too), or it is preceded by one whitespace character (also removed) and sits at the end of the text or before more whitespace.

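Because exactly one adjacent whitespace character is removed along with the lone character, words that were separated by single spaces remain separated by exactly one space, instead of being joined together or left with a double space.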
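The same pattern can be written with re.VERBOSE so that each of the two alternatives carries its own comment. This is a sketch: the function name remove_single_char_words is hypothetical, and the outer capturing group from the one-liner is dropped because re.sub with an empty replacement string never refers to it.

```python
import re

# Same logic as the one-liner above, spelled out with re.VERBOSE
# (whitespace inside the pattern is ignored and '#' starts a comment).
SINGLE_CHAR_WORD = re.compile(r"""
    (?:^|(?<=\s)) \S \s     # a lone non-space char at the start or after whitespace,
                            # plus the one whitespace char that follows it
    |
    \s \S (?:$|(?=\s))      # or one whitespace char, then a lone non-space char
                            # at the end of the text or before more whitespace
""", re.VERBOSE)


def remove_single_char_words(text: str) -> str:
    """Hypothetical helper: delete one-character words and one adjacent space."""
    return SINGLE_CHAR_WORD.sub('', text)


print(remove_single_char_words("I want a cake"))  # prints "want cake"
```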

Demo:

>>> import re
>>> inputtext = '''\
... Lorem Ipsum is simply a dummy ? text | of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it d to make * type specimen book. It has survived not only five centuries, but also the leap into [ electronic typesetting, remaining essentially unchanged.
... '''
>>> re.sub(r'((?:^|(?<=\s))\S\s|\s\S(?:$|(?=\s)))', '', inputtext)
"Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took galley of type and scrambled it to make type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.\n"
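If collapsing every run of whitespace (including newlines) into a single space is acceptable, a regex-free sketch is to split on whitespace and keep only tokens longer than one character. The helper name drop_single_char_tokens is hypothetical. Unlike the regex above, this version loses the trailing newline shown in the demo output and joins multiple lines into one.

```python
def drop_single_char_tokens(text: str) -> str:
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty strings, so leading/trailing whitespace vanishes too
    words = text.split()
    # keep only tokens of two or more characters, rejoined with single spaces
    return ' '.join(word for word in words if len(word) > 1)


print(drop_single_char_tokens("scrambled it d to make * type specimen book"))
# prints "scrambled it to make type specimen book"
```

The trade-off is that the regex preserves the original spacing and line breaks everywhere except next to the removed characters, while the split/join version normalizes all whitespace.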

The technical post webpages of this site follow the CC BY-SA 4.0 protocol.

© 2020-2024 STACKOOM.COM