
Regular expression to remove multiple spaces but keep tabs in Python

I am new to regular expressions and I have a document that needs to be preprocessed. The document contains multiple/extra spaces that need to be removed. I am using re.sub('\s+',' ',doc), which replaces the extra spaces with a single space, but it also replaces tabs and newlines, which I don't intend. Can anyone please help?

Input vs. desired output. Note: there is a tab between "L:" and the text that follows it.

Here's one way:

re.sub(r'[^\S\n\t]+', ' ', doc)

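It is a negated character class ( [^..] ). \S matches any non-whitespace character, so "not \S" means whitespace; adding \n and \t to the negated class excludes them as well. The pattern therefore matches runs of any whitespace except \n and \t .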
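As a quick check of the pattern above, here is a minimal sketch. The helper name collapse_spaces and the sample string are made up for illustration; the sample only mimics the "L:" followed by a tab described in the question.

```python
import re

def collapse_spaces(doc: str) -> str:
    # Hypothetical helper: replace each run of whitespace other than
    # newline and tab with a single space.
    return re.sub(r'[^\S\n\t]+', ' ', doc)

# Illustrative input: a tab after "L:", extra spaces, and a newline.
sample = "L:\tfirst   line\nsecond     line"
print(repr(collapse_spaces(sample)))
# 'L:\tfirst line\nsecond line'
```

The tab and the newline survive, while each run of spaces shrinks to one space.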

But maybe it's enough to just do re.sub(r' +', ' ', doc) if your document has no special whitespace other than plain spaces.
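To see where the two patterns differ, the sketch below runs both on a made-up string containing a non-breaking space (\xa0), which Python's \s treats as whitespace for str patterns. The variable names are illustrative only.

```python
import re

# Illustrative input: space, non-breaking space, space, then a tab.
text = "a \xa0 b\tc"

# Only runs of the plain space character are collapsed.
simple = re.sub(r' +', ' ', text)

# Any whitespace except newline and tab is collapsed,
# including the non-breaking space.
broad = re.sub(r'[^\S\n\t]+', ' ', text)

print(repr(simple))  # 'a \xa0 b\tc'
print(repr(broad))   # 'a b\tc'
```

If the document only ever contains ordinary spaces, both patterns give the same result and the simpler one is easier to read.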
