How to clean up a string excluding certain characters

I want to clean up the string below, removing only the \n and \r characters and the extra spaces, while keeping the apostrophe (') and other characters such as the dash (-) and colon (:).

Right now I am using this code but it gets rid of all special characters.

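import re

string = "\n\n\r\n            Scott Hibb's Amazing Whisky Grilled Baby Back Ribs\r\n                \n\n\n\n"
rx = re.compile(r'\W+')  # \W+ matches every run of non-word characters, including ' - and :
string = rx.sub(' ', string).strip()
print(string)  # Scott Hibb s Amazing Whisky Grilled Baby Back Ribs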

How can I do this?

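You can use split() and join() to remove \n, \t, \r and extra whitespace while preserving the rest of the characters. Called with no arguments, split() breaks the string on any run of whitespace and drops empty pieces, so the strip() and filter(None, ...) calls below are harmless but not strictly needed:

string = "\n\n\r\n       Scott Hibb's       Amazing    Whisky Grilled Baby Back Ribs\r\n                \n\n\n\n"
print(' '.join(filter(None, string.strip().split())))

This results in: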

Scott Hibb's Amazing Whisky Grilled Baby Back Ribs
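
Since the question started from re, here is a minimal sketch of the same cleanup done by changing the pattern from \W+ (any non-word character) to \s+ (any whitespace run). The function name collapse_whitespace is made up for illustration and is not part of the original answer.

```python
import re

def collapse_whitespace(text):
    # \s+ matches runs of spaces, tabs, \n and \r; punctuation is left alone.
    return re.sub(r'\s+', ' ', text).strip()

string = "\n\n\r\n            Scott Hibb's Amazing Whisky Grilled Baby Back Ribs\r\n                \n\n\n\n"
print(collapse_whitespace(string))  # Scott Hibb's Amazing Whisky Grilled Baby Back Ribs
```

Because only whitespace is matched, the apostrophe, dash and colon survive untouched.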

The accepted answer is great, but if you would like a slightly more general solution that lets you specify an explicit set of characters to remove, add a lambda function to the filter, something like this. In Python 3, filter() returns an iterator rather than a string, so the result is joined back with ''.join() before strip() is called:

>>> y = "\n\n\r\n       Scott Hibb's       Amazing    Whisky Grilled Baby Back Ribs\r\n                \n\n\n\n"
>>> ' '.join(''.join(filter(lambda x: x not in ['\n', '\r'], y)).strip().split())
"Scott Hibb's Amazing Whisky Grilled Baby Back Ribs"

Please note that for your example, explicitly specifying \n and \r in the lambda is overkill, because strip() and split() already treat \n and \r as whitespace. If you want to remove other characters, though, this is a reasonable approach. For example, this is how you would strip extra whitespace characters, remove the \n and \r, and remove all standard vowels (a, e, i, o, u). As above, ''.join() turns the filter() result back into a string on Python 3:

>>> y = "\n\n\r\n       Scott Hibb's       Amazing    Whisky Grilled Baby Back Ribs\r\n                \n\n\n\n"
>>> ' '.join(''.join(filter(lambda x: x.lower() not in ['a', 'e', 'i', 'o', 'u', '\n', '\r'], y)).strip().split())
"Sctt Hbb's mzng Whsky Grlld Bby Bck Rbs"

Use a character class; for example, [abc] matches a, b, or c.
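
One way to apply a character class to the original question is a negated class that lists what to keep. The sketch below assumes you want to keep word characters, whitespace, the apostrophe, the colon and the dash, and replace everything else with a space; the function name keep_allowed and the sample string are invented for illustration.

```python
import re

def keep_allowed(text):
    # [^\w\s':-] matches any character that is NOT a word character,
    # whitespace, apostrophe, colon or dash; each such run becomes a space.
    text = re.sub(r"[^\w\s':-]+", ' ', text)
    # Collapse the remaining whitespace runs (spaces, \n, \r, \t).
    return re.sub(r'\s+', ' ', text).strip()

s = "\n\r\n  Scott Hibb's   Slow-Cooked Ribs: Part 2!!  (easy)\r\n"
print(keep_allowed(s))  # Scott Hibb's Slow-Cooked Ribs: Part 2 easy
```

The dash is placed last inside the brackets so that it is read literally rather than as a range.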
