I have a long string that contains various combinations of \n, \r, \t and spaces between words and other characters.
I've tried ''.join(str.split())
in various ways, without success.
What is the correct Pythonic way here?
Would the solution be different for Python 3.x?
Example string:
ex_str = u'Word \n \t \r \n\n\n word2 word3 \r\r\r\r\nword4\n word5'
Desired output (with \n as the only line separator):
new_str = u'Word\nword2 word3\nword4\nword5'
Use a combination of str.splitlines() and splitting on all whitespace with str.split():
'\n'.join([' '.join(line.split()) for line in ex_str.splitlines() if line.strip()])
This treats each line separately, removes empty lines, and then collapses all whitespace per line into single spaces.
Provided the input is a Unicode string (u'...' in Python 2, a plain str in Python 3), the same solution works in both Python versions.
Demo:
>>> ex_str = u'Word \n \t \r \n\n\n word2 word3 \r\r\r\r\nword4\n word5'
>>> '\n'.join([' '.join(line.split()) for line in ex_str.splitlines() if line.strip()])
u'Word\nword2 word3\nword4\nword5'
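The one-liner can also be unrolled into a small function, which may be easier to read and test. This is only a sketch of the same idea; the name normalize_whitespace is made up for illustration and is not part of any library.

```python
def normalize_whitespace(text):
    """Collapse whitespace inside each line and drop blank lines."""
    lines = []
    for line in text.splitlines():
        words = line.split()  # split on any run of whitespace
        if words:             # skip empty or whitespace-only lines
            lines.append(' '.join(words))
    return '\n'.join(lines)


ex_str = 'Word \n \t \r \n\n\n word2 word3 \r\r\r\r\nword4\n word5'
result = normalize_whitespace(ex_str)
print(repr(result))  # 'Word\nword2 word3\nword4\nword5'
```

Checking `if words` replaces the separate `line.strip()` test, since `split()` returns an empty list for a whitespace-only line.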
To preserve tabs, you'd need to strip and split on just spaces and filter out empty strings:
'\n'.join([' '.join([s for s in line.split(' ') if s]) for line in ex_str.splitlines() if line.strip(' ')])
Demo:
>>> '\n'.join([' '.join([s for s in line.split(' ') if s]) for line in ex_str.splitlines() if line.strip(' ')])
u'Word\n\t\nword2 word3\nword4\nword5'
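As a sketch, the tab-preserving variant can be written as a function too. The name collapse_spaces_keep_tabs is hypothetical. Note that a line containing only a tab survives as its own line, which is why the output above contains \n\t\n.

```python
def collapse_spaces_keep_tabs(text):
    """Collapse runs of spaces, drop lines made only of spaces, keep tabs."""
    lines = []
    for line in text.splitlines():
        if not line.strip(' '):
            continue  # line is empty or contains only spaces
        # split(' ') yields '' between consecutive spaces; filter those out
        parts = [part for part in line.split(' ') if part]
        lines.append(' '.join(parts))
    return '\n'.join(lines)


ex_str = 'Word \n \t \r \n\n\n word2 word3 \r\r\r\r\nword4\n word5'
result = collapse_spaces_keep_tabs(ex_str)
print(repr(result))  # 'Word\n\t\nword2 word3\nword4\nword5'
```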
Use simple regexps:
import re
new_str = re.sub(r'[^\S\n]+', ' ', re.sub(r'\s*[\n\t\r]\s*', '\n', ex_str))
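To see what each substitution contributes, the nested call can be split into two steps. This is just an illustrative breakdown; the names step1, step2 and sample are made up here.

```python
import re

ex_str = 'Word \n \t \r \n\n\n word2 word3 \r\r\r\r\nword4\n word5'

# Step 1: any whitespace run containing \n, \t or \r becomes one newline.
step1 = re.sub(r'\s*[\n\t\r]\s*', '\n', ex_str)
# Step 2: remaining runs of non-newline whitespace become a single space.
step2 = re.sub(r'[^\S\n]+', ' ', step1)
print(repr(step2))  # 'Word\nword2 word3\nword4\nword5'

# Step 2 only matters when several spaces separate two words on one line.
sample = 'a  b \n\n c'
sample_step1 = re.sub(r'\s*[\n\t\r]\s*', '\n', sample)
sample_step2 = re.sub(r'[^\S\n]+', ' ', sample_step1)
print(repr(sample_step1))  # 'a  b\nc'
print(repr(sample_step2))  # 'a b\nc'
```

Because \t is in the first character class, a tab between two words is turned into a line break with this approach.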
Use a regex:
>>> s
u'Word \n \t \r \n\n\n word2 word3 \r\r\r\r\nword4\t word5'
>>> re.sub(r'[\n\r\t ]{2,}| {2,}', lambda x: '\n' if x.group().strip(' ') else ' ', s)
u'Word\nword2 word3\nword4\nword5'
Here is another regex solution, which replaces tabs with a space (for example in u'word1\t\tword2'). Or do you really want a line break there too?
import re
new_str = re.sub(r"[\n\ ]{2,}", "\n", re.sub(r"[\t\r\ ]+", " ", ex_str))
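The difference in tab handling is easiest to see side by side. The sketch below wraps this substitution and the earlier "simple regexps" one in two functions; the function names are invented for the comparison.

```python
import re


def tabs_as_line_breaks(text):
    # Substitution pair from the "simple regexps" answer.
    return re.sub(r'[^\S\n]+', ' ', re.sub(r'\s*[\n\t\r]\s*', '\n', text))


def tabs_as_spaces(text):
    # Substitution pair from the answer above.
    return re.sub(r"[\n\ ]{2,}", "\n", re.sub(r"[\t\r\ ]+", " ", text))


sample = 'word1\t\tword2'
print(repr(tabs_as_line_breaks(sample)))  # 'word1\nword2'
print(repr(tabs_as_spaces(sample)))       # 'word1 word2'

# Both give the desired result for the example string from the question.
ex_str = 'Word \n \t \r \n\n\n word2 word3 \r\r\r\r\nword4\n word5'
print(tabs_as_line_breaks(ex_str) == tabs_as_spaces(ex_str))  # True
```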
'\n'.join(ex_str.split())
Output:
u'Word\nword2\nword3\nword4\nword5'