Python RegEx remove new lines (that shouldn't be there)
I've extracted some text and would like to clean it up with a RegEx.
I've learned basic RegEx, but I'm not sure how to build this one:
str = '''
this is
a line that has been cut.
This is a line that should start on a new line
'''
should be converted to:
str = '''
this is a line that has been cut.
This is a line that should start on a new line
'''
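This r'\w\n\w' seems to catch it, but I'm not sure how to replace the newline with a space without touching the end and the beginning of the surrounding words.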
You can use this lookbehind-based regex with re.sub:
>>> import re
>>> # Note: the cut line is assumed to end with a trailing space ("this is ").
>>> str = '''
... this is 
... a line that has been cut.
... This is a line that should start on a new line
... '''
>>> print(re.sub(r'(?<!\.)\n', '', str))
this is a line that has been cut.
This is a line that should start on a new line
>>>
(?<!\.)\n
matches every newline that is not immediately preceded by a period.
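To see the negative lookbehind in isolation, here is a minimal sketch. The helper name join_cut_lines and the sample strings are made up for illustration; only the pattern comes from the answer. Because the newline is deleted rather than replaced, the result depends on whether the cut line ends with a space:

```python
import re

def join_cut_lines(text):
    # Hypothetical helper: delete every newline that is not
    # immediately preceded by a period (negative lookbehind).
    return re.sub(r'(?<!\.)\n', '', text)

# The cut line ends with a trailing space, so the words stay separated.
with_space = 'this is \na line that has been cut.\nThis is a new line'
print(join_cut_lines(with_space))

# Without a trailing space the newline is simply deleted and the words merge.
without_space = 'this is\na line that has been cut.\nThis is a new line'
print(join_cut_lines(without_space))
```

The newline after "cut." survives in both cases because it directly follows a period.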
If you don't want the match to depend on the presence of a period, use:
re.sub(r'(?<=\w\s)\n', '', str)
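If the extracted text has no trailing space before the break, one possible alternative is to replace the newline with a space only when it sits directly between two word characters, which is what the question's r'\w\n\w' was reaching for. This is a sketch under that assumption, not the pattern from the answer; the variable names and sample text are illustrative:

```python
import re

text = 'this is\na line that has been cut.\nThis is a line that should start on a new line\n'

# Alternative sketch (not the pattern from the answer): replace a newline with
# a space only when it sits directly between two word characters.
joined = re.sub(r'(?<=\w)\n(?=\w)', ' ', text)
print(joined)

# The answer's second pattern, for comparison: it needs a word character plus
# a whitespace character before the newline, so it leaves this input unchanged.
unchanged = re.sub(r'(?<=\w\s)\n', '', text)
print(unchanged == text)
```

The lookbehind and lookahead match positions without consuming characters, so the letters on either side of the newline are left untouched. A cut line that ends in punctuation such as a comma would not be joined by this variant.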