
Python RegEx remove new lines (that shouldn't be there)

I have some extracted text and want to clean it up with a regex.

I have learned basic RegEx, but I'm not sure how to build this one:

str = '''
this is 
a line that has been cut.
This is a line that should start on a new line
'''

should be converted to this:

str = '''
this is a line that has been cut.
This is a line that should start on a new line
'''

This r'\w\n\w' seems to catch it, but I'm not sure how to replace the newline with a space without touching the end and beginning of the surrounding words.

You can use this lookbehind regex with re.sub:

>>> import re
>>> str = '''
... this is 
... a line that has been cut.
... This is a line that should start on a new line
... '''
>>> print(re.sub(r'(?<!\.)\n', '', str))
this is a line that has been cut.
This is a line that should start on a new line
>>>

RegEx Demo

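(?<!\.)\n matches every line break that is not preceded by a dot. Because the replacement is an empty string, the words only stay separated if the cut line ends in a space, as "this is " does in the question.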
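
A minimal Python 3 sketch of the same lookbehind, wrapped in a function so it can be reused. The names `join_cut_lines` and `text` are made up for illustration (using `text` also avoids shadowing the built-in `str`), and the sample assumes the cut line keeps its trailing space:

```python
import re

def join_cut_lines(s):
    # Delete every newline that is NOT preceded by a literal dot.
    # Newlines that follow a dot (end of a sentence) are kept.
    return re.sub(r'(?<!\.)\n', '', s)

# Hypothetical sample; note the trailing space after "this is".
text = "this is \na line that has been cut.\nThis is a line that should start on a new line\n"

print(join_cut_lines(text))
```

Keep in mind this also strips a leading or trailing newline when it does not follow a dot, which is why the printed result has no blank lines around it.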

If you don't want the match to depend on the presence of a dot, then use:

re.sub(r'(?<=\w\s)\n', '', str)
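
As a rough Python 3 sketch of how this second pattern behaves (the function name `join_after_space` and the sample string are only illustrative): the lookbehind requires a word character followed by one whitespace character right before the newline, so only lines that were cut after a trailing space get joined.

```python
import re

def join_after_space(s):
    # Delete a newline only when it directly follows
    # a word character plus one whitespace character.
    return re.sub(r'(?<=\w\s)\n', '', s)

# Hypothetical sample; the first line ends in "is " (word char + space).
sample = "this is \na line that has been cut.\nThis is a line that should start on a new line\n"

print(join_after_space(sample))
```

Unlike the dot-based version, this leaves the final newline alone, and it does nothing at all if the cut line has no trailing whitespace.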

RegEx Demo 2
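
For the case the question literally asks about (a newline squeezed between two word characters, replaced by a space), one possible variant is to turn the asker's \w\n\w into a lookbehind plus a lookahead. This is not one of the patterns given above, just a sketch of the same idea; `join_between_words` and `cut` are hypothetical names:

```python
import re

def join_between_words(s):
    # Lookbehind and lookahead check for word characters on both sides
    # without consuming them, so only the newline itself is replaced.
    return re.sub(r'(?<=\w)\n(?=\w)', ' ', s)

# Hypothetical sample with NO trailing space on the cut line.
cut = "this is\na line that has been cut.\nThis is a line that should start on a new line\n"

print(join_between_words(cut))
```

Since the lookarounds are zero-width, the letters at the end and beginning of the words are left untouched. The newline after "cut." stays because a dot is not a word character.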
