Python RegEx remove new lines (that shouldn't be there)
I have extracted some text that I would like to clean up with a RegEx.
I have learned basic RegEx, but I am not sure how to build this one:
str = '''
this is
a line that has been cut.
This is a line that should start on a new line
'''
It should be converted to:
str = '''
this is a line that has been cut.
This is a line that should start on a new line
'''
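The pattern r'\w\n\w' seems to catch it, but I am not sure how to replace the newline with a space without touching the end and the beginning of the surrounding words.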
You can use this lookbehind-based regex with re.sub:
>>> import re
>>> str = '''
... this is
... a line that has been cut.
... This is a line that should start on a new line
... '''
>>> print(re.sub(r'(?<!\.)\n', '', str))
this is a line that has been cut.
This is a line that should start on a new line
>>>
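(?<!\.)\n matches every newline that is not preceded by a period. Because the replacement is an empty string, the words on either side of a removed newline stay separated only if the cut line ends with a space; use ' ' as the replacement if it does not.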
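As a minimal Python 3 sketch of the same negative lookbehind, the function below replaces the newline with a space instead of an empty string, so the result does not depend on trailing whitespace in the input. The name join_cut_lines and the decision to strip trailing spaces and the outer blank lines are assumptions made for illustration, not part of the original answer.

```python
import re

def join_cut_lines(text):
    """Join lines that were cut mid-sentence; keep breaks that follow a period."""
    # Drop trailing spaces/tabs on every line so the lookbehind
    # sees the last real character before each newline.
    text = re.sub(r'[ \t]+$', '', text, flags=re.MULTILINE)
    # Replace each newline that is not preceded by a period with one space.
    return re.sub(r'(?<!\.)\n', ' ', text.strip())

sample = '''
this is
a line that has been cut.
This is a line that should start on a new line
'''
print(join_cut_lines(sample))
# this is a line that has been cut.
# This is a line that should start on a new line
```

Like the original pattern, this treats a period as the only sign of a real line end, so a line ending in '?' or '!' would also be joined to the next one.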
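If you would rather not match based on the presence of a period, use the following, which removes a newline only when it directly follows a word character and one whitespace character (for example, a trailing space):
re.sub(r'(?<=\w\s)\n', '', str)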
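The r'\w\n\w' idea from the question can also be written with lookarounds on both sides, so that the neighbouring letters are checked but not consumed and only the newline is replaced. This is a sketch under the assumption that a cut always falls between two word characters; join_between_words is a hypothetical name.

```python
import re

def join_between_words(text):
    """Replace a newline with a space only when word characters surround it."""
    # (?<=\w) and (?=\w) assert a word character on each side of the
    # newline without including those characters in the match.
    return re.sub(r'(?<=\w)\n(?=\w)', ' ', text)

text = '''
this is
a line that has been cut.
This is a line that should start on a new line
'''
print(join_between_words(text))
# (blank line)
# this is a line that has been cut.
# This is a line that should start on a new line
```

The leading and trailing newlines are left alone because they have no word character on one side. A complete line that ends without punctuation would still be joined to the next one, so this suits text where sentences reliably end with a punctuation mark.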