regex match certain characters but not with a period at the beginning
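I have a string that contains some whitespace (line breaks, in the example below). I want to replace them with a period, but not where the sentence already ends with a period.
For example: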
text = "This is the oldest European-settled town in the continental " \
"U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a " \
"scenic cruise aboard \r\n"
I am trying to change it to the following using a regular expression:
text = "This is the oldest European-settled town in the continental " \
"U.S. Explore the town at your leisure. Upgrade to add" \
" a scenic cruise aboard."
What I have right now is:
new_text = re.sub("(( )?(\\n|\\r\\n)+)", ". ", text).strip()
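However, it does not handle sentences that already end with a period. Should I use some kind of lookaround here, and if so, how?
Thanks in advance!!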
You can add an optional "." to the regex: (( )?\\.?(\\n|\\r\\n)+)
If a "." is already there, it is matched as well and replaced by ". " together with the line break.
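As a quick check, here is a minimal sketch of that pattern applied to the string from the question. It assumes the \r\n in the text are real line-break characters, and it writes the pattern as a raw string, which is a stylistic choice here and not part of the original answer.

```python
import re

text = ("This is the oldest European-settled town in the continental "
        "U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a "
        "scenic cruise aboard \r\n")

# Optional space, optional period, then one or more line breaks.
pattern = r"( )?\.?(\n|\r\n)+"

new_text = re.sub(pattern, ". ", text).strip()
print(new_text)

# Limitation: the pattern expects the space BEFORE the period, so a
# period followed by a space ("end. \r\n") keeps its own period and
# gains a second one.
doubled = re.sub(pattern, ". ", "end. \r\nNext")
print(doubled)
```

The first print gives the sentence with single periods. The second shows that input with a space after the period still ends up as "end.. Next", which is the case a character class such as [ .]* avoids.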
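Well, I'm not sure whether the \r\n in your string is meant literally (a backslash followed by a letter) or as actual line-break characters, so...
Literal: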
>>> import re
>>> text = r"This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
>>> result = re.sub(r'[ .]*(?:(?:\\r)?\\n)+', '. ', text).strip()
>>> print(result)
This is the oldest European-settled town in the continental U.S. Explore the town at your leisure. Upgrade to add a scenic cruise aboard.
ideone demo.
Not literal:
>>> import re
>>> text = "This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
>>> result = re.sub(r'[ .]*(?:\r?\n)+', '. ', text).strip()
>>> print(result)
This is the oldest European-settled town in the continental U.S. Explore the town at your leisure. Upgrade to add a scenic cruise aboard.
I removed some unnecessary groups and converted the remaining ones to non-capturing groups.
I also turned (\\n|\\r\\n)+ into the slightly more efficient form (?:(?:\\r)?\\n)+
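The rewrite only changes how the line break is spelled, not what it matches. A small sketch to confirm that, assuming real line-break characters (the non-literal case); the sample strings are made up for illustration:

```python
import re

# Hypothetical sample inputs covering \n, \r\n, repeated and mixed breaks.
samples = ["a\nb", "a\r\nb", "a\r\n\r\nb", "a \n\r\nb", "a.\r\n", "no breaks"]

alternation = r"[ .]*(\n|\r\n)+"   # capturing group with two alternatives
compact = r"[ .]*(?:\r?\n)+"       # non-capturing group, optional \r

results = [
    (re.sub(alternation, ". ", s), re.sub(compact, ". ", s))
    for s in samples
]

for before, after in results:
    print(repr(before), repr(after))
```

Both patterns give the same replacement for every sample. The compact form just avoids the alternation and the unused capture.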
If you just want to get rid of the line breaks, use this:
text = "This is the oldest European-settled town in the continental U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a scenic cruise aboard \r\n"
text = text.replace('\r\n','')
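Note that replacing with an empty string joins neighbouring lines with nothing between them. If a space between the lines is wanted, one possible variant (a sketch, not part of the original answer) is to split on line breaks and join with a space; it does not add any periods.

```python
text = ("This is the oldest European-settled town in the continental "
        "U.S.\r\nExplore the town at your leisure\r\nUpgrade to add a "
        "scenic cruise aboard \r\n")

# Plain removal glues neighbouring lines together with no separator.
removed = text.replace("\r\n", "")

# splitlines() treats "\r\n", "\n" and "\r" each as a single line break.
lines = [line.strip() for line in text.splitlines()]
joined = " ".join(line for line in lines if line)

print(removed)
print(joined)
```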